High Throughput Virtual Screening of Metal-Organic Frameworks for Hydrogen Storage

Student: Zhifei Liu
Table: CHEM4
Experimentation location: School, Home
Regulated Research (Form 1c): No
Project continuation (Form 7): No




  1. N. L. Rosi et al., Hydrogen Storage in Microporous Metal-Organic Frameworks, Science 300 (5622) 1127-1129 (2003) doi:10.1126/science.1083440
  2. D. Sun et al., An Interweaving MOF with High Hydrogen Uptake, JACS 128 3896-3897 (2006) doi:10.1021/ja058777l
  3. M. Y. Masoomi et al., Mixed-Metal MOFs: Unique Opportunities in Metal-Organic Framework Functionality and Design, Angewandte Chemie 131 (43) 15330-15347 (2019) doi:10.1002/ange.201902229
  4. U.S. Department of Energy, DOE Technical Targets for Onboard Hydrogen Storage for Light-Duty Vehicles, Energy.gov (n.d.) https://energy.gov/eere/fuelcells/doe-technical-targets-onboard-hydrogen-storage-light-duty-vehicles
  5. U.S. Department of Energy, Materials-Based Hydrogen Storage, Energy.gov (n.d.) https://www.energy.gov/eere/fuelcells/materials-based-hydrogen-storage
  6. X. Lü et al., Hydrogen Storage Metal-Organic Framework Classification Models Based on Crystal Graph Convolutional Neural Networks, Chemical Engineering Science 259 117813 (2022) doi:10.1016/j.ces.2022.117813

Additional Project Information

Project website: -- No project website --
Research paper:
Additional Resources: -- No resources provided --
Project files:

Research Plan:

A. Question or Problem being addressed

Metal-organic frameworks (MOFs) are a class of materials with recurring reticular structures, complex pore chemistry, and high surface area. We suspect that MOFs with closely packed crystalline structures, metal coordination centers in close proximity, a large and accessible pore surface area, and sufficient unblocked adsorption sites with high binding energies would be optimal for hydrogen adsorption. 

The hydrogen storage capacity of MOFs via adsorption has been extensively investigated, but this class of materials has not yet been optimized to reach the Department of Energy's 2025 targets. One reason is that the MOF space has not been thoroughly explored because experimental synthesis and validation are costly. This work uses a data-driven approach to generate a user-defined hypothetical MOF database and identify top-performing MOF prototypes for hydrogen storage. 
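
Because the DOE targets are stated per unit mass and per unit volume, a predicted uptake has to be converted before it can be compared with them. The following is a minimal sketch, assuming uptake is reported in mmol of H2 per gram of framework, that wt% is defined with the total (framework + H2) mass in the denominator, and that the relevant targets are the DOE 2025 onboard system targets of 0.055 kg H2/kg and 0.040 kg H2/L. Those targets apply to the whole storage system, so a sorbent that only just meets them at the material level would still fall short once tank and balance-of-plant mass are included. The uptake and density values in the usage note are hypothetical.

```python
"""Convert an H2 uptake (mmol per g framework) into gravimetric and volumetric
capacities and compare them with assumed DOE 2025 system targets."""

M_H2 = 2.016  # molar mass of H2 in g/mol, i.e. mg per mmol

# Assumed DOE 2025 onboard system targets (apply to the full system, not the sorbent).
DOE_2025_WT_PCT = 5.5    # 0.055 kg H2 per kg system, as wt%
DOE_2025_G_PER_L = 40.0  # 0.040 kg H2 per L system, as g/L


def gravimetric_wt_pct(uptake_mmol_per_g: float) -> float:
    """wt% = m_H2 / (m_H2 + m_framework) * 100, evaluated per 1 g of framework."""
    m_h2 = uptake_mmol_per_g * M_H2 / 1000.0  # grams of H2 per gram of framework
    return 100.0 * m_h2 / (m_h2 + 1.0)


def volumetric_g_per_l(uptake_mmol_per_g: float, density_g_per_cm3: float) -> float:
    """g H2 per L of crystal = (mg H2 / g framework) * (g framework / cm^3)."""
    mg_h2_per_g = uptake_mmol_per_g * M_H2
    return mg_h2_per_g * density_g_per_cm3  # mg/cm^3 is numerically equal to g/L


def meets_doe_2025(uptake_mmol_per_g: float, density_g_per_cm3: float) -> bool:
    """True only if both the gravimetric and volumetric targets are reached."""
    return (gravimetric_wt_pct(uptake_mmol_per_g) >= DOE_2025_WT_PCT
            and volumetric_g_per_l(uptake_mmol_per_g, density_g_per_cm3) >= DOE_2025_G_PER_L)
```

For a hypothetical MOF with 30 mmol/g uptake, the gravimetric capacity is about 5.7 wt%, but at a crystal density of 0.5 g/cm³ the volumetric capacity is only about 30 g/L, so `meets_doe_2025(30.0, 0.5)` is `False`; the same uptake at 0.7 g/cm³ passes both checks. This illustrates why low-density, high-surface-area frameworks can look good by mass but poor by volume.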

B. Goals/Expected Outcomes/Hypotheses

Goals: We aim to develop a high-throughput screening workflow for MOFs for hydrogen storage and to use it to identify high-performance MOF structures. 

Hypotheses: We hypothesize that, by using in silico modeling and computational screening, the explored MOF structural space can be significantly enlarged compared with conventional experimental screening. This will facilitate MOF design and discovery and lead to MOF candidates with hydrogen uptake comparable to or exceeding that of contemporary high-performance MOFs. 

Expected Outcomes: In this work, we anticipate generating a user-defined MOF database that includes optimized structures, calculated MOF properties, and predicted hydrogen uptake. In addition, we will study the relationship between MOF properties and hydrogen uptake, which may offer insights to guide experimental synthesis. 

C. Description in detail of methods or procedures

To identify MOF candidates with top-performing hydrogen storage capacity, we designed the following workflow, which includes analysis of experimentally available MOFs, hypothetical MOF database generation and structure optimization, and deep-learning-based prediction of hydrogen storage capacity. 

Figure 1. Schematic showing the workflow of high throughput screening on MOF database for H2 storage.

First, we will use the CoREMOF 2019 database to identify the major metal clusters among the top-performing experimental MOFs for hydrogen storage. This increases the synthetic likelihood of the target MOF database and reduces both the sample size and the computational cost. Then, the selected clusters or topologies will be used to generate the hypothetical MOF database with the ToBaCCo code developed by Anderson et al. All linkers in ToBaCCo were used except those containing sulfur atoms. Since the generated structures rely purely on geometry matching, they must be optimized before proceeding to ML-based prediction of hydrogen storage. We will use molecular dynamics simulation for this purpose: the LAMMPS package will be used to run the simulations, and the UFF4MOF force field will be used to describe the interactions between atoms in the MOFs. In addition, we will calculate the geometric properties of the MOF prototypes to investigate the relationships between hydrogen storage performance and MOF properties. 
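
The two filtering steps above (picking the dominant metal clusters among top performers, and dropping sulfur-containing linkers) can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual code: `MOFRecord`, `top_clusters`, and `linker_has_sulfur` are hypothetical names, the records in the usage note are made-up placeholders rather than CoREMOF data, and linkers are assumed to be available as SMILES strings. A production pipeline would more likely parse the SMILES with a cheminformatics toolkit.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

# Bracket atoms such as [Si], [se], [S-], [nH]; group 1 captures the element symbol.
_BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|se|as|[bcnops])[^\]]*\]")


@dataclass
class MOFRecord:
    name: str           # structure identifier (hypothetical)
    metal_cluster: str  # label of the metal secondary building unit
    h2_uptake: float    # labelled H2 uptake; units assumed consistent (e.g. wt%)


def top_clusters(records: List[MOFRecord], top_fraction: float = 0.1,
                 n_clusters: int = 3) -> List[str]:
    """Most frequent metal clusters among the top-performing fraction of MOFs."""
    ranked = sorted(records, key=lambda r: r.h2_uptake, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    counts = Counter(r.metal_cluster for r in ranked[:n_top])
    return [cluster for cluster, _ in counts.most_common(n_clusters)]


def linker_has_sulfur(smiles: str) -> bool:
    """True if a linker SMILES contains sulfur.

    Outside brackets, SMILES uses 'S'/'s' only for sulfur, so any remaining
    'S' or 's' is sulfur. Inside brackets the element symbol is read, so
    [Si], [Sn], [Se] and aromatic [se] are not mistaken for sulfur.
    """
    bracket_symbols = _BRACKET_ATOM.findall(smiles)
    outside = _BRACKET_ATOM.sub("", smiles)
    return ("S" in outside or "s" in outside
            or any(sym in ("S", "s") for sym in bracket_symbols))
```

With hypothetical records, `top_clusters(records, top_fraction=0.5, n_clusters=2)` returns the clusters that recur most often in the better half of the ranking. `[s for s in linkers if not linker_has_sulfur(s)]` keeps, for example, terephthalic acid (`OC(=O)c1ccc(cc1)C(=O)O`) but drops a thiophene dicarboxylate (`OC(=O)c1ccc(s1)C(=O)O`).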

After geometry optimization, we will use a deep-learning approach to predict the performance of the generated MOF structures. The MOFTransformer model was selected because it is a pre-trained model that has shown exceptional performance on multimodal tasks with limited sample sizes. The labeled MOFs in the CoREMOF 2019 database will therefore be used to fine-tune and validate the MOFTransformer model, which will then be used to predict the performance of the generated MOF structures. From this cycle of high-throughput screening, we will propose top-performing MOFs for hydrogen storage. In addition, we will investigate the relationship between the hydrogen storage capacities of these MOFs and their geometric properties, aiming to provide insights for experimental synthesis. 

In summary, we propose a high-throughput screening workflow that enables MOF generation, optimization, and prediction of hydrogen storage performance, which would significantly accelerate MOF discovery for hydrogen storage and contribute to clean energy research. 

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it? 

       a. Was that goal the result of any specific situation, experience, or problem you encountered?  

Difficulties of using hydrogen:

While I was in China, the government was concerned about emissions from the transportation sector and sought to curb the expansion of the traditional petrol automobile market. A popular alternative was electric or hybrid vehicles, but news coverage also focused on hydrogen-fuelled cars and the difficulties companies face when developing them. I was interested in this promising fuel, so when I became more knowledgeable in this field and was given the opportunity to take part in a school research program, I decided to investigate the difficulties hydrogen fuel faces. These include storage, for which MOFs seem a promising candidate, and so I eventually settled on this topic.


  b. Were you trying to solve a problem, answer a question, or test a hypothesis?    

I am trying to solve the problem that conventional MOFs cannot yet achieve the target hydrogen uptake (by mass or by volume). I hope to identify higher-performance MOFs that come closer to this goal.


2. What were the major tasks you had to perform in order to complete your project?

       a. For teams, describe what each member worked on.
A. Computational analysis of the experimental MOFs and their capacity in storing H2
B. Filtering and compositional analysis of existing MOFs, whose nodes and linkers are then used to generate new hypothetical MOF structures with the ToBaCCo code.
C. Fine tuning the pretrained MOFTransformer model with the existing MOFs for the purpose of predicting H2 storage capacity of the generated structures.
D. The compilation and analyses of identified high-performance MOFs and the evaluation of their plausibility and possible efficiency.
E. Comparison of the identified MOFs with similar existing structures and validation of the predicted data.

3. What is new or novel about your project?

       a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?

I had not conducted research in this field before, so this is my first hands-on experience with reticular chemistry. Although I have conducted previous chemistry research projects, this is also the first to incorporate computational simulation and machine learning. I believe my achievements in this project can prepare me for my future career, where computers and machine learning are likely to be essential.

       b. Is your project's objective, or the way you implemented it, different from anything you have seen?

Conventional MOF screening relies largely on experimental testing, which is inefficient: it cannot access the whole MOF space and requires a great deal of time and cost to discover new MOF materials. One novelty of this project is an integrated multiscale workflow that involves hypothetical MOF generation, MD-based optimization, and ML-based prediction of H2 uptake. This screening protocol is expected to be more efficient than experimental screening or even some conventional computational screening. Second, we use transfer learning with the MOFTransformer model to enable MOF property prediction with limited data. This scheme can be applied to other systems where training data are difficult to acquire. Overall, the proposed workflow will greatly accelerate MOF exploration and help guide experimentalists in synthesizing MOFs for H2 storage.

       c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

I had not conducted research in this field prior to this project, and I cannot prove that my work is unique. However, the results of my research project surpassed previously reported records, leading me to believe that I have indeed developed a novel method.

4. What was the most challenging part of completing your project?

      a. What problems did you encounter, and how did you overcome them?

The primary challenge of this project was that the model had to handle vast quantities of data and generate an equally large number of structures, which was computationally intensive even with a machine learning model. I considered ways to narrow down the search space, but I also had to preserve the distinctive characteristics of the candidates and keep the results plausible. This required significant work and intellectual effort.
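
The size problem comes from combinatorics: a topology-based generator combines every net with every node and linker, so the candidate count is a product. The following sketch counts a naive search space and then prunes it with one compatibility rule. It is a deliberately simplified illustration, not how ToBaCCo works internally. It assumes each net is uninodal with a single vertex connectivity, uses placeholder linker names, and treats "node connectivity equals net connectivity" as the only filter. Real generators also match node and edge geometry per vertex type.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical, simplified inputs: net name -> vertex connectivity,
# node name -> points of extension, and placeholder linker labels.
TOPOLOGIES: Dict[str, int] = {"pcu": 6, "fcu": 12, "sql": 4}
NODES: Dict[str, int] = {"Zn4O": 6, "Zr6": 12, "Cu paddlewheel": 4}
LINKERS: List[str] = ["L1", "L2", "L3", "L4", "L5"]


def naive_space_size(topologies: Dict[str, int], nodes: Dict[str, int],
                     linkers: List[str]) -> int:
    """Every topology x node x linker combination, before any filtering."""
    return len(topologies) * len(nodes) * len(linkers)


def compatible_candidates(topologies: Dict[str, int], nodes: Dict[str, int],
                          linkers: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield only combinations whose node connectivity matches the net's vertices."""
    for (topo, topo_conn), (node, node_conn), linker in product(
            topologies.items(), nodes.items(), linkers):
        if topo_conn == node_conn:
            yield topo, node, linker
```

With these toy inputs the naive space has 3 × 3 × 5 = 45 combinations, but only 15 survive the connectivity check. Each additional rule (for example, removing sulfur-containing linkers or restricting to clusters common among top performers) multiplies the savings before any expensive MD or ML step is run.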

      b. What did you learn from overcoming these problems?
I learned that it is crucial to consider multiple aspects of a problem when attempting to solve it, especially when maintaining the functionality of a complex system requires multiple conditions to be met. I also learned that repeated trial and error is necessary in scientific research to achieve a satisfactory result.

5. If you were going to do this project again, are there any things you would do differently the next time?

The current high-throughput screening workflow greatly facilitates MOF design, but there are aspects we can improve in future work. First, the MOF generation process relies purely on geometry matching, and the topology space is limited to existing net topologies. One way to improve this and find more possible MOF structures would be to use generative AI models to generate MOF structures, which may open up novel structures that have never been seen before. Second, we can use alternative models for the prediction task. The current MOFTransformer model is based on graph representations of MOFs, which can become extremely memory intensive when dealing with large datasets. Alternative models, such as MOFormer, embed MOFs as string representations, which can improve computational efficiency. Finally, since we narrowed down the MOF space in this project, another thing I would do if I were to do this project again is search all possible MOFs.

6. Did working on this project give you any ideas for other projects?

Another potential project would focus more on ML model development. As mentioned earlier, using generative ML models to generate MOF structures has multiple benefits. However, the design of new MOF structures with variational autoencoders or generative diffusion models has rarely been realized or optimized because of the complexity of MOF crystal structures. This would be a promising direction to pursue and is closely related to this project. 

I am also considering work on other functionalities of MOFs and similar reticular compounds. MOFs can also adsorb other solvent and gas molecules, including pollutant gases such as sulfur dioxide. Adsorption surfaces can also naturally lead to catalysis, but that requires a greater level of knowledge in surface chemistry and catalysis. In addition, building high-quality, transferable databases for machine learning is needed, as the currently limited quantity of data renders this powerful tool inapplicable to many problems. Other interesting reticular materials also exist: for example, covalent organic frameworks have been reported to exhibit similar properties, though research on them is less extensive.


7. How did COVID-19 affect the completion of your project?


COVID-19 did not significantly affect the progress of my project, as it does not yet involve in-person experimentation and was mainly completed either on a server node or on my own local device.