Predicting Compound Melting Temperatures from Computationally Derived Properties Using Machine Learning

Student: Amy Lin
Table: CHEM3
Experimentation location: School, Home
Regulated Research (Form 1c): No
Project continuation (Form 7): No


Abstract:

Bibliography/Citations:

[1] Hong, Q.-J., Ushakov, S. V., van de Walle, A. & Navrotsky, A. Melting temperature prediction using a graph neural network model: From ancient minerals to new materials. Proc. Natl Acad. Sci. USA 119, e2209630119 (2022). https://doi.org/10.1073/pnas.2209630119

[2] Lee, A., Sarker, S., Saal, J.E. et al. Machine learned synthesizability predictions aided by density functional theory. Commun Mater 3, 73 (2022). https://doi.org/10.1038/s43246-022-00295-7  

[3] Guan, P.-W. & Viswanathan, V. MeltNet: Predicting alloy melting temperature by machine learning. (2020).

[4] Legrain, F., Carrete, J., van Roekeghem, A., Curtarolo, S. & Mingo, N. How chemical composition alone can predict vibrational free energies and entropies of solids. Chem. Mater. 29, 6220–6227 (2017). 

[5] Kirklin, S. et al. The open quantum materials database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015). 

[6] de Jong, M. et al. Charting the complete elastic properties of inorganic crystalline compounds. Sci. Data 2, 150009 (2015). 

[7] The Materials Project. Materials Project. https://legacy.materialsproject.org/  

[8] Pymatgen. pymatgen. https://pymatgen.org/  

[9] Matminer. matminer (Materials Data Mining) - matminer 0.9.0 documentation. https://hackingmaterials.lbl.gov/matminer/  

[10] Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).


Additional Project Information

Project website: -- No project website --
Additional Resources: -- No resources provided --

Research Plan:

A. Question or Problem being addressed

Melting temperature is a fundamental material property used across many scientific disciplines. Structural materials used in machines, buildings, heat shields, and more must have a sufficiently high melting temperature to ensure safe operation, while molten salts used in thermal energy storage need a sufficiently low melting temperature to be useful. Determining melting temperatures is therefore an important task in materials science. However, compounds with extreme melting temperatures are time-consuming to measure experimentally [1], and such experiments can be dangerous. Simulations for calculating melting temperatures exist, but they are computationally expensive. Only about 10% of the roughly 200,000 known inorganic compounds have determined melting temperatures. The research question I plan to address is how to determine melting temperatures efficiently.

 

B. Goals/expected outcomes/hypothesis

There are two goals for my project. The first is to develop a computational approach that predicts melting temperatures in a low-cost and safe way. I expect that Random Forest Regression (RFR) models can be trained to predict compound melting temperatures from computationally derived properties. The second goal is to determine which material properties, and which mathematical combinations of material properties, are most important in determining the melting temperature of compounds. I expect the most important properties to differ between compound groups.

 

C. Description in detail of method or procedures


Procedures: Detail all procedures and experimental design to be used for data collection
 

This project gets material properties from public databases. The procedures are detailed as follows:

  1. Gather the chemical compounds, their space groups, and their melting temperatures from a public dataset (see my research paper) to use as training data for building a machine learning model.
  2. Retrieve material properties for each of the compounds from the Materials Project database and through Matminer, both widely used resources in computational materials science.
  3. Once the basic properties of the materials are acquired, prepare data for building machine learning models:
    1. Because the compound melting temperatures and the material properties come from different databases, the Material ID (MPID) is used to match a compound’s melting temperature to its properties.
    2. Augment the data by creating extended features from pairwise products of material properties. This allows material properties to be used jointly.
    3. Divide the data into 6 compound groups based on the elements they contain. This allows detailed analysis of the properties specific to each compound group.
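
The data preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the MPIDs, property names, and numeric values are placeholders, the pairwise products are taken over unordered pairs of distinct properties, and the grouping rule in compound_group is hypothetical because the six groups are not listed here.

```python
import re
from itertools import combinations

# Placeholder records (hypothetical MPIDs and values, not project data).
melting = [
    {"mpid": "mp-0001", "formula": "NaCl", "melting_temp_K": 1000.0},
    {"mpid": "mp-0002", "formula": "MgO", "melting_temp_K": 3000.0},
    {"mpid": "mp-0003", "formula": "Fe2O3", "melting_temp_K": 1800.0},
]
properties = {
    "mp-0001": {"formation_energy": -2.0, "density": 2.0, "band_gap": 5.0},
    "mp-0002": {"formation_energy": -3.0, "density": 3.5, "band_gap": 4.0},
    # mp-0003 has no property record, so the join drops it.
}
FEATURES = ["formation_energy", "density", "band_gap"]


def join_on_mpid(melting_rows, props_by_mpid):
    """Inner join: keep only compounds that appear in both sources."""
    joined = []
    for row in melting_rows:
        props = props_by_mpid.get(row["mpid"])
        if props is not None:
            joined.append({**row, **props})
    return joined


def add_pairwise_products(row, feature_names):
    """Return a copy of row with one extended feature per pair of properties."""
    out = dict(row)
    for a, b in combinations(feature_names, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out


def elements_in(formula):
    """Extract element symbols from a simple formula such as 'Fe2O3'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def compound_group(formula):
    """Hypothetical grouping rule based on the elements a compound contains."""
    elems = elements_in(formula)
    if "O" in elems:
        return "oxide"
    if elems & {"F", "Cl", "Br", "I"}:
        return "halide"
    return "other"


dataset = [add_pairwise_products(r, FEATURES) for r in join_on_mpid(melting, properties)]
for r in dataset:
    r["group"] = compound_group(r["formula"])
```

With three base properties, each row gains three extended features (one per pair), so the feature count grows quadratically as more properties are added.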

 

Data Analysis: Describe the procedures you will use to analyze the data/results that answer research questions or hypotheses

I will then train Random Forest Regression (RFR) models, a type of machine learning model, to predict melting temperature from material properties. I will evaluate the prediction performance of these models and study the importance of different material properties in computationally determining melting temperature. The steps are detailed below:

  1. Train Random Forest Regression (RFR) models to predict melting temperatures of compounds given their properties.
    1. Train models with only material properties as well as with both material properties and extended features.
    2. Train models with different hyperparameters.
    3. Train separate models specific to compound groups.
    4. Calculate the prediction error for different models.
  2. Analyze the performance of different models.
    1. Study the effect of hyperparameters in the RFR model and find the hyperparameters that give optimal performance.
    2. Study the effect of using extended features and how they change model performance. 
    3. Analyze the feature importance for different compound groups and find the most important features specific to each compound group.
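
One way the training and analysis steps above could look in code is sketched below with scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic stand-ins generated with NumPy, the hyperparameter grid is arbitrary, mean absolute error is used as the error measure, and feature importance is taken from the forest's impurity-based importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 "compounds" with 4 "properties" (not project data).
X = rng.normal(size=(200, 4))
y = 1500 + 300 * X[:, 0] + 100 * X[:, 1] * X[:, 2] + rng.normal(scale=20, size=200)
feature_names = ["prop_0", "prop_1", "prop_2", "prop_3"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Try a small grid of hyperparameters, scored by cross-validated MAE.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the best model on held-out data.
best_model = search.best_estimator_
test_mae = mean_absolute_error(y_test, best_model.predict(X_test))

# Rank features by impurity-based importance (the importances sum to 1).
ranking = sorted(
    zip(feature_names, best_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
print(search.best_params_, round(test_mae, 1))
print(ranking)
```

Repeating the same fit on the rows of a single compound group, with and without the extended features, would give the per-group and per-feature-set comparisons described in the steps.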

 

 

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it? 

There were two objectives. The first was to find a cost-effective and safe way to predict the melting temperatures of chemical compounds from their material properties. The second was to determine which material properties, and which mathematical combinations of material properties, are most important in determining the melting temperature of compounds.

To achieve this, I first gathered data on chemical compounds and their properties. The properties are computationally determined, which is much cheaper and safer than determining them experimentally. Then, I trained a machine learning model to predict a chemical compound’s melting temperature from its material properties.

 

 a. Was that goal the result of any specific situation, experience, or problem you encountered?  

Last summer, I worked on a project to study the relationship between melting temperatures of binary compounds and material properties. After an initial study, I became aware that there is a large range of compound melting temperatures, and experimentally determining melting temperatures can be costly and potentially dangerous. Only about 10% of the 200,000 known inorganic compounds have known melting temperatures, a property that is crucial in engineering and construction. I became interested in finding out if computational methods can be created to predict melting temperatures of compounds based on material properties.

 

b. Were you trying to solve a problem, answer a question, or test a hypothesis?

I was trying to solve a problem and answer a question. The problem was how to determine the melting temperatures of chemical compounds in a rapid and safe way. The question was which properties, or mathematical combinations of properties, are most important in determining the melting temperature of a compound.

 

 

2. What were the major tasks you had to perform in order to complete your project?

  1. I gathered the chemical compounds, their space groups, and their melting temperatures from a public dataset (see my research paper) to use as training data for building a machine learning model.
  2. I retrieved material properties for each of the compounds from the Materials Project database and through Matminer, both widely used resources in computational materials science.
  3. Once the basic properties of the materials were acquired, I prepared data for building machine learning models:
    1. Because the compound melting temperatures and the material properties come from different databases, I used the Material ID (MPID) to match a compound’s melting temperature to its properties.
    2. I augmented data by creating extended features from pairwise products of material properties. This allowed material properties to be used jointly.
    3. I divided data into 6 compound groups based on the elements they contained.
  4. I trained Random Forest Regression (RFR) models to predict melting temperatures of compounds given their properties.
    1. I trained models with only material properties as well as with both material properties and extended features.
    2. I also trained models with different hyperparameters.
    3. I trained separate models specific to compound groups.
    4. I calculated the prediction error for different models.
  5. I analyzed the performance of different models.
    1. I studied the effect of hyperparameters in the RFR model and found the hyperparameters that gave optimal performance.
    2. I studied the effect of using extended features and how they change model performance. 
    3. I analyzed the feature importance for different compound groups and found the most important features specific to each compound group.
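
The per-group error calculation in the steps above can be illustrated with a short sketch. The error metrics are an assumption (mean absolute error and root-mean-square error; the metric actually used is not stated here), the group names are hypothetical, and the temperatures are placeholder values rather than results from the project.

```python
from collections import defaultdict
from math import sqrt


def errors_by_group(records):
    """Return {group: (MAE, RMSE, n)} from rows with 'group', 'actual', 'predicted'."""
    residuals = defaultdict(list)
    for r in records:
        residuals[r["group"]].append(r["predicted"] - r["actual"])
    summary = {}
    for group, res in residuals.items():
        n = len(res)
        mae = sum(abs(e) for e in res) / n
        rmse = sqrt(sum(e * e for e in res) / n)
        summary[group] = (mae, rmse, n)
    return summary


# Placeholder temperatures in kelvin (not results from the project).
records = [
    {"group": "oxide", "actual": 2000.0, "predicted": 2100.0},
    {"group": "oxide", "actual": 1500.0, "predicted": 1400.0},
    {"group": "halide", "actual": 1000.0, "predicted": 1030.0},
    {"group": "halide", "actual": 900.0, "predicted": 860.0},
]
summary = errors_by_group(records)
for group, (mae, rmse, n) in summary.items():
    print(f"{group}: MAE={mae:.1f} K, RMSE={rmse:.1f} K, n={n}")
```

Reporting the sample count next to each error helps when groups differ widely in size, since an error estimated from a handful of compounds is less reliable.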

 

 

    a. For teams, describe what each member worked on.

This was not a team project.

 

3. What is new or novel about your project?

My project used computationally determined material properties to predict melting points, including properties derived from Density Functional Theory (DFT) as well as element-based properties determined computationally. This differs from the traditional approach, which determines melting temperatures experimentally. Using computationally determined properties eliminates the need for costly and dangerous lab work to measure extreme temperatures. Additionally, I performed data augmentation by creating extended features from material properties. I also grouped compounds according to the elements they contain and studied the most relevant properties specific to each compound group.

 

   a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?

It was my first time studying material properties and their relationship to melting temperature. I learned about popular materials science resources such as the Materials Project database and Matminer. These extensive resources can be used to study other important material properties.

 

      b. Is your project's objective, or the way you implemented it, different from anything you have seen?

Yes. In the implementation, I collected DFT-derived properties and other properties that can be determined computationally, rather than determining these values experimentally, because one of the goals of this project was to create a low-cost and safe method to predict melting temperatures. I also performed feature augmentation to create extended features that improved the model’s prediction accuracy.

 

       c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

I conducted a literature review. To my knowledge, there are no projects that use solely computationally derived properties to build machine learning models for predicting melting temperatures. In addition, data augmentation by creating extended features was a new idea. It allowed material properties to be used jointly to improve model accuracy.

 

4. What was the most challenging part of completing your project?

The most challenging part of completing my project was gathering data, specifically computationally derived data.

 

      a. What problems did you encounter, and how did you overcome them?

In the beginning, I was familiar with some material properties but not all. I had to study those material properties and find public databases which provide those values. This took me more time than I expected. I could not find a database that gave the complete set of material properties, so I included information from multiple databases. For example, I had to retrieve the melting temperature from one database and the formation energy from another, and find a way to map properties to the corresponding melting temperature.

 

      b. What did you learn from overcoming these problems?

I learned that when conducting research projects, the information and data needed to carry out the work may not be immediately available. I need to be resourceful and creative in gathering the data.

 

5. If you were going to do this project again, are there any things you would do differently the next time?

If I were to do this project again, I would code the machine learning models in parallel with finding material property data, rather than waiting for all the data to be found before programming the machine learning models. I would manage time better and start some tasks earlier.

 

6. Did working on this project give you any ideas for other projects? 

Yes. One potential future project would be discovering new chemical compounds that match a given melting temperature and other properties. The insights gained from my current project could be applied to this kind of “reverse engineering”.

 

7. How did COVID-19 affect the completion of your project?

COVID-19 did not affect the completion of my project.