Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 36 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
36
Dung lượng
2,92 MB
Nội dung
VIETNAM NATIONAL UNIVERSITY, HANOI VIETNAM JAPAN UNIVERSITY NGUYEN VAN QUYEN PREDICTING ELASTIC MODULUS OF MULTI-COMPONENT ALLOYS BY DEEP NEURAL NETWORK MASTER’S THESIS VIETNAM NATIONAL UNIVERSITY, HANOI VIETNAM JAPAN UNIVERSITY NGUYEN VAN QUYEN PREDICTING ELASTIC MODULUS OF MULTI-COMPONENT ALLOYS BY DEEP NEURAL NETWORK MAJOR: INFRASTRUCTURE ENGINEERING CODE: 8900201.04 QTD RESEARCH SUPERVISOR: Prof.Dr.Sci NGUYEN DINH DUC Hanoi, 2022 PLEDGE I have read and understood the plagiarism violations I pledge with personal honor that this research result is my own and does not violate the Regulation on prevention of plagiarism in academic and scientific research activities at VNU Vietnam Japan University (Issued together with Decision No 700/QD-ĐHVN dated 30/9/2021 by the Rector of Vietnam Japan University) Author of the thesis Nguyen Van Quyen ACKNOWLEDGEMENT Firstly, I would like to express my appreciation to the instructor, Professor Nguyen Dinh Duc, who devotedly guided, helped, created all favorable conditions, and regularly encouraged me to complete this thesis I would like to express my deepest thanks to Professor Kato, Professor Takeda, Professor Dao Nhu Mai, Dr Phan Le Binh, Dr Nguyen Tien Dung, and Dr Nguyen Ngoc Vinh from the Infrastructure Engineering Program for always caring and helping, supporting, and giving useful advice during the time I study and complete the thesis In addition, I feel delighted because of the enthusiastic support from the program assistants Bui Hoang Tan, Huong san, and Hoa san, who assisted in studying at Vietnam Japan University In particular, I would like to express my gratitude to Dr Pham Tien Lam, Dr Tran Quoc Quan, Master Vu Minh Anh, and Master Ngo Dinh Dat for giving me valuable suggestions and advice to help me complete my thesis during meetings outside the lecture hall I would like to thank everyone at VJU, my classmates for creating unforgettable memories Besides, I would like to thank my family and friends, who are always with me at difficult times Finally, I would like to express my appreciation for the support of Vingroup for providing me with rewarding opportunities so that I can lead and advance the development of science and technology in Vietnam in the future This thesis was also funded by Vingroup Joint Stock Company, Viet Nam and supported by the Domestic Master/PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA), code VINIF.2021.ThS.71 The authors are grateful for this support TABLE OF CONTENTS LIST OF TABLES i LIST OF FIGURES ii LIST OF ABBREVIATIONS iii ABSTRACT iv CHAPTER INTRODUCTION 1.1 Background 1.2 Research objectives 1.3 Structure of the thesis CHAPTER LITERATURE REVIEW CHAPTER MODELING & METHODOLOGY 3.1 Data 3.2 Descriptor 10 3.3 Neural network 11 3.4 Methodology 12 CHAPTER RESULTS AND DISCUSSION 17 CHAPTER CONCLUSIONS 21 LIST OF PUBLICATIONS 22 REFERENCES 23 LIST OF TABLES Table 3.1 Structure information and the calculated results that contained in the database Table 4.1 Model performance for elastic constant prediction: Root Mean Squared Error, Mean Absolute Error, Coefficient of Determination estimated from the validation set 17 Table 4.2 K-Fold Cross-Validation 19 i LIST OF FIGURES Figure 3.1 Distribution of calculated Poisson ratio, bulk modulus and shear modulus by DFT calculation Figure 3.2 Elements contained in the dataset The color represents the frequency of elements containing in the system Figure 3.3 Correlation plot of features in the database Figure 3.4 The general learning circle diagram 12 Figure 3.5 The general workflow for current ML schemes, in which a chemical structure is decomposed to a descriptor and a neural network is adopted to extract embedding representation for predicting elastic constant 13 Figure 3.6 Deep neural network architecture for predicting the elastic constant coefficients The input (structures) are converted into 3D tensors of pairwise terms After that, Conv2D layers are applied to the tensors to extract the embedding features The output of these convolutional layers is summed to represent the pairwise interactions of an atom with its neighboring atoms Finally, fully connected layers are applied for predicting 𝐶𝑖𝑗 15 Figure 4.1 Comparison of predicted 𝐶11 (GPa) by using deep neural network and DFT calculation (left) and Mean Absolute Error is used to train the model (learning curves) (right) 17 Figure 4.2 Comparison of predicted(a) bulk modulus (𝐵) and (b) Poisson’s ratio (ν) by DFT calculation and averaged values calculated from predicted 𝐶𝑖𝑗 using deep neural network 18 Figure 4.3 Comparison of predicted 𝐶𝑖𝑗 (GPa) by using deep neural network and DFT calculation 20 ii LIST OF ABBREVIATIONS ML DNN DFT MSE MAE RMSE R2 NN ANN 𝐶𝑖𝑗 𝐵 𝐺 ν Machine Learning Deep Neural Network Density Functional Theory Mean square error Mean absolute error Root mean square error Coefficient of determination Neural Network Artificial Neural Network Elastic Constant Coefficients Bulk Modulus Shear Modulus Poisson’s ratio iii ABSTRACT Abstract: This thesis investigates the application of the machine learning approach to predict elastic constants of multi-component systems Based on the accurate database from Material Project, we propose a model using deep neural network to enhance the prediction of elastic constant coefficients of multi-component alloys The data is generated using density functional theory (DFT) calculations and covers a huge number of fundamental elements in the periodic table The performance of the models is confirmed by validating on the unseen validation set and compared with other studies More remarkably, the performance of the developed model was illustrated by comparing it with the experimental results of relevant multi-component systems Keywords: Deep learning, Material Informatics, Elasticity iv CHAPTER INTRODUCTION 1.1 Background Advanced materials play a dominant role to human well-being and economic security It has applications in multiple industries, including those aimed at addressing challenges in national security, clean energy, and human welfare In all industries, materials are the most important factor to create certain products and details Materials determine the design, construction and cost of the product Elastic constants play a dominant role to determine the mechanical properties of a material A matrix of elastic constant coefficients defines the properties of material that undergoes stress, deform, and then recovers and turns back its original shape after the stress applied Based on elastic constants, we can easily calculate the vast majority of dominant values such as Bulk modulus, Lame constant, Young’s modulus, etc There exist several methods to estimate elastic constant coefficients including the combination of manufacturing and testing, calculation and simulation And interested advanced materials are selected in a technique that depends on experimental study through many experiments which requires a lot of time and money These conventional approaches come at the cost of a huge amount of time and money due to the limitation of experimentation In the 2000 report, the National Research Council reported that taking a new consumer product from the invention to wide-spread production may take from to years, but doing the same for a new material may take from 15 to 20 years And it did not include the time of the invention at the lab scale and the delay that relevant to the process Consequently, it raised a problem that needs to be overcome, time and money requirement and a demand of an efficient path towards target candidate materials design, replacement, and optimization With the development rapidly of science and technology nowadays, machine learning is a promising way to provide a solution With the accessibility to large-scale experimental sources, bigger, faster, and cheaper not only computer but also high-performance computing system, cheap data storage, powerful and well-maintained open-source 𝑥1,𝑝 𝑥2,𝑝 xp = [ ⋮ ], 𝑥𝑁,𝑝 𝑦𝑝 = [𝑦1,𝑝 𝑦2,𝑝 … 𝑦𝐶,𝑝 ] in which the 𝑁 × input column vector and the × 𝐶 output row vector are corresponding to the feature vector of a structure and a flattened array elastic constant 𝐶𝑖𝑗 , respectively To tune the weights, we invoke regression cost function and properly minimize in order to make this approximation hold as well as possible NEURAL NETWORK DESCRIPTOR ELASTIC CONSTANT Cij Figure 3.5 The general workflow for current ML schemes, in which a chemical structure is decomposed to a descriptor and a neural network is adopted to extract embedding representation for predicting elastic constant In the second part of the architecture of the model, the feed-forward neural network is proposed which has as many output units as the target 𝑦𝑝 The elastic constant vector can be expressed in terms of linear relationship as follows: 𝑦𝑝 = 𝐻(𝑥𝑝 ) For instance, if we use one hidden layer, the output of the model can be calculated as follows:𝑦𝑝 = 𝐻(𝑥𝑝 ) = 𝑊2 × 𝑔(𝑊1 × 𝑥𝑝 ), where 𝑊1 and 𝑊2 are the 𝑁 × 𝐶 weight matrix of the hidden and output layers, respectively; 𝑔 is a nonlinear activation function applied on each element of its input matrix; and × notation represents dot product We derive the feature vector, 𝑥𝑝 , of a local structure from the set of pairwise terms which represent the interaction of the center atom and its neighboring atoms Furthermore, we represent the pairwise term using the feature vector 𝑏𝑖𝑗 for the 13 representation of the interaction between atoms i and j The number of neighboring atoms are depended on the local structure We follow Behler method using symmetry basis functions, although our basis functions are learned by the neural network: 𝑥𝑝 = 𝑥𝑝 ({𝑏𝑖𝑗 }) = ∑𝑗 𝑓(𝑏𝑖𝑗 ) 𝑓(𝑏𝑖𝑗 ) is a vector function, and it maps the feature vector of the pairwise terms into the embedding feature vectors with desired dimensions being considered as the basis functions (e.g., say 𝑎𝑖𝑗 ) This function is shared by all the pairs of atoms, and we adopt a deep NN to represent it In this study, we employ a neural network with three hidden layers, and the embedding feature vector can be expressed as follows: 𝑎𝑖𝑗 = 𝑤3 × 𝑔 (𝑤2 × 𝑔(𝑤1 × 𝑏𝑖𝑗 )), where 𝑤1 , 𝑤2 , and 𝑤3 are the weights of hidden layers, and 𝑔 is a non-linear activation function To utilize the common deep learning framework, we define the representation matrix of the local structures as descending- sorted using the distance to the center atoms of the set of their pairwise interaction terms of 𝑏𝑖𝑗 : 𝐵 = (𝑏𝑖1 , 𝑏𝑖2 , … , 𝑏𝑖𝑛 ) , where n is the number of atoms in the chemical environment of atom i Because the number of atoms in the chemical environments is not unique to the local structures, we employ the padding technique to ensure that all the vectors have the same dimensions This indicates that they append zero elements to the end of the shorter vectors Therefore, we can obtain the local structure using a matrix whose number of rows and columns is the maximum number of neighboring atoms and the dimensions of the pairwise terms Finally, we stack the matrices of the local structures to make a 3D input tensor as the representation for a structure The first dimension of the 3D input tensor is the number of atoms in the systems The second dimension is the maximum number of neighboring atoms in the chemical environments, and the third dimension is the number of dimensions of the pairwise vectors We employ the convolutional networks to implement our model as shown in Figure 3.6 The 2D Convolutional layers are applied to extract the hidden basis functions 14 Input layer Hidden layer Output layer Structure C11 C12 C66 Conv2D Conv2D Conv2D ∑ Conv1D Feature extraction Fully connected layer Figure 3.6 Deep neural network architecture for predicting the elastic constant coefficients The input (structures) are converted into 3D tensors of pairwise terms After that, Conv2D layers are applied to the tensors to extract the embedding features The output of these convolutional layers is summed to represent the pairwise interactions of an atom with its neighboring atoms Finally, fully connected layers are applied for predicting 𝐶𝑖𝑗 We designed the feature vectors for describing the pairwise interaction of atom i and j as follows: 𝑏𝑖𝑗 = (𝑟𝑖𝑗 𝑓𝑐 (𝑟𝑖𝑗 ), 𝑓 (𝑟 )), 𝑟𝑖𝑗 𝑐 𝑖𝑗 where 𝑟𝑖𝑗 is the distance of atoms i and j To obtain the embedding feature vector for the pairwise terms, we fed the feature vector to a three-layer perceptron network This network was shared by all the pairwise terms Furthermore, the embedding feature vector for an atom is the sum of the output of the network of its pairwise interactions with the neighboring atoms We adopted 2D convolutional (Conv2D) layers implemented in the TensorFlow/Keras Library [53] with × kernel to extract the embedding vector representing the pairwise terms, 𝑎𝑖𝑗 The number of embedding features was determined by the number of kernels (filters) of the Conv2D layers We used three Conv2D layers with 128 neurons (filters) After a Conv2D layer, the input tensor of a structure transforms into a new tensor that has the same dimensions as the input tensor However, the third is the number of filters of the Conv2D layer To obtain the feature vectors of the local structures, we performed the sum of the third dimension, the output tensor 𝑥𝑝 = ∑𝑗 𝑎𝑖𝑗 Therefore, we could obtain a matrix, X, with the rows being the feature vectors of the local structures Subsequently, we implemented the network for the elastic constant using two fully connected layers 15 with 128 neurons Regarding the final layer, we used nine neurons to represent the elastic constants The mean absolute error can be obtained as: 𝑃 𝐶 𝐿 = ∑ ∑|𝑦̂ 𝑐,𝑝 − 𝑦𝑐,𝑝 | 𝑃 (3) 𝑝=1 𝑐 where L is the loss function for the model, 𝑦𝑐,𝑝 and 𝑦̂ 𝑐,𝑝 are the elastic constant vector of the structure by DFT and by our model, respectively, was used for the loss for training our network Figure 3.6 illustrates the general architecture of our model for predicting the elastic constants, which is a collection of independent convolutional neural networks (CNNs) and two fully connected layers We employed batch normalization (also known as the batch norm) after each convolution layer to reduce the hidden unit values shifting around (covariance shift) We also used the L2-norm regularization to apply penalty on the layer parameters of the layers kernel and Dropout technique to control the overfitting This indicated that we add the L2-norm of the kernels weights to the loss function that the network opti- mizes, and random connections between layers are frozen We examined the rectified linear unit (ReLU) activation functions [54] for all the hidden layers, whereas a linear activation function is applied to the last layer The Adam Optimizer is used for optimizing the neural network 16 CHAPTER RESULTS AND DISCUSSION To evaluate the performance of our deep neural network model for predicting elastic constant coefficients, we adopted the dataset of 1181 structures with DFT-calculated 𝐶𝑖𝑗 The dataset contains multiple component systems spanning over a large chemical space consisting of 44 metallic elements in the periodic table We divided the dataset into training and test sets of 1063 and 118 structures, respectively Figure 4.7 Comparison of predicted 𝐶11 (GPa) by using deep neural network and DFT calculation (left) and Mean Absolute Error is used to train the model (learning curves) (right) Figure 4.1 (right) illustrates the results of the mean absolute error metric (MAE) for the elasticity model It can be observed that the learning curves decrease in both the training and validation sets Figure 4.1 (left) illustrates the comparison of the elastic constant 𝐶11 predicted by deep neural network and the DFT calculated one For both training and validation sets, they appear to be consistent with most of the points on the scatter plot in a straight line cutting through the origin with a slope of one It means that the predicted results have a good agreement compare with DFT results Table 4.2 Model performance for elastic constant prediction: Root Mean Squared Error, Mean Absolute Error, Coefficient of Determination estimated from the validation set 17 𝐶𝑖𝑗 Metric 𝐶𝑖𝑗 𝐶𝑖𝑗 (Exp) 𝐵 ν Revi et al [51] RMSE (GPa) 16.9324 17.7787 36.41 9.4941 0.0104 MAE (GPa) 13.4939 20.1406 23.46 6.3895 0.0081 𝑅2 0.9819 0.9463 0.779 0.9780 0.9747 Table 4.1 summarizes the statistical metrics for evaluating the performance of our model: the root mean square error (RMSE) and mean absolute error (MAE) By using ReLU activation function for reducing the computational cost, we obtained RMSE and MAE for the test set of 16.9324 GPa and 13.4939 GPa, respectively Figure 4.8 Comparison of predicted(a) bulk modulus (𝐵) and (b) Poisson’s ratio (ν) by DFT calculation and averaged values calculated from predicted 𝐶𝑖𝑗 using deep neural network Moreover, our model illustrated a significant improvement, compared to the RF model with an RMSE of 34.61 GPa and coefficient of determination of 0.779 In some cases, due to the number of training data is limited, and the scale of elastic constant is significant big, we can see some outline point It is also mean that the performance is relatively poor, which can potentially be further improved by collecting more training data points 18 Table 4.3 K-Fold Cross-Validation Cross validation score 0.9767 Average 0.9767 0.9981 0.9512 0.9717 0.9938 We also use the K-Fold Cross-Validation for estimating the performance of machine learning models It is a technique to assess how the trained model performs and generalize to an unseen independent data set K is chosen as equal and we obtained the average result of 0.9767 Next, the Voigt-Reuss-Hill equations (i.e., Equations 1) is used to calculate estimate 𝐵𝑉 , 𝐵𝑅 , 𝐺𝑉 and 𝐺𝑅 using the ML-predicted 𝐶𝑖𝑗 constants on a randomly selected 10% test set Subsequently, the average values from both the Voigt and Reuss equations were used to get approximations of 𝐵 and 𝐺 Further, using these averages, ν was calculated based on Equation Finally, the results are visualized in Figure 4.2 to compare the obtained values of 𝐵 and ν by deep neural network and that by DFT calculation The performance for 𝐵 and 𝜈 predictions is examine by several statistical metrics which are also listed in Table 4.1 We obtained the MAE of 6.3895 GPa and 0.0081 for the 𝐵 and 𝜈 predictions, respectively, and the RMSE of 9.4941 GPa and 0.0104 for the 𝐵 and 𝜈 predictions, respectively It indicated a good agreement with the DFT computation 19 Figure 4.9 Comparison of predicted 𝐶𝑖𝑗 (GPa) by using deep neural network and DFT calculation Figure 4.3 illustrate the comparison between predicted 𝐶𝑖𝑗 (GPa) by using deep neural network and DFT calculation It is easy to see that almost element in the elastic tensor illustrated a good agreement compare with DFT calculation 20 CHAPTER CONCLUSIONS In the present work, we proposed a machine learning approach for efficient predictions of the elastic properties The data used in this study is collected from acurated DFTcomputed elastic constants database that spanning a wide range of binary alloy chemistries Most notably, our ML models used deep neural network to automatically extract hidden features for representing a descriptor of a structure, which illustrates a significant improvement, compare with previous study relied on a set of feature or compositionally-averaged features The neural network models were also utilized to calculate the Bulk modulus and Poisson’s ratio, which was consistent with DFT 21 LIST OF PUBLICATIONS Nguyen Van Quyen, Nguyen Van Thanh, Tran Quoc Quan, Nguyen Dinh Duc, Nonlinear forced vibration of sandwich cylindrical panel with negative Poisson’s ratio auxetic honeycombs core and CNTRC face sheets, Thin-Walled Structures, Volume 162, 2021, 107571, ISSN 0263-8231, https://doi.org/10.1016/j.tws.2021.107571 (Elsevier, IF = 4.442) Nguyen Van Quyen, Nguyen Dinh Duc, Vibration and nonlinear dynamic response of nanocomposite multi-layer solar panel resting on elastic foundations, Thin-Walled Structures, Volume 177, 2022, 109412, ISSN 0263-8231, https://doi.org/10.1016/j.tws.2022.109412 (Elsevier, IF = 4.442) Tien-Cuong Nguyen, Van-Quyen Nguyen, Van-Linh Ngo, Quang-Khoat Than, TienLam Pham, Learning hidden chemistry with deep neural networks, Computational Materials Science, Volume 200, 2021, 110784, ISSN 0927-0256, https://doi.org/10.1016/j.commatsci.2021.110784 (Elsevier, IF = 3.3) Van-Quyen Nguyen, Viet-Cuong Nguyen, Tien-Cuong Nguyen, Nguyen-Xuan-Vu Nguyen, Tien-Lam Pham, Pairwise interactions for potential energy surfaces and atomic forces using deep neural networks, Computational Materials Science, Volume 209, 2022, 111379, ISSN 0927-0256, https://doi.org/10.1016/j.commatsci.2022.111379 (Elsevier, IF = 3.3) 22 REFERENCES [1] K Rajan, Materials informatics, Materials Today (2005) 38–45 https://doi.org/https://doi.org/10.1016/S1369-7021(05)71123-8 [2] G Mulholland, S Paradiso, Perspective: Materials informatics across the product lifecycle: Selection, manufacturing, and certification, APL Materials (2016) 53207 https://doi.org/10.1063/1.4945422 [3] D Morgan, R Jacobs, Opportunities and Challenges for Machine Learning in Materials Science, Annual Review of Materials Research 50 (2020) 71–103 https://doi.org/10.1146/annurev-matsci-070218-010015 [4] R Ramprasad, R Batra, G Pilania, A Mannodi-Kanakkithodi, C Kim, Machine Learning and Materials Informatics: Recent Applications and Prospects, Npj Computational Materials (2017) https://doi.org/10.1038/s41524-017-0056-5 [5] K Butler, D Davies, H Cartwright, O Isayev, A Walsh, Machine learning for molecular and materials science, Nature 559 (2018) https://doi.org/10.1038/s41586-018-0337-2 [6] J Schmidt, M.R ~G Marques, S Botti, M.A ~L Marques, Recent advances and applications of machine learning in solid-state materials science, Npj Computational Mathematics (2019) 83 https://doi.org/10.1038/s41524-0190221-0 [7] R Vasudevan, G Pilania, P v Balachandran, Machine learning for materials design and discovery, Journal of Applied Physics 129 (2021) 70401 https://doi.org/10.1063/5.0043300 [8] A Agrawal, A Choudhary, Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science, APL Materials (2016) 53208 https://doi.org/10.1063/1.4946894 [9] C Draxl, M Scheffler, NOMAD: The FAIR concept for big data-driven materials science, MRS Bulletin 43 (2018) 676–682 https://doi.org/10.1557/mrs.2018.208 [10] L Ward, C Wolverton, Atomistic calculations and materials informatics: A review, Current Opinion in Solid State and Materials Science 21 (2017) 167–176 https://doi.org/10.1016/j.cossms.2016.07.002 [11] A Jain, G Hautier, S.P Ong, K Persson, New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships, Journal of Materials Research 31 (2016) 977–994 https://doi.org/10.1557/jmr.2016.80 [12] B Sanchez-Lengeling, A Aspuru-Guzik, Inverse molecular design using machine learning: Generative models for matter engineering, Science (1979) 361 (2018) 360–365 https://doi.org/10.1126/science.aat2663 23 [13] L Chen, G Pilania, R Batra, T.D Huan, C Kim, C Kuenneth, R Ramprasad, Polymer Informatics: Current Status and Critical Next Steps, ArXiv abs/2011.00508 (2021) [14] A Mannodi-Kanakkithodi, G Pilania, T Huan, T Lookman, R Ramprasad, Machine Learning Strategy for Accelerated Design of Polymer Dielectrics, Scientific Reports (2016) https://doi.org/10.1038/srep20952 [15] R Batra, H Dai, T.D Huan, L Chen, C Kim, W.R Gutekunst, L Song, R Ramprasad, Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders, Chemistry of Materials 32 (2020) 10489–10500 https://doi.org/10.1021/acs.chemmater.0c03332 [16] T Lookman, P Balachandran, D Xue, R Yuan, Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design, Npj Computational Materials (2019) https://doi.org/10.1038/s41524019-0153-8 [17] F Häse, L.M Roch, A Aspuru-Guzik, Next-Generation Experimentation with Self-Driving Laboratories, Trends in Chemistry (2019) 282–291 https://doi.org/10.1016/j.trechm.2019.02.007 [18] V Tshitoyan, J Dagdelen, L Weston, A Dunn, Z Rong, O Kononova, K Persson, G Ceder, A Jain, Unsupervised word embeddings capture latent knowledge from materials science literature, Nature 571 (2019) 95–98 https://doi.org/10.1038/s41586-019-1335-8 [19] E Kim, K Huang, A Saunders, A McCallum, G Ceder, E Olivetti, Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning, Chemistry of Materials 29 (2017) 9436–9444 https://doi.org/10.1021/acs.chemmater.7b03500 [20] A Jain, S.P Ong, G Hautier, W Chen, W.D Richards, S Dacek, S Cholia, D Gunter, D Skinner, G Ceder, K.A Persson, Commentary: The Materials Project: A materials genome approach to accelerating materials innovation, APL Materials (2013) 11002 https://doi.org/10.1063/1.4812323 [21] S Curtarolo, W Setyawan, S Wang, J Xue, K Yang, R.H Taylor, L.J Nelson, G.L.W Hart, S Sanvito, M Buongiorno-Nardelli, N Mingo, O Levy, AFLOWLIB.ORG: A distributed materials properties repository from highthroughput ab initio calculations, Computational Materials Science 58 (2012) 227–235 https://doi.org/https://doi.org/10.1016/j.commatsci.2012.02.002 [22] Jarvis, The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design, Npj Computational Materials (2020) https://doi.org/10.1038/s41524-020-00440-1 24 [23] C Draxl, M Scheffler, The NOMAD laboratory: from data sharing to artificial intelligence, Journal of Physics: Materials (2019) 36001 https://doi.org/10.1088/2515-7639/ab13bb [24] L Ward, A Dunn, A Faghaninia, N.E.R Zimmermann, S Bajaj, Q Wang, J Montoya, J Chen, K Bystrom, M Dylla, K Chard, M Asta, K.A Persson, G.J Snyder, I Foster, A Jain, Matminer: An open source toolkit for materials data mining, Computational Materials Science 152 (2018) 60–69 https://doi.org/10.1016/j.commatsci.2018.05.018 [25] F Pedregosa, G Varoquaux, A Gramfort, V Michel, B Thirion, O Grisel, M Blondel, P Prettenhofer, R Weiss, V Dubourg, J Vanderplas, A Passos, D Cournapeau, M Brucher, M Perrot, E Duchesnay, G Louppe, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (2012) [26] L Himanen, M.O.J Jäger, E v Morooka, F Federici Canova, Y.S Ranawat, D.Z Gao, P Rinke, A.S Foster, DScribe: Library of descriptors for machine learning in materials science, Computer Physics Communications 247 (2020) 106949 https://doi.org/https://doi.org/10.1016/j.cpc.2019.106949 [27] B Blaiszik, L Ward, M Schwarting, J Gaff, R Chard, D Pike, K Chard, I Foster, A data ecosystem to support machine learning in materials science, MRS Communications (2019) 1125–1133 https://doi.org/10.1557/mrc.2019.118 [28] L Ghiringhelli, C Carbogno, S Levchenko, F Mohamed, G Huhs, M Lueders, M Oliveira, M Scheffler, Towards efficient data exchange and sharing for bigdata driven materials science: Metadata and data formats, Npj Computational Materials (2017) https://doi.org/10.1038/s41524-017-0048-5 [29] A Talapatra, B.P Uberuaga, C.R Stanek, G Pilania, A Machine Learning Approach for the Prediction of Formability and Thermodynamic Stability of Single and Double Perovskite Oxides, Chemistry of Materials 33 (2021) 845– 858 https://doi.org/10.1021/acs.chemmater.0c03402 [30] A.C Rajan, A Mishra, S Satsangi, R Vaish, H Mizuseki, K.-R Lee, A.K Singh, Machine-Learning-Assisted Accurate Band Gap Predictions of Functionalized MXene, Chemistry of Materials 30 (2018) 4031–4038 https://doi.org/10.1021/acs.chemmater.8b00686 [31] A Seko, T Maekawa, K Tsuda, I Tanaka, Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single- and binary-component solids, Phys Rev B 89 (2014) 54303 https://doi.org/10.1103/PhysRevB.89.054303 [32] G Pilania, P v Balachandran, C Kim, T Lookman, Finding New Perovskite Halides via Machine Learning, Frontiers in Materials (2016) https://doi.org/10.3389/fmats.2016.00019 25 [33] G Pilania, C.N Iverson, T Lookman, B.L Marrone, Machine-Learning-Based Predictive Modeling of Glass Transition Temperatures: A Case of Polyhydroxyalkanoate Homopolymers and Copolymers, Journal of Chemical Information and Modeling 59 (2019) 5013–5025 https://doi.org/10.1021/acs.jcim.9b00807 [34] A Seko, A Togo, H Hayashi, K Tsuda, L Chaput, I Tanaka, Prediction of LowThermal-Conductivity Compounds with First-Principles Anharmonic LatticeDynamics Calculations and Bayesian Optimization, Phys Rev Lett 115 (2015) 205901 https://doi.org/10.1103/PhysRevLett.115.205901 [35] M Andersen, S v Levchenko, M Scheffler, K Reuter, Beyond Scaling Relations for the Description of Catalytic Materials, ACS Catalysis (2019) 2752–2759 https://doi.org/10.1021/acscatal.8b04478 [36] B Weng, Z Song, Z Rlong, Q Yan, Q Sun, C Grice, Y Yan, W.-J Yin, Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts, Nature Communications 11 (2020) 3513 https://doi.org/10.1038/s41467-020-17263-9 [37] L Breiman, Random Forests, Machine Learning 45 (2001) 5–32 https://doi.org/10.1023/A:1010933404324 [38] A Furmanchuk, A Agrawal, A Choudhary, Predictive analytics for crystalline materials: bulk modulus, RSC Adv (2016) 95246–95251 https://doi.org/10.1039/C6RA19284J [39] P Gorai, D Gao, B Ortiz, S Miller, S.A Barnett, T Mason, Q Lv, V Stevanović, E.S Toberer, TE Design Lab: A virtual laboratory for thermoelectric material design, Computational Materials Science 112 (2016) 368–376 https://doi.org/https://doi.org/10.1016/j.commatsci.2015.11.006 [40] N Duffy, D Helmbold, Boosting Methods for Regression, Machine Learning 47 (2002) 153–200 https://doi.org/10.1023/A:1013685603443 [41] J.D Evans, F.-X Coudert, Predicting the Mechanical Properties of Zeolite Frameworks by Machine Learning, Chemistry of Materials 29 (2017) 7833–7839 https://doi.org/10.1021/acs.chemmater.7b02532 [42] J Wang, X Yang, Z Zeng, X Zhang, X Zhao, Z Wang, New methods for prediction of elastic constants based on density functional theory combined with machine learning, Computational Materials Science 138 (2017) 135–148 https://doi.org/https://doi.org/10.1016/j.commatsci.2017.06.015 [43] A.K Jain, J Mao, K.M Mohiuddin, Artificial neural networks: a tutorial, Computer (Long Beach Calif) 29 (1996) 31–44 https://doi.org/10.1109/2.485891 [44] H Drucker, C.J.C Burges, L Kaufman, A Smola, V Vapnik, Support Vector Regression Machines, in: M.C Mozer, M Jordan, T Petsche (Eds.), Advances in 26 [45] [46] [47] [48] [49] [50] [51] Neural Information Processing Systems, MIT Press, 1996 https://proceedings.neurips.cc/paper/1996/file/d38901788c533e8286cb6400b40 b386d-Paper.pdf R Wang, S Zeng, X Wang, J Ni, Machine learning for hierarchical prediction of elastic properties in Fe-Cr-Al system, Computational Materials Science 166 (2019) 119–123 https://doi.org/https://doi.org/10.1016/j.commatsci.2019.04.051 C Wen, Y Zhang, C Wang, D Xue, Y Bai, S Antonov, L Dai, T Lookman, Y Su, Machine learning assisted design of high entropy alloys with desired property, Acta Materialia 170 (2019) 109–117 https://doi.org/https://doi.org/10.1016/j.actamat.2019.03.010 U.M Chaudry, K Hamad, T Abuhmed, Machine learning-aided design of aluminum alloys with high performance, Materials Today Communications (2020) 101897 C Zhu, C Li, D Wu, W Ye, S Shi, H Ming, X Zhang, K Zhou, A titanium alloys design method based on high-throughput experiments and machine learning, Journal of Materials Research and Technology 11 (2021) 2336–2353 https://doi.org/https://doi.org/10.1016/j.jmrt.2021.02.055 S Ping Ong, W Davidson Richards, A Jain, G Hautier, M Kocher, S Cholia, D Gunter, V.L Chevrier, K.A Persson, G Ceder, Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, (2013) https://doi.org/10.1016/j.commatsci.2012.10.028 M de Jong, W Chen, T Angsten, A Jain, R Notestine, A Gamst, M Sluiter, C Krishna Ande, S van der Zwaag, J.J Plata, C Toher, S Curtarolo, G Ceder, K.A Persson, M Asta, Charting the complete elastic properties of inorganic crystalline compounds, Scientific Data (2015) 150009 https://doi.org/10.1038/sdata.2015.9 V Revi, S Kasodariya, A Talapatra, G Pilania, A Alankar, Machine learning elastic constants of multi-component alloys, Computational Materials Science 198 (2021) https://doi.org/10.1016/J.COMMATSCI.2021.110671 27