1 Introduction

Over the last several decades, developing, manufacturing, and deploying advanced materials as fast as possible has been a great challenge for scientists. In June 2011, the Materials Genome Initiative (MGI) for Global Competitiveness was proposed by the U.S. National Science and Technology Council to develop an infrastructure that shortens the materials development cycle. The most fundamental goal of the MGI is to accelerate materials design through the use of computational capabilities, data management, and an integrated approach to materials science and engineering [1].

In principle, there are two strategies for materials design. One is to start from first principles, i.e., from quantum mechanics and statistical mechanics, to predict the properties of unknown materials. Although first-principles methods have been widely used in materials design [2, 3], it is still impossible to solve most of the complicated problems in materials exploration with this strategy alone. The other is the semi-empirical strategy, i.e., to derive semi-empirical rules from the known data of existing materials and use them to predict the properties of unknown materials. In general, the second strategy is more practicable in materials design and new materials exploration, since a variety of data mining methods can be utilized to construct statistical models from the data sets available from scientific experiments [4–9].

In materials design by data mining, principal component analysis (PCA), partial least squares (PLS) and artificial neural networks (ANNs) are very helpful because of their relatively good performance, speed, and simplicity in constructing statistical models [10, 11]. However, ANNs may suffer from over-fitting [12] (i.e., good performance in fitting but poor performance in prediction) when treating finite, multivariate data sets. Meanwhile, nonlinear relations can only be modeled in a limited way by the PCA or PLS algorithms [13].

In the semi-empirical approach to materials design, the data of known materials are usually used as the training set. In most cases, the number of available known data points is rather limited, which means that the data processing task usually deals with small sample sizes and hence risks serious over-fitting. As an effective way to overcome over-fitting, the support vector machine (SVM), based on statistical learning theory (SLT), was proposed by Vapnik [14]. SVM has been shown to perform well in various applications including drug design [15, 16], materials design [17–19] and chemistry research [20].

In new materials exploration, two questions of general significance need answers: first, “what is the chemical composition of the substance having the desirable properties?”, and second, “what are the optimal conditions for preparing or producing this material at low cost?”. Since both questions involve very complicated systems or processes, they have to be solved by semi-empirical methods. To answer the first question, the relationships between the microscopic structure of materials and their properties need to be addressed; such a relationship is usually known as a quantitative structure–property relationship (QSPR). To deal with the second question, mathematical models are usually set up to optimize the processes.

Based on these problems, the tasks of materials design can be classified into four categories. The first is to solve “formability problems”, i.e., to find a mathematical model or criterion for the stability of unknown substances. The second is “property prediction”, i.e., to build mathematical models of structure–property relationships and use them to predict the properties of new materials (or the inverse problem: to search for unknown new materials with pre-assigned properties). The third is to solve “optimization problems”, i.e., to find the conditions that optimize certain properties of a material. The last, but not least, is the “problem of control”, i.e., to find a mathematical model that keeps some index of a material within a desired range. Different data mining techniques should be adopted for these different purposes. In this paper, we demonstrate examples from our lab of applying SVM methods, including support vector classification (SVC) and support vector regression (SVR), as relatively new tools for these different tasks of materials design. The advantages of using SVM for materials design are discussed based on the applications presented.

2 Methods of SVM

The foundations of SVM were developed by Vapnik [14], and the method is gaining popularity due to its many attractive features and promising empirical performance. In this paper the term SVM refers to both the SVC and SVR methods, which can be used for solving qualitative and quantitative problems respectively [21–25].

2.1 SVC

SVC has recently been proposed as a very effective method for solving classification problems, which can be restricted to the two-class problem without loss of generality [14, 20]. The goal is to separate the two classes by a classifier induced from available examples, in the expectation that the classifier will perform well on unseen examples, i.e., generalize well.

The geometrical interpretation of SVC is that it determines the optimal separating surface, i.e., a hyperplane, which is equidistant from the two classes of data points. This hyperplane has many interesting statistical properties, as discussed by Vapnik [14]. Consider the problem of separating the set of training vectors belonging to two separate classes, \( (y_{1} ,\varvec{x}_{1} ),(y_{2} ,\varvec{x}_{2} ), \cdots ,(y_{l} ,\varvec{x}_{l} ),\varvec{x} \in \varvec{R}^{m} ,y \in \{ - 1, + 1\} , \) with a hyperplane

$$ \varvec{w}^{\text{T}} \varvec{x} + b = 0, $$
(1)

where w and b are the weight vector and bias, respectively.

If the training data are linearly separable, then there exists a pair of parameters (w, b) for which we can write

$$ y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) - 1 \ge 0,\quad i = 1,2, \cdots ,l, $$
(2)
$$ \varvec{w}^{\text{T}} \varvec{x} + b \ge + 1,\quad {\text{for all }}\varvec{x} \in \varvec{P}, $$
(3)
$$ \varvec{w}^{\text{T}} \varvec{x} + b \le - 1,\quad {\text{for all }}\varvec{x} \in \varvec{N} $$
(4)

where P is the set of positive samples, and N is the set of negative samples.

The decision rule is

$$ f_{{\varvec{w},b}} (\varvec{x}) = \text{sgn} (\varvec{w}^{\text{T}} \varvec{x} + b). $$
(5)

The pair (w, b) can be rescaled, without loss of generality, such that

$$ \mathop {\hbox{min} }\limits_{i = 1,2, \cdots ,l} \left| {\varvec{w}^{T} \varvec{x}_{i} + b} \right| = 1. $$
(6)

The learning problem is hence reformulated as follows: minimize ||w||2 subject to the constraints of linear separability. This is equivalent to maximizing the distance, normal to the hyperplane, between the convex hulls of the two classes, and the optimization becomes a quadratic programming (QP) problem

$$ \mathop {\text{Min}}\limits_{{\varvec{w},b}} \phi (\varvec{w}) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} , $$
(7)

subject to \( y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) \ge 1,\quad i = 1,2, \cdots ,l. \) This problem has a global optimum, and the Lagrangian is written as

$$ L(\varvec{w},b,\Uplambda ) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} - \sum\limits_{i = 1}^{l} {\lambda_{i} [y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) - 1]} , $$
(8)

where \( \Uplambda = \{ \lambda_{1} ,\lambda_{2} , \cdots ,\lambda_{l} \} \) are the Lagrange multipliers, one for each data point. Hence we can write

$$ F(\Uplambda ) = \sum\limits_{i = 1}^{l} {\lambda_{i} } - \frac{1}{2}\left\| \varvec{w} \right\|^{2} = \sum\limits_{i = 1}^{l} {\lambda_{i} } - \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {\lambda_{i} \lambda_{j} y_{i} y_{j} \varvec{x}_{i}^{\text{T}} \varvec{x}_{j} } }. $$
(9)

The Lagrange multipliers are only non-zero when \( y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) = 1 \). Vectors fulfilling this requirement are called support vectors since they lie closest to the separating hyperplane. Then, the optimal separating hyperplane is given as follows

$$ \varvec{w}^{*} = \sum\limits_{i = 1}^{l} {\lambda_{i}^{*} y_{i} \varvec{x}_{i} } , $$
(10)

and the bias is given by

$$ b^{*} = - \frac{1}{2}(\varvec{w}^{*} )^{\text{T}} (\varvec{x}_{\text{s}} + \varvec{x}_{\text{r}} ), $$
(11)

where \( \varvec{x}_{\text{r}} \) and \( \varvec{x}_{\text{s}} \) are any support vectors from each class satisfying the following equation

$$ y_{\text{r}} = 1,\quad y_{\text{s}} = - 1. $$
(12)

The hard classifier is then

$$ f(\varvec{x}) = \text{sgn} ((\varvec{w}^{*} )^{\text{T}} \varvec{x} + b^{*} ). $$
(13)

In the case where a linear boundary is inappropriate, SVC can map the input vector, x, into a high dimensional feature space, F. By choosing a non-linear mapping Φ, SVC constructs an optimal separating hyperplane in this higher dimensional space. Among acceptable mappings are polynomials, radial basis functions and certain sigmoid functions. The optimization problem then becomes

$$ W(\alpha ) = \sum\limits_{i = 1}^{l} {\alpha_{i} - \frac{1}{2}\sum\limits_{i,j = 1}^{l} {y_{i} y_{j} } \alpha_{i} \alpha_{j} \left\langle {\varPhi (\varvec{x}_{i} ) \cdot \varPhi (\varvec{x}_{j} )} \right\rangle } . $$
(14)

In this case, the decision function in SVC is as follows

$$ \begin{aligned} g(\varvec{x}) =& \text{sgn} (f(\varvec{x})) = \text{sgn} \left( {\sum\limits_{i \in SV} {\alpha_{i} y_{i} \left\langle {\varPhi (\varvec{x}) \cdot \varPhi (\varvec{x}_{i} )} \right\rangle + b} } \right) \\= & \text{sgn} \left( {\sum\limits_{i \in SV} {\alpha_{i} y_{i} K(\varvec{x},\varvec{x}_{i} ) + b} } \right),\\ \end{aligned} $$
(15)

where \( \varvec{x}_{i} \) are the support vectors and \( K(\varvec{x},\varvec{x}_{i} ) \) is called the kernel function.
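The kernel decision function of Eq. (15) can be sketched concretely. The following example is an illustration only: it uses scikit-learn (not the ChemSVM package described below), synthetic two-class data, and an assumed RBF kernel; it then reproduces g(x) from the fitted support vectors and dual coefficients.

```python
# Sketch of the SVC decision function g(x) of Eq. (15) using scikit-learn.
# Data, kernel choice and parameter values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
# Synthetic labels: class depends on distance from the origin (not linearly separable).
y = np.where(np.hypot(X[:, 0], X[:, 1]) > 1.0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def g(x):
    """g(x) = sgn(sum_i alpha_i y_i K(x, x_i) + b); sklearn's dual_coef_
    already stores the products alpha_i * y_i over the support vectors."""
    K = np.exp(-gamma * ((clf.support_vectors_ - x) ** 2).sum(axis=1))
    return np.sign(clf.dual_coef_[0] @ K + clf.intercept_[0])

print(g(np.array([2.0, 0.0])))
```

The hand-written g(x) agrees with `clf.predict`, which shows that only the support vectors enter the decision function.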

2.2 SVR [14, 20]

In SVR, the basic idea is to map the data x into a higher-dimensional feature space F via a nonlinear mapping Φ and then to perform linear regression in this space. Regression approximation thus addresses the problem of estimating a function from a given data set \( G = \{ (\varvec{x}_{i} ,d_{i} )\}_{i = 1}^{l} \) (\( \varvec{x}_{i} \) is the input vector and \( d_{i} \) the desired value). SVR approximates the function in the following form

$$ y = \sum\limits_{i = 1}^{l} {w_{i} \varPhi (\varvec{x}_{i} ) + b} , $$
(16)

where \( \left\{ {\varPhi (\varvec{x}_{i} )} \right\}_{i = 1}^{l} \) is the set of mappings of input features, and \( \left\{ {w_{i} } \right\}_{i = 1}^{l} \) and b are coefficients. They are estimated by minimizing the regularized risk function R(C):

$$ R(C) = C\frac{1}{N}\sum\limits_{i = 1}^{N} {L_{\varepsilon } } (d_{i}, y_{i} ) + \frac{1}{2}\left\| \varvec{w} \right\|^{2} , $$
(17)

where

$$ L_{\varepsilon } (d,y) = \left\{ {\begin{array}{*{20}l} {\left| {d - y} \right| - \varepsilon ,} & {{\text{for }}\left| {d - y} \right| \ge \varepsilon ,} \\ {0,} & {{\text{otherwise}},} \\ \end{array} } \right. $$
(18)

and ε is a prescribed parameter.

In Eq. (17), \( C\frac{1}{N}\sum\nolimits_{i = 1}^{N} {L_{\varepsilon } (d_{i} ,y_{i} )} \) is the so-called empirical error (risk), measured by the ε-insensitive loss function \( L_{\varepsilon } (d,y) \), which does not penalize errors below ε. The second term, \( \frac{1}{2}\left\| \varvec{w} \right\|^{2} , \) is used as a measure of function flatness. C is a regularization constant determining the trade-off between the training error and model flatness. Introducing slack variables \( \xi_{i} ,\xi_{i}^{*} \) turns Eq. (17) into the following constrained problem:

$$ {\text{Min}}\;R(\varvec{w},\xi ,\xi^{*} ) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} + C\sum\limits_{i = 1}^{l} {(\xi_{i} + \xi_{i}^{*} )} , $$
(19)

s.t.

$$ \left\{ \begin{gathered} \varvec{w}\varPhi (\varvec{x}_{i} ) + b - d_{i} \le \varepsilon + \xi_{i} , \hfill \\ d_{i} - \varvec{w}\varPhi (\varvec{x}_{i} ) - b \le \varepsilon + \xi_{i}^{*} , \hfill \\ \xi_{i} ,\xi_{i}^{*} \ge 0. \hfill \\ \end{gathered} \right. $$
(20)

Thus, decision function Eq. (16) becomes the following form:

$$ f(\varvec{x},\alpha ,\alpha^{*} ) = \sum\limits_{i = 1}^{l} {\left( {\alpha_{i} - \alpha_{i}^{*} } \right)K(\varvec{x},\varvec{x}_{i} )} + b. $$
(21)

In Eq. (21), \( \alpha_{i} ,\alpha_{i}^{*} \) are the introduced Lagrange multipliers. They satisfy \( \alpha_{i} \cdot \alpha_{i}^{*} = 0,\;\alpha_{i} \ge 0,\;\alpha_{i}^{*} \ge 0,\;i = 1,2, \cdots ,l, \) and are obtained by maximizing the dual form of Eq. (19), which has the following form:

$$ \begin{aligned} w(\alpha ,\alpha^{*} ) & = \sum\limits_{i = 1}^{l} {d_{i} (\alpha_{i} - \alpha_{i}^{*} )} - \varepsilon \sum\limits_{i = 1}^{l} {(\alpha_{i} + \alpha_{i}^{*} )} \\ & \quad - \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*} )(\alpha_{j} - \alpha_{j}^{*} )K(\varvec{x}_{i} ,\varvec{x}_{j} )} } , \\ \end{aligned} $$
(22)

with the following constraints:

$$ \left\{ \begin{gathered} 0 \le \alpha_{i} \le C,\quad i = 1,2, \cdots ,l, \hfill \\ 0 \le \alpha_{i}^{*} \le C,\quad i = 1,2, \cdots ,l, \hfill \\ \sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*} )} = 0. \hfill \\ \end{gathered} \right. $$
(23)

Based on the Karush–Kuhn–Tucker (KKT) conditions of quadratic programming, only some of the coefficients \( \alpha_{i} - \alpha_{i}^{*} \) assume nonzero values; the data points associated with them are referred to as support vectors. In Eq. (21), \( K(\varvec{x},\varvec{x}_{i} ) \) is the kernel function, whose value equals the inner product of the two vectors x and \( \varvec{x}_{i} \) in the feature space: \( K(\varvec{x},\varvec{x}_{i} ) = \varPhi (\varvec{x}) \cdot \varPhi (\varvec{x}_{i} ) \). The elegance of the kernel function lies in the fact that one can deal with feature spaces of arbitrary dimensionality without having to compute the map \( \varPhi (\varvec{x}) \) explicitly. Any function that satisfies Mercer’s condition can be used as a kernel function.
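As an illustration of the SVR expansion in Eq. (21), the following sketch (scikit-learn assumed in place of ChemSVM; data, kernel and parameters are synthetic choices of this example) fits an ε-insensitive SVR and reconstructs f(x) = Σ(α_i − α_i*)K(x, x_i) + b from the fitted dual coefficients.

```python
# Sketch of Eq. (21) with scikit-learn's SVR; all values are illustrative.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
d = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)  # noisy synthetic target

gamma = 0.5
reg = SVR(kernel="rbf", gamma=gamma, C=10.0, epsilon=0.1).fit(X, d)

def f(x):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b (Eq. (21));
    sklearn's dual_coef_ stores the differences alpha_i - alpha_i*."""
    K = np.exp(-gamma * ((reg.support_vectors_ - x) ** 2).sum(axis=1))
    return reg.dual_coef_[0] @ K + reg.intercept_[0]

print(f(np.array([0.5])))
```

Only samples whose residual reaches the ε-tube boundary appear in `support_vectors_`, which is the sparsity the KKT conditions promise.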

2.3 Implementation of SVM

Following Ref. [14], the SVM software package ChemSVM, including SVC and SVR, has been programmed in our lab. A free version of ChemSVM can be downloaded from the website of the Laboratory of Computational Chemistry at Shanghai University (http://chemdata.shu.edu.cn:8080/MyLab/Lab/download.jsp). The software has also been validated in chemistry applications [20].

3 Applications

3.1 SVC applied to the formability of perovskite or BaNiO3 structure

The most exciting achievement of materials research is to find new compounds (or new phases) with specified structures and outstanding properties. In this section, materials design problems for compounds with perovskite-type or BaNiO3-type structures are discussed based on SVC models.

There are numerous complex oxides and halides with the general formula ABX3 (X = oxygen or halogen) that have the perovskite-type crystal structure and outstanding functional properties [26]. Since 1945, when the ferroelectric properties of barium titanate were discovered, a series of complex oxides and halides with perovskite-type structure have been found to be valuable functional materials. In recent years, the search for new complex oxides and halides with perovskite-type structure has become an active field of new materials exploration.

The ideal perovskite crystal structure is illustrated in Fig. 1, which shows a unit cell of the SrTiO3 crystal. In this structure, the tetravalent Ti4+ cation is surrounded by 6 oxygen anions to form an octahedron, and the bivalent Sr2+ cation is surrounded by 12 oxygen anions to form a cubo-octahedron. Based on this crystal structure, Goldschmidt proposed a famous crystal-chemical criterion for the formability or stability of the perovskite structure of ABX3-type compounds:

$$ t = \frac{{R_{a} + R_{x} }}{{\sqrt 2 (R_{b} + R_{x} )}}, $$
(24)

where t is called the tolerance factor, \( R_{\text{a}} \) and \( R_{\text{b}} \) are the cationic radii of the A and B ions respectively, and \( R_{\text{x}} \) is the ionic radius of the X anion. According to Goldschmidt, the cubic perovskite structure is stable only if the tolerance factor lies approximately in the range 0.8 < t < 0.9, while the distorted perovskite structure can be stable over a somewhat larger range of tolerance factor. This criterion is widely used in the exploration of new compounds with perovskite-type or perovskite-like structures. Owing to the accumulation of crystallographic data for compounds with perovskite structure, it is now widely recognized that the range of the tolerance factor for the stability of the perovskite structure should be 0.75 < t < 1.00.
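Eq. (24) is straightforward to evaluate. As a quick check, the following sketch computes t for SrTiO3; the numerical Shannon-type ionic radii used here are assumptions of this illustration, not values taken from the text.

```python
# Goldschmidt tolerance factor of Eq. (24); radii below are assumed
# illustrative values (angstroms), not data from the paper.
import math

def tolerance_factor(r_a, r_b, r_x):
    """t = (R_a + R_x) / (sqrt(2) * (R_b + R_x))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))

# Assumed radii: Sr2+ (12-coordinate) ~1.44, Ti4+ (6-coordinate) ~0.605, O2- ~1.40.
t = tolerance_factor(1.44, 0.605, 1.40)
print(round(t, 3))  # t close to 1, consistent with SrTiO3's ideal perovskite structure
```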

Fig. 1

Crystal structure of SrTiO3, a typical compound with ideal perovskite structure

Although Goldschmidt’s tolerance factor t is indeed very useful for the exploration of new compounds with perovskite structure, it is only a necessary condition, not a sufficient one, for the formation or stability of the perovskite structure [27]. Many systems with t in the range 0.75–1.00 do not form perovskite-type compounds.

For example, MnSiO3 has t = 0.856 but adopts the CdGeO3 structure; RbMnCl3 has t = 0.88 but adopts the hexagonal BaTiO3 structure; and the NaI–MgI2 system has t = 0.826 but forms no intermediate compound at all. It is therefore desirable to investigate complementary conditions for the formability of the perovskite structure, to aid computerized materials design of new perovskite materials. Atomic parameters and SVM techniques can be used for this purpose.

Since BX6 octahedra and AX12 cubo-octahedra are the basic sub-structures of the perovskite lattice, as shown in Fig. 1, the stability of the BX6 octahedra and AX12 cubo-octahedra is also a necessary condition for the stability of the perovskite lattice. Obviously, the condition 0.75 < t < 1.00 is not enough to ensure the stability of the BX6 octahedra and AX12 cubo-octahedra, so suitable criteria for these stability requirements need to be found.

Some ABX3-type compounds, such as RbNiCl3, satisfy 0.75 < t < 1.00 yet do not form the perovskite lattice but rather crystal lattices with the BaNiO3 structure. The chief difference of the BaNiO3 structure (or hexagonal BaTiO3 structure) from the perovskite structure is that in the former the BX6 octahedra share faces with each other (see Fig. 2), while in the perovskite structure they share corners.

Fig. 2

Face-shared BX6 structure in the BaNiO3 lattice

To find a criterion for the relative stability of the BaNiO3 structure versus the perovskite structure, a data set containing 23 samples was used for data mining [28]. By combining SVC with the atomic parameters of the compounds, an SVC model was found that differentiates the perovskite structure from the hexagonal ABX3 structures involving face-shared octahedra. The SVC model with a linear kernel function achieves 100% separation of the chlorides with perovskite structure from the chlorides with face-shared structures. Leave-one-out cross-validation (LOOCV) was used to evaluate the performance of the models: the data set of n samples was divided into two disjoint subsets, a training set (n − 1 samples) and a test set (1 sample), and after each model was developed on the training set, the omitted sample was predicted by the developed model. In the LOOCV test, the prediction accuracy is 91%. The criterion for the formation of the face-shared structure found by SVC can be expressed as follows

$$ 4.52R_{\text{b}} - 1.83R_{\text{a}} + 2.23X_{\text{a}} - 0.142X_{\text{b}} - 4.10N_{\text{d}} + 0.589 < 0, $$
(25)

where \( R_{\text{a}} \) and \( R_{\text{b}} \) denote the Shannon–Prewitt ionic radii of A and B in ABCl3 respectively, \( X_{\text{a}} \) and \( X_{\text{b}} \) denote the Batsanov electronegativities of A and B respectively, and \( N_{\text{d}} \) denotes the number of d electrons in the unfilled d shell of the B ion. The criterion implies that a large A+ cation, a small B2+ cation and a large electronegativity of B favor face-sharing of the BX6 octahedra. This can be explained as follows: in the perovskite-type lattice, the network of corner-shared BX6 octahedra forms cages around the A+ ions. The repulsive force due to a large A+ ion in a small cage formed by small B2+ and X− ions makes the perovskite-type lattice unstable, while in the face-shared structure the BX6 octahedra form parallel chains, with the A+ ions located between these chains without strict confinement. Thus a large A+ and a small B2+ favor the face-shared structure.
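Once the atomic parameters of a candidate chloride are known, the criterion of Eq. (25) can be applied directly. The sketch below codes the discriminant; the parameter values in the example are purely hypothetical, since the 23-sample data set is not reproduced here.

```python
# Discriminant of Eq. (25): a negative value predicts the face-shared
# (BaNiO3-type) structure, a non-negative value the perovskite structure.
def face_shared_discriminant(r_b, r_a, x_a, x_b, n_d):
    return 4.52 * r_b - 1.83 * r_a + 2.23 * x_a - 0.142 * x_b - 4.10 * n_d + 0.589

# Hypothetical ABCl3 parameter values, for illustration only (not from the paper):
score = face_shared_discriminant(r_b=0.70, r_a=1.60, x_a=0.9, x_b=1.6, n_d=0.0)
print("face-shared" if score < 0 else "perovskite")
```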

3.2 SVR applied to the prediction of energy gaps of binary compounds

III–V and II–VI binary compounds are important semiconductors for microwave, optoelectronic and infrared devices. The band gap (\( E_{\text{g}} \)) is an essential property of these compounds, and it would be helpful for materials scientists to estimate the \( E_{\text{g}} \) of a compound before synthesizing it. On the basis of the known data available, it is reasonable to predict the properties of unseen samples by data mining methods. Since many data mining methods are available, one has to deal with the troublesome problem of model selection for a particular data set with a finite number of samples and multiple features. It is very important to select a model with good generalization ability, i.e., low mean relative error for the properties of new compounds (unseen samples).

In this work, the data set consists of 25 compounds: AlP, AlAs, AlSb, GaP, GaAs, GaSb, InP, InAs, InSb, ZnS, ZnSe, ZnTe, CdS, CdSe, CdTe, HgS, HgSe, HgTe, AlN, GaN, InN, PbO, PbS, PbSe, and PbTe [29, 30]. Based on this data set, the SVR model for predicting \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds was constructed using atomic parameters as features, including electronegativity, valence, radius, atomic mass and their functions. The data mining results indicated that the sum of the ratios of atomic electrovalence to covalent radius \( \sum {(z/r_{\text{cov}} )} \) [31], the mean atomic number \( \bar{N} \), and the atomic electrovalences \( Z_{\text{A}} \) and \( Z_{\text{B}} \) should be selected as the parameters of the band gap model,

where

$$ \bar{N} = \frac{{N_{\text{A}} + N_{\text{B}} }}{2}, $$
(26)
$$ \sum {(z/r_{\text{cov}} )} = (z/r_{\text{cov}} )_{A} + (z/r_{\text{cov}} )_{B} . $$
(27)

In the present work, the LOOCV test was used to find a suitable capacity parameter C, ε-insensitive loss function and kernel function for the SVR model. To measure the generalization ability of the SVR model, we defined the mean error function (MEF) \( U_{\text{m}} \) as in Eq. (28)

$$ U_{\text{m}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{{\left| {p_{i} - e_{i} } \right|}}{{e_{\hbox{max} } - e_{\hbox{min} } }}} \times 100\,\% , $$
(28)

where \( e_{i} \) is the experimental value of sample i, \( p_{i} \) the predicted value of sample i, and n the total number of samples; \( e_{\hbox{max} } \) and \( e_{\hbox{min} } \) are the maximum and minimum experimental values over all samples. In general, the smaller the value of \( U_{\text{m}} \), the better the expected generalization ability. The optimal SVR model with the least \( U_{\text{m}} \) is obtained with the polynomial kernel \( K(\varvec{x}_{i} ,\varvec{x}_{j} ) = \left( {\left\langle {\varvec{x}_{i} \cdot \varvec{x}_{j} } \right\rangle + 1} \right)^{2} \), ε = 0.07 and regularization constant C = 70. Using this kernel and the optimized parameters, the trained SVR model for \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds on the original data is as follows

$$ E_{g} = 3.479 \times \left( {\sum\limits_{i,j = 1}^{25} {\left( {\alpha_{i} - \alpha_{i}^{*} } \right)\left( {\left\langle {\varvec{x}_{i} \cdot \varvec{x}_{j} } \right\rangle + 1} \right)^{2} + 0.7473} } \right) + 0.1410, $$
(29)

where \( (\alpha_{i} - \alpha_{i}^{*} ) \) is the Lagrange coefficient corresponding to each support vector. Figure 3 illustrates the relationship between the predicted and experimental \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds, with a correlation coefficient (R) of 0.97.
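The tuning procedure described above can be sketched as follows. This example uses scikit-learn rather than ChemSVM, and random synthetic features in place of the real 25-compound descriptor table; the kernel (⟨x_i · x_j⟩ + 1)², ε = 0.07 and C = 70 match the text, and the mean error function of Eq. (28) is computed over LOOCV predictions.

```python
# Sketch: polynomial-kernel SVR with LOOCV and the U_m metric of Eq. (28).
# The feature matrix is a synthetic stand-in for the real descriptor table.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def mean_error_function(e, p):
    """U_m of Eq. (28): mean |p_i - e_i| scaled by the experimental range, in %."""
    e, p = np.asarray(e, dtype=float), np.asarray(p, dtype=float)
    return np.mean(np.abs(p - e) / (e.max() - e.min())) * 100.0

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(25, 4))                       # 25 "compounds", 4 descriptors
e = X @ np.array([1.5, -0.8, 0.6, 0.3]) + 0.05 * rng.normal(size=25)

# Kernel (<x_i, x_j> + 1)^2 -> poly kernel with degree=2, gamma=1, coef0=1.
model = SVR(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=70.0, epsilon=0.07)
p = cross_val_predict(model, X, e, cv=LeaveOneOut())      # one prediction per held-out sample
print(f"LOOCV U_m = {mean_error_function(e, p):.1f}%")
```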

Fig. 3

Experimental \( E_{\text{g}} \) versus predicted \( E_{\text{g}} \) of binary compound semiconductors with the trained SVR model

In this work, the LOOCV method was also used to evaluate the performance of the models obtained. Figure 4 plots the LOOCV-predicted values of SVR versus the experimental values of \( E_{\text{g}} \) for the binary compounds.

Fig. 4

Experimental \( E_{\text{g}} \) versus predicted \( E_{\text{g}} \) of binary compound semiconductors using LOOCV of SVR (R = 0.93)

From Fig. 4, it can be concluded that the predicted results are in good agreement with the experimental ones [25].

3.3 SVR applied to the prediction of sintered cold modulus of sialon-corundum castable

Although the discovery of new materials is the most exciting part of materials research, most everyday tasks of materials research aim at improving the preparation technology of known materials. The economic effect of such improvement is very significant, because these efforts ultimately determine the cost and quality of products and their competitiveness in the international market. Here an example of using SVR in the optimization of sialon ceramic preparation is described.

Sialons are silicon aluminium oxynitride ceramic materials with a range of technically important applications, from cutting tools to specialized refractories. Furthermore, since they can have a wide range of compositions and occur in several different families of crystal structures, the properties of sialons can be tailored for specific purposes [32]. β-sialon corundum finds applications as a high-temperature, corrosion-resistant, thermal-shock-resistant, high-strength and high-toughness structural material [33]. The sintered cold modulus of sialon-corundum castable is an important property of sialon materials, but the relationship between this property and the process parameters is very complicated [34]. Hence it is necessary to find computational methods to correlate the properties of sialon-corundum with the process parameters.

In this work, the data set consists of 20 samples from our experiments. The root mean square error (RMSE) \( V_{\text{m}} \) of LOOCV was adopted to estimate the quality of the model for predicting the sintered cold modulus of sialon-corundum castable. \( V_{\text{m}} \) is defined as follows

$$ V_{\text{m}} = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(p_{i} - e_{i} )^{2} } }}{n}} , $$
(30)

where \( e_{i} \) is the experimental value of sample i, \( p_{i} \) the predicted value of sample i, and n the number of samples in LOOCV. Based on the data mining work, it was found that a linear kernel function with C = 7.0 and ε = 0.16 can be used to construct the SVR model for the quantitative relationship between sintered cold modulus and the process parameters. The resulting SVR model can be presented as follows [19]

$$ S_{\text{pred}} = - 2.78C_{\text{Water}} + 0.40C{}_{{{\text{SiO}}_{2} }} - 1.35C_{{{\text{Al}}_{2} {\text{O}}_{3} }} + 2.25C_{\text{Disperser}} + 36.77, $$
(31)

where \( S_{\text{pred}} \) is the predicted sintered cold modulus of the sialon-corundum castable (strength, unit: MPa), while \( C_{\text{Water}} ,C_{{{\text{SiO}}_{ 2} }} ,C_{{{\text{Al}}_{ 2} {\text{O}}_{ 3} }} \) and \( C_{\text{Disperser}} \) are the contents (mass%) of water, SiO2 powder, ρ-Al2O3 and dispersant, respectively.
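Eq. (31) is a simple linear model and can be evaluated directly. In the sketch below, the coefficients are those of Eq. (31), while the composition values are hypothetical and serve only to illustrate a prediction.

```python
# Linear SVR model of Eq. (31) for sintered cold modulus (MPa).
def sintered_cold_modulus(c_water, c_sio2, c_al2o3, c_disperser):
    return (-2.78 * c_water + 0.40 * c_sio2 - 1.35 * c_al2o3
            + 2.25 * c_disperser + 36.77)

# Hypothetical mass-percent contents, for illustration only:
s = sintered_cold_modulus(c_water=5.0, c_sio2=6.0, c_al2o3=3.0, c_disperser=0.2)
print(f"predicted modulus: {s:.2f} MPa")
```

The signs of the coefficients reproduce the trend discussed below: more SiO2 and dispersant raise the predicted modulus, while more water and ρ-Al2O3 lower it.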

Figure 5 illustrates the experimental values versus predicted values of sialon-corundum cold modulus using LOOCV of SVR model with linear kernel (C = 7.0 and ε = 0.16).

Fig. 5

Experimental values versus predicted values of sialon-corundum cold modulus using LOOCV of SVR model with linear kernel (C = 7.0 and ε = 0.16)

According to Eq. (31), in order to increase \( S_{\text{pred}} \), the contents of SiO2 and dispersant should be increased while the content of ρ-Al2O3 is decreased, which is consistent with the following mechanism. Addition of SiO2 improves the flowability and reduces the water content that must be added to the castable. Simultaneously, SiO2 reacts with water to form a siloxene network structure, which accelerates the sintering of the castable and improves its cold strength after sintering. However, ρ-Al2O3, which usually hydrates into Al(OH)3 and AlOOH as a binder, cannot accelerate sintering as water increases, because commercial ρ-Al2O3 contains agglomerated γ-Al2O3, which holds a lot of water and results in bad flowability. Therefore, increasing ρ-Al2O3 restrains the sintering and reduces the sintered cold strength of the castable. As for the dispersant, it improves the flowability of the castable by preventing the flocculation of micelles, whose structure would otherwise trap water.

3.4 SVM applied to the optimization of electric resistances of VPTC semiconductors

VPTC materials are ceramic semiconductors for electronic uses. The task of VPTC materials research is to find the optimal composition and preparation conditions for a high value of \( \rho_{0} /\rho_{\hbox{min} } \) (the ratio of the electric resistance at zero degrees centigrade to the minimum electric resistance) of these materials. There are five influencing factors: Yb2O3 content (\( W_{1} \)), excess TiO2 (\( W_{2} \)), sintering temperature (\( T_{\text{c}} \)), sintering time (\( T_{\text{k}} \)), and relative cooling rate (V). Using a linear kernel function (C = 10 and ε = 0.15), the trained SVR model for predicting \( \rho_{0} /\rho_{\hbox{min} } \) is as follows

$$ \rho_{0} /\rho_{\hbox{min} } = 29.52[W_{1} ] + 1.315[W_{2} ] + 0.03098[T_{c} ] + 1.075[T_{k} ] - 3.487[V] - 40.56. $$
(32)

The relationship between \( \rho_{0} /\rho_{\hbox{min} } \) and the five influencing factors is found to be nearly linear. Figure 6 shows the comparison between the experimental and predicted values of \( \rho_{0} /\rho_{\hbox{min} } \) by SVR in the LOOCV test. Using the SVR model combined with pattern recognition techniques, unseen samples with new compositions and technological conditions can be designed to optimize VPTC semiconductors. The experimental results show that \( \rho_{0} /\rho_{\hbox{min} } \) of a new sample designed by data mining increases to 27, much higher than that of the best sample obtained before the optimization (\( \rho_{0} /\rho_{\hbox{min} } \) = 21).

Fig. 6

Comparison between the experimental values and predicted values of ρ 0/ρ min by SVR in LOOCV test

3.5 SVM applied to the thickness control of In2O3 semiconductor film preparation

In2O3 semiconductor nanometer film is a new material for combustible gas detectors and can be prepared by the sol-gel method. Controlling the thickness of the semiconductor film is one of the crucial problems in its preparation. Several factors influence the thickness of the film: the mass percentages of In2O3 and PVA in the bath, the viscosity of the coating liquids, the drawing rate and the number of drawings. It is therefore desirable to have a mathematical model for automatic control in film production. In our lab, SVM methods have been used for data mining for this purpose.

It has been found that SVR with a polynomial kernel of second degree can provide the mathematical model for thickness control of the semiconductor films. Figure 7 shows the comparison between the experimental and predicted thicknesses in the LOOCV test [35].

Fig. 7

Result of prediction of thickness of In2O3 film by SVR in LOOCV test

4 Discussion and conclusions

Generally speaking, choosing the right balance between model flexibility and over-fitting to a limited training set is one of the most difficult obstacles to obtaining a model with good generalization ability for predicting the properties of materials. In SVM modeling, the selection of an appropriate value of the regularization parameter C is very important because it affects both the trained and predicted results: it controls the trade-off between maximizing the margin and minimizing the training error. C should therefore be optimized to avoid both under-fitting and over-fitting. The predicted results are also strongly affected by the choice of kernel function and its parameters.
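The selection of C, ε and the kernel described above is commonly automated with a cross-validated grid search. The following sketch (scikit-learn assumed, synthetic data, and an illustrative parameter grid) tunes these hyperparameters by LOOCV, mirroring the procedure used throughout Sect. 3.

```python
# Sketch: LOOCV grid search over SVR hyperparameters (C, epsilon, kernel).
# Data and grid values are illustrative assumptions, not from the paper.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, LeaveOneOut

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(20, 3))                 # small sample, as in Sect. 3.3
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=20)

grid = GridSearchCV(
    SVR(),
    param_grid={"kernel": ["linear", "rbf"],
                "C": [1.0, 10.0, 100.0],
                "epsilon": [0.05, 0.1, 0.2]},
    cv=LeaveOneOut(),
    # MAE works with single-sample test folds; R^2 would not.
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print(grid.best_params_)
```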

It should be emphasized that an advantage of SVM is that it works with small sample sets. In many cases, obtaining a sufficient number of experimental samples is still time-consuming and costly in the development of novel materials. Therefore, efficient learning from a limited number of samples becomes increasingly important for shortening the materials development cycle.

Although our results indicate that SVM outperforms traditional data mining methods, it should be recognized that different data mining methods have their own advantages and disadvantages in different applications. Sometimes the best approach is a combination of different methods, since complementary approaches can provide helpful information from different points of view.

From the examples introduced in this paper, it can be concluded that the SVM is an effective modeling tool with great potential in materials design. Therefore, it can be expected that the SVM method will be further applied in various fields of materials science.