1 Introduction

Over the last several decades, developing, manufacturing, and deploying advanced materials as fast as possible has been a great challenge for scientists. In June 2011, the Materials Genome Initiative (MGI) for Global Competitiveness was proposed by the U.S. National Science and Technology Council to develop an infrastructure that shortens the materials development cycle. The most fundamental goal of the MGI is to accelerate materials design through the use of computational capabilities, data management, and an integrated approach to materials science and engineering [1].

In principle, there are two strategies for materials design. One is to start from first principles, i.e., from quantum mechanics and statistical mechanics, to predict the properties of unknown materials. Although first-principles methods have been widely used in materials design [2, 3], it is still impossible to solve most of the complicated problems in materials exploration with this strategy alone. The other is the semi-empirical strategy, i.e., to derive semi-empirical rules from the known data of existing materials and use them to predict the properties of unknown materials. In general, the second strategy is more practicable in materials design and new materials exploration, since a variety of data mining methods can be utilized to construct statistical models from the data sets available from scientific experiments [4–9].

In materials design by data mining, principal component analysis (PCA), partial least squares (PLS) and artificial neural networks (ANNs) are very helpful because of their relatively good performance, speed, and simplicity in constructing statistical models [10, 11]. However, ANNs may suffer from over-fitting [12] (i.e., good performance in fitting but poor performance in prediction) when treating finite, multivariate data sets. Meanwhile, nonlinear relations can only be modeled in a limited way by the PCA or PLS algorithms [13].

In the semi-empirical approach to materials design, the data of known materials are usually used as the training set. In most cases, the number of available known data points is rather limited, which means that the data processing task usually deals with small sample sizes and hence risks serious over-fitting. As an effective way to overcome over-fitting, the support vector machine (SVM), based on statistical learning theory (SLT), was proposed by Vapnik [14]. SVM has been shown to perform well in various applications including drug design [15, 16], materials design [17–19] and chemistry research [20].

In new materials exploration, two questions of general significance need answers: first, “what is the chemical composition of the substance having the desirable properties?”, and second, “what are the optimal conditions for preparing or producing this material at low cost?”. Since both questions involve very complicated systems or processes, they have to be solved by semi-empirical methods. To answer the first question, the relationships between the microscopic structure of materials and their properties need to be addressed; such a relationship is usually known as a quantitative structure–property relationship (QSPR). To deal with the second question, mathematical models are usually set up to optimize the processes.

Based on these problems, the tasks of materials design can be classified into four categories. The first is to solve “formability problems”, i.e., to find a mathematical model or criterion for the stability of unknown substances. The second is “property prediction”, i.e., to build mathematical models of structure–property relationships and use them to predict the properties of new materials (or the inverse problem: to search for unknown new materials with pre-assigned properties). The third is to solve “optimization problems”, i.e., to find the conditions that optimize certain properties of a material. The last, but not least, is the “problem of control”, i.e., to find a mathematical model that keeps some index of a material within a desired range. Different data mining techniques should be adopted for these different purposes. In this paper, we demonstrate examples from our lab of applying SVM methods, including support vector classification (SVC) and support vector regression (SVR), as relatively new tools for these different tasks of materials design. The advantages of using SVM for materials design are discussed based on the applications presented.

2 Methods of SVM

The foundations of SVM were developed by Vapnik [14], and the method is gaining popularity due to its many attractive features and promising empirical performance. In this paper the term SVM refers to both the SVC and SVR methods, which can be used for solving qualitative and quantitative problems respectively [21–25].

2.1 SVC

SVC has recently been proposed as a very effective method for solving classification problems, which can be restricted to the two-class problem without loss of generality [14, 20]. The goal is to separate the two classes by a classifier induced from available examples, in the expectation that the classifier will perform well on unseen examples, i.e., generalize well.

The geometrical interpretation of SVC is that it determines the optimal separating surface, i.e., a hyperplane, which is equidistant from the two classes of data points. This hyperplane has many interesting statistical properties, as discussed by Vapnik [14]. Consider the problem of separating the set of training vectors belonging to two separate classes, \( (y_{1} ,\varvec{x}_{1} ),(y_{2} ,\varvec{x}_{2} ), \cdots ,(y_{l} ,\varvec{x}_{l} ),\varvec{x} \in \varvec{R}^{m} ,y \in \{ - 1, + 1\} , \) with a hyperplane

$$ \varvec{w}^{\text{T}} \varvec{x} + b = 0, $$
(1)

where w and b are the weight vector and bias, respectively.

If the training data are linearly separable, then there exists a pair of parameters (w, b) for which we can write

$$ y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) - 1 \ge 0,\quad i = 1,2, \cdots ,l, $$
(2)
$$ \varvec{w}^{\text{T}} \varvec{x} + b \ge + 1,\quad {\text{for all }}\varvec{x} \in \varvec{P}, $$
(3)
$$ \varvec{w}^{\text{T}} \varvec{x} + b \le - 1,\quad {\text{for all }}\varvec{x} \in \varvec{N} $$
(4)

where P is the set of positive samples, and N is the set of negative samples.

The decision rule is

$$ f_{{\varvec{w},b}} (\varvec{x}) = \text{sgn} (\varvec{w}^{\text{T}} \varvec{x} + b). $$
(5)

The pair (w, b) can be rescaled, without loss of generality, such that

$$ \mathop {\hbox{min} }\limits_{i = 1,2, \cdots ,l} \left| {\varvec{w}^{T} \varvec{x}_{i} + b} \right| = 1. $$
(6)

The learning problem is hence reformulated as follows: minimize ||w||2 subject to the constraints of linear separability. This is equivalent to maximizing the distance, normal to the hyperplane, between the convex hulls of the two classes, and the optimization becomes a quadratic programming (QP) problem

$$ \mathop {\text{Min}}\limits_{{\varvec{w},b}} \phi (\varvec{w}) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} , $$
(7)

subject to \( y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) \ge 1,\quad i = 1,2, \cdots ,l. \) This problem has a global optimum, and the Lagrangian is written as

$$ L(\varvec{w},b,\Uplambda ) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} - \sum\limits_{i = 1}^{l} {\lambda_{i} [y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) - 1]} , $$
(8)

where \( \Uplambda = \{ \lambda_{1} ,\lambda_{2} , \cdots ,\lambda_{l} \} \) are the Lagrange multipliers, one for each data point. Hence we can write

$$ F(\Uplambda ) = \sum\limits_{i = 1}^{l} {\lambda_{i} } - \frac{1}{2}\left\| \varvec{w} \right\|^{2} = \sum\limits_{i = 1}^{l} {\lambda_{i} } - \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {\lambda_{i} \lambda_{j} y_{i} y_{j} \varvec{x}_{i}^{\text{T}} \varvec{x}_{j} } }. $$
(9)

The Lagrange multipliers are only non-zero when \( y_{i} (\varvec{w}^{\text{T}} \varvec{x}_{i} + b) = 1 \). Vectors fulfilling this requirement are called support vectors since they lie closest to the separating hyperplane. Then, the optimal separating hyperplane is given as follows

$$ \varvec{w}^{*} = \sum\limits_{i = 1}^{l} {\lambda_{i}^{*} y_{i} \varvec{x}_{i} } , $$
(10)

and the bias is given by

$$ b^{*} = - \frac{1}{2}(\varvec{w}^{*} )^{\text{T}} (\varvec{x}_{\text{s}} + \varvec{x}_{\text{r}} ), $$
(11)

where \( \varvec{x}_{\text{r}} \) and \( \varvec{x}_{\text{s}} \) are any support vectors from each class satisfying the following equation

$$ y_{\text{r}} = 1,\quad y_{\text{s}} = - 1. $$
(12)

The hard classifier is then

$$ f(\varvec{x}) = \text{sgn} ((\varvec{w}^{*} )^{\text{T}} \varvec{x} + b^{*} ). $$
(13)

In the case where a linear boundary is inappropriate, SVC can map the input vector, x, into a high dimensional feature space, F. By choosing a non-linear mapping Φ, SVC constructs an optimal separating hyperplane in this higher dimensional space. Among acceptable mappings are polynomials, radial basis functions and certain sigmoid functions. The optimization problem then becomes

$$ W(\alpha ) = \sum\limits_{i = 1}^{l} {\alpha_{i} - \frac{1}{2}\sum\limits_{i,j = 1}^{l} {y_{i} y_{j} } \alpha_{i} \alpha_{j} \left\langle {\varPhi (\varvec{x}_{i} ) \cdot \varPhi (\varvec{x}_{j} )} \right\rangle } . $$
(14)

In this case, the decision function in SVC is as follows

$$ \begin{aligned} g(\varvec{x}) =& \text{sgn} (f(\varvec{x})) = \text{sgn} \left( {\sum\limits_{i \in SV} {\alpha_{i} y_{i} \left\langle {\varPhi (\varvec{x}) \cdot \varPhi (\varvec{x}_{i} )} \right\rangle + b} } \right) \\= & \text{sgn} \left( {\sum\limits_{i \in SV} {\alpha_{i} y_{i} K(\varvec{x},\varvec{x}_{i} ) + b} } \right),\\ \end{aligned} $$
(15)

where \( \varvec{x}_{i} \) are the support vectors and \( K(\varvec{x},\varvec{x}_{i} ) \) is called the kernel function.
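The kernel decision function of Eq. (15) can be sketched concretely. The following example is an illustration only: it uses scikit-learn (not the ChemSVM package described below), synthetic two-class data, and an assumed RBF kernel; it then reproduces g(x) from the fitted support vectors and dual coefficients.

```python
# Sketch of the SVC decision function g(x) of Eq. (15) using scikit-learn.
# Data, kernel choice and parameter values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
# Synthetic labels: class depends on distance from the origin (not linearly separable).
y = np.where(np.hypot(X[:, 0], X[:, 1]) > 1.0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def g(x):
    """g(x) = sgn(sum_i alpha_i y_i K(x, x_i) + b); sklearn's dual_coef_
    already stores the products alpha_i * y_i over the support vectors."""
    K = np.exp(-gamma * ((clf.support_vectors_ - x) ** 2).sum(axis=1))
    return np.sign(clf.dual_coef_[0] @ K + clf.intercept_[0])

print(g(np.array([2.0, 0.0])))
```

The hand-written g(x) agrees with `clf.predict`, which shows that only the support vectors enter the decision function.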

2.2 SVR [14, 20]

In SVR, the basic idea is to map the data x into a higher-dimensional feature space F via a nonlinear mapping Φ and then to perform linear regression in this space. Regression approximation thus addresses the problem of estimating a function from a given data set \( G = \{ (\varvec{x}_{i} ,d_{i} )\}_{i = 1}^{l} \) (\( \varvec{x}_{i} \) is the input vector and \( d_{i} \) the desired value). SVR approximates the function in the following form

$$ y = \sum\limits_{i = 1}^{l} {w_{i} \varPhi (\varvec{x}_{i} ) + b} , $$
(16)

where \( \left\{ {\varPhi (\varvec{x}_{i} )} \right\}_{i = 1}^{l} \) is the set of mappings of input features, and \( \left\{ {w_{i} } \right\}_{i = 1}^{l} \) and b are coefficients. They are estimated by minimizing the regularized risk function R(C):

$$ R(C) = C\frac{1}{N}\sum\limits_{i = 1}^{N} {L_{\varepsilon } } (d_{i}, y_{i} ) + \frac{1}{2}\left\| \varvec{w} \right\|^{2} , $$
(17)

where

$$ L_{\varepsilon } (d,y) = \left\{ {\begin{array}{*{20}l} {\left| {d - y} \right| - \varepsilon ,} & {{\text{for }}\left| {d - y} \right| \ge \varepsilon ,} \\ {0,} & {{\text{otherwise}},} \\ \end{array} } \right. $$
(18)

and ε is a prescribed parameter.

In Eq. (17), \( C\frac{1}{N}\sum\nolimits_{i = 1}^{N} {L_{\varepsilon } (d_{i} ,y_{i} )} \) is the so-called empirical error (risk), measured by the ε-insensitive loss function \( L_{\varepsilon } (d,y) \), which does not penalize errors below ε. The second term, \( \frac{1}{2}\left\| \varvec{w} \right\|^{2} , \) is used as a measure of function flatness. C is a regularization constant determining the trade-off between the training error and model flatness. Introducing slack variables \( \xi_{i} ,\xi_{i}^{*} \) turns Eq. (17) into the following constrained problem:

$$ {\text{Min}}\;R(\varvec{w},\xi ,\xi^{*} ) = \frac{1}{2}\left\| \varvec{w} \right\|^{2} + C\sum\limits_{i = 1}^{l} {(\xi_{i} + \xi_{i}^{*} )} , $$
(19)

s.t.

$$ \left\{ \begin{gathered} \varvec{w}\varPhi (\varvec{x}_{i} ) + b - d_{i} \le \varepsilon + \xi_{i} , \hfill \\ d_{i} - \varvec{w}\varPhi (\varvec{x}_{i} ) - b \le \varepsilon + \xi_{i}^{*} , \hfill \\ \xi_{i} ,\xi_{i}^{*} \ge 0. \hfill \\ \end{gathered} \right. $$
(20)

Thus, decision function Eq. (16) becomes the following form:

$$ f(\varvec{x},\alpha ,\alpha^{*} ) = \sum\limits_{i = 1}^{l} {\left( {\alpha_{i} - \alpha_{i}^{*} } \right)K(\varvec{x},\varvec{x}_{i} )} + b. $$
(21)

In Eq. (21), \( \alpha_{i} ,\alpha_{i}^{*} \) are the introduced Lagrange multipliers. They satisfy \( \alpha_{i} \cdot \alpha_{i}^{*} = 0,\;\alpha_{i} \ge 0,\;\alpha_{i}^{*} \ge 0,\;i = 1,2, \cdots ,l, \) and are obtained by maximizing the dual form of Eq. (19), which has the following form:

$$ \begin{aligned} w(\alpha ,\alpha^{*} ) & = \sum\limits_{i = 1}^{l} {d_{i} (\alpha_{i} - \alpha_{i}^{*} )} - \varepsilon \sum\limits_{i = 1}^{l} {(\alpha_{i} + \alpha_{i}^{*} )} \\ & \quad - \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*} )(\alpha_{j} - \alpha_{j}^{*} )K(\varvec{x}_{i} ,\varvec{x}_{j} )} } , \\ \end{aligned} $$
(22)

with the following constraints:

$$ \left\{ \begin{gathered} 0 \le \alpha_{i} \le C,\quad i = 1,2, \cdots ,l, \hfill \\ 0 \le \alpha_{i}^{*} \le C,\quad i = 1,2, \cdots ,l, \hfill \\ \sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*} )} = 0. \hfill \\ \end{gathered} \right. $$
(23)

Based on the Karush–Kuhn–Tucker (KKT) conditions of quadratic programming, only some of the coefficients \( \alpha_{i} - \alpha_{i}^{*} \) assume nonzero values; the data points associated with them are referred to as support vectors. In Eq. (21), \( K(\varvec{x},\varvec{x}_{i} ) \) is the kernel function, whose value equals the inner product of the two vectors x and \( \varvec{x}_{i} \) in the feature space: \( K(\varvec{x},\varvec{x}_{i} ) = \varPhi (\varvec{x}) \cdot \varPhi (\varvec{x}_{i} ) \). The elegance of the kernel function lies in the fact that one can deal with feature spaces of arbitrary dimensionality without having to compute the map \( \varPhi (\varvec{x}) \) explicitly. Any function that satisfies Mercer’s condition can be used as a kernel function.
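As an illustration of the SVR expansion in Eq. (21), the following sketch (scikit-learn assumed in place of ChemSVM; data, kernel and parameters are synthetic choices of this example) fits an ε-insensitive SVR and reconstructs f(x) = Σ(α_i − α_i*)K(x, x_i) + b from the fitted dual coefficients.

```python
# Sketch of Eq. (21) with scikit-learn's SVR; all values are illustrative.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
d = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)  # noisy synthetic target

gamma = 0.5
reg = SVR(kernel="rbf", gamma=gamma, C=10.0, epsilon=0.1).fit(X, d)

def f(x):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b (Eq. (21));
    sklearn's dual_coef_ stores the differences alpha_i - alpha_i*."""
    K = np.exp(-gamma * ((reg.support_vectors_ - x) ** 2).sum(axis=1))
    return reg.dual_coef_[0] @ K + reg.intercept_[0]

print(f(np.array([0.5])))
```

Only samples whose residual reaches the ε-tube boundary appear in `support_vectors_`, which is the sparsity the KKT conditions promise.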

2.3 Implementation of SVM

Following Ref. [14], the SVM software package ChemSVM, including SVC and SVR, has been programmed in our lab. A free version of ChemSVM can be downloaded from the website of the Laboratory of Computational Chemistry at Shanghai University (http://chemdata.shu.edu.cn:8080/MyLab/Lab/download.jsp). The software has also been validated in chemistry applications [20].

3 Applications

3.1 SVC applied to the formability of perovskite or BaNiO3 structure

The most exciting achievement of materials research is to find new compounds (or new phases) with specified structures and outstanding properties. In this section, materials design problems for compounds with perovskite-type or BaNiO3-type structures are discussed based on SVC models.

There are numerous complex oxides and halides with the general formula ABX3 (X = oxygen or halogen) that have the perovskite-type crystal structure and outstanding functional properties [26]. Since 1945, when the ferroelectric properties of barium titanate were discovered, a series of complex oxides and halides with perovskite-type structure have been found to be valuable functional materials. In recent years, the search for new complex oxides and halides with perovskite-type structure has become an active field of new materials exploration.

The ideal perovskite crystal structure is illustrated in Fig. 1, which shows a unit cell of the SrTiO3 crystal. In this structure, the tetravalent Ti4+ cation is surrounded by 6 oxygen anions to form an octahedron, and the bivalent Sr2+ cation is surrounded by 12 oxygen anions to form a cubo-octahedron. Based on this crystal structure, Goldschmidt proposed a famous crystal-chemical criterion for the formability or stability of the perovskite structure of ABX3-type compounds:

$$ t = \frac{{R_{a} + R_{x} }}{{\sqrt 2 (R_{b} + R_{x} )}}, $$
(24)

where t is called the tolerance factor, \( R_{\text{a}} \) and \( R_{\text{b}} \) are the cationic radii of the A and B ions respectively, and \( R_{\text{x}} \) is the ionic radius of the X anion. According to Goldschmidt, the cubic perovskite structure is stable only if the tolerance factor lies approximately in the range 0.8 < t < 0.9, while the distorted perovskite structure can be stable over a somewhat larger range of tolerance factor. This criterion is widely used in the exploration of new compounds with perovskite-type or perovskite-like structures. Owing to the accumulation of crystallographic data for compounds with perovskite structure, it is now widely recognized that the range of the tolerance factor for the stability of the perovskite structure should be 0.75 < t < 1.00.
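Eq. (24) is straightforward to evaluate. As a quick check, the following sketch computes t for SrTiO3; the numerical Shannon-type ionic radii used here are assumptions of this illustration, not values taken from the text.

```python
# Goldschmidt tolerance factor of Eq. (24); radii below are assumed
# illustrative values (angstroms), not data from the paper.
import math

def tolerance_factor(r_a, r_b, r_x):
    """t = (R_a + R_x) / (sqrt(2) * (R_b + R_x))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))

# Assumed radii: Sr2+ (12-coordinate) ~1.44, Ti4+ (6-coordinate) ~0.605, O2- ~1.40.
t = tolerance_factor(1.44, 0.605, 1.40)
print(round(t, 3))  # t close to 1, consistent with SrTiO3's ideal perovskite structure
```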

Fig. 1

Crystal structure of SrTiO3, a typical compound with ideal perovskite structure

Although Goldschmidt’s tolerance factor t is indeed very useful for the exploration of new compounds with perovskite structure, it is only a necessary condition, not a sufficient one, for the formation or stability of the perovskite structure [27]. Many systems with t in the range 0.75–1.00 do not form perovskite-type compounds.

For example, MnSiO3 has t = 0.856 but adopts the CdGeO3 structure; RbMnCl3 has t = 0.88 but adopts the hexagonal BaTiO3 structure; and the NaI–MgI2 system has t = 0.826 but forms no intermediate compound at all. It is therefore desirable to investigate complementary conditions for the formability of the perovskite structure, to aid computerized materials design of new perovskite materials. Atomic parameters and SVM techniques can be used for this purpose.

Since BX6 octahedra and AX12 cubo-octahedra are the basic sub-structures of the perovskite lattice, as shown in Fig. 1, the stability of the BX6 octahedra and AX12 cubo-octahedra is also a necessary condition for the stability of the perovskite lattice. Obviously, the condition 0.75 < t < 1.00 is not enough to ensure the stability of the BX6 octahedra and AX12 cubo-octahedra, so suitable criteria for these stability requirements need to be found.

Some ABX3-type compounds, such as RbNiCl3, satisfy 0.75 < t < 1.00 yet do not form the perovskite lattice but rather crystal lattices with the BaNiO3 structure. The chief difference of the BaNiO3 structure (or hexagonal BaTiO3 structure) from the perovskite structure is that in the former the BX6 octahedra share faces with each other (see Fig. 2), while in the perovskite structure they share corners.

Fig. 2

Face-shared BX6 structure in the BaNiO3 lattice

To find a criterion for the relative stability of the BaNiO3 structure versus the perovskite structure, a data set containing 23 samples was used for data mining [28]. By combining SVC with the atomic parameters of the compounds, an SVC model was found that differentiates the perovskite structure from the hexagonal ABX3 structures involving face-shared octahedra. The SVC model with a linear kernel function achieves 100% separation of the chlorides with perovskite structure from the chlorides with face-shared structures. Leave-one-out cross-validation (LOOCV) was used to evaluate the performance of the models: the data set of n samples was divided into two disjoint subsets, a training set (n − 1 samples) and a test set (1 sample), and after each model was developed on the training set, the omitted sample was predicted by the developed model. In the LOOCV test, the prediction accuracy is 91%. The criterion for the formation of the face-shared structure found by SVC can be expressed as follows

$$ 4.52R_{\text{b}} - 1.83R_{\text{a}} + 2.23X_{\text{a}} - 0.142X_{\text{b}} - 4.10N_{\text{d}} + 0.589 < 0, $$
(25)

where \( R_{\text{a}} \) and \( R_{\text{b}} \) denote the Shannon–Prewitt ionic radii of A and B in ABCl3 respectively, \( X_{\text{a}} \) and \( X_{\text{b}} \) denote the Batsanov electronegativities of A and B respectively, and \( N_{\text{d}} \) denotes the number of d electrons in the unfilled d shell of the B ion. The criterion implies that a large A+ cation, a small B2+ cation and a large electronegativity of B favor face-sharing of the BX6 octahedra. This can be explained as follows: in the perovskite-type lattice, the network of corner-shared BX6 octahedra forms cages around the A+ ions. The repulsive force due to a large A+ ion in a small cage formed by small B2+ and X− ions makes the perovskite-type lattice unstable, while in the face-shared structure the BX6 octahedra form parallel chains, with the A+ ions located between these chains without strict confinement. Thus a large A+ and a small B2+ favor the face-shared structure.
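Once the atomic parameters of a candidate chloride are known, the criterion of Eq. (25) can be applied directly. The sketch below codes the discriminant; the parameter values in the example are purely hypothetical, since the 23-sample data set is not reproduced here.

```python
# Discriminant of Eq. (25): a negative value predicts the face-shared
# (BaNiO3-type) structure, a non-negative value the perovskite structure.
def face_shared_discriminant(r_b, r_a, x_a, x_b, n_d):
    return 4.52 * r_b - 1.83 * r_a + 2.23 * x_a - 0.142 * x_b - 4.10 * n_d + 0.589

# Hypothetical ABCl3 parameter values, for illustration only (not from the paper):
score = face_shared_discriminant(r_b=0.70, r_a=1.60, x_a=0.9, x_b=1.6, n_d=0.0)
print("face-shared" if score < 0 else "perovskite")
```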

3.2 SVR applied to the prediction of energy gaps of binary compounds

III–V and II–VI binary compounds are important semiconductors for microwave, optoelectronic and infrared devices. The band gap (\( E_{\text{g}} \)) is an essential property of these compounds, and it would be helpful for materials scientists to estimate the \( E_{\text{g}} \) of a compound before synthesizing it. On the basis of the known data available, it is reasonable to predict the properties of unseen samples by data mining methods. Since many data mining methods are available, one has to deal with the troublesome problem of model selection for a particular data set with a finite number of samples and multiple features. It is very important to select a model with good generalization ability, i.e., low mean relative error for the properties of new compounds (unseen samples).

In this work, the data set consists of 25 compounds: AlP, AlAs, AlSb, GaP, GaAs, GaSb, InP, InAs, InSb, ZnS, ZnSe, ZnTe, CdS, CdSe, CdTe, HgS, HgSe, HgTe, AlN, GaN, InN, PbO, PbS, PbSe, and PbTe [29, 30]. Based on this data set, the SVR model for predicting \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds was constructed using atomic parameters as features, including electronegativity, valence, radius, atomic mass and their functions. The data mining results indicated that the sum of the ratios of atomic electrovalence to covalent radius \( \sum {(z/r_{\text{cov}} )} \) [31], the mean atomic number \( \bar{N} \), and the atomic electrovalences \( Z_{\text{A}} \) and \( Z_{\text{B}} \) should be selected as the parameters of the band gap model,

where

$$ \bar{N} = \frac{{N_{\text{A}} + N_{\text{B}} }}{2}, $$
(26)
$$ \sum {(z/r_{\text{cov}} )} = (z/r_{\text{cov}} )_{A} + (z/r_{\text{cov}} )_{B} . $$
(27)

In the present work, the LOOCV test was used to find a suitable capacity parameter C, ε-insensitive loss function and kernel function for the SVR model. To measure the generalization ability of the SVR model, we defined the mean error function (MEF) \( U_{\text{m}} \) as in Eq. (28)

$$ U_{\text{m}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{{\left| {p_{i} - e_{i} } \right|}}{{e_{\hbox{max} } - e_{\hbox{min} } }}} \times 100\,\% , $$
(28)

where \( e_{i} \) is the experimental value of sample i, \( p_{i} \) the predicted value of sample i, and n the total number of samples; \( e_{\hbox{max} } \) and \( e_{\hbox{min} } \) are the maximum and minimum experimental values over all samples. In general, the smaller the value of \( U_{\text{m}} \), the better the expected generalization ability. The optimal SVR model with the least \( U_{\text{m}} \) is obtained with the polynomial kernel \( K(\varvec{x}_{i} ,\varvec{x}_{j} ) = \left( {\left\langle {\varvec{x}_{i} \cdot \varvec{x}_{j} } \right\rangle + 1} \right)^{2} \), ε = 0.07 and regularization constant C = 70. Using this kernel and the optimized parameters, the trained SVR model for \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds on the original data is as follows

$$ E_{g} = 3.479 \times \left( {\sum\limits_{i,j = 1}^{25} {\left( {\alpha_{i} - \alpha_{i}^{*} } \right)\left( {\left\langle {\varvec{x}_{i} \cdot \varvec{x}_{j} } \right\rangle + 1} \right)^{2} + 0.7473} } \right) + 0.1410, $$
(29)

where \( (\alpha_{i} - \alpha_{i}^{*} ) \) is the Lagrange coefficient corresponding to each support vector. Figure 3 illustrates the relationship between the predicted and experimental \( E_{\text{g}} \) of AIIIBV and AIIBVI binary compounds, with a correlation coefficient (R) of 0.97.
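The tuning procedure described above can be sketched as follows. This example uses scikit-learn rather than ChemSVM, and random synthetic features in place of the real 25-compound descriptor table; the kernel (⟨x_i · x_j⟩ + 1)², ε = 0.07 and C = 70 match the text, and the mean error function of Eq. (28) is computed over LOOCV predictions.

```python
# Sketch: polynomial-kernel SVR with LOOCV and the U_m metric of Eq. (28).
# The feature matrix is a synthetic stand-in for the real descriptor table.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def mean_error_function(e, p):
    """U_m of Eq. (28): mean |p_i - e_i| scaled by the experimental range, in %."""
    e, p = np.asarray(e, dtype=float), np.asarray(p, dtype=float)
    return np.mean(np.abs(p - e) / (e.max() - e.min())) * 100.0

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(25, 4))                       # 25 "compounds", 4 descriptors
e = X @ np.array([1.5, -0.8, 0.6, 0.3]) + 0.05 * rng.normal(size=25)

# Kernel (<x_i, x_j> + 1)^2 -> poly kernel with degree=2, gamma=1, coef0=1.
model = SVR(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=70.0, epsilon=0.07)
p = cross_val_predict(model, X, e, cv=LeaveOneOut())      # one prediction per held-out sample
print(f"LOOCV U_m = {mean_error_function(e, p):.1f}%")
```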

Fig. 3

Experimental \( E_{\text{g}} \) versus predicted \( E_{\text{g}} \) of binary compound semiconductors with the trained SVR model

In this work, the LOOCV method was also used to evaluate the performance of the models obtained. Figure 4 plots the LOOCV-predicted values of SVR versus the experimental values of \( E_{\text{g}} \) for the binary compounds.

Fig. 4

Experimental \( E_{\text{g}} \) versus predicted \( E_{\text{g}} \) of binary compound semiconductors using LOOCV of SVR (R = 0.93)

From Fig. 4, it can be concluded that the predicted results are in good agreement with the experimental ones [25].

3.3 SVR applied to the prediction of sintered cold modulus of sialon-corundum castable

Although the discovery of new materials is the most exciting part of materials research, most everyday tasks of materials research aim at improving the preparation technology of known materials. The economic effect of such improvement is very significant, because these efforts ultimately determine the cost and quality of products and their competitiveness in the international market. Here an example of using SVR in the optimization of sialon ceramic preparation is described.

Sialons are silicon aluminium oxynitride ceramic materials with a range of technically important applications, from cutting tools to specialized refractories. Furthermore, since they can have a wide range of compositions and occur in several different families of crystal structures, the properties of sialons can be tailored for specific purposes [32]. β-sialon corundum finds applications as a high-temperature, corrosion-resistant, thermal-shock-resistant, high-strength and high-toughness structural material [33]. The sintered cold modulus of sialon-corundum castable is an important property of sialon materials, but the relationship between this property and the process parameters is very complicated [34]. Hence it is necessary to find computational methods to correlate the properties of sialon-corundum with the process parameters.

In this work, the data set consists of 20 samples from our experiments. The root mean square error (RMSE) \( V_{\text{m}} \) of LOOCV was adopted to estimate the quality of the model for predicting the sintered cold modulus of sialon-corundum castable. \( V_{\text{m}} \) is defined as follows

$$ V_{\text{m}} = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(p_{i} - e_{i} )^{2} } }}{n}} , $$
(30)

where \( e_{i} \) is the experimental value of sample i, \( p_{i} \) the predicted value of sample i, and n the number of samples in LOOCV. Based on the data mining work, it was found that a linear kernel function with C = 7.0 and ε = 0.16 can be used to construct the SVR model for the quantitative relationship between sintered cold modulus and the process parameters. The resulting SVR model can be presented as follows [19]

$$ S_{\text{pred}} = - 2.78C_{\text{Water}} + 0.40C{}_{{{\text{SiO}}_{2} }} - 1.35C_{{{\text{Al}}_{2} {\text{O}}_{3} }} + 2.25C_{\text{Disperser}} + 36.77, $$
(31)

where \( S_{\text{pred}} \) is the predicted sintered cold modulus of the sialon-corundum castable (strength, unit: MPa), while \( C_{\text{Water}} ,C_{{{\text{SiO}}_{ 2} }} ,C_{{{\text{Al}}_{ 2} {\text{O}}_{ 3} }} \) and \( C_{\text{Disperser}} \) are the contents (mass%) of water, SiO2 powder, ρ-Al2O3 and dispersant, respectively.
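Eq. (31) is a simple linear model and can be evaluated directly. In the sketch below, the coefficients are those of Eq. (31), while the composition values are hypothetical and serve only to illustrate a prediction.

```python
# Linear SVR model of Eq. (31) for sintered cold modulus (MPa).
def sintered_cold_modulus(c_water, c_sio2, c_al2o3, c_disperser):
    return (-2.78 * c_water + 0.40 * c_sio2 - 1.35 * c_al2o3
            + 2.25 * c_disperser + 36.77)

# Hypothetical mass-percent contents, for illustration only:
s = sintered_cold_modulus(c_water=5.0, c_sio2=6.0, c_al2o3=3.0, c_disperser=0.2)
print(f"predicted modulus: {s:.2f} MPa")
```

The signs of the coefficients reproduce the trend discussed below: more SiO2 and dispersant raise the predicted modulus, while more water and ρ-Al2O3 lower it.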

Figure 5 illustrates the experimental values versus predicted values of sialon-corundum cold modulus using LOOCV of SVR model with linear kernel (C = 7.0 and ε = 0.16).

Fig. 5

Experimental values versus predicted values of sialon-corundum cold modulus using LOOCV of SVR model with linear kernel (C = 7.0 and ε = 0.16)

According to Eq. (31), in order to increase \( S_{\text{pred}} \), the contents of SiO2 and dispersant should be increased while the content of ρ-Al2O3 is decreased, which is consistent with the following mechanism. Addition of SiO2 improves the flowability and reduces the water content that must be added to the castable. Simultaneously, SiO2 reacts with water to form a siloxene network structure, which accelerates the sintering of the castable and improves its cold strength after sintering. However, ρ-Al2O3, which usually hydrates into Al(OH)3 and AlOOH as a binder, cannot accelerate sintering as water increases, because commercial ρ-Al2O3 contains agglomerated γ-Al2O3, which holds a lot of water and results in bad flowability. Therefore, increasing ρ-Al2O3 restrains the sintering and reduces the sintered cold strength of the castable. As for the dispersant, it improves the flowability of the castable by preventing the flocculation of micelles, whose structure would otherwise trap water.

3.4 SVM applied to the optimization of electric resistances of VPTC semiconductors

VPTC materials are ceramic semiconductors for electronic uses. The task of VPTC materials research is to find the optimal composition and preparation conditions for a high value of \( \rho_{0} /\rho_{\hbox{min} } \) (the ratio of the electric resistance at zero degrees centigrade to the minimum electric resistance) of these materials. There are five influencing factors: Yb2O3 content (\( W_{1} \)), excess TiO2 (\( W_{2} \)), sintering temperature (\( T_{\text{c}} \)), sintering time (\( T_{\text{k}} \)), and relative cooling rate (V). Using a linear kernel function (C = 10 and ε = 0.15), the trained SVR model for predicting \( \rho_{0} /\rho_{\hbox{min} } \) is as follows

$$ \rho_{0} /\rho_{\hbox{min} } = 29.52[W_{1} ] + 1.315[W_{2} ] + 0.03098[T_{c} ] + 1.075[T_{k} ] - 3.487[V] - 40.56. $$
(32)

The relationship between \( \rho_{0} /\rho_{\hbox{min} } \) and the five influencing factors is found to be nearly linear. Figure 6 shows the comparison between the experimental and predicted values of \( \rho_{0} /\rho_{\hbox{min} } \) by SVR in the LOOCV test. Using the SVR model combined with pattern recognition techniques, unseen samples with new compositions and technological conditions can be designed to optimize VPTC semiconductors. The experimental results show that \( \rho_{0} /\rho_{\hbox{min} } \) of a new sample designed by data mining increases to 27, much higher than that of the best sample obtained before the optimization (\( \rho_{0} /\rho_{\hbox{min} } \) = 21).

Fig. 6

Comparison between the experimental values and predicted values of ρ 0/ρ min by SVR in LOOCV test

3.5 SVM applied to the thickness control of In2O3 semiconductor film preparation

In2O3 semiconductor nanometer film is a new material for combustible gas detectors and can be prepared by the sol-gel method. Controlling the thickness of the semiconductor film is one of the crucial problems in its preparation. Several factors influence the thickness of the film: the mass percentages of In2O3 and PVA in the bath, the viscosity of the coating liquids, the drawing rate and the number of drawings. It is therefore desirable to have a mathematical model for automatic control in film production. In our lab, SVM methods have been used for data mining for this purpose.

It has been found that SVR with a polynomial kernel of second degree can provide the mathematical model for thickness control of the semiconductor films. Figure 7 shows the comparison between the experimental and predicted thicknesses in the LOOCV test [35].

Fig. 7

Result of prediction of thickness of In2O3 film by SVR in LOOCV test

4 Discussion and conclusions

Generally speaking, choosing the right balance between model flexibility and over-fitting to a limited training set is one of the most difficult obstacles to obtaining a model with good generalization ability for predicting the properties of materials. In SVM modeling, the selection of an appropriate value of the regularization parameter C is very important because it affects both the trained and predicted results: it controls the trade-off between maximizing the margin and minimizing the training error. C should therefore be optimized to avoid both under-fitting and over-fitting. The predicted results are also strongly affected by the choice of kernel function and its parameters.
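The selection of C, ε and the kernel described above is commonly automated with a cross-validated grid search. The following sketch (scikit-learn assumed, synthetic data, and an illustrative parameter grid) tunes these hyperparameters by LOOCV, mirroring the procedure used throughout Sect. 3.

```python
# Sketch: LOOCV grid search over SVR hyperparameters (C, epsilon, kernel).
# Data and grid values are illustrative assumptions, not from the paper.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, LeaveOneOut

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(20, 3))                 # small sample, as in Sect. 3.3
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=20)

grid = GridSearchCV(
    SVR(),
    param_grid={"kernel": ["linear", "rbf"],
                "C": [1.0, 10.0, 100.0],
                "epsilon": [0.05, 0.1, 0.2]},
    cv=LeaveOneOut(),
    # MAE works with single-sample test folds; R^2 would not.
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print(grid.best_params_)
```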

It should be emphasized that an advantage of SVM is that it works with small sample sets. In many cases, obtaining a sufficient number of experimental samples is still time-consuming and costly in the development of novel materials. Therefore, efficient learning from a limited number of samples becomes increasingly important for shortening the materials development cycle.

Although our results indicate that SVM outperforms traditional data mining methods, it should be recognized that different data mining methods have their own advantages and disadvantages in different applications. Sometimes the best approach is a combination of different methods, since complementary approaches can provide helpful information from different points of view.

From the examples introduced in this paper, it can be concluded that the SVM is an effective modeling tool with great potential in materials design. Therefore, it can be expected that the SVM method will be further applied in various fields of materials science.