1 Introduction

Superconducting materials (materials that conduct current with zero resistance) have significant practical applications [1,2,3,4]. Perhaps the best-known application is in the Magnetic Resonance Imaging (MRI) systems widely employed by healthcare professionals for detailed internal body imaging. Other prominent applications include the superconducting coils used to maintain high magnetic fields in the Large Hadron Collider at CERN and the extremely sensitive magnetic field measuring devices called SQUIDs (Superconducting Quantum Interference Devices). Furthermore, superconductors could revolutionize the energy industry, as zero-resistance superconducting wires and electrical systems could transport and deliver electricity with no energy loss.

A superconductor conducts current with zero resistance only at or below its superconducting critical temperature (Tc) [5,6,7,8,9]. However, a scientific theory that predicts Tc remains an open problem, one that has baffled the scientific community since the discovery of superconductivity in 1911 by Heike Kamerlingh Onnes [1,2,3,4,5,6,7,8,9]. In the absence of theory-based prediction models, we take an entirely data-driven approach and build a statistical model that predicts Tc from a material's chemical formula. Indeed, an alternative approach to the superconducting critical temperature prediction problem is the machine learning (ML) approach, which builds data-driven predictive models by exploring the relationship between material composition similarity and critical temperature. Machine learning methods require a sufficient amount of training data [10,11,12,13,14], and the growing number of materials databases with experimental properties now makes their application to materials property prediction feasible.

In this investigation, a new hybrid regression model based on the multivariate adaptive regression splines (MARS) technique has been used to successfully predict the superconducting critical temperature Tc for different types of superconductors. This novel procedure, which combines the MARS approximation [15,16,17,18,19] with the whale optimization algorithm (WOA) [20,21,22], is an attractive methodology that, to the best of our knowledge, has not been applied to this problem before. For comparative purposes, the Ridge, Lasso, and Elastic-net regression models were also fitted to the same experimental dataset to estimate Tc and compare the results obtained [23,24,25,26,27,28,29]. The MARS technique is a statistical learning methodology grounded in statistics and mathematical analysis that can handle nonlinearities, including interactions among variables [30, 31]. It is a nonparametric regression technique and can be seen as an extension of linear models that automatically captures nonlinearities and complex interactions between variables. The MARS approximation presents some benefits in comparison with classical and metaheuristic regression techniques, including [32,33,34,35]: (1) avoiding physical models of the superconductor; (2) providing models that are more flexible than linear regression models; (3) creating models that are simple to understand and interpret; (4) allowing for the modeling of nonlinear relationships among the physico-chemical input variables of a superconductor; (5) offering a good bias-variance trade-off; and (6) providing an explicit mathematical formula for the dependent variable as a function of the independent variables through an expansion of basis functions (hinge functions and products of two or more hinge functions). This last feature is a fundamental and noteworthy difference compared to other alternative methods, most of which behave like black boxes. Moreover, the WOA optimizer has been used to satisfactorily calculate the optimal MARS hyperparameters. In addition, previous research has indicated that MARS is a very effective tool in a large number of real applications, including soil erosion susceptibility prediction [36], rapid chloride permeability prediction of self-compacting concrete [37], evaluation of the earthquake-induced uplift displacement of tunnels [38], estimation of hourly global solar radiation [39], atypical algal proliferation modeling in a reservoir [40], estimation of the pressure drop produced by different filtering media in microirrigation sand filters [41], assessing frost heave susceptibility of gravelly soils [42], and so on. However, it has not previously been used to evaluate the superconducting critical temperature Tc from input physico-chemical parameters for most types of superconductors.

This paper is structured as follows: Sect. 2 describes the experimental arrangement, all the variables included in this research, and the MARS, Ridge, Lasso, and Elastic-net methodologies; Sect. 3 presents the findings obtained with this novel technique, comparing the MARS results with the observed values, as well as the significance ranking of the input variables; and Sect. 4 concludes this study with a summary of the principal results of the research.

2 Materials and methods

2.1 Dataset

The SuperCon database [43] is currently the biggest and most comprehensive database of superconductors in the world. It is free and open to the public, and it has been used in almost all ML studies of superconductors [44,45,46]. The SuperCon dataset was pre-processed for further research by Hamidieh [7], and this database is deposited in the University of California Irvine data repository [47]. During the pre-treatment, materials with missing features were removed, and new features were formed from existing ones. Atomic mass, first ionization energy, atomic radius, density, electron affinity, fusion heat, thermal conductivity, and valence were taken as the initial 8 elemental properties (see Table 1). That is, the chemical formula of the material was considered and, for each property, ten statistics were calculated: mean, weighted mean, geometric mean, weighted geometric mean, entropy, weighted entropy, range, weighted range, standard deviation, and weighted standard deviation (see Table 2). This gives 8 × 10 = 80 features. One additional feature, a numeric variable counting the number of elements in the superconductor, is also extracted, giving 81 features in total. Thus, the data have 83 columns: 1 column with the name of the material (identification), 81 columns with the extracted features, and 1 column with the observed critical temperature (Tc) values. The dataset contains information for 21,263 superconductors, so there are 21,263 rows of data, and all 82 attributes for each material are numeric. The 81 extracted features are used as independent predictors (input variables) of the critical temperature (Tc), which is the dependent variable of the model. This approach to feature construction is quite general and well suited to the study of superconducting materials, given that no established functional dependence of the critical temperature on composition is available.

Table 1 The physico-chemical properties of an element that are employed for building its features in order to forecast Tc
Table 2 Description of the procedure for feature extraction from a material’s chemical formula. (The last column serves as an example: the thermal-conductivity-based features for Re7Zr1 are derived and reported to two decimal places; Rhenium and Zirconium’s thermal conductivity coefficients are \(t_{1} = 48\) and \(t_{2} = 23\) W/(m K), respectively. Here: \(p_{1} = \frac{2}{3}\), \(p_{2} = \frac{1}{3}\); \(w_{1} = \frac{48}{71}\), \(w_{2} = \frac{23}{71}\); \(A = \frac{p_{1} w_{1}}{p_{1} w_{1} + p_{2} w_{2}} \approx 0.807\); \(B = \frac{p_{2} w_{2}}{p_{1} w_{1} + p_{2} w_{2}} \approx 0.193\))
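To make the feature-construction procedure of Table 2 concrete, the short Python sketch below computes a subset of these statistics for the two-element example above. This is an illustrative sketch only; the study uses the pre-processed dataset of [7, 47], and the exact weighted definitions are those of Table 2 and [7].

```python
import numpy as np

# Illustrative sketch of the Table 2 feature construction (assumption: the
# weighted statistics follow Hamidieh [7]); not the preprocessing code of the study.

t = np.array([48.0, 23.0])          # elemental thermal conductivities, W/(m K)
p = np.array([2.0, 1.0]) / 3.0      # atomic fractions from the chemical formula

w = t / t.sum()                     # fractions of the total property value
A = p * w / np.sum(p * w)           # normalized, composition-weighted fractions

features = {
    "mean":                    t.mean(),                    # 35.50
    "weighted mean":           np.sum(p * t),               # ~39.67
    "geometric mean":          np.prod(t) ** (1 / t.size),  # ~33.23
    "weighted geometric mean": np.prod(t ** p),             # ~37.56
    "entropy":                 -np.sum(w * np.log(w)),      # ~0.63
    "weighted entropy":        -np.sum(A * np.log(A)),      # ~0.49
    "range":                   t.max() - t.min(),           # 25.00
    "standard deviation":      t.std(),                     # 12.50 (population form)
}
# The weighted range and weighted standard deviation are built analogously
# from p and t (see Table 2); repeating this for the 8 elemental properties
# plus the element count yields the 81 predictors used in this study.
print(features)
```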

2.2 Multivariate adaptive regression splines (MARS) approach

In statistical machine learning, multivariate adaptive regression splines (MARS) is a regression method first introduced by Friedman in 1991 that is appropriate for problems with a large number of input variables [15,16,17,18,19]. It is a nonparametric approach that can be understood as an extension of linear models which allows interactions among input variables and nonlinearities to be taken into account.

The MARS technique constructs models according to the following expansion [15,16,17,18,19]:

$$\hat{f}\left( x \right) = \sum\limits_{i = 0}^{M} {c_{i} B_{i} \left( x \right)}$$
(1)

Therefore, this technique approximates the dependent output variable y by a weighted sum of basis functions \(B_{i} \left( x \right)\), where the coefficients \(c_{i}\) are constants. Each \(B_{i} \left( x \right)\) can be [15,16,17,18,19]:

  • constant and equal to 1. This term is called intercept and corresponds to the term \(c_{0}\);

  • a hinge or hockey-stick function: this function takes the form \(\max \left( {0,{\text{constant}} - x} \right)\) or \(\max \left( {0,x - {\text{constant}}} \right)\). The constant value is termed the knot. The MARS technique chooses the variables and the knot values according to the procedure indicated later;

  • a product of two or more hinge functions: such terms model nonlinear relationships and interactions between variables.

For instance, Fig. 1 shows a pair of degree-one (q = 1) hinge functions with a knot at t = 3.5.

Fig. 1
figure 1

An example of linear basis functions (hinge function)
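As a minimal illustration of the hinge functions of Fig. 1 and of the expansion in Eq. (1), the following sketch evaluates a toy MARS-type model; the knots and coefficients are hypothetical and chosen for illustration only, not those of the fitted model reported in Sect. 3.

```python
import numpy as np

def hinge(x, knot, direction=+1):
    """Hinge (hockey-stick) basis function: max(0, x - knot) or max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))

# The mirrored pair of linear hinge functions plotted in Fig. 1 (knot t = 3.5)
x = np.linspace(0.0, 7.0, 8)
right = hinge(x, 3.5, +1)   # max(0, x - 3.5)
left  = hinge(x, 3.5, -1)   # max(0, 3.5 - x)

# A toy MARS expansion in the sense of Eq. (1): intercept + hinges + a product
# of two hinges (hypothetical coefficients c_i and knots, for illustration only).
x1, x2 = np.random.default_rng(0).uniform(0, 10, (2, 5))
f_hat = (1.2
         + 0.8 * hinge(x1, 3.5, +1)
         - 0.5 * hinge(x2, 6.0, -1)
         + 0.3 * hinge(x1, 3.5, +1) * hinge(x2, 6.0, -1))
print(right, left, f_hat, sep="\n")
```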

The MARS method is built in two steps: first, it constructs a very complex model in the forward phase, and then it simplifies it in the backward (pruning) stage [19, 30, 34, 48]:

  • Forward stage: MARS starts with the intercept term, which is calculated by averaging the values of the dependent variable. Next, it adds linear combinations of pairs of hinge functions with the aim of minimizing the least-squares error. These new hinge functions depend on a knot and a variable. Thus, to add new terms, MARS tries all the different combinations of variables and knots with the previously added terms, called parent terms. The coefficients \(c_{i}\) are then determined using linear regression. Terms are added until a certain threshold for the residual error or a maximum number of terms is reached.

  • Backward stage: the previous stage usually constructs an overfitted model. In order to obtain a model with better generalization ability, this stage simplifies the model by pruning terms one at a time, using the generalized cross-validation (GCV) criterion described below, and removing first the terms that contribute most to the model's GCV.

Generalized cross-validation (GCV) is the goodness-of-fit index utilized to assess the suitability of the terms of the model in order to prune them from the model. GCV not only takes into account the residual error but also how complex the model is. High values of GCV mean high residual error and complexity. The formula of this index is [15,16,17,18,19, 30, 34, 48]:

$${\text{GCV}}\left( M \right) = \frac{{\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{f}_{M} \left( {{\mathbf{x}}_{i} } \right)} \right)^{2} } }}{{\left( {1 - C\left( M \right)/n} \right)^{2} }}$$
(2)

where the parameter \(C\left( M \right)\) increases with the number of terms in the regression function and thus, the value of the GCV index rises. It is given by [15,16,17,18,19]:

$$C\left( M \right) = \left( {M + 1} \right) + d\,M$$
(3)

where d is a penalty coefficient that controls how strongly model complexity is penalized and M is the number of terms in Eq. (1).
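The GCV criterion of Eqs. (2) and (3) can be computed directly from a model's residuals. A minimal sketch follows; the synthetic data and the default value of d used here are illustrative assumptions.

```python
import numpy as np

def gcv(y, y_hat, n_terms, d=3.0):
    """Generalized cross-validation of Eqs. (2)-(3).
    y, y_hat : observed and fitted values; n_terms : number of terms M in Eq. (1);
    d : penalty coefficient (d = 3 is a commonly used default; d = -1 disables the penalty)."""
    n = len(y)
    mse = np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)
    C = (n_terms + 1) + d * n_terms          # Eq. (3): effective number of parameters
    return mse / (1.0 - C / n) ** 2          # Eq. (2)

# Illustrative call on synthetic data: during the backward pass, a larger model
# only survives if the drop in MSE outweighs the growth of C(M).
rng = np.random.default_rng(1)
y = rng.normal(size=100)
print(gcv(y, 0.9 * y, n_terms=10))
```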

The relative importance of the independent variables that appear in the regression function (as only some of these variables remain in the final function) can be assessed using different criteria [15,16,17,18,19, 30, 34, 48]: (a) the GCV attached to a variable, measured as how much this index increases when the variable is removed from the final function; (b) the same criterion applied using the RSS index; and (c) the number of subsets (Nsubsets) of which the variable is a part: if it appears in more subsets, its importance is greater.

2.3 Whale optimization algorithm (WOA)

The whale optimization algorithm (WOA) is a metaheuristic technique for solving optimization problems that was first proposed by Mirjalili and Lewis in order to optimize numerical problems [20]. The algorithm simulates the highly intelligent hunting behavior of humpback whales. This foraging behavior is called the bubble-net feeding method and is only observed in humpback whales, which create bubbles to encircle their prey while hunting. The whales dive approximately 12 m deep, create a spiral of bubbles around their prey, and then swim up toward the surface following the bubbles. The mathematical model of the spiral bubble-net feeding behavior is given as follows [20,21,22]:

  • Encircling prey

Humpback whales can recognize the location of prey and encircle them. Since the position of the optimum design in the search space is not known a priori, the WOA algorithm assumes that the current best candidate solution is the target prey or is close to the optimum. After the best search agent is defined, the other search agents will hence try to update their positions toward the best search agent. This behavior is represented by the following equations:

$$\begin{aligned} \vec{D} & = \left| {\vec{C} \cdot \vec{X}_{p} \left( t \right) - \vec{X}\left( t \right)} \right| \\ \vec{X}\left( {t + 1} \right) & = \vec{X}_{p} \left( t \right) - \vec{A} \cdot \vec{D} \\ \end{aligned}$$
(4)

where t indicates the current iteration, \(\vec{A}\) and \(\vec{C}\) are coefficient vectors, \(\vec{X}_{p}\) is the position vector of the prey, and \(\vec{X}\) indicates the position vector of a whale. The vectors \(\vec{A}\) and \(\vec{C}\) are calculated as follows:

$$\begin{aligned} \vec{A} & = 2\vec{a} \cdot \vec{r}_{1} - \vec{a} \\ \vec{C} & = 2\vec{r}_{2} \\ \end{aligned}$$
(5)

where components of \(\vec{a}\) are linearly decreased from 2 to 0 over the course of iterations and \(\vec{r}_{1}\), \(\vec{r}_{2}\) are random vectors in [0,1].

  • Exploitation phase: bubble-net attack method

The bubble-net strategy is a hybrid technique that combines two approaches that can be mathematically modeled as follows [20,21,22]:

  1. Shrinking encircling mechanism: This behavior is achieved by decreasing the value of \(\vec{a}\). Note that the fluctuation range of \(\vec{A}\) is also decreased by \(\vec{a}\). In other words, \(\vec{A}\) is a random value in the interval \(\left[ { - a,a} \right]\) where a is decreased from 2 to 0 over the course of iterations. Setting random values for \(\vec{A}\) in \(\left[ { - 1,1} \right]\), the new position of a search agent can be defined anywhere in between the original position of the agent and the position of the current best agent.

  2. Spiral updating position: This approach first calculates the distance between the whale located at \(\left( {\vec{X},\vec{Y}} \right)\) and the prey located at \(\left( {\vec{X}^{ * } ,\vec{Y}^{ * } } \right)\). A spiral equation is then created between the positions of the whale and the prey to mimic the helix-shaped movement of humpback whales as follows:

    $$\vec{X}\left( {t + 1} \right) = \vec{D}^{\prime}e^{bl} \cos \left( {2\pi l} \right) + \vec{X}^{ * } \left( t \right)$$
    (6)

where \(\vec{D}^{\prime} = \left| {\vec{X}^{ * } \left( t \right) - \vec{X}\left( t \right)} \right|\) is the distance between the i-th whale and the prey (best solution obtained so far), b is a constant for defining the shape of the logarithmic spiral, and l is a random number in \(\left[ { - 1,1} \right]\). Note that humpback whales swim around the prey within an increasingly shrinking spiral-shaped path. In order to model this simultaneous behavior, we assume that there is a probability of 50% to choose between either the shrinking encircling mechanism or the spiral model to update the position of the whales during optimization. The mathematical model is as follows [20,21,22]:

$$\vec{X}\left( {t + 1} \right) = \left\{ {\begin{array}{*{20}c} {\vec{X}^{*} \left( t \right) - \vec{A} \cdot \vec{D}} & {{\text{if}}} & {p < 0.5} \\ {\vec{D}^{\prime}e^{bl} \cos \left( {2\pi l} \right) + \vec{X}^{ * } \left( t \right)} & {{\text{if}}} & {p \ge 0.5} \\ \end{array} } \right.$$
(7)

where p is a random number in \(\left[ {0,1} \right]\). In addition to the bubble-net method, the humpback whales search for prey randomly. The mathematical model of the search is as follows:

  • Exploration phase: search for prey

The same approach based on the variation of the \(\vec{A}\) vector can be utilized to search for prey (exploration). In fact, humpback whales search randomly according to their relative position to each other. Therefore, we use \(\vec{A}\) with the random values greater than 1 or less than \(- 1\) to force the search agent to move far away from a reference whale. In contrast to the exploitation phase, the position of a search agent in the exploration phase is updated according to a randomly chosen search agent instead of the best search agent. This mechanism and \(\left| {\vec{A}} \right| > 1\) emphasize exploration and allow the WOA algorithm to perform a global search. The mathematical model is as follows [20,21,22]:

$$\begin{aligned} \vec{D} & = \left| {\vec{C} \cdot \vec{X}_{rand} - \vec{X}} \right| \\ \vec{X}\left( {t + 1} \right) & = \vec{X}_{rand} - \vec{A} \cdot \vec{D} \\ \end{aligned}$$
(8)

where \(\vec{X}_{rand}\) is a random position vector (a random whale).

The WOA algorithm starts with a set of random solutions. At each iteration, search agents update their positions with respect to either a randomly chosen search agent or the best solution obtained so far. The parameter a is decreased from 2 to 0 in order to provide exploration and exploitation, respectively. A random search agent is chosen when \(\left| {\vec{A}} \right| > 1\), while the best solution is selected when \(\left| {\vec{A}} \right| < 1\) for updating the position of the search agents. Finally, the WOA algorithm terminates when a termination criterion is satisfied.
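For illustration, a compact re-implementation of the WOA update rules of Eqs. (4)-(8) for minimizing a generic objective is sketched below. This is an illustrative sketch (vectorized per whale rather than per dimension), not the MetaheuristicOpt implementation used later in Sect. 2.7.

```python
import numpy as np

def woa_minimize(objective, bounds, n_whales=40, n_iter=50, b=1.0, seed=0):
    """Minimal whale optimization algorithm (Eqs. (4)-(8)).
    objective : function mapping a position vector to a scalar cost;
    bounds    : sequence of (low, high) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T
    dim = lo.size
    X = rng.uniform(lo, hi, (n_whales, dim))              # random initial whales
    costs = np.apply_along_axis(objective, 1, X)
    best = X[costs.argmin()].copy()                       # best solution so far (the "prey")

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                        # a decreases linearly from 2 to 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2                 # Eq. (5)
            if rng.random() < 0.5:                        # first branch of Eq. (7)
                if np.all(np.abs(A) < 1):                 # exploitation: encircle the best whale, Eq. (4)
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                     # exploration: follow a random whale, Eq. (8)
                    X_rand = X[rng.integers(n_whales)]
                    D = np.abs(C * X_rand - X[i])
                    X[i] = X_rand - A * D
            else:                                         # Eq. (6): logarithmic spiral toward the best
                l = rng.uniform(-1.0, 1.0)
                D_prime = np.abs(best - X[i])
                X[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)                  # keep whales inside the search space
        costs = np.apply_along_axis(objective, 1, X)
        if costs.min() < objective(best):
            best = X[costs.argmin()].copy()
    return best, objective(best)

# Example: minimize the sphere function in two dimensions.
print(woa_minimize(lambda x: np.sum(x ** 2), [(-5, 5), (-5, 5)]))
```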

2.4 Ridge regression (RR)

Typically, we consider a sample consisting of n cases (or number of observations); that is, we have a set of training data \(\left( {{\mathbf{x}}_{1} ,y_{1} } \right),...,\left( {{\mathbf{x}}_{n} ,y_{n} } \right)\), each of which consists of p covariates (number of variables) and a single outcome. Let \(y_{i}\) be the outcome and \({\mathbf{x}}_{i} = \left( {x_{i1} ,x_{i2} ,...,x_{ip} } \right)^{T}\) be the covariate vector for the ith case. The most popular estimation method is the least squares fitting procedure, in which the coefficients \(\beta = \left( {\beta_{0} ,\beta_{1} ,...,\beta_{p} } \right)^{T}\) are selected to minimize the residual sum of squares (RSS) [23,24,25]:

$${\text{RSS}} = \sum\limits_{i = 1}^{n} {\left( {y_{i} - \beta_{0} - \sum\limits_{j = 1}^{p} {\beta_{j} x_{ij} } } \right)^{2} }$$
(9)

Ridge regression is very similar to least squares, with the exception that its coefficients are estimated by minimizing a slightly different quantity. Specifically, the ridge regression coefficient estimates \(\hat{\beta }^{RR}\) are the values that minimize [18, 23,24,25]:

$$L^{RR} \left( {{\varvec{\upbeta}}} \right) = \sum\limits_{i = 1}^{n} {\left( {y_{i} - \beta_{0} - \sum\limits_{j = 1}^{p} {\beta_{j} x_{ij} } } \right)^{2} } + \lambda \sum\limits_{j = 1}^{p} {\beta_{j}^{2} } {\text{ = RSS + }}\lambda \sum\limits_{j = 1}^{p} {\beta_{j}^{2} }$$
(10)

where \(\lambda \ge 0\) is the regularization parameter or complexity parameter to be determined separately (tuning parameter), which controls the amount of shrinkage: the larger the value of \(\lambda\), the greater the amount of shrinkage. Indeed, Eq. (10) trades off two different criteria. As with least squares, Ridge regression seeks coefficient estimates that fit the data well by making the RSS small. However, the second term, \(\lambda \sum\limits_{j = 1}^{p} {\beta_{j}^{2} }\), called a shrinkage penalty, is small when \(\beta_{1} ,...,\beta_{p}\) are close to zero, and so it has the effect of shrinking the estimates of \(\beta_{j}\) toward zero. The tuning parameter \(\lambda\) serves to control the relative impact of these two terms on the regression coefficient estimates. When \(\lambda = 0\), the penalty term has no effect, and Ridge regression will produce the least squares estimates \(\left( {{\text{as}}\,\,\lambda \to 0,\,\,\,{\hat{\mathbf{\beta }}}^{RR} \to {\hat{\mathbf{\beta }}}^{LS} } \right)\). However, as \(\lambda \to \infty\), the impact of the shrinkage penalty grows, and the Ridge regression coefficient estimates approach zero \(\left( {{\text{as}}\,\,\lambda \to \infty ,\,\,\,{\hat{\mathbf{\beta }}}^{RR} \to {\mathbf{0}}} \right)\). Unlike least squares, which generates only one set of coefficient estimates, Ridge regression will produce a different set of coefficient estimates, \(\hat{\beta }_{\lambda }^{RR}\), for each value of \(\lambda\). Since selecting a good value for \(\lambda\) is critical, cross-validation has been used.

The advantage of Ridge regression over least squares is rooted in the bias-variance trade-off. As \(\lambda\) increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias. At the least squares coefficient estimates, which correspond to ridge regression with \(\lambda = 0\), the variance is high but there is no bias. As \(\lambda\) increases, the shrinkage of the ridge coefficient estimates leads to a substantial reduction in the variance of the predictions at the expense of a slight increase in bias. Ridge regression improves prediction error by shrinking large regression coefficients in order to reduce overfitting, but it does not perform covariate selection and therefore does not help to make the model more interpretable.
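As an illustration of Ridge regression with cross-validated selection of \(\lambda\), a minimal sketch is given below. It uses scikit-learn on synthetic data purely for exposition (the models in this study were fitted with the R glmnet package), and scikit-learn names the regularization parameter alpha rather than \(\lambda\).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (standardized) 81-feature superconductor data.
X, y = make_regression(n_samples=500, n_features=81, noise=10.0, random_state=0)

# RidgeCV minimizes Eq. (10) and picks lambda (called alpha here) by 10-fold cross-validation.
model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 25), cv=10))
model.fit(X, y)
print("selected lambda:", model[-1].alpha_)
```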

2.5 Least absolute shrinkage and selection operator (Lasso) regression (LR)

Ridge regression does have one obvious disadvantage: it will include all p predictors in the final model. The penalty \(\lambda \sum\nolimits_{j = 1}^{p} {\beta_{j}^{2} }\) in Eq. (10) will shrink all of the coefficients toward zero, but it will not set any of them exactly to zero (unless \(\lambda \to \infty\)). This may not be a problem for prediction accuracy, but it can create a challenge for model interpretation in situations in which the number of variables p is quite large.

The Lasso regression is a relatively recent alternative to Ridge regression that helps to overcome this disadvantage. The Lasso coefficients, \(\hat{\beta }_{\lambda }^{Lasso}\), minimize the quantity [18, 25,26,27,28]:

$$L^{LR} \left( {{\varvec{\upbeta}}} \right) = \sum\limits_{i = 1}^{n} {\left( {y_{i} - \beta_{0} - \sum\limits_{j = 1}^{p} {\beta_{j} x_{ij} } } \right)^{2} } + \lambda \sum\limits_{j = 1}^{p} {\left| {\beta_{j} } \right|} {\text{ = RSS + }}\lambda \sum\limits_{j = 1}^{p} {\left| {\beta_{j} } \right|}$$
(11)

Comparing Eq. (11) with Eq. (10) shows that the Lasso and Ridge regressions have similar formulations. The only difference is that the \(\beta_{j}^{2}\) term in the Ridge regression penalty in Eq. (10) has been replaced by \(\left| {\beta_{j} } \right|\) in the Lasso penalty in Eq. (11). In statistical terms, the Lasso uses an \(L_{1}\) penalty instead of an \(L_{2}\) penalty. The \(L_{q}\) norm of a coefficient vector \(\beta\) is given by \(\left\| \beta \right\|_{q} = \left( {\sum\nolimits_{j = 1}^{p} {\left| {\beta_{j} } \right|^{q} } } \right)^{1/q}\).

As with Ridge regression, the Lasso shrinks the coefficient estimates toward zero. However, in the case of the Lasso, the \(L_{1}\) penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter \(\lambda\) is sufficiently large; hence, the Lasso performs variable selection. As a result, the models generated are generally much easier to interpret than those produced by Ridge regression: the Lasso yields sparse models, that is, models that involve only a subset of the variables. As in Ridge regression, selecting a good value of \(\lambda\) for the Lasso is critical, so cross-validation has been employed.
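A corresponding minimal Lasso sketch (again scikit-learn on synthetic data, not the glmnet implementation used in this study) shows how the \(L_{1}\) penalty zeroes out coefficients:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data in which only a subset of the 81 features is truly informative.
X, y = make_regression(n_samples=500, n_features=81, n_informative=15,
                       noise=10.0, random_state=0)

# LassoCV minimizes Eq. (11) and selects lambda (alpha) by 10-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=10, random_state=0))
model.fit(X, y)
lasso = model[-1]
print("selected lambda:", lasso.alpha_)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", lasso.coef_.size)
```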

2.6 Elastic-net regression (ENR)

Elastic-net regression (ENR) first emerged in response to critiques of the Lasso regression model, whose variable selection can be too dependent on the data and thus unstable. The solution was to combine the penalties of Ridge and Lasso regressions to get the best of both worlds: the ENR penalty is a convex combination of the Ridge and Lasso penalties. Indeed, ENR aims at minimizing the following loss function [18, 23,24,25,26,27,28,29]:

$$L^{ENR} \left( {{\varvec{\upbeta}}} \right) = \frac{1}{2n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \beta_{0} - \sum\limits_{j = 1}^{p} {\beta_{j} x_{ij} } } \right)^{2} } + \lambda \left( {\frac{1 - \alpha }{2}\sum\limits_{j = 1}^{p} {\beta_{j}^{2} } + \alpha \sum\limits_{j = 1}^{p} {\left| {\beta_{j} } \right|} } \right)$$
(12)

where \(\alpha\) is the mixing parameter between Ridge (\(\alpha = 0\)) and Lasso (\(\alpha = 1\)). Now, there are two parameters to tune: \(\lambda\) and \(\alpha\). In short, the ENR is a regularized regression method that linearly combines both penalties, i.e., the \(L_{1}\) and \(L_{2}\) penalties of the Lasso and Ridge regression methods, and it proves particularly useful when there are multiple correlated features. The essential difference between the Lasso and Elastic-net regressions is that, when several features are highly correlated, the Lasso is likely to pick only one of them at random, while the Elastic-net model is likely to keep them together.
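Analogously, a minimal Elastic-net sketch follows (scikit-learn on synthetic data rather than glmnet; scikit-learn's l1_ratio plays the role of the mixing parameter \(\alpha\) of Eq. (12) and its alpha plays the role of \(\lambda\)):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=81, n_informative=15,
                       noise=10.0, random_state=0)

# ElasticNetCV tunes both lambda (alpha) and the mixing parameter (l1_ratio) of Eq. (12).
model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0],
                                   cv=10, random_state=0))
model.fit(X, y)
enr = model[-1]
print("selected lambda:", enr.alpha_, "| selected mixing parameter:", enr.l1_ratio_)
```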

2.7 Approach accuracy

Eighty of the above-mentioned input variables from Sect. 2.1 have been employed in this study to build this novel WOA/MARS-based method. As is well known, the superconducting critical temperature Tc is the dependent variable to be predicted. In order to predict Tc from these variables with sufficient confidence, it is essential to select the best model fitted to the observed dataset. Although several statistics can be used to ascertain goodness-of-fit, the criterion employed in this study was the coefficient of determination \(R^{2}\) [48,49,50], as it is the statistic commonly used when the principal objective of a model is to predict future outcomes or to test a hypothesis. In what follows, the observed values are denoted \(t_{i}\) and the values predicted by the model \(y_{i}\), which makes it possible to define the following sums of squares [48,49,50]:

  • \(SS_{tot} = \sum\nolimits_{i = 1}^{n} {\left( {t_{i} - \overline{t}} \right)^{2} }\): is the overall sum of squares, proportional to the sample variance.

  • \(SS_{reg} = \sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{t}} \right)^{2} }\): is the regression sum of squares, also termed the explained sum of squares.

  • \(SS_{err} = \sum\nolimits_{i = 1}^{n} {\left( {t_{i} - y_{i} } \right)^{2} }\): is the residual sum of squares.

where \(\overline{t}\) is the mean of the n observed data:

$$\overline{t} = \frac{1}{n}\sum\limits_{i = 1}^{n} {t_{i} }$$
(13)

Based on the former sums, the coefficient of determination is specified by the following equation [48,49,50]:

$$R^{2} \equiv 1 - \frac{{SS_{err} }}{{SS_{tot} }}$$
(14)

Further criteria considered in this study were the root-mean-square error (RMSE) and the mean absolute error (MAE) [48,49,50,51]. The RMSE is a statistic frequently used to evaluate the predictive capability of a mathematical model and is given by [48,49,50,51]:

$${\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {t_{i} - y_{i} } \right)^{2} } }}{n}}$$
(15)

If the root-mean-square error (RMSE) is zero, there is no difference between the predicted and the observed data. The MAE, on the other hand, measures the average magnitude of the errors in a set of forecasts without considering their direction. MAE is the average over the verification sample of the absolute values of the differences between a forecast and the corresponding observation. Its mathematical expression is given by [48,49,50,51]:

$${\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {t_{i} - y_{i} } \right|} }}{n}$$
(16)
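The three goodness-of-fit statistics of Eqs. (14)-(16) can be computed directly from the observed and predicted values; a short sketch on hypothetical observed/predicted vectors follows:

```python
import numpy as np

def r2_rmse_mae(t, y):
    """Coefficient of determination (Eq. 14), RMSE (Eq. 15) and MAE (Eq. 16).
    t : observed values, y : model predictions."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    ss_err = np.sum((t - y) ** 2)            # residual sum of squares
    ss_tot = np.sum((t - t.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_err / ss_tot
    rmse = np.sqrt(np.mean((t - y) ** 2))
    mae = np.mean(np.abs(t - y))
    return r2, rmse, mae

# Hypothetical critical temperatures (K): observed vs. predicted.
t_obs = np.array([4.2, 9.3, 39.0, 92.0, 110.0])
t_pred = np.array([5.0, 8.1, 35.5, 88.0, 115.0])
print(r2_rmse_mae(t_obs, t_pred))
```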

Moreover, the MARS methodology relies heavily on the three hyperparameters [15,16,17,18,19]:

  • Maximum number of basis functions (Maxfuncs): maximum number of model terms before pruning, i.e., the maximum number of terms created by the forward pass.

  • Penalty parameter (d): the generalized cross-validation (GCV) penalty per knot. A value of 0 penalizes only terms, not knots. The value \(- 1\) means no penalty.

  • Interactions: maximum degree of interaction between variables.

The performance of the MARS technique depends largely on the determination of these three optimal hyperparameters. Some of the methods often used to determine suitable hyperparameters are [15,16,17,18,19, 30, 34, 48, 52]: grid search, random search, Nelder-Mead search, artificial bee colony, genetic algorithms, pattern search, etc. In this study, the numerical optimizer known as the whale optimization algorithm (WOA) [20,21,22] has been employed to determine these hyperparameters, given its ability to solve nonlinear optimization problems.

Hence, a novel hybrid WOA/MARS-based method has been applied to predict the superconducting critical temperature Tc (output variable) from eighty input variables, studying their influence and optimizing the fit through the analysis of the coefficient of determination R2. Figure 2 shows the flowchart of this new hybrid WOA/MARS-based model developed in this study.

Fig. 2
figure 2

Flowchart of the process of parameter optimization with WOA of the MARS model

Cross-validation was the standard technique used to estimate the true coefficient of determination (R2) [48,49,50]. Indeed, in order to guarantee the predictive ability of the WOA/MARS-based model, a tenfold cross-validation algorithm was used [53], which involves splitting the sample into 10 parts, using nine of them for training and the remaining one for testing. This process was repeated 10 times, using each of the 10 parts once for testing, and the average error was calculated. In this way, the variability of the WOA/MARS-based model with respect to its hyperparameters was evaluated in order to determine the optimum point, searching for the hyperparameters that minimize the average error.

The new hybrid WOA/MARS-based model was implemented using the multivariate adaptive regression splines (MARS) method from the earth library [54] together with the WOA technique from the MetaheuristicOpt package [20, 52], both in the R Project environment. Additionally, the Ridge, Lasso, and Elastic-net regression models were implemented using the glmnet package [55].

The bounds (initial ranges) of the solution space used in the WOA technique are shown in Table 3. A population of 40 whales was used in the WOA optimization. The stopping criteria were a maximum number of iterations (fifty in total) together with at least 5 consecutive iterations yielding the same result.

Table 3 Search space for each of the MARS parameters in the WOA tuning process

To optimize the MARS hyperparameters, the WOA module searches for the best Maxfuncs, Interactions, and Penalty values by comparing the cross-validation error at every iteration. The search space has three dimensions, one for each hyperparameter. The fitness factor, or objective function, is the coefficient of determination R2.
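To illustrate how these pieces fit together, the sketch below wires a WOA search over the three MARS hyperparameters to a tenfold cross-validated R2 fitness function. It is an illustrative outline only: it assumes the Python py-earth package (whose Earth(max_terms, max_degree, penalty) estimator mirrors the R earth package actually used here), synthetic stand-in data, hypothetical search bounds rather than those of Table 3, and the woa_minimize routine sketched in Sect. 2.3.

```python
import numpy as np
from pyearth import Earth                      # assumption: py-earth, mirroring the R earth package
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 1000 randomly extracted training samples (81 features, Tc target).
X_train, y_train = make_regression(n_samples=1000, n_features=81, noise=10.0, random_state=0)

def fitness(params):
    """Negative tenfold cross-validated R^2 of a MARS model (WOA minimizes this value)."""
    maxfuncs, interactions, penalty = int(params[0]), int(params[1]), params[2]
    model = Earth(max_terms=maxfuncs, max_degree=interactions, penalty=penalty)
    r2 = cross_val_score(model, X_train, y_train, cv=10, scoring="r2").mean()
    return -r2

# Hypothetical search space for (Maxfuncs, Interactions, Penalty); the actual ranges are in Table 3.
bounds = [(10, 100), (1, 3), (0, 10)]

# woa_minimize is the WOA sketch given in Sect. 2.3 (40 whales, 50 iterations).
best_params, best_cost = woa_minimize(fitness, bounds, n_whales=40, n_iter=50)
print("optimal (Maxfuncs, Interactions, Penalty):", best_params, "| CV R^2:", -best_cost)
```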

3 Analysis of results and discussion

All eighty independent input variables (physico-chemical variables) are indicated above in Tables 1 and 2. The total number of samples used in the present study was 21,263, i.e., data from 21,263 experimental samplings were built and treated. This entire dataset was split into two approximate halves, one used as the training set and the other as the testing set. As the training set still contained a very large number of samples, 1000 samples were randomly extracted from it and the hyperparameter tuning was performed on them using tenfold cross-validation. Once the optimal hyperparameters were determined, a model was constructed with the whole training dataset and then validated using the testing dataset.

Based on this methodology, Table 4 lists the optimal hyperparameters of the best fitted MARS-based approach found using the WOA optimizer.

Table 4 Optimal hyperparameters of the best fitted MARS model found with the WOA technique in this investigation for the training set

Table 5 lists the 32 main basis functions of the fitted WOA/MARS-based model and their coefficients. Note that \(h\left( x \right) = x\) if \(x > 0\) and \(h\left( x \right) = 0\) if \(x \le 0\). Therefore, the MARS model can be seen as an extension of linear models that automatically models nonlinearities and interactions as a weighted sum of the basis functions called hinge functions [15,16,17,18,19].

Table 5 List of basis functions of the best fitted WOA/MARS-based model for the superconducting critical temperature (Tc) and their coefficients \(c_{i}\)

Pictorial graphs of the first-order and second-order terms that make up the MARS-based approach for the superconducting critical temperature Tc are shown in Figs. 3 and 4, respectively.

Fig. 3
figure 3

Representation of the first-order terms of the three most important input independent variables for the dependent superconducting critical temperature Tc variable: a Tc vs. Range atomic radius; b Tc vs. Mean density; and c Tc vs. Weighted standard deviation Thermal Conductivity

Fig. 4
figure 4

Representation of the second-order terms of the most important input independent variables for the dependent critical temperature Tc variable

Based on the resulting calculations, the WOA/MARS-based technique allowed for the construction of a model with a high capacity to assess the critical temperature Tc on the test dataset. Additionally, the Ridge, Lasso, and Elastic-net regression models were also built for the Tc output variable in order to predict the superconducting critical temperature for different types of materials. Table 6 shows the determination and correlation coefficients (R2 and r), root-mean-square error (RMSE), and mean absolute error (MAE) over the testing set for the WOA/MARS, Ridge, Lasso, and Elastic-net models for the dependent Tc variable.

Table 6 Coefficients of determination (\(R^{2}\)), correlation coefficients (r), root-mean-square error (RMSE) and mean absolute error (MAE) over the testing set for the models (WOA/MARS, Ridge, Lasso and Elastic-net) fitted in this study on the training set

3.1 Significance of variables

Another important result of the current study is the relative importance of the independent input variables for predicting the superconducting critical temperature Tc in this nonlinear complex problem (see Table 7 and Fig. 5).

Table 7 Relative importance of the input physico-chemical variables involved in the best fitted WOA/MARS-based model for the superconducting critical temperature Tc prediction according to the Nsubsets, GCV, and RSS criteria
Fig. 5
figure 5

Relative importance of the input physico-chemical variables for predicting the critical temperature Tc with the optimized fitted WOA/MARS-based model according to the GCV parameter criterion

Ultimately, the most relevant input variable according to WOA/MARS approach in the Tc forecasting is Weighted Standard Deviation Thermal Conductivity. The second most significant input variable is Standard Deviation Atomic Mass, followed by: Range Atomic Mass, Weighted Mean Valence, Geometric Mean Density, Weighted Entropy Thermal Conductivity, Weighted Standard Electron Affinity, Mean Density, Weighted Range Electron Affinity, Standard Valence, Weighted Geometric Mean Thermal Conductivity, Weighted Standard Valence, Weighted Standard Atomic Mass, Range First Ionization Energy, Weighted Geometric Mean Density and Mean Fusion Heat.

We found that the most influential attributes were related to thermal conductivity. This is to be expected, as both superconductivity and thermal conductivity are driven by lattice phonons and electron transitions [8]. Also, the influence of ionic properties (related to the first ionization energy and electron affinity) likely reflects the capability of superconductors to form ions, which is related to movement through the crystalline lattice. This interpretation aligns well with the BCS theory of superconductivity [2]. Knowledge of the physico-chemical features most directly related to the critical temperature can facilitate the study of superconducting materials.

Overall, the MARS-based technique has proven to be an accurate and highly satisfactory tool to indirectly assess the superconducting critical temperature Tc (dependent variable) as a function of the main measured physico-chemical parameters, conforming to the real observed data in this study. Specifically, Fig. 6 compares the experimental and predicted Tc values obtained with the WOA/MARS, Ridge, Lasso, and Elastic-net regression models on the test dataset. Thus, it is essential to combine the MARS methodology with the WOA optimizer to address this nonlinear regression problem through a novel hybrid approach that is significantly more robust and effective than the other three regression models. In particular, the modeled and measured Tc values were found to be highly correlated. Table 8 shows the observed and predicted Tc for the first materials in Fig. 6.

Fig. 6
figure 6

Observed vs. predicted superconducting critical temperature Tc values using 100 samples from the testing dataset for four different models: a Ridge regression model (\(R^{2} = 0.6936\) and \(r = 0.8334\)); b Lasso regression model (\(R^{2} = 0.8005\) and \(r = 0.8541\)); c Elastic-net regression model (\(R^{2} = 0.7291\) and \(r = 0.8531\)); and d WOA/MARS-based model (\(R^{2} = 0.8005\) and \(r = 0.8950\))

Table 8 Tc observed and predicted for some of the first materials in Fig. 6 for WOA/MARS model

4 Conclusion

Based on the abovementioned results, several core discoveries of this study can be drawn:

  • Existing analytical models that predict the superconducting critical temperature Tc are not accurate enough, as they make too many simplifications of a highly nonlinear and complex problem. Consequently, machine learning methods such as the novel hybrid WOA/MARS-based approach employed in this study offer the best option for making accurate estimations of Tc from experimental samplings.

  • The hypothesis that Tc can be determined with precision by employing a hybrid WOA/MARS-based approach for a wide variety of superconductors has been successfully validated here.

  • The application of this MARS-based methodology to the complete experimental dataset belonging to the Tc resulted in a satisfactory coefficient of determination and correlation coefficient whose values were 0.8005 and 0.8950, respectively.

  • The input variables involved in the estimation of Tc from experimental samplings in different superconductors have been ranked in order of importance. Specifically, Weighted Standard Deviation Thermal Conductivity has been identified as the single most important factor in predicting the critical temperature Tc. The successive order of importance is as follows: Standard Deviation Atomic Mass, Range Atomic Radius, Weighted Mean Valence, Geometric Mean Density, Weighted Entropy Thermal Conductivity, Weighted Standard Electron Affinity, Mean Density, Weighted Range Electron Affinity, Standard Valence, Weighted Geometric Mean Thermal Conductivity, Weighted Standard Valence, Weighted Standard Atomic Mass, Range First Ionization Energy, Weighted Geometric Mean Density and Mean Fusion Heat.

  • The key role that accurate hyperparameter determination plays in the regression performance of the MARS-based methodology for the critical temperature Tc has been established using the WOA optimizer.

In conclusion, this procedure can be applied to successfully predict the superconducting critical temperature Tc of a variety of superconductors; however, it remains essential to consider the different physico-chemical features of each superconductor and/or experiment. Hence, the WOA/MARS-based method proves to be an extremely robust and useful answer to the nonlinear problem of the estimation of the Tc from experimental samplings in different superconductors. Researchers interested in finding high temperature superconductors may use the model to narrow their search. As a future extension of this work, we intend to apply the presented methodology to a more extensive database [43]. For instance, researchers could use this dataset along with new data (such as pressure or crystal structure) to make better models.