Abstract
Extreme learning machine (ELM), a type of feedforward neural network, has been widely used to obtain beneficial insights in various disciplines and real-world applications. Despite advantages such as speed and high adaptability, instability arises in the case of multicollinearity, and additional improvements are needed to overcome it. Regularization is one of the best choices for this purpose. Although ridge and Liu regressions have been considered effective regularization methods for the ELM algorithm, each has its own characteristic features such as the form of the tuning parameter, the level of shrinkage, or the norm of the coefficients. Instead of focusing on one of these regularization methods, we propose a combination of ridge and Liu regressions in a unified form in the context of ELM as a remedy to the aforementioned drawbacks. To investigate the performance of the proposed algorithm, comprehensive comparisons have been carried out on various real-world data sets. The results show that the proposed algorithm is more effective, in terms of generalization capability, than ELM and its ridge- and Liu-based variants, RR-ELM and Liu-ELM, and that its advantage in generalization grows as the number of nodes increases. The proposed algorithm outperforms ELM on all data sets and node numbers in that it has a smaller norm and a smaller standard deviation of the norm. Additionally, it should be noted that the proposed algorithm can be applied to both regression and classification problems.
1 Introduction
Feedforward neural networks (FNNs) have been seen as powerful tools in machine learning due to their adaptability to complex learning problems. However, difficulties arise in FNNs when choosing parameters such as the learning rate, momentum, period, stopping criteria, input weights, biases, and so on. Therefore, Huang et al. (2006) proposed a learning algorithm called extreme learning machine (ELM) which overcomes slow learning speed and overfitting. The logic behind ELM is to generate the main network parameters, such as the input weights and biases, randomly and to train a single layer feedforward network (SLFN) by solving a classic linear system. This logic gives ELM extra speed as well as improved learning and generalization performance.
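The training scheme just described (random hidden parameters, analytic output weights) can be sketched in a few lines. This is an illustrative Python sketch on a toy data set of our own (the paper's experiments were written in R); the function names, node count, and data are assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, theta=20):
    """Minimal ELM: draw hidden parameters at random, solve for output weights."""
    delta = rng.normal(size=(X.shape[1], theta))   # input weights, never tuned
    b = rng.normal(size=theta)                     # biases, never tuned
    H = 1.0 / (1.0 + np.exp(-(X @ delta + b)))     # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ T                   # solve H beta = T analytically
    return delta, b, beta

def elm_predict(X, delta, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ delta + b)))
    return H @ beta

# Toy regression: fit y = sin(x) on 200 points with 20 hidden nodes.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
delta, b, beta = elm_train(X, T)
pred = elm_predict(X, delta, b, beta)
```

Because the hidden parameters are drawn once and frozen, training reduces to a single least-squares solve, which is the source of ELM's speed.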
In recent years, ELM has attracted considerable attention from researchers and is widely used in real-world applications. Many studies on ELM have been published in different research areas, either to demonstrate its performance or to adapt it to the application area for accurate results. Some of them are as follows: telecommunication, for developing a robust and precise indoor positioning system (IPS) (Zou et al. 2016) and for the evaluation of intrusion detection mechanisms (Ahmad et al. 2018); neuroscience, for concept drift learning (Mirza and Lin 2016), for discriminating preictal and interictal brain states in intracranial EEG (Song and Zhang 2016) and for pathological brain detection (Lu et al. 2017); robotics, for building an effective prediction model of the input displacement of a gripper (Petković et al. 2016) and for determining the inverse kinematics solutions of a robotic manipulator (Zhou et al. 2018); astronomy, for contour detection using Cassini ISS images (Yang et al. 2020) and for developing a prediction model for the ionospheric propagation factor M(3000)F2 (Bai et al. 2020); psychology, for attention deficit hyperactivity disorder using functional brain MRI (Qureshi et al. 2017); geology, for mapping mineral prospectivity (Chen and Wu 2017); education, for designing an English classroom teaching quality evaluation model (Wang et al. 2017); biology, for hepatocellular carcinoma nuclei grading (Li et al. 2017); chemistry, for short-term wind speed prediction (Chen et al. 2019); mathematics, for solving ordinary differential equations (Yang et al. 2018); physics, for short-term photovoltaic power generation forecasting (Tang et al. 2016); economics, for gold price prediction (Weng et al. 2020) and credit score classification (Kuppili et al. 2020); energy, for prediction of photovoltaic power (Zhou et al. 2020) and for a resource optimization model (Han et al. 2021); environmental engineering, for modeling qualitative and quantitative parameters of groundwater (Poursaeid et al. 2021); computer science, for an intrusion detection system (Al-Yaseen et al. 2017), for dimension reduction (Kasun et al. 2016) and for short-term load forecasting (Zeng et al. 2017); and automation, for traffic sign recognition (Huang et al. 2017) and for malware hunting (Jahromi et al. 2020). ELM is also used in healthcare, for the classification of COVID-19 pneumonia infection from normal chest CT scans (Khan et al. 2021; Turkoglu 2021; Murugan and Goel 2021), for developing a cloud computing-based framework for breast cancer diagnosis (Lahoura et al. 2021) and for brain tumor detection (Özyurt et al. 2020); and engineering, for predicting the compressive strength of concrete with partial replacements for cement (Shariati et al. 2020), for predicting the thermal conductivity of soil (Kardani et al. 2021) and for wheat yield prediction (Liu et al. 2022).
Despite its many real-life applications, ELM is also inadequate in some respects and has aspects that need improvement. Many studies have been conducted to eliminate these shortcomings; some of the improvement areas are as follows: the structure of the hidden layer output matrix (Huynh et al. 2008; Ding et al. 2017), heteroscedasticity (Deng et al. 2009), outliers (Chen et al. 2017; Deng et al. 2009; Xu et al. 2016), overfitting (Chen et al. 2017; Deng et al. 2009; Zhang et al. 2017), feature selection (Miche et al. 2010, 2011; Martínez-Martínez et al. 2011; Fakhr et al. 2015; He et al. 2017) and multicollinearity (Miche et al. 2011; Martínez-Martínez et al. 2011; Toh 2008; Li and Niu 2013; Su et al. 2018; Nóbrega and Oliveira 2019; Yıldırım and Özkale 2019, 2020; Cancelliere et al. 2015). In this study, we focus on multicollinearity in ELM and seek to answer the question of how ELM can achieve better stability and generalization performance when multicollinearity is present.
Multicollinearity, defined as linear dependencies among the predictors in linear regression, has serious effects on the ordinary least squares (OLS) estimator, producing estimates with large variance that can lie far from the true parameters. Although classical linear models built on OLS-type estimators, like linear discriminant analysis or the linear regression model, have been widely used in practical applications such as dimension reduction on big data (Reddy et al. 2020) and engineering (Kaluri et al. 2021), these estimates are unstable and show worse generalization performance in the presence of multicollinearity. Biased estimation methods, also known as shrinkage estimation methods, have been proposed to overcome the negative effects of multicollinearity on the OLS estimator. One of the most well-established biased estimators for dealing with multicollinearity in linear regression is the ridge estimator (a.k.a. Tikhonov regularization, \(L_{2}\)-norm regularization) proposed by Hoerl and Kennard (1970). In ridge estimation, Hoerl and Kennard (1970) sought parameter estimates with a smaller mean square error by adding an \(L_{2}\)-norm penalty, weighted by a constant term, to the objective function of the OLS. This constant, known as the tuning parameter, affects the performance of the ridge estimator. Although the ridge estimator is the best-known method under multicollinearity, its biggest problem is the lack of precision in tuning parameter selection. In addition, since the ridge estimator is not linear in this tuning parameter, the tuning parameter is difficult to choose. Therefore, the Liu estimator was proposed by Liu (1993) as an alternative to the ridge estimator; it also has a tuning parameter affecting the performance of the model, but it can deal with multicollinearity while providing easier and more solid ways of selecting the tuning parameter.
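As a concrete illustration of the two estimators (not taken from the paper; the toy collinear design, seed, and the values k = d = 0.5 are our own assumptions), ridge and Liu estimates can be computed side by side:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy collinear design: the second column is nearly a copy of the first.
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=n)

XtX, Xty = X.T @ X, X.T @ y
I = np.eye(X.shape[1])

beta_ols = np.linalg.solve(XtX, Xty)                     # unstable under collinearity
k, d = 0.5, 0.5
beta_ridge = np.linalg.solve(XtX + k * I, Xty)           # Hoerl-Kennard ridge
beta_liu = np.linalg.solve(XtX + I, Xty + d * beta_ols)  # Liu (1993) estimator
```

Note that the Liu estimate depends on d only through an additive term, so it is linear in its tuning parameter; this is what makes d easier to select than the ridge k.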
Subsequently, many biased estimators have been proposed. One of them is the two-parameter estimator proposed by Özkale and Kaçıranlar (2007), later called the OK estimator by Gruber (2012) and Özkale et al. (2022). In proposing this estimator, Özkale and Kaçıranlar (2007) took advantage of the idea that combining two estimators naturally yields a new estimator with the advantages of both.
1.1 Problem and some of the existing solutions
As in linear regression, multicollinearity in the context of ELM arises between the columns of the hidden layer output matrix (i.e., the nodes) and causes instability and poor generalization performance of the ELM (Li and Niu 2013). Studies have been, and continue to be, conducted to eliminate the negative effects of multicollinearity on ELM. Toh (2008) proposed a new approach based on ridge regression with a sigmoid activation function to obtain minimum error for SLFNs in classification. Deng et al. (2009) developed a novel algorithm called regularized ELM to deal with heteroscedasticity, outliers and multicollinearity and to obtain better generalization performance. Miche et al. (2010, 2011) proposed the optimally pruned ELM (OP-ELM), based on the \(L_{1}\)-norm for sparsity, and the Tikhonov regularized optimally pruned ELM (TROP-ELM), based on both the \(L_{1}\) and \(L_{2}\) norms for both sparsity and stability. Martínez-Martínez et al. (2011) developed a unified solution via the ridge (\(L_{2}\)-norm), elastic net and lasso (\(L_{1}\)-norm) methods. Li and Niu (2013) proposed ELM based on the ridge and almost unbiased ridge estimators (RR-ELM and AUR-ELM) with an appropriate selection method for the ridge tuning parameter. Yu et al. (2013) proposed a new approach based on TROP-ELM and pairwise distance calculation to deal with the missing data problem in ELM. Shao et al. (2015) proposed an “automatic regularized ELM with leave-one-out cross-validation” based on ridge regression to investigate the randomness performance of ELM. Cao et al. (2016) presented a novel approach based on a stacked sparse denoising autoencoder with ridge regression to achieve more stable performance with comparable processing time in classification and regression applications. Luo et al. (2016) developed a unified framework of ELM using both the \(L_{1}\)-norm and the \(L_{2}\)-norm for regression and multi-class classification problems. Yu et al. (2018) proposed a dual adaptive regularized online sequential extreme learning machine, called DAR-OSELM, to improve the performance of network intrusion detection. Wang and Li (2019) developed a regularized ELM algorithm via an \(L_{0}\)-based broken adaptive ridge (BAR) penalization on a Cox-type model, with advantages such as avoiding some assumptions of classical survival models and achieving reasonable computation time. Yan et al. (2020) proposed a kernel ridge ELM algorithm that uses the artificial bee colony algorithm to determine the appropriate parameter for insurance fraud problems. Guo (2020) proposed a regularized ELM algorithm based on the elastic net to keep a balance between system stability and the solution's sparsity. Jiao et al. (2021) presented an optimized regularized extreme learning machine algorithm based on the conjugate gradient (called CG-RELM) for estimating the state of charge.
Although ridge-based ELM is frequently used in the literature, there is no general rule for selecting the ridge tuning parameter. On the other hand, the selection of the ridge tuning parameter plays a critical role in the performance of ridge-type ELM algorithms, affecting both training and testing performance and the speed of the algorithm, and no single selection method provides reasonable performance in all cases. Therefore, Yıldırım and Özkale (2019) proposed some alternative approaches for selecting the ridge tuning parameter for RR-ELM (a.k.a. Ridge-ELM), based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the cross-validation (CV) method, and presented a comprehensive comparison. Furthermore, selecting the ridge tuning parameter is not easy because the ridge estimator is a nonlinear function of it. Therefore, Yıldırım and Özkale (2020) proposed a novel algorithm called L-ELM (a.k.a. Liu-ELM) as an alternative to the RR-ELM algorithm, which provided more stable and generalizable results than competitors such as ELM, RR-ELM, AUR-ELM and OP-ELM.
1.2 Contribution
Both the ridge and Liu estimators have individual characteristic properties when dealing with multicollinearity. Depending on the problem and data structure, either estimator can outperform the other. Even though the ridge and Liu estimation methods have been adapted to ELM to alleviate the multicollinearity problem, estimator adaptations that can outperform RR-ELM and Liu-ELM under multicollinearity can still be made. Building on the idea of using both estimators in a unified form, we consider a new method, alternative to RR-ELM and Liu-ELM, that provides more insightful and better results in terms of learning capability, stability and generalization performance. Our novel method in ELM is based on the two-parameter (a.k.a. OK) estimator originally proposed by Özkale and Kaçıranlar (2007) in the linear regression field. The key features of the proposed algorithm are summarized as follows:

The proposed algorithm presents a unified form of RR-ELM and Liu-ELM which improves on ELM and its variants RR-ELM and Liu-ELM by yielding more stable and generalizable results, owing to the joint effects of the k and d tuning parameters in the model.

The proposed algorithm gives a regularization method which can easily be adapted to any other model for dealing with multicollinearity and irrelevant features.

The proposed algorithm depends on two tuning parameters so that one of the tuning parameters provides better generalization performance while the other provides better shrinkage.

The proposed algorithm can easily be integrated into any system and algorithm to provide solutions for both classification and regression studies in the context of ELM.
1.3 Organization
The rest of the paper is structured as follows. We present the review of related studies including the preliminary ELM and its variants in Sect. 2. In Sect. 3, the details of our proposed method are described. Experimental results and findings are given in Sect. 4. The conclusions are summarized in Sect. 5.
2 Review of related studies
The ELM, introduced by Huang et al. (2006) to make network training possible without tuning any parameters, was proposed as an alternative to gradient-descent-based algorithms like backpropagation for SLFNs. The idea was noteworthy because of its speed. The ELM algorithm searches for the output weights providing minimum training error and minimum norm after randomly assigning the network parameters, including the input weights and biases. As a result of this random assignment, the output weights can be obtained by solving a classic linear system in the output layer. The use of approaches like least squares and the Moore–Penrose inverse at the solution stage gives ELM advantages such as faster learning, less need for human intervention, less risk of reaching local optima and mostly reasonable generalization performance. Table 1 summarizes the features/key findings and challenges of ELM in the context of regularized ELM. In this section, we summarize the preliminary ELM, RR-ELM and Liu-ELM.
2.1 The preliminary ELM
A classic SLFN can be expressed as
$$\begin{aligned} \sum _{i=1}^{\theta }\varvec{\beta }_{i}f\left( \varvec{\delta }_{i}\cdot {\textbf{x}}_{j}+b_{i}\right) ={\textbf{t}}_{j},\quad j=1,\ldots ,N, \end{aligned}$$
where \(\left( {\textbf{x}}_{j}^{T},{\textbf{t}}_{j}^{T}\right) \) is the set of N distinct patterns with \({\textbf{x}}_{j}\in R^{p}\) and \({\textbf{t}}_{j}\in R^{m}\) the m-dimensional network output, \(\varvec{\delta }_{i}\) are the input weights, \(b_{i}\) are the biases, \(\theta \) is the number of hidden neurons, \(f\left( .\right) \) is the activation function and \(\varvec{\beta }_{i}\) are the output weights (Huang et al. 2004, 2006). The basic SLFN structure is given in Fig. 1.
The matrix form of Eq. (1) can be written as
$$\begin{aligned} {\textbf{H}}\varvec{\beta }={\textbf{T}}, \end{aligned}$$
where
$$\begin{aligned} {\textbf{H}}_{\left( N\times \theta \right) }=\left[ \begin{array}{ccc} f\left( \varvec{\delta }_{1}\cdot {\textbf{x}}_{1}+b_{1}\right) &{} \cdots &{} f\left( \varvec{\delta }_{\theta }\cdot {\textbf{x}}_{1}+b_{\theta }\right) \\ \vdots &{} \ddots &{} \vdots \\ f\left( \varvec{\delta }_{1}\cdot {\textbf{x}}_{N}+b_{1}\right) &{} \cdots &{} f\left( \varvec{\delta }_{\theta }\cdot {\textbf{x}}_{N}+b_{\theta }\right) \end{array} \right] \end{aligned}$$
is the hidden layer output matrix, and \(\varvec{\beta }_{\left( \theta \times m\right) }=\left( \beta _{1},\ldots ,\beta _{\theta }\right) ^{T}\) and \( {\varvec{T}}_{\left( N\times m\right) }=\left( {\textbf{t}}_{1},\ldots ,{\textbf{t}}_{N}\right) ^{T}\) are the output weight vector and the output value vector, respectively. Here, m corresponds to the number of output layer neurons, which is commonly equal to the number of target variables and is fixed as 1 in most practical applications.
To obtain the solution of Eq. (2), the following objective function is minimized:
$$\begin{aligned} \min _{\varvec{\beta }}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) ^{T}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) . \end{aligned}$$
The minimizer of the objective function in Eq. (3) (i.e., the estimator of \(\varvec{\beta }\)) can be found analytically as
$$\begin{aligned} \widehat{\varvec{\beta }}_\textrm{ELM}={\textbf{H}}^{+}{\textbf{T}}, \end{aligned}$$
where \({\textbf{H}}^{+}\) is the Moore–Penrose inverse of the matrix \({\textbf{H}}\) (Huang et al. 2006). Some popular ways to calculate the Moore–Penrose inverse are the orthogonal projection method, iterative methods and the singular value decomposition (Rao et al. 1971; Schott 2005). According to the orthogonal projection method, \({\textbf{H}}^{+}\) is calculated as \({\textbf{H}}^{T}\left( \textbf{HH}^{T}\right) ^{-1}\) if \({\textbf{H}}\) has full row rank, whereas \( {\textbf{H}}^{+}=\left( {\textbf{H}}^{T}{\textbf{H}}\right) ^{-1}{\textbf{H}}^{T}\) if \({\textbf{H}}\) has full column rank.
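The two orthogonal projection formulas can be checked numerically against the general Moore–Penrose inverse; a small sketch (the matrix sizes and seed are arbitrary choices of ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Full column rank (more rows than columns): H^+ = (H^T H)^{-1} H^T.
H_tall = rng.normal(size=(50, 5))
pinv_col = np.linalg.inv(H_tall.T @ H_tall) @ H_tall.T

# Full row rank (more columns than rows): H^+ = H^T (H H^T)^{-1}.
H_wide = rng.normal(size=(5, 50))
pinv_row = H_wide.T @ np.linalg.inv(H_wide @ H_wide.T)
```

Both agree with `np.linalg.pinv`, which computes the Moore–Penrose inverse via the singular value decomposition.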
2.2 ELM based on ridge and Liu regression
Although the solution given by Eq. (4) is fast, it has drawbacks in some situations, such as multicollinearity, under which stability and generalization performance may weaken. Ridge-based ELM, defined by Li and Niu (2013), optimizes the objective function
$$\begin{aligned} \left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) ^{T}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) +k\varvec{\beta }^{T}\varvec{\beta }. \end{aligned}$$
The closed-form solution of Eq. (5), found by simple algebra, is
$$\begin{aligned} \widehat{\varvec{\beta }}_{\text {RRELM}}^{(k)}=\left( {\textbf{H}}^{T}{\textbf{H}}+k{\textbf{I}}_{\theta }\right) ^{-1}{\textbf{H}}^{T}{\textbf{T}}, \end{aligned}$$
where k is the ridge tuning parameter (Yıldırım and Özkale 2020).
Yıldırım and Özkale (2020), by minimizing the objective function
$$\begin{aligned} \left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) ^{T}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) +\left( d\varvec{\hat{\beta }}_\textrm{ELM}-\varvec{\beta }\right) ^{T}\left( d\varvec{\hat{\beta }}_\textrm{ELM}-\varvec{\beta }\right) , \end{aligned}$$
introduced Liu-ELM as
$$\begin{aligned} \widehat{\varvec{\beta }}_{\text {LiuELM}}=\left( {\textbf{H}}^{T}{\textbf{H}}+{\textbf{I}}_{\theta }\right) ^{-1}\left( {\textbf{H}}^{T}{\textbf{T}}+d\varvec{\hat{\beta }}_\textrm{ELM}\right) , \end{aligned}$$
where \(0<d<1\) is called the Liu tuning parameter. The properties of Liu-ELM are considered by Yıldırım and Özkale (2020).
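Under the closed forms above, RR-ELM and Liu-ELM differ from plain ELM only in the final solve. A hedged sketch (the network size, seed, targets, and the values k = 0.1, d = 0.5 are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hidden-layer output matrix of a small random sigmoid network.
N, p, theta = 80, 3, 15
X = rng.normal(size=(N, p))
delta, b = rng.normal(size=(p, theta)), rng.normal(size=theta)
H = 1.0 / (1.0 + np.exp(-(X @ delta + b)))
T = rng.normal(size=N)                       # stand-in targets

HtH, HtT = H.T @ H, H.T @ T
I = np.eye(theta)
beta_elm = np.linalg.pinv(H) @ T             # ordinary ELM solution

k, d = 0.1, 0.5
beta_rr = np.linalg.solve(HtH + k * I, HtT)              # RR-ELM
beta_liu = np.linalg.solve(HtH + I, HtT + d * beta_elm)  # Liu-ELM
```

Both regularized solutions shrink the ELM output weights, which is what stabilizes them when the columns of H are nearly collinear.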
3 A new type of ELM based on unified ridge and Liu idea
RR-ELM and Liu-ELM have individual advantages in their capability to improve stability and generalization performance. Starting from the idea of combining both estimators, we propose a new estimator named OK-ELM. For this, we utilize the objective function
$$\begin{aligned} \left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) ^{T}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) +k\left( d\varvec{\hat{\beta }}_\textrm{ELM}-\varvec{\beta }\right) ^{T}\left( d\varvec{\hat{\beta }}_\textrm{ELM}-\varvec{\beta }\right) , \end{aligned}$$
which was originally the idea of Özkale and Kaçıranlar (2007) in linear regression; the resulting estimator was later called the OK estimator by Gruber (2012). The objective function in Eq. (6) looks for an estimator of \(\varvec{\beta }\) which minimizes \(\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) ^{T}\left( {\textbf{H}}\varvec{\beta }-{\textbf{T}}\right) \) within an equivalence class of estimators of \(\varvec{\beta }\) that are equidistant from \(d\varvec{\hat{\beta }}_\textrm{ELM}\); it is the general form of the Liu-ELM objective, weighted by the constant k. By minimizing the objective function in Eq. (6), we get
$$\begin{aligned} \widehat{\varvec{\beta }}_{\text {OKELM}}=\left( {\textbf{H}}^{T}{\textbf{H}}+k{\textbf{I}}_{\theta }\right) ^{-1}\left( {\textbf{H}}^{T}{\textbf{T}}+kd\varvec{\hat{\beta }}_\textrm{ELM}\right) , \end{aligned}$$
where \(0<d<1\) and \(k>0\) are the tuning parameters. \(\widehat{\varvec{ \beta }}_\mathrm{OKELM}\) has some statistical properties:

The OK-ELM enjoys the computational advantages of ELM. To see this, we define the augmented matrices
$$\begin{aligned} {\varvec{{\tilde{H}}}}=\genfrac(){0.0pt}0{{\textbf{H}}}{\sqrt{k}{\textbf{I}}_{\theta }},~ {\varvec{{\tilde{T}}}}=\genfrac(){0.0pt}0{{\textbf{T}}}{\sqrt{k} d\varvec{\hat{\beta }} _\textrm{ELM}} \end{aligned}$$where \({\textbf{I}}_{\theta }\) is the identity matrix of dimension \(\theta \). This implies that the OK-ELM is obtained by using prior information on \(\varvec{\beta }\) in the form of the linear stochastic restriction \(\sqrt{k}d \varvec{\hat{\beta }}_\textrm{ELM}=\sqrt{k}\varvec{\beta } +\varvec{\varepsilon }^{*}\), where \( \varvec{\varepsilon }^{*}\) is a random vector with mean \({\textbf{0}}\) and the same variance–covariance matrix as the output weight vector. The optimal solution \(\widehat{\varvec{\beta }}_{\text {OKELM}}\) based on the augmented form of the linear system corresponds to the minimizer of the objective function
$$\begin{aligned} ({\varvec{{\tilde{H}}}}\varvec{\beta }-{\varvec{{\tilde{T}}}})^{T}({\varvec{{\tilde{H}}}}\varvec{\beta }-{\varvec{{\tilde{T}}}}). \end{aligned}$$ 
It is a convex combination of RR-ELM and ELM (Özkale 2013; Gruber 2012):
$$\begin{aligned} \widehat{\varvec{\beta }}_{\text {OKELM}}=d\varvec{\hat{\beta }}_\textrm{ELM}+\left( 1-d\right) \widehat{\varvec{\beta }}_{\text {RRELM}}^{(k)} \end{aligned}$$
This convex combination shows that the tuning parameter d controls the respective contributions of ELM and RR-ELM: as d goes to 1, the contribution of ELM grows and that of RR-ELM shrinks, whereas as d goes to 0, RR-ELM contributes more than ELM. Thus the parameter d acts as the proportion of contribution between ELM and RR-ELM. As is common in experimental settings, we consider a grid search over the related parameters for OK-ELM. The main goal is to obtain the optimal parameter combination yielding the minimum testing error. The details of the computing algorithm for OK-ELM are explained in Fig. 2.
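Both properties stated above, the augmented least-squares representation and the convex-combination identity, can be verified numerically. A sketch with an arbitrary stand-in for the hidden layer output matrix (the sizes, seed and the values k = 0.3, d = 0.6 are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

N, theta = 60, 10
H = rng.normal(size=(N, theta))   # stand-in hidden-layer output matrix
T = rng.normal(size=N)
I = np.eye(theta)
k, d = 0.3, 0.6

beta_elm = np.linalg.pinv(H) @ T
beta_rr = np.linalg.solve(H.T @ H + k * I, H.T @ T)
beta_ok = np.linalg.solve(H.T @ H + k * I, H.T @ T + k * d * beta_elm)

# Property 1: OK-ELM is the least-squares solution of the augmented system.
H_aug = np.vstack([H, np.sqrt(k) * I])
T_aug = np.concatenate([T, np.sqrt(k) * d * beta_elm])
beta_aug = np.linalg.lstsq(H_aug, T_aug, rcond=None)[0]

# Property 2: OK-ELM is the convex combination d*ELM + (1-d)*RR-ELM.
beta_convex = d * beta_elm + (1 - d) * beta_rr
```

Both checks agree to machine precision when H has full column rank, and the second makes explicit how d interpolates between the unregularized and the ridge solutions.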
4 Experimental procedure and results
In this section, a comparative analysis is presented to measure the performance of the proposed algorithm (OK-ELM) against its competitors, including ELM (Huang et al. 2006), RR-ELM (Li and Niu 2013) and Liu-ELM (Yıldırım and Özkale 2020), on twelve different regression benchmark data sets collected from the UCI repository (Asuncion and Newman 2007). The description details of these data sets are summarized in Table 2. The sigmoidal activation function \(f(\delta ,b,X)=1/\left( 1+e^{-\left( \delta X+b\right) }\right) \) is used for all data sets. The number of hidden layer neurons \(\left( \theta \right) \) is set equally to 50, 100 and 150 for all algorithms. The experiments have been conducted on the R software platform and all code related to the algorithms has been written from scratch. To eliminate the effect of data scale, each attribute of the data sets has been standardized to zero mean and unit variance by using the formula
$$\begin{aligned} x^{\prime }=\frac{x-{\bar{x}}}{s}, \end{aligned}$$
where \({\bar{x}}\) and s are the sample mean and standard deviation of the attribute.
To calculate the generalization performance effectively, we used a fivefold CV approach. For each fold, forty trials have been carried out, and the mean of all metrics over all trials is reported with its standard deviation. As the performance metric, we used the root mean square error (RMSE), defined as
$$\begin{aligned} \text {RMSE}=\sqrt{\frac{1}{N}\sum _{j=1}^{N}\left\| {\textbf{o}}_{j}-{\textbf{t}}_{j}\right\| ^{2}}, \end{aligned}$$
where \(\left( {\textbf{o}}_{j}-{\textbf{t}}_{j}\right) \) corresponds to the error between the actual and output values of the network. The values of the tuning parameters have significant effects on the performance of the ridge- and Liu-based algorithms. To observe these effects, the selection process of the ridge and Liu tuning parameters for all data sets has been carried out in the same way and over the same range. The ridge tuning parameter \(\left( k\right) \) and the Liu tuning parameter \(\left( d\right) \) are, respectively, selected via CV within the following ranges:
In each fold, the k and d parameters minimizing the testing CV error are determined as optimal for the corresponding fold. This process is repeated for all trials, and the mean values of the d and k parameters over all folds are given in Table 3. Table 3 summarizes the performance of each algorithm via its RMSE and standard deviation at the optimum tuning parameter for all data sets. Besides, the norm values with standard deviations are presented in Table 4 to investigate the effect of the proposed algorithm in terms of shrinkage performance. Based on the results in Table 3, we also report the reduction rate (RR), which quantifies the performance percentage of OK-ELM over the other methods and is given in Figs. 3 and 4. The RR is calculated, for both the testing RMSE and the standard deviation of the RMSE, as the percentage reduction achieved by OK-ELM relative to each competitor.
We obtain the following conclusions:

What stands out in Table 3 and Fig. 3 is that there is a clear trend of decreasing test RMSE for the proposed algorithm (OK-ELM) compared to ELM, RR-ELM and Liu-ELM, regardless of the node number. Only for the Forest data set is OK-ELM worse than the others in terms of testing RMSE. As the node number increases, the RR of OK-ELM over ELM, RR-ELM and Liu-ELM increases, and its superiority over ELM is remarkable. It can therefore be concluded that the proposed algorithm is more generalizable than its competitors.

Table 3 and Fig. 3 show that the changes in the standard deviation of the testing RMSE vary depending on the node numbers. When the node number is 50, OK-ELM is the best except on the Auto Price, Body Fat, Machine CPU and Forest data sets, where RR-ELM is the best. No single biased ELM method is best, in terms of the standard deviation of the testing RMSE, when the number of nodes is 100 or 150; depending on the data, one biased ELM method may be better than another. There are data sets where RR-ELM or Liu-ELM is better than OK-ELM, while OK-ELM, RR-ELM and Liu-ELM are all always better than ELM, on all data sets and node numbers, in the sense of the standard deviation of the testing RMSE.

Table 4 shows that OK-ELM outperforms ELM in the sense of having a smaller norm and standard deviation of the norm on all data sets and node numbers. The OK-ELM algorithm provides a smaller norm and standard deviation of the norm than all other algorithms for the Abalone, Auto Mpg, Fish and Yacht data sets regardless of the node number. For the rest of the data sets, the performance of OK-ELM over RR-ELM or Liu-ELM depends on the node number in terms of the norm and its standard deviation. In all data sets except Auto Price and Servo, the norm value of OK-ELM becomes better than those of RR-ELM and Liu-ELM as the number of nodes increases; that is, OK-ELM is better than RR-ELM and Liu-ELM when the number of nodes is large. The results validate that the OK-ELM algorithm gives a smaller norm of coefficients (i.e., satisfying shrinkage performance) than the ELM, RR-ELM and Liu-ELM algorithms, especially when the node number is large. Except for the Auto Price and Servo data sets, with 150 nodes OK-ELM outperforms RR-ELM and Liu-ELM in terms of the standard deviation of the norm on all other data sets.

Figures 5, 6 and 7 show the errors of all four algorithms for the Abalone, Servo and Strikes data sets, respectively. The errors have been retrieved from the testing results of a random cross-validation run at the optimal parameter values, which are approximately equal to the mean values given in Table 3. Figures 5, 6 and 7 illustrate stability performance: the more stable the algorithm, the more homogeneous the spread of the errors around zero. When the range of errors in Figs. 5, 6 and 7 is examined, RR-ELM and Liu-ELM show almost the same stability performance, while ELM is the worst and OK-ELM is the best. In terms of stability, the OK-ELM algorithm is more stable around zero than its competitors.
As mentioned before, the tuning parameters of RR-ELM, Liu-ELM and OK-ELM affect the performance of each algorithm. In this study, we carried out a comprehensive grid search in the experiments. To investigate the effect of each parameter, we show the performance change of RR-ELM, Liu-ELM and OK-ELM with respect to their own tuning parameters. Using an optimum value of each tuning parameter, the changes in testing performance depending on the other tuning parameter are given in Figs. 8 and 9 for the Abalone and Fish data sets with 150 and 50 nodes, respectively. To investigate the performance change of OK-ELM and RR-ELM, the Liu tuning parameter is fixed at the optimum value, which is approximately equal to the mean value given in Table 3. A similar process is repeated for OK-ELM and Liu-ELM by fixing the ridge tuning parameter. Figures 8 and 9 show that the tuning parameters significantly affect the performance of the algorithms in the training process. OK-ELM can outperform both RR-ELM and Liu-ELM if the tuning parameters are properly tuned. For a particular data set, the breaking points can give useful insights for determining the optimal range of each tuning parameter.
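The fold-wise grid search over (k, d) used throughout this section can be sketched as follows; the grid values, fold assignment and toy data below are illustrative assumptions of ours, not the actual ranges of the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

def ok_elm_beta(H, T, k, d):
    """OK-ELM output weights for a given hidden-layer output matrix H."""
    beta_elm = np.linalg.pinv(H) @ T
    return np.linalg.solve(H.T @ H + k * np.eye(H.shape[1]),
                           H.T @ T + k * d * beta_elm)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Toy data passed through a fixed random sigmoid hidden layer.
N, p, theta = 100, 2, 12
X = rng.normal(size=(N, p))
delta, b = rng.normal(size=(p, theta)), rng.normal(size=theta)
H = 1.0 / (1.0 + np.exp(-(X @ delta + b)))
T = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# 5-fold grid search: keep the (k, d) pair with the smallest mean testing RMSE.
folds = np.array_split(rng.permutation(N), 5)
grid_k, grid_d = [0.01, 0.1, 1.0], [0.1, 0.5, 0.9]  # illustrative ranges
best = None
for k in grid_k:
    for d in grid_d:
        errs = []
        for test_idx in folds:
            train = np.setdiff1d(np.arange(N), test_idx)
            beta = ok_elm_beta(H[train], T[train], k, d)
            errs.append(rmse(H[test_idx] @ beta, T[test_idx]))
        score = np.mean(errs)
        if best is None or score < best[0]:
            best = (score, k, d)
```

The pair with the smallest mean testing RMSE across folds is retained, mirroring the fold-wise selection rule described above.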
5 Conclusions
In this paper, we proposed a novel algorithm based on the combination of the ridge and Liu algorithms to deal with the multicollinearity problem in the context of ELM. The main advantage of the proposed algorithm is that it enjoys the properties of both the ridge and Liu algorithms and presents an alternative that is easily adaptable to any other system and algorithm for solving both regression and classification problems. Based on the experimental results, the proposed algorithm can outperform its competitors in terms of testing RMSE and stability for an appropriate selection of the (k, d) parameters.
The newly proposed OK-ELM method has three main limitations:

Because it depends on two tuning parameters, it is computationally time-consuming

It cannot be used in high-dimensional ELM settings

It does not select nodes
As a future research direction, an effective estimation method for ELM on high-dimensional data, capable of variable selection, can be proposed.
6 Future studies
The main shortcomings of this study are that it does not integrate deterministic selection methods for the tuning parameters of the proposed method and that the method cannot be used in high-dimensional settings. In future work, we will focus on high-dimensional issues to provide more effective algorithms to the field of machine learning, especially ELM.
Data availability
Enquiries about data availability should be directed to the authors.
References
Ahmad I, Basheri M, Iqbal MJ, Rahim A (2018) Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access 6:33789–33795. https://doi.org/10.1109/ACCESS.2018.2841987
Al-Yaseen WL, Othman ZA, Nazri MZA (2017) Multi-level hybrid support vector machine and extreme learning machine based on modified k-means for intrusion detection system. Expert Syst Appl 67:296–303. https://doi.org/10.1016/j.eswa.2016.09.041
Asuncion A, Newman D (2007) UCI machine learning repository
Bai H, Feng F, Wang J, Wu T (2020) Modeling M(3000)F2 based on extreme learning machine. Adv Space Res 65:107–114. https://doi.org/10.1016/j.asr.2019.09.021
Cancelliere R, Gai M, Gallinari P, Rubini L (2015) OCReP: an optimally conditioned regularization for pseudoinversion based neural training. Neural Netw 71:76–87
Cao LL, Huang WB, Sun FC (2016) Building feature space of extreme learning machine with sparse denoising stacked-autoencoder. Neurocomputing 174:60–71
Chen Y, Wu W (2017) Mapping mineral prospectivity using an extreme learning machine regression. Ore Geol Rev. https://doi.org/10.1016/j.oregeorev.2016.06.033
Chen K, Lv Q, Lu Y, Dou Y (2017) Robust regularized extreme learning machine for regression using iteratively reweighted least squares. Neurocomputing 230:345–358. https://doi.org/10.1016/j.neucom.2016.12.029
Chen MR, Zeng GQ, Lu KD, Weng J (2019) A two-layer nonlinear combination method for short-term wind speed prediction based on ELM, ENN, and LSTM. IEEE Internet Things J 6:6997–7010
Deng W, Zheng Q, Chen L (2009) Regularized extreme learning machine. In: 2009 IEEE symposium on computational intelligence and data mining, IEEE, Nashville, TN, USA, pp 389–395. https://doi.org/10.1109/CIDM.2009.4938676
Ding S, Zhang N, Zhang J, Xu X, Shi Z (2017) Unsupervised extreme learning machine with representational features. Int J Mach Learn Cybern 8:587–595. https://doi.org/10.1007/s13042-015-0351-8
Fakhr MW, Youssef ENS, El-Mahallawy MS (2015) L1-regularized least squares sparse extreme learning machine for classification, p 4
Gruber MH (2012) Liu and ridge estimators—a comparison. Commun Stat Theory Methods 41:3739–3749
Guo L (2020) Extreme learning machine with elastic net regularization. Intell Autom Soft Comput 26:421–427
Han Y, Liu S, Cong D, Geng Z, Fan J, Gao J, Pan T (2021) Resource optimization model using novel extreme learning machine with t-distributed stochastic neighbor embedding: application to complex industrial processes. Energy 225:120255
He B, Sun T, Yan T, Shen Y, Nian R (2017) A pruning ensemble model of extreme learning machine with \(L_{1/2}\) regularizer. Multidim Syst Sign Process 28:1051–1069. https://doi.org/10.1007/s11045-016-0437-9
Hoerl AE, Kennard RW (1970) Ridge regression: applications to nonorthogonal problems. Technometrics 12:69–82. https://doi.org/10.1080/00401706.1970.10488635
Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE international joint conference on neural networks (IEEE Cat. No.04CH37541), IEEE, pp 985–990. https://doi.org/10.1109/IJCNN.2004.1380068
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Huang Z, Yu Y, Gu J, Liu H (2017) An efficient method for traffic sign recognition based on extreme learning machine. IEEE Trans Cybern 47:14
Huynh HT, Won Y, Kim JJ (2008) An improvement of extreme learning machine for compact single-hidden-layer feedforward neural networks. Int J Neural Syst 18:433–441. https://doi.org/10.1142/S0129065708001695
Jahromi AN, Hashemi S, Dehghantanha A, Choo KKR, Karimipour H, Newton DE, Parizi RM (2020) An improved two-hidden-layer extreme learning machine for malware hunting. Comput Secur 89:101655
Jiao M, Yang Y, Wang D, Gong P (2021) The conjugate gradient optimized regularized extreme learning machine for estimating state of charge. Ionics 27:4839–4848
Kaluri R, Rajput DS, Xin Q, Lakshmanna K, Bhattacharya S, Gadekallu TR, Maddikunta PKR (2021) Roughsets-based approach for predicting battery life in IoT. arXiv preprint arXiv:2102.06026
Kardani N, Bardhan A, Samui P, Nazem M, Zhou A, Armaghani DJ (2021) A novel technique based on the improved firefly algorithm coupled with extreme learning machine (ELM-IFF) for predicting the thermal conductivity of soil. Eng Comput. https://doi.org/10.1007/s00366-021-01329-3
Kasun LLC, Yang Y, Huang GB, Zhang Z (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25:3906–3918. https://doi.org/10.1109/TIP.2016.2570569
Khan MA, Kadry S, Zhang YD, Akram T, Sharif M, Rehman A, Saba T (2021) Prediction of COVID-19 pneumonia based on selected deep features and one class kernel extreme learning machine. Comput Electr Eng 90:106960
Kuppili V, Tripathi D, Reddy Edla D (2020) Credit score classification using spiking extreme learning machine. Comput Intell 36:402–426
Lahoura V, Singh H, Aggarwal A, Sharma B, Mohammed MA, Damaševičius R, Kadry S, Cengiz K (2021) Cloud computingbased framework for breast cancer diagnosis using extreme learning machine. Diagnostics 11:241
Li G, Niu P (2013) An enhanced extreme learning machine based on ridge regression for regression. Neural Comput Appl 22:803–810. https://doi.org/10.1007/s00521-011-0771-7
Li S, Jiang H, Pang W (2017) Joint multiple fully connected convolutional neural network with extreme learning machine for hepatocellular carcinoma nuclei grading. Comput Biol Med 84:156–167. https://doi.org/10.1016/j.compbiomed.2017.03.017
Liu K (1993) A new class of biased estimate in linear regression. Commun Stat Theory Methods 22:393–402. https://doi.org/10.1080/03610929308831027
Liu Y, Wang S, Wang X, Chen B, Chen J, Wang J, Huang M, Wang Z, Ma L, Wang P et al (2022) Exploring the superiority of solar-induced chlorophyll fluorescence data in predicting wheat yield using machine learning and deep learning methods. Comput Electron Agric 192:106612
Lu S, Qiu X, Shi J, Li N, Lu ZH, Chen P, Yang MM, Liu FY, Jia WJ, Zhang Y (2017) A pathological brain detection system based on extreme learning machine optimized by bat algorithm. CNS Neurol Disord Drug Targets 16:23–29. https://doi.org/10.2174/1871527315666161019153259
Luo X, Chang X, Ban X (2016) Regression and classification using extreme learning machine based on L1-norm and L2-norm. Neurocomputing 174:179–186. https://doi.org/10.1016/j.neucom.2015.03.112
MartínezMartínez JM, EscandellMontero P, SoriaOlivas E, MartínGuerrero JD, MagdalenaBenedito R, GómezSanchis J (2011) Regularized extreme learning machine for regression problems. Neurocomputing 74:3716–3721. https://doi.org/10.1016/j.neucom.2011.06.013
Miche Y, Sorjamaa A, Bas P, Simula O, Jutten C, Lendasse A (2010) OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Networks 21:158–162. https://doi.org/10.1109/TNN.2009.2036259
Miche Y, van Heeswijk M, Bas P, Simula O, Lendasse A (2011) TROP-ELM: a double-regularized ELM using LARS and Tikhonov regularization. Neurocomputing 74:2413–2421. https://doi.org/10.1016/j.neucom.2010.12.042
Mirza B, Lin Z (2016) Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification. Neural Netw 80:79–94. https://doi.org/10.1016/j.neunet.2016.04.008
Murugan R, Goel T (2021) E-DiCoNet: extreme learning machine based classifier for diagnosis of COVID-19 using deep convolutional network. J Ambient Intell Humaniz Comput 12:8887–8898
Naik SM, Jagannath RPK, Kuppili V (2020) An automatic estimation of the ridge parameter for extreme learning machine. Chaos: Interdiscip J Nonlinear Sci 30:013106
Nóbrega JP, Oliveira AL (2019) A sequential learning method with Kalman filter and extreme learning machine for regression and time series forecasting. Neurocomputing 337:235–250. https://doi.org/10.1016/j.neucom.2019.01.070
Özkale MR (2013) Influence measures in affine combination type regression. J Appl Stat 40:2219–2243
Özkale MR, Abbasi A (2022) Iterative restricted OK estimator in generalized linear models and the selection of tuning parameters via MSE and genetic algorithm. Stat Pap. https://doi.org/10.1007/s00362-022-01304-0
Özkale MR, Kaçıranlar S (2007) The restricted and unrestricted two-parameter estimators. Commun Stat Theory Methods 36:2707–2725. https://doi.org/10.1080/03610920701386877
Özyurt F, Sert E, Avcı D (2020) An expert system for brain tumor detection: fuzzy C-means with super resolution and convolutional neural network with extreme learning machine. Med Hypotheses 134:109433
Petković D, Seyed Danesh A, Dadkhah M, Misaghian N, Shamshirband S, Zalnezhad E, Pavlović ND (2016) Adaptive control algorithm of flexible robotic gripper by extreme learning machine. Robot Comput-Integr Manuf 37:170–178. https://doi.org/10.1016/j.rcim.2015.09.006
Poursaeid M, Mastouri R, Shabanlou S, Najarchi M (2021) Modelling qualitative and quantitative parameters of groundwater using a new wavelet conjunction heuristic method: wavelet extreme learning machine versus wavelet neural networks. Water Environ J 35:67–83
Qureshi MNI, Oh J, Min B, Jo HJ, Lee B (2017) Multi-modal, multi-measure, and multi-class discrimination of ADHD with hierarchical feature extraction and extreme learning machine using structural and functional brain MRI. Front Hum Neurosci. https://doi.org/10.3389/fnhum.2017.00157
Rao CR, Mitra SK (1971) Generalized inverse of matrices and its applications. Probability and statistics series. Wiley, Hoboken, NJ
Reddy GT, Reddy MPK, Lakshmanna K, Kaluri R, Rajput DS, Srivastava G, Baker T (2020) Analysis of dimensionality reduction techniques on big data. IEEE Access 8:54776–54788
Schott J (2005) Matrix analysis for statistics. Wiley series in probability and statistics. Wiley, Hoboken, NJ
Shao Z, Er MJ, Wang N (2015) An effective semi-cross-validation model selection method for extreme learning machine with ridge regression. Neurocomputing 151:933–942. https://doi.org/10.1016/j.neucom.2014.10.002
Shariati M, Mafipour MS, Ghahremani B, Azarhomayun F, Ahmadi M, Trung NT, Shariati A (2020) A novel hybrid extreme learning machine–grey wolf optimizer (ELM-GWO) model to predict compressive strength of concrete with partial replacements for cement. Eng Comput. https://doi.org/10.1007/s00366-020-01081-0
Song Y, Zhang J (2016) Discriminating preictal and interictal brain states in intracranial EEG by sample entropy and extreme learning machine. J Neurosci Methods 257:45–54. https://doi.org/10.1016/j.jneumeth.2015.08.026
Su X, Zhang S, Yin Y, Xiao W (2018) Prediction model of permeability index for blast furnace based on the improved multi-layer extreme learning machine and wavelet transform. J Franklin Inst 355:1663–1691. https://doi.org/10.1016/j.jfranklin.2017.05.001
Tang P, Chen D, Hou Y (2016) Entropy method combined with extreme learning machine method for the short-term photovoltaic power generation forecasting. Chaos Solitons Fractals 89:243–248. https://doi.org/10.1016/j.chaos.2015.11.008
Toh KA (2008) Deterministic neural classification. Neural Comput 20:1565–1595
Turkoglu M (2021) COVID-19 detection system using chest CT images and multiple kernels-extreme learning machine based on deep neural network. IRBM 42:207–214
Wang H, Li G (2019) Extreme learning machine cox model for high-dimensional survival analysis. Stat Med 38:2139–2156
Wang B, Wang J, Hu G (2017) College English classroom teaching evaluation based on particle swarm optimization—extreme learning machine model. Int J Emerg Technol Learn (iJET) 12:82. https://doi.org/10.3991/ijet.v12i05.6782
Weng F, Chen Y, Wang Z, Hou M, Luo J, Tian Z (2020) Gold price forecasting research based on an improved online extreme learning machine algorithm. J Ambient Intell Humaniz Comput 11:4101–4111
Xu C, Tao D, Xu C (2016) Robust extreme multi-label learning. https://doi.org/10.1145/2939672.2939798
Yan C, Li Y, Liu W, Li M, Chen J, Wang L (2020) An artificial bee colony-based kernel ridge regression for automobile insurance fraud identification. Neurocomputing 393:115–125
Yang Y, Hou M, Luo J (2018) A novel improved extreme learning machine algorithm in solving ordinary differential equations by Legendre neural network methods. Adv Differ Equ. https://doi.org/10.1186/s13662-018-1927-x
Yang X, Zhang Q, Li Z (2020) Contour detection in Cassini ISS images based on hierarchical extreme learning machine and dense conditional random field. Res Astron Astrophys 20:011. https://doi.org/10.1088/1674-4527/20/1/11. arXiv:1908.08279
Yıldırım H, Özkale MR (2019) The performance of ELM based ridge regression via the regularization parameters. Expert Syst Appl 134:225–233. https://doi.org/10.1016/j.eswa.2019.05.039
Yıldırım H, Özkale MR (2020) An enhanced extreme learning machine based on Liu regression. Neural Process Lett 52:421–442. https://doi.org/10.1007/s11063-020-10263-2
Yıldırım H, Revan Özkale M (2021) LL-ELM: a regularized extreme learning machine based on L1-norm and Liu estimator. Neural Comput Appl 33:10469–10484
Yu Q, Miche Y, Eirola E, van Heeswijk M, Séverin E, Lendasse A (2013) Regularized extreme learning machine for regression with missing data. Neurocomputing 102:45–51. https://doi.org/10.1016/j.neucom.2012.02.040
Yu Y, Kang S, Qiu H (2018) A new network intrusion detection algorithm: DAROSELM. IEEJ Trans Electr Electron Eng 13:602–612
Zeng N, Zhang H, Liu W, Liang J, Alsaadi FE (2017) A switching delayed PSO optimized extreme learning machine for short-term load forecasting. Neurocomputing 240:175–182. https://doi.org/10.1016/j.neucom.2017.01.090
Zhang Y, Wu J, Zhou C, Cai Z (2017) Instance cloned extreme learning machine. Pattern Recognit 68:52–65. https://doi.org/10.1016/j.patcog.2017.02.036
Zhou Z, Guo H, Wang Y, Zhu Z, Wu J, Liu X (2018) Inverse kinematics solution for robotic manipulator based on extreme learning machine and sequential mutation genetic algorithm. Int J Adv Rob Syst 15:1729881418792992. https://doi.org/10.1177/1729881418792992
Zhou Y, Zhou N, Gong L, Jiang M (2020) Prediction of photovoltaic power output based on similar day analysis, genetic algorithm and extreme learning machine. Energy 204:117894
Zou H, Huang B, Lu X, Jiang H, Xie L (2016) A robust indoor positioning system based on the procrustes analysis and weighted extreme learning machine. IEEE Trans Wirel Commun 15:1252–1266. https://doi.org/10.1109/TWC.2015.2487963
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Yıldırım, H., Özkale, M.R. A combination of ridge and Liu regressions for extreme learning machine. Soft Comput 27, 2493–2508 (2023). https://doi.org/10.1007/s00500-022-07745-x
Keywords
 Extreme learning machine
 Regularization
 Liu regression
 Tikhonov regularization
 Multicollinearity