1 Introduction

The soil cation exchange capacity (CEC) is the total number of exchangeable cations that are held in the soil by electrostatic forces at a specified pH per unit weight (Amini et al. 2005; Velde and Bauer 2014). It is commonly referred to as the number of negative charges in soil. CEC is one of the important chemical properties of soil: it shows the ability of soil to retain positively charged ions, and it is a good index of the quality, fertility, and efficiency of soil (Arias et al. 2005; Khaledian et al. 2017). Even though it is possible to measure the CEC directly, the measurement is difficult and expensive, especially in Iran, where soils contain large amounts of lime and gypsum (Amini et al. 2005; Carpena et al. 1972; McBratney et al. 2002). Hence, several methods have been developed to estimate the CEC from soil properties that can be easily measured. In general, there are two main groups of studies on estimating the CEC. The first group focuses on developing regression-based empirical models called pedotransfer functions (PTFs). These methods try to establish an empirical relation between the CEC and easily measured physical and chemical properties of soil, such as soil pH, soil texture, and organic matter (Bell and Van Keulen 1995; Drake and Motto 1982; Fooladmand 2008; Ghorbani et al. 2015; Krogh et al. 2000; Manrique et al. 1991). The second group, which has emerged in recent years, relies on machine learning methods such as support vector machines (SVM), artificial neural networks (ANN), adaptive neuro-fuzzy inference system (ANFIS), gene expression programming (GEP), and others. Several studies, such as Emamgholizadeh (2012); Parhizkar et al. (2015); Emamgholizadeh et al. (2017); Parsaie et al. (2018a, b); Maroufpoor et al. (2018); Emamgholizadeh et al. (2018); Emamgholizadeh and Karimi (2019); and Bazoobandi et al. (2019), have reported successful applications of these intelligent models to estimate parameters in soil science, water engineering, and civil engineering. Such models have also been used to model soil CEC in a nonlinear framework and to create relationships between inputs (physicochemical properties of soil) and output (CEC) (da Silva et al. 2018; Emamgolizadeh et al. 2015; Ghorbani et al. 2015; Jafarzadeh et al. 2016; Kashi et al. 2014; Keshavarzi and Sarmadian 2010; Keshavarzi et al. 2017; Liao et al. 2014). One of the benefits of artificial intelligence models over PTF-based models is that they do not depend on a prespecified functional form but only on the training process. Furthermore, these models have estimated the CEC more accurately than regression-based PTFs, particularly when the relationship between input and output data is unknown, nonlinear, and complex (Emamgolizadeh et al. 2015).

Although most of the previous studies indicate the superiority of data-driven models over regression-based PTFs, it is possible to reduce the learning error and increase the performance of these models by coupling them with meta-heuristic optimization algorithms such as particle swarm optimization (PSO), invasive weed optimization (IWO), genetic algorithm (GA), and cultural algorithm (CA). Past studies indicate that integrated forecasting methods outperform individual predictors (Da and Xiurun 2005; Mohammadi and Mehdizadeh 2020; Holland 1975; Kennedy and Eberhart 1995; Mehrabian and Lucas 2006; Meshram et al. 2019; Ndiritu and Daniell 2001; Reynolds 1994; Tien Bui et al. 2018; Mohammadi et al. 2020a). Therefore, in the current study, two meta-heuristic optimization algorithms, namely PSO and IWO, were used to predict CEC. IWO is a nature-inspired meta-heuristic optimization technique proposed by Mehrabian and Lucas (2006). It is inspired by the dynamic growth of weed colonies and can be used for continuous function optimization. The PSO algorithm was introduced by Kennedy and Eberhart (1995). It is a population-based search algorithm inspired by the social behavior of birds within a flock.

In recent years, the PSO and IWO algorithms have been successfully used to improve prediction ability in soil science. For example, Moazenzadeh and Mohammadi (2019) utilized a hybrid of bio-inspired meta-heuristic optimization algorithms and the SVM model to assess soil temperature. Their findings indicated that the proposed hybrid algorithm was a more powerful computational tool for estimating soil temperature than the SVM model alone. Xue et al. (2014) applied a hybrid SVM-PSO to predict the slope stability of soil and stated that the PSO algorithm enhanced the forecasting accuracy of the SVM model, so that the PSO-SVM can be used as a powerful model to estimate slope stability. Rui et al. (2019) used the PSO-SVM to estimate the total organic carbon (TOC) content from DT (acoustic log), DEN (bulk density), GR (natural gamma-ray), SP (natural potential), and several array resistivity logs such as M2R1 to M2RX. They found that this model can be used as an efficient and reliable method to estimate TOC content. Other researchers, such as Tang et al. (2019), Wang et al. (2013), and Du et al. (2017), also used the PSO-SVM in their studies. Moazenzadeh et al. (2018) coupled support vector machines (SVM) with meta-heuristic optimization algorithms to estimate evaporation values. The results showed that the hybrid model produced better estimates than the SVM model alone. In another study, Ghorbani et al. (2017) estimated the field capacity and wilting point of the soil by combining the SVM and the firefly algorithm, and they showed that the hybrid model performs well compared to the SVM method alone. Additionally, Mohammadi et al. (2020b) used a hybrid of the grey wolf optimizer and the SVR method for modeling lake water level, and they stated that the hybrid model outperforms the SVR model.

Several studies have recently used a hybrid of the invasive weed optimization algorithm (IWO) and support vector machines (SVM) to find complex relations between inputs and outputs in numerous engineering problems. For example, Huang et al. (2013) applied IWO to optimize the parameters of SVM. Goli et al. (2018) compared the ability of IWO with three other meta-heuristic algorithms, namely particle swarm optimization (PSO), genetic algorithm (GA), and cultural algorithm (CA), to improve artificial intelligence models, namely support vector regression (SVR), multilayer perceptron (MLP), and adaptive-neural-based fuzzy inference system (ANFIS), in predicting the demand of dairy products (DDP). Their results showed that the hybrid IWO and ANFIS model produced better estimates than the other hybrid models. These studies confirm that when meta-heuristic optimization algorithms are used to learn the target function in intelligence models such as SVM, the resulting hybrid model learns better and therefore performs better than the SVM model alone. Because of these advantages of hybrid models, in the current study the SVM model was coupled with the IWO and PSO methods for estimating the CEC.

To the best of our knowledge, no previous study has used SVM-PSOIWO to estimate CEC. The advantages of the proposed new hybrid model (SVM-PSOIWO) are as follows: (1) the objective function for estimating the CEC is optimized simultaneously by two meta-heuristic optimization algorithms (IWO and PSO); (2) local and global searches are combined to find optimal solutions close to the true answers, so the proposed model avoids becoming trapped in a local optimum because the candidate solutions are analyzed at the same time; (3) the proposed model is not sensitive to outliers or to noise in the data, and it can take extreme values into account; (4) the simultaneous optimization of the target function by two meta-heuristic algorithms makes it possible to minimize the target function of the SVM model as far as possible, so that the CEC values are estimated with the highest accuracy and the least error.

The main goal of this study is to examine the ability of the SVM-PSOIWO method to estimate CEC. A secondary goal is to compare the proposed method with existing methods.

2 Materials and methods

2.1 Case study and data description

The soil physical and chemical data used in this study were obtained from two field sites, namely Taybad (latitude, 34.6983° N to 34.7000° N; longitude, 60.7667° E to 60.7817° E) and Semnan (latitude, 35.5667° N to 35.5816° N; longitude, 53.4667° E to 53.4817° E) (Emamgolizadeh et al. 2015). The area of each site was approximately 400 hectares. Two hundred and fifty soil samples were taken from each site from the top 30 cm of the soil profile. The samples were taken at random, with appropriate spatial distribution, giving 500 sampling locations across the two study sites. The distance between soil sampling points was between 100 and 400 m.

Cation exchange capacity (CEC) was determined by Bower’s method (Sparks et al. 1996). Soil texture, OM percentage, and pH were measured by the hydrometer technique (Gee and Bauder 1986), the Walkley–Black approach (Walkley and Black 1934; Nelson and Sommers 1982), and a pH-meter, respectively. According to the USDA soil classification criteria, the soils of the study area belong to two orders, Entisols and Aridisols (USDA, soil taxonomy 2010). Table 1 summarizes the statistical characteristics of the data. To examine the ability of the models to predict the CEC, the whole data set, consisting of 500 experimental data points of organic matter (OM), pH, clay, and soil cation exchange capacity (CEC), was split into training and testing subsets using simple random sampling. Overall, 80% of the data (N = 400 data points) were used for training, and the remaining 20% (M = 100 data points) for testing (see Fig. 1).

Table 1 Statistical characteristics of data (Emamgholizadeh et al. 2015)

Fig. 1 Sample series plot of CEC measurement data
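The simple random 80/20 split described above can be sketched as follows. This is only an illustration under stated assumptions: the samples are taken to be indexed 0 to 499, and the function name `random_split` and the seed value are hypothetical, not taken from the study.

```python
import random

def random_split(n_samples, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx) from a simple random shuffle of sample indices."""
    rng = random.Random(seed)            # fixed seed only to make the sketch repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)                 # simple random sampling without replacement
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = random_split(500)
print(len(train_idx), len(test_idx))     # sizes of the training and testing subsets
```

With 500 samples and a training fraction of 0.8, the two subsets contain 400 and 100 indices and do not overlap.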

Previous studies have shown that CEC depends on many factors such as soil texture, organic matter, soil humus content, and soil pH. Among these parameters, OM, clay, and pH are the most important (Bell and Van Keulen 1995; Brady and Weil 2016; Emamgolizadeh et al. 2015; Fooladmand 2008). Bell and Van Keulen (1995) and Krogh et al. (2000) showed that there is a positive correlation between CEC and soil pH. As the soil pH increases, the hydrogen held by organic colloids and silicate clays (kaolinite) is ionized and replaced; therefore, the number of negative charges on the colloids increases and, as a result, the CEC value increases (Pratt 1961; Sparks 2012). Soil organic matter (OM) is another important soil parameter that contributes significantly to the CEC because of its high surface area and high negative electrical charge. Studies have shown that the CEC is higher near the soil surface, where the organic matter content is higher, and conversely decreases at greater soil depths (Oorts et al. 2003; Parfitt et al. 1995; Sparks 2003). As with OM and pH, there are several reports on the impact of clay content on the CEC of soils. Clay can adsorb and retain cations because of the large number of negative charges on its surface, thereby increasing the CEC (Amini et al. 2005; Emamgolizadeh et al. 2015; Seybold et al. 2005).

A correlation analysis was carried out to examine the relationship between soil CEC and clay, OM, and pH (see Fig. 2). For this purpose, the Pearson product-moment correlation was computed to measure the strength of the linear relationship between the variables. Figure 2 shows that there is a high correlation between CEC and OM, with R = 0.83. The correlations of CEC with clay and pH are 0.76 and 0.54, respectively.

Fig. 2 Pearson product-moment correlations for all data sets
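The Pearson product-moment correlation used in the analysis above can be computed as in the following minimal sketch. The function name `pearson_r` and the toy OM and CEC values are hypothetical and serve only as an illustration; they are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, NOT data from the study
om = [0.5, 1.0, 1.5, 2.0, 2.5]
cec = [8.0, 10.5, 11.0, 14.0, 15.5]
print(round(pearson_r(om, cec), 3))
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear association.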

2.2 Support vector machine (SVM)

The SVM is a supervised learning method that was first introduced by Vladimir Vapnik in 1995. The support vector machine is an efficient learning system based on constrained optimization theory that applies the structural risk minimization principle and results in an optimal solution (Tang et al. 2019). To handle vectors that are not linearly separable, a kernel function such as a polynomial of a given degree, a radial basis function, or a hyperbolic tangent is used to map the observed multidimensional vectors to a higher-dimensional space. Recently, some researchers have suggested the radial basis function (RBF) as a powerful kernel function in soil and water studies (Moazenzadeh et al. 2018; Mohammadi et al. 2021). The RBF kernel was therefore used here, and its parameters were optimized by trial and error. Figure 3 shows a schematic structure of the SVM model.

Fig. 3 Schematic structure of the SVM model
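To make the role of the kernel concrete, the sketch below evaluates the common Gaussian form of the RBF kernel, K(x, z) = exp(-γ‖x − z‖²). The width parameter `gamma`, its default value, and the sample feature vectors are assumptions introduced for illustration; the study does not report them in this section.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical (OM, clay, pH) feature vectors, not taken from the study
a = (1.2, 25.0, 7.8)
b = (1.0, 24.0, 7.9)
print(rbf_kernel(a, a))   # identical inputs give the maximum similarity
print(rbf_kernel(a, b))   # similarity decays with the squared distance
```

Because the kernel depends only on the distance between two samples, inputs on very different scales (such as clay percentage and pH) are usually rescaled before training; whether this was done in the study is not stated here.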

2.3 Particle swarm optimization (PSO)

The PSO meta-heuristic algorithm was first proposed by Kennedy and Eberhart (1995) for the optimization of complicated processes. The PSO algorithm is inspired by the collective behavior of animal groups such as birds and fish (Assareh et al. 2010). In this algorithm, a group of agents, called particles, is spread over the search area. Each particle evaluates its position relative to the target. The particles adjust their position and velocity based on their current position, the best position they have visited so far, and the position of the best particle in the swarm:

$$V_{id}^{t} = w V_{id}^{t-1} + C_{1} r_{1}\left(P_{id}^{t} - x_{id}^{t}\right) + C_{2} r_{2}\left(P_{gd}^{t} - x_{id}^{t}\right), \quad d = 1, 2, \dots, D$$
(1)

where \(x_{id}^{t}\) is the position of particle i in dimension d (d = 1,…,D) at iteration t, \(V_{id}^{t}\) is the velocity of particle i in dimension d at iteration t, \(P_{id}^{t}\) is the best position found so far by particle i in dimension d, \(P_{gd}^{t}\) is the global best position of the swarm in dimension d, w is the inertia weight, C1 is the cognition learning factor, C2 is the social learning factor, and r1 and r2 are random values in [0,1].

The basic steps for implementing the algorithm are as follows: step (1) generate the initial swarm, step (2) evaluate the fitness of every particle in the swarm, step (3) update the velocity of every particle according to Eq. 1, step (4) move each particle to its next position using \(x_{id}^{t + 1} = x_{id}^{t} + V_{id}^{t}\), step (5) stop when the termination criterion is satisfied; otherwise, return to step 2.
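The five steps above can be sketched in Python as follows. This is a minimal global-best PSO on a toy objective, not the configuration used in the study: the parameter values (`w`, `c1`, `c2`, swarm size, iteration budget), the clipping of positions to the search bounds, and the `sphere` test function are all assumptions chosen for illustration.

```python
import random

def pso_minimize(f, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial swarm with zero initial velocity
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    # Step 2: evaluate fitness and record personal and global bests
    p_best = [xi[:] for xi in x]
    p_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 3: velocity update (Eq. 1)
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Step 4: position update, clipped to the bounds (an added safeguard)
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = f(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
    # Step 5: stop once the fixed iteration budget is used up
    return g_best, g_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)

best_x, best_val = pso_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_val)
```

In a hybrid with SVM, the objective `f` would typically be a prediction error (for example, RMSE on held-out data) evaluated for a candidate set of model parameters, rather than a toy function.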

2.4 Invasive weed optimization (IWO)

The invasive weed optimization (IWO) algorithm was introduced by Mehrabian and Lucas (2006). It is an intelligent, evolutionary algorithm for solving optimization problems. Its meta-heuristic procedure is inspired by the dynamic growth of weed colonies in nature (Safari et al. 2020). This iterative algorithm is suited to continuous functions and works in five steps: initialization, reproduction, spatial dispersal, competitive exclusion, and termination condition (Fig. 4).

Fig. 4 Sketch procedure of the basic IWO algorithm

Each step in the IWO algorithm is summarized below:

Step 1- Initialization: In the first stage, the initial population of weeds, \(X = \{x_1, x_2, \dots, x_{PS_0}\}\), is generated in the search space, where \(PS_0\) is the size of the initial population of weeds. Each weed \(x_i = (x_{i1}, x_{i2}, \dots, x_{in})\) is an n-dimensional real-valued vector, and each dimension \(x_{ik}\) of \(x_i\) is generated as follows:

$${\mathrm{x}}_{\mathrm{ik}}={\mathrm{lb}}_{\mathrm{k}} +\mathrm{ r}\times \left( {up}_{k}-{lb}_{k} \right),\mathrm{ i}= 1, 2, ... , {PS}_{0},\mathrm{ k}= 1, 2, ... ,\mathrm{ n}$$
(2)

where r is a uniform random number between 0 and 1. \({lb}_{k}\) and \({up}_{k}\) denote the lower and upper bounds for the k dimension, respectively.

Step 2- Reproduction: Each weed produces seeds based on its fitness. The number of seeds (\(s_i\)) produced by a weed (\(x_i\)) is determined by the fitness of the plant (Eq. 3). A weed with higher fitness has a greater chance of reproduction.

$${\mathrm{s}}_{\mathrm{i}}=\mathrm{floor}\left({s}_{min}+\frac{{s}_{max}-{s}_{min}}{{f}_{max}-{f}_{min}}\times \left(f\left({x}_{i}\right)-{f}_{min}\right)\right)$$
(3)

where \(s_i\) is the number of seeds generated by weed \(x_i\), \(f(x_i)\) is the fitness of \(x_i\), \(f_{min} = \min_{x_i \in X} f(x_i)\), and \(f_{max} = \max_{x_i \in X} f(x_i)\). The floor function rounds its argument down to the nearest integer (towards minus infinity). \(s_{min}\) and \(s_{max}\) are the numbers of seeds generated by the worst and the best weeds in the population, respectively. The generated seeds are normally distributed with zero mean, but their variance changes from one iteration to the next.

Step 3- Spatial Dispersal: This step introduces randomness and adaptation into the IWO algorithm. In order to group fitter plants and eliminate inappropriate ones, the nonlinearity at each iteration must be decreased. To achieve this, the standard deviation (σ) of the normal distribution is reduced in each generation from a specified initial value (σ0) to a final value (σf) according to Eq. 4:

$$\sigma_{iter} = \frac{\left(iter_{max} - iter\right)^{\alpha}}{\left(iter_{max}\right)^{\alpha}} \times \left(\sigma_{0} - \sigma_{f}\right) + \sigma_{f}$$
(4)

where \(\sigma_{iter}\) is the standard deviation at the current iteration iter, \(iter_{max}\) is the maximum number of iterations, and α, which is generally set to 3, is the nonlinear modulation index.

Step 4- Competitive Exclusion: In this step, the existing weeds and the seeds produced in the reproduction step are joined together to form the next-generation population. Because the number of weeds in a colony must not exceed a given maximum allowable population, PSmax, competitive elimination is applied to the members of the population, and weeds with lower fitness are eliminated.

Step 5- Termination Condition: Steps 2 to 4 are repeated until a given termination condition is met. The termination condition can be the maximum number of iterations or the maximum elapsed CPU time.
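The five IWO steps and Eqs. 2 to 4 can be put together in the following minimal sketch. It is an illustration under stated assumptions rather than the implementation used in the study: fitness is taken as the negative of a cost to be minimized, seeds are clipped to the search bounds, the parameter defaults are arbitrary, and the function names and the `sphere` objective are hypothetical.

```python
import math
import random

def init_weeds(bounds, ps0, rng):
    """Eq. 2: x_ik = lb_k + r * (up_k - lb_k), with r uniform in [0, 1)."""
    return [[lb + rng.random() * (up - lb) for lb, up in bounds] for _ in range(ps0)]

def seed_count(fit, f_min, f_max, s_min, s_max):
    """Eq. 3: the number of seeds grows linearly with fitness (higher is better)."""
    if f_max == f_min:                       # degenerate colony: all weeds equally fit
        return s_max
    ratio = (fit - f_min) / (f_max - f_min)  # same expression as Eq. 3, regrouped
    return math.floor(s_min + (s_max - s_min) * ratio)

def sigma_at(it, iter_max, sigma0, sigma_f, alpha=3):
    """Eq. 4: the standard deviation shrinks from sigma0 to sigma_f."""
    return ((iter_max - it) ** alpha) / (iter_max ** alpha) * (sigma0 - sigma_f) + sigma_f

def iwo_minimize(cost, bounds, ps0=10, ps_max=25, s_min=0, s_max=5,
                 iter_max=100, sigma0=1.0, sigma_f=0.001, seed=0):
    rng = random.Random(seed)
    weeds = init_weeds(bounds, ps0, rng)                   # Step 1: initialization
    for it in range(iter_max):
        fits = [-cost(w) for w in weeds]                   # assumed: fitness = -cost
        f_min, f_max = min(fits), max(fits)
        sigma = sigma_at(it, iter_max, sigma0, sigma_f)
        offspring = []
        for w, fit in zip(weeds, fits):
            # Step 2: reproduction, fitter weeds produce more seeds
            for _ in range(seed_count(fit, f_min, f_max, s_min, s_max)):
                # Step 3: spatial dispersal around the parent, clipped to the bounds
                child = [min(max(xk + rng.gauss(0.0, sigma), lb), up)
                         for xk, (lb, up) in zip(w, bounds)]
                offspring.append(child)
        # Step 4: competitive exclusion, keep the ps_max lowest-cost members
        weeds = sorted(weeds + offspring, key=cost)[:ps_max]
    return weeds[0], cost(weeds[0])                        # Step 5: budget reached

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)

best_x, best_cost = iwo_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_cost)
```

The large initial standard deviation lets early seeds explore widely, while the cubic decay in `sigma_at` concentrates later seeds near the best weeds found so far.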

2.5 Hybrid models (SVM-PSO and SVM-PSOIWO)

The SVM model does not require complicated calculations, but its network weights need to be adjusted and its neurons coordinated during local convergence and optimization in the network. One of the novelties of this study is the application of the new hybrid SVM-PSOIWO method, in comparison with the basic SVM and SVM-PSO, to obtain a rapid and efficient method for predicting the CEC in the study area. To optimize the training performance, the PSO algorithm was first integrated with the ordinary SVM model to construct a single-phase SVM-PSO model, in which PSO determines the optimized values of the ordinary SVM model parameters (i.e., weights and biases) during training. Then, a two-phase hybrid model (SVM-PSOIWO) was constructed to further improve the SVM-PSO model by acquiring the best synaptic weights and biases within the hidden layers of the two-phase hybrid model. SVM-PSOIWO stops when a mathematical fit between the support vector machine weights and the IWO is reached, or when the maximum number of iterations is reached. It is a hybrid estimation procedure that combines the capabilities of the support vector machine with those of the optimization algorithms. Some research has shown that such hybrid techniques can produce more successful predictions (Ghorbani et al. 2017; Moazenzadeh and Mohammadi 2019; Mohammadi and Mehdizadeh 2020). The flowchart of the SVM-PSOIWO is shown in Fig. 5.

Fig. 5 Schematic diagram of the SVM-PSOIWO hybrid method, including its inputs and output

2.6 Model performance criteria

The estimated soil CEC values were compared with observed values using five different performance evaluation criteria: the root mean square error (RMSE), the coefficient of determination (R²), the mean absolute error (MAE), the relative root mean square error (RRMSE), and the mean absolute percentage error (MAPE). Table 2 shows the mathematical expressions of these performance evaluation criteria,

Table 2 Mathematical expressions of statistical metrics

where \(O_i\) is the observed CEC value, \(P_i\) is the predicted CEC value, n is the number of CEC data points, and the bar denotes the mean of the variable.
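The five criteria can be computed as in the sketch below. Because the expressions of Table 2 are not reproduced in this text, the code uses common textbook definitions, which are assumptions: RRMSE is taken as RMSE divided by the observed mean (in percent), MAPE as the mean absolute relative error (in percent), and R² as the squared Pearson correlation between observed and predicted values. The function name `metrics` and the sample values are hypothetical.

```python
import math

def metrics(obs, pred):
    """Return RMSE, MAE, RRMSE (%), MAPE (%), and R2 for paired observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    errors = [o - p for o, p in zip(obs, pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    rrmse = 100.0 * rmse / mean_o                       # assumed: percent of the observed mean
    mape = 100.0 / n * sum(abs(e / o) for e, o in zip(errors, obs))
    # Assumed convention: R2 as the squared Pearson correlation
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    r2 = cov * cov / (var_o * var_p)
    return {"RMSE": rmse, "MAE": mae, "RRMSE": rrmse, "MAPE": mape, "R2": r2}

# Hypothetical values for illustration only, not data from the study
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]
print(metrics(observed, predicted))
```

RMSE and MAE carry the units of CEC, whereas RRMSE, MAPE, and R² are dimensionless, which makes them easier to compare across data sets.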

3 Results and discussion

3.1 CEC estimates from the SVM, SVM-PSO, and SVM-PSOIWO models

In this study, the SVM, SVM-PSO, and SVM-PSOIWO methods were employed to predict the soil CEC value. For this purpose, 80% of the data (400 data points) was used for training the predictor models, and the remaining 20% was used in the testing stage. The proper selection of input data for the models, i.e., SVM, SVM-PSO, and SVM-PSOIWO, plays an important role in the accurate estimation of the CEC. Therefore, based on the Pearson correlation analysis, three variables, namely OM, clay, and pH, were selected from the measured physical and chemical parameters as input data to the models. Three scenarios for the input configurations were defined (see Table 3), ordered by the correlation of the input parameters with the CEC. The first scenario includes only OM, which has the highest correlation with the CEC. In the second scenario, clay, which has the second-highest correlation with the CEC, is added to the first scenario. Finally, the third scenario includes OM, clay, and pH.

Table 3 The scenarios of the input combinations of models

After the scenarios in Table 3 were defined, the input configurations were supplied to the models. Tables 4 and 5 show the RMSE, MAE, MAPE, RRMSE, and R² of the CEC estimates from the SVM, SVM-PSO, and SVM-PSOIWO methods for the training and testing stages, respectively. As can be seen in Tables 4 and 5, with the first input configuration (i.e., OM), the RMSE (R²) of the CEC estimates from SVM, SVM-PSO, and SVM-PSOIWO are 0.419 cmol(+) kg⁻¹ (0.772), 0.334 cmol(+) kg⁻¹ (0.846), and 0.298 cmol(+) kg⁻¹ (0.888), respectively, for training, and 0.429 cmol(+) kg⁻¹ (0.740), 0.367 cmol(+) kg⁻¹ (0.807), and 0.316 cmol(+) kg⁻¹ (0.857) for testing.

Table 4 Result of models related to the training phase
Table 5 Result of models related to the testing phase
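As a worked example of how the RMSE values quoted above for the first input configuration translate into relative improvements, the following sketch computes the percentage reduction of each hybrid model relative to the plain SVM. The helper name `percent_reduction` is hypothetical; the RMSE values are those reported in the text for the OM-only scenario.

```python
def percent_reduction(baseline, improved):
    """Relative decrease of an error metric, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline

# RMSE values quoted in the text for the first input configuration (OM only)
rmse_train = {"SVM": 0.419, "SVM-PSO": 0.334, "SVM-PSOIWO": 0.298}
rmse_test = {"SVM": 0.429, "SVM-PSO": 0.367, "SVM-PSOIWO": 0.316}

for stage, rmse in (("training", rmse_train), ("testing", rmse_test)):
    for model in ("SVM-PSO", "SVM-PSOIWO"):
        drop = percent_reduction(rmse["SVM"], rmse[model])
        print(f"{stage}: {model} lowers RMSE by {drop:.1f}% relative to SVM")
```

For this single configuration, the reductions are roughly 20% and 29% in training and 14% and 26% in testing for SVM-PSO and SVM-PSOIWO, respectively.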

For the second input configuration (i.e., OM and clay), adding clay to the first input configuration increased the accuracy of the CEC estimates. The RMSE varies from a minimum of 0.243 cmol(+) kg⁻¹ to a maximum of 0.408 cmol(+) kg⁻¹ over the training and testing stages. Comparing the results of the first and second input configurations indicates that the average RMSE of the CEC estimates decreased by 15.17% and 8.82% for the training and testing stages, respectively. Finally, for the third input configuration (i.e., OM, clay, and pH), the average RMSE decreased by 31.95% and 19.78% compared to the first and second input configurations, respectively, for training, and by 24.19% and 17.76% for testing.

Compared to the SVM model, the SVM-PSO model estimated the CEC value more accurately: the average RMSE, MAE, MAPE, and RRMSE decreased by 16.72%, 20.59%, 20.67%, and 16.60%, respectively, in the testing stage. Similarly, a comparison of the SVM, SVM-PSO, and SVM-PSOIWO models showed that the SVM-PSOIWO estimates were much more accurate than those of both the SVM and SVM-PSO methods. The findings in Tables 4 and 5 show that the RMSE of SVM-PSOIWO decreased by approximately 35.81% and 19.49% compared to the SVM and SVM-PSO models, respectively, for training, and by 31.64% and 17.92% for testing. Overall, the results of this study imply that the SVM-PSOIWO model is able to estimate the CEC values with low error, which indicates that coupling the support vector machine (SVM) with particle swarm optimization (PSO) and the integrated invasive weed optimization algorithm was successful.

To illustrate the performance of the SVM, SVM-PSO, and SVM-PSOIWO models, the scatter plots and residual (error) plots of observed and predicted CEC values for the best input configuration (i.e., OM, clay, and pH) are drawn in Fig. 6 for the testing phase. The agreement between the measured and predicted CEC was very good for the SVM-PSOIWO model in the training stage (R² = 0.953, RMSE = 0.190 cmol(+) kg⁻¹, MAE = 0.132 cmol(+) kg⁻¹) and in the testing stage (R² = 0.924, RMSE = 0.229 cmol(+) kg⁻¹, MAE = 0.152 cmol(+) kg⁻¹).

Fig. 6 Scatter plots and residual (error) plots of measured and predicted CEC for the SVM3, SVM-PSO3, and SVM-PSOIWO3 models in the testing phase

Reporting the optimized parameters of the models used in a hydrological modeling process is very important because it helps other researchers benchmark their new models against models with optimized parameters (Mohammadi 2019). For this reason, Table 6 lists the optimized parameters and structure of the models used in this study.

Table 6 Parameters setting for models used

Box plots were also used to compare the ability of the SVM, SVM-PSO, and SVM-PSOIWO models to predict the CEC value. Figure 7 shows the results of the three models under the three scenarios against the measured data in the testing stage. The box plot compares the statistical characteristics of the measured and predicted soil CEC values. In this figure, the green color represents the first quartile (25% of the data), which lies below the average of the data, and the orange color represents the third quartile (75% of the data). As can be seen, among all models and scenarios, the SVM-PSOIWOS3 model (SVM-PSOIWO with the third scenario) has statistical characteristics most similar to those of the observed values, which means that the third scenario (i.e., OM, clay, and pH) provides the most adequate inputs for modeling CEC. This also suggests that the newly proposed model, used here for the first time to predict the CEC, is successful and can estimate the CEC values with less error than previous popular models.

Fig. 7 Box plot of the measured and predicted CEC for the testing phase

In Fig. 8, the methods (SVM, SVM-PSO, and SVM-PSOIWO) are compared according to the RRMSE index in the training and testing stages. As expected, in all scenarios and for all methods, the accuracy in the training stage was better than in the testing stage. Based on this index, the third combination of data (i.e., OM, clay, and pH) is the best and most effective combination of input data for estimating the CEC. As shown in Fig. 8, the new SVM-PSOIWO model reduced the value of the RRMSE index by almost half compared to the SVM model, which is a positive feature of the newly proposed method. In both the training and testing stages, the results of the SVM-PSOIWO model are much more satisfactory than those of the SVM and SVM-PSO methods, so the SVM-PSOIWO model significantly reduces the error in the CEC estimate.

Fig. 8 Three-dimensional bar graph of model performance for CEC prediction in the training and testing stages according to the RRMSE index

3.2 Comparing SVM-PSOIWO with previous studies

As shown in Sect. 3.1, the comparison of the CEC estimates from the three methods revealed that the best results were achieved when the SVM-PSOIWO model was used with the third combination of data (i.e., OM, clay, and pH). This finding is consistent with the study of Emamgholizadeh et al. (2015). To further evaluate the capability of the SVM-PSOIWO method in estimating the CEC, the result of the SVM-PSOIWO model was compared with those of previous studies that used the ANN, GEP, MARS, and MLR models to estimate CEC. The statistical indices of all models in the testing stage are given in Table 7. As shown, the CEC estimates from the SVM-PSOIWO model, with R² = 0.924 and RMSE = 0.229 cmol(+) kg⁻¹, are accurate and reduce the RMSE by 9.1%, 28.0%, 38.1%, and 43.9% compared to ANN, GEP, MARS, and MLR, respectively. Overall, the estimating performance of the SVM model improves when it is coupled with particle swarm optimization (PSO) and the integrated invasive weed optimization algorithm instead of being used in its conventional form. The results of this study also suggest that SVM-PSOIWO is a viable alternative to commonly used models such as ANN, GEP, MARS, and MLR for estimating CEC.

Table 7 Comparing the performance of different studies

4 Conclusion

Soil cation exchange capacity (CEC) is an important parameter in agriculture and soil science. In this research, the SVM-PSOIWO is proposed as a new method for estimating CEC. Physical and chemical data (i.e., clay, OM, and pH) from two field sites, Taybad and Semnan in Iran, were used to estimate CEC. Three configurations of these input data were used to train and test the models. The performance of all three methods (SVM, SVM-PSO, and SVM-PSOIWO) was found to be promising for estimating the CEC as a function of physical and chemical input data. However, the SVM-PSOIWO performed better than the individual model (SVM) and the hybrid model (SVM-PSO). Moreover, the experiments demonstrated that the combination of clay, OM, and pH is the most effective set of input parameters for an accurate estimation of the CEC values, compared with the one-input and two-input combinations. In other words, the performance of the models in estimating the CEC was greatly improved by increasing the number of input variables. Since clay, OM, and pH are easy and inexpensive to measure, the proposed hybrid model can be employed to estimate CEC with acceptable accuracy. The CEC estimated by the SVM, SVM-PSO, and SVM-PSOIWO was also compared with the results of existing studies that used ANN, GEP, MARS, and MLR. It was found that the SVM-PSOIWO model estimates the CEC more accurately than those studies. In general, the results of this study showed that the SVM-PSO improved by the IWO algorithm can be used as a predictive tool with high-performance optimization to estimate the CEC and other soil and water parameters. The high precision of the proposed method (PSOIWO) can be related to its capability to find the best outcome in the search space, because this hybrid algorithm searches for the optimal answer in the local and global search space simultaneously. Another advantage of this algorithm is that when it finds an optimal solution, the other candidate solutions in the neighborhood of that solution are also analyzed, which prevents it from being trapped in a local optimum. Although the SVM-PSOIWO model improves the prediction of CEC compared to other artificial intelligence (AI) models, its shortcoming, as with other AI models, is that it acts like a black box, and researchers must take this into account in their studies.