Introduction

The demand for renewable energy has increased rapidly in recent years because of the negative effects of fossil-fuel-based energy sources on the environment and their contribution to climate change. Consequently, there has been growing interest in clean energy resources such as solar energy [1]. Predicting the solar radiation reaching the Earth's surface is therefore of paramount importance for many applications, including engineering design, heating and cooling systems, building energy systems, medical studies, agriculture, climatological research, evapotranspiration studies, solar collector efficiency, and seawater desalination [2, 3]. Reliable solar radiation statistics are essential for successful outcomes in all of these fields.

Algeria is strategically positioned in the Sunbelt, which gives it a significant solar energy potential. Across the national territory, annual sunshine duration exceeds 3,000 h, and in the high plateaus and the Sahara it can reach approximately 3,900 h [4]. Unfortunately, accurate solar irradiation measurements remain difficult to obtain in many regions of Algeria, primarily because of the high cost of measurement equipment such as solarimeters and pyranometers and the expense of operating, maintaining, and calibrating the associated systems. Although Algeria has numerous meteorological stations throughout the country, solar irradiation measurements are not always available. Gaps are often caused by recording failures during frequent power outages, especially in the summer months, or by limits on the number of variables that can be recorded. It is therefore important to use sophisticated procedures for accurately estimating solar radiation from readily available meteorological data [5]. Because solar radiation is not measured in all regions of the Earth, various models have been devised to estimate it in areas lacking monitoring stations. Global solar radiation models continue to improve; however, a given model may produce different results from one region to another. It is therefore worthwhile to construct location-specific models whenever possible; for example, a study was conducted to formulate solar radiation models tailored specifically to India [6].

Over time, several artificial intelligence (AI) methods have been developed for predicting global solar radiation on a horizontal surface. These include a support vector machine approach that accounts for the influence of fog and haze [7] and the least squares support vector machine (LS-SVM) [8]. Hansen and Salamon [9] introduced the concept of an aggregated, or stacked, neural network, which improves a model's generalization by training multiple neural networks and combining their outputs. This approach has been widely applied [10], and research has shown that stacked neural networks generalize better than individual ones [11]. Studies in the literature have also shown that artificial neural networks (ANN) outperform traditional empirical models in predicting solar radiation (SR) [12]. Support vector machines (SVM), developed by Vapnik [13], have recently been widely applied in computer science, bioinformatics, and environmental science [14, 15], and previous studies have reported that SVMs perform better than neural networks and other statistical models [14]. However, there is limited literature on the application of SVMs to SR prediction. The goal of this work is therefore to develop a bootstrap aggregated support vector machine (BASVM) model to predict the hourly global solar radiation received on a horizontal plane over one year in the Bouzareah region of Algeria. The model is based on nine meteorological and climatological parameters: eight inputs (Month, Day, Time (h), Average Temperature (K), Relative Humidity (%), Atmospheric Pressure (mbar), Wind Speed (m/s), and Wind Direction (°)) and the output, global solar radiation (Wh/m2).

To the best of our knowledge, no study using a bootstrap aggregated support vector machine, whether for predicting solar radiation or in any other domain, has been reported in the literature, so this is the first study to predict global solar radiation with the BASVM model. We compare the BASVM with the individual support vector machines (ISVM) and with a single support vector machine (SSVM). The paper is structured as follows: Section 2 presents the materials and methods, Section 3 introduces the evaluation criteria, Section 4 covers the results and discussion, and Section 5 summarizes the conclusions of our research.

Literature Review

The modeling of solar irradiation has been the subject of many studies, most of them conducted in the last two decades. Quej et al. [16] employed three machine learning algorithms, namely SVM, ANN, and ANFIS, to predict daily global solar radiation at six stations in Mexico. The algorithms were trained on extraterrestrial solar radiation, rainfall, and minimum and maximum temperature data. The comparative analysis showed that SVM outperformed the other models, achieving the best results with a root mean square error (RMSE) of 2.578, a mean absolute error (MAE) of 1.97, and a coefficient of determination (R2) of 0.689. Dos Santos et al. [17] evaluated hourly and daily direct solar radiation with two methods, ANN and SVM, using 13 years of data; both methods performed well. Lima et al. [18] applied three predictive models, namely a multilayer perceptron back-propagation neural network (MLPBP-NN), a radial basis function (RBF) network, and a support vector machine (SVM), to predict daily solar radiation from solar irradiance and temperature inputs. The results showed that the proposed approach forecast daily solar radiation with high efficiency, especially in areas close to the Equator.

Takilate et al. [19] developed a novel approach to estimate inclined irradiation at 5-min intervals in three distinct climatic regions: Algiers and Ghardaïa in Algeria, and Malaga in Spain. Their model combines two conventional models, the Perrin Brichambaut and the Liu and Jordan models. The normalized root mean square error (nRMSE) ranged from 4.7% to 6.41%, indicating that the model predicts inclined irradiation accurately in these areas. Gao et al. [20] proposed a hybrid hourly irradiance forecasting method that combines CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) with a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM), and concluded that the hybrid approach outperforms a large number of alternative methods. Peng et al. [21] introduced a hybrid methodology that uses the recent CEEMDAN algorithm for pre-processing, the sine cosine search algorithm (SCA) for feature selection, and a Bidirectional Long Short-Term Memory network (BiLSTM) as the core prediction model; the resulting CEEMDAN-SCA-BiLSTM model achieved better forecasting accuracy than seven reference models. Keshtegar et al. [22] evaluated how accurately four empirical regression methods, namely Kriging, MARS, RSM, and M5 Tree, assess solar energy using diverse input data from Adana and Antakya in Turkey, and found that the Kriging model outperformed the cyclic MARS, RSM, and M5 Tree models. In the context of West Africa, Nwokolo et al. [22] provided a quantitative evaluation of the solar energy literature: classifying the models into sunshine-based, temperature-based, precipitation-based, cloud-cover-based, relative-humidity-based, and hybrid parameter-based models, they compiled 356 empirical models and 68 functional forms.
These studies collectively show the evolution of solar irradiation modeling, with a growing emphasis on advanced algorithms and hybrid methodologies to achieve more accurate and reliable predictions. In the present work, we develop a nonlinear model based on a bootstrap aggregated support vector machine (BASVM) to predict the hourly global solar radiation received on a horizontal plane over one year in the region of Bouzareah (Algeria).

Materials and Methods

In this study, we used parameters widely employed in the literature [23,24,25,26]. We collected hourly data for one year (2015) from the 'Shems' radiometric station of the Centre for Renewable Energy Development (CDER), located in Bouzareah, Algiers (latitude 36.8° N, longitude 3.17° E). These data were used to predict hourly global solar radiation with both the single support vector machine (SSVM) and the bootstrap aggregated support vector machine (BASVM), based on nine parameters. Figure 1 shows the measurement instruments at the Bouzareah station in Algeria.

Fig. 1

Photo of the measuring station at CDER [27]

This database (DB) comprises 3603 data points and was used to optimize the parameters of the bootstrap aggregated support vector machine (BASVM). Hourly values below 120 W/m2 (between 5 a.m. and 5 p.m.) were excluded, following the World Meteorological Organization (WMO), which defines sunshine duration as the time during which global solar radiation exceeds 120 W/m2 [28]. The input and output data were analyzed statistically in terms of the minimum (min), mean, maximum (max), sum, sample variance (Var), and standard deviation (STD), as detailed in Table 1.
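The screening step and the Table 1 statistics can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB/STATISTICA workflow: the function name and the sample values are hypothetical, values exactly equal to 120 W/m2 are assumed to be kept, and Var and STD are assumed to be the sample (n - 1) variance and standard deviation.

```python
import statistics

# Threshold quoted in the text (WMO sunshine criterion, [28]).
THRESHOLD_W_M2 = 120.0


def screen_and_describe(values, threshold=THRESHOLD_W_M2):
    """Keep values >= threshold and return them with Table 1-style statistics.

    Assumptions (not stated in the text): values exactly equal to the
    threshold are kept, and Var/STD are sample (n - 1) statistics.
    """
    kept = [v for v in values if v >= threshold]
    stats = {
        "min": min(kept),
        "mean": statistics.mean(kept),
        "max": max(kept),
        "sum": sum(kept),
        "var": statistics.variance(kept),  # sample variance
        "std": statistics.stdev(kept),     # sample standard deviation
    }
    return kept, stats


if __name__ == "__main__":
    # Hypothetical hourly global radiation values (W/m2), for illustration only.
    hourly = [80.0, 150.0, 300.0, 450.0, 110.0]
    kept, stats = screen_and_describe(hourly)
    print(kept, stats)
```

In the study the same statistics would be computed for every input and for the output column of the 3603-point database.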

Table 1 Numerical analysis of inputs and output

Support Vector Machines (SVM)

Support Vector Machine (SVM) is a supervised learning method that has become popular in recent years for predicting meteorological variables such as temperature [29], wind speed [30], and global solar radiation [7]. Owing to its simplicity and flexibility, it can handle a range of classification and regression problems in fields such as mechanical engineering [31], energy [32], and finance [33]. SVMs also tend to maintain good predictive performance even when sample sizes are limited.

An SVM regression model captures the nonlinear relationship between the inputs and the output by mapping the inputs into a higher-dimensional feature space, in which the regression function is linear. The output of the SVM model is given by [34]:

$$f(\mathbf{x}_i)=\mathbf{w}^{\mathrm T}\varphi(\mathbf{x}_i)+b,\quad i=1,2,\dots,n$$
(1)

\(f(\mathbf{x}_i)\): the value predicted by the SVM model for the ith input sample.

\(\varphi(\mathbf{x}_i)\): the implicitly defined nonlinear mapping that transforms the finite-dimensional input space into a higher-dimensional feature space.

\(\mathbf{w}^{\mathrm T}\): the (transposed) weight vector, whose components are the coefficients associated with the features in the high-dimensional feature space; it determines the contribution of each feature to the regression.

\(b\): the bias of the SVM model.

\(i=1,2,\dots,n\): the index of the input samples, where n is the total number of samples in the dataset.

Each sample in the dataset consists of a D-dimensional input vector \(\mathbf{x}_i\in\mathbb{R}^{D}\) and a scalar output \(y_i\in\mathbb{R}\).

The ν-SVR optimization problem solved on the training set is:

$$\begin{aligned}\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*},\,\varepsilon}\quad & R(\mathbf{w},\boldsymbol{\xi},\boldsymbol{\xi}^{*},\varepsilon)=\frac{1}{2}\Vert\mathbf{w}\Vert^{2}+C\left[\nu\varepsilon+\frac{1}{N}\sum_{i=1}^{N}\left(\xi_i+\xi_i^{*}\right)\right]\\ \text{subject to:}\quad & y_i-\mathbf{w}^{\mathrm T}\varphi(\mathbf{x}_i)-b\le\varepsilon+\xi_i\\ & \mathbf{w}^{\mathrm T}\varphi(\mathbf{x}_i)+b-y_i\le\varepsilon+\xi_i^{*}\\ & \xi_i,\ \xi_i^{*}\ge 0,\quad \varepsilon\ge 0,\quad i=1,2,\dots,N\end{aligned}$$
(2)

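\(\frac{1}{2}\Vert\mathbf{w}\Vert^{2}\): the regularization term (half the squared norm of the weight vector), which controls model complexity.

\(C\): the regularization constant that balances model complexity \(\Vert\mathbf{w}\Vert^{2}\) against the empirical risk (the penalty on the slack variables).

\(\nu\): the parameter (\(0<\nu\le 1\)) that is an upper bound on the fraction of training samples lying outside the \(\varepsilon\)-tube and a lower bound on the fraction of support vectors.

\(\varepsilon\): the half-width of the \(\varepsilon\)-insensitive tube, which is optimized as part of the problem.

\(\xi_i\) and \(\xi_i^{*}\): the slack variables giving the distance of the ith sample above and below the \(\varepsilon\)-tube, respectively.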

Because this is a convex quadratic programming problem, it can be solved through its dual, constructed with Lagrange multipliers:

$$\begin{aligned}\max_{\mathbf{a},\,\mathbf{a}^{*}}\quad & R(\mathbf{a},\mathbf{a}^{*})=\sum_{i=1}^{N}y_i\left(a_i-a_i^{*}\right)-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(a_i-a_i^{*}\right)\left(a_j-a_j^{*}\right)K(\mathbf{x}_i,\mathbf{x}_j)\\ \text{subject to:}\quad & \sum_{i=1}^{N}\left(a_i-a_i^{*}\right)=0\\ & 0\le a_i,\ a_i^{*}\le C/N\\ & \sum_{i=1}^{N}\left(a_i+a_i^{*}\right)\le C\nu\end{aligned}$$
(3)

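\(K(\mathbf{x}_i,\mathbf{x}_j)\): the kernel function, which satisfies Mercer's condition; for the radial basis function (RBF) kernel used in this study, \(K(\mathbf{x}_i,\mathbf{x}_j)=\exp\left(-\gamma\Vert\mathbf{x}_i-\mathbf{x}_j\Vert^{2}\right)\), where \(\gamma\) is the kernel parameter;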

\({\mathrm{a}}_{\mathrm{i}}\mathrm{ and }{\mathrm{a}}_{\mathrm{i}}^{*}\): the nonnegative Lagrange multipliers.

Solving the dual problem gives the regression function, in which only the training samples with \(a_i-a_i^{*}\neq 0\) (the support vectors) contribute:

$$\hat{y}=f(\mathbf{x})=\sum_{i=1}^{N}\left(a_i-a_i^{*}\right)K(\mathbf{x}_i,\mathbf{x})+b$$
(4)
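Equation (4) with the RBF kernel can be evaluated directly once the dual problem has been solved. The sketch below is a minimal illustration that assumes the support vectors, the differences \(a_i-a_i^{*}\), the bias b, and \(\gamma\) are already available (for example from a QP solver); the function names and numbers are hypothetical. For comparison, scikit-learn's `NuSVR` exposes analogous quantities through its `support_vectors_`, `dual_coef_`, and `intercept_` attributes.

```python
import math


def rbf_kernel(x_i, x, gamma):
    """RBF kernel K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate Eq. (4): y_hat = sum_i (a_i - a_i^*) K(x_i, x) + b.

    dual_coefs[i] holds the difference (a_i - a_i^*) for support vector i;
    samples whose difference is zero do not contribute and can be omitted.
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b


if __name__ == "__main__":
    # Hypothetical two-dimensional support vectors and coefficients.
    svs = [[0.0, 0.0], [1.0, 1.0]]
    coefs = [2.0, -0.5]
    print(svr_predict([0.5, 0.5], svs, coefs, b=1.0, gamma=0.5))
```

Because the sum runs only over support vectors, a model with fewer support vectors is also cheaper to evaluate.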

Bootstrap Aggregated Support Vector Machine (BASVM)

One important strategy for improving the robustness and performance of prediction models is to develop a collection of models, such as support vector machines (SVMs), and then combine them. Developing the Bootstrap Aggregated Support Vector Machine (BASVM) model involves resampling the training dataset using a MATLAB function. Figure 2 illustrates the BASVM, in which multiple individual SVM models (ISVM) are built to model the same underlying relationship; combining these individual models yields more robust and accurate predictions.

Fig. 2

Bootstrap aggregated support vector machine

The process of designing and optimizing the ISVM and BASVM architectures is depicted in Fig. 3. It begins by resampling the training dataset with the bootstrap technique to generate n different training datasets, where n = 10, 15, 20, 25, or 30. An ISVM model is then built on each of these training datasets and evaluated on the testing dataset. Finally, the ISVM models are combined by averaging their outputs:

$$y=\frac{\sum_{i=1}^{n}{y}_{i}}{n}$$
(5)

where \({y}_{i}\) is the output of the ith individual SVM (ISVM), \(y\) is the output of the BASVM, and n is the number of ISVM models; that is, the BASVM output is the mean of the ISVM outputs.
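The bootstrap-and-average procedure of Eq. (5) can be sketched as follows. This is a minimal sketch, not the authors' MATLAB implementation: to keep it runnable without an SVM library, the base learner is a 1-nearest-neighbour regressor used purely as a stand-in for an ISVM (in the study each base learner is an RBF-kernel SVM), and all function names and data are hypothetical.

```python
import random
import statistics


def bootstrap_replicate(X, y, rng):
    """Draw len(X) samples with replacement from (X, y)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_nearest_neighbour(X, y):
    """Stand-in base learner (NOT an SVM): 1-nearest-neighbour regression."""
    def predict(x):
        j = min(range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[j]
    return predict


def bagged_predict(X, y, x_new, n_models, fit, seed=0):
    """Train n_models learners on bootstrap replicates and average them (Eq. 5)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_models):
        Xb, yb = bootstrap_replicate(X, y, rng)
        model = fit(Xb, yb)            # one "individual" model per replicate
        outputs.append(model(x_new))   # y_i in Eq. (5)
    return statistics.mean(outputs)    # y = (1/n) * sum(y_i)


if __name__ == "__main__":
    # Hypothetical data: hour of day -> radiation (illustrative values only).
    X = [[float(h)] for h in range(6, 18)]
    y = [100.0 + 50.0 * min(h - 6, 17 - h) for h in range(6, 18)]
    for n in (10, 15, 20, 25, 30):
        print(n, bagged_predict(X, y, [12.0], n, fit_nearest_neighbour))
```

Passing the base learner as a function keeps the aggregation step independent of the model type, so an SVM trainer could be substituted for the stand-in without changing `bagged_predict`.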

Fig. 3

Flow diagram for support vector machine development (SSVM, ISVM, and BASVM (Stacking of 10, 15, 20, 25, and 30 ISVM))

Modeling the Support Vector Machine

This study introduces a novel approach for refining the architecture of the support vector machine. As illustrated in Fig. 3, it involves three support vector machine models: the single support vector machine (SSVM), the individual support vector machines (ISVM), and the bootstrap aggregated support vector machine (BASVM), which stacks 10, 15, 20, 25, or 30 ISVMs. The bootstrap technique is used to generate the resampled training datasets, and the BASVM output is the average of the ISVM outputs. The SVM models for predicting hourly global solar radiation were developed and analyzed using both MATLAB and STATISTICA software.

Evaluation Criteria

In our current study, we employed a variety of error measures to assess the effectiveness of our prediction models. These measures included the Correlation Coefficient (R), the Mean Absolute Error (MAE) along with its normalized counterpart (nMAE), the Model Predictive Error (MPE), the Root Mean Squared Error (RMSE) and its normalized counterpart (nRMSE), as well as the Standard Error of Prediction (SEP) [35, 36]:

$$\overline{\mathrm{y} }=\sum\nolimits_{\mathrm{i}=1}^{\mathrm{N}}{\mathrm{y}}_{\mathrm{i},\mathrm{cal}}/\mathrm{N}$$
(6)
$$\begin{array}{ccc}\mathrm{MAE}=\frac1{\mathrm N}{\textstyle\sum_{\mathrm i=1}^{\mathrm N}}\left|{\mathrm y}_{\mathrm i,\exp}-{\mathrm y}_{\mathrm i,\mathrm{cal}}\right|&;&\mathrm{nMAE}=\mathrm{MAE}/\overline{\mathrm y}\end{array}$$
(7)
$$\mathrm{MPE}(\%)=\frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_{i,\exp}-y_{i,\mathrm{cal}}}{y_{i,\exp}}\right|$$
(8)
$$\begin{array}{ccc}\mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm i=1}^{\mathrm N}\left({\mathrm y}_{\mathrm i,\mathrm{cal}}-{\mathrm y}_{\mathrm i,\exp}\right)^2}{\mathrm N}}&;&\mathrm{nRMSE}=\mathrm{RMSE}/\overline{\mathrm y}\end{array}$$
(9)

According to Despotovic et al. [37], the accuracy of a model is considered excellent if nRMSE < 10%, good if 10% < nRMSE < 20%, fair if 20% < nRMSE < 30%, and low if nRMSE > 30%.

$$\mathrm{SEP}(\%)=\frac{\mathrm{RMSE}}{\overline{y}_{\exp}}\times 100$$
(10)

where N is the total number of data points; \(y_{i,\exp}\) and \(y_{i,\mathrm{cal}}\) are the experimental and calculated values of global solar radiation for the ith data point, respectively; and \(\overline{y}_{\exp}\) is the mean of the experimental values.
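The error measures above can be computed together as sketched below. This is a minimal sketch under stated assumptions: nMAE and nRMSE are normalized by the mean of the calculated values as in Eq. (6), SEP by the mean of the experimental values, the ratios are multiplied by 100 so that they are expressed in percent as in the results, and R is taken to be the Pearson correlation coefficient because the text gives no formula for it. The function names are hypothetical, and the accuracy classes follow Despotovic et al. [37], with the handling of exact boundary values assumed.

```python
import math


def evaluation_metrics(y_exp, y_cal):
    """Compute R, MAE, nMAE, MPE, RMSE, nRMSE and SEP (Eqs. 6-10)."""
    n = len(y_exp)
    pairs = list(zip(y_exp, y_cal))
    y_bar_cal = sum(y_cal) / n   # Eq. (6): mean of calculated values
    y_bar_exp = sum(y_exp) / n   # mean of experimental values (used for SEP)

    mae = sum(abs(e - c) for e, c in pairs) / n                 # Eq. (7)
    mpe = 100.0 / n * sum(abs((e - c) / e) for e, c in pairs)  # Eq. (8)
    rmse = math.sqrt(sum((c - e) ** 2 for e, c in pairs) / n)   # Eq. (9)

    # Assumed Pearson correlation coefficient (the text gives no formula for R).
    sxy = sum((e - y_bar_exp) * (c - y_bar_cal) for e, c in pairs)
    sxx = sum((e - y_bar_exp) ** 2 for e in y_exp)
    syy = sum((c - y_bar_cal) ** 2 for c in y_cal)
    r = sxy / math.sqrt(sxx * syy)

    return {
        "R": r,
        "MAE": mae,
        "nMAE_pct": 100.0 * mae / y_bar_cal,
        "MPE_pct": mpe,
        "RMSE": rmse,
        "nRMSE_pct": 100.0 * rmse / y_bar_cal,
        "SEP_pct": 100.0 * rmse / y_bar_exp,  # Eq. (10), assumed denominator
    }


def despotovic_class(nrmse_pct):
    """Accuracy class from [37]; handling of exact boundaries is assumed."""
    if nrmse_pct < 10:
        return "excellent"
    if nrmse_pct < 20:
        return "good"
    if nrmse_pct < 30:
        return "fair"
    return "low"
```

Because nRMSE and SEP differ only in their denominators, they coincide whenever the calculated and experimental means are equal.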

Results and Discussion

Effect of the Division of Database

In this study, the database was divided into two subsets: a training dataset containing most of the data, and a testing dataset used to evaluate how well the Support Vector Machine (SVM) predicts data it has not seen during training. The three database partitions examined (Divisions 1, 2, and 3) are illustrated in Fig. 4.

Fig. 4

The division of the whole database: a "Division 1", b "Division 2", c "Division 3"

Figure 5a and b show the normalized Mean Absolute Error (nMAE), normalized Root Mean Squared Error (nRMSE), and Standard Error of Prediction (SEP) obtained when predicting hourly global solar radiation with the different database partitions. The first partition (Division 1) gave the best results in the testing phase, so the individual support vector machines (ISVM) were built on this partition.

Fig. 5

Effect of the division of database: a test phase, b total phase

Comparison between Different Stacking “BASVM” Models

To compare different BASVM configurations, five stacking models were implemented, combining 10, 15, 20, 25, and 30 ISVMs, respectively. For each model, the training data were resampled by bootstrap resampling with replacement [38] to create as many training datasets as there are ISVMs in the stack: 10, 15, 20, 25, and 30 datasets, respectively.

For each stacking model, an individual support vector machine (ISVM) was trained on each training dataset. Each ISVM used eight input parameters and a single output, the predicted global solar radiation. The radial basis function kernel was used for every model (SSVM and ISVM); C and gamma were varied within the ranges 10 to 13 and 13 to 14, respectively, with Nu set to 1. For each ISVM, the parameter values giving the highest correlation coefficient (R) were selected by trial and error.
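The trial-and-error selection of C and gamma can be sketched as an exhaustive grid search that keeps the pair with the highest R. This is a minimal sketch under stated assumptions: the grid step sizes are hypothetical, and `evaluate` is a hypothetical callback that, in practice, would train one RBF-kernel ISVM with Nu = 1 and return its correlation coefficient; the toy objective in the example is not an SVM.

```python
import itertools


def select_hyperparameters(evaluate, c_values, gamma_values):
    """Exhaustive grid search: return (C, gamma, R) with the highest R.

    `evaluate(C, gamma)` is a hypothetical callback that would train one
    ISVM with these values and return its correlation coefficient R.
    """
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        r = evaluate(c, gamma)
        if best is None or r > best[2]:
            best = (c, gamma, r)
    return best


if __name__ == "__main__":
    # Assumed grid inside the ranges quoted in the text (step sizes are hypothetical).
    c_grid = [10, 11, 12, 13]
    gamma_grid = [13.0, 13.25, 13.5, 13.75, 14.0]

    # Toy stand-in objective peaking at C = 12, gamma = 13.5 (NOT a real SVM).
    def toy_evaluate(c, gamma):
        return 0.99 - 0.001 * (c - 12) ** 2 - 0.01 * (gamma - 13.5) ** 2

    print(select_hyperparameters(toy_evaluate, c_grid, gamma_grid))
```

Exhaustive search is practical here because the quoted ranges are narrow; wider ranges would usually call for a coarse-to-fine search.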

In each stacking model, the output of the Bootstrap Aggregated Support Vector Machine (BASVM) was computed as the mean of the ISVM outputs (Eq. 5). The performance of the stacking models was evaluated with the correlation coefficient (R), normalized Mean Absolute Error (nMAE), normalized Root Mean Squared Error (nRMSE), and Standard Error of Prediction (SEP) for the training, testing, and total datasets [50].

Figure 6 compares nMAE, nRMSE, and SEP for the BASVM stacking models on the testing sample. The BASVM stacking 30 ISVMs performs better than the other stacking models, with nMAE = 4.5487%, nRMSE = 6.2509%, and SEP = 6.2493%. The rest of this paper therefore focuses on the BASVM (stacking 30 ISVMs) model.

Fig. 6

nMAE, nRMSE, and SEP for the different stacking support vector machine models for testing sample

Performance of SVM Models

The structures of the individual support vector machines (ISVM) and the single support vector machine (SSVM) are shown in Fig. 7. The ISVM and SSVM models have clearly different structures. In particular, the thirty ISVMs have fewer support vectors than the SSVM, and each ISVM achieves a higher correlation coefficient (R) than the SSVM.

Fig. 7

Optimized structure of the ISVM and SSVM models

Based on the preceding results, two support vector machine (SVM) models were retained for predicting global solar radiation: SSVM and BASVM (stacking 30 ISVMs). Figure 8a and b compare the experimental and calculated global solar radiation in the testing phase, together with the parameters of the fitted linear regression. For both models, these parameters are close to their ideal values [a = 1 (slope), b = 0 (intercept), R = 1 (correlation coefficient)].

Fig. 8

Evaluating global solar radiation through experimental and calculated data: a SSVM (testing phase), b BASVM (stacking of 30 ISVM) (testing phase)

In the testing phase, the regression parameters were [a, b, R] = [0.9264, 35.9732, 0.9727] for the SSVM model and [a, b, R] = [1.0173, -9.1196, 0.9913] for the BASVM model (stacking 30 ISVMs). In both models the slope is close to 1, indicating near-proportional agreement between calculated and experimental values, and the intercept b is small relative to the measured values (all of which exceed 120 W/m2), indicating minimal bias in the predictions. The correlation coefficients (R) of both SSVM and BASVM (stacking 30 ISVMs) fall within the range generally accepted as excellent (0.90 ≤ R ≤ 1.00). These results support the robustness of the developed SVM models and their ability to predict global solar radiation reliably.
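The slope a, intercept b, and R reported above can be obtained from an ordinary least-squares line through the (experimental, calculated) pairs. The sketch below assumes that the calculated values are regressed on the experimental ones (the text does not state the axis orientation of Fig. 8) and that R is the Pearson coefficient; the function name is hypothetical.

```python
import math


def agreement_line(y_exp, y_cal):
    """Least-squares fit y_cal = a * y_exp + b; returns (a, b, R)."""
    n = len(y_exp)
    mx = sum(y_exp) / n
    my = sum(y_cal) / n
    sxx = sum((x - mx) ** 2 for x in y_exp)
    syy = sum((y - my) ** 2 for y in y_cal)
    sxy = sum((x - mx) * (y - my) for x, y in zip(y_exp, y_cal))
    a = sxy / sxx                  # slope (ideal value 1)
    b = my - a * mx                # intercept (ideal value 0)
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient (ideal value 1)
    return a, b, r
```

A perfect model would return (1, 0, 1); a slope above 1 with a negative intercept, as for the BASVM, means slight under-prediction at low values and over-prediction at high values.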

Comparison Between ISVM, BASVM, and SSVM

Table 2 summarizes the performance of the thirty individual support vector machines (ISVM), the bootstrap aggregated support vector machine (BASVM), and the single support vector machine (SSVM) on the training and testing datasets, and, for ISVM and SSVM, on the total dataset. The metrics compared are the normalized Mean Absolute Error (nMAE), Model Predictive Error (MPE), normalized Root Mean Squared Error (nRMSE), and Standard Error of Prediction (SEP).

Table 2 Errors of ISVM and SSVM models

This comparison shows that the BASVM model is a credible alternative to the SSVM model. The performance of the support vector machines (SSVM, ISVM, and BASVM) varies across the training, testing, and total datasets: a model with small errors on the training dataset may show considerably larger errors on the testing dataset.

For instance, ISVM 20 achieves a low nRMSE of 5.6766% on the training set, but 9.1624% on the testing set and 6.5116% on the total dataset. The BASVM achieves a lower testing nRMSE of 6.2509%, which shows the gain in accuracy obtained by combining multiple models within the BASVM.

Figure 9 compares the testing results of the BASVM (stacking 30 ISVMs) and SSVM models. The BASVM gives lower errors than the SSVM, showing that the bootstrap aggregated model predicts global solar radiation more accurately than the single support vector machine.

Fig. 9

nMAE, MPE, nRMSE, and SEP of the bootstrap aggregated support vector machine “BASVM” (stacking 30 ISVM) and the single support vector machine “SSVM” for the testing set

Comparison with Other Models

To assess the significance of our findings, we compared them with studies by other researchers, focusing on solar radiation prediction models that use inputs similar to ours. Table 3 lists the results of these models alongside those of our study. This comparison supports the effectiveness and accuracy of the developed BASVM (stacking 30 ISVMs) model for predicting global solar radiation.

Table 3 Overview of various models for predicting global solar radiation

Conclusion

The main objective of this study was to develop and improve two robust support vector machine models, SSVM and BASVM (stacking 30 ISVMs), for the precise prediction of hourly global solar radiation. The comparison between SSVM and BASVM (stacking 30 ISVMs) highlights the robustness, reliability, and effectiveness of support vector machine models driven by meteorological input parameters: month, day, time, average temperature, relative humidity, atmospheric pressure, wind speed, and wind direction.

The results show a notable performance difference between the two models. In the testing phase, BASVM (stacking 30 ISVMs) achieves close agreement between the calculated and experimental data, with an nRMSE of 6.2509%, which falls in the excellent range (nRMSE < 10%) of Despotovic et al. [37]; SSVM records an nRMSE of 11.0112%, in the good range. The BASVM (stacking 30 ISVMs) model is therefore a valuable tool for predicting solar radiation, especially at locations without measurement equipment such as solarimeters or pyranometers and the associated systems. It is particularly useful when available data are limited, when outliers must be excluded, or when data are missing.

In addition, this model can support the installation of solar energy systems and the evaluation of thermal conditions in building studies, particularly in Algeria and in regions with similar climatic characteristics. Its accurate solar radiation predictions make it useful for both energy planning and building design in such areas.