1 Introduction

The construction sector is under growing pressure from global warming and high carbon emissions, whose consequences are becoming increasingly severe. Its transformation toward a green, low-carbon future has therefore been a prominent topic in recent years. Because CO2 is released during both manufacturing and consumption, the cement industry, which supplies the binder for construction materials, accounts for a considerable share of the total carbon footprint [1,2,3]. For example, the cement sector is responsible for approximately 5–7% of global CO2 emissions, with annual emissions of 2.2 Gt, and is among the sectors that cause the most environmental problems [4,5,6,7]. Consequently, interest in sustainable construction materials has grown strongly over the last few years. In this context, many researchers seeking innovative materials for a green transformation report that geopolymer composites can completely or partially replace cementitious materials [8,9,10].

Geopolymers are increasingly used in the construction sector because they significantly reduce the amount of CO2 emitted to the environment and, since they can be produced from waste materials, enable the disposal of factory waste. Geopolymer composites have been shown to reduce CO2 emissions by approximately 60% compared to traditional cementitious materials [11]. However, they have not replaced concrete in the production of structural systems on a global scale because some application disadvantages have not yet been sufficiently overcome [12]. They can nevertheless be used in numerous secondary construction applications, including pipelines, box culverts, pavements, and bridges [13,14,15].

Geopolymer mortars are formed by the chemical reaction of silica- and alumina-rich materials such as fly ash (FA), ground granulated blast furnace slag (GGBS), and metakaolin (MT) with alkaline activators [16,17,18]. In other words, geopolymers are inorganic compounds with three-dimensional Si–O–Al frameworks, synthesized from aluminosilicates that dissolve in alkaline solutions. Geopolymers are a novel class of ecologically friendly construction materials that provide several benefits over cementitious compounds, including reduced density, excellent fire and acid resistance, minimal shrinkage, great durability, and high compressive and flexural strength [19, 20].

Although geopolymers have recently been the subject of widespread academic study and some practical application, there are currently no production standards for them. Some recommendations and design guides exist, but these documents do not constitute a global specification [21]. For this reason, the proportions of the materials in geopolymer mixtures are mostly determined by trial and error. In addition, because geopolymers contain many different materials, the desired compressive and flexural strength depends on many parameters, such as the ratio of binders to each other, the alkali/binder ratio, the preferred alkali type, the sand/binder ratio, the alkali molar concentration, the water/binder ratio, and the curing temperature, duration, and type [2]. The chosen parameters do not always deliver the desired compressive and flexural strength, so the trials must be repeated. As a result, experimental studies consume considerable labor, time, and natural resources. Machine learning algorithms, which have gained popularity in recent years, are a potential solution to this issue.

Machine learning (ML) techniques such as artificial neural networks (ANN) [22], deep neural networks [23], decision trees (DT) [24], and random forests (RF) [25] have become widely used. The need for fast solutions driven by increasing industrialization has led algorithms to evolve and their accuracy to increase. The performance of an algorithm is commonly assessed with the coefficient of determination (R2). For example, machine learning solutions with an R2 of 60–70% could once be considered adequate, whereas today such solutions are no longer regarded as accurate enough. Recent studies support this. Nazar et al. [26] employed an ANN to evaluate the strength of FA-based geopolymers and obtained an R2 of 0.91. Su et al. [27] predicted the strength of GGBS and MT-based geopolymer pastes using regularized multivariate polynomial regression (RMP), achieving an R2 of 0.927. Huynh et al. [28] reported an R2 of 0.912 for a deep neural network that predicted the experimental mechanical properties of geopolymer concretes with FA as a precursor.

The mechanical characteristics of geopolymer materials can vary greatly depending on the type of precursor or binder, the type and molar concentration of the activator, and the curing configuration. This is attributed to the nonlinear interaction between these parameters, the geopolymer matrix, and the mechanical characteristics. Nguyen et al. [29] found that the mechanical strength of GGBS and FA-based geopolymers varied between 50.4 and 86.5 MPa. According to Kurt et al. [30], the flexural strength of obsidian (OB), GGBS, and MT-based geopolymer mortars ranged from 0 to 21 MPa. Because the mechanical characteristics of building materials used in structural engineering must be predicted with sufficient accuracy, proper machine learning algorithms are needed alongside extensive laboratory investigations. Previous studies [31,32,33,34] investigated the prediction of geopolymer composite properties using machine learning methods. They explored aspects such as mechanical strength, microstructure, and environmental sustainability, contributing to the advancement of predictive modeling for geopolymer materials.

Several studies in the literature also estimate the mechanical strength of geopolymer composites with machine learning algorithms. Li et al. [2] investigated the optimum mix design for geopolymer production, taking cost and carbon emissions into account, using a dataset compiled from the strength properties of GGBS and FA-based geopolymer concretes. The dataset covered parameters such as slag, F-class FA, sodium silicate, sodium hydroxide, GW, water content, superplasticizer, curing conditions, moisture, and aggregate size. Li et al. [2] sought the best mix under limiting factors using algorithms such as particle swarm optimization, RF, gradient boosting (GB), and backpropagation neural network (BPNN). The algorithms with the best prediction performance were BPNN (R2 = 0.76) and GB (R2 = 0.7). Rahmati and Toufigh [22] employed ANN, support vector regression (SVR), and nonlinear support vector regression (NSVR) to predict the compressive strength of FA and GGBS-based geopolymers subjected to high temperatures between 100 and 1000 °C; the ANN performed better than the other approaches. Amin et al. [24] estimated the strength of FA and GGBS-based geopolymer concretes using both ensemble and single machine learning algorithms, including DT, SVR, RF, and other models. According to the performance indicators, DT provided the best prediction, with an R2 of 0.93 and the lowest statistical error. Shen et al. [35] evaluated the compressive strength of geopolymers produced from solid waste using RF, GB, and XGBoost. All three methods produced high R2 values (0.911 for RF, 0.931 for GB, and 0.939 for XGBoost), with XGBoost performing best. Qi et al. [36] employed hybrid machine learning models for the quick screening of coal waste fly ash based on the link between its chemical composition and amorphous phase structure. That study used a variety of methods, including hybrid ML, regression tree (RT), RF, and the artificial bee colony (ABC) algorithm, and obtained an R2 of 0.87. Using Ca(OH)2 powder, Lin et al. [37] predicted the mechanical characteristics of hydrothermally solidified clay and construction materials. Various methods were utilized in that study, including RF, gated recurrent unit (GRU), k-nearest neighbors (kNN), BPNN, XGBoost, and Gaussian process regression (GPR); GPR produced the best result, with an R2 of 0.989 [37]. Tanyildizi [38] predicted the chemical process of FA-based geopolymers using deep long short-term memory (LSTM), kNN, and SVR, which predicted the dissolution peak time values with accuracies of 99.49%, 99.43%, and 92.86%, respectively. Although these studies applied multiple algorithms, the results differed between algorithms depending on factors such as binder material, alkali activator, curing duration and temperature, and experimental setup. Consequently, the performance of the candidate algorithms should be evaluated in order to minimize the labor and time spent in geopolymer production.

The main objectives of this paper are as follows:

  • To create a new dataset comprising compressive strength and flexural strength values of geopolymers with different binders, including obsidian (OB), glass waste (GW), and FA, to facilitate the evaluation of potential algorithms for strength prediction.

  • To utilize ML approaches in order to provide the best prediction method for flexural strength and compressive strength of geopolymers, aiming to address the significant impact of various parameters on the strength of geopolymers.

  • To determine and compare the performance of various machine learning algorithms, namely linear regression (LR), deep neural network (DNN), RF, k-nearest neighbors (kNN), and XGBoost, in predicting the mechanical properties of geopolymers, thereby evaluating the effectiveness of these algorithms.

  • To assess the sensitivity of the prediction models to different input variables (OB, GW, FA, age, and heat) through sensitivity analysis, aiming to identify the most influential factors on the prediction accuracy of compressive strength and flexural strength.

2 Literature Review

Machine learning, one of the most important subfields of artificial intelligence, is constantly improving. It can be used effectively in scientific fields as diverse as engineering, space, agriculture, and health [39]. Machine learning is often used to determine whether there is any association between a dataset and its constituent parts [26]. The XGBoost, LR, RF, kNN, and DNN techniques were used in this work. Because each algorithm rests on a different mathematical model, the algorithms can reach different performance levels on the same problem. Many researchers therefore aim to improve the performance of various algorithms by drawing on different statistical ideas. For example, LR determines whether there is a direct correlation between the dependent and independent variables. One of the earliest machine learning algorithms, linear regression rests on several assumptions, the first of which is that inputs and outputs have a linear relationship. It assigns regression coefficients to describe this linear relationship, and the gradient descent method can then be used to choose the optimal set of coefficients. In civil engineering applications, however, the relationship between a problem's inputs and outputs is frequently nonlinear. In addition, since errors in the inputs and in the model itself can be independent of each other, more complex and highly accurate algorithms are often needed for complex structural engineering problems [40]. For instance, DNN is a deep learning technique that, like the neural networks of the human brain, creates a network architecture by connecting several layers [41]. kNN is a lazy learning model that can be used for both classification and regression tasks. To predict an outcome, kNN selects the k training points closest to the test point; the Euclidean distance metric is used to measure the distance between the test and training data [42]. In contrast, RF uses decision trees as its base learners and builds each tree from a bootstrap sample with a randomly selected subset of the features. XGBoost (XGB) is an ensemble boosting technique that enhances gradient boosting decision tree regression (GBDT). By including a penalty term in the GBDT objective function, it restricts the number of leaf nodes and their values in the tree model, reducing model complexity and helping to avoid overfitting [40]. The primary difference between the two algorithms is that random forest builds its decision trees independently of one another, whereas XGBoost adds each new tree to compensate for the errors of the existing weak trees [43, 44]. Table 1 lists research in the literature on various algorithms used in machine learning and provides details on these investigations.

Table 1 List of related publications

As seen in Table 1, many different algorithms have been used to predict the strength of geopolymers produced from many different precursors. According to the literature review, the most commonly used precursors are fly ash and blast furnace slag. Our work is motivated by the fact that ML approaches have not been used to evaluate the strength of geopolymer mortars manufactured from waste glass. To address this gap in the scientific literature, 533 strength values each for compressive strength and flexural strength were analyzed separately using the LR, RF, kNN, DNN, and XGBoost algorithms. 80% of the data were used to train the algorithms, while the remaining 20% were used for testing. The impact of the inputs on compressive strength and flexural strength was investigated using Sobol sensitivity analysis. The data presented in the study are limited to those obtained from comprehensive laboratory experiments.
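The 80/20 partition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `train_test_split_indices`, the random seed, and the decision to round the test share up are hypothetical choices, not the procedure reported by the study.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_test = math.ceil(test_fraction * n_samples)  # assumption: round the test share up
    return indices[n_test:], indices[:n_test]


# 533 is the number of strength values per target reported in the text.
train_idx, test_idx = train_test_split_indices(533)
print(len(train_idx), len(test_idx))  # 426 107
```

Because 533 is not divisible by five, the split cannot be exactly 80/20; with the rounding assumed here it yields 426 training and 107 test samples.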

In this work, OB, GW, and FA were employed as binders to make geopolymer mortars. Although studies employing FA are common, strength data for alternative precursors are not available in the literature. Academic papers on the ML prediction of geopolymer mortars made with obsidian and waste glass powder are extremely few. In addition, rising accuracy requirements have recently popularized the XGBoost algorithm, for which R2 values above 0.95 are frequently reported. Moreover, the performance of the XGBoost algorithm has not been tested on the precursors OB and GW. To fill this gap in the literature, a dataset of 533 compressive strength and 533 flexural strength values was built from 13 distinct combinations of OB, GW, and FA.

3 Materials and Methods

Descriptions of machine learning methods utilized in the work are included in this section. The components and properties of the dataset used for algorithm selection and sensitivity analysis have been extensively defined. The machine learning-based data mining flow diagram that performs the mechanical strength prediction is presented in Fig. 1; through this diagram, the step-by-step operations can be better understood. The procedure applied during the creation of the dataset within the parameters of the study is shown in detail in Fig. 2.

Fig. 1 The flow diagram for the mechanical strength prediction using ML in data mining

Fig. 2 Experiment and data collection scheme

3.1 Materials

3.1.1 Data Collection

The dataset comprises the compressive strength and flexural strength values of two-part geopolymers containing GW, OB, and FA as binders. Geopolymer mortar samples were produced in 13 different combinations using the binders GW, OB, and FA, standard sand, and a 12 M NaOH alkaline activator. A predictive investigation into the influence of molar ratios on the compressive strength of geopolymer mortars is detailed in [49]. Three 40 × 40 × 160 mm prismatic samples were cast for each combination. Each prismatic specimen provided one flexural and two compressive strength values, so the three samples of each combination yielded three flexural and six compressive strengths. The mortar samples were kept in the mold for 24 h at room conditions to set. After demolding, the samples were cured for 72 h in ovens at temperatures between 75 and 120 °C in 15 °C increments (75, 90, 105, and 120 °C). After curing, the samples were stored in airtight ziplock bags in the laboratory until the day of testing. The resulting mechanical strength values make up the dataset, which consists of five inputs (GW, OB, FA, Age, and Heat) and two separate outputs, compressive strength and flexural strength. The whole process, including the laboratory work, experiments, and tests behind the dataset, is given schematically in Fig. 2.

3.1.2 Obsidian

Obsidian, also known as volcanic glass, is naturally occurring. The obsidian utilized in the tests conducted within the study was sourced from the obsidian deposits found in the Cagırankaya locality of the Ikizdere district in Rize province. Initially, the obsidian obtained in rock form was crushed using a jaw crusher to reduce the grain size. Subsequently, the reduced-grain-size obsidian was ground in a ball mill with a ratio of obsidian weight to ball weight of 1/24, preparing it for use in the study. The specific gravity is 2.6. Chemical contents are provided in Table 2.

Table 2 Chemical composition of binders

3.1.3 Fly Ash

Fly ash, a waste material from thermal power plants in the industrial sector, was utilized in the production of geopolymers as part of the study. The fly ash was obtained from the Zonguldak Catalagzi thermal power plant. Based on the chemical components determined through X-Ray Fluorescence Spectrometry (XRF) analysis, it is classified as class V according to TS EN 450 [50] standard and F-class FA according to ASTM C618 [51], as the sum of SiO2 + Al2O3 + Fe2O3 exceeds 70%. The specific gravity is 2.06. Chemical contents are provided in Table 2.

3.1.4 Glass Waste

GW from doors and windows commonly found in buildings was utilized in the study. The waste glass was ground in a ball mill with a GW weight-to-ball weight ratio of 1/24, rendering it suitable for use in the study. The specific gravity is 2.6. Chemical contents are provided in Table 2.

3.1.5 Feature Statistics Analysis

In this section, we present the feature statistics for two key datasets: compressive strength and flexural strength. The analysis provides insights into the distribution and characteristics of the features within each dataset. Tables 3 and 4 display the feature statistics for the compressive strength dataset and the flexural strength dataset, respectively.

Table 3 Feature statistics for compressive strength dataset
Table 4 Feature statistics for flexural strength dataset

These statistics describe the distribution and characteristics of the features within each dataset. Differences in the mean, median, and dispersion values between the two datasets suggest that they might have different underlying distributions. Understanding these differences helps in selecting appropriate modeling techniques and interpreting the results accurately. There are no missing values for any of the features, and each feature has 533 data points. Overall, the feature statistics analysis provides a foundation for the subsequent steps of the data analysis.

3.2 Methods

In this subsection, we describe the machine learning methods employed in our study to predict the compressive strength and flexural strength of geopolymer mortar. Each method offers distinct advantages and characteristics, contributing to a comprehensive analysis of the relationship between mortar composition and strength properties. Through the application of these methods, we aim to enhance our understanding of the factors influencing the mechanical properties of geopolymer mortar and to facilitate the development of optimized mortar formulations.

3.2.1 Linear Regression

LR is a statistical method used to establish the relationship between one or more independent variables (input) and a dependent variable (target) [52]. In our study, we utilize LR to understand how individual variables, namely OB, GW, FA, Age, and Heat, affect the compressive strength and flexural strength of geopolymer mortar. By analyzing the coefficients derived from the linear regression model, we can quantify the impact of each independent variable on the target variables and make predictions for new observations based on these coefficients.
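How coefficients quantify the effect of each input can be illustrated with a minimal ordinary least squares sketch. It is not the implementation used in the study: the helper names, the two example features (age in days and curing temperature in °C), and the generating coefficients are all hypothetical, and the toy targets are constructed so that the true coefficients are known.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 is the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # [intercept, coef_1, ..., coef_p]


# Hypothetical toy data (not from the study): columns are age [days], heat [°C].
X = [[7, 75], [7, 90], [28, 75], [28, 105], [90, 90], [90, 120]]
y = [5.0 + 0.2 * age + 0.1 * heat for age, heat in X]  # known generating model
intercept, coef_age, coef_heat = fit_linear_regression(X, y)
```

In this toy case the fit recovers the generating coefficients, so each coefficient reads directly as the change in predicted strength per unit change of its input. One caveat worth stating as an assumption: if the binder shares (OB, GW, FA) always sum to a constant, they are collinear with the intercept, and one of them would need to be dropped or a regularized solver used.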

3.2.2 Deep Neural Network Algorithm

We employ the radial basis function networks (RBFN) method, a type of DNN, to predict compressive strength and flexural strength of geopolymer mortar using a comprehensive dataset [53, 54]. RBFN is a supervised learning algorithm that constructs a learning model based on mortar properties such as OB, GW, FA, Age, and Heat, as well as their associated compressive and flexural strengths. By utilizing core and spread functions within its architecture, RBFN identifies patterns and correlations in the dataset, allowing for accurate predictions of compressive strength and flexural strength based on input characteristics.

3.2.3 Random Forest Regressor

The RF algorithm leverages multiple DTs to predict compressive strength and flexural strength of geopolymer mortar [55]. Each decision tree is trained independently on different subsets of the dataset, and predictions from all trees are combined to produce an average prediction, enhancing the accuracy and robustness of the model. Notably, RF can mitigate the impact of noisy or outlier data points, thereby improving prediction reliability [56, 57].
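The idea of training trees independently on different subsets and averaging their predictions can be sketched with a deliberately simplified ensemble. The assumptions are substantial: each "tree" here is a depth-1 stump, each stump sees one randomly chosen feature, and the data are synthetic, so this is a stand-in for the bagging principle rather than a full random forest or the model used in the study.

```python
import random


def fit_stump(X, y, feature):
    """Best single-threshold split on one feature (minimum squared error)."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, feature, thr, lm, rm)
    if best is None:  # feature is constant in this sample: always predict the mean
        mean = sum(y) / len(y)
        return (feature, float("inf"), mean, mean)
    return best[1:]


def fit_forest(X, y, n_trees=50, seed=0):
    """Train each stump on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feature = rng.randrange(p)                  # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


# Synthetic data: the target steps from 10 to 50 along feature 0; feature 1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [10.0 if i < 10 else 50.0 for i in range(20)]
forest = fit_forest(X, y)
low, high = predict_forest(forest, [2.0, 0.0]), predict_forest(forest, [17.0, 0.0])
```

Averaging dampens the influence of any single stump: the stumps that happened to draw the uninformative feature pull both predictions toward the mean, while the ensemble as a whole still separates the low and high regions.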

3.2.4 k-Nearest Neighbor Method

The kNN method is a machine learning approach used for classification and regression tasks that relies on the similarity between observations [58]. In our study, we represent each observation using a vector of mortar properties (OB, GW, FA, Age, Heat), with compressive strength and flexural strength as the target variables. By considering the k closest neighbors of a given point, the kNN model predicts the target value as the average or weighted average of the neighbors' target values, providing insights into the relationship between mortar characteristics and strength properties.
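The neighbor-averaging rule can be sketched in a few lines. The function name `knn_predict`, the inverse-distance weighting, and the toy points are assumptions chosen for illustration; the study does not state which k or weighting it used.

```python
import math


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Predict a target as the (optionally distance-weighted) mean of the k nearest neighbours."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]  # k smallest Euclidean distances
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    weights = [1.0 / (d + 1e-12) for d, _ in nearest]  # closer points weigh more
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical toy data: three points near the origin and one far away.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [10.0, 20.0, 30.0, 100.0]
plain = knn_predict(train_X, train_y, [0.1, 0.1], k=3)
weighted = knn_predict(train_X, train_y, [0.1, 0.1], k=3, weighted=True)
```

With k = 3 the distant point is ignored and the plain prediction is the mean of the three nearby targets (20.0), while the weighted variant moves toward the closest neighbor's value. Because Euclidean distance is scale-dependent, inputs with very different ranges (for example age and temperature) would normally be standardized first; whether the study did so is not stated.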

3.2.5 Extreme Gradient Boosting (XGBoost)

XGBoost, a tree-based learning technique, utilizes gradient boosting to combine weak prediction models into a strong predictive model [59]. By sequentially building trees to correct prediction errors of previous trees, XGBoost effectively captures complex patterns in the dataset. In our study, the XGBoost model learns the relationship between mortar properties (OB, GW, FA, Age, Heat) and compressive strength and flexural strength, thereby enabling accurate predictions of strength properties based on input variables.
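The sequential error-correcting principle can be illustrated with plain gradient boosting on squared error. This sketch is not XGBoost itself: it omits the regularization term and second-order information that XGBoost adds, uses depth-1 stumps as weak learners, and runs on synthetic data; all function names are hypothetical.

```python
def fit_stump(X, targets):
    """Best single split over all features (assumes at least one feature varies)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, targets) if row[f] <= thr]
            right = [t for row, t in zip(X, targets) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    return best[1:]


def predict_stump(stump, row):
    f, thr, lm, rm = stump
    return lm if row[f] <= thr else rm


def fit_boosting(X, y, n_rounds=200, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_boosting(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic nonlinear target: y = x^2 on ten points.
X = [[float(x)] for x in range(10)]
y = [float(x * x) for x in range(10)]
model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict_boosting(model, row) for row in X])
```

Each stump alone is a weak predictor, but because every new stump targets what the previous ones left unexplained, the training error of the ensemble falls well below that of the mean-only baseline.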

3.2.6 Sensitivity Analysis

This section describes the sensitivity analysis conducted on the dataset used for compressive and flexural strength prediction. The dataset consists of five input features (OB, GW, FA, Age, and Heat) and two outputs (compressive strength and flexural strength). The goal of the analysis is to determine the impact of each feature on the prediction of compressive strength and flexural strength. To ascertain the significance of the inputs in the geopolymer mortar dataset, a Sobol sensitivity analysis was conducted.

The analysis aimed to determine the individual and cumulative impacts of each variable. The results revealed that OB and GW were critical in the prediction of compressive strength. These variables were also found to be important for flexural strength. These findings suggest that improving and optimizing the performance of geopolymer mortars requires a focus on these variables.

The Sobol analysis decomposes the variance of the output into contributions from the input variables [60, 61]. The first-order index (S1) measures the effect of one input variable on the output independent of the others. The second-order index (S2) measures the combined influence of two input variables on the output, and the total-order index (ST) measures the impact of one input variable together with all of its interactions with the other inputs [62]. In this study, every input variable of each dataset (OB, GW, FA, Age, and Heat) is included, and its effect on compressive strength and flexural strength is measured. The outcomes are reported as S1, S2, and ST indices, which rank the importance of the input variables for compressive strength and flexural strength prediction. This analysis is important for identifying key input factors, improving the performance of the compressive strength and flexural strength prediction models, and understanding the relationships and effects of the variables in the dataset. Sobol sensitivity analysis is a valuable tool in data-based decision-making processes and provides important information for accurate modeling and predictability [61].
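For readers unfamiliar with these indices, S1 for input i is Var(E[Y | X_i]) / Var(Y) and ST is 1 − Var(E[Y | all inputs except X_i]) / Var(Y). A minimal Monte Carlo sketch is given below under explicit assumptions: inputs are independent and uniform on [0, 1], the Saltelli (2010) estimator is used for S1 and the Jansen estimator for ST, and the model is a toy additive function rather than the fitted strength predictor of the study.

```python
import random


def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order (S1) and total-order (ST) Sobol indices by Monte Carlo."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(r) for r in A]
    fB = [model(r) for r in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((v - mean) ** 2 for v in both) / len(both)  # total output variance
    s1, st = [], []
    for i in range(n_inputs):
        # Matrix A with column i taken from B: only input i differs from A.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) first-order estimator, with centred outputs.
        v_i = sum((fb - mean) * (fab - fa) for fa, fb, fab in zip(fA, fB, fABi)) / n_samples
        # Jansen total-order estimator.
        v_ti = sum((fa - fab) ** 2 for fa, fab in zip(fA, fABi)) / (2 * n_samples)
        s1.append(v_i / var)
        st.append(v_ti / var)
    return s1, st


# Toy additive model with an inert third input; analytically S1 = [0.2, 0.8, 0.0].
s1, st = sobol_indices(lambda x: x[0] + 2.0 * x[1], n_inputs=3)
```

For an additive model such as this one, S1 and ST coincide for every input; a gap between them would indicate interaction effects, which is the information the S2 and ST indices add to the first-order ranking.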

3.2.7 Evaluation Indicators

R2 was selected as a main performance indicator because it is a widely accepted metric for assessing the overall fit of regression models [37, 48, 63]. It provides a clear indication of the proportion of variance explained by the model, making it suitable for comparing different models and evaluating their predictive performance. However, we acknowledge that using R2 alone may not provide a comprehensive assessment of model performance. Therefore, we also utilized additional evaluation metrics such as RMSE, MSE, and MAE to provide a more comprehensive evaluation of the XGBoost model's performance in predicting mechanical properties in geopolymer mortars.

  • R2 (coefficient of determination): It measures the proportion of the variance in the dependent variable that is predictable from the independent variables. R2 values typically range from 0 to 1, where a higher value indicates a better fit of the model to the data; on test data, a negative value is possible when the model predicts worse than the mean of the observations.

  • RMSE (root mean square error): It represents the square root of the average of the squared differences between predicted and actual values. RMSE provides a measure of the typical error in the predictions.

  • MSE (mean squared error): It calculates the average of the squares of the errors between predicted and actual values. Like RMSE, MSE is a measure of the model's accuracy, with lower values indicating better performance.

  • MAE (mean absolute error): It computes the average of the absolute differences between predicted and actual values. MAE provides a measure of the average magnitude of the errors in the predictions.

Table 5 provides formulas for performance evaluation metrics commonly used in regression analysis and machine learning for predictive models.

Table 5 Performance Evaluation Metrics for Predictive Models in Regression Analysis and Machine Learning
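The four metrics can be computed as in the sketch below, which uses their standard textbook definitions; these are assumed, not confirmed, to match the formulas of Table 5. The function name and the four sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE, and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                 # 1 - SS_res / SS_tot
    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Hypothetical values, not taken from the study.
metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

For these sample values the MSE is 0.375 and the MAE is 0.5. RMSE and MAE are in the units of the target (MPa for strength), whereas MSE is in squared units, which is why RMSE and MAE are usually easier to interpret against the measured strengths.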

4 Results and Discussion

This research seeks to understand how five factors affect the strength of mortars. For this purpose, several machine learning algorithms were used to predict 533 compressive and 533 flexural strength values. 80% of the 533 data points were used for training, while the remaining 20% were used for testing. Additionally, the actual strength values obtained through extensive laboratory work are compared with the values predicted by the ML algorithms. Finally, a Sobol sensitivity analysis is conducted, and the importance weights of all inputs on the outcomes are given. For compressive strength, the XGBoost algorithm outperformed the others, with an MSE of 8.811, RMSE of 2.968, MAE of 1.582, and R2 of 0.981. The linear regression model had the lowest prediction performance, with an MSE of 90.383, an RMSE of 9.507, an MAE of 5.987, and an R2 of 0.763. The MSE and RMSE of the two approaches thus differ by factors of about 10.3 and 3.2, respectively.

The graphical data demonstrate that compressive strengths between 30 and 45 MPa are predicted well, whereas strengths between 0 and 30 MPa are predicted less accurately. Overall, the XGBoost algorithm performs well in forecasting the actual values.

For flexural strength, XGBoost again performed best, with an MSE of 2.668, RMSE of 1.633, MAE of 0.816, and R2 of 0.898, while the LR model performed worst, with an MSE of 6.77, RMSE of 2.602, MAE of 1.417, and R2 of 0.716. The two techniques differ by a factor of 2.5 in MSE and 1.6 in RMSE (Figs. 3, 4). Their R2 values differ by about 25% in relative terms (0.898 versus 0.716).
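The ratios quoted above follow directly from the reported metrics, as the short check below shows. The numbers are copied from the text; the dictionary layout and variable names are illustrative only.

```python
import math

# (MSE, RMSE, R2) for the test data, as reported in the text.
reported = {
    "compressive": {"XGBoost": (8.811, 2.968, 0.981), "LR": (90.383, 9.507, 0.763)},
    "flexural": {"XGBoost": (2.668, 1.633, 0.898), "LR": (6.77, 2.602, 0.716)},
}

ratios = {}
for target, models in reported.items():
    for name, (mse, rmse, _) in models.items():
        # RMSE should equal sqrt(MSE) up to rounding of the reported digits.
        assert abs(math.sqrt(mse) - rmse) < 1e-3, (target, name)
    xgb, lr = models["XGBoost"], models["LR"]
    ratios[target] = (
        round(lr[0] / xgb[0], 1),             # MSE ratio, LR over XGBoost
        round(lr[1] / xgb[1], 1),             # RMSE ratio, LR over XGBoost
        round((xgb[2] / lr[2] - 1.0) * 100),  # relative R2 gain of XGBoost in percent
    )

print(ratios)  # {'compressive': (10.3, 3.2, 29), 'flexural': (2.5, 1.6, 25)}
```

The check also confirms that every reported RMSE is consistent with the square root of the corresponding MSE.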

Fig. 3 Performance parameters of different algorithms for compressive strengths

Fig. 4 Various performance metrics of different algorithms for flexural strengths

4.1 Linear Regression Algorithm

Figure 5 presents the comparison data for compressive strength and flexural strength obtained from the LR model. Figure 5a, c shows the correlation between the predicted and actual values for compressive strength and flexural strength, respectively. For compressive strength, although most of the data lie on the center line, about 40% of the total data fall outside the ± 20% error line. This indicates that the LR model predicts some of the data quite well but a significant share quite poorly. Figure 5b displays the distribution of the actual data, predicted data, and errors for the LR model's compressive strength. According to these statistics, the minimum, maximum, and average error values of the LR model are 0.87, 15.54, and 7.74 MPa, respectively. The performance metrics of the LR model are given in Fig. 3: the R2, MAE, RMSE, and MSE for the test data are 0.763, 5.987, 9.507, and 90.383, respectively. From these values, it can be inferred that the LR model's ability to relate predicted and actual values is generally weak. Using LR, ANN, and AdaBoost algorithms, Ansari et al. [62] attempted to predict the compressive strength of FA-based geopolymer concretes. In that study, the R2, MAE, and RMSE of the LR model were 0.651, 5.027, and 7.211, respectively. In addition, the data outside the ± 20% error line constitute 25% of the total data. These results demonstrate that the linear regression model of the present study is in line with the literature. The flexural strength data in Fig. 5c show that, while most of the data are on the center line, almost 28% of the total data are outside the ± 20% error line. The distribution of actual data, predicted data, and errors for the flexural strength of the linear regression model is shown in Fig. 5d. The minimum, maximum, and average error values of the LR model are 0.45, 4.70, and 2.03 MPa, respectively. The performance metrics of the LR model are given in Fig. 4: the R2, MAE, RMSE, and MSE for the test data are 0.716, 1.417, 2.602, and 6.77, respectively.
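The share of points outside the ± 20% error lines can be computed as sketched below. It assumes that the error lines are the lines predicted = 1.2 × actual and predicted = 0.8 × actual on the predicted-versus-actual plot, which the text does not state explicitly; the function name and sample values are hypothetical.

```python
def share_outside_band(actual, predicted, tolerance=0.20):
    """Fraction of predictions lying outside actual * (1 ± tolerance)."""
    outside = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) > tolerance * abs(a)
    )
    return outside / len(actual)


# Hypothetical strengths in MPa, not taken from the study.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 25.0, 29.0, 31.0]
share = share_outside_band(actual, predicted)  # 0.5: the 2nd and 4th points are outside
```

Because the band is relative, a fixed absolute error counts more heavily against low-strength samples, which is one reason the share outside the band can be large even when the average error in MPa looks moderate.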

Fig. 5 Compressive and flexural strengths real and estimated data graph obtained from the LR algorithm: a compressive strength, b error values for compressive strength, c flexural strength, d error values for flexural strength

4.2 k-Nearest Neighbor Model (KNN)

The comparison data of the kNN prediction model for the compressive strength and flexural strength of mortars are given in Fig. 6. Figure 6a, c indicates the association between the model-predicted values and the real ones for compressive strength and flexural strength, respectively. For compressive strength, about 46% of all data are located on the center line, while about 22% of the total data are located outside the ± 20% error line. This shows that, although the kNN model predicted some of the data with large deviations, it generally performed better than the linear regression model. Figure 6b exhibits the distribution of experimental data, prediction data, and error values for the compressive strength of the kNN model. The kNN model has minimum, maximum, and average error values of 0.97, 11.21, and 5.82 MPa, respectively. Furthermore, the performance metrics of the kNN model are given in Fig. 3. The R2, MAE, RMSE, and MSE metrics of the kNN model for the test data are 0.804, 4.669, 8.643, and 74.709, respectively. These metrics indicate that the kNN model improves the estimation performance over the LR model. Tanyildizi [38] estimated values for the chemical process of FA-based geopolymers using LSTM, kNN, and SVR algorithms. The kNN model of the present study was found to be consistent with that study. According to the flexural strength data in Fig. 6c, although most of the data are on the center line, about 21% of the total data are outside the ± 20% error line. The distribution of actual data, prediction data, and error values of the kNN model for flexural strength is shown in Fig. 6d. The results indicate that the kNN model's minimum, maximum, and average error values are 0.13, 2.86, and 1.26 MPa, respectively. The kNN model performance metrics are given in Fig. 4. The R2, MAE, RMSE, and MSE metrics of the model for the test data are 0.835, 0.964, 2.041, and 4.165, respectively.
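The idea behind kNN regression can be illustrated with a short sketch: the prediction for a new mixture is the mean target of the k closest training samples. The neighbor count, the Euclidean distance, the unweighted averaging, and the pre-scaled toy features below are all assumptions made for illustration; the configuration actually used in this study is not restated here.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    Distances are Euclidean, so features are assumed to be on a common scale.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pair each distance with its target and sort so the closest rows come first
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical, already scaled two-feature samples and strengths (MPa)
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
train_y = [10.0, 20.0, 30.0, 40.0, 25.0]
prediction = knn_predict(train_X, train_y, [0.9, 0.9], k=2)
```

Since the prediction is driven purely by distance, inputs with very different ranges (for example, binder dosage and curing age) would normally be normalized before such a model is fitted.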

Fig. 6 Compressive and flexural strengths real and estimated data graph obtained from the kNN algorithm: a compressive strength, b error values for compressive strength, c flexural strength, d error values for flexural strength

4.3 Deep Neural Network Model (DNN)

Figure 7 displays the comparison data of the mechanical strength prediction model for DNN. The correlation between DNN prediction model data and experimental data for compressive strength is given in Fig. 7a. According to the distribution of the data, about 49% of the total data are located on the center line, while about 24% of all data are located outside the ± 20% error line. This suggests that the shares of poorly predicted data are similar for the kNN and DNN models. Figure 7b illustrates the real data, predicted data, and resulting error values of the DNN model for compressive strength. The DNN model has minimum, maximum, and average error values of 0.56, 12.49, and 5.19 MPa, respectively. The R2, MAE, RMSE, and MSE metrics of the DNN model for the test data are 0.93, 0.2848, 5.17, and 26.732, respectively. These results reveal that the DNN model's prediction performance is at a high level. Emarah et al. [63] estimated 862 compressive strength values of FA-based geopolymer concrete by utilizing ANN, DNN, and ResNet algorithms. The R2 of their DNN model was 0.878, and the mean-absolute-percentage deviation (MAPD) was 9.476. The R2 value of our DNN model is about 6% higher than that of the study [63], while the average error value is about 83% lower. With these values, it can be concluded that the performance of the DNN model used in our study is quite good compared to the literature. From the flexural strength data displayed in Fig. 7c, about 17% of the total data are beyond the ± 20% error line, even though most of the data are on the center line. Figure 7d presents the distribution of the real data, predicted data, and error values obtained from the DNN model for flexural strength. These numbers show that the DNN model's minimum, maximum, and average error values are 0.49, 3.57, and 1.82 MPa, respectively. The prediction metrics for the DNN model are given in Fig. 4. The model's R2, MAE, RMSE, and MSE for the test data are 0.864, 0.834, 1.797, and 3.23, respectively.
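The error statistics quoted for each model (minimum, maximum, and average error, and the share of data outside the ± 20% error line) can be obtained as in the sketch below. It assumes that a point lies outside the error line when its absolute error exceeds 20% of the observed value; this reading of the error line, the function name, and the sample values are assumptions for illustration.

```python
def error_summary(y_true, y_pred, band=0.20):
    """Summarize absolute errors and the share of points outside a relative band."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # A point is counted as outside when |predicted - observed| > band * |observed|
    outside = sum(
        1 for t, err in zip(y_true, abs_errors) if err > band * abs(t)
    )
    return {
        "min_error": min(abs_errors),
        "max_error": max(abs_errors),
        "mean_error": sum(abs_errors) / len(abs_errors),
        "share_outside_band": outside / len(abs_errors),
    }


# Hypothetical strengths in MPa, for illustration only
observed = [10.0, 20.0, 40.0, 50.0]
predicted = [13.0, 21.0, 30.0, 52.0]
summary = error_summary(observed, predicted)
```

In this toy example, two of the four predictions deviate by more than 20% of the observed value, so half of the data fall outside the band.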

Fig. 7 Compressive and flexural strengths real and estimated data graph obtained from the DNN algorithm: a compressive strength, b error values for compressive strength, c flexural strength, d error values for flexural strength

4.4 Random Forest Model (RF)

The comparison data of the random forest prediction model for compressive strength and flexural strength are given in Fig. 8. Figure 8a shows the relationship between the RF model predictions and the experimental results for compressive strength. According to the distribution of the data, about 22% of the total data fall outside the ± 20% error line. The actual data, predicted data, and the errors between them for the compressive strength of the RF model are depicted in Fig. 8b. Given these error values, the RF model's minimum, maximum, and average error values are 0.31, 11.58, and 4.21 MPa, respectively. The performance metrics of the RF model are given in Fig. 3. The R2, MAE, RMSE, and MSE metrics for the RF model are 0.96, 2.119, 3.992, and 15.933, respectively. These metric values indicate that the overall prediction level of the RF model is quite good. Li et al. [4] employed RF, GB, and BPNN algorithms to predict the 28-day compressive strength of GGBS and FA-based geopolymer concretes. For their RF model, the R2 and RMSE performance indices are 0.71 and 5.79, respectively [4]. In addition, the data outside the ± 20% error line constitute 11% of the total data. The reason why the deviating data are fewer in their study is that their dataset is approximately 3 times the size of the one used in the current study. Although there were not many deviations beyond the ± 20% error line, it was found that most of the data diverged from the main trend. The R2 value of the current study is about 35% higher than the R2 value obtained by Li et al. [2]. According to the flexural strength data in Fig. 8c, although most of the data are located on the center line, about 23% of the total data are located outside the ± 20% error line. The distribution of the actual data, prediction data, and the resulting error values for the flexural strength of the RF model is depicted in Fig. 8d. These values show that the minimum, maximum, and average error values of the RF model are 0.29, 3.04, and 1.71 MPa, respectively. The performance metrics of the RF model are given in Fig. 4. The model's R2, MAE, RMSE, and MSE values for the test data are 0.851, 0.868, 1.887, and 3.56, respectively.

Fig. 8 Compressive and flexural strengths real and estimated data graph obtained from the RF algorithm: a compressive strength, b error values for compressive strength, c flexural strength, d error values for flexural strength

4.5 Extreme Gradient Boosting Model

The data of the XGBoost model for the predicted compressive strength and flexural strength values are given in Fig. 9. Figure 9a depicts the link between the XGBoost model predictions and the real compressive strength data. According to the scatter of the data, approximately 13% of the total data lie outside the ± 20% error line. This shows that, among the algorithms used, the XGBoost algorithm produced the predictions with the fewest deviations. It was also found that the data distributed around the ± 20% error line were more balanced than for the RF model. Experimental data, prediction data, and error values for compressive strength are shown in Fig. 9b. These error data show that the XGBoost model's minimum, maximum, and average error values were 0.52, 9.25, and 3.45 MPa, respectively. Although the minimum error of the RF model was about 40% lower than that of the XGBoost model, its maximum and average errors were about 25% and 22% higher, respectively. The performance metrics of the XGBoost model are given in Fig. 3. The R2, MAE, RMSE, and MSE metrics for the XGBoost model are 0.981, 1.582, 2.968, and 8.811, respectively. These metric values demonstrate the XGBoost model's very good prediction ability, because its R2 value is about 2.2%, 5.5%, 22%, and 28.6% higher than those of the RF, DNN, kNN, and LR models, respectively. When the metrics and error values are analyzed, R2 increases as the prediction performance of the algorithm improves, while MAE, RMSE, MSE, and the minimum, maximum, and average errors decrease. The compressive strength of Ca-based geopolymers was predicted by Huo et al. [41] employing modern algorithms such as KNN, SVM, BA, RF, ET, GBDT, XGBoost, and DNN. For their XGBoost model, the R2 was 0.91 and the average error was 2.51 [41]. According to these data, the R2 value of the current study is about 7.8% higher than that of Huo et al. [41], while the average error value is about 37.4% higher. The reason why the average error value is higher in the current study is that the dataset they used is much denser than that of the current study. Approximately 19% of the total data, as indicated by the flexural strength data in Fig. 9c, are outside the ± 20% error line, even though the majority of the data are on the center line. The distribution of actual data, prediction data, and error values for the flexural strength of the XGBoost model is shown in Fig. 9d. According to these data, the XGBoost model's minimum, maximum, and average error values are 0.32, 5.49, and 1.95 MPa, respectively. The performance metrics of the XGBoost model are given in Fig. 4. The model's R2, MAE, RMSE, and MSE metrics for the test data are 0.898, 0.816, 1.633, and 2.668, respectively.
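The relative comparisons between models quoted in this section are simple ratios of the reported metrics. As a worked check, the sketch below recomputes how much the XGBoost test R2 for compressive strength exceeds that of each other model, using only the R2 values reported in Sects. 4.1 to 4.5; the helper name is hypothetical.

```python
def percent_higher(value, reference):
    """Return how much `value` exceeds `reference`, as a percentage of `reference`."""
    return (value / reference - 1.0) * 100.0


# Test-set R2 values for compressive strength as reported in the text
r2_compressive = {"LR": 0.763, "kNN": 0.804, "DNN": 0.93, "RF": 0.96, "XGBoost": 0.981}

# Relative R2 gain of XGBoost over each of the other models
gains = {
    name: percent_higher(r2_compressive["XGBoost"], r2)
    for name, r2 in r2_compressive.items()
    if name != "XGBoost"
}
```

The resulting gains are roughly 2.2% over RF, 5.5% over DNN, 22.0% over kNN, and 28.6% over LR, in line with the percentages quoted in the text up to rounding.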

Fig. 9 Compressive and flexural strengths real and estimated data graph obtained from the XGBoost algorithm: a compressive strength, b error values for compressive strength, c flexural strength, d error values for flexural strength

The R2 performances of the algorithms ranked as LR < kNN < DNN < RF < XGBoost for compressive strength. For flexural strength, the ranking is LR < kNN < RF < DNN < XGBoost, with only a small difference between RF and DNN. In accordance with the literature, flexural strength prediction performance is often lower than compressive strength prediction performance. This is thought to be due to the narrower data range of flexural strengths compared to compressive strengths [64].

Parhi and Patro [25] predicted compressive strengths using 1123 concrete compressive strength data points. They employed a variety of cutting-edge methods: RF, neural network (NN), multivariate adaptive regression splines (MARS), and hybrid ensemble machine learning (HEML). Their research revealed that the HEML method delivered the best outcome (R2 = 0.962). Parhi and Patro [25] created the HEML technique by utilizing the XGBoost algorithm as a meta-regressor. To predict the compressive strength of FA-based geopolymer concrete, Saad et al. [62] used LR, ANNs, and AdaBoost as an ensemble machine learning (EML) technique. Their findings showed the AdaBoost model to be the most successful for a precise compressive strength prediction, with an R2 of 0.944, RMSE of 2.506, and MAE of 1.259. For the LR algorithm, R2 was 0.701, RMSE 5.805, and MAE 4.502. The R2 = 0.76 obtained with linear regression in our study and the R2 = 0.71 achieved there are quite comparable. The discrepancy is thought to be due to the fact that compressive strength measurements in the published study [62] were obtained from mortar specimens. Ma et al. [65] used 896 data points to estimate the compressive strength of FA-based geopolymers utilizing three algorithms: SVR, RF, and XGBoost. They reported that XGBoost's prediction performance was quite good (R2 = 0.97). According to the feature ranking analysis used in their research, the materials added to the mixture, of which FA is the most significant component, contribute more to the compressive strength than the other factors. Consequently, the study based on feature ranking and other research in the literature support the accuracy of the results obtained by utilizing machine learning algorithms.

Mehta [66] employed ANN to forecast the strength of concrete that was made using residual foundry sand. The study determined that compressive strength had an R2 value of 0.903 and flexural strength had an R2 value of 0.831, a difference of about 8.6% between the two R2 values. Although the data and methodology in the present study differ, the results are consistent with the literature, since the corresponding difference between the R2 values for compressive and flexural strength is roughly 9.2%.

The R2 values of several investigations in the literature are displayed in Table 6. For the LR approach, the R2 value from the current study is almost 17% higher than the R2 value found by Ansari et al. [62]. The average R2 value for the KNN algorithm in the table is approximately 0.837. While the R2 value obtained from the present study is 0.804 for the KNN algorithm, Lin [37] obtained this value as 0.906. While the average R2 value for the DNN algorithm in the table is 0.846, the R2 value obtained in the current study is about 10% higher than this value. The average R2 value for the RF algorithm from studies in the table is 0.88, and the R2 value obtained in this study is about 9% higher than this average. In addition, while the average R2 value in the table for the XGBoost algorithm, which usually has the highest R2 values, is 0.934, the R2 value of the current study is about 5% higher than this value. The fact that the R2 values of most algorithms in the current study are higher than the R2 values of other studies in the literature shows that the strength values of geopolymers using OB and GW can be predicted with high precision.

Table 6 R2 values of different works from literature

4.6 Sensitivity Analysis Results

Figure 10 depicts the Sobol sensitivity index values for compressive strength. For the S1 and ST indices, OB powder is the most fundamental input for compressive strength, followed by GW and FA, albeit to a lesser extent. The Sobol sensitivity results imply that the OB, GW, and FA powder amounts are the most significant inputs in determining the output values. The other factors (age and heat) were less relevant, both as single variables and in their interaction effects, but they nevertheless contributed to the compressive strength prediction. These insights may assist in prioritizing the most influential components in the compressive strength estimation problem.

Fig. 10 Sobol analysis results for compressive strength values

The Sobol index findings for the flexural strength dataset are given in Fig. 11. The three most crucial ingredients were OB, GW, and FA powder, both separately and in combination with other inputs. Age and heat were less important than the other components. The S2 indices pointed out that there were interaction effects between specific variable pairs, stressing the need for investigating their combined influence. OB was the most essential input for flexural strength, according to the S1 and ST indices. According to the Sobol sensitivity analysis, the effect of the age and heat components on compressive and flexural strengths was quite low. Since they were low for these datasets, their effects can be ignored in different studies using OB and GW [67,68,69].

Fig. 11 Sobol analysis results for flexural strength values

The Sobol sensitivity analysis provides valuable insights into the influence of various parameters on the compressive strength and flexural strength of geopolymer mortars. These results allow us to prioritize the parameters that significantly affect the mechanical properties, thereby informing future research directions and optimization strategies.

Among the parameters studied, it was evident that the binder type, represented by parameters such as OB, GW, and FA, played a crucial role in determining the compressive strength of geopolymer mortar. The first-order indices indicated that OB had the highest influence, followed by GW and FA. This underscored the importance of selecting the appropriate binder material to achieve the desired compressive strength properties.

The mutual influence of various factors, as indicated by the second-order indices, further elucidated the interactions between different parameters. The significant second-order indices between OB-GW and OB-FA highlighted the synergistic effects of these binder materials on compressive strength. Understanding these interactions is essential for optimizing geopolymer mortar formulations and enhancing their compressive strength properties. Similarly, in the flexural strength sensitivity analysis, OB, GW, and FA emerged as influential parameters, with OB exhibiting the highest S1 index. This reaffirmed the importance of the binder composition in determining the flexural strength of geopolymer mortars. The S2 indices revealed notable interactions between OB-GW and OB-FA, emphasizing the combined effects of these binder materials on flexural strength. Additionally, the interactions between age and heat, although minimal, suggested that curing conditions also contributed to the flexural strength properties. Comparing the sensitivity results between compressive strength and flexural strength, it was evident that the same parameters, namely OB, GW, and FA, consistently exhibited significant influence on both mechanical properties. This consistency underscored the robustness of these parameters in affecting the overall mechanical performance of geopolymer mortars.

Based on these sensitivity results, future research efforts should focus on further optimizing the formulations of geopolymer mortars by exploring the synergistic effects between binder materials and other influencing factors such as curing conditions. Additionally, advanced modeling techniques can be employed to accurately predict the mechanical properties of geopolymer mortars based on the identified influential parameters. In conclusion, the Sobol sensitivity analysis offers insights into the factors influencing the mechanical properties of geopolymer mortars, thereby guiding the formulation and optimization of these materials for various structural applications in the construction industry.
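To make the meaning of the S1 and ST indices concrete, the following sketch estimates them by plain Monte Carlo sampling for a toy model. It is a generic illustration under stated assumptions and not the procedure used in this study: inputs are taken as independent and uniform on [0, 1], the first-order and total-order estimators are the commonly used Saltelli-type and Jansen-type formulas, and the three-input linear `toy_model` is hypothetical and unrelated to the geopolymer dataset. Second-order (S2) indices are not estimated here.

```python
import random


def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order (S1) and total-order (ST) Sobol indices by Monte Carlo.

    Inputs are assumed independent and uniformly distributed on [0, 1].
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]

    # Output mean and variance estimated from both sample sets together
    outputs = fA + fB
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / len(outputs)

    s1, st = [], []
    for i in range(n_inputs):
        # Evaluate the model on A with only column i replaced by the values from B
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First-order effect: variance explained by input i alone
        v_i = sum((yb - mean) * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / n_samples
        # Total effect: input i including all of its interactions
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n_samples)
        s1.append(v_i / variance)
        st.append(vt_i / variance)
    return s1, st


def toy_model(x):
    """Hypothetical additive response: strong, weak, and inactive input."""
    return 4.0 * x[0] + 2.0 * x[1] + 0.0 * x[2]


s1, st = sobol_indices(toy_model, n_inputs=3)
```

For this additive toy model, the analytical indices are 0.8, 0.2, and 0 for the three inputs, and S1 equals ST because there are no interactions. A gap between ST and S1 for an input is what signals interaction effects of the kind captured by the S2 indices.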

4.7 Comparative Generalization Analysis of ML Models

Tables 7 and 8 present the probabilities that the performance of one model is higher than another based on the coefficient of determination (R2) obtained from fivefold cross-validation on the geopolymer dataset.

Table 7 Comparative generalization analysis of model performance based on fivefold cross-validation results for the compressive strength dataset
Table 8 Comparative generalization analysis of model performance based on fivefold cross-validation results for the flexural strength dataset

Based on the cross-validation results for the compressive strength dataset, XGBoost consistently demonstrated high probabilities of outperforming the other models, with probabilities ranging from 0.956 to 0.987. This indicates that XGBoost generally achieves superior predictive accuracy compared to the other models across the fivefold cross-validation process. RF also showed competitive performance, particularly against NN, LR, and KNN. However, its dominance was not as pronounced as that of XGBoost. Random forest is known for its robustness and ability to handle complex datasets, which is reflected in its relatively high probabilities of outperforming other models. The NN and KNN models generally showed lower probabilities of outperforming other models, indicating comparatively weaker predictive performance in this context. NNs are powerful models capable of learning complex relationships in data, but their performance may vary depending on factors such as architecture and hyperparameters. Similarly, KNN's performance might be limited by its reliance on local similarities, especially in high-dimensional spaces. LR consistently exhibited the lowest probabilities of outperforming other models, suggesting relatively weak predictive performance. LR assumes linear relationships between variables and may struggle to capture nonlinear patterns present in the geopolymer dataset. It is important to interpret these results in the context of cross-validation, which assesses the models' performance on unseen data from the same dataset. While XGBoost and RF emerged as strong performers in this study, further evaluation on independent test datasets is necessary to assess the models' generalization performance and robustness across datasets. Therefore, the same analysis was carried out on the flexural strength dataset.

The results of the fivefold cross-validation on the flexural strength dataset provide insights into the relative performance of the different machine learning models based on the coefficient of determination. XGBoost emerged as the top performer, with probabilities of outperforming the other models ranging from 0.932 to 0.99, indicating its superior predictive accuracy. The RF model also demonstrated competitive performance, particularly against NN, LR, and KNN. With probabilities ranging from 0.516 to 0.989, RF exhibits strong predictive capability, suggesting that it captures important patterns in the flexural strength dataset effectively. The KNN model exhibited mixed performance, with moderate probabilities of outperforming other models. While KNN performed relatively well against NN and LR, its performance was weaker compared to XGBoost and RF. These results provide insight into the relative performance of the models within the specific context of the flexural strength dataset and the coefficient of determination metric.

Based on the cross-validation results provided in both tables, we can draw several conclusions regarding the generalization performance of the models on both the compressive strength and flexural strength datasets.

  • The high probabilities of XGBoost consistently outperforming the other models on both the compressive strength and flexural strength datasets suggest that XGBoost has strong generalization capabilities. This indicates that XGBoost is robust and capable of capturing complex patterns in geopolymer materials, making it a reliable choice for predictive modeling tasks in this domain.

  • While RF and NN also demonstrated competitive performance, XGBoost consistently outperformed them, indicating its robustness and effectiveness in capturing complex patterns in the geopolymer dataset. LR and KNN, on the other hand, exhibited lower generalization performance, indicating that they may not be as effective in capturing the underlying relationships in the dataset.
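As an illustration of how a pairwise probability of the kind reported in Tables 7 and 8 could be derived from fold-wise scores, the sketch below applies a simple normal approximation to the paired per-fold R2 differences. This is only one possible approach and is not claimed to be the procedure used to build the tables; the function name is hypothetical, and the fold scores are made-up placeholders rather than results of this study.

```python
import math
from statistics import NormalDist, mean, stdev


def prob_a_better(scores_a, scores_b):
    """Approximate P(model A outperforms model B) from paired per-fold scores.

    The mean paired difference is treated as normally distributed with a
    standard error estimated from the folds (a crude assumption for five folds).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired fold scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    avg = mean(diffs)
    spread = stdev(diffs)
    if spread == 0:
        # Identical difference on every fold: the outcome is fully determined
        return 1.0 if avg > 0 else (0.0 if avg < 0 else 0.5)
    standard_error = spread / math.sqrt(len(diffs))
    # Probability that the true mean difference is greater than zero
    return 1.0 - NormalDist(mu=avg, sigma=standard_error).cdf(0.0)


# Placeholder fivefold R2 scores for two hypothetical models
fold_r2_a = [0.97, 0.95, 0.98, 0.96, 0.97]
fold_r2_b = [0.95, 0.96, 0.94, 0.95, 0.93]
p_a_over_b = prob_a_better(fold_r2_a, fold_r2_b)
```

Under this approximation, the probabilities for the two directions of a comparison sum to one, so a value close to 1 for one model implies a value close to 0 for the other.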

To wrap up, the findings presented in this paper make important contributions to the field of geopolymer materials research, offering valuable insights into the predictive modeling of mechanical strengths and paving the way for enhanced understanding and optimization of prediction models for geopolymer mortar formulations. These contributions can be listed as follows:

  1. A novel machine learning framework for geopolymer mortars made using OB, GW, and FA has been contributed to the literature. Owing to the nonlinear nature of geopolymer mortars, predicting their strength with non-experimental methods is a very difficult problem. Trial-and-error methods offer a traditional solution to this problem, but the proposed ML framework will reduce labor and time losses.

  2. The mechanical strength of geopolymer mortars produced with GW was predicted by ML algorithms and evaluated with performance metrics such as R2, MAE, and RMSE for the first time.

  3. The XGBoost algorithm was found to be the best-performing strength prediction algorithm. The use of this algorithm increased the accuracy of strength predictions of geopolymer mortars based on GW.

  4. The sensitivity analysis results indicated that GW was the most efficient component. These analyses allowed for the determination of the importance weights of the input parameters influencing the output variables.

5 Conclusion

This paper presents ML predictions of the compressive strength and flexural strength of two-part geopolymer mortars using various binders such as OB, GW, and FA. Comprehensive laboratory research yielded 533 compressive and 533 flexural strength values. These values form the dataset for several machine learning algorithms, namely LR, DNN, kNN, RF, and XGBoost. Furthermore, the impacts of inputs such as OB, GW, FA, Age, and Heat on the output values, compressive strength and flexural strength, were investigated using Sobol sensitivity analysis on the dataset.

  • The XGBoost algorithm exhibited the highest performance in predicting compressive strength values, achieving an R2 value of 0.981, while the LR algorithm demonstrated the lowest performance with an R2 value of 0.763.

  • For the prediction of flexural strength values, similar to the compressive strength prediction performance, the XGBoost algorithm exhibited the highest performance with an R2 value of 0.898, while the LR algorithm displayed the lowest performance with an R2 value of 0.716.

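  • For compressive strength, the prediction performances of the algorithms are ranked from highest to lowest as follows: XGBoost, RF, DNN, kNN, and LR. For flexural strength, the ranking is the same except that DNN slightly outperforms RF.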

  • Analyzing the MAE, MSE, RMSE, and R2 values, which are important parameters for evaluation, for both prediction sets, the highest prediction performance was obtained from the XGBoost algorithm, while the lowest prediction performance was attained from the LR algorithm.

  • In the sensitivity analysis, the order of importance of the components is OB, GW, and FA for both compressive strength and flexural strength. It is thus clearly seen that OB has a significant effect in both analyses, whereas age and temperature have a smaller effect in both analyses. According to this finding, the predictive performance of the algorithms used for the mechanical properties of geopolymer mortar samples is highly dependent on the binder used.

  • As a result of different cross-validation analyses for compressive and flexural strengths, the highest performance was obtained from the XGBoost algorithm, with probabilities ranging from 0.956 to 0.987 for compressive strength and from 0.932 to 0.990 for flexural strength.

Considering factors such as laboratory operations, time, labor, and cost, predicting the mechanical properties of building materials containing different binders provides significant benefits. Therefore, it is important to predict the mechanical properties of building materials using machine learning algorithms. In the literature, studies on the mechanical properties of building materials generally cover common binders such as GGBS and FA. The research team plans to prepare samples of construction materials utilizing binders not yet documented in the literature and to use various machine learning algorithms to uncover the relationship between the chemical content and the mechanical properties of these materials.