1 Introduction

The replacement of virgin materials with recycled waste materials is among the core pillars of sustainable construction to address global warming, CO2 emissions, and natural resource depletion. Geopolymer is an alternative cementitious binder to ordinary Portland cement [1] and provides a promising lower-CO2-emission option for use in construction [2, 3]. The French engineer Professor Joseph Davidovits coined the term "geopolymer" in 1978 [4, 5] to represent the polymeric Si–O–Al reaction product between an aluminosilicate precursor (e.g., fly ash, slag, volcanic ash, rice husk ash) and an alkaline medium (e.g., NaOH and/or Na2SiO3), giving the general formula \(M_{n}\left[ - \left( SiO_{2} \right)_{z} - AlO_{2} \right]_{n} \cdot wH_{2}O\), where M represents a K+ or Na+ cation, n is the degree of polycondensation, and z is 1, 2, 3, or ≫ 3. The geopolymerization process is greatly influenced by the aluminosilicate precursor, the SiO2/Al2O3 ratio, the alkaline activator, and the curing conditions [6]. Aided by the curing condition, the dissolution-to-hardening process can happen in either an alkaline or acidic medium, yielding a 3-D polymeric product with mechanical properties comparable to ordinary Portland cement [7]. The alkaline activator, consisting of alkali hydroxides, silicates, or a mixture of both, is commonly used to dissolve the precursor. The reactivity of the precursor depends on its mineralogy, morphology, chemical composition, and particle size distribution. The performance of the developed geopolymer product can vary for the same aluminosilicate precursor and alkaline activator due to raw material heterogeneity, nonlinearity, and inconsistency [8]. The input parameters commonly used for the mixture design of geopolymers comprise the alkaline liquid/binder ratio (AL/B), molarity of alkali hydroxide, alkali silicate modulus, alkali silicate/alkali hydroxide ratio, curing time and temperature, and fine/coarse aggregates [9].

Due to the lack of universal geopolymer mix design standards and the burden of time-consuming, costly, extensive laboratory testing, machine learning and statistical modeling have been applied to predict the mechanical properties of geopolymers and other civil engineering materials. It is imperative to create a mix design that gives better engineering properties by optimizing the different geopolymer formulation parameters. Machine learning (ML) and statistical modeling (SM) provide versatile optimization and inference tools for finding global (or local) minima or maxima when predicting the mechanical strength of geopolymers. Statistical modeling is a powerful optimization tool used to develop mathematical relationships between variables and make predictions [10]. Statistical models are commonly grouped into regression, classification, and clustering. The regression models consist of linear, logistic, and polynomial regression and are used to predict a dependent variable based on independent variables. The classification models consist of neural networks, decision trees, and Naïve Bayes and are used to categorize data into different classes. The clustering models include K-means, which groups data points into clusters based on similarity [10]. The term "machine learning", a subset of artificial intelligence, was coined in 1959 by American computer scientist Arthur Samuel to represent self-teaching computers capable of developing different algorithms to solve problems [11,12,13]. Artificial intelligence (AI) represents the theory and integration of computer systems in performing tasks that mimic human intelligence, e.g., perceiving, learning, classifying, and decision-making [14]. The use of ML has increased tremendously over the years due to its complex problem-solving abilities even with sparse and noisy data. Its wide-ranging applications cover different fields such as medicine, economics, the military, and construction. The U.S. 
National Institute of Standards and Technology (NIST) advanced concrete technology by implementing computer-integrated knowledge systems (CIKS) to predict the performance and life-cycle cost of high-performance concrete and new construction materials [15, 16]. Different supervised and unsupervised machine learning algorithms comprising artificial neural networks (ANN), deep neural networks (DNN), random forest (RF), decision tree (DT), gene expression programming (GEP), support vector machines (SVM), bagging regressor (BR), dimensionality reduction, clustering, etc., have been extensively applied in predicting the engineering properties of construction materials [17,18,19,20,21]. However, supervised machine learning algorithms are commonly used for predicting and forecasting the mechanical properties of civil engineering construction materials [22,23,24]. Compressive strength is the most modeled engineering property since it greatly influences durability and safety rating [25,26,27]. Validation of the developed strength predictive models is done using the statistical performance metrics comprising Pearson correlation coefficient (R), determination coefficient (R2), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and a20-index [28,29,30].

Several researchers have applied machine learning and statistical modeling to predict the mechanical properties of geopolymers and other civil engineering materials. Huo et al. [8] used machine learning models to predict the strength of calcium-based geopolymers and concluded that tree-based ensemble models perform better in strength prediction. Cavaleri et al. [31] concluded that ensemble convolution-based machine learning models perform well in predicting the strength of concrete, giving an R2, MSE, and a20-index of 0.84, 0.022, and 0.75, respectively. Asteris et al. [32] applied machine learning models in the strength prediction of cement-based mortar and concluded that AdaBoost and RF have better prediction performance, giving an R2, RMSE, and a20-index of (0.9965, 1.305, and 0.985) and (0.9768, 3.359, and 0.950), respectively. Chou et al. [27] used individual and ensemble machine learning models to predict the compressive strength of high-performance concrete and concluded that SVM and MLP performed better in strength prediction. Similarly, Golafshani et al. [33] developed strength predictive models for normal and high-performance concrete using machine learning techniques and concluded that the hybridization of ANN and ANFIS with the Grey Wolf Optimizer had better performance in strength prediction. Zhong et al. [34] used ANN to predict the peak stress, elastic modulus, and peak strain of geopolymer concrete, giving relative predictive errors of less than 10% and 20%. Tian et al. [35] used statistical modeling to predict the compressive strength of fly ash-slag-based geopolymer and concluded that the statistical model based on the response surface methodology-central composite design gives better predictive performance, with an R2 value of 0.9415. Similarly, Zahid et al. [36] and Hamdane et al. [37] concluded that the response surface methodology-central composite design provides better strength prediction performance. Ghanizadeh et al. 
[38] utilized a hybrid multivariate adaptive regression splines-escaping bird search optimization algorithm to predict the bearing capacity of geogrid-reinforced stone columns and concluded that the MARS-EBS model had better predictive performance (R2 of 0.997, RMSE of 4.19) compared to SVR-POLY (R2 of 0.952, RMSE of 20.88) and SVR-RBF (R2 of 0.985, RMSE of 9.61). Ali et al. [39] concluded that full quadratic models provide better concrete strength prediction accuracy, giving an R2 of 0.96 and RMSE of 3.49.

The nonlinearity, heterogeneity, and inconsistency of geopolymer mix design have urged the research community to intensify research on supplementing the experimental design approach with machine learning and statistical modeling to improve the practical strength performance of geopolymers. Despite several studies on using ML and SM in materials science and engineering, few researchers have systematically reviewed the state-of-the-art application of machine learning and statistical modeling in predicting geopolymer strength. Li et al. [9] reviewed the mixture design methods for geopolymer concrete (GPC), categorized into target strength, performance-based, and statistical models. Ahmed et al. [40] reviewed the mix design parameters of fly ash geopolymer composites and the development of compressive strength prediction models using linear regression, multi-logistic regression, M5P-tree, and ANN. Alaneme et al. [41] reviewed the theoretical principles of GEP, FIS, ANFIS, and ANN in the strength prediction of agro-waste geopolymers and concluded that AI techniques could contribute to lowering global warming and CO2 emissions. Paruthi et al. [42] reviewed the use of ANN for strength prediction of geopolymer concrete and concluded that ANN gives significant performance with minimal errors. Rathnayaka et al. [43] reviewed the machine learning models applied in the strength prediction of fly ash-based geopolymer concrete and concluded that ANN, DNN, SVM, RF, ANFIS, and ResNet have better strength prediction accuracy. However, limited knowledge of how to consolidate conventional experimental mix design approaches with machine learning techniques has slowed the adoption of geopolymer strength predictive models. 
Due to geopolymer heterogeneity, nonlinearity, and uncertainties, it is imperative to supplement existing experimental testing with machine learning when characterizing the mechanical properties and mix design of geopolymers to advance their practical usage in the construction industry. Therefore, this systematic review aims to elaborate and consolidate the fundamental machine learning algorithms and statistical models applied in geopolymer strength prediction to balance economies of scale and mix design. This review specifically delves into statistical linear/nonlinear optimization algorithms, supervised machine learning algorithms, and model performance statistical metrics. Finding an optimal strength predictive model, equation, or activation function for a geopolymer experimental dataset is an iterative optimization process that depends on the type of input and output variables. This review promotes supplementing conventional experimental mix design approaches with machine learning techniques for geopolymer strength-related problems, giving engineers and researchers greater confidence in the applicability and versatility of these models in real-life scenarios while saving time and minimizing costs.

2 Research significance

In this era of big data, fourth industrial revolution, and artificial intelligence, it is imperative to widely promote the supplementation of ML techniques in developing data-driven strength predictive models for geopolymer. The nonlinearity and heterogeneity of geopolymer mix design require supplementing the traditional experimental laboratory design approaches with machine learning algorithms and empirical regression models during pre-/post-design and quality control. This review provides a detailed reference guideline for construction practitioners and researchers on the state-of-the-art fundamental machine learning algorithms and statistical models applied in strength prediction and sustainable mix design of geopolymers. Furthermore, the review provides a comprehensive breakdown of future research areas required to promote the practical applicability of ML in geopolymer strength prediction and mix design.

3 Methodology

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [44] was used for data preparation. The Scopus database was chosen for bibliometric data extraction due to its versatility. A similar approach was used by Matsimbe et al. [3] and Zhang et al. [45]. The search strings devised to carry out the review were "geopolymer" OR "alkali-activated materials" AND "machine learning" OR "statistical modeling". Table 1 shows the eligibility criteria used for data inclusion in this review. The PRISMA flowchart in Fig. 1 indicates that 169 articles were collected from the Scopus database as of 20 September 2023. The snowballing technique [46] and the ROBIS tool [47, 48] were further adopted to collect additional data and assess the risk of bias, respectively.

Table 1 Eligibility criteria used in retrieving data from Scopus
Fig. 1

PRISMA flowchart

4 Results analysis

4.1 Statistical linear and nonlinear optimization algorithms

Statistical linear and nonlinear models associate dependent and independent variables through mathematical relationships [10]. The developed mathematical equations are used to understand patterns between variables and predict the compressive strength of geopolymers. Sharma et al. [49] developed predictive models for the compressive strength of ground granulated blast furnace slag (GGBFS) and fly ash (FA) geopolymer composite using different regression algorithms, i.e., linear regression (LR), ridge regression (RR), and the least absolute shrinkage and selection operator (LASSO). The mathematical algorithms for LR [50], LASSO [52], and RR [51] are represented in Eqs. 1, 2, and 3, respectively.

$$Y = \beta X + \varepsilon$$
(1)

where \(Y\): the target variable, \(X\): the predictor variable, \(\beta\): the regression coefficient, and \(\varepsilon\): the random error.

$$\beta_{\min} = \mathop{\arg\min}\limits_{\beta}\left( \frac{1}{2n_{\text{samples}}} \left\| X\beta - y \right\|_{2}^{2} + \alpha \left\| \beta \right\|_{1} \right)$$
(2)

where \(\alpha\): the constant multiplying the penalty term, and \(\left\| \beta \right\|_{1}\): the absolute L1-norm penalty on the coefficients.

$$\hat{\beta} = \left( X^{T}X + \alpha I_{p} \right)^{-1} X^{T}Y$$
(3)

where \(\hat{\beta}\): the ridge estimator, \(\alpha > 0\): the complexity (regularization) parameter, and \(I_{p}\): the \(p \times p\) identity matrix.
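As an illustration of the closed-form ridge solution in Eq. 3, the following sketch computes the estimator in NumPy on synthetic data (the data and variable names are our own, not from the reviewed studies); with \(\alpha = 0\) it reduces to ordinary least squares, and \(\alpha > 0\) shrinks the coefficients:

```python
import numpy as np

def ridge_estimator(X, y, alpha):
    """Closed-form ridge solution of Eq. 3: (X^T X + alpha * I_p)^(-1) X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Illustrative data generated from a known linear relation
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true

beta_ols = ridge_estimator(X, y, alpha=0.0)     # alpha = 0 recovers least squares
beta_ridge = ridge_estimator(X, y, alpha=10.0)  # alpha > 0 shrinks coefficients
print(np.round(beta_ols, 3), np.round(beta_ridge, 3))
```

Solving the linear system directly (rather than inverting the matrix) is the numerically preferred way to evaluate Eq. 3.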

Performance analysis of the regression models was done using R2, MSE, RMSE, and MAE. A dataset of 147 samples collected from experimental work was split into a 70% training and 30% testing set. The input variables comprise cement, FA, GGBFS, coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder, superplasticizer, NaOH molarity, extra water, and curing age. The results showed that linear regression has high predictive accuracy, with an R2 of 0.80, MSE of 15.76, RMSE of 3.97, and MAE of 2.84, whereas LASSO has the worst performance, with an R2 of 0.34, MSE of 51.55, RMSE of 7.18, and MAE of 4.65. The correlation heatmap spotting patterns in the dataset (Fig. 2) shows that cement (X1) has the most impact on the compressive strength (Y1) of the FA-GGBFS geopolymer, as it gave the highest positive correlation coefficient of 0.41.

Fig. 2

Heatmap showing the correlation between input features (X1–X11) and compressive strength (Y1) [49]
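The regression comparison described above can be sketched in scikit-learn as follows. The data here are a synthetic stand-in for the 11-feature mix-design table (the 147-sample dataset of [49] is not reproduced), so the scores are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 147 samples, 11 mix-design features, linear signal + noise
rng = np.random.default_rng(42)
X = rng.uniform(size=(147, 11))
y = X @ rng.uniform(1.0, 5.0, size=11) + rng.normal(scale=0.5, size=147)

# 70:30 train/test split, as in the reviewed study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for name, model in [("LR", LinearRegression()),
                    ("RR", Ridge(alpha=1.0)),
                    ("LASSO", Lasso(alpha=0.1))]:
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (r2_score(y_te, y_hat),
                    np.sqrt(mean_squared_error(y_te, y_hat)),
                    mean_absolute_error(y_te, y_hat))

for name, (r2, rmse, mae) in scores.items():
    print(f"{name}: R2={r2:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

The `alpha` values are arbitrary here; in practice they would be tuned by cross-validation.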

Similarly, Prem et al. [53] compared the performance of different regression algorithms, i.e., LASSO, elastic net (EN), decision tree (DT), bagging tree (BT), kernel ridge regression (KRR), relevance vector machine (RVM), support vector regression (SVR), and Gaussian process regression (GPR), in predicting the compressive strength of geopolymer concrete. MATLAB [54] was used in the evaluation process through the simpleR toolbox, where the objective function consisted of the core algorithm (i.e., penalized regression, sorting and grouping, bootstrap aggregation, structural risk minimization, matrix inversion, and Bayesian statistical inference), a loss function to measure model fit (i.e., quadratic, hinge, \(\varepsilon\)-insensitive, and marginal likelihood), and regularization (i.e., L1-norm, L2-norm, and probabilistic) to measure method complexity. Regularization is a technique that constrains the model from overfitting by shrinking the coefficients toward zero. The results showed that GPR had the best performance accuracy, with an R2 of 0.9801, RMSE of 0.96, MAE of 1.23, and ME of 0.12, while LASSO had the worst performance, with an R2 of 0.8649, RMSE of 16.73, MAE of 17.09, and ME of 16.73, as depicted in Fig. 3. A similar inference on LASSO performance was made by Sharma et al. [49], implying that LASSO may not be the best choice for geopolymer strength prediction: it selects only a few features in the dataset (i.e., a sparse solution), which, with highly collinear predictors [55], leads to dropped variates and erratic coefficient estimates [56] and thus to high prediction variance and cross-validation errors. In contrast, Volker et al. [57] observed that GPR had poorer predictive performance than RF due to its Bayesian requirement for continuous data for the probability distribution functions. This agrees with Kurt et al. [58], who observed that RF performs better than regression models in predicting the strength properties of geopolymers. 
Because of the complexity and heterogeneity of geopolymer mix design, tree-based ensemble models are commonly preferred over empirical regression models because they fully incorporate the nonlinearity of input and output variables [57, 59].
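LASSO's tendency to zero out one of two collinear predictors, noted above, can be demonstrated on a toy example (hypothetical data; the `alpha` values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Two highly collinear predictors: LASSO tends to keep one and drop the other,
# while ridge regression shares the weight between them.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1                                  # target depends only on x1

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("LASSO coefficients:", np.round(lasso.coef_, 3))  # one driven to ~0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # weight split across both
```

This sparsity is useful for feature selection but, as noted above, makes the coefficient estimates unstable under collinearity.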

Fig. 3

Comparison of residuals for the different regression algorithms [53]

A study by Ellis et al. [60] used the simplex design and least squares technique to obtain a compressive strength prediction model, which was then validated through experimental tests and analysis of variance (ANOVA). The model performance was deemed acceptable and significant, giving an R2 of 0.8534, an MAE of 5, and a p-value < 0.0001. It was observed that the relationship between compressive strength and sodium carbonate (Na2CO3) as an activator for blast furnace slag was not straightforward, as evidenced by their compressive strength prediction model (Eq. 4).

$$\begin{aligned} \text{Compressive strength (MPa)} & = 1.30894\,(\text{grams } SiO_{2}) - 57.64157\,(\text{grams } Na_{2}CO_{3}) \\ & \quad - 0.80877\,(\text{grams } H_{2}O) - 0.00150258\,(\text{grams Sand}) \\ & \quad + 0.26552\,(\text{grams Slag}) + 0.052\,(\text{grams } Na_{2}CO_{3})(\text{grams } H_{2}O) \\ & \quad + 0.041074\,(\text{grams } Na_{2}CO_{3})(\text{grams Sand}) \\ & \quad + 0.036474\,(\text{grams } Na_{2}CO_{3})(\text{grams Slag}) \end{aligned}$$
(4)

Zahid et al. [36] also developed a prediction model between compressive strength (Y) and NaOH molarity (x1), curing temperature (x2), and Na2SiO3/NaOH (x3), as represented by Eq. 5. The developed prediction model performed satisfactorily when validated against actual experimental data, giving an R2 of 0.9951, RMSE of 1.72, and p-value < 0.0001, confirming the significance of the model optimized using the response surface methodology (RSM) [61] and ANOVA [62]. The compressive strength was greatly influenced by NaOH molarity and Na2SiO3/NaOH. A similar technique of RSM and ANOVA was used by Cortes and Garcia [26], Tian et al. [35], and Hamdane et al. [37] to model and optimize the compressive strength of geopolymers.

$$\begin{aligned} \text{Compressive strength (MPa)} & = - 69.90722 + 11.56527x_{1} + 2.21019x_{2} + 7.24578x_{3} \\ & \quad - 0.053167x_{1} x_{2} + 0.65889x_{1} x_{3} - 0.017630x_{2} x_{3} \\ & \quad - 0.34918x_{1}^{2} - 0.00976990x_{2}^{2} - 3.94373x_{3}^{2} \end{aligned}$$
(5)
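A quadratic RSM model such as Eq. 5 can be optimized numerically. The sketch below evaluates Eq. 5 and maximizes it with SciPy under assumed factor bounds; the bounds and starting point are illustrative, not taken from [36]:

```python
import numpy as np
from scipy.optimize import minimize

def strength(v):
    """Eq. 5: RSM quadratic for compressive strength (MPa);
    x1 = NaOH molarity, x2 = curing temperature, x3 = Na2SiO3/NaOH ratio."""
    x1, x2, x3 = v
    return (-69.90722 + 11.56527*x1 + 2.21019*x2 + 7.24578*x3
            - 0.053167*x1*x2 + 0.65889*x1*x3 - 0.017630*x2*x3
            - 0.34918*x1**2 - 0.00976990*x2**2 - 3.94373*x3**2)

# Maximize strength inside assumed factor ranges (illustrative bounds)
res = minimize(lambda v: -strength(v), x0=[12.0, 60.0, 2.0],
               bounds=[(8, 16), (40, 90), (1, 3)])
print("optimum (molarity, temp, ratio):", np.round(res.x, 2),
      "-> strength %.1f MPa" % strength(res.x))
```

Negating the objective turns the maximization into the minimization form that `scipy.optimize.minimize` expects.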

Mazzinghy et al. [63] modeled the compressive strength (S) of one-part iron ore tailings-geopolymer with respect to time (t) as illustrated by Eq. 6:

$$S = A\left( 1 - \exp \left( - bt \right) \right)$$
(6)

where A and b are parameters providing maximum value and rate of approach. For model-fitting, the objective function (Eq. 7) was used:

$$\theta = \sum\limits_{t = 1}^{28} \frac{\left( S_{t} - \bar{S}_{t} \right)^{2}}{S_{t}}$$
(7)

where \(S_{t}\) is the experimental strength and \(\bar{S}_{t}\) is the model strength at time t.

Furthermore, Mazzinghy et al. [63] used the Excel Solver function to find the best-fit parameters by minimizing the value given in Eq. 7 using a nonlinear optimization algorithm. Ahmed et al. [28] used a similar Excel Solver approach to minimize an objective function for fly ash geopolymer mortar. The model developed by Mazzinghy et al. [63] is illustrated in Fig. 4 and shows compressive strengths of 43 MPa, 40.7 MPa, and 48.1 MPa after 28 days of ambient curing with SS/SH ratios of 10:1, 7:1, and 4:1, respectively. However, the fitted curve did not closely represent the experimental data, which can be attributed to the choice of fitting equation or activation function, defined as a mathematical equation that calculates an output based on the input variables. The Avrami and tanh equations, represented by Eqs. 8 and 9 and commonly used in neural networks, work best for sigmoidal curves (0 to 1) and hyperbolic curves (−1 to 1), respectively.

$$Y = A\left( 1 - \exp \left( - bt^{n} \right) \right)$$
(8)

where A, b, and n are fitting constants while t is the correspondent time.

$$\tanh \left( x \right) = \frac{e^{x} - e^{ - x}}{e^{x} + e^{ - x}}$$
(9)
Fig. 4

Relationship between compressive strength and curing time [63]

The residuals are squared so that positive and negative errors do not cancel when summed. A deviation on either side of the fit is an error, but if left unchecked, residuals of 0.1 and −0.1 would sum to zero, when the measure should be agnostic to the side on which the error occurred and sum to 0.2. Squaring the residuals makes every term positive, so minimizing their sum gives the "least squares" best fit.
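The least-squares fit of Eq. 6 described above can be reproduced with SciPy instead of the Excel Solver. The data points below are synthetic values generated near the model's shape, not the iron-ore-tailings measurements of [63]:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, A, b):
    """Eq. 6: S = A * (1 - exp(-b * t)), A = asymptotic strength, b = rate."""
    return A * (1.0 - np.exp(-b * t))

# Synthetic strength-vs-age points lying close to A = 45 MPa, b = 0.25 1/day
t = np.array([1.0, 3.0, 7.0, 14.0, 28.0])
S = np.array([9.95, 23.74, 37.18, 43.64, 44.96])

(A, b), _ = curve_fit(model, t, S, p0=[40.0, 0.1])
residuals = S - model(t, A, b)
sse = np.sum(residuals**2)   # sum of squared residuals, as discussed above
print(f"A = {A:.1f} MPa, b = {b:.3f} 1/day, SSE = {sse:.4f}")
```

`curve_fit` performs the same squared-residual minimization as the Solver workflow, using a Levenberg–Marquardt-type least-squares routine internally.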

When there are more variables and an equation must be found to fit the experimental data, it is best to first plot the data, start with a linear model, and then use higher-order polynomials as the first optimization test, examining the fitting error to see whether it appears quadratic, logarithmic, exponential, etc. Lastly, other model forms can be tried and evaluated with R2, the sum of squared errors, or the sum of absolute errors. A similar technique was used by Petroli et al. [64], who tested the tangent sigmoid (Eq. 10), logarithmic sigmoid (Eq. 11), and linear algorithm (Eq. 12) as activation functions in predicting transition pressures for two ternary systems. The Levenberg–Marquardt algorithm was used as the training model for the least-squares curve-fitting problem since its output gave the lowest mean squared error (MSE), root mean squared error (RMSE), and mean absolute deviation (MAD), represented by Eqs. 13, 14, and 15, respectively [64].

$$f\left( x \right) = \frac{2}{{\left( {1 + e^{ - 2x} } \right)}} - 1$$
(10)
$$f\left( x \right) = \frac{1}{1 + e^{ - x} }$$
(11)
$$f\left( x \right) = ax$$
(12)
$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}$$
(13)
$$RMSE = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}}$$
(14)
$$MAD = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| Y_{i} - \bar{Y}_{i} \right|$$
(15)
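Eqs. 13–15 translate directly into code; a minimal sketch with illustrative values:

```python
import numpy as np

def mse(y, y_hat):
    """Eq. 13: mean squared error."""
    return np.mean((np.asarray(y) - np.asarray(y_hat))**2)

def rmse(y, y_hat):
    """Eq. 14: root mean squared error."""
    return np.sqrt(mse(y, y_hat))

def mad(y, y_hat):
    """Eq. 15: mean absolute deviation."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(y_hat)))

y_true = [30.0, 35.0, 40.0]   # illustrative actual values
y_pred = [29.0, 36.0, 43.0]   # illustrative predicted values
print(mse(y_true, y_pred), rmse(y_true, y_pred), mad(y_true, y_pred))
```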

The Feed-Forward Network (FFN) used for the methanol system achieved a determination coefficient (R2) of 0.99878, MSE of 0.01612, RMSE of 0.13, and MAD of 0.10, while the Elman Network used for the ethanol system achieved an R2 of 0.99359, MSE of 0.99359, RMSE of 0.25, and MAD of 0.16. The results showed that the artificial neural network (ANN) method fitted the experimental dataset better than the Peng–Robinson models, which used the Wong–Sandler and van der Waals quadratic mixing rules.

4.2 Supervised learning algorithms

Supervised machine learning algorithms are commonly used in model development because they use labelled data, making them efficient and versatile. Sun et al. [65] used five RF algorithms to predict the compressive strength (193-sample dataset), slump (145-sample dataset), dynamic yield stress, static yield stress, and plastic viscosity of alkali-activated concrete. RF is an ensemble bagging/bootstrap aggregation technique consisting of row sampling with replacement and feature sampling to form several decision trees (DT), which are combined through a majority vote (classifier) or average (regressor) to mitigate bias, variance, and overfitting [66, 67]. Bagging is commonly employed to create low-variance models by eliminating noise and bias. Figure 5 illustrates the workflow applied to derive the RF algorithm consisting of many decision trees. To balance predictive performance and computational cost based on the predictor/input design parameters, i.e., precursor content, blast furnace slag ratio, Na2O content, Ms, water content, fine aggregate, coarse aggregate, and testing age, an optimal hyperparameter tuning set [68] was found at mtry equal to 8 and ntree equal to 210. Hyperparameters are the options or variable values fed to an algorithm to fine-tune how it works. The experimental dataset was split into 80% for training and 20% for testing, using tenfold cross-validation to evaluate the accuracy of the trained model in generalizing to new data and to minimize overfitting/underfitting.

Fig. 5

Workflow for the development of the RF model [65]

Statistical metrics e.g., coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and alpha 20 (a20-index), were applied to assess the prediction performance of the developed random forest algorithms.

$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - \hat{Y}_{i} \right)^{2}}$$
(16)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| Y_{i} - \bar{Y}_{i} \right|$$
(17)
$$MAPE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| \frac{Y_{i} - \bar{Y}_{i}}{Y_{i}} \right| \times 100$$
(18)
$$a20\text{-index} = \frac{m20}{n}$$
(19)

where \(Y_{i}\), \(\bar{Y}_{i}\), \(\hat{Y}_{i}\), \(n\), and \(m20\) represent the actual value, predicted value, mean value, sample size, and number of samples with an experimental/predicted ratio between 0.80 and 1.20, respectively [30, 32].
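As a sketch, the less standard of these metrics, Eqs. 18 and 19, can be implemented as follows (illustrative values, not data from [65]):

```python
import numpy as np

def mape(y, y_hat):
    """Eq. 18: mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y)) * 100

def a20_index(y, y_hat):
    """Eq. 19: fraction of samples (m20/n) whose experimental/predicted
    ratio lies between 0.80 and 1.20."""
    ratio = np.asarray(y, float) / np.asarray(y_hat, float)
    return np.mean((ratio >= 0.80) & (ratio <= 1.20))

y_exp = np.array([30.0, 40.0, 50.0, 60.0])    # illustrative experimental values
y_pred = np.array([33.0, 39.0, 70.0, 61.0])   # illustrative predictions
print(mape(y_exp, y_pred), a20_index(y_exp, y_pred))
```

Here three of the four ratios fall inside the 0.80–1.20 band, so the a20-index is 0.75.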

After iteration [65], the results showed a good fit between observation and prediction, with errors within ± 20%. For instance, random forest prediction of the 28-day compressive strength achieved an R2 of 0.96, MAE of 2.45, RMSE of 3.22, and MAPE of 9.21% on the training dataset, and an R2 of 0.92, MAE of 4.48, RMSE of 5.43, and MAPE of 15.98% on the testing dataset. The impact of predictor features on output values was examined using out-of-bag permutation importance, which measures the decrease in mean accuracy when a feature is permuted. The results showed that the features with the greatest effect on compressive strength were curing age and silicate modulus (Ms), giving normalized importance factors of 0.653 and 0.154, respectively. For the slump, they were water content (0.355), silicate modulus (0.189), and precursor content (0.142). A similar RF algorithm technique for compressive strength prediction was used and recommended by Verma [69], Li et al. [70], and Ding et al. [71]. In comparison, Verma [69] and Li et al. [70] split their training/testing datasets 70:30 and 60:40, respectively, while Ding et al. [71], Sun et al. [65], and Nguyen et al. [72] used an 80:20 split.
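The RF workflow above (ntree = 210, mtry = 8, tenfold cross-validation, feature importances) can be sketched in scikit-learn on synthetic stand-in data. Here `n_estimators` and `max_features` play the roles of ntree and mtry, and `feature_importances_` is scikit-learn's impurity-based measure rather than the out-of-bag permutation measure used in [65]:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 193 samples, 8 mix-design features; feature 0 dominates
rng = np.random.default_rng(7)
X = rng.uniform(size=(193, 8))
y = 20.0 + 30.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=2.0, size=193)

rf = RandomForestRegressor(n_estimators=210,   # ntree = 210
                           max_features=8,     # mtry = 8
                           random_state=0)
cv_r2 = cross_val_score(rf, X, y, cv=10, scoring="r2")  # tenfold cross-validation
rf.fit(X, y)
print("mean CV R2: %.3f" % cv_r2.mean())
print("feature importances:", np.round(rf.feature_importances_, 3))
```

The dominant synthetic feature should receive the largest importance, mirroring how curing age dominated in the reviewed study.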

Peng and Unluer [73] used support vector machine (SVM), extreme learning machine (ELM), and back propagation neural network (BPNN) models to predict the 28-day compressive strength of fly ash-based geopolymer concrete. A dataset of 110 groups was collected from the literature, consisting of different input parameters/features, e.g., fly ash content/composition, alkaline activator content (NaOH + Na2SiO3), AA/FA ratio, water, polycarboxylate superplasticizer, curing temperature and duration, and fine and coarse aggregate. The performance accuracy of the models was examined using R2, MSE, RMSE, and MAE, and all the algorithms were run in MATLAB 2016a [74]. The training/testing/validation dataset was split 70:15:15, which concurs with [33, 75]. It was found that all models had relatively good agreement between the predicted and actual values, with errors within ± 20%. In terms of performance accuracy, BPNN prediction of the 28-day compressive strength achieved an R2 of 0.9323, MSE of 6.83, RMSE of 2.61, and MAE of 1.61; SVM achieved an R2 of 0.9148, MSE of 11.39, RMSE of 3.37, and MAE of 2.30; and ELM achieved an R2 of 0.9146, MSE of 11.41, RMSE of 3.38, and MAE of 2.57. Figure 6 shows an illustration of SVM, ELM, and BPNN.

Fig. 6

Graphical representation of (a) BPNN (b) SVM (c) ELM [73]

SVM is one of the best nonlinear supervised machine learning models [76]. Given a set of labelled training data, SVM finds the optimal hyperplane that categorizes new examples; unlike linear regression or neural networks, only the support vectors determine the best decision boundary, known as a hyperplane. ELM is a generalized single-hidden-layer feed-forward neural network (SLFFNN) that provides faster learning speed and better generalization capability because the hidden node weights and biases are randomly assigned and need not be tuned [77, 78]. BPNN is widely used to train neural networks through the chain rule by fine-tuning the weights and biases to minimize the error from the previous gradient descent iteration [79,80,81]. An artificial neural network (ANN), inspired by the human brain, is a group of artificial neurons composed of an input layer, hidden layer(s), weights, a bias (or threshold), and an output layer [82], given by Eqs. 20 and 21:

$$y = \mathop \sum \limits_{i = 1}^{n} \left( {WiXi + b} \right)$$
(20)
$$f\left( x \right) = Activation\, function \left( y \right)$$
(21)

As shown in Fig. 7, the weights (Wi) in the neural network are assigned to each input (Xi) to convey the importance of the input feature in predicting the output value. The bias (b) shifts the activation function f(x) either to the left or right. The summation function (i.e., \(y = \sum\nolimits_{i = 1}^{n} \left( W_{i}X_{i} + b \right)\)) binds the weights and inputs together to determine the sum and produce a single input value to the neuron. Activation functions, e.g., the binary step function, logistic sigmoid function, tanh function, arctan function, rectified linear unit (ReLU), leaky ReLU, and softmax function, are mathematical functions that calculate the output of a neuron based on its inputs and weights [83, 84]. The activation function introduces non-linearity into the model and transforms the neuron input value through a sigmoid, hyperbolic tangent, or ReLU [85, 86]. However, the drawback of the sigmoid and tanh functions is that the small value of their derivatives during gradient descent leads to slow learning (the vanishing gradient problem), which can be overcome by using the rectified linear unit [87].
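The single-neuron computation of Eqs. 20 and 21 can be sketched as follows, with illustrative weights and inputs and a logistic sigmoid activation:

```python
import numpy as np

def sigmoid(y):
    """Logistic activation mapping the summed input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def neuron(x, w, b):
    """Eqs. 20-21: weighted sum of inputs plus bias, then the activation."""
    y = np.dot(w, x) + b
    return sigmoid(y)

x = np.array([0.5, 0.2, 0.8])    # input features (illustrative values)
w = np.array([0.4, -0.6, 0.9])   # weights conveying feature importance
b = 0.1                          # bias shifting the activation
print(neuron(x, w, b))
```

Here the weighted sum is 0.8, the bias raises it to 0.9, and the sigmoid squashes it to roughly 0.711.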

Fig. 7

Illustration of the artificial neural network

Therefore, neural network learning entails finding the right weights and biases for a problem through forward and backward propagation, where, for a sigmoid activation, the activations range between 0 and 1, forming a sigmoid curve. In the study by Peng and Unluer [73], both ELM and BPNN adopted the sigmoid activation function given by Eq. 22:

$$y = \frac{1}{1 + e^{ - x} }$$
(22)
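The weighted sum and sigmoid activation of Eqs. 20–22 can be sketched in a few lines of Python; the input values, weights, and bias below are arbitrary illustrative numbers, not values from any of the reviewed studies:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum (Eq. 20) passed through a sigmoid activation (Eqs. 21-22)."""
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # Eq. 20
    return 1.0 / (1.0 + math.exp(-y))                       # Eq. 22

# Hypothetical three-feature input vector, weights, and bias.
out = neuron([0.5, -1.2, 3.0], [0.8, 0.1, 0.05], bias=-0.2)
print(round(out, 4))  # a value in (0, 1), as required of a sigmoid output
```

Because the sigmoid squashes any weighted sum into (0, 1), its derivative is small for large |y|, which is precisely the vanishing-gradient behaviour noted above.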

Awoyera et al. [88] estimated the strength properties of geopolymer self-compacting concrete using ANN and GEP. The input parameters used for model development comprised GGBFS, silica fume, FA, and workability measured using slump flow, T50 cm, V-funnel, L-box, and J-ring tests. A dataset of 105 samples, split into 80:20 training/testing sets, was used to develop the ANN model while a dataset of 412 samples was used to develop the GEP. The feed-forward back propagation neural network (FFBPNN) subject to the Levenberg–Marquardt algorithm [89,90,91] was utilized to train the data in MATLAB. The Levenberg–Marquardt algorithm interpolates between the gradient descent and Gauss–Newton methods to find a local and global minimum based on Eq. 23:

$$x_{n + 1} = x_{n} - (\nabla^{2} f\left( {x_{n} } \right) + \lambda I)^{ - 1} \nabla f\left( {x_{n} } \right)$$
(23)
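A minimal sketch of the Levenberg–Marquardt update in Eq. 23, reduced to one dimension so the matrix inverse becomes a scalar division (the toy objective and its derivatives are illustrative assumptions, not from the reviewed study):

```python
def lm_step(x, grad, hess, lam):
    """One Levenberg-Marquardt update (Eq. 23) in one dimension:
    x_{n+1} = x_n - (f''(x_n) + lam)^(-1) * f'(x_n).
    Large lam behaves like gradient descent; lam -> 0 approaches Gauss-Newton."""
    return x - grad(x) / (hess(x) + lam)

# Toy objective f(x) = (x - 3)^2 with known first and second derivatives.
grad = lambda x: 2.0 * (x - 3.0)
hess = lambda x: 2.0

x = 10.0
for _ in range(20):
    x = lm_step(x, grad, hess, lam=0.5)
print(round(x, 4))  # converges toward the minimum at x = 3
```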

GEP applies evolutionary biology principles to find a global minimum to a problem. The procedure starts with a preliminary generation of candidate solutions ("creatures") validated against the objective function, which then undergo crossover and mutation until an evolved optimal solution is generated [92, 93]. GEP simulates survival of the fittest through natural selection inside the computer, with the fitness given by Eq. 24:

$$M_{i} = \sum\limits_{j = 1}^{k_{t} } \left( {M - \left| {K_{\left( {i,j} \right)} - T_{j} } \right|} \right)$$
(24)

where Mi: fitness of creature i, M: selection range, kt: number of fitness cases, K(i,j): value returned by creature i for fitness case j, and Tj: target value [88]. It was found that the developed ANN and GEP models demonstrated high performance accuracy in predicting the strength of self-compacting geopolymer concrete. For the testing dataset, the ANN prediction of the 28-day compressive strength achieved an R2 of 0.89, MSE of 0.00566, and RMSE of 0.07523, while GEP achieved an R2 of 0.45465, MSE of 11.1, RMSE of 3.33, and MAE of 2.02. In contrast, Mazumder and Prasad [94] observed that GEP had better performance accuracy in predicting geopolymer compressive strength than ANN and SVM, since it gave an R2 of 0.9922, MSE of 3.3302, RMSE of 1.8248, and MAE of 1.5053 for the testing dataset. ANN has some drawbacks, such as the implicit storage of acquired knowledge, difficulty in interpreting the network's decision-making process, slow convergence speed, local-minima solutions, poor generalization performance, and overfitting [95].
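The fitness of Eq. 24 rewards creatures whose predictions fall close to the targets on every fitness case. A minimal sketch (the selection range, targets, and chromosome outputs below are hypothetical):

```python
def gep_fitness(predictions, targets, selection_range=100.0):
    """Absolute-error fitness of one chromosome ("creature") per Eq. 24:
    M_i = sum over fitness cases j of (M - |K_(i,j) - T_j|)."""
    return sum(selection_range - abs(k - t)
               for k, t in zip(predictions, targets))

# Two hypothetical chromosomes evaluated on the same three fitness cases.
targets = [30.0, 42.0, 55.0]
better = gep_fitness([29.0, 43.0, 54.5], targets)  # small errors -> high fitness
worse  = gep_fitness([20.0, 60.0, 40.0], targets)  # large errors -> low fitness
print(better > worse)  # True: selection favours the fitter chromosome
```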

In another study, Nazari and Sanjayan [76] utilized SVM, ANN, adaptive neuro-fuzzy inference systems (ANFIS), and five hybrid algorithms (imperialist competitive algorithm (ICOA), artificial bee colony optimization algorithm (ABCOA), ant colony optimization algorithm (ACOA), particle swarm optimization algorithm (PSOA) [70], and genetic algorithm (GA)) to predict the compressive strength of geopolymer paste, mortar, and concrete. The metaheuristic algorithms optimized the SVM to form hybrid algorithms, which were then compared to the traditional non-optimized SVM, ANN, and ANFIS. A dataset of 1347 samples collected from the literature was used for training/testing the models. The 12 input parameters comprised slag, fly ash, water, fine aggregate, coarse aggregate, Na2SiO3, NaOH, KOH, superplasticizer, curing temperature, and curing time. The results showed that the R2 values of all the hybrid models were better than those of the traditional models. ICOA–SVM had the highest performance, with an R2, MAE, RMSE, and MAPE of 0.8993, 1.9092, 3.2603, and 7.6373, respectively, followed by GA–SVM and ANN as the second- and third-best models for predicting geopolymer strength. Mozumder et al. [95] used support vector machine regression (SVMR) to predict the 28-day compressive strength of GGBFS geopolymer-stabilized clayey soil. A dataset of 213 geopolymer-stabilized clayey soil samples was utilized in model development and split into 70% training and 30% testing. The modeling was performed in MATLAB using the SVR toolbox. The input parameters comprised liquid limit (LL), plasticity index (PI), binder content (GGBFS), molar concentration (M), and alkali/binder ratio (A/B). The model performance indicators comprised R2, RMSE, and MAPE. SVM, invented by Vapnik [96], is a powerful and robust kernel-based classification and regression algorithm with superior generalization capability [97, 98]. The generalization capability depends on the optimal hyperplane defined by Eq. 25 and Fig. 8:

$$f\left( x \right) = \left( {{\mathbf{w}} \cdot {\mathbf{x}}} \right) + b$$
(25)

where w: weight vector; x: input vector; and b: bias.

Fig. 8

Geometrical illustration of the optimal hyperplane (H), two parallel hyperplanes (H1 and H2), and support vectors [97]

However, the inclusion of the ε-insensitive loss function in Eq. 25 turns it into a support vector regression (SVR) problem, as it introduces the concept of a margin into SVM to minimize model complexity, implying no prediction error in the model if the deviation lies within ε [96], as defined by Eq. 26.

$$L_{\varepsilon } (y) = \left| {y - f(x)} \right|_{\varepsilon } = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {{\text{if}}\;\left| {y - f(x)} \right| \le \varepsilon } \hfill \\ {\left| {y - f(x)} \right| - \varepsilon ,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(26)

where \(L_{\varepsilon } \left( y \right)\): loss function and ε > 0.
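Eq. 26 translates directly into code: errors inside the ε-tube incur no loss, and errors outside it are penalized linearly. A minimal sketch with an arbitrary ε and illustrative values:

```python
def eps_insensitive_loss(y, fx, eps=0.5):
    """Eq. 26: zero loss inside the eps-tube, linear penalty outside it."""
    err = abs(y - fx)
    return 0.0 if err <= eps else err - eps

print(eps_insensitive_loss(10.0, 10.3))            # 0.0 (inside the tube)
print(round(eps_insensitive_loss(10.0, 11.2), 4))  # 0.7 (|error| - eps)
```

Only training points outside the tube contribute to the loss, which is why the fitted SVR function depends solely on the support vectors.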

The use of kernel functions addresses the computational difficulty of nonlinear regression due to the high dimensionality of the feature space. The study by Mozumder et al. [95] applied the radial basis kernel function (RBF), exponential radial basis kernel function (ERBF), and polynomial kernel function (POLY). The results showed that SVR-ERBF performed better in compressive strength prediction (R2 = 0.9992, RMSE = 0.1973, MAPE = 0.2586) compared to SVR-RBF (R2 = 0.9829, RMSE = 0.8679, MAPE = 3.2542) and SVR-POLY (R2 = 0.9688, RMSE = 1.1722, MAPE = 4.9073). A similar SVR technique adopted by Kumar et al. [99] gave an R2 of 0.87385, MAE of 1.6034, RMSE of 2.1850, and MAPE of 4.3084. A parametric study with the SVR-ERBF model showed that compressive strength is directly proportional to binder content but inversely proportional to LL and PI. However, the effect of M and A/B on strength was equivocal, which concurs with studies by Duxson et al. [100] and Thokchom et al. [101].

Huo et al. [8] developed compressive strength prediction models for geopolymer concrete using eight machine learning algorithms, i.e., RF, extra-trees (ET), gradient boosting decision tree (GBDT), bootstrap aggregating (BA), k-nearest neighbor (KNN), extreme gradient boosting (XGB) [102], SVM, and deep neural networks (DNN). A dataset of 557 samples on calcium-based geopolymers was collected from the literature and split into 80% training and 20% testing data using Scikit-learn in Python [103, 104]. The input features comprised the oxide composition (Si/Al, Si/Ca, Si/Na2O, H2O/Na2O, and Na2O content), liquid/solid ratio, curing temperature, and curing age. The influence of the input features on strength development was assessed using Shapley Additive Explanations (SHAP) [105,106,107]. The results showed that XGB had the highest prediction accuracy, with an R2 of 0.91, RMSE of 3.85, MAE of 2.51, and MAPE of 16.94 for the testing dataset, as depicted by the radar chart in Fig. 9, where XGB sits at the inner core of the radar plot. SHAP showed that curing age, curing temperature, H2O/Na2O, Si/Ca, and L/S have the most influence on the compressive strength. A similar SHAP technique was applied by Shah et al. [108] to quantify the significance of each feature in fly ash-slag one-part geopolymers; XGB again had the highest performance accuracy, with an R2 of 0.90, MAE of 4.47, and RMSE of 7.90, with strength greatly influenced by Na2O dosage, precursor content, water/binder ratio, and curing temperature. Also, Nguyen et al. [59] found XGB to be a robust model in predicting the compressive strength of fly ash-based geopolymer concrete, as it had an R2 of 0.964, RMSE of 2.457, MAE of 1.794, and MAPE of 0.086 for the testing data, even though RF was found to be the most effective on the training data.

Fig. 9

Radar chart of ML performance indicators [8]

5 Discussion

Several researchers have considered machine learning and statistical modeling as novel techniques for predicting geopolymer strength. The commonly modeled output parameter has been compressive strength based on input variables consisting of precursor materials (e.g., cement, FA, and GGBFS), coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder ratio, superplasticizer, NaOH molarity, AL/B, Ms, extra water, curing temperature, and curing age. Compressive strength is the most modeled output variable since it greatly influences the geopolymer structural performance, durability, and safety rating [109]. The precursor materials have been used in their unary and binary forms to improve the physico-chemical properties of geopolymers. Most ML models have been based on unary precursors to minimize the complexity of the model due to the heterogeneous nature of the waste materials. The variation in the contents and ratios of SiO2, Al2O3, Fe2O3, and CaO greatly influences the suitability of the precursor material in producing a geopolymer of good quality strength [110]. All the precursor materials come from different sources and are bound to have different physico-chemical properties. Interestingly, not all precursor material properties, e.g., morphology, mineralogy, and chemical composition, have been incorporated in most of the ML and SM models, negatively influencing their performance when validated against experimental testing data.

Alkaline activators also play a crucial role in the strength development of geopolymers and, subsequently, the performance of the developed ML and SM models. The most used alkaline activators have been NaOH and Na2SiO3 due to their favourable performance in the dissolution of precursors and strength development of geopolymers [7]. Furthermore, the sensitivity of the strength prediction models is greatly influenced by the curing conditions (i.e., temperature and time) and the alkaline activator (NaOH + Na2SiO3). The dissolution of precursor materials and their rearrangement into poly(sialate) and poly(sialate-siloxo) depend on the concentration of the alkaline activator. Elevated-temperature curing controls the setting time and reactivity of the precursor + alkaline liquid mixture, ensuring complete dissolution into geopolymeric gels with no unreacted particles left behind.

Figure 10 shows a consolidated framework of the machine learning architectural layers comprising the acquisition of structured and unstructured data, data conditioning, algorithm and model training, human–machine interaction, and perception and deployment. The geopolymer dataset comprising precursor materials, alkaline activators, aggregates, and curing time and temperature, is acquired through experimental testing. The raw data is pre-processed through data conditioning to filter outliers, treat missing data, and produce information. The algorithms commonly based on supervised machine learning (SML) are used to model experimental data and perform geopolymer strength predictions. Engineers and construction practitioners interact with the developed predictive model in real practical deployments to gain further insights. The framework provides a structured summary of employing ML to mimic real experimental data for use in structural engineering.

Fig. 10

A flowchart showing machine learning architectural layers

The application of ML in predicting the mechanical properties of geopolymers is advantageous over empirical regression models (SM) because it explicitly captures the nonlinear relationships between dependent and independent variables. Interestingly, the empirical regression models and machine learning algorithms complement each other in precisely predicting the strength properties of geopolymers, thereby minimizing laboratory experiments and rationing precursor materials. This approach can sustainably optimize geopolymer mix design, giving better strength properties, lower operational cost, and minimal environmental impact during the pre- and post-construction phases. Statistical performance metrics comprising the correlation coefficient (R), determination coefficient (R2), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and a20 index are crucial tools in validating the strength prediction accuracy of a model. The inclusion of sensitivity analysis and permutation feature importance in the models can advance the understanding and ranking of key features influencing the strength prediction in geopolymers. Based on this review, consolidating particle size distribution, surface area, specific gravity, density, chemical composition, morphology, and mineralogy into the geopolymer strength prediction models can improve their replicability and generalizability regardless of precursor material source and nature. Furthermore, the dataset quality, dataset size, and hyperparameter tuning influence the model performance, explaining the variation in sensitivity for the same input variables [73].
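The validation metrics listed above follow directly from their textbook definitions. A minimal sketch (the strength values below are hypothetical illustrative numbers):

```python
import math

def metrics(actual, predicted):
    """R2, RMSE, MAE, and MAPE as used to validate strength predictions."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "MAPE": 100.0 / n * sum(abs((a - p) / a)
                                for a, p in zip(actual, predicted)),
    }

# Hypothetical 28-day compressive strengths (MPa): lab values vs. model output.
m = metrics([32.0, 45.5, 51.0, 38.2], [33.1, 44.0, 52.3, 37.5])
print(round(m["R2"], 3), round(m["RMSE"], 3))
```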

Table 2 summarizes the performance of various ML and SM. Regression is a reliable statistical technique useful in data prediction, modelling, and analysis. Based on its efficiency, simplicity, and prognosis ability, LR is commonly used in most statistical models dealing with larger and smaller datasets. However, on smaller datasets, LR tends to overfit by assigning more weight to specific features, thereby reducing the performance accuracy of the predictive model. To overcome the overfitting problem and subpar nonlinear performance, RR, LASSO, and GPR are suitable modern modifiers, as they penalize/regularize the model coefficients and make the weights of higher-order features approach zero. LASSO selects a few features in the dataset, dropping off covariates with erratic coefficient estimates. GPR has a Bayesian requirement for continuous data to execute the probability distribution functions. Nevertheless, this review observed that neural network, support vector machine, random forest, and Gaussian process regression strength-prediction models perform strongly, since their R2 values are > 0.9, their p-values < 0.0001, and their errors within ± 20%. The mix design is the core determinant of the mechanical strength of geopolymers, such that a broad optimization of the mix materials and proportions dataset can improve safety ratings, control quality, save time, and minimize cost during the pre- and post-design phases.
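The shrinkage effect of ridge (L2) regularization described above can be sketched in the simplest possible setting: a one-feature, through-the-origin least-squares fit, where the penalty term appears as a single constant in the denominator. The data values are arbitrary illustrative numbers:

```python
def fit_slope(x, y, alpha=0.0):
    """Least-squares slope through the origin; alpha > 0 adds the ridge
    (L2) penalty that shrinks the coefficient toward zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / \
           (sum(xi * xi for xi in x) + alpha)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]
ols = fit_slope(x, y)             # ordinary least squares
ridge = fit_slope(x, y, alpha=5)  # regularized: smaller coefficient
print(ols > ridge)  # True: the penalty shrinks the coefficient
```

LASSO behaves analogously but with an L1 penalty, which can drive weak coefficients exactly to zero and thus performs feature selection.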

Table 2 Summary comparing the performance of various ML and SM models

However, some drawbacks exist to the applicability of machine learning and statistical models in the strength prediction of geopolymers. Firstly, the choice of activation function, which depends on the type of network and the experimental data, influences the model performance and defines the prediction accuracy. For example, most regression problems use a one-node linear activation function for the output layer, while classification problems use a one-node sigmoid or softmax function for the output layer. In contrast, the multilayer perceptron and convolutional neural network use the ReLU function for each hidden layer, while the recurrent neural network uses the sigmoid and tanh functions for each hidden layer. Nevertheless, it is recommended to try a few activation functions for each type of network and prediction problem and then compare the results to make an informed choice for the neural network model. Secondly, the dataset used for model development may be prone to noise and bias due to variability in experimental conditions, affecting model interpretability and generalizability. Furthermore, each prediction model developed is specific to the nonlinear material properties, such that the same precursor materials in a different or similar laboratory environment can give a different compressive strength. To minimize noise and bias and improve model interpretability/generalizability for use in structural engineering, the dataset needs to be subjected to robust preprocessing, structural risk minimization, hyperparameter tuning, regularization, cross-validation, statistical metrics, hybrid algorithms, and inter-laboratory validation experiments. Data preprocessing can help detect/remove outliers and treat missing data, ensuring consistency before splitting the data used in model development.
The most used data split is 70/15/15% for training/testing/validation, giving relatively good agreement between predicted and actual values with errors within ± 20%. Tenfold cross-validation is commonly applied to evaluate how well the trained model generalizes to new data and to minimize overfitting/underfitting. Thereafter, inter-laboratory validation of the developed models can improve their reliability and streamline model performance on new materials and experimental conditions. Hybridized models tend to perform better than traditional models, such that their incorporation improves prediction performance and interpretability. Advanced ML techniques, such as deep learning, ANFIS, ResNet, and GPR, have better regularization and structural risk minimization, improving model reliability and stability. Thirdly, there is a scarcity of datasets due to limited experimental testing. As the saying goes, 'garbage in, garbage out' [111]; the robustness of predictive models depends on the input data variables, which are mostly determined through laboratory experiments. Using too little data makes a model prone to overfitting, affecting its predictive ability on different datasets. Therefore, conducting more experiments using various materials prepared under different laboratory and in-situ conditions will greatly increase the available datasets, providing a wide range of input and output data for developing reliable predictive models. Besides data normalization, the performance of the predictive models could benefit from other data conditioning approaches detailing the deletion of outliers and the identification of missing and duplicated data. Moreover, the moulds used in casting geopolymers have different dimensions, so their inclusion in data normalization could improve the performance of the prediction models. Lastly, there is little hyperparameter tuning to identify the crucial features influencing model performance.
An optimal combination of the key features and a selection of hidden layers dependent on problem complexity and data availability provide improved strength prediction models with minimal underfitting/overfitting and without increasing the computational power required. Tuning ANN models using Levenberg–Marquardt or Bayesian regularization expands their ability to perform at par with, or even better than, ANFIS and DNN. ANFIS and DNN are emerging technologies with advanced predictive performance attributed to their integration of fuzzy logic with neural networks and their multiple hidden layers/neurons, respectively. Incorporating the proposed suggestions in most geopolymer strength prediction models would give engineers and researchers greater confidence in accepting the interpretability/generalizability of the models in real-life practical scenarios, further validated through laboratory trials.
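The 70/15/15 train/test/validation split recommended above can be sketched as a simple index shuffle; the sample count and seed are arbitrary illustrative choices:

```python
import random

def split_70_15_15(n_samples, seed=42):
    """Shuffle sample indices and split them 70/15/15 into
    train/test/validation subsets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(0.70 * n_samples)
    n_test = int(0.15 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])

train, test, validate = split_70_15_15(200)
print(len(train), len(test), len(validate))  # 140 30 30
```

Shuffling before splitting matters: experimental datasets collected study-by-study are often ordered by material source, and an unshuffled split would leak that structure into the evaluation.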

6 Limitations and future work

The data used for the systematic review was collected from the Scopus database with set eligibility criteria such that articles written in non-English languages were not included. The search strings were set to “geopolymer” OR “alkali-activated materials” AND “machine learning” OR “statistical modeling” which might not guarantee the retrieval of alternative synonyms such as “eco-cement”, and “green-cement”. However, non-English publications and alternative databases or synonyms are not commonly used for systematic and bibliometric studies in this field hence nothing substantial is expected to change from the present review. Regarding the content of the reviewed studies, the development of the strength predictive models employed different input features and dataset ranges. Most studies did not specify the data normalization, preprocessing, and hyperparameter tuning making the inter-study comparison of the developed models very challenging. Furthermore, most studies did not tackle the impact of treating outliers and missing data on the model performance. These limitations could be addressed by specifying the feature selection criteria, data conditioning, and using completely new datasets.

Future research should investigate the impact of integrating precursor reactivity, particle size distribution, chemical composition, and mineralogy on model performance. The commonly used precursors consisting of fly ash, slag, and rice husk ash, have different reactivity and composition such that developed predictive models based on their unary and binary/ternary combinations behave differently. Additional mechanical properties such as elastic modulus, peak stress, and peak strain, could be predicted. Structural members e.g., beams, and columns, are influenced by the varying material properties, and their use is not only defined by compressive strength but also shear strength and flexural strength. Therefore, developing a universal strength predictive model incorporating all the physico-chemical and engineering properties could enhance the interpretability of ML and SM and eventually address the black-box nature.

7 Conclusions

This paper has systematically reviewed the fundamental machine learning algorithms and statistical models applied in predicting geopolymer compressive strength. The following conclusions were drawn from the review:

  • The commonly used input variables comprise FA, GGBFS, coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder ratio, superplasticizer, NaOH molarity, AL/B, Ms, extra water, curing temperature, and curing age. Hyperparameter tuning and SHAP showed that input features with a greater effect on compressive strength were curing conditions and Ms giving normalized importance factors greater than 0.6 and 0.2, respectively.

  • LR, RR, GPR, and LASSO are commonly used empirical regression techniques in geopolymer data prediction and modeling. RF, DT, ET, SVM, ELM, BPNN, ANN, DNN, SLFFNN, GEP, ANFIS, ICOA, ABCOA, ACOA, PSOA, GA, KNN, and XGB, are commonly used machine learning algorithms. NN gives better strength prediction performances with R2 values > 0.99 followed by RF, SVM, and GPR.

  • Activation functions are a vital part of neural networks. The activation function introduces non-linearity in the model and transforms the neuron input value through a sigmoid, hyperbolic tangent, or ReLU.

  • Machine learning models have better predictive ability than empirical regression models attributed to their advanced ability to incorporate the nonlinearity of specific input and output variables.

  • To minimize noise and bias and improve model interpretability/generalizability for use in structural engineering, the dataset needs to be subjected to robust preprocessing, structural risk minimization, hyperparameter tuning, regularization, cross-validation, statistical metrics, hybrid algorithms, and inter-laboratory validation experiments.