1 Introduction

The replacement of virgin materials with recycled waste materials is among the core pillars of sustainable construction to address global warming, CO2 emissions, and natural resource depletion. Geopolymer is an alternative cementitious binder to ordinary Portland cement [1] and provides a promising lower-CO2-emission option for use in construction [2, 3]. The French engineer Professor Joseph Davidovits coined the term "geopolymer" in 1978 [4, 5] to represent the polymeric Si–O–Al reaction product between an aluminosilicate precursor (e.g., fly ash, slag, volcanic ash, rice husk ash) and an alkaline medium (e.g., NaOH and/or Na2SiO3), giving the general formula \(M_{n}\left[ - \left( SiO_{2} \right)_{z} - AlO_{2} \right]_{n} \cdot wH_{2}O\), where M represents a K+ or Na+ cation, n is the degree of polycondensation, and z is 1, 2, 3, or ≫ 3. The geopolymerization process is greatly influenced by the aluminosilicate precursor, the SiO2/Al2O3 ratio, the alkaline activator, and the curing conditions [6]. Aided by the curing condition, the dissolution-to-hardening process can happen in either an alkaline or acidic medium, yielding a 3-D polymeric product with mechanical properties comparable to ordinary Portland cement [7]. The alkaline activator, consisting of alkali hydroxides, silicates, or a mixture of both, is commonly used to dissolve the precursor. The reactivity of the precursor depends on its mineralogy, morphology, chemical composition, and particle size distribution. The performance of the developed geopolymer product can vary for the same aluminosilicate precursor and alkaline activator due to raw material heterogeneity, nonlinearity, and inconsistency [8]. The input parameters commonly used for the mixture design of geopolymers comprise the alkaline liquid/binder ratio (AL/B), molarity of alkali hydroxide, alkali silicate modulus, alkali silicate/alkali hydroxide ratio, curing time and temperature, and fine/coarse aggregates [9].

Due to the lack of universal geopolymer mix design standards and the burden of time-consuming, costly, extensive laboratory testing, machine learning and statistical modeling have been applied to predict the mechanical properties of geopolymers and other civil engineering materials. It is imperative to create a mix design that gives better engineering properties by optimizing the different geopolymer formulation parameters. Machine learning (ML) and statistical modeling (SM) provide versatile optimization and inference tools for finding global (or local) minima or maxima when predicting the mechanical strength of geopolymers. Statistical modeling is a powerful optimization tool used to develop mathematical relationships between variables and make predictions [10]. Statistical models are commonly grouped into regression, classification, and clustering. The regression models consist of linear, logistic, and polynomial regression and are used to predict a dependent variable based on independent variables. The classification models consist of neural networks, decision trees, and Naïve Bayes and are used to categorize data into different classes. The clustering models include K-means, which groups data points into clusters based on similarity [10]. The term "machine learning", a subset of artificial intelligence, was coined in 1959 by American computer scientist Arthur Samuel to represent self-teaching computers capable of developing different algorithms to solve problems [11,12,13]. Artificial intelligence (AI) represents the theory and integration of computer systems in performing tasks that mimic human intelligence, e.g., perceiving, learning, classifying, and decision-making [14]. The use of ML has increased tremendously over the years due to its complex problem-solving abilities even with sparse and noisy data. Its wide-ranging applications cover different fields such as medicine, economics, the military, and construction. The U.S. 
National Institute of Standards and Technology (NIST) advanced concrete technology by implementing computer-integrated knowledge systems (CIKS) to predict the performance and life-cycle cost of high-performance concrete and new construction materials [15, 16]. Different supervised and unsupervised machine learning algorithms comprising artificial neural networks (ANN), deep neural networks (DNN), random forest (RF), decision tree (DT), gene expression programming (GEP), support vector machines (SVM), bagging regressor (BR), dimensionality reduction, clustering, etc., have been extensively applied in predicting the engineering properties of construction materials [17,18,19,20,21]. However, supervised machine learning algorithms are commonly used for predicting and forecasting the mechanical properties of civil engineering construction materials [22,23,24]. Compressive strength is the most modeled engineering property since it greatly influences durability and safety rating [25,26,27]. Validation of the developed strength predictive models is done using the statistical performance metrics comprising Pearson correlation coefficient (R), determination coefficient (R2), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and a20-index [28,29,30].

Several researchers have applied machine learning and statistical modeling to predict the mechanical properties of geopolymers and other civil engineering materials. Huo et al. [8] used machine learning models to predict the strength of calcium-based geopolymers and concluded that tree-based ensemble models perform better in strength prediction. Cavaleri et al. [31] concluded that ensemble convolution-based machine learning models perform well in predicting the strength of concrete, giving an R2, MSE, and a20-index of 0.84, 0.022, and 0.75, respectively. Asteris et al. [32] applied machine learning models in the strength prediction of cement-based mortar and concluded that AdaBoost and RF have better prediction performance, giving an R2, RMSE, and a20-index of (0.9965, 1.305, and 0.985) and (0.9768, 3.359, and 0.950), respectively. Chou et al. [27] used individual and ensemble machine learning models to predict the compressive strength of high-performance concrete and concluded that SVM and MLP performed better in strength prediction. Similarly, Golafshani et al. [33] developed strength predictive models for normal and high-performance concrete using machine learning techniques and concluded that the hybridization of ANN and ANFIS with the Grey Wolf Optimizer had better performance in strength prediction. Zhong et al. [34] used ANN to predict the peak stress, elastic modulus, and peak strain of geopolymer concrete, giving relative predictive errors of less than 10% and 20%. Tian et al. [35] used statistical modeling to predict the compressive strength of fly ash-slag-based geopolymer and concluded that the statistical model based on the response surface methodology-central composite design gives better predictive performance, with an R2 value of 0.9415. Similarly, Zahid et al. [36] and Hamdane et al. [37] concluded that the response surface methodology-central composite design provides better strength prediction performance. Ghanizadeh et al. 
[38] utilized a hybrid multivariate adaptive regression splines-escaping bird search optimization algorithm to predict the bearing capacity of geogrid-reinforced stone columns and concluded that the MARS-EBS model had better predictive performance (R2 of 0.997, RMSE of 4.19) compared to SVR-POLY (R2 of 0.952, RMSE of 20.88) and SVR-RBF (R2 of 0.985, RMSE of 9.61). Ali et al. [39] concluded that full quadratic models provide better concrete strength prediction accuracy, giving an R2 of 0.96 and RMSE of 3.49.

The nonlinearity, heterogeneity, and inconsistency of geopolymer mix design have urged the research community to intensify research on supplementing the experimental design approach with machine learning and statistical modeling to improve the practical strength performance of geopolymers. Despite several studies on using ML and SM in materials science and engineering, few researchers have systematically reviewed the state-of-the-art application of machine learning and statistical modeling in predicting geopolymer strength. Li et al. [9] reviewed the mixture design methods for geopolymer concrete (GPC), categorized into target strength, performance-based, and statistical models. Ahmed et al. [40] reviewed the mix design parameters of fly ash geopolymer composites and the development of compressive strength prediction models using linear regression, multi-logistic regression, M5P-tree, and ANN. Alaneme et al. [41] reviewed the theoretical principles of GEP, FIS, ANFIS, and ANN in the strength prediction of agro-waste geopolymers and concluded that AI techniques could contribute to lowering global warming and CO2 emissions. Paruthi et al. [42] reviewed the use of ANN for strength prediction of geopolymer concrete and concluded that ANN gives significant performance with minimal errors. Rathnayaka et al. [43] reviewed the machine learning models applied in the strength prediction of fly ash-based geopolymer concrete and concluded that ANN, DNN, SVM, RF, ANFIS, and ResNet have better strength prediction accuracy. However, limited knowledge of how to consolidate conventional experimental mix design approaches with machine learning techniques has slowed the adoption of geopolymer strength predictive models. 
Due to geopolymer heterogeneity, nonlinearity, and uncertainties, it is imperative to supplement existing experimental testing with machine learning when characterizing the mechanical properties and mix design of geopolymers to advance their practical usage in the construction industry. Therefore, this systematic review aims to elaborate and consolidate the fundamental machine learning algorithms and statistical models applied in geopolymer strength prediction to balance economies of scale and mix design. This review specifically delves into statistical linear/nonlinear optimization algorithms, supervised machine learning algorithms, and model performance statistical metrics. Finding an optimal strength predictive model, equation, or activation function for a geopolymer experimental dataset is an iterative optimization process that depends on the type of input and output variables. This review promotes supplementing conventional experimental mix design approaches with machine learning techniques for geopolymer strength-related problems, giving engineers and researchers greater confidence in the applicability and versatility of these models in real-life scenarios while saving time and minimizing costs.

2 Research significance

In this era of big data, fourth industrial revolution, and artificial intelligence, it is imperative to widely promote the supplementation of ML techniques in developing data-driven strength predictive models for geopolymer. The nonlinearity and heterogeneity of geopolymer mix design require supplementing the traditional experimental laboratory design approaches with machine learning algorithms and empirical regression models during pre-/post-design and quality control. This review provides a detailed reference guideline for construction practitioners and researchers on the state-of-the-art fundamental machine learning algorithms and statistical models applied in strength prediction and sustainable mix design of geopolymers. Furthermore, the review provides a comprehensive breakdown of future research areas required to promote the practical applicability of ML in geopolymer strength prediction and mix design.

3 Methodology

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [44] was used for data preparation. The Scopus database was chosen for bibliometric data extraction due to its versatility. A similar approach was used by Matsimbe et al. [3] and Zhang et al. [45]. The search strings devised to carry out the review were "geopolymer" OR "alkali-activated materials" AND "machine learning" OR "statistical modeling". Table 1 shows the eligibility criteria used for data inclusion in this review. The PRISMA flowchart in Fig. 1 indicates that 169 articles were collected from the Scopus database as of 20 September 2023. The snowballing technique [46] and the ROBIS tool [47, 48] were further adopted to collect additional data and assess the risk of bias, respectively.

Table 1 Eligibility criteria used in retrieving data from Scopus
Fig. 1

PRISMA flowchart

4 Results analysis

4.1 Statistical linear and nonlinear optimization algorithms

Statistical linear and nonlinear models associate dependent and independent variables through mathematical relationships [10]. The developed mathematical equations are used to understand patterns between variables and predict the compressive strength of geopolymers. Sharma et al. [49] developed predictive models for the compressive strength of ground granulated blast furnace slag (GGBFS) and fly ash (FA) geopolymer composite using different regression algorithms, i.e., linear regression (LR), ridge regression (RR), and the least absolute shrinkage and selection operator (LASSO). The mathematical algorithms for LR [50], LASSO [52], and RR [51] are represented in Eqs. 1, 2, and 3, respectively.

$$Y = \beta X + \varepsilon$$
(1)

where \(Y\): the target variable, \(X\): the predictor variable, \(\beta\): the regression coefficient, and \(\varepsilon\): the random error.

$$\beta_{\min} = \mathop{\arg\min}\limits_{\beta}\left( \frac{1}{2n_{\text{samples}}} \left\| X\beta - y \right\|_{2}^{2} + \alpha \left\| \beta \right\|_{1} \right)$$
(2)

where \(\alpha\): the constant multiplying the penalty term, and \(\left\| \beta \right\|_{1}\): the absolute L1-norm penalty on the coefficients.

$$\hat{\beta} = \left( X^{T}X + \alpha I_{p} \right)^{-1} X^{T}Y$$
(3)

where \(\hat{\beta}\): the ridge estimator, \(\alpha > 0\): the complexity (regularization) parameter, and \(I_{p}\): the \(p \times p\) identity matrix.
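As an illustration of the closed-form ridge solution in Eq. 3, the following sketch computes the estimator in NumPy on synthetic data (the data and variable names are our own, not from the reviewed studies); with \(\alpha = 0\) it reduces to ordinary least squares, and \(\alpha > 0\) shrinks the coefficients:

```python
import numpy as np

def ridge_estimator(X, y, alpha):
    """Closed-form ridge solution of Eq. 3: (X^T X + alpha * I_p)^(-1) X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Illustrative data generated from a known linear relation
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true

beta_ols = ridge_estimator(X, y, alpha=0.0)     # alpha = 0 recovers least squares
beta_ridge = ridge_estimator(X, y, alpha=10.0)  # alpha > 0 shrinks coefficients
print(np.round(beta_ols, 3), np.round(beta_ridge, 3))
```

Solving the linear system directly (rather than inverting the matrix) is the numerically preferred way to evaluate Eq. 3.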

Performance analysis of the regression models was done using R2, MSE, RMSE, and MAE. A dataset of 147 samples collected from experimental work was split into a 70% training and 30% testing set. The input variables comprise cement, FA, GGBFS, coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder, superplasticizer, NaOH molarity, extra water, and curing age. The results showed that linear regression has high predictive accuracy, with an R2 of 0.80, MSE of 15.76, RMSE of 3.97, and MAE of 2.84, whereas LASSO has the worst performance, with an R2 of 0.34, MSE of 51.55, RMSE of 7.18, and MAE of 4.65. The correlation heatmap spotting patterns in the dataset (Fig. 2) shows that cement (X1) has the most impact on the compressive strength (Y1) of the FA-GGBFS geopolymer, as it gave the highest positive correlation coefficient of 0.41.

Fig. 2

Heatmap showing the correlation between input features (X1–X11) and compressive strength (Y1) [49]
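The regression comparison described above can be sketched in scikit-learn as follows. The data here are a synthetic stand-in for the 11-feature mix-design table (the 147-sample dataset of [49] is not reproduced), so the scores are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 147 samples, 11 mix-design features, linear signal + noise
rng = np.random.default_rng(42)
X = rng.uniform(size=(147, 11))
y = X @ rng.uniform(1.0, 5.0, size=11) + rng.normal(scale=0.5, size=147)

# 70:30 train/test split, as in the reviewed study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for name, model in [("LR", LinearRegression()),
                    ("RR", Ridge(alpha=1.0)),
                    ("LASSO", Lasso(alpha=0.1))]:
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (r2_score(y_te, y_hat),
                    np.sqrt(mean_squared_error(y_te, y_hat)),
                    mean_absolute_error(y_te, y_hat))

for name, (r2, rmse, mae) in scores.items():
    print(f"{name}: R2={r2:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

The `alpha` values are arbitrary here; in practice they would be tuned by cross-validation.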

Similarly, Prem et al. [53] compared the performance of different regression algorithms, i.e., LASSO, elastic net (EN), decision tree (DT), bagging tree (BT), kernel ridge regression (KRR), relevance vector machine (RVM), support vector regression (SVR), and Gaussian process regression (GPR), in predicting the compressive strength of geopolymer concrete. MATLAB [54] was used in the evaluation process through the simpleR toolbox, where the objective function consisted of the core algorithm (i.e., penalized regression, sorting and grouping, bootstrap aggregation, structural risk minimization, matrix inversion, and Bayesian statistical inference), a loss function to measure model fit (i.e., quadratic, hinge, \(\varepsilon\)-insensitive, and marginal likelihood), and regularization (i.e., L1-norm, L2-norm, and probabilistic) to measure method complexity. Regularization is a technique that constrains the model from overfitting by shrinking the coefficients toward zero. The results showed that GPR had the best performance accuracy, with an R2 of 0.9801, RMSE of 0.96, MAE of 1.23, and ME of 0.12, while LASSO had the worst performance, with an R2 of 0.8649, RMSE of 16.73, MAE of 17.09, and ME of 16.73, as depicted in Fig. 3. A similar inference on LASSO performance was made by Sharma et al. [49], implying that LASSO may not be the best choice for geopolymer strength prediction: it selects only a few features in the dataset (i.e., a sparse solution), which, with highly collinear predictors [55], leads to dropped variates and erratic coefficient estimates [56] and thus to high prediction variance and cross-validation errors. In contrast, Volker et al. [57] observed that GPR had poorer predictive performance than RF due to its Bayesian requirement for continuous data for the probability distribution functions. This agrees with Kurt et al. [58], who observed that RF performs better than regression models in predicting the strength properties of geopolymers. 
Because of the complexity and heterogeneity of geopolymer mix design, tree-based ensemble models are commonly preferred over empirical regression models because they fully incorporate the nonlinearity of input and output variables [57, 59].
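LASSO's tendency to zero out one of two collinear predictors, noted above, can be demonstrated on a toy example (hypothetical data; the `alpha` values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Two highly collinear predictors: LASSO tends to keep one and drop the other,
# while ridge regression shares the weight between them.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1                                  # target depends only on x1

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("LASSO coefficients:", np.round(lasso.coef_, 3))  # one driven to ~0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # weight split across both
```

This sparsity is useful for feature selection but, as noted above, makes the coefficient estimates unstable under collinearity.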

Fig. 3

Comparison of residuals for the different regression algorithms [53]

A study by Ellis et al. [60] used the simplex design and least squares technique to obtain a compressive strength prediction model, which was then validated through experimental tests and analysis of variance (ANOVA). The model performance was deemed acceptable and significant, giving an R2 of 0.8534, an MAE of 5, and a p-value < 0.0001. It was observed that the relationship between compressive strength and sodium carbonate (Na2CO3) as an activator for blast furnace slag was not straightforward, as evidenced by their compressive strength prediction model (Eq. 4).

$$\begin{aligned} \text{Compressive strength (MPa)} & = 1.30894\,(\text{grams } SiO_{2}) - 57.64157\,(\text{grams } Na_{2}CO_{3}) \\ & \quad - 0.80877\,(\text{grams } H_{2}O) - 0.00150258\,(\text{grams Sand}) \\ & \quad + 0.26552\,(\text{grams Slag}) + 0.052\,(\text{grams } Na_{2}CO_{3})(\text{grams } H_{2}O) \\ & \quad + 0.041074\,(\text{grams } Na_{2}CO_{3})(\text{grams Sand}) \\ & \quad + 0.036474\,(\text{grams } Na_{2}CO_{3})(\text{grams Slag}) \end{aligned}$$
(4)

Zahid et al. [36] also developed a prediction model between compressive strength (Y) and NaOH molarity (x1), curing temperature (x2), and Na2SiO3/NaOH (x3), as represented by Eq. 5. The developed prediction model performed satisfactorily when validated against actual experimental data, giving an R2 of 0.9951, RMSE of 1.72, and p-value < 0.0001, confirming the significance of the model optimized using the response surface methodology (RSM) [61] and ANOVA [62]. The compressive strength was greatly influenced by NaOH molarity and Na2SiO3/NaOH. A similar technique of RSM and ANOVA was used by Cortes and Garcia [26], Tian et al. [35], and Hamdane et al. [37] to model and optimize the compressive strength of geopolymers.

$$\begin{aligned} \text{Compressive strength (MPa)} & = - 69.90722 + 11.56527x_{1} + 2.21019x_{2} + 7.24578x_{3} \\ & \quad - 0.053167x_{1} x_{2} + 0.65889x_{1} x_{3} - 0.017630x_{2} x_{3} \\ & \quad - 0.34918x_{1}^{2} - 0.00976990x_{2}^{2} - 3.94373x_{3}^{2} \end{aligned}$$
(5)
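A quadratic RSM model such as Eq. 5 can be optimized numerically. The sketch below evaluates Eq. 5 and maximizes it with SciPy under assumed factor bounds; the bounds and starting point are illustrative, not taken from [36]:

```python
import numpy as np
from scipy.optimize import minimize

def strength(v):
    """Eq. 5: RSM quadratic for compressive strength (MPa);
    x1 = NaOH molarity, x2 = curing temperature, x3 = Na2SiO3/NaOH ratio."""
    x1, x2, x3 = v
    return (-69.90722 + 11.56527*x1 + 2.21019*x2 + 7.24578*x3
            - 0.053167*x1*x2 + 0.65889*x1*x3 - 0.017630*x2*x3
            - 0.34918*x1**2 - 0.00976990*x2**2 - 3.94373*x3**2)

# Maximize strength inside assumed factor ranges (illustrative bounds)
res = minimize(lambda v: -strength(v), x0=[12.0, 60.0, 2.0],
               bounds=[(8, 16), (40, 90), (1, 3)])
print("optimum (molarity, temp, ratio):", np.round(res.x, 2),
      "-> strength %.1f MPa" % strength(res.x))
```

Negating the objective turns the maximization into the minimization form that `scipy.optimize.minimize` expects.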

Mazzinghy et al. [63] modeled the compressive strength (S) of one-part iron ore tailings-geopolymer with respect to time (t) as illustrated by Eq. 6:

$$S = A\left( 1 - \exp \left( - bt \right) \right)$$
(6)

where A and b are parameters providing maximum value and rate of approach. For model-fitting, the objective function (Eq. 7) was used:

$$\theta = \sum\limits_{t = 1}^{28} \frac{\left( S_{t} - \bar{S}_{t} \right)^{2}}{S_{t}}$$
(7)

where \(S_{t}\) is the experimental strength and \(\bar{S}_{t}\) is the model strength at time t.

Furthermore, Mazzinghy et al. [63] used the Excel Solver function to find the best-fit parameters by minimizing the value given in Eq. 7 using a nonlinear optimization algorithm. Ahmed et al. [28] used a similar Excel Solver approach to minimize an objective function for fly ash geopolymer mortar. The model developed by Mazzinghy et al. [63] is illustrated in Fig. 4 and shows compressive strengths of 43 MPa, 40.7 MPa, and 48.1 MPa after 28 days of ambient curing with SS/SH ratios of 10:1, 7:1, and 4:1, respectively. However, the fitted curve did not closely represent the experimental data, which can be attributed to the choice of fitting equation or activation function, defined as a mathematical equation that calculates an output based on the input variables. The Avrami and tanh equations, represented by Eqs. 8 and 9 and commonly used in neural networks, work best for sigmoidal curves (0 to 1) and hyperbolic curves (−1 to 1), respectively.

$$Y = A\left( 1 - \exp \left( - bt^{n} \right) \right)$$
(8)

where A, b, and n are fitting constants while t is the correspondent time.

$$\tanh \left( x \right) = \frac{e^{x} - e^{ - x}}{e^{x} + e^{ - x}}$$
(9)
Fig. 4

Relationship between compressive strength and curing time [63]

The residuals are squared so that positive and negative errors do not cancel when summed. A deviation on either side of the fit is an error, but if left unchecked, residuals of 0.1 and −0.1 would sum to zero, when the measure should be agnostic to the side on which the error occurred and sum to 0.2. Squaring the residuals makes every term positive, so minimizing their sum gives the "least squares" best fit.
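The least-squares fit of Eq. 6 described above can be reproduced with SciPy instead of the Excel Solver. The data points below are synthetic values generated near the model's shape, not the iron-ore-tailings measurements of [63]:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, A, b):
    """Eq. 6: S = A * (1 - exp(-b * t)), A = asymptotic strength, b = rate."""
    return A * (1.0 - np.exp(-b * t))

# Synthetic strength-vs-age points lying close to A = 45 MPa, b = 0.25 1/day
t = np.array([1.0, 3.0, 7.0, 14.0, 28.0])
S = np.array([9.95, 23.74, 37.18, 43.64, 44.96])

(A, b), _ = curve_fit(model, t, S, p0=[40.0, 0.1])
residuals = S - model(t, A, b)
sse = np.sum(residuals**2)   # sum of squared residuals, as discussed above
print(f"A = {A:.1f} MPa, b = {b:.3f} 1/day, SSE = {sse:.4f}")
```

`curve_fit` performs the same squared-residual minimization as the Solver workflow, using a Levenberg–Marquardt-type least-squares routine internally.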

When there are more variables and an equation must be found to fit the experimental data, it is best to first plot the data, start with a linear model, and then use higher-order polynomials as the first optimization test, examining the fitting error to see whether it appears quadratic, logarithmic, exponential, etc. Lastly, other model forms can be tried and evaluated with R2, the sum of squared errors, or the sum of absolute errors. A similar technique was used by Petroli et al. [64], who tested the tangent sigmoid (Eq. 10), logarithmic sigmoid (Eq. 11), and linear algorithm (Eq. 12) as activation functions in predicting transition pressures for two ternary systems. The Levenberg–Marquardt algorithm was used as the training model for the least-squares curve-fitting problem since its output gave the lowest mean squared error (MSE), root mean squared error (RMSE), and mean absolute deviation (MAD), represented by Eqs. 13, 14, and 15, respectively [64].

$$f\left( x \right) = \frac{2}{{\left( {1 + e^{ - 2x} } \right)}} - 1$$
(10)
$$f\left( x \right) = \frac{1}{1 + e^{ - x} }$$
(11)
$$f\left( x \right) = ax$$
(12)
$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}$$
(13)
$$RMSE = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}}$$
(14)
$$MAD = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| Y_{i} - \bar{Y}_{i} \right|$$
(15)
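Eqs. 13–15 translate directly into code; a minimal sketch with illustrative values:

```python
import numpy as np

def mse(y, y_hat):
    """Eq. 13: mean squared error."""
    return np.mean((np.asarray(y) - np.asarray(y_hat))**2)

def rmse(y, y_hat):
    """Eq. 14: root mean squared error."""
    return np.sqrt(mse(y, y_hat))

def mad(y, y_hat):
    """Eq. 15: mean absolute deviation."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(y_hat)))

y_true = [30.0, 35.0, 40.0]   # illustrative actual values
y_pred = [29.0, 36.0, 43.0]   # illustrative predicted values
print(mse(y_true, y_pred), rmse(y_true, y_pred), mad(y_true, y_pred))
```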

The Feed-Forward Network (FFN) used for the methanol system achieved a determination coefficient (R2) of 0.99878, MSE of 0.01612, RMSE of 0.13, and MAD of 0.10, while the Elman Network used for the ethanol system achieved an R2 of 0.99359, MSE of 0.99359, RMSE of 0.25, and MAD of 0.16. The results showed that the artificial neural network (ANN) method fitted the experimental dataset better than the Peng–Robinson models, which used the Wong–Sandler and van der Waals quadratic mixing rules.

4.2 Supervised learning algorithms

Supervised machine learning algorithms are commonly used in model development because they use labelled data, making them efficient and versatile. Sun et al. [65] used five RF algorithms to predict the compressive strength (193-sample dataset), slump (145-sample dataset), dynamic yield stress, static yield stress, and plastic viscosity of alkali-activated concrete. RF is an ensemble bagging/bootstrap aggregation technique consisting of row sampling with replacement and feature sampling to form several decision trees (DT), which are combined through a majority vote (classifier) or average (regressor) to mitigate bias, variance, and overfitting [66, 67]. Bagging is commonly employed to create low-variance models by eliminating noise and bias. Figure 5 illustrates the workflow applied to derive the RF algorithm consisting of many decision trees. To balance predictive performance and computational cost based on the predictor/input design parameters, i.e., precursor content, blast furnace slag ratio, Na2O content, Ms, water content, fine aggregate, coarse aggregate, and testing age, an optimal hyperparameter tuning set [68] was found at mtry equal to 8 and ntree equal to 210. Hyperparameters are the options or variable values fed to an algorithm to fine-tune how it works. The experimental dataset was split into 80% for training and 20% for testing, using tenfold cross-validation to evaluate the accuracy of the trained model in generalizing to new data and to minimize overfitting/underfitting.

Fig. 5

Workflow for the development of the RF model [65]

Statistical metrics e.g., coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and alpha 20 (a20-index), were applied to assess the prediction performance of the developed random forest algorithms.

$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - \bar{Y}_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - \hat{Y}_{i} \right)^{2}}$$
(16)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| Y_{i} - \bar{Y}_{i} \right|$$
(17)
$$MAPE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| \frac{Y_{i} - \bar{Y}_{i}}{Y_{i}} \right| \times 100$$
(18)
$$a20\text{-index} = \frac{m20}{n}$$
(19)

where \(Y_{i}\), \(\bar{Y}_{i}\), \(\hat{Y}_{i}\), \(n\), and \(m20\) represent the actual value, predicted value, mean value, sample size, and number of samples with an experimental/predicted ratio between 0.80 and 1.20, respectively [30, 32].
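As a sketch, the less standard of these metrics, Eqs. 18 and 19, can be implemented as follows (illustrative values, not data from [65]):

```python
import numpy as np

def mape(y, y_hat):
    """Eq. 18: mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y)) * 100

def a20_index(y, y_hat):
    """Eq. 19: fraction of samples (m20/n) whose experimental/predicted
    ratio lies between 0.80 and 1.20."""
    ratio = np.asarray(y, float) / np.asarray(y_hat, float)
    return np.mean((ratio >= 0.80) & (ratio <= 1.20))

y_exp = np.array([30.0, 40.0, 50.0, 60.0])    # illustrative experimental values
y_pred = np.array([33.0, 39.0, 70.0, 61.0])   # illustrative predictions
print(mape(y_exp, y_pred), a20_index(y_exp, y_pred))
```

Here three of the four ratios fall inside the 0.80–1.20 band, so the a20-index is 0.75.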

After iteration [65], the results showed a good fit between observation and prediction, with errors within ± 20%. For instance, random forest prediction of the 28-day compressive strength achieved an R2 of 0.96, MAE of 2.45, RMSE of 3.22, and MAPE of 9.21% on the training dataset, and an R2 of 0.92, MAE of 4.48, RMSE of 5.43, and MAPE of 15.98% on the testing dataset. The impact of predictor features on output values was examined using out-of-bag permutation importance, which measures the decrease in mean accuracy when a feature is permuted. The results showed that the features with the greatest effect on compressive strength were curing age and silicate modulus (Ms), giving normalized importance factors of 0.653 and 0.154, respectively. For the slump, they were water content (0.355), silicate modulus (0.189), and precursor content (0.142). A similar RF algorithm technique for compressive strength prediction was used and recommended by Verma [69], Li et al. [70], and Ding et al. [71]. In comparison, Verma [69] and Li et al. [70] split their training/testing datasets 70:30 and 60:40, respectively, while Ding et al. [71], Sun et al. [65], and Nguyen et al. [72] used an 80:20 split.
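The RF workflow above (ntree = 210, mtry = 8, tenfold cross-validation, feature importances) can be sketched in scikit-learn on synthetic stand-in data. Here `n_estimators` and `max_features` play the roles of ntree and mtry, and `feature_importances_` is scikit-learn's impurity-based measure rather than the out-of-bag permutation measure used in [65]:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 193 samples, 8 mix-design features; feature 0 dominates
rng = np.random.default_rng(7)
X = rng.uniform(size=(193, 8))
y = 20.0 + 30.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(scale=2.0, size=193)

rf = RandomForestRegressor(n_estimators=210,   # ntree = 210
                           max_features=8,     # mtry = 8
                           random_state=0)
cv_r2 = cross_val_score(rf, X, y, cv=10, scoring="r2")  # tenfold cross-validation
rf.fit(X, y)
print("mean CV R2: %.3f" % cv_r2.mean())
print("feature importances:", np.round(rf.feature_importances_, 3))
```

The dominant synthetic feature should receive the largest importance, mirroring how curing age dominated in the reviewed study.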

Peng and Unluer [73] used support vector machine (SVM), extreme learning machine (ELM), and back propagation neural network (BPNN) models to predict the 28-day compressive strength of fly ash-based geopolymer concrete. A dataset of 110 groups was collected from the literature, consisting of different input parameters/features, e.g., fly ash content/composition, alkaline activator content (NaOH + Na2SiO3), AA/FA ratio, water, polycarboxylate superplasticizer, curing temperature and duration, and fine and coarse aggregate. The performance accuracy of the models was examined using R2, MSE, RMSE, and MAE, and all the algorithms were run in MATLAB 2016a [74]. The training/testing/validation dataset was split 70:15:15, which concurs with [33, 75]. It was found that all models had relatively good agreement between the predicted and actual values, with errors within ± 20%. In terms of performance accuracy, BPNN prediction of the 28-day compressive strength achieved an R2 of 0.9323, MSE of 6.83, RMSE of 2.61, and MAE of 1.61; SVM achieved an R2 of 0.9148, MSE of 11.39, RMSE of 3.37, and MAE of 2.30; and ELM achieved an R2 of 0.9146, MSE of 11.41, RMSE of 3.38, and MAE of 2.57. Figure 6 shows an illustration of SVM, ELM, and BPNN.

Fig. 6

Graphical representation of (a) BPNN (b) SVM (c) ELM [73]

SVM is one of the best nonlinear supervised machine learning models [76]. Given a set of labelled training data, SVM finds the optimal hyperplane that categorizes new examples; unlike linear regression or neural networks, only the support vectors determine the best decision boundary, known as a hyperplane. ELM is a generalized single-hidden-layer feed-forward neural network (SLFFNN) that provides faster learning speed and better generalization capability because the hidden node weights and biases are randomly assigned and need not be tuned [77, 78]. BPNN is widely used to train neural networks through the chain rule by fine-tuning the weights and biases to minimize the error from the previous gradient descent iteration [79,80,81]. An artificial neural network (ANN), inspired by the human brain, is a group of artificial neurons composed of an input layer, hidden layer(s), weights, a bias (or threshold), and an output layer [82], given by Eqs. 20 and 21:

$$y = \mathop \sum \limits_{i = 1}^{n} \left( {WiXi + b} \right)$$
(20)
$$f\left( x \right) = Activation\, function \left( y \right)$$
(21)

As shown in Fig. 7, the weights (Wi) in the neural network are assigned to each input (Xi) to convey the importance of the input feature in predicting the output value. The bias (b) shifts the activation function f(x) either to the left or right. The summation function (i.e., \(y = \sum\nolimits_{i = 1}^{n} \left( W_{i}X_{i} + b \right)\)) binds the weights and inputs together to determine the sum and produce a single input value to the neuron. Activation functions, e.g., the binary step function, logistic sigmoid function, tanh function, arctan function, rectified linear unit (ReLU), leaky ReLU, and softmax function, are mathematical functions that calculate the output of a neuron based on its inputs and weights [83, 84]. The activation function introduces non-linearity into the model and transforms the neuron input value through a sigmoid, hyperbolic tangent, or ReLU [85, 86]. However, the drawback of the sigmoid and tanh functions is that the small value of their derivatives during gradient descent leads to slow learning (the vanishing gradient problem), which can be overcome by using the rectified linear unit [87].
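The single-neuron computation of Eqs. 20 and 21 can be sketched as follows, with illustrative weights and inputs and a logistic sigmoid activation:

```python
import numpy as np

def sigmoid(y):
    """Logistic activation mapping the summed input to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def neuron(x, w, b):
    """Eqs. 20-21: weighted sum of inputs plus bias, then the activation."""
    y = np.dot(w, x) + b
    return sigmoid(y)

x = np.array([0.5, 0.2, 0.8])    # input features (illustrative values)
w = np.array([0.4, -0.6, 0.9])   # weights conveying feature importance
b = 0.1                          # bias shifting the activation
print(neuron(x, w, b))
```

Here the weighted sum is 0.8, the bias raises it to 0.9, and the sigmoid squashes it to roughly 0.711.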

Fig. 7

Illustration of the artificial neural network

Therefore, neural network learning entails finding the right weights and biases for a problem through forward and backward propagation, where, for a sigmoid activation, the activations range between 0 and 1, forming a sigmoid curve. In the study by Peng and Unluer [73], both ELM and BPNN adopted the sigmoid activation function given by Eq. 22:

$$y = \frac{1}{1 + e^{ - x} }$$
(22)
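The weighted sum and sigmoid activation of Eqs. 20–22 can be sketched in a few lines of Python; the input values, weights, and bias below are arbitrary illustrative numbers, not values from any of the reviewed studies:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum (Eq. 20) passed through a sigmoid activation (Eqs. 21-22)."""
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # Eq. 20
    return 1.0 / (1.0 + math.exp(-y))                       # Eq. 22

# Hypothetical three-feature input vector, weights, and bias.
out = neuron([0.5, -1.2, 3.0], [0.8, 0.1, 0.05], bias=-0.2)
print(round(out, 4))  # a value in (0, 1), as required of a sigmoid output
```

Because the sigmoid squashes any weighted sum into (0, 1), its derivative is small for large |y|, which is precisely the vanishing-gradient behaviour noted above.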

Awoyera et al. [88] estimated the strength properties of geopolymer self-compacting concrete using ANN and GEP. The input parameters used for model development comprised GGBFS, silica fume, FA, and workability measured using slump flow, T50 cm, V-funnel, L-box, and J-ring tests. A dataset of 105 samples, split into 80:20 training/testing sets, was used to develop the ANN model while a dataset of 412 samples was used to develop the GEP. The feed-forward back propagation neural network (FFBPNN) subject to the Levenberg–Marquardt algorithm [89,90,91] was utilized to train the data in MATLAB. The Levenberg–Marquardt algorithm interpolates between the gradient descent and Gauss–Newton methods to find a local and global minimum based on Eq. 23:

$$x_{n + 1} = x_{n} - (\nabla^{2} f\left( {x_{n} } \right) + \lambda I)^{ - 1} \nabla f\left( {x_{n} } \right)$$
(23)
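A minimal sketch of the Levenberg–Marquardt update in Eq. 23, reduced to one dimension so the matrix inverse becomes a scalar division (the toy objective and its derivatives are illustrative assumptions, not from the reviewed study):

```python
def lm_step(x, grad, hess, lam):
    """One Levenberg-Marquardt update (Eq. 23) in one dimension:
    x_{n+1} = x_n - (f''(x_n) + lam)^(-1) * f'(x_n).
    Large lam behaves like gradient descent; lam -> 0 approaches Gauss-Newton."""
    return x - grad(x) / (hess(x) + lam)

# Toy objective f(x) = (x - 3)^2 with known first and second derivatives.
grad = lambda x: 2.0 * (x - 3.0)
hess = lambda x: 2.0

x = 10.0
for _ in range(20):
    x = lm_step(x, grad, hess, lam=0.5)
print(round(x, 4))  # converges toward the minimum at x = 3
```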

GEP applies evolutionary biology principles to find a global minimum to a problem. The procedure starts with a preliminary generation of candidate solutions ("creatures") validated against the objective function, which then undergo crossover and mutation until an evolved optimal solution is generated [92, 93]. GEP simulates survival of the fittest through natural selection inside the computer, with the fitness given by Eq. 24:

$$M_{i} = \sum\limits_{j = 1}^{k_{t} } \left( {M - \left| {K_{\left( {i,j} \right)} - T_{j} } \right|} \right)$$
(24)

where Mi: fitness of creature i, M: selection range, kt: number of fitness cases, K(i,j): value returned by creature i for fitness case j, and Tj: target value [88]. It was found that the developed ANN and GEP models demonstrated high performance accuracy in predicting the strength of self-compacting geopolymer concrete. For the testing dataset, the ANN prediction of the 28-day compressive strength achieved an R2 of 0.89, MSE of 0.00566, and RMSE of 0.07523, while GEP achieved an R2 of 0.45465, MSE of 11.1, RMSE of 3.33, and MAE of 2.02. In contrast, Mazumder and Prasad [94] observed that GEP had better performance accuracy in predicting geopolymer compressive strength than ANN and SVM, since it gave an R2 of 0.9922, MSE of 3.3302, RMSE of 1.8248, and MAE of 1.5053 for the testing dataset. ANN has some drawbacks, such as the implicit storage of acquired knowledge, difficulty in interpreting the network's decision-making process, slow convergence speed, local-minima solutions, poor generalization performance, and overfitting [95].
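The fitness of Eq. 24 rewards creatures whose predictions fall close to the targets on every fitness case. A minimal sketch (the selection range, targets, and chromosome outputs below are hypothetical):

```python
def gep_fitness(predictions, targets, selection_range=100.0):
    """Absolute-error fitness of one chromosome ("creature") per Eq. 24:
    M_i = sum over fitness cases j of (M - |K_(i,j) - T_j|)."""
    return sum(selection_range - abs(k - t)
               for k, t in zip(predictions, targets))

# Two hypothetical chromosomes evaluated on the same three fitness cases.
targets = [30.0, 42.0, 55.0]
better = gep_fitness([29.0, 43.0, 54.5], targets)  # small errors -> high fitness
worse  = gep_fitness([20.0, 60.0, 40.0], targets)  # large errors -> low fitness
print(better > worse)  # True: selection favours the fitter chromosome
```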

In another study, Nazari and Sanjayan [76] utilized SVM, ANN, adaptive neuro-fuzzy inference systems (ANFIS), and five hybrid algorithms (imperialist competitive algorithm (ICOA), artificial bee colony optimization algorithm (ABCOA), ant colony optimization algorithm (ACOA), particle swarm optimization algorithm (PSOA) [70], and genetic algorithm (GA)) to predict the compressive strength of geopolymer paste, mortar, and concrete. The metaheuristic algorithms optimized the SVM to form hybrid algorithms, which were then compared to the traditional non-optimized SVM, ANN, and ANFIS. A dataset of 1347 samples collected from the literature was used for training/testing the models. The 12 input parameters comprised slag, fly ash, water, fine aggregate, coarse aggregate, Na2SiO3, NaOH, KOH, superplasticizer, curing temperature, and curing time. The results showed that the R2 values of all the hybrid models were better than those of the traditional models. ICOA–SVM had the highest performance, with an R2, MAE, RMSE, and MAPE of 0.8993, 1.9092, 3.2603, and 7.6373, respectively, followed by GA–SVM and ANN as the second- and third-best models for predicting geopolymer strength. Mozumder et al. [95] used support vector machine regression (SVMR) to predict the 28-day compressive strength of GGBFS geopolymer-stabilized clayey soil. A dataset of 213 geopolymer-stabilized clayey soil samples was utilized in model development and split into 70% training and 30% testing. The modeling was performed in MATLAB using the SVR toolbox. The input parameters comprised liquid limit (LL), plasticity index (PI), binder content (GGBFS), molar concentration (M), and alkali/binder ratio (A/B). The model performance indicators comprised R2, RMSE, and MAPE. SVM, invented by Vapnik [96], is a powerful and robust kernel-based classification and regression algorithm with superior generalization capability [97, 98]. The generalization capability depends on the optimal hyperplane defined by Eq. 25 and Fig. 8:

$$f\left( x \right) = \left( {{\mathbf{w}} \cdot {\mathbf{x}}} \right) + b$$
(25)

where w: weight vector; x: input vector; and b: bias.

Fig. 8

Geometrical illustration of the optimal hyperplane (H), two parallel hyperplanes (H1 and H2), and support vectors [97]

However, the inclusion of the ε-insensitive loss function in Eq. 25 turns it into a support vector regression (SVR) problem, as it introduces the concept of a margin into SVM to minimize model complexity, implying no prediction error in the model if the deviation lies within ε [96], as defined by Eq. 26.

$$L_{\varepsilon } (y) = \left| {y - f(x)} \right|_{\varepsilon } = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {{\text{if}}\;\left| {y - f(x)} \right| \le \varepsilon } \hfill \\ {\left| {y - f(x)} \right| - \varepsilon ,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(26)

where \(L_{\varepsilon } \left( y \right)\): loss function and ε > 0.
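Eq. 26 translates directly into code: errors inside the ε-tube incur no loss, and errors outside it are penalized linearly. A minimal sketch with an arbitrary ε and illustrative values:

```python
def eps_insensitive_loss(y, fx, eps=0.5):
    """Eq. 26: zero loss inside the eps-tube, linear penalty outside it."""
    err = abs(y - fx)
    return 0.0 if err <= eps else err - eps

print(eps_insensitive_loss(10.0, 10.3))            # 0.0 (inside the tube)
print(round(eps_insensitive_loss(10.0, 11.2), 4))  # 0.7 (|error| - eps)
```

Only training points outside the tube contribute to the loss, which is why the fitted SVR function depends solely on the support vectors.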

The use of kernel functions addresses the computational difficulty of nonlinear regression due to the high dimensionality of the feature space. The study by Mozumder et al. [95] applied the radial basis kernel function (RBF), exponential radial basis kernel function (ERBF), and polynomial kernel function (POLY). The results showed that SVR-ERBF performed better in compressive strength prediction (R2 = 0.9992, RMSE = 0.1973, MAPE = 0.2586) compared to SVR-RBF (R2 = 0.9829, RMSE = 0.8679, MAPE = 3.2542) and SVR-POLY (R2 = 0.9688, RMSE = 1.1722, MAPE = 4.9073). A similar SVR technique adopted by Kumar et al. [99] gave an R2 of 0.87385, MAE of 1.6034, RMSE of 2.1850, and MAPE of 4.3084. A parametric study with the SVR-ERBF model showed that compressive strength is directly proportional to binder content but inversely proportional to LL and PI. However, the effect of M and A/B on strength was equivocal, which concurs with studies by Duxson et al. [100] and Thokchom et al. [101].

Huo et al. [8] developed compressive strength prediction models for geopolymer concrete using eight machine learning algorithms, i.e., RF, extra-trees (ET), gradient boosting decision tree (GBDT), bootstrap aggregating (BA), k-nearest neighbor (KNN), extreme gradient boosting (XGB) [102], SVM, and deep neural networks (DNN). A dataset of 557 samples on calcium-based geopolymers was collected from the literature and split into 80% training and 20% testing data using Scikit-learn in Python [103, 104]. The input features comprised the oxide composition (Si/Al, Si/Ca, Si/Na2O, H2O/Na2O, and Na2O content), liquid/solid ratio, curing temperature, and curing age. The influence of the input features on strength development was assessed using Shapley Additive Explanations (SHAP) [105,106,107]. The results showed that XGB had the highest prediction accuracy, with an R2 of 0.91, RMSE of 3.85, MAE of 2.51, and MAPE of 16.94 for the testing dataset, as depicted by the radar chart in Fig. 9, where XGB sits at the inner core of the radar plot. SHAP showed that curing age, curing temperature, H2O/Na2O, Si/Ca, and L/S have the most influence on the compressive strength. A similar SHAP technique was applied by Shah et al. [108] to quantify the significance of each feature in fly ash-slag one-part geopolymers; XGB again had the highest performance accuracy, with an R2 of 0.90, MAE of 4.47, and RMSE of 7.90, with strength greatly influenced by Na2O dosage, precursor content, water/binder ratio, and curing temperature. Also, Nguyen et al. [59] found XGB to be a robust model in predicting the compressive strength of fly ash-based geopolymer concrete, as it had an R2 of 0.964, RMSE of 2.457, MAE of 1.794, and MAPE of 0.086 for the testing data, even though RF was found to be the most effective on the training data.

Fig. 9

Radar chart of ML performance indicators [8]

5 Discussion

Several researchers have considered machine learning and statistical modeling as novel techniques for predicting geopolymer strength. The commonly modeled output parameter has been compressive strength based on input variables consisting of precursor materials (e.g., cement, FA, and GGBFS), coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder ratio, superplasticizer, NaOH molarity, AL/B, Ms, extra water, curing temperature, and curing age. Compressive strength is the most modeled output variable since it greatly influences the geopolymer structural performance, durability, and safety rating [109]. The precursor materials have been used in their unary and binary forms to improve the physico-chemical properties of geopolymers. Most ML models have been based on unary precursors to minimize the complexity of the model due to the heterogeneous nature of the waste materials. The variation in the contents and ratios of SiO2, Al2O3, Fe2O3, and CaO greatly influences the suitability of the precursor material in producing a geopolymer of good quality strength [110]. All the precursor materials come from different sources and are bound to have different physico-chemical properties. Interestingly, not all precursor material properties, e.g., morphology, mineralogy, and chemical composition, have been incorporated in most of the ML and SM models, negatively influencing their performance when validated against experimental testing data.

Alkaline activators also play a crucial role in the strength development of geopolymers and, subsequently, the performance of the developed ML and SM models. The most used alkaline activators have been NaOH and Na2SiO3 due to their favourable performance in the dissolution of precursors and strength development of geopolymers [7]. Furthermore, the sensitivity of the strength prediction models is greatly influenced by the curing conditions (i.e., temperature and time) and the alkaline activator (NaOH + Na2SiO3). The dissolution of precursor materials and their rearrangement into poly(sialate) and poly(sialate-siloxo) depend on the concentration of the alkaline activator. Elevated-temperature curing controls the setting time and reactivity of the precursor + alkaline liquid mixture, ensuring complete dissolution into geopolymeric gels with no unreacted particles left behind.

Figure 10 shows a consolidated framework of the machine learning architectural layers comprising the acquisition of structured and unstructured data, data conditioning, algorithm and model training, human–machine interaction, and perception and deployment. The geopolymer dataset comprising precursor materials, alkaline activators, aggregates, and curing time and temperature, is acquired through experimental testing. The raw data is pre-processed through data conditioning to filter outliers, treat missing data, and produce information. The algorithms commonly based on supervised machine learning (SML) are used to model experimental data and perform geopolymer strength predictions. Engineers and construction practitioners interact with the developed predictive model in real practical deployments to gain further insights. The framework provides a structured summary of employing ML to mimic real experimental data for use in structural engineering.

Fig. 10

A flowchart showing machine learning architectural layers

The application of ML in predicting the mechanical properties of geopolymers is advantageous over empirical regression models (SM) because it explicitly captures the nonlinear relationships between dependent and independent variables. Interestingly, the empirical regression models and machine learning algorithms complement each other in precisely predicting the strength properties of geopolymers, thereby minimizing laboratory experiments and rationing precursor materials. This approach can sustainably optimize geopolymer mix design, giving better strength properties, lower operational cost, and minimal environmental impact during the pre- and post-construction phases. Statistical performance metrics comprising the correlation coefficient (R), determination coefficient (R2), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and a20 index are crucial tools in validating the strength prediction accuracy of a model. The inclusion of sensitivity analysis and permutation feature importance in the models can advance the understanding and ranking of key features influencing the strength prediction in geopolymers. Based on this review, consolidating particle size distribution, surface area, specific gravity, density, chemical composition, morphology, and mineralogy into the geopolymer strength prediction models can improve their replicability and generalizability regardless of precursor material source and nature. Furthermore, the dataset quality, dataset size, and hyperparameter tuning influence the model performance, explaining the variation in sensitivity for the same input variables [73].
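The validation metrics listed above follow directly from their textbook definitions. A minimal sketch (the strength values below are hypothetical illustrative numbers):

```python
import math

def metrics(actual, predicted):
    """R2, RMSE, MAE, and MAPE as used to validate strength predictions."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "MAPE": 100.0 / n * sum(abs((a - p) / a)
                                for a, p in zip(actual, predicted)),
    }

# Hypothetical 28-day compressive strengths (MPa): lab values vs. model output.
m = metrics([32.0, 45.5, 51.0, 38.2], [33.1, 44.0, 52.3, 37.5])
print(round(m["R2"], 3), round(m["RMSE"], 3))
```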

Table 2 summarizes the performance of various ML and SM. Regression is a reliable statistical technique useful in data prediction, modelling, and analysis. Based on its efficiency, simplicity, and prognosis ability, LR is commonly used in most statistical models dealing with larger and smaller datasets. However, on smaller datasets, LR tends to overfit by assigning more weight to specific features, thereby reducing the performance accuracy of the predictive model. To overcome the overfitting problem and subpar nonlinear performance, RR, LASSO, and GPR are suitable modern modifiers, as they penalize/regularize the model coefficients and make the weights of higher-order features approach zero. LASSO selects a few features in the dataset, dropping off covariates with erratic coefficient estimates. GPR has a Bayesian requirement for continuous data to execute the probability distribution functions. Nevertheless, this review observed that neural network, support vector machine, random forest, and Gaussian process regression strength-prediction models perform strongly, since their R2 values are > 0.9, their p-values < 0.0001, and their errors within ± 20%. The mix design is the core determinant of the mechanical strength of geopolymers, such that a broad optimization of the mix materials and proportions dataset can improve safety ratings, control quality, save time, and minimize cost during the pre- and post-design phases.
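The shrinkage effect of ridge (L2) regularization described above can be sketched in the simplest possible setting: a one-feature, through-the-origin least-squares fit, where the penalty term appears as a single constant in the denominator. The data values are arbitrary illustrative numbers:

```python
def fit_slope(x, y, alpha=0.0):
    """Least-squares slope through the origin; alpha > 0 adds the ridge
    (L2) penalty that shrinks the coefficient toward zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / \
           (sum(xi * xi for xi in x) + alpha)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]
ols = fit_slope(x, y)             # ordinary least squares
ridge = fit_slope(x, y, alpha=5)  # regularized: smaller coefficient
print(ols > ridge)  # True: the penalty shrinks the coefficient
```

LASSO behaves analogously but with an L1 penalty, which can drive weak coefficients exactly to zero and thus performs feature selection.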

Table 2 Summary comparing the performance of various ML and SM models

However, some drawbacks exist to the applicability of machine learning and statistical models in the strength prediction of geopolymers. Firstly, the choice of activation function, which depends on the type of network and the experimental data, influences the model performance and defines the prediction accuracy. For example, most regression problems use a one-node linear activation function for the output layer, while classification problems use a one-node sigmoid or softmax function for the output layer. In contrast, the multilayer perceptron and convolutional neural network use the ReLU function for each hidden layer, while the recurrent neural network uses the sigmoid and tanh functions for each hidden layer. Nevertheless, it is recommended to try a few activation functions for each type of network and prediction problem and then compare the results to make an informed choice for the neural network model. Secondly, the dataset used for model development may be prone to noise and bias due to variability in experimental conditions, affecting model interpretability and generalizability. Furthermore, each prediction model developed is specific to the nonlinear material properties, such that the same precursor materials in a different or similar laboratory environment can give a different compressive strength. To minimize noise and bias and improve model interpretability/generalizability for use in structural engineering, the dataset needs to be subjected to robust preprocessing, structural risk minimization, hyperparameter tuning, regularization, cross-validation, statistical metrics, hybrid algorithms, and inter-laboratory validation experiments. Data preprocessing can help detect/remove outliers and treat missing data, ensuring consistency before splitting the data used in model development.
The most used data split is 70/15/15% for training/testing/validation, giving relatively good agreement between predicted and actual values with errors within ± 20%. Tenfold cross-validation is commonly applied to evaluate how well the trained model generalizes to new data and to minimize overfitting/underfitting. Thereafter, inter-laboratory validation of the developed models can improve their reliability and streamline model performance on new materials and experimental conditions. Hybridized models tend to perform better than traditional models, such that their incorporation improves prediction performance and interpretability. Advanced ML techniques, such as deep learning, ANFIS, ResNet, and GPR, have better regularization and structural risk minimization, improving model reliability and stability. Thirdly, there is a scarcity of datasets due to limited experimental testing. As the saying goes, 'garbage in, garbage out' [111]; the robustness of predictive models depends on the input data variables, which are mostly determined through laboratory experiments. Using too little data makes a model prone to overfitting, affecting its predictive ability on different datasets. Therefore, conducting more experiments using various materials prepared under different laboratory and in-situ conditions will greatly increase the available datasets, providing a wide range of input and output data for developing reliable predictive models. Besides data normalization, the performance of the predictive models could benefit from other data conditioning approaches detailing the deletion of outliers and the identification of missing and duplicated data. Moreover, the moulds used in casting geopolymers have different dimensions, so their inclusion in data normalization could improve the performance of the prediction models. Lastly, there is little hyperparameter tuning to identify the crucial features influencing model performance.
An optimal combination of the key features and a selection of hidden layers dependent on problem complexity and data availability provide improved strength prediction models with minimal underfitting/overfitting and without increasing the computational power required. Tuning ANN models using Levenberg–Marquardt or Bayesian regularization expands their ability to perform at par with, or even better than, ANFIS and DNN. ANFIS and DNN are emerging technologies with advanced predictive performance attributed to their integration of fuzzy logic with neural networks and their multiple hidden layers/neurons, respectively. Incorporating the proposed suggestions in most geopolymer strength prediction models would give engineers and researchers greater confidence in accepting the interpretability/generalizability of the models in real-life practical scenarios, further validated through laboratory trials.
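The 70/15/15 train/test/validation split recommended above can be sketched as a simple index shuffle; the sample count and seed are arbitrary illustrative choices:

```python
import random

def split_70_15_15(n_samples, seed=42):
    """Shuffle sample indices and split them 70/15/15 into
    train/test/validation subsets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(0.70 * n_samples)
    n_test = int(0.15 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])

train, test, validate = split_70_15_15(200)
print(len(train), len(test), len(validate))  # 140 30 30
```

Shuffling before splitting matters: experimental datasets collected study-by-study are often ordered by material source, and an unshuffled split would leak that structure into the evaluation.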

6 Limitations and future work

The data used for the systematic review was collected from the Scopus database with set eligibility criteria such that articles written in non-English languages were not included. The search strings were set to “geopolymer” OR “alkali-activated materials” AND “machine learning” OR “statistical modeling” which might not guarantee the retrieval of alternative synonyms such as “eco-cement”, and “green-cement”. However, non-English publications and alternative databases or synonyms are not commonly used for systematic and bibliometric studies in this field hence nothing substantial is expected to change from the present review. Regarding the content of the reviewed studies, the development of the strength predictive models employed different input features and dataset ranges. Most studies did not specify the data normalization, preprocessing, and hyperparameter tuning making the inter-study comparison of the developed models very challenging. Furthermore, most studies did not tackle the impact of treating outliers and missing data on the model performance. These limitations could be addressed by specifying the feature selection criteria, data conditioning, and using completely new datasets.

Future research should investigate the impact of integrating precursor reactivity, particle size distribution, chemical composition, and mineralogy on model performance. The commonly used precursors consisting of fly ash, slag, and rice husk ash, have different reactivity and composition such that developed predictive models based on their unary and binary/ternary combinations behave differently. Additional mechanical properties such as elastic modulus, peak stress, and peak strain, could be predicted. Structural members e.g., beams, and columns, are influenced by the varying material properties, and their use is not only defined by compressive strength but also shear strength and flexural strength. Therefore, developing a universal strength predictive model incorporating all the physico-chemical and engineering properties could enhance the interpretability of ML and SM and eventually address the black-box nature.

7 Conclusions

This paper has systematically reviewed the fundamental machine learning algorithms and statistical models applied in predicting geopolymer compressive strength. The following conclusions were drawn from the review:

  • The commonly used input variables comprise FA, GGBFS, coarse aggregate, fine aggregate, NaOH/Na2SiO3, activator/binder ratio, superplasticizer, NaOH molarity, AL/B, Ms, extra water, curing temperature, and curing age. Hyperparameter tuning and SHAP showed that input features with a greater effect on compressive strength were curing conditions and Ms giving normalized importance factors greater than 0.6 and 0.2, respectively.

  • LR, RR, GPR, and LASSO are commonly used empirical regression techniques in geopolymer data prediction and modeling. RF, DT, ET, SVM, ELM, BPNN, ANN, DNN, SLFFNN, GEP, ANFIS, ICOA, ABCOA, ACOA, PSOA, GA, KNN, and XGB, are commonly used machine learning algorithms. NN gives better strength prediction performances with R2 values > 0.99 followed by RF, SVM, and GPR.

  • Activation functions are a vital part of neural networks. The activation function introduces non-linearity in the model and transforms the neuron input value through a sigmoid, hyperbolic tangent, or ReLU.

  • Machine learning models have better predictive ability than empirical regression models attributed to their advanced ability to incorporate the nonlinearity of specific input and output variables.

  • To minimize noise and bias and improve model interpretability/generalizability for use in structural engineering, the dataset needs to be subjected to robust preprocessing, structural risk minimization, hyperparameter tuning, regularization, cross-validation, statistical metrics, hybrid algorithms, and inter-laboratory validation experiments.