1 Introduction

Geopolymer concrete (GPC) has emerged as a sustainable alternative to conventional Portland cement-based concrete, offering promising environmental benefits. By reducing cement usage in concrete, GPC mitigates CO2 emissions, which contribute significantly to global greenhouse gas emissions (5–7%) [1] and anthropogenic CO2 emissions [2]. The substitution of cement with fly ash in GPC can further decrease CO2 emissions by 55 to 75% [3]. Additionally, GPC requires less energy compared to ordinary Portland cement (OPC), leading to significant resource savings [4]. Moreover, GPC demonstrates superior compressive strength and durability characteristics compared to OPC concrete, making it an attractive choice for sustainable construction practices [5].

Significant research has been devoted to identifying relationships between precursor composition and geopolymer concrete attributes (e.g. compressive strength); nevertheless, results have been inconclusive. Several studies have highlighted key parameters that significantly influence the properties of geopolymer concrete. Singh, et al. [6] emphasized the importance of SiO2/Al2O3, Na2O/SiO2, and Na2O/Al2O3 ratios in both fly ash and activator solution, showing their substantial impact on concrete properties. Additionally, Talha Junaid et al. [7] identified water-to-geopolymer solid, alkali liquid-to-fly ash, and alkali liquid-to-water ratios as critical factors affecting compressive strength. Rai et al. [8] underscored the activator-to-fly ash ratio as the primary determinant of compressive strength, with curing temperature and alkali hydroxide mixture molarity also playing significant roles in early strength development for geopolymer concrete. This can be attributed to the overwhelmingly large compositional degrees of freedom resulting from significant heterogeneity in fly ash (in terms of chemical composition [9], crystallinity, and particle size) and activators (in terms of their chemistry) [10], which have prevented the advancement of clear, (semi-) empirical rules that govern the fundamental linkage between composition and geopolymer concrete properties [11]. Researchers have employed a variety of methods to predict the compressive strength of geopolymer concrete, such as experimental trial and error, statistical techniques, graphical analysis, and machine learning methodologies. However, the experimental trial and error method, despite its widespread use, has notable drawbacks. It necessitates the extensive consumption of materials [12] and time to obtain the required mix proportion, and it lacks predictive models for future projections [13].

Statistical techniques analyze input parameters to establish correlations with compressive strength in mix design. For example, Kishore et al. [14] developed two regression models, one for high calcium fly ash-based geopolymer concrete and another for low calcium fly ash-based geopolymer and further extended to contour plots for future predictions. Similar approach was followed by Junaid et al. [15] where he developed some line graphs called G-Graphs for the determination of mix designs for geopolymer concrete. However accurate of the models depend on quality and quantity of representative data. Limited or insufficient data may restrict predictive capabilities. Creating precise statistical models requires expertise in statistical analysis and data interpretation. Model development is complex, time-consuming, and necessitates skilled researchers for calibration and validation. Linear models used in some methods assume linearity in geopolymer concrete properties but fail to capture non-linear behavior and complex interactions. Experimental testing is crucial to validate predicted mix designs, as statistical models serve as initial guidance and real-world validation ensures that performance criteria are met.

Graphical methods offer qualitative insights into the relationships between mix design parameters and geopolymer properties but lack precise numerical values or equations. Additional analysis or modeling may be needed for accurate predictions [16]. Subjectivity arises from visual interpretation, leading to varying conclusions. Graphical methods may struggle with complex interactions and non-linear relationships, necessitating advanced modeling techniques.

An extensive literature review reveals that the properties of fly ash play a crucial role in determining the properties of geopolymer concrete. For instance, Wang highlighted that the Si/Al ratio of fly ash-based geopolymer concrete significantly influences its final properties. Similarly, Singh et al. [6] noted that the R2O/Al2O3 ratio and SiO2/R2O ratio (R = Na+ or K+) also have a similar impact on the final properties of geopolymer concrete. Studies conducted by Hadi et al. [17] identified that the presence of amorphous SiO2, amorphous Al2O3, and the median particle size of fly ash greatly contribute to its reactivity in geopolymer concrete. Moreover, Gunasekara [18] found that the chemical composition of fly ash, unburnt carbon content, particle size distribution, and surface area of fly ash directly influence the compressive strength of the resultant geopolymer concrete.

However, existing predictive models for geopolymer concrete mix design have not fully integrated the specific characteristics of fly ash, thus limiting their ability to accurately capture the material's complex behavior and formulate precise mix designs. This identified gap presents a significant opportunity for further research and development. By incorporating the distinctive properties of fly ash into the mix design determination procedure, researchers can improve the accuracy and reliability of mix design predictions for geopolymer concrete.

Machine learning algorithms have the capacity to analyze vast amounts of data, enabling the identification of intricate patterns that may evade human detection [19]. This ability enhances the accuracy of predicting geopolymer concrete strength. Previous investigations have utilized various machine learning models, such as artificial neural networks (ANN), deep neural networks (DNN), residual neural networks (Resnet), adaptive neuro fuzzy interface systems (ANFIS), support vector machine (SVM), gene expression programming (GEP), extreme gradient boost (XGBoost) algorithm, and random forest (RF) to predict the compressive strength of geopolymer concrete, taking parameters such as chemical oxides percentages of fly ash, surface area of fly ash, NaOH concentration, Sodium Silicate content, Sodium Hydroxide content, curing time, curing temperature as input parameters, with reasonable success. In-depth information regarding the previous machine learning models and input parameters utilized to develop machine learning models to predict the compressive strength of geopolymer concrete can be found within the thorough review undertaken by Rathnayaka et al. [20]. However, to the best of authors’ knowledge, no study has attempted to use machine learning techniques to identify the input parameters that would produce a desired compressive strength. Such an approach is not readily available as machine learning, or any other model generally cannot identify several input variables which would produce a certain single output. In real-world applications, however, the goal is to identify the mix design combination necessary to achieve a required compressive strength. This study addresses this issue and develops and demonstrates a methodology that combines machine learning methods and fly ash properties to deliver realistic values for mix design variables based on a target compressive strength.

As the first step, this research compiles an extensive database that encompasses material parameters and mix design parameters sourced from the peer-reviewed literature (Appendix-Table 8). In parallel to the database development, a comprehensive literature review was conducted on machine learning models related to geopolymer concrete [20]. The review identified that, out of all the machine learning methods used in the literature, the ANN model was the most frequently utilized. Furthermore, it was noted that regardless of the dataset used to develop the model, the performance of the ANN model was consistently superior compared to other models considered. Based on these observations, an ANN model was developed as the base prediction model. This model relates the chemical composition of fly ash, NaOH solid content in the NaOH solution, silicon oxide, and sodium oxide solid content in the sodium silicate solution, total water content, curing time, and temperature to compressive strength. A clustering technique is employed to identify a set of input parameter values that roughly corresponds to a desired target strength. These parameter values are then further refined to match the target strength by simulating the ANN model with the objective of minimizing any mismatch. The effectiveness of the refined input parameters thus obtained are validated through laboratory experiments.

2 Geopolymer concrete database

A comprehensive literature review was conducted over the period from 1990 to 2022, encompassing studies on low calcium fly ash-based geopolymer concrete. The objective was to compile a database specifically focused on the compressive strength of such concrete at 7 and 28 days. To ensure the database's coherence and relevance, only concrete mixes consisting of 100% low calcium fly ash were included, while mixes involving mortar, paste, and blended materials were excluded. Initially, the database consisted of 596 data points. To refine the dataset, specific criteria outlined in Table 1 were applied. These criteria included adhering to ASTM C618 guidelines to select datasets featuring class F fly ash exclusively. Furthermore, minimum curing conditions were implemented to guarantee reliable strength attainment through appropriate curing practices. Following the application of these filtering criteria, any duplicated data sets were eliminated, resulting in a reduced and refined database containing 227 data points.

Table 1 Data filtering criteria for database refinement

The variations in compressive strength resulting from the utilization of different mold shapes and sizes in the conducted studies were standardized by converting them into 200 × 100 mm cylindrical samples, employing the adjustment factors outlined in Table 2.

Table 2 Normalization factors for shape and size correction [21]

The compressive strength measurements obtained at 7 days were transformed to their corresponding values at 28 days by employing a linear regression model (Fig. 1) to identify the relationship between 7-day and 28-day values using the data records where both the 7-day and 28-day strengths are available.

Fig. 1
figure 1

7-Day compressive strength to 28-day compressive strength conversion

To ensure the integrity of the data, any outliers present were identified and subsequently eliminated. This elimination process identifies data points that deviate from the mean of the corresponding parameter by a magnitude of three times the standard deviation as outliers. Figure 2 provides a summary of the detected outliers.

Fig. 2
figure 2

Detected outliers

The database underwent a reduction in size, resulting in a final dataset consisting of 188 data points after applying outlier refinement techniques. The reduction in data set size is primarily because past studies experimented with parameters outside their feasible ranges. This refined dataset, as presented in Table 3, served as the basis for ANN model development. Furthermore, to standardize the parameters, all values were normalized between 0 and 1 using Eq. (1). In the equation, \({x}_{i}\) represents the ith observation of the parameter x, while \({x}_{min}\) and \({x}_{max}\) denote the minimum and maximum values of the parameter \(x\), respectively [22].

Table 3 Database summary
$$Normalized x= \frac{{x}_{i}-{x}_{min}}{{x}_{max}-{x}_{min}}.$$
(1)

Table 3 includes a summary of the database which is beneficial for dealing with machine learning model development.

3 Development of mix design determination procedure

There are three main components in the mix design determination procedure. They are the artificial neural network model that relates the fly ash properties, activator properties and curing conditions to compressive strength, a clustering technique to group data records with similar compressive strength values and material properties, and an optimization algorithm. A user interface is also added to make exploring different mix designs easier. The procedure is illustrated in Fig. 3.

Fig. 3
figure 3

Schematic diagram of compressive strength prediction model

The model development process starts with creating an artificial neural network (ANN) model, which takes ten input parameters and predicts the compressive strength as its output. Once this model is trained and tested with a dataset, it can predict the compressive strength based on specific input parameters. Then, an initial combination of input parameters needed to achieve a specific target strength is found using a clustering method. Here, it is assumed that the users know the fly ash properties of the source they intend to use for the GPC. Clustering method serves to identify data records from the database that have similar properties as the source and has the same or close compressive strength. Once such a data record is found, it serves as a set of prior estimates of the mix design parameters. Making arbitrary changes to prior set to make them match the source property parameters can result in an undesired compressive strength. Therefore, the prior estimates need be carefully refined to match more closely to the source properties, while still keeping desired target strength intact.

The ANN model is used for this purpose. When small admissible changes are made to the prior set, the resulting compressive strength can be predicted with the ANN model. Thereby, the mix design parameter set which exactly/closely matches the source properties and the target compressive strength can be found. To do this changing and testing procedure efficiently, an optimization method is employed. There, the fly ash parameters are set to the source values and the rest of the parameters are subjected to admissible changes until the difference between the ANN predicted strength and the target strength is negligible.

Once the final set of parameters is selected, they are translated into measurable quantities that can be used for the mixing process. Finally, the results are validated through laboratory experiments, and the result may be added to the database.

3.1 ANN model development

In the initial phase of the model development, a three-layer ANN model was constructed, as illustrated in Fig. 3. The ANN model comprises an input layer, a hidden layer, and an output layer. To determine the influential parameters for the input layer, a combination of neighborhood component analysis (NCA) [23] and sensitivity studies [24] reported in the literature was conducted. Initially, Model 1 was developed using the most frequent parameters found in mix design development models in the literature [20] and act as a base model. Then, Model 2 was developed based on the feature selection used in the current study. The base model served as an indicator to evaluate the performance of Model 2. The selected parameters are provided in Table 4.

Table 4 Model input parameter identification

MATLAB 2022a was used to develop the ANN model with a varying number of hidden neurons from 5 to 15 combined with two learning algorithms, namely Bayesian regularization (BR) and Levenberg–Marquardt (LM) [25] to predict the compressive strength of geopolymer concrete. Furthermore, the transfer function in the hidden layer was set to hyperbolic tangent sigmoid transfer function (tansig) [13] while the transfer function in the output layer was kept at the default, which is linear transfer function (purelin).

In the context of training artificial neural network (ANN) models, the data splitting approach varies depending on the training algorithm utilized. When employing the Levenberg–Marquardt algorithm, the database is typically divided into three subsets: training, validation, and testing. This partitioning facilitates the model's training on the training set, evaluation of performance and tuning of hyperparameters on the validation set, and final assessment of model generalization on the testing set. However, for ANN models trained with the Bayesian regularization algorithm, a simpler data splitting approach suffices. In this case, only two subsets are required: the training set and the testing set. The training set is used to train the model, while the testing set is employed to assess the model's performance and generalization ability. The Bayesian regularization algorithm incorporates a probabilistic framework that inherently accounts for model complexity and generalization, thereby requiring fewer data splits compared to other training algorithms.

It was noted that the most common data division scheme used is 70% of the data for the training dataset, while the remaining data are allocated to the validation and testing. Models having both validation and testing datasets used equal splits of 15% for each [26], while models having either the validation or test set used the remaining 30% [22]. This study randomly selected 70% of the data points for the training and the remainder was equally split between the validation and test subset. Nevertheless, to facilitate a meaningful comparison of training algorithms, it was necessary to exclude the validation dataset used with the Levenberg–Marquardt algorithm when assessing the Bayesian regularization method. Utilizing consistent data subsets for both training algorithms enables a fair comparison of their performance, mitigating potential biases introduced by differences in data partitioning. By employing the same data subsets for training and testing, we ensure that both algorithms are evaluated under equivalent conditions, allowing for an objective assessment of their respective strengths and weaknesses. This approach enhances the reliability and validity of comparisons between training algorithms, facilitating informed decision-making regarding model selection.

Three commonly employed statistical error measures in machine learning models were utilized within the scope of this study, namely, coefficient of determination [13], mean absolute error [26], and root-mean-square error [27]. Equations (2), (3), and (4) are applied for the performance assessment of the ANN model, where \({y}_{a },\overline{{y }_{a}},\) \({y}_{p}{^\prime}, and n\) denote actual compressive strength, mean of actual compressive strength, predicted compressive strength, and mean of predicted compressive strength and sample size, respectively.

$$\text{Coefficient of determination}, {R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({y}_{a}-{y}_{p}{^\prime}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{a}-\overline{{y }_{a}}\right)}^{2}}$$
(2)
$$\text{Mean absolute error}, MAE=\frac{1}{n}\times {\sum }_{i=1}^{n}\left|{y}_{a}-{y}_{p}{^\prime}\right|$$
(3)
$$\text{Root}-\text{mean}-\text{square error}, RMSE=\sqrt{\frac{1}{n}\times {\sum }_{j=1}^{n}{\left({y}_{a}-{y}_{p}{^\prime}\right)}^{2}.}$$
(4)

Figure 4 compares model performance with the number of hidden neurons for each learning algorithm considered. The results showed that ten hidden neurons with Bayesian regularization algorithm yield the best accuracy for the given data set and resulted in the highest combination of R2 values of 0.860 for training and 0.852 for testing. Furthermore, the resultant RMSE and MAE values were 4.96 MPa and 3.55 MPa, respectively, for the training data set, while they are 5.89 MPa and 4.43 MPa, for the testing data set.

Fig. 4
figure 4

Model performance vs number of hidden neurons for each algorithm (R.2—Coefficient of determination, MAE mean absolute error, TR train, TE test)

Figure 4 illustrates the model performance of neural network with the Bayesian Regularization algorithm as the training function. It was observed that the training, testing, and all plots have almost identical performance yielding more than 85% prediction accuracy, which indicates a close relationship between the actual compressive strength and the predicted compressive strength (Fig. 5).

Fig. 5
figure 5

Model 2 performance: a Training correlation, b Testing correlation, and c All data correlation

3.2 Cluster tables development

Initially, the database used to develop the ANN model was clustered according to the compressive strength and developed four cluster tables and tabulated as 20–30 MPa, 30–40 MPa, 40–50 MPa, and 50–60 MPa strength classes. Subsequently, each strength class was sub clustered based on the weights of SiO2, Al2O3, Fe2O3, and CaO using hierarchical clustering, as shown in Fig. 6. Cluster determination has been carried out based on natural cluster cut-off method using a cut-off value of 0.8 and adjusted based on the visual observation. Cophenetic coefficient and inconsistent values were used for the cluster performance assessment. Furthermore, the mean value of each cluster was calculated and tabulated in ascending order as given in Table 8 in the appendix section. Cluster mean tables are then used to determine the prior estimates for NaOH solid (kg/m3), SiO2 solid in Na2SiO3 (kg/m3), Na2O solid in Na2SiO3 (kg/m3), total water (kg/m3), curing time (hrs.), and temperature (0C) based on SiO2 Al2O3 Fe2O3 and CaO weights in the fly ash. Moreover, the clustering process plays a pivotal role in discerning and isolating distinct clusters within a dataset, thereby facilitating the subsequent critical analysis aimed at evaluating the reliability and veracity of the underlying database.

Fig. 6
figure 6

Dendrograms for data clustering

Figure 7 outlines the initial parameter selection process using cluster tables. Based on the required compressive strength (T MPa) of the geopolymer concrete, the corresponding cluster table is identified. Each cluster table contains multiple clusters ranging from 1 to N, with each cluster comprising ten variables. Notably, four variables pertain to the chemical composition of fly ash: SiO2, Al2O3, Fe2O3, and CaO. To determine the best matching cluster for the fly ash of interest, the chemical composition parameters of interest (W1, W2, W3, and W4) are compared with those in the cluster tables (X1,n, X2,n, X3,n, and X4,n where 1 < n < N). Subsequently, error values (E1,n, E2,n, E3,n, and E4,n) are calculated for each of the four parameters. The average error (An) for each cluster is then computed, with the cluster exhibiting the lowest average error selected as the optimal match for the fly ash of interest. From the chosen cluster, initial values for additional parameters, such as NaOH solid, SiO2 in Na2SiO3, Na2O in Na2SiO3, total water, curing time, and curing temperature, are determined as X5,n, X6,n, X7,n, X8,n, X9,n, and X10,n. respectively. This method facilitates the systematic determination of initial parameter values crucial for the subsequent stages of geopolymer concrete mix design and optimization.

Fig. 7
figure 7

Initial parameter selection using cluster tables

3.3 Parameter optimization

The initial values of the parameter combination, obtained from the cluster table, underwent further refinement with the ANN model until the model prediction aligned with the target strengths. The parameters W1–W4 were fixed for a specific source, while the parameters X5,n–X10,n were allowed to vary within a predefined range. This is achieved through an optimization algorithm, where the objective function defined as in Eq. (5) was minimized. The parameter a2 shown in Fig. 8 denotes output of the trained ANN model where LW –layer weights, IW- input weights, b1 bias for input layer, and b2 bias for hidden layer. With X1–X4 taken as W1–W4, the objective function in (5) a function of X5 –X10 only. The idea is, therefore, to search for X5–X10 values within the predetermined ranges around X5,n–X10,n that would produce the target strength. Predetermined ranges of the parameters were retained between 5% of initial values prior values.

Fig. 8
figure 8

Artificial neural network diagram for developing optimization equation [28]

$$Objective function = absolute (Target strength - LW \times \left(\frac{2}{1+exp\left(-2 \left(IW\times X+b1\right)\right)-1}+b2\right).$$
(5)

Optimization was done using the default solver in MATLAB 2022a “fmincon” with the “lbfgs” Hessian approximation for the derivatives.

3.4 User interface development

The user interface depicted in Fig. 9 showcases the functionality of the MATLAB application in obtaining user-defined input parameters and predicting the mix proportions for geopolymer concrete. The process begins with the program prompting the user to input the desired target strength and the percentages of chemical oxides (SiO2, Al2O3, Fe2O3, and CaO) present in the fly ash, as displayed in Fig. 9a. Upon receiving this input, the program automatically selects a predetermined value for the fly ash weight (kg/m3) based on the target strength and converts the oxide percentages into oxide weights (kg/m3). Using the obtained oxide weights of the fly ash, the program proceeds to determine the closest cluster by considering the average error between the actual oxide values and the values of the cluster. This selection process is followed by the prediction of the compressive strength using the ANN for the chosen combination of parameters, as illustrated in Fig. 9b. Furthermore, advance users can turn on the radio button and enter values of their own overriding the mix design determination procedure-suggested cluster values. If there is a discrepancy between the predicted and target compressive strength, the program optimizes the values of NaOH solid (kg/m3), SiO2 solid in Na2SiO3 (kg/m3), Na2O solid in Na2SiO3 (kg/m3), total water (kg/m3), curing time (hrs.), and temperature (0C) within a predefined range of parameters using the optimization routine and is as shown in Fig. 9c. This optimization process continues until the predicted compressive strength aligns with the target compressive strength. Even in the optimization process, advanced users can override the mix design determination procedure-defined parameter rangers for the optimization by toggling the radio button on.

Fig. 9
figure 9

User interface for mix design calculation: a obtaining user input, b selecting optimum cluster, c parameter optimization, d material density input, e mix design output, and f parameter sensitivity

Finally, to predict the mix proportion values, the program prompts the user to input the density of fly ash, fine aggregates, coarse aggregates, Na2SiO3 solution, NaOH solid, and the percentages of SiO2, Na2O, and water in the Na2SiO3 solution, as demonstrated in Fig. 9d. These input values are utilized to determine the mix proportions for the geopolymer concrete as shown in Fig. 9e. However, this application can be made more versatile for advanced users by using graphical interfaces as shown in Fig. 9f to understand the influence of each component of the mix design on the compressive strength of the final mix. However, one must keep in mind that allowing too large deviations from the prior estimates can provide unreliable outputs.

4 Design of mix proportions and mix design determination procedure validation

For mix design determination procedure validation, three different sources of fly ash were used, with target compressive strengths of 25, 32, and 40 MPa at 28 days. The verification procedure is illustrated by a sample calculation pertaining to Grade 25 concrete utilizing a class F fly ash from Norochcholai power plant Sri Lanka, though this is readily applicable to any class F fly ash from any location. The process commenced by subjecting the fly ash samples collected from the three power plants to X-ray fluorescence analysis for the determination of their chemical composition. The results are shown in Table 5. Subsequently, following the target compressive strength of 25 MPa, the corresponding quantity of fly ash needed to achieve this specific strength class was ascertained by employing the values derived from the data analysis. Additionally, the weight percentages of the various oxides present in the fly ash were converted into mass units (kg/m3) for further calculations. To derive the prior estimates combination for the concrete mixture targeting 25 MPa, a cluster table corresponding to this strength class was selected. From this cluster table, the cluster exhibiting the lowest average error was identified and chosen as the initial set of parameters for the concrete mix, as presented in Table 6.

Table 5 Chemical composition of fly ash
Table 6 Parameters for optimization algorithm (M25 mix design)

Table 5 presents the optimized parameters obtained by calculating the minimum and maximum values of NaOH solid (SH), SiO2 solid in Na2SiO3 (SiO2_S), Na2O solid in Na2SiO3 (Na2O_S), total water, curing time, and curing temperature based on the initial cluster values.

Na2SiO3 Solution (Grade D):

$$\begin{aligned} {\text{Water weight in Na}}_{2} {\text{SiO}}_{3} {\text{ Solution }} = & { }\frac{{{\text{SiO}}_{2} {\text{ solid wt}} + {\text{ Na}}_{2} {\text{O solid wt}}}}{{{\text{SiO}}_{2} {\text{ solid \% }} + {\text{ Na}}_{2} {\text{O solid \% }}}} \times {\text{water \% in Na}}_{2} {\text{SiO}}_{3} \\ = { 57}.{\text{92 kg}}/{\text{m}}^{{3}} . \\ \end{aligned}$$
$$\begin{aligned} {\text{Na}}_{2} {\text{SiO}}_{3} {\text{ solution weight }} = { } & {\text{Water weight in Na}}_{2} {\text{SiO}}_{3} {\text{ Solution }} + { }({\text{SiO}}_{2} + {\text{ Na}}_{2} {\text{O}}){\text{ solid weight}} \\ = { 57}.{92 } + { 3}0.{5 } + { 15}.{2} \\ = { 1}0{3}.{\text{6 kg}}/{\text{m}}^{{3}} . \\ \end{aligned}$$

(a) Extra water:

$$\begin{aligned} {\text{Extra water }} = \, & {\text{Total water}}{-}{\text{Water weight in Na}}_{{2}} {\text{SiO}}_{{3}} {\text{Solution}} \\ = { 115}.{9 }{-}{ 57}.{92} \\ = { 57}.{\text{97 kg}}/{\text{m}}^{{3}} . \\ \end{aligned}$$

(b) Aggregate content:

$${\text{V}}_{\text{Fly ash}}+{\text{V}}_{\text{Na}2\text{SiO}3\text{ Solution}}+{\text{V}}_{\text{NaOH solid}}+{\text{V}}_{\text{Extra water}}+{\text{V}}_{\text{Sand}}+{\text{V}}_{\text{Coarse}}=1$$
$$\frac{{\text{V}}_{\text{Coarse}}}{{\text{V}}_{\text{Coarse}}+{\text{V}}_{\text{sand}}}=0.65.$$

After solving: Sand weight: 626.7 kg/m3 and Coarse weight: 1166 kg/m3 [29].

Similarly, the all the mix proportions for 25, 32, and 40 MPa are tabulated in Table 7.

Table 7 Mix proportions for geopolymer concrete

Sodium hydroxide pellets and sodium silicate solution (Na2O = 14.7% and SiO2 = 29.4% by mass, specific gravity = 1.5) were used as the alkaline activator in geopolymer production. River sand (specific gravity = 2.5) served as the fine aggregate, while 10 mm crushed granite aggregate (specific gravity = 2.7) was used as the coarse aggregate in the concrete mix. Potable water was used for mixing. A 100 L concrete mixer combined fly ash, sand, and coarse aggregates for 4 min, followed by the addition of alkaline activator and water for an additional 8 min, resulting in a well-combined, nonsegregated concrete mix. The concrete mix was poured into 100 mm × 100 mm × 100 mm cubical molds and compacted using a vibration table for 1 min to eliminate air bubbles to cast 8 number of specimens out of which 4 is used to evaluate 7-day compressive strength and 4 is used to evaluate 28-day compressive strength. After 24 h of ambient curing at 24 °C, the specimens were heat cured at the program-recommended temperature and duration. Upon demolding, the labeled specimens were stored under laboratory conditions (24 °C) until the 28-day testing period. Compressive strength testing was performed according to ASTM C109/C109M standard using a concrete testing machine. Three specimens were tested at each interval at a loading rate of 0.34 MPa/s until failure.

Figure 10 illustrates the compressive strength development of nine distinct geopolymer concrete mixtures for each strength class considered, for each fly ash source. Remarkably, all three fly ash sources successfully attained their respective targeted compressive strengths at 28 days. The least error between the actual compressive strength obtained from laboratory experiments and the target strength was achieved for the Norochcholai fly ash, with error values of 4.7%, 0.65%, and 6.75% for grade 25, 32, and 40 concrete, respectively. Conversely, the highest errors were recorded for Port Augusta fly ash, with error values of 25.07%, 14.17%, and 18.87% for grade 25, 32, and 40, respectively. Despite the recorded errors, all mix designs were able to achieve higher compressive strength than their corresponding target values. These experimental observations provide strong support for the reliability of the mix design procedure developed, and the agreement between predicted and actual compressive strength values further validates the effectiveness of the approach outlined in this study. The findings from this research both contribute to the understanding of the critical parameters in the design of geopolymer concretes and the optimization of geopolymer concrete mixes containing fly ash, thus offering potential applications in construction projects that require specific compressive strength with a specific sources of fly ash based on the location and availability in the region of the construction project.

Fig. 10
figure 10

Experimental results

5 Summary and conclusion

This study developed a mix design determination procedure for geopolymer concrete mix proportions using a specific fly ash source to achieve a specified strength. It combined artificial neural network, data clustering, and parameter optimization. The mix design determination procedure’s accuracy was validated through rigorous experimentation, comparing predicted and experimental compressive strength values. The results showed that the mix design determination procedure accurately predicted the target strength, validating the reliability for mix design specification. The main conclusions of the study are as follows:

  1. 1.

    The developed predictive mix design determination procedure for geopolymer concrete showcases a notable advancement in accurately determining mix proportions for concrete grades 25, 32, and 40 MPa. Among the various fly ash sources studied, Port Augusta fly ash exhibited the highest deviation from the desired target value, particularly evident in the grade 25 MPa mix design, with a maximum deviation of 25.07%. Similarly, for grade 32 and 40 MPa mixes utilizing Port Augusta fly ash, deviations of 14.17% and 18.87%, respectively, were observed. In contrast, mix designs incorporating Norochcholai fly ash demonstrated comparatively lower deviations. For grade 25, 32, and 40 MPa mixes, discrepancies of 4.03%, 0.07%, and 5.07%, respectively, were recorded from the target values.

  2. 2.

    Data clustering facilitates the identification of relevant mix design parameters associated with specific fly ash types and target compressive strength, streamlining the mix design process and allowing for efficient analysis of pertinent data subsets. Furthermore, the data clustering process offers a valuable means to pinpoint regions within a dataset characterized by sparse data points, potentially raising concerns about data reliability. These clusters with only a single data point each can be subject to closer scrutiny during experimental programs to ascertain the precision and robustness of the mix design determination procedure. Moreover, the program itself functions as a data generator, actively generating new data points when encountering unrepresented combinations not found in the existing database. These generated data points are subsequently utilized to update the database with the corresponding experimental outcomes, enhancing the mix design determination procedure’s comprehensiveness and adaptability.

  3. 3.

    The incorporation of chemical oxide content in fly ash, along with variables such as curing time, curing temperature, and activator properties, into the predictive model has yielded substantial enhancements in prediction accuracy. Specifically, it has led to a noteworthy improvement of 68.3% in test performance and a commendable 15.9% enhancement in train performance. These findings underscore the pivotal role played by these factors in the realm of geopolymer concrete mix design, highlighting their significance in achieving precise and dependable results. Furthermore, the inclusion of chemical oxide content of fly ash in the model opens novel avenues for investigating the possibilities of blended mixtures involving fly ash. This avenue of research holds promise for enhancing the properties of the resulting concrete, potentially leading to the development of innovative concrete formulations with improved characteristics.

  4. 4.

    The mix design determination procedure demonstrates strong generalization capabilities when tested with entirely new sources of fly ash that are not present in the original database. For example, Norochcholai fly ash, a novel source excluded from the mix design determination procedure’s development dataset, serves as a rigorous test case. Evaluating the mix design determination procedure performance with such unseen data validates reliability and applicability in real-world scenarios beyond the confines of the initial dataset. This ability to accurately predict outcomes with new and diverse fly ash sources enhances the mix design determination procedure’s utility and trustworthiness for geopolymer concrete mix design across a wide range of applications.

  5. 5.

    Training ANN using the Bayesian Regularization algorithm optimizes the performance of the artificial neural network model, surpassing the Levenberg Marquardt algorithm with 9.3% increment in training accuracy and 7.1% in testing accuracy which showcase the ability of Bayesian Regularization algorithm to achieve better generalization capability.

  6. 6.

    Parameter optimization proves instrumental in formulating precise mix proportions, ensuring that the desired target strength is achieved in an economical manner while minimizing material waste and cost.

  7. 7.

    The user interface developed for the mix design determination procedure enables easy manipulation of mix designs, catering to users with varying levels of expertise. Advanced users can explore additional options to gain deeper insights into geopolymer concrete characteristics.

6 Appendix

See Tables 8 and 9.

Table 8 Mix design data base for fly ash geopolymer concrete
Table 9 Cluster tables