Modeling the compressive strength of concrete containing waste glass using multi-objective automatic regression

Some grades of municipal and industrial waste glass (WG) discarded in landfills can cause environmental issues. One efficient solution to this issue is to use WG in concrete mixtures as aggregate or supplementary cementitious material. Modeling the compressive strength (CS) of the produced concrete using machine learning methods can provide helpful insights into the effects of WG on concrete properties. In this study, a comprehensive database of concrete containing WG (CCWG) was gathered from 24 peer-reviewed papers. Two different scenarios were considered in the selection of input variables, and a novel machine learning method, called multi-objective multi-biogeography-based programming, was used to predict the CS of CCWG. This algorithm can automatically select the effective input variables, the structure of the equations, and their coefficients. Moreover, the proposed method optimizes the precision and complexity of the developed models simultaneously. Including complexity in the optimization problem helps achieve different mathematical equations with various accuracies and assists users in predicting the CS of CCWG even with a limited number of optimal input variables. The results show that the proposed algorithm can introduce several equations with different accuracies, complexities, and input variables to predict the CS of CCWG.


Introduction
The excessive consumption of natural resources in concrete production, the emission of carbon dioxide from Portland cement production, and the disposal of waste materials in landfills are the main sustainability concerns in the concrete industry, which threaten the environment [1,2]. There have been many endeavors to improve the environmental performance of the concrete industry by using waste materials as partial replacements for natural aggregate and cement in the concrete mixture [3,4]. Some of these waste materials, including recycled aggregate, have been used in concrete to reduce the consumption of natural materials and keep them out of landfills, without improving the mechanical properties and durability behavior of the produced concrete [5][6][7]. On the other hand, other waste materials, such as silica fume and blast furnace slag, participate in chemical reactions leading to stronger cement hydration products [8][9][10]. The use of all these waste materials in concrete can help move toward sustainability.
Glass is one of the most highly demanded materials, and its production consumes a substantial amount of natural resources, including sand, limestone, and soda ash [1,11,12]. With increasing industrialization, the consumption of glass in industry and households has grown significantly, leading to increased waste glass (WG) production [11]. It is estimated that almost 200 million tons of WG are produced each year worldwide [13]. Although some WG is recycled into glass containers, a considerable amount of WG cannot be recycled because of the high recycling costs, color mixing, and high breaking potential [14,15]. WG discarded in landfills causes significant environmental problems through inappropriate land use and pollution of the soil and water. This non-biodegradable material can be used as coarse aggregate, fine aggregate, and powder in the concrete mix design. However, more effort has been devoted to using WG as fine aggregate and powder in the concrete mixture, because using WG as coarse aggregate decreases the quality of concrete [16]. The compressive strength (CS) of concrete containing WG (CCWG) has been investigated by many researchers. When WG is used as a powder in concrete, it can have two opposite effects on the CS. First, the WG powder can weaken the strength and the aggregate-paste bond. Second, it can undergo a pozzolanic reaction, forming a silicate gel by reacting with the calcium hydroxide produced by the hydration process [17]. Because of these two competing mechanisms, the experimental results have been contradictory. A few researchers have claimed that using the WG powder reduces the bond of aggregate with the cement matrix and, therefore, decreases the CS of CCWG [16,[18][19][20].
On the other hand, several researchers have stated that the CS of concrete containing WG powder increases, especially in the long term, due to the formation of a denser microstructure compared to concrete without WG powder [21][22][23][24]. In terms of WG fine aggregate (WGFA), the reported effects on the CS of CCWG have likewise been contradictory. Some researchers have found that the CS of concrete containing WGFA increases because of the better bonding of the surface texture of WGFA with the cement paste compared to concrete without WGFA [20,25]. On the other hand, several researchers have reported that WGFA harms the CS. They have attributed the reduction of CS to the decrease in concrete density, the weaker bonding of the smooth surface of WGFA, the lower elastic modulus of WGFA compared to natural aggregates, and the existence of lead in the WGFA [14,26,27].
Many attempts have been made to use machine learning methods in modeling the CS of concrete containing waste materials [10,[28][29][30][31][32][33][34][35][36][37][38][39][40][41][42]. Different machine learning methods are available, including artificial neural networks, fuzzy systems, decision trees, and automatic regression methods. Only a limited number of studies have predicted the properties of various types of concrete containing WG. Ray et al. [43] developed an artificial neural network model to predict the compressive and tensile strengths of concrete containing WG and tin can fiber based on their experiments. Ghorbani et al. [44] used an artificial neural network to model the dynamic characterization of recycled glass-recycled concrete according to their experiments. As can be observed, the amounts of data used for developing these models are small, and because of the existence of other types of components in the concrete mixtures of these studies, the applications of the developed models are limited. Automatic regression-based machine learning methods build on metaheuristic optimization algorithms. They can automatically find the structure of the mathematical equation, its corresponding coefficients, and the effective input variables. Biogeography-based programming (BBP) is a relatively novel automatic regression method, which has been successfully employed in several practical applications, including the prediction of the CS of concrete containing silica fume [42], the estimation of the climbing rate of slip formwork systems [45], the prediction of the shear strength of concrete beams reinforced by fiber-reinforced polymer bars without transverse reinforcement [46], and the formulation of the elastic modulus of recycled aggregate concrete [7] and of self-compacting concrete [47], and of the dynamic modulus of asphalt mixtures [48].
In this study, a comprehensive database consisting of 830 data records of CCWG was gathered from the literature. To model the CS of this type of concrete, various mathematical equations with different accuracies and complexities can be generated depending on the control parameters assumed for the BBP. To address this issue, the BBP was cast as a multi-objective optimization problem, solved by a metaheuristic optimization algorithm called biogeography-based optimization. The proposed procedure can lead to several formulas for predicting the CS of CCWG with different precisions, complexities, and input variables.
Developed soft computing models

Biogeography-based optimization (BBO) algorithm
The biogeography-based optimization (BBO) algorithm is a metaheuristic optimization method inspired by the distribution of species among islands in an ecosystem [49]. Several factors, including topographic features, temperature, humidity, and climate, affect the distribution of species in natural biogeography, such that more species live on high-quality islands and tend to migrate to low-populated islands. In the BBO, each possible solution (PS) and its related decision variables are simulated by an island and its characteristics, respectively. Each PS is evaluated by an objective function, where a lower objective value indicates better performance in a minimization problem. The quality of each PS is called the habitat suitability index (HSI), such that a high-quality PS has a higher HSI. There are two mechanisms to search for the optimal solution in the BBO, namely exploration and exploitation, carried out by the mutation and migration operators, respectively. For the migration operator, emigration and immigration probabilities are assigned to each PS, which are the probabilities of migration from and to the PS, respectively. In this regard, all existing PSs are ranked in each epoch from the worst to the best based on their HSIs, and a higher emigration probability and a lower immigration probability are assigned to a PS with a higher HSI. Figure 1 depicts the linear migration curve, including the emigration and immigration curves, in which E and I are the maximum values of the emigration and immigration probabilities, respectively. Both values are assumed to be one in this study. The emigrated and immigrated PSs are selected according to their assigned emigration and immigration probabilities and the roulette wheel selection mechanism.
Then, to generate the migrated PS (MPS_i), the migration is carried out from the emigrated PS (EPS_i) to the immigrated PS (IPS_i) using the following equation:

MPS_i(RDV) = EPS_i(RDV) + a × (IPS_i(RDV) − EPS_i(RDV))

where RDV is a random decision variable, and a is a random number between zero and one. It means that the values of the decision variables of the MPS lie between those of the EPS and IPS. The mutation operator is a probabilistic procedure that prevents the BBO from being trapped in local optima and enhances the diversity of PSs in the algorithm. Given a mutation probability, this operator gives PSs with both low and high HSIs the chance to be improved by modifying their decision variables randomly. Figure 2 summarizes the different steps of the BBO algorithm.
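The migration and mutation operators described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the per-variable migration chance of 0.5, and the uniform mutation are assumptions, not the paper's implementation.

```python
import random

def migrate(eps, ips):
    """Blend the emigrated PS (eps) into the immigrated PS (ips); each migrated
    decision variable ends up between the values of the two parents."""
    mps = list(ips)
    for d in range(len(mps)):
        if random.random() < 0.5:       # assumed per-variable migration chance
            a = random.random()         # random blend factor in [0, 1]
            mps[d] = eps[d] + a * (ips[d] - eps[d])
    return mps

def mutate(ps, p_mutation, bounds):
    """Randomly reset decision variables with probability p_mutation
    to preserve diversity and escape local optima."""
    return [random.uniform(lo, hi) if random.random() < p_mutation else v
            for v, (lo, hi) in zip(ps, bounds)]
```

Because the blend factor a lies in [0, 1], every migrated variable is guaranteed to stay between the two parent values, matching the description above.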

Multi-objective biogeography-based optimization (MOBBO) algorithm
From the viewpoint of the number of objective functions, optimization problems are divided into single- and multi-objective problems. In a single-objective problem, the quality of two PSs can be easily assessed by comparing their objective function values. In the case of a multi-objective problem, the comparison of PSs is more complicated, and the dominance concept is used to compare their performance.
In a minimization problem, a PS_i dominates a PS_j if and only if z_i^k ≤ z_j^k for all objectives k and z_i^k < z_j^k for at least one objective, where z_i^k and z_j^k are the kth objective function values of the ith and jth PSs, respectively. Indeed, there are several optimal solutions in a multi-objective optimization problem, which make up an optimal Pareto front (PF). In the MOBBO, the emigration and immigration probabilities of each PS are specified based on their HSIs, and the ranking of PSs in each epoch is determined using the two following strategies:

• All non-dominated PSs of each epoch are determined, which make up the first PF, called the optimal PF. Then, the members of this PF are disregarded, and the non-dominated PSs of the remaining PSs are specified, which build the second PF. This process continues until all possible PFs are obtained and each PS belongs to a PF. The members of the first PF have the top priority compared to the PSs of other PFs, and the members of the last PF have the least priority.

• Motivated by the non-dominated sorting genetic algorithm II (NSGA-II) [50], all members of each PF are ranked using the crowding distance (CD) measure, defined as follows:

CD_i^p = Σ_k (z_{i,next}^k − z_{i,former}^k) / (z_max^{k,p} − z_min^{k,p})

where CD_i^p is the CD of the PS_i in the pth PF, z_max^{k,p} and z_min^{k,p} are the maximum and minimum values of the kth objective function for PSs in the pth PF, respectively, and z_{i,next}^k and z_{i,former}^k are the objective function values of the closest next and former neighbors of the PS_i, respectively.

To have more uniform PFs, the PSs with higher CDs have more priority than PSs with lower CDs in each PF, and higher emigration probabilities are assigned to them. However, the emigration probabilities of PSs in a high-ranked PF are higher than those in a low-ranked PF. The flowchart of the MOBBO algorithm is given in Fig. 3.
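The two ranking strategies, Pareto-front peeling and the NSGA-II crowding distance, can be sketched as follows for a minimization problem. This is a generic illustration of the technique, not the authors' code.

```python
def dominates(zi, zj):
    """zi dominates zj if it is no worse in every objective and strictly
    better in at least one (minimization)."""
    return (all(a <= b for a, b in zip(zi, zj))
            and any(a < b for a, b in zip(zi, zj)))

def non_dominated_fronts(objs):
    """Peel off successive Pareto fronts; returns lists of solution indices."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(front, objs):
    """NSGA-II crowding distance of each member of one front."""
    cd = {i: 0.0 for i in front}
    n_obj = len(objs[front[0]])
    for k in range(n_obj):
        ordered = sorted(front, key=lambda i: objs[i][k])
        zmin, zmax = objs[ordered[0]][k], objs[ordered[-1]][k]
        cd[ordered[0]] = cd[ordered[-1]] = float('inf')  # boundary members
        if zmax > zmin:
            for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
                cd[b] += (objs[c][k] - objs[a][k]) / (zmax - zmin)
    return cd
```

Boundary members of each front get an infinite crowding distance so that the extremes of the front are always kept, which is what spreads the PSs uniformly along the PF.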

Multi-biogeography-based programming (MBBP) algorithm
Biogeography-based programming (BBP), as a branch of automatic regression, mimics the BBO algorithm in regression problems. This method looks for the best mathematical equation, which automatically maps the input of a system onto its output(s) [51]. In the BBP, an island is a mathematical equation with a tree structure. The first node, located at the top of a tree structure, is called the root node, while the nodes in the lowest positions of a tree are called leaf nodes. The maximum tree depth is defined as the longest path from the root node to the leaf nodes. The root and intermediate nodes can be mathematical functions defined by the user, and variables and constants can be assigned to the leaf nodes. Moreover, a root node can itself be a variable or a constant, giving a tree depth of zero. Figure 4 illustrates an island with a tree structure representing a mathematical equation. To evaluate an island, different statistical indicators can be used, as defined later. The closer the predicted values of an island are to the observed results, the lower the prediction error of the related mathematical equation and the higher the corresponding HSI.
In the multi-BBP (MBBP) algorithm, an archipelago is defined as several islands combined linearly. Figure 5 shows an archipelago with two islands with a maximum tree depth of three. This archipelago represents the mathematical equation y = a_0 + a_1·y_1 + a_2·y_2, in which the regression coefficients (a_0, a_1, and a_2) are optimized in such a way that the prediction error of the corresponding mathematical equation is minimized. The root-mean-square deviation, defined later, was used as the cost function in this study.
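Since an archipelago is a linear combination of island outputs, its regression coefficients can be obtained in closed form by ordinary least squares. A minimal sketch follows; the function names are assumptions, and the paper does not state which solver it uses.

```python
import numpy as np

def fit_archipelago(island_outputs, y):
    """Fit a0..am of y = a0 + a1*y1 + ... + am*ym by ordinary least squares.
    island_outputs: (n_samples, n_islands) matrix of island predictions."""
    X = np.column_stack([np.ones(len(y)), island_outputs])  # prepend intercept
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def rmsd(pred, obs):
    """Root-mean-square deviation used as the cost function."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))
```

Solving the coefficients in closed form means the metaheuristic search only has to discover the island structures, not the linear weights.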
Based on the obtained HSIs of all archipelagos, their emigration and immigration probabilities are calculated. To migrate from an emigrated archipelago to an immigrated one, a random sub-tree (ST) of a random island of the immigrated archipelago is selected and is replaced by a random ST of a random island of the emigrated archipelago. In the case of mutation, an archipelago is chosen randomly from the existing archipelagos in each epoch based on the mutation probability. Then, a random ST is selected from a random island of the chosen archipelago, and it is replaced by a randomly generated ST. It is noteworthy that the maximum depth of the generated archipelagos should be less than or equal to the defined maximum tree depth in both migrated and mutated archipelagos. Figure 6 schematically demonstrates the migration process between archipelagos.

The proposed multi-objective MBBP (MOMBBP) algorithm
The MBBP has the ability to find the relationship between the input and output variables automatically. Unlike conventional regression methods, in which the regression type must be set first, the MBBP searches for the regression type using the predefined functions, as well as the required coefficients, simultaneously. Finding a suitable mathematical equation can be considered a multi-objective optimization problem. The first objective is the accuracy of the developed mathematical equation, and the second is the complexity of the developed archipelago. The statistical indicators determine the accuracy of a developed model, while the number of nodes (i.e., the number of input variables, constants, and functions) in the related archipelago specifies its complexity. For an archipelago with a maximum number of islands MNI, a maximum tree depth MTD, and a maximum function arity MANF, the maximum complexity is MNI × (MANF^(MTD+1) − 1)/(MANF − 1). In general, increasing the complexity of an archipelago can raise the training capacity of the MBBP, but it may reduce the generalization of the developed model. Moreover, fewer input variables participate in simple models, which highlights their importance compared to other input variables, while more variables appear in the most complicated MBBP models, which can be more accurate. In the proposed MOMBBP, different mathematical equations are developed with various architectures and input variables. In this regard, the MOBBO algorithm is used to find the optimal Pareto front. The pseudo-code of the MOMBBP is shown in Fig. 7, and more details are given below.
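Under the depth convention above (a single-node tree has depth zero), the maximum complexity corresponds to MNI full MANF-ary trees of depth MTD, and the geometric-series node count can be computed directly. This closed form is offered as a sketch under that assumption:

```python
def max_complexity(mni, manf, mtd):
    """Upper bound on archipelago node count: mni full manf-ary trees of
    depth mtd, where a single-node tree has depth zero (hence one node)."""
    return mni * (manf ** (mtd + 1) - 1) // (manf - 1)
```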

Phase 1
The gathered dataset is divided randomly into training and testing datasets, of which the former is used to train the MOMBBP and the latter serves to verify the developed models.
Phase 2 Prior to running the proposed algorithm, all control parameters of the proposed MOMBBP algorithm are adjusted. These parameters include the training dataset ratio (n_train), the testing dataset ratio (n_test), the number of archipelagos (n_archipelagos), the maximum number of generations (n_generation), the maximum number of algorithm repetitions (n_repetition), the maximum number of islands (n_islands), the maximum tree depth (d_tree), the function set, the mutation probability (p_mutation), and the range of constants (r_constants).

Phase 3
The archipelagos are created randomly. To generate an archipelago, a random integer between 1 and n_islands is selected as the number of its islands, and the islands are produced using the ramped half-and-half method, a combination of the grow and full methods. In the grow method, the root node is randomly chosen as a function or a terminal (a variable or constant). If a terminal is chosen, it becomes a leaf; otherwise, a randomly selected function is assigned to this node, and the number of its branches is determined by the number of arguments of the selected function. This procedure continues for the generated nodes at the lower levels of the tree. If a node reaches the depth of d_tree, a random terminal is assigned to it. In the case of the full method, a random function is assigned to each node from the root down to the nodes with a tree depth of less than d_tree, and random terminals are assigned to the nodes in the lowest positions. This method allows the tree to grow as far as it can. In the ramped half-and-half method, half of the archipelagos are generated using the grow method, while the others are created using the full method.
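The ramped half-and-half initialization of Phase 3 can be sketched as follows. The function set, the terminals, and the 50% terminal chance in the grow method are illustrative assumptions; the paper's actual sets appear in its Table 2.

```python
import random

FUNCTIONS = {'+': 2, '*': 2, 'neg': 1}   # assumed function set (name -> arity)
TERMINALS = ['x1', 'x2', 'c']            # assumed variables/constants

def grow(depth, d_tree):
    """Grow method: any node may become a terminal before reaching d_tree."""
    if depth == d_tree or random.random() < 0.5:  # assumed 50% terminal chance
        return random.choice(TERMINALS)
    f = random.choice(list(FUNCTIONS))
    return [f] + [grow(depth + 1, d_tree) for _ in range(FUNCTIONS[f])]

def full(depth, d_tree):
    """Full method: functions down to depth d_tree - 1, terminals at d_tree."""
    if depth == d_tree:
        return random.choice(TERMINALS)
    f = random.choice(list(FUNCTIONS))
    return [f] + [full(depth + 1, d_tree) for _ in range(FUNCTIONS[f])]

def ramped_half_and_half(n_trees, d_tree):
    """Half of the trees by the grow method, half by the full method."""
    return [grow(0, d_tree) if i < n_trees // 2 else full(0, d_tree)
            for i in range(n_trees)]

def tree_depth(node):
    """Longest path from the root to a leaf; a lone terminal has depth zero."""
    return 0 if isinstance(node, str) else 1 + max(map(tree_depth, node[1:]))
```

Grow trees end up with varied shapes and depths up to d_tree, while full trees always reach exactly d_tree, which is what gives the initial population its structural diversity.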
Phase 4 Using the training dataset, the regression coefficients of archipelagos are determined, and their HSIs are calculated using the statistical indicators. Moreover, the number of nodes used in the archipelagos is calculated as their complexities.
Phase 5 Having the HSIs and the complexities of all archipelagos, the non-dominated archipelagos are specified. Then, the CDs of all archipelagos are determined, and all mathematical equations are sorted from the best to the worst, first based on their PF ranks and then according to their CDs. After ranking the archipelagos, their emigration and immigration probabilities are calculated.
Phase 6 In this phase, migration and mutation operators are used to generate the new archipelagos. In this regard, an archipelago is selected based on its immigration probability. Then, an emigrated archipelago is chosen according to its emigration probability and using the roulette wheel selection mechanism. Based on the predefined p mutation , the mutation is carried out to increase the variety of archipelagos.

Phase 7
The HSIs and complexities of the generated archipelagos are calculated. Then, the old and new archipelagos are merged, their non-dominated fronts and CDs are determined, and they are ranked from the best to the worst. Next, the n_archipelagos best archipelagos are retained as the main archipelagos, and the rest are discarded.
Phase 8 Phases 4 to 7 are repeated iteratively until the termination condition of the MOMBBP algorithm is satisfied; in this study, reaching n_generation generations was used as the condition. After finishing the algorithm's runs, the best archipelagos, which form the optimal Pareto front, are saved and used for further investigation.

Data collection
Collecting a comprehensive and credible database is necessary to model the CS of CCWG. In this regard, a total of 830 data records from 24 peer-reviewed papers were used to develop the MOMBBP model [21-23, 25, 52-71]. In this study, two scenarios were considered for selecting the input variables of the model, as summarized in Table 1. Among the 830 data records, 179 and 211 records include BFS and FA in the concrete mixture, respectively. Additionally, there are different types of standard specimens for measuring the CS of CCWG in the database, including cubic specimens with dimensions of 100 mm and 150 mm and cylindrical specimens with dimensions of 100 mm × 200 mm and 150 mm × 300 mm. The CS of all specimens with different shapes and sizes was converted to that of cylindrical specimens with dimensions of 100 mm × 200 mm based on the conversion equations given in [72]. Meanwhile, the NFA was entirely replaced by the WG in the concrete mix design of 47 data records.

Results and discussion
To assess the accuracy of the developed models and compare their performance, several statistical indicators were used, including the root-mean-square deviation (RMSD), relative RMSD (RRMSD), mean absolute deviation (MAD), coefficient of determination (R²), correlation coefficient (R), and OBJ, defined as follows:

RMSD = sqrt( (1/n) Σ (PV_i − ER_i)² )

RRMSD = RMSD / ER̄

MAD = (1/n) Σ |PV_i − ER_i|

R² = 1 − Σ (ER_i − PV_i)² / Σ (ER_i − ER̄)²

OBJ = ((n_train − n_test)/n) × (RMSD_train + MAD_train)/(R²_train + 1) + (2·n_test/n) × (RMSD_test + MAD_test)/(R²_test + 1)

where PV_i and ER_i are the predicted value and experimental result of the ith data record, respectively, and ER̄ is the average value of the experimental results. n, n_train, and n_test are the numbers of data records in the whole, training, and testing datasets, respectively. The OBJ value is a combined merit that simultaneously considers three statistical indicators for the training and testing datasets [35,73]. Before running the proposed MOMBBP algorithm, the control parameters defined earlier should be set. Table 2 shows the control parameters adjusted for the MOMBBP algorithm in this study, which were selected based on trial and error.
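The indicators above can be computed as follows. The OBJ weighting shown follows the form commonly used with these three indicators in this literature [35,73] and should be read as an assumption rather than the paper's verbatim equation.

```python
import numpy as np

def indicators(pv, er):
    """RMSD, RRMSD, MAD, R2 and R for one dataset (predicted vs experimental)."""
    pv, er = np.asarray(pv, float), np.asarray(er, float)
    rmsd = float(np.sqrt(np.mean((pv - er) ** 2)))
    rrmsd = rmsd / er.mean()
    mad = float(np.mean(np.abs(pv - er)))
    r2 = 1.0 - np.sum((er - pv) ** 2) / np.sum((er - er.mean()) ** 2)
    r = float(np.corrcoef(pv, er)[0, 1])
    return rmsd, rrmsd, mad, r2, r

def obj_value(train, test):
    """Combined merit over the training and testing splits (lower is better)."""
    (pv_tr, er_tr), (pv_te, er_te) = train, test
    n_tr, n_te = len(er_tr), len(er_te)
    n = n_tr + n_te
    rmsd_tr, _, mad_tr, r2_tr, _ = indicators(pv_tr, er_tr)
    rmsd_te, _, mad_te, r2_te, _ = indicators(pv_te, er_te)
    return (((n_tr - n_te) / n) * (rmsd_tr + mad_tr) / (r2_tr + 1)
            + (2 * n_te / n) * (rmsd_te + mad_te) / (r2_te + 1))
```

Because OBJ aggregates RMSD, MAD, and R² on both splits, a model that fits the training data but generalizes poorly is penalized through the testing terms.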
After running the proposed method ten times, the optimal PF with the best performance was achieved. Figure 8 depicts the best optimal PF for the two scenarios, in which the horizontal and vertical axes are the complexity and a statistical indicator (i.e., RMSD or OBJ), respectively. As expected, the prediction error decreases as the complexity of the models increases. The complexities of the optimal models in the first and second scenarios are in the ranges of [1, 108] and [1, 65], respectively. The numbers of optimal solutions of the first and second scenarios are 74 and 53, respectively. The best solutions, illustrated as solid-filled markers, have an R equal to or greater than 0.85, and their numbers are 65 and 46 for the first and second scenarios, respectively.
The simplest optimal models in both scenarios consist of only one node, which is the root node. These nodes are, respectively, C and W/B in the first and second scenarios, which shows their importance compared to other input variables. Moreover, for unknown data, there are no significant differences between the optimal models with the complexities of 56 and 108 in the first scenario, or between the optimal models with the complexities of 37 and 65 in the second scenario. All eleven input variables of the first scenario participate in the optimal solution with the complexity of 108. In contrast, all input variables except for PP75 are involved in the optimal solution with the complexity of 56. It means that knowing the value of PP75 does not significantly improve the accuracy of the prediction. In the case of the second scenario, FA/C and RWG75/TFA do not appear in the optimal models with the complexities of 37 and 65. Moreover, NFA/TA and PWG75/C do not exist in the optimal model with the complexity of 37. It shows that it is possible to estimate the CS of CCWG with acceptable accuracy given only six input variables. Assuming similar complexities, the optimal solutions of the second scenario have better performance than those of the first scenario.

[Fig. 8 The optimal models obtained by the proposed method: (a) RMSD-complexity plot of the training dataset; (b) RMSD-complexity plot of the testing dataset; (c) OBJ-complexity plot]
[Fig. 9 The NRMSD-complexity plot of the best models for the two scenarios]
[Table 3 Input variables participating in the best optimal models of the first scenario]
Additionally, there are 11 and 8 input variables in the best optimal solutions of the first and second scenarios, respectively. With the smaller number of input variables and lower complexity, the best model of the second scenario outperforms the best model of the first scenario. It means that considering the effective ratios as the predictors of CS of CCWG can lead to more accurate results.
The NRMSDs of the best optimal models for both scenarios are plotted in Fig. 9. Most developed models in the two scenarios have NRMSDs between 0.2 and 0.3, which indicates their fair performance in predicting the CS of CCWG for both the training and testing phases. However, the NRMSDs decrease with increasing complexity of the developed models, and the NRMSDs of the most complicated models of the second scenario reach 0.2 for the training phase, which shows their good performance in mapping data from the input space to the output space. Tables 3 and 4 show the input variables participating in the best-developed models for the first and second scenarios, respectively. Seven effective input variables (C, BFS, FA, SP, PP45, WG, AC) and four effective input variables (W/B, SP/B, PWG45/TFA, AC) participate in the simplest models of the first and second scenarios, respectively. Moreover, all 11 input variables exist in the most complicated model of the first scenario. In the case of the second scenario, the two input variables FA/C and PWG75/TFA do not participate in the most complicated model. Figure 10 illustrates the participation percentage of the input variables for the best-developed models of the first and second scenarios. In all 65 developed models of the first scenario, C, W, BFS, FA, SP, PP45, WG, and AC have the main participation, while PP75 has the least participation. It means that PP75 has the lowest effect on the CS of CCWG. In the second scenario, W/B, NCA/C, SP/B, PWG45/C, and AC are the most effective variables on the CS of CCWG. Moreover, RWG75/TFA does not participate in any of the 46 developed models of the second scenario, which means that the waste glass with a size greater than 75 µm does not have any contribution to the CS of CCWG compared to other input variables. This is compatible with the results of Shao et al. [62,74], who reported that WG with a size of 150 µm does not have a pozzolanic reaction. Besides, FA/C, with a participation percentage of 8.70%, is the second least important variable. Meanwhile, PWG45/C appeared about 2.7 times more often than PWG75/C in the 46 best-developed models of the second scenario. The reason can be related to the pozzolanic activity and filler effects of WG with a smaller size [74,75].
The number of input variables participating in the best models of the first and second scenarios is given in Fig. 11. The numbers of input variables of the simplest models of the first and second scenarios are 7 and 4, respectively. For the most complicated models, the numbers of input variables are 11 and 8 for the first and second scenarios, respectively. This means fewer input variables participate in the second scenario than in the first scenario. Nevertheless, the accuracy of the best-developed models of the second scenario is higher despite their lower complexities.
From the best-developed models, those with different variables and better precision were chosen for further investigation. Six and nine models were selected from the first and second scenarios, respectively. Table 5 shows the selected models as well as the input variables participating in them. In this table, CS_i,j represents the developed model for the CS of CCWG, in which i is A or B for the first and second scenarios, respectively, and j gives the complexity of the developed model. All 15 developed models are given in the "Appendix". Table 6 gives the statistical indicators of the selected models for the training and testing phases. The CS_A,11 and CS_B,65 models, with RMSDs of 8.9975 MPa and 7.0192 MPa for the training dataset, are the worst and the best selected models, respectively. Moreover, the RMSD of the CS_A,108 model, which has about ten times the complexity of the CS_A,11 model, is about 13.7% lower than the RMSD of the CS_A,11 model for the testing dataset. Comparing the RMSDs of the CS_B,9 and CS_B,65 models for the testing dataset shows that the latter model is almost 20.3% more accurate than the former. In terms of OBJ, the simplest and most complicated models of the second scenario are about 3% and 7% more precise than those of the first scenario, respectively. Meanwhile, the simplest and most complicated models of the second scenario are, respectively, almost 18% and 40% simpler than those of the first scenario.
To show the performance of the selected models, the CSs predicted by the 15 selected models versus the experimental CSs for the testing dataset are plotted in Fig. 12. For the more complicated models, the majority of data points scatter around the baseline with a slope of one passing through the origin of coordinates. Moreover, with increasing complexity of the models, the R² values increase, and the slopes of the models get closer to one. However, for all selected models, there are some data points with a considerable difference between the predicted and experimental CSs, which can be related to experimental errors. Figure 13 depicts the error box plot of the selected models for the testing dataset. The error on the horizontal axis was defined as the logarithm of the ratio of the predicted to experimental CSs. The interquartile range (IQR) of a model increases with its complexity. All developed models overestimate the CS of CCWG. For both the CS_A,108 and CS_B,65 models, as the best models of the first and second scenarios, the errors of 50% of the data records fall in the range of −0.04 to 0.06, with an average of zero. The CS_A,11 and CS_B,9 models, as the least accurate models of the first and second scenarios, have IQRs about 40% and 50% larger than those of the most complicated models. Surprisingly, the number of outliers is almost the same for the best models of the first and second scenarios. Further investigation shows that most outliers occur in the same data records for both models, indicating the good performance of the developed models in identifying the outliers.
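The box-plot error measure (logarithm of the predicted-to-experimental ratio) and the IQR can be computed as below; the choice of base-10 logarithm is an assumption, since the paper does not state the base.

```python
import numpy as np

def log_ratio_errors(pv, er):
    """Error used for the box plots: log10(predicted CS / experimental CS);
    zero means a perfect prediction, positive means overestimation."""
    return np.log10(np.asarray(pv, float) / np.asarray(er, float))

def iqr(errors):
    """Interquartile range: spread of the middle 50% of the errors."""
    q1, q3 = np.percentile(errors, [25, 75])
    return float(q3 - q1)
```

A log-ratio error is symmetric in a multiplicative sense: over- and under-predicting by the same factor give errors of equal magnitude and opposite sign, which makes the box plots easier to compare across strength levels.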
To investigate the effect of PP45 and PP75 on the CS of CCWG, the CS_A,108 model was used. In this regard, all other input variables were kept constant at their average values, and the two variables of interest were varied within their allowable ranges. The CSs of CCWG were predicted for the designed data using the best model, and the obtained surfaces were plotted for ACs of 7, 28, 56, and 90 days. As illustrated in Fig. 14, the CS of CCWG rises with increasing PP45 and PP75. However, the growth of CS is higher in the case of PP45, which is related to the higher pozzolanic activity of the smaller WG particles [24, 75-77]. The distance between the surfaces indicates the growth of the CS of CCWG over time. The distance between the surfaces corresponding to ACs of 56 and 90 days is noticeable, which indicates the secondary reaction of WG at older concrete ages. This analysis shows that the developed models can correctly estimate the trend of the CS of CCWG when PP45 and PP75 change, while properly accounting for the age of concrete.
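The parametric study above (vary two inputs over a grid, hold the rest at their averages, evaluate one surface per concrete age) can be sketched as follows. The `cs_model` function here is a placeholder stand-in with invented coefficients; the actual CS_A,108 equation is given in the paper's Appendix.

```python
import numpy as np

def cs_model(pp45, pp75, age):
    """Placeholder for the CS_A,108 model: increases with PP45 and PP75
    (with PP45 weighted more, per the observed trend) and with log of age.
    Coefficients are invented for illustration only."""
    return 30.0 + 0.10 * pp45 + 0.05 * pp75 + 5.0 * np.log(age)

# Grid over the two variables of interest; the remaining inputs are
# implicitly fixed at their average values inside cs_model.
pp45 = np.linspace(0.0, 100.0, 21)
pp75 = np.linspace(0.0, 100.0, 21)
P45, P75 = np.meshgrid(pp45, pp75)

# One predicted-CS surface per concrete age, as in Fig. 14.
surfaces = {age: cs_model(P45, P75, age) for age in (7, 28, 56, 90)}
```

With a real closed-form model substituted in, plotting these surfaces reproduces the Fig. 14 analysis, including the visible gap between the 56-day and 90-day surfaces.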

Conclusions
In this study, multi-objective multi-biogeography-based programming (MOMBBP) was used as a machine learning method to estimate the compressive strength of concrete containing waste glass (WG). In this regard, two scenarios were considered. In the first scenario, the amounts of the concrete constituents, the age of concrete, and the passing percentages of waste glass at the 75 µm (PP75) and 45 µm (PP45) sieves were used to predict the compressive strength. In the second scenario, the critical ratios of the concrete constituents, the age of concrete, the ratios of WG passing the 45 µm and 75 µm sieves to cement (PWG45/C and PWG75/C), and the ratio of WG retained on the 75 µm sieve to the total fine aggregate (RWG75/TFA) served as inputs. The obtained conclusions are listed below:
• In total, 65 and 46 mathematical models with R values of more than 0.85 were achieved for the first and second scenarios, respectively. The complexities of the developed models of the first and second scenarios lie within [10, 108] and [8, 65], respectively.
• Despite the lower complexities of the models developed in the second scenario, their precisions are better than those of the first scenario. This shows the significant effect of employing the critical ratios of the mix design as input variables in predicting the compressive strength of concrete containing WG.
• In the simplest model of the first scenario, seven variables, consisting of cement, blast furnace slag, fly ash, superplasticizer, PP45, WG, and the age of concrete, are the predictors of the compressive strength.
• In contrast, only four variables, including the water-to-binder ratio, the superplasticizer-to-binder ratio, and PWG45/C, participate in the simplest model of the second scenario.
The authors suggest using other feature selection and machine learning methods, as hybrid models, to determine the effective factors influencing the compressive strength of concrete containing WG and to increase the precision of the models developed in this study. As for MOMBBP, the proposed technique can serve any engineering regression problem, especially in the fields of material science and structural engineering, in which different mathematical equations with various attributes are required.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.