
1 Introduction

Nanographite (NG) is a promising conductive filler for producing highly effective electrically conductive cementitious composites. As reported by [1], NG-based cementitious composites (NGCC) offer sufficient strength and conductivity to serve as self-sensing construction materials for non-destructive structural health monitoring (NDSHM). Three forms of NG are generally studied in NGCC: graphene nanoplatelets (GNP), graphene oxide (GO), and reduced GO (rGO). Beyond the form of NG and its physical properties, other experimental factors, such as NG dispersion and curing conditions, have been shown to affect the strength and conductivity of NGCC [2, 3]. Considerable experimental research has sought the optimal values of these factors for NGCC production [4, 5].

The goal of NGCC design optimization is to achieve high strength and low electrical resistivity (ER) simultaneously, which is difficult to realize through experiments alone. A single experiment typically varies only one factor over a limited set of design values while controlling the others, so the resulting design experience lacks generality. Previous experimental studies have also tended to discuss the mechanical and electrical objectives separately. Nevertheless, they have produced a large body of test data. This study addresses the multi-objective design optimization (MODO) problem of NGCC by applying advanced data mining techniques to those data.

This study proposes a comprehensive data-driven computing and analysis method that integrates machine learning (ML) with the non-dominated sorting genetic algorithm II (NSGA-II). First, the uniaxial compressive strength (UCS) and ER of NGCC were modelled by Bayesian-optimized XGBoost using compiled experimental datasets. During modelling, the Weight and SHAP (SHapley Additive exPlanations) methods were used to interpret the ML models and identify the variables critical for optimization. The established models then served as objective functions in NSGA-II to build the MODO program. A case study demonstrates the feasibility and accuracy of the proposed MODO method. With this method, researchers can quickly find optimal NGCC designs that meet their application requirements.

2 Methods

2.1 Establishment of Calculation Models

To ensure that the established models were accurate and reliable, we modelled the UCS and ER of NGCC with a four-step ML framework, shown in Fig. 1. In Step 1, original experimental datasets of UCS and ER were compiled from an extensive literature review. In Step 2, the datasets were processed with appropriate feature engineering strategies before estimator training. In Step 3, four classic ML algorithms, namely support vector regressor (SVR), random forest regressor (RFR), XGBoost, and back-propagation neural network (BPNN), were applied to the processed datasets. Three search methods, grid search (GS), particle swarm optimization (PSO), and Bayesian optimization (BO), were used to tune the hyperparameters of each algorithm for its best performance, and the resulting estimators were compared with each other to select the most suitable models for UCS and ER. Finally, we used SHAP to interpret the established models, identifying the critical input variables and quantifying their influence.
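To illustrate what the SHAP step computes, the sketch below evaluates exact Shapley values for a small model by enumerating every feature coalition, then ranks features by mean absolute SHAP value, one common way to build a feature importance ranking. It is a minimal sketch under stated assumptions: a feature absent from a coalition is set to a baseline value (here the per-feature mean), and the surrogate `toy_model`, its coefficients, and the sample inputs are hypothetical stand-ins for the study's XGBoost models and data. Exact enumeration grows exponentially with the number of features, so practical SHAP tools use faster algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature outside a coalition takes its baseline value.
    Cost is about 2**n model calls per feature, so this suits tiny n only.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        # Use x for features in the coalition and the baseline for the rest
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap_ranking(model, X, baseline, names):
    """Rank features by mean |SHAP value| over the samples in X."""
    totals = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(v)
    means = [t / len(X) for t in totals]
    return sorted(zip(names, means), key=lambda p: p[1], reverse=True)


# Hypothetical surrogate (NOT the study's XGBoost model); inputs are
# NG dosage GC, NG thickness GT, and water/cement ratio W/C.
def toy_model(z):
    gc, gt, wc = z
    return 60.0 - 12.0 * gc - 0.4 * gt - 20.0 * wc + 5.0 * gc * wc


# Hypothetical sample inputs; the baseline is the per-feature mean
X = [[0.5, 6.0, 0.32], [1.0, 8.0, 0.40], [2.0, 12.0, 0.35], [3.0, 6.0, 0.45]]
baseline = [sum(col) / len(X) for col in zip(*X)]
print(mean_abs_shap_ranking(toy_model, X, baseline, ["GC", "GT", "W/C"]))
```

For this toy model GC ranks first, because its coefficient dominates; the Shapley values of each sample also sum to model(x) minus model(baseline), which is the additivity property SHAP relies on.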

Fig. 1 Machine learning modelling framework. Components: original dataset; processed training and testing subsets; SVR, RFR, XGBoost, and BPNN; GS, PSO, and BO tuning; comparison; best estimator; performance evaluation; model launch.

2.2 Development of the MODO Program

Figure 2 shows the workflow of the developed MODO program, whose structure follows the mechanism of NSGA-II. Each individual in the generated group was a vector of the variables to be optimized, with lower and upper bounds defined for each variable. Other variables included in both datasets were held fixed. The generated group was then concatenated with the fixed variables to form the input datasets for UCS and ER calculation by the established models. Pipelines packaged the feature engineering steps so that these input datasets had a consistent format. Individuals were ranked on their UCS and ER results by Pareto dominance. Individuals in better Pareto ranks and, within a rank, with larger crowding distances were retained to create the next-generation group. At the end of the program, the final group formed the Pareto set of optimal design solutions.
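The ranking and selection steps described above can be sketched in plain Python. The code follows the standard NSGA-II scheme (fast non-dominated sorting, then crowding distance to split the last front that fits) and treats both objectives as minimized, so UCS enters as −UCS, as on the axes of Fig. 4; since ln is monotonic, using ER or ln(ER) yields the same non-dominated sorting. It is an illustrative sketch, not the study's program: the predicted (UCS, ER) pairs are hypothetical placeholders for outputs of the established models, and crossover, mutation, and bound handling are omitted.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Split indices of objs into Pareto fronts; front 0 is non-dominated."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many individuals dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs, front):
    """Crowding distance per index in front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
            dist[b] += (objs[c][k] - objs[a][k]) / (hi - lo)
    return dist


def select_next_parents(objs, size):
    """Fill the next parent group front by front; split the last front by crowding."""
    chosen = []
    for front in fast_non_dominated_sort(objs):
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            dist = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[: size - len(chosen)])
            break
    return chosen


# Hypothetical (UCS, ER) predictions standing in for the established models
predictions = [(50.0, 120.0), (45.0, 60.0), (40.0, 30.0), (42.0, 80.0), (38.0, 90.0)]
objs = [(-ucs, er) for ucs, er in predictions]  # maximize UCS -> minimize -UCS
print(fast_non_dominated_sort(objs))  # [[0, 1, 2], [3], [4]]
print(select_next_parents(objs, 4))   # [0, 1, 2, 3]
```

The first front here shows the same trade-off reported for the case study: the solution with the highest UCS also has the highest ER.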

Fig. 2 Flowchart of the multi-objective design optimization (MODO) program: start; initialize the group; increment Gen; generate children; combine parent and child groups; construct datasets; predict with the UCS and ER pipelines and models; non-dominated sorting; crowding distance; select and form a new parent group; repeat while Gen is less than max Gen, otherwise end.

3 Results and Discussion

According to the comparison results, Bayesian-optimized XGBoost models were the most suitable for the UCS and ER of NGCC; their hyperparameter combinations are given in Table 1. At the end of training, the models showed minimal gaps between the training and testing subsets, indicating no obvious over- or underfitting. Mean absolute error (MAE) and the coefficient of determination (R²) were used to assess the accuracy of the established models. The small MAE values (1.24 and 3.44 for the UCS model; 0.15 and 0.22 for the ER model) and high R² scores (0.95 and 0.92 for UCS; 0.99 and 0.98 for ER) indicate satisfactory and reliable predictive ability for both properties. Figure 3 shows the feature importance ranking derived from the SHAP interpretation of the two models. The UCS and ER models share almost the same ranking, with the NG dosage (GC) dominating both properties. Other features also have high influence: the thickness and diameter of NG (GT and GD), the water dosage, and the curing age.
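The two accuracy indexes follow their standard definitions: MAE = (1/n) Σ|y_i − ŷ_i| and R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)², where y_i are measured values, ŷ_i are predictions, and ȳ is the mean measured value. A minimal sketch, using made-up UCS values purely for illustration (they are not data from this study):

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted UCS values, for illustration only
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [32.0, 39.0, 47.0, 61.0]
print(mae(y_true, y_pred))                 # 1.75
print(round(r2_score(y_true, y_pred), 4))  # 0.97
```

MAE is reported in the units of the target, whereas R² is dimensionless, which is why the two are usually read together. scikit-learn provides equivalent functions, `mean_absolute_error` and `r2_score`, in `sklearn.metrics`.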

Table 1 Hyperparameter tuning results
Fig. 3 Feature importance ranking: a uniaxial compressive strength; b electrical resistivity. Both panels show weight scores for GC, GD, W/C, CA, GT, US, S/C, GM, STM, SP, and CE; GC scores highest and CE lowest in both.

A case study demonstrated the feasibility of the developed MODO program. In the case study, GNP was selected as the NG form and the curing age was fixed at 28 days, while the other six variables were optimized. Figure 4 records the optimization process, in which the program converged to the final Pareto set through iteration. As listed in Table 2, all the resulting design solutions qualified for NDSHM applications, with acceptable strength and conductivity. Across the solutions, higher UCS came with higher ER (lower conductivity), reflecting the trade-off between the two objectives. Optimal values also emerged for some variables: the ideal GNP thickness was about 6.36 nm, the water/cement ratio was 0.32 in most solutions, and 30 min of ultrasonication was preferred for GNP dispersion. The differences in output between solutions were mostly due to changes in the sand and GNP dosages and in GD. Adding more GNP reduced both the UCS and the ER of NGCC.

Fig. 4 Optimization process of the multi-objective design optimization (MODO) program of nanographite-based cementitious composite: three snapshots of ln(ER) versus −UCS for the group and the Pareto set, in which the group shrinks until only the Pareto set remains (ER, electrical resistivity; UCS, uniaxial compressive strength)

Table 2 Multi-objective design optimization (MODO) design solutions for nanographite-based cementitious composite

4 Conclusions

Robust and reliable calculation models were established for the UCS and ER of NGCC using BO-tuned XGBoost. SHAP interpretation of these models identified the significant influential factors, led by NG dosage and including NG thickness and diameter, water dosage, and curing age. We further explained their quantitative influence on the properties of NGCC by analyzing their SHAP value distributions. NSGA-II was combined with the established models, used as objective functions, to develop the MODO program of NGCC. A case study confirmed the program's feasibility: it obtained the Pareto set of design solutions, which also identified optimal values for some variables, such as a GNP thickness of about 6.36 nm and a water/cement ratio of 0.32.