The discussers would like to thank the authors for investigating the efficiency of machine learning methods in predicting the splitting tensile strength of high-performance concrete (HPC). In the discussed study (Wu and Zhou 2022), two hybrid models, a genetic algorithm (GA)-based artificial neural network (ANN) and a grid search (GS)-based support vector regression (SVR), are compared with a classical ANN using empirical data collected from the published literature, and the authors conclude that the GS-SVR is superior to the other methods in predicting the splitting tensile strength of HPC. The discussers would like to raise some important issues, which can be taken into account by the authors and the readership in future studies.
In the study of Wu and Zhou (2022), the parameter settings of the GA-ANN are provided in Table 1 (see Table 2 in the original study). In this table, trainlm is a MATLAB command indicating the Levenberg–Marquardt (LM) algorithm, which is used for training a classical ANN. We wonder how the authors could use this algorithm in training while they also state that they calibrated the ANN using GA, which is an evolutionary method that does not use LM. Furthermore, there is no information about the crossover or mutation operators and their values/settings, which are very important in GA optimization. How did the authors find the optimal settings for the GA-ANN model? If a trial-and-error approach was used, which procedure was followed? Was the hidden node number optimized with the generation and population size held constant, or was another procedure applied? These open questions need further explanation and leave the GA-ANN model irreproducible. There is also not enough information about the classical ANN model (e.g., iteration number, activation functions, hidden node number) developed in Wu and Zhou (2022). Which software or program did the authors use for the model simulations? Where are the parameters of GS, and how were they selected? Such omissions weaken the replicability of scientific studies. In studies that use machine learning methods, details of the modeling process (e.g., the software/program used in the simulations, the ranges of trials for the model parameters, and their optimal values for the best model) should be clearly provided by the authors to allow reproduction and validation (Wu et al. 2014).
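To illustrate the point, the following minimal sketch shows the operator settings (population size, number of generations, crossover rate, mutation rate) that a reproducible GA study would need to report. The fitness function here is a toy surrogate standing in for the ANN validation error, and all values are hypothetical, not those of the original study.

```python
import random

# All four settings below must be reported for reproducibility;
# crossover and mutation rates are the ones missing from the
# original study.  Values are hypothetical.
GA_SETTINGS = {
    "population_size": 20,
    "generations": 50,
    "crossover_rate": 0.8,
    "mutation_rate": 0.05,
}

def fitness(x):
    # Toy objective standing in for the ANN validation error
    # (maximum at x = 3.0).
    return -(x - 3.0) ** 2

def run_ga(settings, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(settings["population_size"])]
    for _ in range(settings["generations"]):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]          # truncation selection: keep top half
        children = []
        while len(children) < settings["population_size"]:
            a, b = rng.sample(parents, 2)
            # blend crossover applied with probability crossover_rate
            child = (a + b) / 2 if rng.random() < settings["crossover_rate"] else a
            # Gaussian mutation applied with probability mutation_rate
            if rng.random() < settings["mutation_rate"]:
                child += rng.gauss(0.0, 1.0)
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = run_ga(GA_SETTINGS)
```

Without the seed and the four settings above, two runs of even this trivial GA need not return the same solution, which is exactly why such details must appear in the paper.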
Wu and Zhou (2022) attempted to compare their models with the literature (Nguyen et al. 2022) using RMSE, MAE, and MAPE. However, such statistics are highly dependent on the data range and length, and it is difficult to use them for a direct comparison of two different studies. To compare two studies using RMSE, MAE, or MAPE, the data split ratio should be the same. The discussers checked the study of Nguyen et al. (2022) and saw that they applied tenfold cross-validation, which means that their testing data are not comparable to those of Wu and Zhou (2022). Furthermore, Nguyen et al. (2022) mentioned some missing data, which they filled with the mean of the available data. Moreover, the standard deviations of the data provided by Nguyen et al. (2022) are not the same as those of the data provided by Wu and Zhou (2022). This needs further clarification.
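The scale dependence of these metrics can be demonstrated in a few lines. In the sketch below (with made-up numbers, not data from either study), two data sets have identical relative errors, so their MAPE values coincide, while RMSE and MAE scale with the data range and are therefore not directly comparable across studies.

```python
import math

def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def mape(y, p):
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)

# Hypothetical data: same 10% relative error, but the second set
# has ten times the range of the first.
y1, p1 = [2.0, 3.0, 4.0], [2.2, 3.3, 4.4]
y2, p2 = [20.0, 30.0, 40.0], [22.0, 33.0, 44.0]

# MAPE is identical (10%), while RMSE and MAE are ten times larger
# for the second set -- so the absolute metrics of two studies with
# different data ranges cannot be compared directly.
```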
Wu and Zhou (2022) state, in the first paragraph of the results and discussion section, that they used 80% of the whole data set for training and 20% for testing. However, they also write that the GS-SVR model was trained using tenfold cross-validation. The discussers could not understand the methodology followed by the authors. Does this mean that the GA-ANN and GS-SVR models were not trained following the same data split rule? What is meant by tenfold cross-validation here? Was the entire data set split into 10 folds and the model tested on each fold? If so, the testing data set will not be the same as that of the GA-ANN, for which 20% was used for testing. Why was tenfold cross-validation not applied to the GA-ANN and ANN as well? This point also needs better clarification.
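The incompatibility of the two protocols is easy to see when written out. In this sketch (index-level only, with a hypothetical data set of 100 samples), an 80/20 hold-out leaves one fixed test set of 20 samples, whereas tenfold cross-validation tests every sample exactly once, so error statistics from the two schemes are computed over different test sets.

```python
# Hypothetical data set of 100 samples, represented by their indices.
n = 100
indices = list(range(n))

# 80/20 hold-out split: one fixed test set of 20 samples.
holdout_test = indices[80:]

# Tenfold cross-validation: ten disjoint test folds that together
# cover all 100 samples (here built by simple striding).
folds = [indices[i::10] for i in range(10)]
cv_tested = sorted(i for fold in folds for i in fold)

# cv_tested equals the whole data set, while holdout_test contains
# only 20 samples -- metrics from the two schemes are therefore
# computed over different test sets and are not directly comparable.
```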
In the abstract of Wu and Zhou (2022), the authors write "the optimized models were used to train and test the data set, …". This statement is not correct, because optimized models cannot be used to train and test data sets: the training set is used to optimize the models, and the obtained models are then tested and evaluated on the testing data set using the selected evaluation metrics. In the same section, they say "… and contribution of these input variables on the output …", but the input variables were not mentioned before. Expansions of GA-ANN and GS-SVR are not provided anywhere in the text (neither in the abstract nor in the introduction or other sections); the discussers could only infer their meaning after carefully reading the research methodology section. In the 4th paragraph of the introduction section, the authors define ANN, extreme learning machine (ELM), SVR, decision tree, random forest, gradient boosting, and adaptive boosting as machine learning (ML) algorithms. This is also not correct, because these are all ML methods, and training algorithms are what is used to obtain their model parameters. Such incorrect statements also appear in the last paragraph of the introduction section ("… with machine learning algorithms …" and "Section 2 describes the two machine learning optimization algorithms GA-ANN and GS-SVR …").
In the caption of Fig. 6, the authors mention rough and fine selection, but no information about these is given in the text. Which methods did they use for these selections? There is no equation or reference in the part defining the GA-ANN, whereas the GS-SVR has several equations and references. It would be better to calibrate both the SVR and the ANN using the same algorithm, either GA or GS. Another issue worth mentioning is that the abbreviations of the input parameters could be used in Table 1, Fig. 4, and throughout the text instead of x1, x2, …, x12, to make it easier to follow the difference between the parameters.
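For readers unfamiliar with the rough/fine terminology, a coarse-then-fine grid search is usually understood as follows: a coarse logarithmic grid locates the promising region of a hyperparameter, and a finer grid then refines around the coarse optimum. The sketch below is a hypothetical illustration for an SVR-style penalty parameter C, with a toy loss function standing in for the cross-validation error; it is not the procedure of the original study, which is exactly what remains unreported.

```python
import math

def loss(C):
    # Toy validation loss standing in for the SVR cross-validation
    # error; its minimum is placed at C = 10 (hypothetical).
    return (math.log10(C) - 1.0) ** 2

# Rough selection: coarse logarithmic grid over seven decades.
coarse_grid = [10 ** k for k in range(-3, 4)]      # 0.001 ... 1000
coarse_best = min(coarse_grid, key=loss)

# Fine selection: denser grid within one decade of the coarse optimum.
fine_grid = [coarse_best * f for f in (0.25, 0.5, 1.0, 2.0, 4.0)]
fine_best = min(fine_grid, key=loss)
```

A reproducible paper would report both grids (their ranges and step sizes) and the selected optima, since different grids can yield different "best" models.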
References
Nguyen MST, Trinh MC, Kim SE (2022) Uncertainty quantification of ultimate compressive strength of CCFST columns using hybrid machine learning model. Eng Comput 38(Suppl 4):2719–2738. https://doi.org/10.1007/s00366-021-01339-1
Wu W, Dandy GC, Maier HR (2014) Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environ Model Softw 54:108–127. https://doi.org/10.1016/j.envsoft.2013.12.016
Wu Y, Zhou Y (2022) Splitting tensile strength prediction of sustainable high-performance concrete using machine learning techniques. Environ Sci Pollut Res 29:89198–89209. https://doi.org/10.1007/s11356-022-22048-2
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Responsible Editor: Philippe Garrigues
Cite this article
Kisi, O., Ajri, S., Jörgens, K.C. et al. Comments on “Splitting tensile strength prediction of sustainable high-performance concrete using machine learning techniques” by Wu, Yangi et al., https://doi.org/10.1007/s11356-022-22048-2. Environ Sci Pollut Res 30, 109854–109855 (2023). https://doi.org/10.1007/s11356-023-28829-7