Abstract
Input feature selection plays a crucial role in predictive soft computing models. This chapter explores appropriate pre-processing techniques and input vector selection methods for such models. Four pre-processing techniques, namely principal component analysis (PCA), the Boruta feature selection (BFS) algorithm, the gamma test (GT) algorithm, and the subset selection by maximum dissimilarity (SSMD) algorithm, are introduced in the context of soft computing and applied to bedload transport prediction as a test case. The results highlight the effectiveness of pre-processing, input variable selection, and determination of the dominant input features, and they provide a practical reference for soft computing model development.
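As an illustration of the first technique named above, the following is a minimal sketch of PCA used as an input pre-processing step. It is not the chapter's implementation: the function name `pca_reduce`, the 95% variance threshold, and the synthetic input matrix are all assumptions made for this example, and the data are random stand-ins rather than bedload measurements.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project standardized inputs onto the leading principal components.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    # Standardize each column so variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Fraction of total variance carried by each component.
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    k = min(k, len(s))
    # Component scores: the reduced input vectors passed on to a model.
    scores = Z @ Vt[:k].T
    return scores, explained, k

# Synthetic stand-in for candidate input variables (not data from the chapter):
# two independent signals plus two nearly redundant copies of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.01 * rng.normal(size=200),
    2.0 * base[:, 1] + 0.01 * rng.normal(size=200),
])

scores, explained, k = pca_reduce(X)
print("components kept:", k)
print("reduced input shape:", scores.shape)
```

Because two of the four synthetic columns are near-duplicates of the other two, most of the variance concentrates in two components, which is the kind of redundancy PCA is meant to remove before model training.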
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Riahi-Madvar, H., Gharabaghi, B. (2022). Pre-processing and Input Vector Selection Techniques in Computational Soft Computing Models of Water Engineering. In: Bozorg-Haddad, O., Zolghadr-Asli, B. (eds) Computational Intelligence for Water and Environmental Sciences. Studies in Computational Intelligence, vol 1043. Springer, Singapore. https://doi.org/10.1007/978-981-19-2519-1_20
DOI: https://doi.org/10.1007/978-981-19-2519-1_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2518-4
Online ISBN: 978-981-19-2519-1
eBook Packages: Intelligent Technologies and Robotics (R0)