Data Mining Methods for Modeling in Water Science

Computational Intelligence for Water and Environmental Sciences

Abstract

Data mining is one of the most useful research fields, with many real-life applications, including in water science. Data mining (DM) is the process of extracting valuable information from large volumes of data stored in various databases. The extracted information takes the form of patterns, associations, changes, anomalies, and significant structures. In water resources management and environmental engineering, predicting and modelling parameters plays an integral role in decision making. Rivers are the most critical freshwater resource for millions of people worldwide, and they are dynamic in nature (floods/droughts) in terms of both the quantity and the quality of available freshwater. Depending on basin characteristics, river flow and sediment regime may be influenced by natural processes such as erosion and sediment transport, as well as by anthropogenic factors such as urban stormwater runoff and semi-treated sanitary/industrial sewage discharge. Artificial intelligence (AI) techniques are therefore used to decrease model development costs and reduce prediction errors, yielding more efficient models. In this chapter, some well-known AI-based techniques are introduced and their applications are elaborated. The models comprise the extreme learning machine (ELM), least squares support vector machine (LSSVM), genetic programming (GP), adaptive neuro-fuzzy inference system (ANFIS), and multivariate adaptive regression splines (MARS). Each technique is illustrated with a brief literature review. After the basic concept of each method is presented, its mathematical formulation is given. In the last part, pseudocode for each method is provided as a guideline for coding it. This chapter is intended for graduate students, researchers, educators, and practitioners interested in engineering optimization.
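To give a flavour of the kind of method the abstract lists, the following is a minimal sketch of an extreme learning machine for regression. It is not taken from the chapter: the function names `elm_fit` and `elm_predict`, the tanh activation, the hidden-layer size, and the synthetic data are all assumptions made for illustration. The only idea it relies on is the standard ELM recipe of fixing random hidden-layer weights and solving for the output weights by least squares.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit a single-hidden-layer ELM (hypothetical helper, for illustration)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict with a fitted ELM."""
    return np.tanh(X @ W + b) @ beta

# Synthetic data (an assumption; stands in for any input/target series).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]

# Train on the first 150 samples, evaluate on the remaining 50.
W, b, beta = elm_fit(X[:150], y[:150])
pred = elm_predict(X[150:], W, b, beta)
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
print(f"test RMSE: {rmse:.4f}")
```

Because only the output weights are solved for, training reduces to a single linear least-squares problem, which is the usual reason ELMs are described as fast to train.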



Author information

Corresponding author

Correspondence to Seyedehelham Shirvani-Hosseini.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Shirvani-Hosseini, S., Samadi-Koucheksaraee, A., Ahmadianfar, I., Gharabaghi, B. (2022). Data Mining Methods for Modeling in Water Science. In: Bozorg-Haddad, O., Zolghadr-Asli, B. (eds) Computational Intelligence for Water and Environmental Sciences. Studies in Computational Intelligence, vol 1043. Springer, Singapore. https://doi.org/10.1007/978-981-19-2519-1_8
