
Assessment of input data selection methods for BOD simulation using data-driven models: a case study

Abstract

Using multivariate statistical methods, this study interprets a data set of 23 water quality parameters collected over 5 years at 10 quality monitoring stations on the Karkheh River in southwestern Iran. Cluster analysis groups the stations into three quality classes, and factor analysis identifies the most important factors for the full parameter set and for each class. The results indicate that natural factors, soil weathering and erosion, urban and human wastewater, and agricultural and industrial wastewater affect water quality to different degrees at different locations. Five input selection methods are then used for modeling BOD: a correlation model, principal component analysis, a combination of the gamma test and backward regression, the gamma test with a genetic algorithm, and the gamma test with an elimination method. Their efficiency is investigated in simulating BOD with local linear regression, an artificial neural network, and genetic programming. Among the five input selection methods, the gamma test combined with backward regression gives the best inputs for local linear regression (RMSE of 0.27), the gamma test based on a genetic algorithm gives the best inputs for the artificial neural network (RMSE of 0.28), and the gamma test based on a genetic algorithm is also the most appropriate for genetic programming (RMSE of 0.1303).
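As an illustration of two quantities named in the abstract, the following is a minimal sketch of the gamma statistic in its common near-neighbour formulation, together with an RMSE helper. The function names `gamma_test` and `rmse`, the neighbour count `p`, and the synthetic data are assumptions made for this example; the abstract does not state the settings or software the authors used.

```python
import math


def gamma_test(X, y, p=10):
    """Near-neighbour gamma statistic (illustrative sketch only).

    X: list of input vectors, y: list of outputs, p: number of neighbours.
    Returns (intercept, slope) of the line fitted to the (delta, gamma) pairs;
    the intercept estimates the variance of the output noise.
    """
    n = len(X)
    if not 1 <= p < n:
        raise ValueError("p must satisfy 1 <= p < len(X)")
    delta_sum = [0.0] * p
    gamma_sum = [0.0] * p
    for i in range(n):
        # Squared input-space distances from point i to all other points,
        # sorted so that index k holds the (k+1)-th nearest neighbour.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        for k in range(p):
            d2, j = neighbours[k]
            delta_sum[k] += d2
            gamma_sum[k] += 0.5 * (y[j] - y[i]) ** 2
    delta = [s / n for s in delta_sum]
    gamma = [s / n for s in gamma_sum]
    # Ordinary least-squares line gamma = intercept + slope * delta.
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    sxx = sum((d - mean_d) ** 2 for d in delta)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_d
    return intercept, slope


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - q) ** 2 for o, q in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic, noise-free example (not data from the study): y = 2x on a grid.
X_demo = [[float(i)] for i in range(30)]
y_demo = [2.0 * i for i in range(30)]
gamma_intercept, gamma_slope = gamma_test(X_demo, y_demo, p=5)
```

Under these assumptions, an input selection procedure would compute the intercept for each candidate subset of the water quality parameters and prefer subsets with a smaller value, then compare the resulting models by RMSE.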




Author information


Corresponding author

Correspondence to Azadeh Ahmadi.


About this article


Cite this article

Ahmadi, A., Fatemi, Z. & Nazari, S. Assessment of input data selection methods for BOD simulation using data-driven models: a case study. Environ Monit Assess 190, 239 (2018). https://doi.org/10.1007/s10661-018-6608-4


  • DOI: https://doi.org/10.1007/s10661-018-6608-4

Keywords

  • Surface water quality
  • Factor analysis
  • Principal component analysis
  • Gamma test
  • Genetic programming
  • Karkheh River