Abstract
Real-time hydrological data may contain noise, missing information and deviations from its original scale because of the complex and nonlinear nature of hydrological processes. Used as is in hydrological forecasting, such data may introduce uncertainty into hydrological models, especially data-driven models, which rely fully on the input-output data. The current research provides a simple preprocessing approach that improves the performance of ANN-based streamflow estimation models by providing a better input state. The two-step preprocessing approach comprises data transformation through a member of the family of power transformations, the Box-Cox transformation, and the selection of appropriate input variables through the Gamma Test. The original data set, essentially antecedent upland catchment information from thirteen stations located in the Upper Indus Basin (UIB), comprises twenty inputs, including precipitation, solar radiation and discharge. The Box-Cox transformation has been applied to prepare a transformed data set, and the power factor for this transformation, λ (with a best value of 0.005), has been determined using probability plots and histogram characteristics. The input combination selection procedure is carried out in the WinGamma environment with the help of a Genetic Algorithm (GA). Two-layer ANN models have been trained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) training algorithm for both the original and the transformed data sets. The comparison of models clearly indicates that the models developed with the transformed data set performed better in both the training and testing phases, with NSE and R² values above 90% in most cases and lower values of the other error statistics, including RMSE, variance and bias. Simple preprocessing options could significantly reduce the uncertainty in ANN-based hydrological models by improving the quality of real-time hydrological data.
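As an illustration of the transformation step and of one of the reported performance measures, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard one-parameter Box-Cox form, (x^λ − 1)/λ for λ ≠ 0 and ln x for λ = 0, and the usual definitions of NSE and RMSE. The function names and the sample discharge values are hypothetical; only λ = 0.005 comes from the abstract. With λ this close to zero, the transform behaves almost like a natural logarithm.

```python
import math


def box_cox(values, lam):
    """One-parameter Box-Cox transform; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        # Limiting case lambda -> 0 is the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values back to the original scale."""
    if abs(lam) < 1e-12:
        return [math.exp(z) for z in values]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in values]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def rmse(observed, simulated):
    """Root mean square error between observed and simulated series."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


# Hypothetical discharge values (illustrative only, not from the study).
discharge = [120.0, 450.0, 1500.0, 3200.0, 800.0]
lam = 0.005  # best power factor reported in the abstract

transformed = box_cox(discharge, lam)
recovered = inverse_box_cox(transformed, lam)
```

In a workflow like the one described, a model would be trained on the transformed series, and its outputs would be passed through `inverse_box_cox` before `nse` or `rmse` is computed against the observed discharge on the original scale.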
Acknowledgments
The authors would like to acknowledge the help of the Water and Power Development Authority (WAPDA) in providing the data set required for this research work.
Cite this article
Hassan, M., Hassan, I. Improving Artificial Neural Network Based Streamflow Forecasting Models through Data Preprocessing. KSCE J Civ Eng 25, 3583–3595 (2021). https://doi.org/10.1007/s12205-021-1859-y
DOI: https://doi.org/10.1007/s12205-021-1859-y