Abstract
Several recent papers have used preprocessing models, such as the discrete wavelet transform, in Data-Driven Forecasting Frameworks (DDFF). However, these applications suffer from unresolved issues, including the use of future data, the use of boundary-affected data, and incorrect selection of the decomposition level and wavelet filter, all of which can lead to inaccurate results. In contrast, the Wavelet-based Data-Driven Forecasting Framework (WDDFF) overcomes these problems. The first two issues can be addressed with the Maximal Overlap Discrete Wavelet Transform (MODWT) and the à trous algorithm (AT). There is currently no definitive method for selecting the decomposition level and wavelet filter, so we propose a novel entropy-based approach. Using entropy as a measure of the predictability of a time series, we determine the optimal decomposition level and a suitable filter, develop the Maximal Overlap Discrete Wavelet-Entropy Transform (MODWET), and apply it within WDDFF. This study demonstrates the effectiveness of MODWET through three real-world case studies on the CAMELS data set, in which the streamflow at specific stations is forecast one month ahead to demonstrate the value of preprocessing algorithms in forecasting models. The proposed model combines Input Variable Selection (IVS), a preprocessing model, and a Data-Driven Model (DDM). We conclude that MODWET-ANN is the most effective model and show how entropy can identify the optimal decomposition level and filter, resolving the concerns associated with using WDDFF in hydrological forecasting problems.
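As a rough illustration of the two ideas in the abstract, the following Python sketch shows (i) an additive à trous decomposition with a Haar filter that uses only current and past samples and reports how many leading coefficients are boundary-affected, and (ii) a normalized Shannon entropy of the energy distribution across scales. The function names, the choice of the Haar filter, and this particular entropy definition are assumptions made for illustration. The abstract does not specify the entropy measure or the selection rule used in MODWET, so this should not be read as the authors' implementation.

```python
import numpy as np


def atrous_haar(x, levels):
    """Causal, additive a trous decomposition with a Haar filter (illustrative).

    Each coefficient at time t depends only on x[t] and earlier samples.
    Coefficients that would need samples before the start of the record
    are left as NaN instead of being padded.
    """
    x = np.asarray(x, dtype=float)
    v_prev = x.copy()
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)                      # holes inserted in the filter
        shifted = np.full_like(v_prev, np.nan)
        shifted[lag:] = v_prev[:-lag]           # past values only
        v = 0.5 * (v_prev + shifted)            # smooth (scaling) series
        details.append(v_prev - v)              # detail (wavelet) series
        v_prev = v
    # For a filter of length 2, the first 2**levels - 1 samples are
    # boundary-affected at the coarsest level.
    n_boundary = 2 ** levels - 1
    return np.array(details), v_prev, n_boundary


def wavelet_energy_entropy(details, approx):
    """Normalized Shannon entropy of relative energy across scales.

    Assumed definition: the approximation is counted as one more scale.
    Returns a value in [0, 1]; 0 means all energy sits in a single scale.
    """
    energy = np.array([np.sum(w ** 2) for w in details] + [np.sum(approx ** 2)])
    p = energy / energy.sum()
    p = p[p > 0]                                # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(len(energy)))


def score_levels(x, candidate_levels):
    """Entropy score for each candidate level, boundary samples discarded."""
    scores = {}
    for levels in candidate_levels:
        d, a, nb = atrous_haar(x, levels)
        scores[levels] = wavelet_energy_entropy(d[:, nb:], a[nb:])
    return scores
```

Under these assumptions, `score_levels` can be evaluated for several candidate levels (and, with other filters substituted, for several filters). How such scores are turned into a final choice of level and filter is defined by the paper's method, not by this sketch.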
Data Availability
The data for the case studies presented in this paper are available online in accordance with funder data retention policies (https://ncar.github.io/hydrology/datasets/CAMELS_attributes).
Code Availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request.
Acknowledgements
We acknowledge Dr. John Quilty from the University of Waterloo for providing helpful ideas and technical feedback on this paper.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
A.M. contributed to the study conception and material preparation. Coding and analyses were performed by M.M.B. The first draft of the manuscript was written by M.M.B. Revision was done by M.M.B. and A.M.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mazarei Behbahani, M., Mazarei, A. A new criteria for determining the best decomposition level and filter for wavelet-based data-driven forecasting frameworks- validating using three case studies on the CAMELS dataset. Stoch Environ Res Risk Assess 37, 4827–4842 (2023). https://doi.org/10.1007/s00477-023-02531-z
DOI: https://doi.org/10.1007/s00477-023-02531-z