A new criteria for determining the best decomposition level and filter for wavelet-based data-driven forecasting frameworks- validating using three case studies on the CAMELS dataset

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Recently, several papers have been published on the use of preprocessing models, such as the Discrete Wavelet Transform, in Data-Driven Forecasting Frameworks (DDFF). However, these models face unresolved issues, including the use of future data, boundary-affected data, and incorrect selection of the decomposition level and wavelet filter, all of which can lead to inaccurate results. The Wavelet-based Data-Driven Forecasting Framework (WDDFF), in contrast, overcomes these problems. The first two issues can be addressed with the Maximal Overlap Discrete Wavelet Transform (MODWT) and the à trous algorithm (AT). There is currently no definitive solution for selecting the decomposition level and wavelet filter, so we propose a novel entropy-based approach. Using the concept of time-series predictability measured by entropy, we determine the optimal decomposition level and a suitable filter, develop the Maximal Overlap Discrete Wavelet-Entropy Transform (MODWET), and apply it within WDDFF. This study demonstrates the effectiveness of MODWET through three real-world case studies on the CAMELS data set. In these studies, we forecast the streamflow of specific stations one month ahead to demonstrate the effectiveness of preprocessing algorithms for forecasting models. The proposed model combines Input Variable Selection (IVS), a preprocessing model, and a Data-Driven Model (DDM). We conclude that MODWET-ANN is the most effective model and show how entropy can accurately identify the optimal decomposition level and filter, resolving the concerns associated with using WDDFF in hydrological forecasting problems.
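
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Haar filter, a causal à trous decomposition, and a normalized Shannon entropy of the energy distribution across wavelet components as the predictability score. The abstract does not give the exact entropy criterion used by MODWET, so the scoring rule and the "lowest entropy wins" choice are illustrative assumptions, and the function names `atrous_haar`, `wavelet_energy_entropy`, and `select_level` are hypothetical.

```python
import numpy as np


def atrous_haar(x, level):
    """Causal a-trous (undecimated) Haar decomposition.

    Returns (details, smooth) with shapes (level, n) and (n,).
    The coefficient at time t uses only x[t] and earlier values, so no
    future data enters the transform. The first 2**level - 1 positions
    are boundary-affected and are marked as NaN.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    smooth = x.copy()
    details = np.full((level, n), np.nan)
    for j in range(1, level + 1):
        lag = 2 ** (j - 1)                 # filter taps are spread by 2**(j-1)
        shifted = np.full(n, np.nan)
        shifted[lag:] = smooth[:-lag]      # past values only
        new_smooth = 0.5 * (smooth + shifted)
        details[j - 1] = smooth - new_smooth
        smooth = new_smooth
    return details, smooth


def wavelet_energy_entropy(details, smooth):
    """Normalized Shannon entropy of component energies (assumed score)."""
    comps = np.vstack([details, smooth])
    valid = ~np.isnan(comps).any(axis=0)   # drop boundary-affected columns
    energy = np.sum(comps[:, valid] ** 2, axis=1)
    p = energy / energy.sum()
    p = p[p > 0]                           # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum() / np.log(comps.shape[0]))


def select_level(x, max_level):
    """Pick the level with the lowest entropy score (an assumption)."""
    scores = {
        j: wavelet_energy_entropy(*atrous_haar(x, j))
        for j in range(1, max_level + 1)
    }
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic monthly series: seasonal cycle plus noise (illustrative data only).
rng = np.random.default_rng(0)
months = np.arange(240)
flow = 10 + 5 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, months.size)
best, scores = select_level(flow, max_level=4)
print(best, scores)
```

The same score could in principle be computed for several candidate filters and the combination of filter and level with the best score retained; the sketch covers only the Haar case. Dropping the NaN columns before modelling is one way to keep boundary-affected coefficients out of the training data.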

Data Availability

The case study data used in this paper are available online in accordance with funder data retention policies (https://ncar.github.io/hydrology/datasets/CAMELS_attributes).

Code Availability

The codes that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

We acknowledge Dr. John Quilty from the University of Waterloo for providing helpful ideas and technical feedback on this paper.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

A.M. contributed to the study conception and material preparation. Coding and analyses were performed by M.M.B. The first draft of the manuscript was written by M.M.B. Revision was done by M.M.B. and A.M.

Corresponding author

Correspondence to Mohammad Reza Mazarei Behbahani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mazarei Behbahani, M., Mazarei, A. A new criteria for determining the best decomposition level and filter for wavelet-based data-driven forecasting frameworks- validating using three case studies on the CAMELS dataset. Stoch Environ Res Risk Assess 37, 4827–4842 (2023). https://doi.org/10.1007/s00477-023-02531-z

  • DOI: https://doi.org/10.1007/s00477-023-02531-z
