A novel algorithm for feature selection based on geographic distance metric: a case study of streamflow forecasting of Austria’s water resources

  • O. Selvi
  • İ. Huseyinov
Original Paper


This paper focuses on input variable selection (feature selection) methods used with artificial neural networks for streamflow forecasting in large basins that contain a large number of stations of various types. The feature selection methods currently used in the hydrology research community are not able to handle the problem in such basins. The paper proposes a novel feature selection algorithm, Bubble Selection, based on the idea of using geographic distance as a metric. The performance of the algorithm is evaluated by applying Bubble Selection to a case study that models Austria’s water resources at 540 stations in a single run. The aim is to select features for each station from among 2412 stations at which streamflow, precipitation, snow, snow depth, and water level measurements are available. The proposed algorithm considerably reduces the dimension of the feature set. Bubble Selection is further combined with the Sequential Forward Selection algorithm. The performance of this hybrid model is compared with that of the Feature Ranking method in terms of the coefficient of determination, Nash–Sutcliffe Efficiency, and percent bias. The results show that the proposed hybrid algorithm outperforms Feature Ranking. The paper introduces a methodology for modeling a large basin and identifies capabilities that a feature selection algorithm should have.
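The abstract does not describe how Bubble Selection uses geographic distance, so the following is only a minimal sketch of the general idea of narrowing candidate input stations by distance, together with two of the evaluation measures named above. It is not the authors' algorithm. The fixed-radius rule, the function names, the station identifiers, and the coordinates are all assumptions made for illustration, and the percent bias formula follows the common convention in which positive values indicate underestimation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within_radius(target, stations, radius_km):
    """Return ids of stations within radius_km of the target, nearest first.

    Hypothetical pre-filter: a fixed radius is an assumption, not the
    selection rule used in the paper.
    """
    t_lat, t_lon = stations[target]
    hits = []
    for sid, (lat, lon) in stations.items():
        if sid == target:
            continue
        d = haversine_km(t_lat, t_lon, lat, lon)
        if d <= radius_km:
            hits.append((d, sid))
    return [sid for _, sid in sorted(hits)]


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den


def pbias(obs, sim):
    """Percent bias, assuming the convention 100 * sum(obs - sim) / sum(obs)."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


# Made-up station ids and coordinates, for illustration only.
stations = {"A": (48.2, 16.4), "B": (48.3, 16.4), "C": (47.0, 11.0)}
candidates = stations_within_radius("A", stations, radius_km=50)
```

In a hybrid setup like the one the abstract outlines, a distance-based filter of this kind would shrink the candidate set before a wrapper method such as Sequential Forward Selection searches it, which keeps the number of model trainings manageable.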


Bubble selection · Feature selection · Sequential forward selection · Feature ranking



The authors wish to thank all who assisted in conducting this work.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

  • Supplementary material 1: 13762_2019_2485_MOESM1_ESM.pdf (PDF, 2030 kb)
  • Supplementary material 2: 13762_2019_2485_MOESM2_ESM.pdf (PDF, 606 kb)
  • Supplementary material 3: 13762_2019_2485_MOESM3_ESM.pdf (PDF, 882 kb)
  • Supplementary material 4: 13762_2019_2485_MOESM4_ESM.xlsx (XLSX, 36205 kb)



Copyright information

© Islamic Azad University (IAU) 2019

Authors and Affiliations

  1. Computer Engineering Department, Faculty of Engineering, Istanbul Aydın University, Istanbul, Turkey
  2. Software Engineering Department, Faculty of Engineering, Istanbul Aydın University, Istanbul, Turkey
