# Improving flood forecasting through feature selection by a genetic algorithm – experiments based on real data from an Amazon rainforest river

## Abstract

This paper addresses feature selection as a means of improving a flood forecasting model. The approach is evaluated in a case study that uses 18 time series covering thirty-five years of hydrological data to forecast the level of the Xingu River, in the Amazon rainforest in Brazil. We employ a Genetic Algorithm for feature selection and explore several settings of its genetic parameters to improve prediction accuracy. The features selected by the Genetic Algorithm are used as input to a Linear Regression model that performs the forecasting. A statistical analysis verifies that the final model predicts the river level with high accuracy, reaching a coefficient of determination of 0.988. Hence, the proposed Genetic Algorithm proved successful in selecting the most relevant features.
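As a rough illustration of the pipeline summarized above, the following is a minimal sketch of Genetic Algorithm feature selection wrapped around a linear regression. It is not the authors' implementation: the operators (elitism, tournament selection, one-point crossover, bit-flip mutation), the parameter values, the hold-out R² fitness, the small ridge term, and the synthetic data are all assumptions chosen for brevity, and the names `solve`, `fit_r2` and `ga_select` are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_r2(X, y, mask, n_train):
    """Fit least squares on the first n_train rows using the features where mask
    is 1, and return R^2 on the remaining rows (assumed fitness function)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty feature set cannot be fitted
    rows = [[1.0] + [row[j] for j in cols] for row in X]  # intercept + selected
    tr, te = rows[:n_train], rows[n_train:]
    k = len(cols) + 1
    # Normal equations; the 1e-8 diagonal term is an assumption to avoid singularity.
    xtx = [[sum(r[p] * r[q] for r in tr) + (1e-8 if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * t for r, t in zip(tr, y[:n_train])) for p in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(beta, r)) for r in te]
    y_te = y[n_train:]
    mean = sum(y_te) / len(y_te)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_te, pred))
    ss_tot = sum((t - mean) ** 2 for t in y_te)
    return 1.0 - ss_res / ss_tot

def ga_select(X, y, n_train, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve binary feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct mask is evaluated only once
            cache[key] = fit_r2(X, y, mask, n_train)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ranked[0][:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fitness)  # tournament of size 3
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_feat)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic stand-in data (not the Xingu River series): only features 0 and 2
# actually drive the target; the other four columns are pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(120)]
y = [2.0 * r[0] - 3.0 * r[2] + data_rng.gauss(0, 0.1) for r in X]
best_mask, best_r2 = ga_select(X, y, n_train=90)
```

In this toy setting the returned mask should switch on the two informative features. With real hydrological series the split would normally respect time order, and the choice of fitness function and genetic parameters is exactly what the paper's experiments vary.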

## Notes

1. Equation for the Coefficient of Determination:

   $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \bar{y})^{2}}$$

   where $$y_{true}$$ is the observed data, $$y_{pred}$$ is the prediction, $$\bar{y}$$ is the average of $$y_{true}$$, and $$n$$ is the number of observations.

2. Equation for the Root Mean Square Error:

   $$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{true,i} - y_{pred,i})^{2}}$$

3. Equation for the Mean Absolute Error:

   $$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{true,i} - y_{pred,i}|$$
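The three metrics defined in these notes can be computed directly from their definitions. The sketch below uses plain Python; the function names and the four sample river levels are hypothetical and are not data from the paper.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Hypothetical observed and predicted levels, for illustration only.
y_true = [3.0, 4.0, 5.0, 6.0]
y_pred = [2.5, 4.5, 5.0, 6.0]

print(r2_score(y_true, y_pred))  # approximately 0.9
print(rmse(y_true, y_pred))      # approximately 0.354
print(mae(y_true, y_pred))       # 0.25
```

Here the squared residuals sum to 0.5 and the total sum of squares around the mean (4.5) is 5, which gives R² = 1 − 0.5/5 = 0.9.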

## Acknowledgments

The authors would like to thank the following colleagues for their help in revising the manuscript and for their suggestions on its organization: Bruno S. Faiçal, Leandro Y. Mano, Vinícius Gonçalves and Pedro H. Gomes. The authors would also like to thank Márcio Nirlando Gomes Lopes for his help in the development of Figure 3. Dr. J. Ueyama would like to acknowledge FAPESP, process 2018/17335-9.

## Author information

### Corresponding author

Correspondence to Alen Costa Vieira.

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by: H. Babaie

## Rights and permissions

Vieira, A.C., Garcia, G., Pabón, R.E.C. et al. Improving flood forecasting through feature selection by a genetic algorithm – experiments based on real data from an Amazon rainforest river. Earth Sci Inform 14, 37–50 (2021). https://doi.org/10.1007/s12145-020-00528-8