An operational machine learning approach to predict mosquito abundance based on socioeconomic and landscape patterns
Socioeconomic and landscape factors influence mosquito abundance, especially in urban areas. However, few studies have addressed how these factors, particularly at the micro-scale relevant to mosquito life history, determine mosquito abundance.
We aim to predict mosquito abundance from socioeconomic and/or landscape factors using a machine learning framework. Additionally, we assess how mosquito abundance responds to each of these factors.
We identified 3985 adult mosquitoes (the majority of which were Aedes mosquitoes) at 90 sampling sites in Charlotte, NC, USA, in 2017. Seven socioeconomic and seven landscape factors were used to predict mosquito abundance. Three supervised learning models, k-nearest neighbor (kNN), artificial neural network (ANN), and support vector machine (SVM), were constructed, tuned, and evaluated using both continuous and binary input factors. Random forest (RF) was used to assess each input's relative importance and how mosquito abundance responds to it.
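The abstract does not describe how the models were implemented. As a hedged illustration of how one classifier can be fed either continuous or binary versions of the same site factors, the following Python sketch implements a plain kNN majority-vote classifier. The abundance classes ("low"/"high"), the min-max scaling of continuous factors, the median-split binarization, and k = 3 are all assumptions for illustration, not the study's settings.

```python
import math
from collections import Counter
from statistics import median

def min_max_scale(rows):
    """Rescale each continuous factor (column) to [0, 1] so no factor dominates the distance.

    For simplicity this scales all rows together; in practice the ranges
    would be taken from the training sites only.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def binarize(rows):
    """Hypothetical median split: 1 if a site's value exceeds the factor's median, else 0."""
    medians = [median(c) for c in zip(*rows)]
    return [[1 if v > m else 0 for v, m in zip(row, medians)] for row in rows]

def knn_predict(train_x, train_y, query, k=3):
    """Classify one site by majority vote among its k nearest training sites."""
    # Euclidean distance from the query site to every training site, nearest first
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # Count the class labels of the k nearest sites and return the most common one
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Toy example with two hypothetical scaled factors and abundance classes
sites = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = ["low", "low", "high", "high"]
print(knn_predict(sites, labels, [0.85, 0.85], k=3))  # high
```

With binary inputs the same Euclidean distance reduces to the square root of the number of factors on which two sites differ, which is one reason kNN can behave similarly on both input types.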
We showed that landscape factors alone yielded predictability equal to or better than that of socioeconomic factors. Including both types of factors further improved model accuracy with binary inputs. kNN performed robustly regardless of input type (accuracy > 95% for binary and > 99% for continuous input data). The landscape factor group had higher relative importance than the socioeconomic group (54.4% vs. 45.6%). Landscape heterogeneity, measured by the Shannon index, was the single most important input factor for mosquito abundance.
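The abstract names the Shannon index as the heterogeneity measure and reports group-level importance shares without giving formulas. The Python sketch below assumes the common form of the index, H = -Σ p_i ln p_i, where p_i is the proportion of a site's surrounding area covered by land-cover class i, and assumes a group's share is its summed RF importance as a percentage of the total importance. Both are assumptions, not the study's documented procedure, and the example numbers are hypothetical.

```python
import math

def shannon_index(class_areas):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over land-cover classes."""
    total = sum(class_areas)
    # Classes with zero area are skipped, since p * ln(p) -> 0 as p -> 0
    proportions = [a / total for a in class_areas if a > 0]
    return -sum(p * math.log(p) for p in proportions)

def group_shares(importance, groups):
    """Percentage of total importance held by each factor group.

    Assumes non-negative importance scores; some RF importance measures can be negative.
    """
    total = sum(importance.values())
    shares = {}
    for factor, value in importance.items():
        group = groups[factor]
        # Add this factor's share of the total to its group's running sum
        shares[group] = shares.get(group, 0.0) + 100.0 * value / total
    return shares

# Hypothetical buffer with four equally sized land-cover classes: H = ln 4
print(round(shannon_index([25.0, 25.0, 25.0, 25.0]), 3))  # 1.386
```

H is 0 for a site covered by a single class and grows as classes become more numerous and more evenly distributed, which is why it serves as a single-number summary of landscape heterogeneity.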
Landscape factors were the key determinants of mosquito abundance. Machine learning models proved to be powerful tools for handling complex datasets with multiple socioeconomic and landscape factors and for accurately predicting mosquito abundance.
Keywords: Socioeconomic gradient; Landscape heterogeneity; Mosquito abundance; Machine learning; Urban ecology
Abbreviations
Population size factor
Employment rate factor
Education status factor
Population density factor
Home sale price factor
Violent crime rate factor
Tree canopy factor
ANN: Artificial neural network
SVM: Support vector machine
GLM: Generalized linear model
NMDS: Non-metric multidimensional scaling
RMSE: Root mean squared error
We thank the Mecklenburg County Health Department for funding the 2017 mosquito sampling work and for providing access to the orthophotos. We are also grateful to ten undergraduate student volunteers from UNC Charlotte for their fieldwork.