Abstract
Boosting, bagging and ensembles are intellectually ‘deep’ modeling methods that have been well known and described for several decades, and great computing tools exist to use them. Yet with few exceptions they have not been used well in natural resource conservation management or ecology; for instance, the advanced works of Breiman (2001), Friedman (2001), and Elder (2003) still await general recognition. Here I present these methods, conveniently driven by binary recursive partitioning (Classification and Regression Trees, CARTs), together with many of their real-world aspects and uses. I elaborate on applications and on some of the known implementation hurdles. I show that these machine learning methods are the essential part of the new generation of quantitative reasoning. They allow for relevant progress, all while the global environmental state decays further, climate change remains unaccounted for and sustainability policies remain outdated, which calls for an effective change of global culture and governance.
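To make the distinction between bagging and boosting concrete, the following is a minimal sketch and not the chapter's own implementation. It assumes a hypothetical six-point toy data set, hypothetical function names, and one-split trees (stumps) as the simplest form of binary recursive partitioning. The boosting variant shown is AdaBoost (Freund and Schapire 1997), chosen for brevity; it is not Friedman's gradient boosting.

```python
import math
import random


def fit_stump(X, y, w):
    """Fit a one-split tree (stump): choose the feature, threshold and
    orientation with the lowest weighted misclassification error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]  # (feature index, threshold, label left of threshold)


def stump_predict(stump, row):
    j, t, sign = stump
    return sign if row[j] <= t else -sign


def bagging(X, y, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([X[i] for i in idx],
                                [y[i] for i in idx], [1.0] * n))
    return stumps


def bagging_predict(stumps, row):
    """Unweighted majority vote over all bagged stumps."""
    return 1 if sum(stump_predict(s, row) for s in stumps) >= 0 else -1


def adaboost(X, y, n_rounds=3):
    """Boosting: refit on reweighted data, upweighting earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Weighted vote over all boosted stumps."""
    return 1 if sum(a * stump_predict(s, row) for a, s in model) >= 0 else -1


def accuracy(predict, model, X, y):
    return sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)


# Hypothetical toy data: the positive class sits in the middle of the
# gradient, so no single split can separate it.
X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, 1, 1, -1, -1]

single = fit_stump(X, y, [1.0] * len(X))
bagged = bagging(X, y)
boosted = adaboost(X, y, n_rounds=3)

print("single stump:", accuracy(stump_predict, single, X, y))
print("bagged stumps:", accuracy(bagging_predict, bagged, X, y))
print("boosted stumps:", accuracy(adaboost_predict, boosted, X, y))
```

On this toy set a single stump cannot classify all six training points, whereas three boosting rounds combine three different splits into a rule that does. Bagging averages over resamples instead of reweighting, so it mainly stabilizes a learner rather than letting it represent a shape that one split cannot.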
Keywords
- Boosting
- Bagging
- Ensembles
- Machine learning
- Classification and Regression Trees (CARTs)
- Global sustainability culture and governance
“My goal is simple. It is a complete understanding of the universe, why it is as it is and why it exists at all.”
Stephen Hawking
References
Aggarwal C (2015) Data mining: the textbook. Springer
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723
Alexander JC (2013) The dark side of modernity. Polity Press, Cambridge
Anderson DR, Burnham KP, Thompson WL (2000) Null hypothesis testing: problems, prevalence, and an alternative. J Wildl Manag 64:912–923
Araujo MB, New M (2007) Ensemble forecasting of species distributions. Trends Ecol Evol 22:42–47
Arnold TW (2010) Uninformative parameters and model selection using Akaike’s information criterion. J Wildl Manag 74:1175–1178
Baltensperger AP, Huettmann F (2015) Predicted shifts in small mammal distributions and biodiversity in the altered future environment of Alaska: an open access data and Machine Learning. PLoS One. https://doi.org/10.1371/journal.pone.0132054
Berthold P (2016) Mein Leben fuer die Voegel. Kosmos Publisher, Berlin
Breiman L (1996) Bagging predictors. Mach Learn 26:123–140
Breiman L (1998) Arcing classifier (with discussion and a rejoinder by the author). Ann Stat 26(3):801–849. https://doi.org/10.1214/aos/1024691079
Breiman L (2001a) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231
Breiman L (2001b) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York
Cai T, Huettmann F, Guo Y (2014) Using stochastic gradient boosting to infer stopover habitat selection and distribution of hooded cranes Grus monacha during spring migration in Lindian, Northeast China. PLoS One 9. https://doi.org/10.1371/journal.pone.0097372
Chunrong M, Huettmann F, Guo Y (2016) Climate envelope predictions indicate an enlarged suitable wintering distribution for great bustards (Otis tarda dybowski) in China for the 21st century. PeerJ 4:e1630. https://doi.org/10.7717/peerj.1630
Chunrong M, Huettmann F, Guo Y, Han X, Wen L (2017) Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ 5:e2849. https://doi.org/10.7717/peerj.2849
Cockburn A (2013) A colossal wreck: a road trip through political scandal, corruption and American culture. Verso Publishers, New York
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792. https://doi.org/10.1890/07-0539.1
Czech B, Krausman PR, Devers PK (2000) Economic associations among causes of species endangerment in the United States. Bioscience 50:593–601
De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251
De’ath G, Fabricius K (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2
Dhar V (1998) Data mining in finance: using counterfactuals to generate knowledge from organizational information systems. Inf Syst 23:423–437
Drew CA, Wiersma Y, Huettmann F (eds) (2011) Predictive species and habitat modeling in landscape ecology. Springer, New York
Drucker H, Schapire R, Simard P (1993) Boosting performance in neural networks. Int J Pattern Recognit Artif Intell 7:705–771
Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall/CRC Monographs, New York
Elder JF (2003) The generalization paradox of ensembles. J Comput Graph Stat 12:853–864
Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton J, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151
Evans JS, Cushman S (2009) Gradient modeling of conifer species using random forests. Landsc Ecol 24:673. https://doi.org/10.1007/s10980-009-9341-0
Evans JS, Murphy MA, Holden ZA, Cushman SA (2010) Modeling species distribution and change using random forest. Predictive species and habitat modeling in landscape ecology, pp 139–159
Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181
Fielding A (1999) Machine learning methods for ecological applications. Springer, Boston
Fielding A, Bell Y (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38–49
Forman RTT (1995) Land mosaics: the ecology of landscapes and regions. Cambridge University Press, Cambridge
Fox CH, Huettmann F, Harvey GKA, Morgan KH, Robinson J, Williams R, Paquet PC (2017) Predictions from machine learning ensembles: marine bird distribution and density on Canada’s Pacific coast. Mar Ecol Prog Ser 566:199–216
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
Guthery FS, Brennan LA, Peterson MJ, Lusk LL (2005) Information theory in wildlife science: critique and viewpoint. J Wildl Manag 69:457–465
Hardy SM, Lindgren M, Konakanchi H, Huettmann F (2011) Predicting the distribution and ecological niche of unexploited snow crab (Chionoecetes opilio) populations in Alaskan waters: a first open-access ensemble model. Integr Comp Biol 51(4):608–622. https://doi.org/10.1093/icb/icr102
Harrell FE Jr (2001) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics
Hegel TSA, Cushman JE, Huettmann F (2010) Current state of the art for statistical modelling of species distributions. Chapter 16. In: Cushman S, Huettmann F (eds) Spatial complexity, informatics and wildlife conservation. Springer, Tokyo, pp 273–312
Herrick KA, Huettmann F, Lindgren MA (2013) A global model of avian influenza prediction in wild birds: the importance of northern regions. Vet Res. https://doi.org/10.1186/1297-9716-44-42
Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton
Hobbs NT, Hooten M (2015) Bayesian models: a statistical primer for ecologists. Princeton University Press, Princeton
Hochachka W, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data mining for discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437
Huettmann F (2007) Modern adaptive management: adding digital opportunities towards a sustainable world with new values. Forum on Public Policy: Clim Chang Sustain Dev 3:337–342
Jiao S, Guo Y, Huettmann F, Lei G (2014) Nest-site selection analysis of hooded crane (Grus monacha) in northeastern China based on a multivariate ensemble model. Zool Sci 31:430–437
Johnson DS, Thomas DL, Ver Hoef JM, Christ AD (2008) A general framework for the analysis of animal resource selection from telemetry data. Biometrics 64:968–976
Kampichler C, Wieland R, Calmé S, Weissenberger H, Arriaga-Weiss S (2010) Classification in conservation biology: a comparison of five machine-learning methods. Ecol Inform 5:441–450
Kandel K, Huettmann F, Suwal MK, Regmi GR, Nijman V, Nekaris KAI, Lama ST, Thapa A, Sharma HP, Subedi TR (2015) Rapid multi-nation distribution assessment of a charismatic conservation species using open access ensemble model GIS predictions: red panda (Ailurus fulgens) in the Hindu-Kush Himalaya region. Biol Conserv 181:150–161
Keating KA, Cherry S (2004) Use and interpretation of logistic regression in habitat-selection studies. J Wildl Manag 68:774–789
Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109
Kurt F (1982) Naturschutz-illusion. Paul Parey Publisher, Berlin Germany
Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range-shifts: model differences and model reliability. Glob Chang Biol 12:1568–1584
Lawler JJ, Yo W, Huettmann F (2011) Designing predictive models for increased utility: using species distribution models for conservation planning, forecasting, and risk assessment. In: Drew CA, Wiersma Y, Huettmann F (eds) Predictive modeling in landscape ecology. Chapter 5. Springer, New York, pp 271–290
Leopold A, Meine C (2013) A sand county almanac & other writings on conservation and ecology. Library of America, New York
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Liu J, Dou Y, Batistella M, Challies E, Conno T, Friis C, DA MJ, Parish E, CL R, Bl BS, Triezenber H, Yang H, Zhao Z, Zimmerer KS, Huettmann F, Treglia M, Basher Z, Chung MG, Herzberger A, Lenschow A, Mechiche-Alami A, Newig A, Roch J, Sun J (2018) Spillover systems in a telecoupled Anthropocene: typology, methods, and governance for global sustainability. Environ Sustain 33:58–69. https://doi.org/10.1016/j.cosust.2018.04.009
Loftus GR (1996) Psychology will be a much better science when we change the way we analyze data. Curr Dir Psychol 5:161–171
Mace G, Cramer W, Diaz S, Faith DP, Larigauderie A, Le Prestre P, Palmer M, Perrings C, Scholes RJ, Walpole M, Walter BA, Watson JEM, Mooney HA (2010) Biodiversity targets after 2010. Curr Opin Environ Sustain 2:3–8
MacNally R (2000) Regression and model-building in conservation biology, biogeography and ecology: the distinction between – and reconciliation of – ‘predictive’ and ‘explanatory’ models. Biodivers Conserv 6:655–671
Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies, 2nd edn. Kluwer Academic Publishers, Dordrecht
McArdle BH (1988) The structural relationship: regression in biology. Can J Zool 66:2329–2339
Merow C, Silander JA (2014) A comparison of Maxlike and Maxent for modelling species distributions. Methods Ecol Evol 5:215–225
Mueller JP, Massaron L (2016) Machine Learning for dummies. For Dummies Publisher, 435 p
Næss A (1989) Ecology, community and lifestyle: outline of an Ecosophy (trans: Rothenberg D). Cambridge University Press, Cambridge
Nielsen SE, Stenhouse GB, Beyer HL, Huettmann F, Boyce MS (2008) Can natural disturbance-based forestry rescue a declining population of grizzly bears? Biol Conserv 141:2193–2207
O’Connor R, Jones MT, White D, Hunsacker C, Loveland T, Jones B, Preston E (1996) Spatial partitioning of environmental correlates of avian biodiversity in the conterminous United States. Biodivers Lett 3:97–110
Oppel S, Meirinho A, Ramírez I, Gardner B, O’Connell AF, Miller PI, Louzao M (2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biol Conserv 156:94–104
Perera AH, Drew A, Johnson CJ (2010) Expert knowledge and its application in landscape ecology. Springer, New York
Phillips SJ, Dudik M (2008) Modelling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31:161–175
Regmi GR, Huettmann F, Suwal MK, Nijman V, Nekaris KAI, Kandel K, Sharma N, Coudrat C (2018) First open access ensemble climate envelope predictions of Assamese macaque Macaca assamensis in South and South-East Asia: a new role model and assessment of endangered species. Endanger Species Res 36:149–160. https://doi.org/10.3354/esr0088
Reinhart A (2015) Statistics done wrong: the woefully complete guide. No Starch Press, San Francisco
Reich Y, Barai SV (1999) Evaluating Machine Learning models for engineering problems. Artif Intell Eng 13:257–272
Romesburg HC (1989) More on gaining reliable knowledge. J Wildl Manag 53:1177–1180
Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227. https://doi.org/10.1007/bf00116037
Schapire RE (1992) The design and analysis of efficient learning algorithms. MIT Press, USA
Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictors. Mach Learn 37:297–336
Silvy NJ (2012) The wildlife techniques manual: research & management, 7th edn, 2 volumes. The Johns Hopkins University Press
Smith BD, Zeder MD (2013) The onset of the Anthropocene. Anthropocene 4:6–13
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Verner J, Morrison ML, Ralph CJ (1986) Wildlife 2000. Modeling habitat relationships of terrestrial vertebrates. University of Wisconsin Press, Madison
Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Publisher, Amsterdam
Yen P, Huettmann F, Cooke F (2004) Modelling abundance and distribution of Marbled Murrelets (Brachyramphus marmoratus) using GIS, marine data and advanced multivariate statistics. Ecol Model 171:395–413
Zar JH (2010) Biostatistical analysis, 5th edn. Prentice Hall, Upper Saddle River
Acknowledgement
I thank Profs R. O’Connor and A.W. (Tony) Diamond for an early workshop on statistics with ACWERN at UNB, Canada, which introduced me in the late 1990s to tree-based techniques (CART) and multivariate analysis. I thank Dan Steinberg and Salford Systems Ltd. for a workshop with U.S. IALE at Snowbird, Utah, as well as with The Wildlife Society, Alaska Chapter, for a wider debate and introduction of tree-based methods, boosting and bagging. I am indebted to U.S. IALE, the Global Primate Network in Kathmandu, Nepal, Medical University Taipei, Taiwan, and the Wildlife Institute of India in Dehradun for their workshop promotion and support. Thanks to S. Linke, I. Presse, B. Walter, G. Regmi, M. Suwal, R. Lama, C. Cambu, H. Hera, S. Sparks, Y. Subaru, H. Berrios and the many members of the -EWHALE lab- at UAF for their discussions and, in part, their support. This is EWHALE lab publication #187.
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Huettmann, F. (2018). Boosting, Bagging and Ensembles in the Real World: An Overview, some Explanations and a Practical Synthesis for Holistic Global Wildlife Conservation Applications Based on Machine Learning with Decision Trees. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_3
DOI: https://doi.org/10.1007/978-3-319-96978-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96976-3
Online ISBN: 978-3-319-96978-7
eBook Packages: Biomedical and Life Sciences (R0)