Abstract
The Siberian crane (Leucogeranus leucogeranus) remains an elusive but highly regarded species of global conservation concern. It breeds in the Russian high Arctic, where two subpopulations are known. Here we present, for the first time, a machine learning-based summer habitat analysis for the eastern population, using nesting data from the breeding grounds and predictive modeling with 74 GIS predictors. Parsimony is typically sought to increase the interpretability of models, but findings generally show that it does not yield the greatest improvement in model performance or inference. ‘Batteries’ are a new concept in machine learning: sets of experiments that help to test predictors and support model selection. Here we present 28 of those ‘batteries’ and compare multiple approaches to model runs, ranging from iteratively dropping the least or most important predictor (‘variable shaving’) to allowing all predictors to contribute. The generic ‘kitchen sink’ model with TreeNet (an optimized boosting algorithm from Salford Systems Ltd) performed best. Although ‘batteries’ remain widely underused in wildlife conservation management, ‘shaving’ was of great use for learning about the structure, role and impacts of predictors and their spatial performance, which supports non-parsimonious work. Of great interest is the finding that a bundle of low-ranked predictors performs almost as well as, or better than, the so-called top predictors. This is called ‘predictor swapping’. This is the best and most detailed habitat study and prediction for the Siberian crane in summer thus far. It is intended for use in conservation management and as a generic template for any species, at a time when data availability and the environmental crisis are both on the rise, specifically in the high Arctic.
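The ‘variable shaving’ loop described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the TreeNet/SPM battery itself: `shave`, `FitFn`, `toy_fit` and `TOY_WEIGHTS` are hypothetical names, and the toy learner stands in for whatever boosting implementation would fit the model and report predictor importance.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit a model on the named predictors and return
# (performance score, {predictor name: importance}).
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def shave(
    predictors: Sequence[str],
    fit: FitFn,
    drop: str = "least",
    min_predictors: int = 1,
) -> List[Tuple[List[str], float]]:
    """Refit repeatedly, removing one predictor per step.

    drop="least" removes the least important predictor each round,
    drop="most" removes the most important one.
    Returns the predictor set and score recorded at every step.
    """
    if drop not in ("least", "most"):
        raise ValueError("drop must be 'least' or 'most'")
    if min_predictors < 1:
        raise ValueError("min_predictors must be at least 1")

    remaining = list(predictors)
    history: List[Tuple[List[str], float]] = []
    while len(remaining) >= min_predictors:
        score, importance = fit(remaining)
        history.append((list(remaining), score))
        if len(remaining) == min_predictors:
            break
        pick = min if drop == "least" else max
        victim = pick(remaining, key=lambda name: importance.get(name, 0.0))
        remaining.remove(victim)
    return history


# Toy stand-in for a real learner (illustrative weights, not chapter results).
TOY_WEIGHTS = {"bio12": 5.0, "bio14": 3.0, "slope": 1.0}


def toy_fit(names: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    importance = {name: TOY_WEIGHTS[name] for name in names}
    return sum(importance.values()), importance


history = shave(list(TOY_WEIGHTS), toy_fit, drop="least")
for kept, score in history:
    print(len(kept), kept, score)
```

Passing the learner in as a callable keeps the loop independent of any particular software; the first entry of the returned history corresponds to the ‘kitchen sink’ run with all predictors.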
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723
Arnold TW (2010) Uninformative parameters and model selection using Akaike’s information criterion. J Wildl Manag 74:1175–1178
Barbet-Massin M, Jiguet F, Albert CH, Thuiller W (2012) Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol Evol 3:327–338. https://doi.org/10.1111/j.2041-210X.2011.00172.x
BirdLife International (2001) Threatened birds of Asia: the BirdLife International red data book, vol 1. BirdLife International, Cambridge
Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York
Cai T, Huettmann F, Guo Y (2014) Using stochastic gradient boosting to infer stopover habitat selection and distribution of hooded cranes Grus monacha during spring migration in Lindian, Northeast China. PLoS ONE 9. https://doi.org/10.1371/journal.pone.0097372
Chamberlin TC (1890) The method of multiple working hypotheses. Science 15:92–96
Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcC, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151
Fielding A (1999) Machine learning methods for ecological applications. Springer, Boston
Fielding A, Bell JF (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38–49
Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Ann Stat 29:1189–1232
Friedman JH (2002) Stochastic gradient boosting. Comp Stat Data Anal 38:367–378
Guthery FS, Brennan LA, Peterson MJ, Lusk LL (2005) Information theory in wildlife science: critique and viewpoint. J Wildl Manag 69:457–465
Han X, Guo Y, Mi C, Huettmann F, Wen L (2017) Machine learning model analysis of breeding habitats for the Black-necked Crane in Central Asian uplands under anthropogenic pressures. Sci Rep 7:6114. https://doi.org/10.1038/s41598-017-06167-2
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Herrick KA, Huettmann F, Lindgren MA (2013) A global model of avian influenza prediction in wild birds: The importance of northern regions. Vet Res. https://doi.org/10.1186/1297-9716-44-42
Hilborn R, Mangel M (1997) The ecological detective: Confronting models with data. Princeton University Press, Princeton
Hochachka W, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data mining for discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437
Jiao S, Guo Y, Huettmann F, Lei G (2014) Nest-site selection analysis of hooded crane (Grus monacha) in northeastern China based on a multivariate ensemble model. Zool Sci 31:430–437
Kandel K, Huettmann F, Suwal MK, Regmi GR, Nijman V, Nekaris KAI, Lama ST, Thapa A, Sharma HP, Subedi TR (2015) Rapid multi-nation distribution assessment of a charismatic conservation species using open access ensemble model GIS predictions: red panda (Ailurus fulgens) in the Hindu-Kush Himalaya region. Biol Conserv 181:150–161
Kanai Y, Ueta M, Germogenov N, Nagendran M, Mita N, Higuchi H (2002) Migration routes and important resting areas of Siberian cranes (Grus leucogeranus) between northeastern Siberia and China as revealed by satellite tracking. Biol Conserv 106:339–346
Klein DR, Magomedova M (2003) Industrial development and wildlife in arctic ecosystems: Can learning from the past lead to a brighter future? In: Rasmussen RO, Koroleva NE (eds) Social and environmental impacts in the North. Kluwer Academic Publishers, The Netherlands, pp 35–56
Mace G, Cramer W, Diaz S, Faith DP, Larigauderie A, Le Prestre P, Palmer M, Perrings C, Scholes RJ, Walpole M, Walter BA, Watson JEM, Mooney HA (2010) Biodiversity targets after 2010. Env Sustain 2:3–8
Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies, 2nd edn. Kluwer Academic Publishers, Netherlands
Matthiessen P (2001) The birds of heaven. Travels with cranes. North Point Press, New York
McGarigal K, Cushman S, Stafford S (2000) Multivariate statistics for wildlife and ecology research. Springer, New York
Mi C, Huettmann F, Guo Y, Han X, Wen L (2017) Why choose random forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ. https://doi.org/10.7717/peerj.2849
Moore GS, Ilyashenko E (2009) Regional flyway education programs: increasing public awareness of crane conservation along the crane flyways of Eurasia and North America. In: Prentice C (ed) Conservation of flyway wetlands in East and West/Central Asia. Proceedings of the project completion workshop of the UNEP/GEF Siberian Crane wetland project, 14–15 October 2009, Harbin, China. Baraboo (Wisconsin), USA: International Crane Foundation
Mueller JP, Massaron L (2016) Machine learning for dummies. For Dummies Publisher, 435 p
Ohse B, Huettmann F, Ickert-Bond S, Juday G (2009) Modeling the distribution of white spruce (Picea glauca) for Alaska with high accuracy: an open access role-model for predicting tree species in last remaining wilderness areas. Polar Biol 32:1717–1724
Prentice C (ed) (2010) Conservation of flyway wetlands in East and West/Central Asia. Proceedings of the project completion workshop of the UNEP/GEF Siberian Crane wetland project, 14–15 October 2009, Harbin, China. Baraboo (Wisconsin), USA: International Crane Foundation
Sorokin AG, Kotyukov YV (1987) Discovery of the nesting ground of the Ob River population of the Siberian Crane. In: Archibald GW, Pasquier RF (eds) Proceedings of the 1983 international crane workshop. International Crane Foundation, Baraboo, pp 209–212
Sorokin A, Markin Y (1996) New nesting site of Siberian Cranes. Newsletter of Russian Bird Conservation Union, Moscow
Spiridonov V, Gavrilo M, Krasnov MA, Nikolaeva N, Sergienko L, Popov A, Krasnova E (2011) Toward the new role of marine and coastal protected areas in the Arctic: the Russian case. In: Huettmann F (ed) Protection of the three poles. Springer, New York
Silvy NJ (2012) The wildlife techniques manual: research and management, vol 2, 7th edn. Johns Hopkins University Press, Baltimore
Van Impe J (2013) Esquisse de l’avifaune de la Sibérie Occidentale: Une revue bibliographique. Alauda 81:269–296
Wu G, Leeuw J, Skidmore AK, Prins HHT, Best EPH, Liu Y (2009) Will the three gorges dam affect the underwater light climate of Vallisneria spiralis L. and food habitat of Siberian Crane in Poyang Lake. Hydrobiologia 623:213–222
Yu C, Yinghao W, Qing Y (2008) Ground survey of waterbirds in the Poyang Lake region in Winter 2007/2008. Siberian Crane Flyway News: 15
Acknowledgement
We thank Dan Steinberg and Salford Systems Ltd. for a workshop with U.S. IALE at Snowbird, Utah, that introduced us to the power of batteries. FH acknowledges the kind and long collaboration with the Forestry University of Beijing, China, and the use of their data. U.S. IALE, S. Linke, C. Cambu, H. Hera, H. Berrios Alvarez and the -EWHALE lab- at UAF are thanked for their support. This is EWHALE lab publication #185.
Appendices
Appendix 1: Details of 74 GIS Environmental Layers Used in the Model Prediction (+ 3 Additional Internal Columns)
# | Name and abbreviation of GIS layer | Source | Comment |
---|---|---|---|
1–12 | Monthly mean temperature tmen_1–12 | These are standard layers used for GIS modeling | |
13–24 | Monthly minimum temperature tmin_1–12 | (see above) | |
25–36 | Monthly maximum temperature tmax_1–12 | (see above) | |
37–48 | Monthly precipitation prec_1–12 | (see above) | |
49–67 | Bioclim bio_1–19 | (see above) | |
68 | Altitude | (see above) | |
69 | Aspect | (see above) | |
70 | Slope | (see above) | |
71 | Landcover Landcv | Herrick et al. (2013) | Several global landcover layers exist |
72 | Human infrastructure index Hii | Herrick et al. (2013) | Human footprint. Several human footprint layers exist |
73 | Distance to waterbody/lake Dislke | Mi unpublished | While essential for cranes, this layer is unlikely to be very accurate due to the huge and ephemeral wetlands worldwide |
74 | Distance to coastline Discsln | Mi unpublished | Relies on the coastline map resolution |
75 | x coordinate | ArcGIS | Not often used in most GIS model work but important for geo-referencing |
76 | y coordinate | ArcGIS | Not often used in most GIS model work but important for geo-referencing |
77 | Row index FID | ArcGIS | Not often used in most GIS model work but important for row identification |
Appendix 2
1.1 List of Top 20 Predictors, as identified by TreeNet ranking
Predictor | Relative Importance |
---|---|
Bio12 | 100.0 |
Bio14 | 71.2 |
Bio17 | 44.2 |
TMEN9 | 40.1 |
Prec12 | 37.6 |
Distance to lake | 35.1 |
TMAX12 | 29.8 |
Altitude | 27.3 |
Slope | 25.9 |
Tmin1 | 23.8 |
Bio1 | 23.0 |
Bio19 | 20.4 |
Tmen2 | 19.2 |
Bio3 | 18.9 |
Tmax3 | 17.9 |
Bio6 | 16.3 |
Tmen7 | 15.9 |
Prec6 | 14.3 |
Prec7 | 13.9 |
Tmin6 | 12.9 |
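The scores in this table appear to be expressed relative to the top-ranked predictor, which is set to 100. A minimal sketch of that rescaling follows, assuming non-negative raw importance scores; the function name and the raw values are hypothetical and are not taken from the chapter.

```python
from typing import Dict


def to_relative_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw importance scores so that the largest equals 100."""
    top = max(raw.values(), default=0.0)
    if top <= 0:
        raise ValueError("need at least one positive importance score")
    return {name: 100.0 * value / top for name, value in raw.items()}


# Hypothetical raw scores, for illustration only.
raw_scores = {"pred_a": 0.5, "pred_b": 0.25, "pred_c": 0.125}
relative = to_relative_importance(raw_scores)
print(relative)
```

On this scale a value such as 71.2 reads as 71.2% of the top predictor's importance, not as a share of the total.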
Appendix 3
1.1 Prediction Model Details for the Best Performing Model (the ‘Kitchen sink model’ with 74 predictors)
Siberian crane with a battery run on TreeNet (SPM7) balanced
The kitchen sink model, all 74 environmental predictors
Frequency of predicted Relative Index of Occurrence (RIO 0–1) for known presence (1)
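A frequency summary of this kind can be produced by binning the predicted RIO values at the known presence locations. The sketch below assumes ten equal-width bins on the 0–1 scale; the function name and the sample values are hypothetical and do not reproduce the chapter's results.

```python
from typing import List, Sequence


def rio_frequency(rio_values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Count predicted RIO values (0-1) in equal-width bins."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts = [0] * n_bins
    for value in rio_values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"RIO value out of range: {value}")
        # A value of exactly 1.0 is placed in the last bin.
        counts[min(int(value * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical predictions at presence records, for illustration only.
presence_rio = [0.05, 0.55, 0.92, 0.95, 1.0]
counts = rio_frequency(presence_rio)
print(counts)
```

For a well-performing model, most known presences would be expected to fall in the upper bins.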
Appendix 4
(For Prediction Map 1 for the ‘Kitchen sink model’ see Fig. 8.4 in the text; for map legends please see this figure; same for all other appendix maps)
(For Prediction Map 2 for the ‘TMax12 model’ see Fig. 8.5 in the text)
1.1 Prediction Map 3 for the ‘BIO14 model’
1.2 Prediction Map 4 for the ‘TMax12BIO14 model’
1.3 Prediction Map 5 for the ‘Top5 model’
1.4 Prediction Map 6 for the ‘Top10 model’
1.5 Prediction Map 7 for the ‘Top29 model’
1.6 Prediction Map 8 for the ‘Top35 model’
1.7 Prediction Map 9 for the ‘Bottom 44 model’
1.8 Prediction Map 10 for the ‘Leaving out top 3 interacting predictors model’
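The model names above correspond to predictor subsets taken from an importance ranking. As a hedged sketch of how such subsets could be assembled for a battery, the code below uses placeholder names `p01` to `p74` in place of the real ranked layers. The helper functions are hypothetical, and the three predictors left out in the last entry are simply the first three placeholders, since the actual interacting predictors are not listed here.

```python
from typing import Dict, Iterable, List, Sequence


def top_k(ranked: Sequence[str], k: int) -> List[str]:
    """First k predictors of an importance-ordered list."""
    if not 0 <= k <= len(ranked):
        raise ValueError("k out of range")
    return list(ranked[:k])


def bottom_k(ranked: Sequence[str], k: int) -> List[str]:
    """Last k predictors of an importance-ordered list."""
    if not 0 <= k <= len(ranked):
        raise ValueError("k out of range")
    return list(ranked[len(ranked) - k:])


def leave_out(ranked: Sequence[str], excluded: Iterable[str]) -> List[str]:
    """All predictors except the excluded ones, order preserved."""
    drop = set(excluded)
    return [name for name in ranked if name not in drop]


# Placeholder ranking, most important first (not the chapter's layer names).
ranked = [f"p{i:02d}" for i in range(1, 75)]

battery: Dict[str, List[str]] = {
    "kitchen_sink": list(ranked),
    "top5": top_k(ranked, 5),
    "top10": top_k(ranked, 10),
    "bottom44": bottom_k(ranked, 44),
    # Assumption: stands in for the three interacting predictors.
    "without_3": leave_out(ranked, ranked[:3]),
}

for name, subset in battery.items():
    print(name, len(subset))
```

Comparing the score of a low-ranked bundle such as `bottom44` with that of `top5` or `top10` is the kind of contrast the abstract refers to as predictor swapping.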
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Huettmann, F., Mi, C., Guo, Y. (2018). ‘Batteries’ in Machine Learning: A First Experimental Assessment of Inference for Siberian Crane Breeding Grounds in the Russian High Arctic Based on ‘Shaving’ 74 Predictors. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_8
DOI: https://doi.org/10.1007/978-3-319-96978-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96976-3
Online ISBN: 978-3-319-96978-7
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)