Abstract
Context
Use of statistical models developed with machine-learning algorithms is increasing in the ecological sciences, yet these disciplines have not capitalized on the ability to use cost matrices to selectively reduce classification errors that have highly detrimental consequences.
Objectives
Our aim was to promote such applications by demonstrating the process of using a cost matrix to decrease specific types of misclassification, explaining the importance of exploring the effectiveness of cost matrices for a given dataset, and encouraging use of cost matrices with machine-learning models in landscape-ecological and conservation contexts.
Methods
We trained and tested a binary classifier with the C5.0 decision tree algorithm, using bird occurrence data, landscape and regional land-cover data, and specified costs for false-positive and false-negative errors.
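As a rough illustration of how a cost matrix can steer a binary classifier, the sketch below picks the class with the lower expected cost, given a predicted probability of presence. It is not the C5.0 implementation and not the authors' R code. It shows a generic expected-cost decision rule, and the class labels, cost values, and function names (`make_costs`, `expected_cost`, `classify`) are hypothetical choices made for this example.

```python
# Minimal sketch of a cost-matrix decision rule (assumed, not the study's code).
# Costs are keyed by (actual class, predicted class); correct predictions cost 0.

def make_costs(fp_cost, fn_cost):
    """Build a 2x2 cost matrix for a presence/absence classifier."""
    return {
        ("absent", "absent"): 0.0,
        ("absent", "present"): fp_cost,   # false positive
        ("present", "absent"): fn_cost,   # false negative
        ("present", "present"): 0.0,
    }


def expected_cost(p_present, predicted, costs):
    """Expected cost of predicting `predicted` when P(present) = p_present."""
    return (p_present * costs[("present", predicted)]
            + (1.0 - p_present) * costs[("absent", predicted)])


def classify(p_present, costs):
    """Choose the class with the smaller expected cost (ties go to 'present')."""
    cost_if_present = expected_cost(p_present, "present", costs)
    cost_if_absent = expected_cost(p_present, "absent", costs)
    return "present" if cost_if_present <= cost_if_absent else "absent"


equal_costs = make_costs(fp_cost=1.0, fn_cost=1.0)
fn_heavy_costs = make_costs(fp_cost=1.0, fn_cost=4.0)  # hypothetical 4:1 ratio

print(classify(0.3, equal_costs))
print(classify(0.3, fn_heavy_costs))
```

With equal costs, a site with a 0.3 probability of presence is labeled absent. When a false negative costs four times as much as a false positive, the same site is labeled present, because missing a true presence has become the more expensive mistake.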
Results
Increasing the cost for false negatives tended to decrease the frequency of this error type while allowing for reasonable predictive performance for each class separately and both classes combined.
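The trade-off described here can be sketched by sweeping the false-negative cost and recomputing per-class and overall performance each time. The example below uses a made-up set of scored records, not the study's data or results, and it applies a simple cost-derived probability threshold rather than the C5.0 algorithm. The names `SCORED` and `evaluate` are hypothetical.

```python
# Illustrative sweep over false-negative costs (assumed data, not study results).
# Each record is (predicted probability of presence, true label); 1 = present.
SCORED = [(0.9, 1), (0.7, 1), (0.45, 1), (0.3, 1), (0.15, 1),
          (0.6, 0), (0.35, 0), (0.25, 0), (0.1, 0), (0.05, 0)]


def evaluate(scored, fp_cost, fn_cost):
    """Classify with the cost-minimizing threshold and tally the outcomes."""
    # Predicting 'present' has the lower expected cost when p >= this threshold.
    threshold = fp_cost / (fp_cost + fn_cost)
    tp = fp = tn = fn = 0
    for p_present, label in scored:
        predicted = 1 if p_present >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "false_negatives": fn,
        "false_positives": fp,
        "sensitivity": tp / (tp + fn),      # accuracy for the 'present' class
        "specificity": tn / (tn + fp),      # accuracy for the 'absent' class
        "accuracy": (tp + tn) / len(scored),
    }


results = {fn_cost: evaluate(SCORED, fp_cost=1.0, fn_cost=fn_cost)
           for fn_cost in (1.0, 2.0, 4.0)}

for fn_cost, metrics in results.items():
    print(fn_cost, metrics)
```

In this toy dataset, false negatives fall as their cost rises while false positives increase. That is why it helps to check sensitivity, specificity, and overall accuracy together before settling on a cost ratio for a particular dataset.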
Conclusions
Cost matrices are applicable to many different categorical response variables and spatial scales. We encourage landscape ecologists and planners to explore the effectiveness of cost matrices for their particular dataset and project goals, especially when conservation of biodiversity across broad spatial extents is at stake.
Data availability
The metadata, data, and R code used in this research are included in this article’s electronic supplementary material files.
Acknowledgements
We thank J. Stoklosa for comments about an earlier version of the manuscript and Baylor University for supporting this research.
Funding
The authors’ work on this project was supported by funding from Baylor University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
The authors declare that they are in full compliance with all of the ethical standards for publishing in Landscape Ecology. Data obtained from the website for the North American Breeding Bird Survey involved birds, but the authors’ research did not involve actual interaction with birds, other animals, or human subjects.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gutzwiller, K.J., Chaudhary, A. Machine-learning models, cost matrices, and conservation-based reduction of selected landscape classification errors. Landscape Ecol 35, 249–255 (2020). https://doi.org/10.1007/s10980-020-00969-y
DOI: https://doi.org/10.1007/s10980-020-00969-y