Machine-learning models, cost matrices, and conservation-based reduction of selected landscape classification errors

Gutzwiller, Kevin J.; Chaudhary, Anand

doi:10.1007/s10980-020-00969-y

Perspective Article
Published: 03 February 2020

Volume 35, pages 249–255 (2020)
Landscape Ecology

Abstract

Context

Use of statistical models developed with machine-learning algorithms is increasing in the ecological sciences, yet these disciplines have not capitalized on the ability to use cost matrices to selectively reduce classification errors that have highly detrimental consequences.

Objectives

Our aim was to promote such applications by demonstrating the process of using a cost matrix to decrease specific types of misclassification, explaining the importance of exploring the effectiveness of cost matrices for a given dataset, and encouraging use of cost matrices with machine-learning models in landscape-ecological and conservation contexts.

Methods

Bird occurrence data, landscape and regional land-cover data, costs of false-positive and false-negative errors, and the C5.0 decision tree algorithm were used to train and test a binary classifier.
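
As a minimal sketch of what a cost matrix encodes for a binary classifier, the Python below stores a cost for each actual/predicted pair and totals the cost of a set of predictions. It is not the authors' R code, which is provided in the supplementary material. The class labels, the cost values (1 for a false positive, 4 for a false negative), and the five example sites are all assumptions made for illustration.

```python
# Hypothetical cost matrix keyed by (actual, predicted) class.
# Correct predictions cost 0; the other values are illustrative only.
COSTS = {
    ("present", "present"): 0,
    ("absent", "absent"): 0,
    ("absent", "present"): 1,   # false positive
    ("present", "absent"): 4,   # false negative, weighted more heavily
}


def total_cost(actual, predicted, costs=COSTS):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(costs[(a, p)] for a, p in zip(actual, predicted))


# Made-up labels for five sites (not data from the article).
actual = ["present", "present", "absent", "absent", "present"]
predicted = ["present", "absent", "present", "absent", "absent"]

print(total_cost(actual, predicted))
```

Under this weighting, two classifiers with the same overall accuracy can have very different total costs, depending on which kind of error each one makes.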

Results

Increasing the cost for false negatives tended to decrease the frequency of this error type while allowing for reasonable predictive performance for each class separately and both classes combined.
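
The general trade-off behind this result can be sketched with a simple minimum-expected-cost decision rule applied to predicted probabilities. This is a swapped-in illustration, not how C5.0 handles costs internally and not the authors' analysis. The probabilities, labels, and cost values below are invented, and the function names are hypothetical.

```python
def predict(p_present, cost_fp, cost_fn):
    """Return 1 (present) or 0 (absent), whichever has the lower expected cost."""
    expected_cost_if_absent = p_present * cost_fn            # risk of a false negative
    expected_cost_if_present = (1.0 - p_present) * cost_fp   # risk of a false positive
    return 1 if expected_cost_if_absent > expected_cost_if_present else 0


def count_errors(sites, cost_fp, cost_fn):
    """Count (false negatives, false positives) over (probability, actual) pairs."""
    fn = fp = 0
    for p_present, actual in sites:
        predicted = predict(p_present, cost_fp, cost_fn)
        if actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
    return fn, fp


# Invented (predicted probability of presence, actual class) pairs.
SITES = [
    (0.90, 1), (0.60, 1), (0.40, 1), (0.30, 1), (0.15, 1),
    (0.70, 0), (0.35, 0), (0.25, 0), (0.10, 0), (0.05, 0),
]

# Hold the false-positive cost at 1 and raise the false-negative cost.
results = {cost_fn: count_errors(SITES, 1, cost_fn) for cost_fn in (1, 2, 4)}
for cost_fn, (fn, fp) in results.items():
    print(f"FN cost {cost_fn}: {fn} false negatives, {fp} false positives")
```

With these invented numbers, false negatives fall from 3 to 1 as the false-negative cost rises from 1 to 4, while false positives rise from 1 to 3. This is why the effect of a cost matrix on each class, and on both classes combined, is worth checking for a given dataset.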

Conclusions

Cost matrices are applicable to many different categorical response variables and spatial scales. We encourage landscape ecologists and planners to explore the effectiveness of cost matrices for their particular dataset and project goals, especially when conservation of biodiversity across broad spatial extents is at stake.

Data availability

The metadata, data, and R code used in this research are included in this article’s electronic supplementary material files.

Acknowledgements

We thank J. Stoklosa for comments about an earlier version of the manuscript and Baylor University for supporting this research.

Funding

The authors’ work on this project was supported by funding from Baylor University.

Author information

Authors and Affiliations

Department of Biology, Baylor University, One Bear Place, # 97388, Waco, TX, 76798, USA
Kevin J. Gutzwiller
The Institute of Ecological, Earth, and Environmental Sciences, Baylor University, One Bear Place, # 97205, Waco, TX, 76798, USA
Kevin J. Gutzwiller & Anand Chaudhary

Corresponding author

Correspondence to Kevin J. Gutzwiller.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

The authors declare that they are in full compliance with all of the ethical standards for publishing in Landscape Ecology. Data obtained from the website for the North American Breeding Bird Survey involved birds, but the authors’ research did not involve actual interaction with birds, other animals, or human subjects.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource 1 Metadata for the data file used in the analyses (PDF 105 kb)

Online Resource 2 Data file (YBCHC) used in the statistical analyses (CSV 80 kb)

Online Resource 3 R code used to analyze the data (PDF 166 kb)

About this article

Cite this article

Gutzwiller, K.J., Chaudhary, A. Machine-learning models, cost matrices, and conservation-based reduction of selected landscape classification errors. Landscape Ecol 35, 249–255 (2020). https://doi.org/10.1007/s10980-020-00969-y

Received: 14 September 2019
Accepted: 08 December 2019
Published: 03 February 2020
Issue Date: February 2020
DOI: https://doi.org/10.1007/s10980-020-00969-y
