Extraction of mitigation-related text from Endangered Species Act documents using machine learning: a case study

Varghese, Arun; Allen, Kasey; Agyeman-Badu, George; Haire, Jennifer; Madsen, Rebecca

doi:10.1007/s10669-021-09830-2


Published: 30 August 2021

Volume 42, pages 63–74 (2022)
Environment Systems and Decisions

Arun Varghese (ORCID: 0000-0001-9882-884X)¹, Kasey Allen², George Agyeman-Badu¹, Jennifer Haire² & Rebecca Madsen³


Abstract

Many industrial and development projects can adversely affect threatened and endangered species and their habitats. The federal Endangered Species Act (ESA) requires preparation of a biological assessment or habitat conservation plan before federal agencies can authorize, through decision documents and permits, unintentional and otherwise prohibited “take” (i.e., harm) of listed species. These documents describe the potential effects of proposed projects on listed species and include measures to mitigate those effects. Collectively, these assessments, plans, decision documents, and permits (termed ESA documents in this study) are a valuable source of approved mitigation options that could apply to future projects. However, because of their volume, length, and complexity, reviewing these documents manually would be time- and labor-intensive. In this study, we apply three supervised machine learning algorithms, two of them based on state-of-the-art transfer learning, to develop and evaluate predictive models that extract mitigation-related text from ESA documents. The models were trained on a dataset created as part of this study. During cross-validation, the best-performing model achieved an estimated ROC-AUC of 0.98 and a precision-recall AUC of 0.86, indicating strong potential for extracting mitigation-related content from existing documents. To illustrate the utility of this technology, we present a simulated case study in which pretrained models that recognize mitigation measures, combined with a large historical corpus of ESA documents and keyword filters, made it possible to rapidly assess the mitigation measures commonly used for a given species. While this technology did not eliminate the need for biological expertise, it did allow rapid scoping assessments and could serve as a supporting resource even for experienced biologists.
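The workflow summarized above (train a sentence classifier, estimate ROC-AUC and precision-recall AUC by cross-validation, then apply the fitted model and a keyword filter to a corpus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, uses TF-IDF features with logistic regression as a stand-in for the study's classifiers (the transfer-learning models are not reproduced), estimates PR-AUC with average precision (one common approximation), and relies on a tiny hypothetical labelled set. The sentences, the species keyword, the 0.5 threshold, and the helper `mitigation_for_species` are all invented for illustration, so the printed scores say nothing about the study's results.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical labelled sentences: 1 = describes a mitigation measure, 0 = other text.
MITIGATION = [
    "Construction will be scheduled outside the nesting season to avoid disturbance.",
    "A qualified biologist will survey the site before ground disturbance begins.",
    "Exclusion fencing will be installed around occupied burrows.",
    "Work crews will receive environmental awareness training before starting work.",
    "Disturbed habitat will be restored with native vegetation after construction.",
    "Vehicle speeds will be limited to 15 miles per hour within the project area.",
]
OTHER = [
    "The project is located in the northern part of the county.",
    "The species was listed as threatened in 1993.",
    "Annual rainfall in the region averages 20 inches.",
    "The applicant submitted the permit application in March.",
    "The action area includes grassland and oak woodland.",
    "Population trends for the species are described in Section 4.",
]
texts = MITIGATION + OTHER
labels = np.array([1] * len(MITIGATION) + [0] * len(OTHER))

# Stand-in classifier: TF-IDF unigrams and bigrams feeding logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# Out-of-fold probabilities: each sentence is scored by a model that never saw it.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_predict(model, texts, labels, cv=cv, method="predict_proba")[:, 1]

roc_auc = roc_auc_score(labels, scores)
pr_auc = average_precision_score(labels, scores)  # average precision approximates PR-AUC
print(f"ROC-AUC: {roc_auc:.2f}  PR-AUC (average precision): {pr_auc:.2f}")

# Case-study step: refit on all labelled data, then screen a new corpus.
model.fit(texts, labels)


def mitigation_for_species(corpus, species_keyword, threshold=0.5):
    """Hypothetical helper: keep sentences scored as mitigation that mention the keyword."""
    probs = model.predict_proba(corpus)[:, 1]
    keyword = species_keyword.lower()
    return [s for s, p in zip(corpus, probs) if p >= threshold and keyword in s.lower()]


corpus = [
    "Exclusion fencing will be installed to keep desert tortoise out of the work area.",
    "The desert tortoise occurs throughout the action area.",
]
for sentence in mitigation_for_species(corpus, "desert tortoise"):
    print(sentence)
```

Reporting both metrics is useful because ROC-AUC is insensitive to class balance, whereas average precision drops when positives are rare; if mitigation sentences are a small fraction of a long document, PR-AUC gives the more conservative picture. The threshold in the filtering step trades recall against precision and would need to be tuned on real data.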


Notes

See https://www.fws.gov/endangered/laws-policies for more information.
Take means to harass, harm, pursue, hunt, shoot, wound, kill, trap, capture, or collect, or to attempt to engage in any such conduct.
See https://www.fws.gov/sacramento/es/overview/Documents/ESA_Basics.pdf for more information.
See https://esadocs.defenders-cci.org for more information.
See https://www.fws.gov/endangered/species/us-species.html for more information.


Acknowledgements

We are deeply grateful to Jacob Malcolm of Defenders of Wildlife, who supported this work by providing, from his organization’s website, the historical repository of ESA documents used in this study. This work would not have been possible without this data resource. Mr. Malcolm also converted the documents to text format, an important preliminary step in this study.

Author information

Authors and Affiliations

ICF, 2635 Meridian Parkway, Durham, NC, 27713, USA
Arun Varghese & George Agyeman-Badu
ICF, 980 9th Street, Suite 1200, Sacramento, CA, 95814, USA
Kasey Allen & Jennifer Haire
Electric Power Research Institute, 3420 Hillview Avenue, Palo Alto, CA, 94303, USA
Rebecca Madsen


Corresponding author

Correspondence to Arun Varghese.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical approval

This study was performed by ICF under a contract with the Electric Power Research Institute.

Supplementary Information


Supplementary file1 (DOCX 24 kb)


About this article

Cite this article

Varghese, A., Allen, K., Agyeman-Badu, G. et al. Extraction of mitigation-related text from Endangered Species Act documents using machine learning: a case study. Environ Syst Decis 42, 63–74 (2022). https://doi.org/10.1007/s10669-021-09830-2


Accepted: 24 August 2021
Published: 30 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10669-021-09830-2

