How to Code a Million Missions: Developing Bespoke Nonprofit Activity Codes Using Machine Learning Algorithms

Santamarina, Francisco J.; Lecy, Jesse D.; van Holm, Eric Joseph

doi:10.1007/s11266-021-00420-z

Research Papers
Published: 07 October 2021

Volume 34, pages 29–38 (2023)
VOLUNTAS: International Journal of Voluntary and Nonprofit Organizations

Francisco J. Santamarina ORCID: orcid.org/0000-0003-1724-8769¹,
Jesse D. Lecy² &
Eric Joseph van Holm³

Abstract

National Taxonomy of Exempt Entities (NTEE) codes have become the primary classifier of nonprofit missions since they were developed in the mid-1980s in response to growing demands for a taxonomy of nonprofit activities (Herman in Nonprofit and Voluntary Sector Quarterly 19(3):293–306, 1990, Barman in Social Science History 37:103–141, 2013). However, the increasingly complex nature of nonprofits means that NTEE codes may be outdated or lack specificity. As an alternative, scholars and practitioners can create a bespoke taxonomy for a specific purpose by hand-coding a training dataset and using machine learning classifiers to apply the codes to a large population. This paper presents a framework for determining training set sizes needed to scale custom taxonomies using machine learning algorithms.

Data Availability

Data and code to replicate the results presented in the article are available via the authors’ dedicated GitHub repository and Harvard Dataverse site: https://fjsantam.github.io/bespoke-npo-taxonomies/

Notes

Form 1023-EZ meta-data was downloaded from the IRS website: https://www.irs.gov/charities-non-profits/exempt-organizations-form-1023ez-approvals
See “Part IV. Foundation Classification” in the instructions for the 1023-EZ form (IRS, 2018).
See the entries for line 7 and line 12 in “Part III. Foundation Classification” (IRS, 2018).
Profits are maximized when firm production levels reach the point of diminishing returns to labor or capital, which is found by identifying the point at which the second derivative of the production function equals zero.
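The criterion in the last note can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the S-shaped (logistic) `production` function and the helper names `second_difference` and `find_inflection` are hypothetical and do not come from the article. It approximates the second derivative with a central finite difference and uses bisection to locate the input level at which that derivative crosses zero.

```python
import math


def production(x):
    # Hypothetical S-shaped production function (logistic curve).
    # Chosen only for illustration; it is not taken from the article.
    return 100.0 / (1.0 + math.exp(-(x - 5.0)))


def second_difference(f, x, h=1e-3):
    # Central finite-difference approximation of the second derivative f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def find_inflection(f, lo, hi, tol=1e-6):
    # Bisection on f'', assuming it changes sign exactly once on [lo, hi].
    d_lo = second_difference(f, lo)
    d_hi = second_difference(f, hi)
    if d_lo * d_hi > 0:
        raise ValueError("second derivative does not change sign on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        d_mid = second_difference(f, mid)
        if d_lo * d_mid <= 0:
            hi = mid
        else:
            lo, d_lo = mid, d_mid
    return (lo + hi) / 2.0


# For this logistic curve the second derivative is zero at its midpoint, x = 5.
inflection = find_inflection(production, 0.0, 10.0)
```

Below the located point each additional unit of input adds more output than the previous one; above it, each additional unit adds less.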

References

Barman, E. (2013). Classificatory struggles in the nonprofit sector: The formation of the national taxonomy of exempt entities, 1969–1987. Social Science History, 37, 103–141. https://doi.org/10.2307/23361114
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Brodersen, K. H., Ong, C. S., Stephan, K. E., & Buhmann, J. M. (2010). The balanced accuracy and its posterior distribution. In 2010 20th international conference on pattern recognition (pp. 3121–3124). IEEE.
Fyall, R., Moore, M. K., & Gugerty, M. K. (2018). Beyond NTEE codes: Opportunities to understand nonprofit activity through mission statement content coding. Nonprofit and Voluntary Sector Quarterly, 47(4), 677–701.
Hand, D. J., & Yu, K. (2001). Idiot’s Bayes—not so stupid after all? International Statistical Review, 69(3), 385–398.
Herman, R. D. (1990). Methodological issues in studying the effectiveness of nongovernmental and nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 19(3), 293–306. https://doi.org/10.1177/089976409001900309
Internal Revenue Service. (2018). Instructions for form 1023-EZ: Streamlined application for recognition of exemption under section 501(c)(3) of the internal revenue code (Cat. No. 66268Y). Retrieved from https://www.irs.gov/pub/irs-pdf/i1023ez.pdf.
Jones, D. (2019). IRS activity codes. Published January 22, 2019. https://nccs.urban.org/publication/irs-activity-codes.
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
Kuhn, M. (2019). The `caret` package. “17 Measuring Performance.” https://topepo.github.io/caret/measuring-performance.html.
Lecy, J. D., Ashley, S. R., & Santamarina, F. J. (2019a). Do nonprofit missions vary by the political ideology of supporting communities? Some preliminary results. Public Performance & Management Review, 42(1), 115–141.
Lecy, J. D., Santamarina, F. J., & van Holm, E. J. (2019b). The political economy of nonprofit entrepreneurship: Using open data to explore geographic and demographic dimensions of nonprofit mission [Paper presentation]. USC CPPP Symposium, Los Angeles, California.
Lewis, D. D., Yang, Y., Rose, T. G., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr), 361–397.
Ma, J. (2021). Automated coding using machine learning and remapping the US nonprofit sector: A guide and benchmark. Nonprofit and Voluntary Sector Quarterly, 50(3), 662–687.
Manning, C. D., Schütze, H., & Raghavan, P. (2009). Introduction to information retrieval. Cambridge University Press. Online edition. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
Paxton, P., Velasco, K., & Ressler, R. (2019a). Form 990 Mission Glossary v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
Paxton, P., Velasco, K., & Ressler, R. (2019b). Form 990 Mission Stemmer v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Saito, T., & Rehmsmeier, M. (n.d.). Basic evaluation measures from the confusion matrix. https://classeval.wordpress.com/introduction/basic-evaluation-measures/.
Salamon, L. M. & Anheier, H. K. (1996). The International classification of nonprofit organizations: ICNPO-Revision 1, 1996. Working Papers of the Johns Hopkins Comparative Nonprofit Sector Project, no. 19. Baltimore: The Johns Hopkins Institute for Policy Studies.
Tierney, L., Rossini, A. J., Li, N., & Sevcikova, H. (2018). snow: Simple network of workstations. R package version 0.4-3. https://CRAN.R-project.org/package=snow.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
Wickham, H., & Seidel, D. (2020). scales: Scale functions for visualization. R package version 1.1.1. https://CRAN.R-project.org/package=scales.

Acknowledgements

Special thanks to the participants in the ARNOVA 2020 Conference Doctoral Fellowship Program, to the ARNOVA 2019 Conference panel for its feedback, and to “Philanthropy & Social Impact: A Research Symposium” (March 15, 2019), held by The Center on Philanthropy & Public Policy at USC’s Price School of Public Policy.

Funding

Partial support for this research came from a Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant, P2C HD042828, to the Center for Studies in Demography & Ecology at the University of Washington.

Author information

Authors and Affiliations

Evans School of Public Policy and Governance, University of Washington, 4105 George Washington Lane Northeast, Seattle, WA, 98105, USA
Francisco J. Santamarina
Watts College, Arizona State University, 411 N. Central Ave., Suite 750, Phoenix, AZ, 85004-2163, USA
Jesse D. Lecy
Department of Political Science, Urban Entrepreneurship and Policy Institute, The University of New Orleans, 256 Milneburg Hall, New Orleans, LA, 70148, USA
Eric Joseph van Holm

Corresponding author

Correspondence to Francisco J. Santamarina.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (TXT 1 KB)

Appendix: Model Fit Formulas

Metric	Formula
Sensitivity	\(\text{SN} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{\text{P}}\)
Specificity	\(\text{SP} = \frac{\text{TN}}{\text{TN} + \text{FP}} = \frac{\text{TN}}{\text{N}}\)
Precision	\(\text{PREC} = \frac{\text{TP}}{\text{TP} + \text{FP}}\)
Recall	\(\text{REC} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{\text{P}}\)
F1	\(F_{1} = \frac{2 \cdot \text{PREC} \cdot \text{REC}}{\text{PREC} + \text{REC}}\)
Accuracy	\(\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FN} + \text{FP}} = \frac{\text{TP} + \text{TN}}{\text{P} + \text{N}}\)
Balanced accuracy	\(\text{BA} = \frac{\text{Sensitivity} + \text{Specificity}}{2} = \frac{1}{2}\left[\frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{TN} + \text{FP}}\right]\)
Error	\(\text{ERR} = \frac{\text{FP} + \text{FN}}{\text{TP} + \text{TN} + \text{FN} + \text{FP}} = \frac{\text{FP} + \text{FN}}{\text{P} + \text{N}}\)

Source: Balanced Accuracy’s first formulation comes from Kuhn (2019). The second formulation comes from Brodersen et al. (2010). Other formulas come from Saito and Rehmsmeier (n.d.).
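
The formulas in the table above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch rather than the authors' implementation: the function name `model_fit_metrics` and the example counts are hypothetical, and the code assumes every denominator is non-zero (at least one actual positive, one actual negative, and one predicted positive).

```python
def model_fit_metrics(tp, tn, fp, fn):
    """Compute the appendix metrics from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives. Assumes non-zero denominators.
    """
    p = tp + fn  # all actual positives
    n = tn + fp  # all actual negatives
    sensitivity = tp / p
    specificity = tn / n
    precision = tp / (tp + fp)
    recall = sensitivity  # recall is the same quantity as sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (p + n)
    balanced_accuracy = (sensitivity + specificity) / 2
    error = (fp + fn) / (p + n)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "error": error,
    }


# Hypothetical counts for one binary activity code, for illustration only.
example = model_fit_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these illustrative counts the two classes are the same size, so accuracy and balanced accuracy coincide; the two measures diverge once the classes are imbalanced.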

About this article

Cite this article

Santamarina, F.J., Lecy, J.D. & van Holm, E.J. How to Code a Million Missions: Developing Bespoke Nonprofit Activity Codes Using Machine Learning Algorithms. Voluntas 34, 29–38 (2023). https://doi.org/10.1007/s11266-021-00420-z

Accepted: 17 September 2021
Published: 07 October 2021
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11266-021-00420-z
