Abstract
National Taxonomy of Exempt Entities (NTEE) codes have become the primary classifier of nonprofit missions since they were developed in the mid-1980s in response to growing demands for a taxonomy of nonprofit activities (Herman, 1990; Barman, 2013). However, the increasingly complex nature of nonprofits means that NTEE codes may be outdated or lack specificity. As an alternative, scholars and practitioners can create a bespoke taxonomy for a specific purpose by hand-coding a training dataset and using machine learning classifiers to apply the codes to a large population. This paper presents a framework for determining training set sizes needed to scale custom taxonomies using machine learning algorithms.
Data Availability
Data and code to replicate the results presented in the article are available via the authors’ dedicated GitHub repository and Harvard Dataverse site: https://fjsantam.github.io/bespoke-npo-taxonomies/
Notes
Form 1023-EZ metadata were downloaded from the IRS website: https://www.irs.gov/charities-non-profits/exempt-organizations-form-1023ez-approvals
See “Part IV. Foundation Classification” in the instructions for the 1023-EZ form (IRS, 2018).
See the entries for line 7 and line 12 in “Part III. Foundation Classification” (IRS, 2018).
Profits are maximized when firm production levels reach the point of diminishing returns to labor or capital, which is found by identifying the point at which the second derivative of the production function equals zero.
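As a small worked illustration of the second-derivative condition in the last note, consider a hypothetical S-shaped production function of a single input \(L\). The functional form is an assumption chosen for simplicity and does not come from the article:

\[
f(L) = 3L^{2} - L^{3}, \qquad 0 \le L \le 2
\]

\[
f'(L) = 6L - 3L^{2}, \qquad f''(L) = 6 - 6L
\]

Setting \(f''(L) = 0\) gives \(L = 1\). For \(L < 1\) the second derivative is positive, so each additional unit of input adds more output than the previous one; for \(L > 1\) it is negative, so returns to the input diminish. Under this assumed form, \(L = 1\) is therefore the point of diminishing returns.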
References
Barman, E. (2013). Classificatory struggles in the nonprofit sector: The formation of the national taxonomy of exempt entities, 1969–1987. Social Science History, 37, 103–141. https://doi.org/10.2307/23361114
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Brodersen, K. H., Ong, C. S., Stephan, K. E., & Buhmann, J. M. (2010). The balanced accuracy and its posterior distribution. In 2010 20th international conference on pattern recognition (pp. 3121–3124). IEEE.
Fyall, R., Moore, M. K., & Gugerty, M. K. (2018). Beyond NTEE codes: Opportunities to understand nonprofit activity through mission statement content coding. Nonprofit and Voluntary Sector Quarterly, 47(4), 677–701.
Hand, D. J., & Yu, K. (2001). Idiot’s Bayes—not so stupid after all? International Statistical Review, 69(3), 385–398.
Herman, R. D. (1990). Methodological issues in studying the effectiveness of nongovernmental and nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 19(3), 293–306. https://doi.org/10.1177/089976409001900309
Internal Revenue Service. (2018). Instructions for Form 1023-EZ: Streamlined application for recognition of exemption under section 501(c)(3) of the Internal Revenue Code (Cat. No. 66268Y). https://www.irs.gov/pub/irs-pdf/i1023ez.pdf
Jones, D. (2019). IRS activity codes. Published January 22, 2019. https://nccs.urban.org/publication/irs-activity-codes
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
Kuhn, M. (2019). The caret package. “17 Measuring Performance.” https://topepo.github.io/caret/measuring-performance.html
Lecy, J. D., Ashley, S. R., & Santamarina, F. J. (2019a). Do nonprofit missions vary by the political ideology of supporting communities? Some preliminary results. Public Performance & Management Review, 42(1), 115–141.
Lecy, J. D., Santamarina, F. J., & van Holm, E. J. (2019b). The political economy of nonprofit entrepreneurship: Using open data to explore geographic and demographic dimensions of nonprofit mission [Paper presentation]. USC CPPP Symposium, Los Angeles, California.
Lewis, D. D., Yang, Y., Rose, T. G., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.
Ma, J. (2021). Automated coding using machine learning and remapping the US nonprofit sector: A guide and benchmark. Nonprofit and Voluntary Sector Quarterly, 50(3), 662–687.
Manning, C. D., Schütze, H., & Raghavan, P. (2009). Introduction to information retrieval. Cambridge University Press. Online edition. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
Paxton, P., Velasco, K., & Ressler, R. (2019a). Form 990 Mission Glossary v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
Paxton, P., Velasco, K., & Ressler, R. (2019b). Form 990 Mission Stemmer v.1. [Computer file]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Saito, T., & Rehmsmeier, M. (n.d.). Basic evaluation measures from the confusion matrix. https://classeval.wordpress.com/introduction/basic-evaluation-measures/
Salamon, L. M. & Anheier, H. K. (1996). The International classification of nonprofit organizations: ICNPO-Revision 1, 1996. Working Papers of the Johns Hopkins Comparative Nonprofit Sector Project, no. 19. Baltimore: The Johns Hopkins Institute for Policy Studies.
Tierney, L., Rossini, A. J., Li, N., & Sevcikova, H. (2018). snow: Simple network of workstations. R package version 0.4-3. https://CRAN.R-project.org/package=snow
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
Wickham, H., & Seidel, D. (2020). scales: Scale functions for visualization. R package version 1.1.1. https://CRAN.R-project.org/package=scales
Acknowledgements
Special thanks to the participants in the ARNOVA 2020 Conference Doctoral Fellowship Program, the ARNOVA 2019 Conference panel for its feedback, and the participants in “Philanthropy & Social Impact: A Research Symposium” (March 15, 2019), hosted by The Center on Philanthropy & Public Policy at the USC Price School of Public Policy.
Funding
Partial support for this research came from a Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant, P2C HD042828, to the Center for Studies in Demography & Ecology at the University of Washington.
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendix: Model Fit Formulas
Metric | Formula |
---|---|
Sensitivity | \(\text{SN} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{\text{P}}\) |
Specificity | \(\text{SP} = \frac{\text{TN}}{\text{TN} + \text{FP}} = \frac{\text{TN}}{\text{N}}\) |
Precision | \(\text{PREC} = \frac{\text{TP}}{\text{TP} + \text{FP}}\) |
Recall | \(\text{REC} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{\text{P}}\) |
F1 | \(F_{1} = \frac{2 \cdot \text{PREC} \cdot \text{REC}}{\text{PREC} + \text{REC}}\) |
Accuracy | \(\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FN} + \text{FP}} = \frac{\text{TP} + \text{TN}}{\text{P} + \text{N}}\) |
Balanced accuracy | \(\text{BA} = \frac{\text{SN} + \text{SP}}{2} = \frac{1}{2}\left[\frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{TN} + \text{FP}}\right]\) |
Error | \(\text{ERR} = \frac{\text{FP} + \text{FN}}{\text{TP} + \text{TN} + \text{FN} + \text{FP}} = \frac{\text{FP} + \text{FN}}{\text{P} + \text{N}}\) |
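The formulas in the table can be checked numerically with a few lines of code. The following is a minimal Python sketch, not the authors' implementation (the article's references suggest the analysis was done in R). The names `ConfusionCounts`, `ratio`, and `fit_metrics` are hypothetical, the counts in the usage example are made up, and returning NaN for a zero denominator is an assumption of this sketch rather than a convention stated in the article.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero (sketch assumption)."""
    return numerator / denominator if denominator else nan


def fit_metrics(c):
    """Compute the appendix metrics from confusion-matrix counts."""
    p = c.tp + c.fn  # all actual positives (P)
    n = c.tn + c.fp  # all actual negatives (N)
    sn = ratio(c.tp, p)               # sensitivity
    sp = ratio(c.tn, n)               # specificity
    prec = ratio(c.tp, c.tp + c.fp)   # precision
    rec = sn                          # recall is the same quantity as sensitivity
    return {
        "sensitivity": sn,
        "specificity": sp,
        "precision": prec,
        "recall": rec,
        "f1": ratio(2 * prec * rec, prec + rec),
        "accuracy": ratio(c.tp + c.tn, p + n),
        "balanced_accuracy": (sn + sp) / 2,
        "error": ratio(c.fp + c.fn, p + n),
    }


# Usage with made-up counts: 100 cases, 45 actual positives, 55 actual negatives.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
for name, value in fit_metrics(example).items():
    print(f"{name:>17}: {value:.3f}")
```

Because the positive and negative classes in this example are unequal in size, accuracy and balanced accuracy differ slightly, which is the situation balanced accuracy is meant to address.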
Cite this article
Santamarina, F.J., Lecy, J.D. & van Holm, E.J. How to Code a Million Missions: Developing Bespoke Nonprofit Activity Codes Using Machine Learning Algorithms. Voluntas 34, 29–38 (2023). https://doi.org/10.1007/s11266-021-00420-z
DOI: https://doi.org/10.1007/s11266-021-00420-z