Abstract
Recent years have seen considerable progress in measuring interest group populations in the American states, but sorting these groups by economic sector requires a substantial investment of time and personnel. This paper introduces an automated process that estimates the industry of interest groups using only their names. We discuss the advantages and hurdles of automated methods and then employ a supervised learning method that produces reliable estimates of the sector of more than six hundred thousand interest groups in the states. We validate these estimates in several ways, showing that they correlate closely with datasets employed in the literature, can replicate published results, and reflect real-world events.

Notes
Gray and Lowery also coordinated a team that hand-coded the 2007 version of the NIMSP data according to their coding scheme.
Unsupervised methods, such as the one employed by Hopkins (2018), do not require training data and are better suited to generating their own categories, which can be useful in exploratory work and can reveal categorizations that researchers did not anticipate.
A useful explanation of ‘Naïve Bayes,’ with equations, is available from the software engineer Ahmet Taspinar on the Data Science Central blog http://www.datasciencecentral.com/profiles/blogs/text-classification-sentiment-analysis-tutorial-blog (accessed: January 11, 2016).
We estimate six sets of bags of words: two using just the hand-coded estimates (one with all words and one with stop words removed), two using just the lobbying reports from Colorado and Pennsylvania (with and without stop words) and two with both (with and without stop words). We also evaluate three different equations to combine individual word probabilities from step 4 and two different types of bootstraps drawn 50 times each, as described in step 6.
See the Online Appendix: https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/WLYBSX/VSH8ZR&version=2.0.
We use a stop list of 1557 words and misspellings. It removes numbers (1, 2, 3, ..., 11811061), locations (akron, alabama, alaskan, etc.), geographical fixtures (mountain, ocean, riverside, etc.), common groupings and basic words (association, associates, board, corporation, league, services, etc.), solo letters (f, j, k, etc.) and suffixes (junior, llc, inc, corp, etc.). The full list is available in Tables 15–18 in the Online Appendix.
Specifically, these samples are drawn using STATA’s gsample feature, with analytic weights for \(B_{\mathrm{s}}^{2}\).
Table 14 in the Online Appendix shows that recall and precision scores are lower with the Massachusetts data, because only 17 of the 26 Gray and Lowery policy codes overlap with Massachusetts’ scheme.
Hartman, Kristi. 2014. ‘Lessons from North Dakota’s Energy Boom,’ 8 Oct. http://www.ncsl.org/blog/2014/10/08/lessons-from-north-dakotas-energy-boom.aspx.
Available from the Correlates of State Policy project: http://ippsr.msu.edu/public-policy/correlates-state-policy (accessed 24 Sept 2019).
See Thomas Holyoke’s personal Web site: http://www.fresnostate.edu/socialsciences/polisci/fac-staff/full-time/holyoke.html (accessed 24 Sept 2019).
There are three individual datasets available; the first provides each individual group estimate: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/AW4DY7. The second dataset has the aggregate measure for each year from 2006 to 2017. The third dataset has the aggregate measure for two-year periods from 2006 to 2017. The aggregate datasets are available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WLYBSX.
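The notes above mention a bag-of-words Naïve Bayes approach, a stop list, and precision and recall scores. The following is a minimal sketch of how those pieces fit together. It is not the authors' implementation: the training names, the sector labels, the short stop list (including "of" and "the") and all function names are hypothetical, and it assumes a standard multinomial Naïve Bayes with add-one smoothing rather than any of the three combining equations or the bootstraps described in the notes.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical mini stop list; the paper's actual list has 1557 entries.
STOP_WORDS = {"association", "corporation", "league", "services", "inc",
              "llc", "corp", "alabama", "akron", "of", "the"}

# Hypothetical hand-coded training names and sectors (illustration only).
TRAINING = [
    ("Alabama Hospital Association", "health"),
    ("Akron Nurses League", "health"),
    ("Medical Clinic Services Inc", "health"),
    ("National Bank Corp", "banking"),
    ("Credit Union League", "banking"),
    ("Community Bank of Akron", "banking"),
]


def tokenize(name, stop_words=STOP_WORDS):
    """Lowercase a name, keep alphabetic tokens, drop stop words and solo letters."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


def train(labeled_names):
    """Build per-sector word counts (the bag of words) and sector frequencies."""
    word_counts = defaultdict(Counter)
    sector_counts = Counter()
    for name, sector in labeled_names:
        sector_counts[sector] += 1
        word_counts[sector].update(tokenize(name))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, sector_counts, vocab


def predict(name, model):
    """Return the sector with the highest smoothed log posterior for a name."""
    word_counts, sector_counts, vocab = model
    total = sum(sector_counts.values())
    best_sector, best_score = None, -math.inf
    for sector in sorted(sector_counts):
        score = math.log(sector_counts[sector] / total)  # log prior
        denom = sum(word_counts[sector].values()) + len(vocab)
        for tok in tokenize(name):
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[sector][tok] + 1) / denom)
        if score > best_score:
            best_sector, best_score = sector, score
    return best_sector


def precision_recall(pairs, sector):
    """Precision and recall for one sector from (predicted, true) pairs."""
    tp = sum(1 for p, t in pairs if p == sector and t == sector)
    fp = sum(1 for p, t in pairs if p == sector and t != sector)
    fn = sum(1 for p, t in pairs if p != sector and t == sector)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


MODEL = train(TRAINING)
```

With this toy model, `predict("Akron Hospital Group LLC", MODEL)` assigns the name to the hypothetical "health" sector, because "hospital" appears only in the health training names once the stop words are removed. Scoring held-out names with `precision_recall` mirrors, in spirit only, the kind of validation reported in the Online Appendix.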
References
Anzia, Sarah F. 2019. Looking for influence in all the wrong places: How studying subnational policy can revive research on interest groups. The Journal of Politics 81 (1): 343–351.
Benoit, Kenneth, Drew Conway, Benjamin E. Lauderdale, Michael Laver, and Slava Mikhaylov. 2016. Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review 110 (2): 278–295.
Evans, James A., and Pedro Aceves. 2016. Machine translation: Mining text for social theory. Annual Review of Sociology 42: 21–50.
Garlick, Alex. 2020. Measuring and Analyzing the Policy Agendas of American State Legislatures: 1991–2017. State Politics and Policy Conference, San Diego, CA.
Gentzkow, Matthew, Bryan T. Kelly, and Matt Taddy. 2017. Text as data. Technical report, National Bureau of Economic Research.
Gray, Virginia, and David Lowery. 1995. Interest representation and democratic gridlock. Legislative Studies Quarterly 20: 531–552.
Gray, Virginia, and David Lowery. 2000. The population ecology of interest representation: Lobbying communities in the American states. Ann Arbor: University of Michigan Press.
Gray, Virginia, John Cluverius, Jeffrey J. Harden, Boris Shor, and David Lowery. 2015. Party competition, party polarization, and the changing demand for lobbying in the American states. American Politics Research 43 (2): 175–204.
Greenwood, Justin, and Joanna Dreger. 2013. The Transparency Register: A European vanguard of strong lobby regulation? Interest Groups & Advocacy 2 (2): 139–162.
Grimmer, Justin, and Brandon M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21 (3): 267–297.
Holman, Craig, and William Luneburg. 2012. Lobbying and transparency: A comparative analysis of regulatory reform. Interest Groups & Advocacy 1 (1): 75–104.
Holyoke, Thomas T. 2019. Dynamic state interest group systems: A new look with new data. Interest Groups & Advocacy 8: 499–518.
Holyoke, Thomas T., and Jeff Cummins. 2019. Interest group and political party influence on growth in state spending and debt. American Politics Research 48: 1–19.
Hopkins, Daniel J. 2018. The exaggerated life of death panels? The limited but real influence of elite rhetoric in the 2009–2010 health care debate. Political Behavior 40 (3): 681–709.
Junk, Wiebke Marie. 2019. When diversity works: The effects of coalition composition on the success of lobbying coalitions. American Journal of Political Science 63: 660–674.
Lorenz, Geoffrey M. 2019. Prioritized interests: Diverse lobbying coalitions and congressional committee agenda-setting. The Journal of Politics 82: 225–240.
Lowery, David, Virginia Gray, Matthew Fellowes, and Jennifer Anderson. 2004. Living in the moment: Lags, leads, and the link between legislative agendas and interest advocacy. Social Science Quarterly 85 (2): 463–477.
Quinn, Kevin M., Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54 (1): 209–228.
Schütze, Hinrich, Christopher D. Manning, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. 1966. The General Inquirer: A Computer Approach to Content Analysis. Cambridge: MIT Press.
Strickland, James. 2019. A paradox of political reform: Shadow interests in the US states. American Politics Research 47 (4): 887–914.
Sumner, Jane Lawrence, Emily M. Farris, and Mirya R. Holman. 2019. Crowdsourcing reliable local data. Political Analysis 28: 244–262.
Additional information
We thank anonymous reviewers and participants at the 2017 State Politics and Policy Conference in St. Louis, Missouri, for helpful comments, as well as Virginia Gray and David Lowery for sharing data. Supplementary materials and the data described in this article are available for download at the Harvard Dataverse: https://dataverse.harvard.edu/dataverse/garlick_auto.
About this article
Cite this article
Garlick, A., Cluverius, J. Automated estimates of state interest group lobbying populations. Int Groups Adv 9, 396–409 (2020). https://doi.org/10.1057/s41309-020-00091-z
DOI: https://doi.org/10.1057/s41309-020-00091-z
Keywords
- Interest groups
- Data science
- State politics
- Legislative studies
- Text as data
- Automated methods