Automated estimates of state interest group lobbying populations


A number of strides have been taken in recent years to measure interest group populations in the American states, but sorting these groups by economic sector requires a substantial investment of time and personnel. This paper introduces an automated process that estimates the industry of interest groups using only their names. We discuss the advantages and hurdles of automated methods and then employ a supervised learning method that produces reliable estimates of the sector of more than six hundred thousand interest groups in the states. We validate these estimates in several ways, showing that they correlate closely with datasets employed in the literature, can replicate published results, and reflect real-world events.
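The first validation described above, that the automated estimates correlate closely with datasets used in the literature, amounts to comparing two sets of counts. A minimal sketch, assuming both sources have been aggregated to per-state counts for one sector and that Pearson's r is the comparison statistic; the choice of statistic and every number below are illustrative assumptions, not figures from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical counts of groups in one sector across five states.
automated = [120, 340, 95, 410, 60]
hand_coded = [110, 355, 90, 400, 70]
print(round(pearson(automated, hand_coded), 3))
```

Pearson's r rewards counts that rise and fall together across states but ignores a constant scaling difference between the two sources, so agreement in absolute levels would need a separate check.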

Fig. 1


  1. Gray and Lowery also coordinated a team that hand-coded the 2007 version of the NIMSP data according to their coding scheme.

  2. Unsupervised methods, such as the one employed by Hopkins (2018), do not require training data and are better suited to generating their own categories, which is useful in exploratory work because it can reveal categorizations that researchers did not anticipate.

  3. A useful explanation of ‘Naïve Bayes,’ with equations, is available from the software engineer Ahmet Taspinar on the Data Science Central blog (accessed: January 11, 2016).

  4. We estimate six sets of bags of words: two using only the hand-coded data (one with all words and one with stop words removed), two using only the lobbying reports from Colorado and Pennsylvania (with and without stop words), and two using both (with and without stop words). We also evaluate three different equations for combining the individual word probabilities from step 4, and two different types of bootstrap, each drawn 50 times, as described in step 6.

  5. See the Online Appendix:

  6. We use a stop list of 1,557 words and misspellings to remove geographic identifiers, suffixes (e.g., Inc.), and basic words (e.g., Services). The list covers numbers (1, 2, 3, ..., 11811061), locations (akron, alabama, alaskan, etc.), common groupings (association, associates, board, corporation, league, services, etc.), solo letters (f, j, k, etc.), suffixes (junior, llc, inc, corp, etc.), and geographical fixtures (mountain, ocean, riverside, etc.). The full list is available in Tables 15–18 of the Online Appendix.

  7. Specifically, these samples are drawn using Stata’s gsample command, with \(B_{\mathrm{s}}^{2}\) as the analytic weight.

  8. Table 14 in the Online Appendix shows that recall and precision scores are lower for the Massachusetts data because only 17 of the 26 Gray and Lowery policy codes overlap with Massachusetts’ coding scheme.

  9. We estimate the number of bills introduced in each policy area using code-word searches in LexisNexis (Garlick 2020), the same procedure used by Lowery et al. (2004).

  10. Hartman, Kristi. 2014. ‘Lessons from North Dakota’s Energy Boom.’ 8 October.

  11. Available from the Correlates of State Policy project: (accessed 24 Sept 2019).

  12. See Thomas Holyoke’s personal Web site: (accessed 24 Sept 2019).

  13. Three datasets are available. The first provides each individual group estimate: The second contains the aggregate measure for each year from 2006 to 2017, and the third contains the aggregate measure for two-year periods from 2006 to 2017. The aggregate datasets are available at
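
Footnotes 3 and 6 outline the core of the supervised approach: clean each group name against a stop list, then classify it with naive Bayes. The sketch below is a minimal version of that pipeline, assuming punctuation-insensitive tokenization, a multinomial naive Bayes model with add-one smoothing, and a tiny hypothetical training set. It is not the authors' code, and its stop list contains only words that footnote 6 names.

```python
import math
import re
from collections import Counter, defaultdict

# A few of the words named in footnote 6; the full stop list has 1,557 entries.
STOP_WORDS = {
    "inc", "llc", "corp", "junior", "association", "associates", "board",
    "corporation", "league", "services", "akron", "alabama", "alaskan",
    "mountain", "ocean", "riverside",
}

def clean_name(name, stop_words=STOP_WORDS):
    """Lowercase a group name, strip punctuation, and drop stop-list tokens.

    Numbers and solo letters are removed by rule here as a shortcut; the
    paper's list enumerates them explicitly instead.
    """
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens
            if t not in stop_words and not t.isdigit() and len(t) > 1]

def train(examples, alpha=1.0):
    """Fit a multinomial naive Bayes model from (name, sector) pairs.

    alpha is the additive (Laplace) smoothing constant.
    """
    word_counts = defaultdict(Counter)  # sector -> word frequencies
    sector_counts = Counter()           # sector -> number of training names
    vocab = set()
    for name, sector in examples:
        sector_counts[sector] += 1
        for w in clean_name(name):
            word_counts[sector][w] += 1
            vocab.add(w)
    return word_counts, sector_counts, vocab, alpha

def predict(model, name):
    """Return the sector with the highest posterior log-probability."""
    word_counts, sector_counts, vocab, alpha = model
    total = sum(sector_counts.values())
    best, best_score = None, -math.inf
    for sector, n in sector_counts.items():
        score = math.log(n / total)  # log prior P(sector)
        denom = sum(word_counts[sector].values()) + alpha * len(vocab)
        for w in clean_name(name):
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sector][w] + alpha) / denom)
        if score > best_score:
            best, best_score = sector, score
    return best

# Hypothetical hand-coded training names and sector labels.
model = train([
    ("State Hospital Association", "health"),
    ("Nurses Association of Alabama", "health"),
    ("Oil and Gas League", "energy"),
    ("Wind Energy Coalition, Inc.", "energy"),
])
print(predict(model, "Rural Hospital Alliance"))
print(predict(model, "Riverside Gas Producers LLC"))
```

Working in log-probabilities avoids numerical underflow when a name contains many words, and skipping unseen words means a name made entirely of unfamiliar terms falls back on the sector priors.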

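Footnotes 4 and 7 describe six bags of words (three training sources, each with and without stop words) and bootstrap samples drawn 50 times with analytic weights. The sketch below mirrors that design under loudly stated assumptions: the training data are hypothetical, each training name is assumed to carry one B_s value (the quantity itself is defined in the main text), only one weighted bootstrap variant is shown, and Python's `random.choices` stands in for Stata's gsample by drawing with replacement with probability proportional to B_s squared.

```python
import random
from collections import Counter, defaultdict
from itertools import product

def bag_of_words(examples, stop_words=frozenset()):
    """Word frequencies per sector from (name, sector) pairs, skipping stop words."""
    bags = defaultdict(Counter)
    for name, sector in examples:
        for w in name.lower().split():
            if w not in stop_words:
                bags[sector][w] += 1
    return bags

def weighted_bootstrap(items, b_values, k, rng):
    """Draw k items with replacement, with probability proportional to B_s ** 2."""
    weights = [b ** 2 for b in b_values]
    return rng.choices(items, weights=weights, k=k)

# Hypothetical stand-ins for the hand-coded names and the Colorado and
# Pennsylvania lobbying reports.
hand_coded = [("hospital association", "health"), ("bankers association", "finance")]
reports = [("community bank", "finance"), ("hospital network", "health")]
sources = {"hand": hand_coded, "reports": reports, "both": hand_coded + reports}
stop_options = {"all": frozenset(), "stopped": frozenset({"association"})}

# Six bags of words: three training sources x (with, without) stop words.
bags = {(src, opt): bag_of_words(data, stops)
        for (src, data), (opt, stops) in product(sources.items(), stop_options.items())}

# 50 weighted bootstrap draws over the training names.
# The B_s values are placeholders, one per name (an assumption).
rng = random.Random(42)
names = hand_coded + reports
b_values = [1.0, 2.0, 0.5, 0.0]
draws = [weighted_bootstrap(names, b_values, len(names), rng) for _ in range(50)]
print(len(bags), len(draws))
```

An item with B_s equal to zero is never drawn, so the weighting effectively lets some observations drop out of a given bootstrap replicate.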

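Footnotes 8 and 13 rely on two routine computations: per-category precision and recall against hand-coded labels, and aggregation of group-level estimates into yearly and two-year counts. A sketch of both, assuming labels arrive as parallel lists, that two-year periods begin in 2006, and that an aggregate counts each distinct group once per state, period, and sector. That last rule is an assumption; the published datasets may define the measure differently. All labels and records are hypothetical.

```python
from collections import defaultdict

def precision_recall(true_labels, predicted_labels, category):
    """Precision and recall for one category from parallel lists of labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    predicted = sum(1 for _, p in pairs if p == category)
    actual = sum(1 for t, _ in pairs if t == category)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

def aggregate(records, period_length=1, start_year=2006):
    """Count distinct groups per (state, period start year, sector).

    records are (group_id, state, year, sector) tuples; period_length=2 pools
    years into two-year periods that begin at start_year.
    """
    members = defaultdict(set)
    for group, state, year, sector in records:
        period = start_year + (year - start_year) // period_length * period_length
        members[(state, period, sector)].add(group)
    return {key: len(groups) for key, groups in members.items()}

# Hypothetical hand-coded and automated labels for five groups.
true = ["health", "health", "energy", "finance", "energy"]
pred = ["health", "energy", "energy", "finance", "health"]
print(precision_recall(true, pred, "health"))

# Hypothetical group-level estimates; g1 registers in both 2006 and 2007.
records = [("g1", "CO", 2006, "energy"), ("g1", "CO", 2007, "energy"),
           ("g2", "CO", 2007, "health"), ("g3", "PA", 2008, "energy")]
print(aggregate(records))
print(aggregate(records, period_length=2))
```

When a category in one scheme has no counterpart in the other, every prediction in that category is a false positive, which is one way non-overlapping policy codes can lower the scores that footnote 8 reports.
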
  • Anzia, Sarah F. 2019. Looking for influence in all the wrong places: How studying subnational policy can revive research on interest groups. The Journal of Politics 81 (1): 343–351.

  • Benoit, Kenneth, Drew Conway, Benjamin E. Lauderdale, Michael Laver, and Slava Mikhaylov. 2016. Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review 110 (2): 278–295.

  • Evans, James A., and Pedro Aceves. 2016. Machine translation: Mining text for social theory. Annual Review of Sociology 42: 21–50.

  • Garlick, Alex. 2020. Measuring and Analyzing the Policy Agendas of American State Legislatures: 1991–2017. State Politics and Policy Conference, San Diego, CA.

  • Gentzkow, Matthew, Bryan T. Kelly, and Matt Taddy. 2017. Text as data. Technical report, National Bureau of Economic Research.

  • Gray, Virginia, and David Lowery. 1995. Interest representation and democratic gridlock. Legislative Studies Quarterly 20: 531–552.

  • Gray, Virginia, and David Lowery. 2000. The population ecology of interest representation: Lobbying communities in the American states. Ann Arbor: University of Michigan Press.

  • Gray, Virginia, John Cluverius, Jeffrey J. Harden, Boris Shor, and David Lowery. 2015. Party competition, party polarization, and the changing demand for lobbying in the American states. American Politics Research 43 (2): 175–204.

  • Greenwood, Justin, and Joanna Dreger. 2013. The Transparency Register: A European vanguard of strong lobby regulation? Interest Groups & Advocacy 2 (2): 139–162.

  • Grimmer, Justin, and Brandon M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21 (3): 267–297.

  • Holman, Craig, and William Luneburg. 2012. Lobbying and transparency: A comparative analysis of regulatory reform. Interest Groups & Advocacy 1 (1): 75–104.

  • Holyoke, Thomas T. 2019. Dynamic state interest group systems: A new look with new data. Interest Groups & Advocacy 8: 499–518.

  • Holyoke, Thomas T., and Jeff Cummins. 2019. Interest group and political party influence on growth in state spending and debt. American Politics Research 48: 1–19.

  • Hopkins, Daniel J. 2018. The exaggerated life of death panels? The limited but real influence of elite rhetoric in the 2009–2010 health care debate. Political Behavior 40 (3): 681–709.

  • Junk, Wiebke Marie. 2019. When diversity works: The effects of coalition composition on the success of lobbying coalitions. American Journal of Political Science 63: 660–674.

  • Lorenz, Geoffrey M. 2019. Prioritized interests: Diverse lobbying coalitions and congressional committee agenda-setting. The Journal of Politics 82: 225–240.

  • Lowery, David, Virginia Gray, Matthew Fellowes, and Jennifer Anderson. 2004. Living in the moment: Lags, leads, and the link between legislative agendas and interest advocacy. Social Science Quarterly 85 (2): 463–477.

  • Quinn, Kevin M., Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54 (1): 209–228.

  • Schütze, Hinrich, Christopher D. Manning, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.

  • Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. 1966. The General Inquirer: A Computer Approach to Content Analysis. Cambridge: MIT Press.

  • Strickland, James. 2019. A paradox of political reform: Shadow interests in the US states. American Politics Research 47 (4): 887–914.

  • Sumner, Jane Lawrence, Emily M. Farris, and Mirya R. Holman. 2019. Crowdsourcing reliable local data. Political Analysis 28: 244–262.

Author information

Corresponding author

Correspondence to Alex Garlick.

Additional information

We thank anonymous reviewers and participants at the 2017 State Politics and Policy Conference in St. Louis, Missouri, for helpful comments, as well as Virginia Gray and David Lowery for sharing data. Supplementary materials and the data described in this article are available for download at the Harvard Dataverse:

About this article

Cite this article

Garlick, A., Cluverius, J. Automated estimates of state interest group lobbying populations. Int Groups Adv 9, 396–409 (2020).

Keywords

  • Interest groups
  • Data science
  • State politics
  • Legislative studies
  • Text as data
  • Automated methods