Theory and Society, Volume 43, Issue 3–4, pp 465–482

The cultural environment: measuring culture with big data

  • Christopher A. Bail


The rise of the Internet, social media, and digitized historical archives has produced a colossal amount of text-based data in recent years. While computer scientists have produced powerful new tools for automated analyses of such “big data,” they lack the theoretical direction necessary to extract meaning from them. Meanwhile, cultural sociologists have produced sophisticated theories of the social origins of meaning, but lack the methodological capacity to explore them beyond micro-levels of analysis. I propose a synthesis of these two fields that iteratively combines conventional qualitative methods with new techniques for automated analysis of large amounts of text. First, I explain how automated text extraction methods may be used to map the contours of cultural environments. Second, I discuss the potential of automated text-classification methods to classify different types of culture such as frames, schemas, or symbolic boundaries. Finally, I explain how these new tools can be combined with conventional qualitative methods to trace the evolution of such cultural elements over time. While my assessment of the integration of big data and cultural sociology is optimistic, my conclusion highlights several challenges in implementing this agenda. These include a lack of information about the social context in which texts are produced, the difficulty of constructing reliable coding schemes that can be automated algorithmically, and the relatively high entry costs for cultural sociologists who wish to develop the technical expertise currently necessary to work with big data.


Keywords: Culture, Content analysis, Mixed-methods, Evolutionary theory



I thank Elizabeth Armstrong, Alex Hanna, Gabe Ignatow, Charles Kurzman, Brayden King, Jennifer Lena, John Mohr, Terry McDonnell, Andy Perrin, and Steve Vaisey for helpful comments on previous drafts. The Robert Wood Johnson Foundation and the Odum Institute at the University of North Carolina provided financial support for this research.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. University of North Carolina at Chapel Hill, Chapel Hill, USA