Sociology in the Era of Big Data: The Ascent of Forensic Social Science

Abstract

The rise of big data—data that are not only large and massively multivariate but concern a dizzying array of phenomena—represents a watershed moment for the social sciences. These data have created demand for new methods that reduce/simplify the dimensionality of data, identify novel patterns and relations, and predict outcomes, from computational ethnography and computational linguistics to network science, machine learning, and in situ experiments. Such developments have led scholars to begin new lines of social inquiry. Company engineers, computer scientists, and social scientists have all converged on big data, creating the possibility of a vibrant “trading zone” for collaboration. However, strong differences in research frameworks help explain why big data may not be an egalitarian trading zone across fields, but rather—at least in the short term—a moment when engineering colonizes sociology more than vice versa. In the long term, however, we suggest there may be the possibility of a constructive synthesis across paradigms in what we term ‘forensic social science.’

Notes

  1. This approach also had an elective affinity with certain social scientific theories over others: for instance, rational choice theory (Coleman 1994a) was arguably more readily translated into the data and methods of the time than, say, the more abstract theories that preceded it, such as structural functionalism.

  2. However, we are not necessarily living in an age of science. By this we mean that we are black-boxing information and knowledge in tools and treatments that bring about desired outcomes without ever really understanding why or how they do so. While in the past scientific facts were black-boxed due to their complexity (Latour 1988), we now find ourselves in a time when we often seek solutions without any concern or desire for explanation at all. Not everyone wants this; rather, the prevailing pressures of industry, engineering, and the practical concerns of life in a technology-mediated age demand it. Scientists will still seek explanations, but they may remain an (increasingly small) numerical minority.

  3. This feature of big data, in particular, is a curse as much as a blessing. Several years ago, Lazer et al. questioned whether “computational social science could become the exclusive domain of private companies and government agencies” (2009:721). Equally problematic as questions of ownership and access, however, are related concerns about data quality and interpretation: in eliminating the participation of the academic researcher, we also eliminate the guiding force that orients data collection towards the pursuit of knowledge rather than the maximization of profit. The data that are collected in industry are not always the data that are most useful for science (as we elaborate in great detail below); worse, too seldom acknowledged is the basic observation that technologies constrain as much as they enable—and so any given dataset may tell us less about human agency and more about interfaces and algorithms that subtly influence user behavior (cf. Lewis 2015).

  4. An additional dimension to these new types of data relates to behaviors that are made possible by digital intermediation and hitherto did not exist. For example, in pre-internet times, people were simply technically unable to share photos on the scale and frequency they do today. These types of technologically enabled social transactions are a specific category of behaviors, some of which may (or may not) significantly affect social dynamics and structures. Though the same technological advances that make big data possible enable these new categories of data, these advances are, in principle, no different from previous technological and ideational transformations that catalyzed social change (such as the invention of the printing press or the emergence of the formal organization). In that respect, data generated on digitally-mediated platforms that represent new categories of social action are no different from other phenomena of sociological interest.

  5. For our part, we believe the linking of multiple corpora for an entire domain will bring the greatest advances to the social sciences. With rich, multifaceted data for an entire social system (of, say, politics, a market, or academe) we can ask and answer a variety of social science questions with less concern about confounding, missing data, and selection bias (see Coleman 1994b).

  6. A note of caution is in order here: by “no concern for theory,” we are referring only to theories that relate to explaining the social phenomenon in question. There are, of course, multiple statistical assumptions, informed by theory, that are embodied in the data-mining algorithms being employed.

  7. “Novel” often amounts to “more comprehensive” as system-wide, society-wide, and even planet-wide data are increasingly available (e.g. Leskovec and Horvitz 2008).

  8. http://www.pewinternet.org/data-trend/mobile/device-ownership/

  9. Naturally, conclusions from big data will always require qualification insofar as (1) the “digital divide” persists and (2) usage patterns of a given technology are differentiated even among those who can access it (see Lewis in press). Nonetheless, digital (and especially mobile) communications technologies are diffusing at a staggering rate (e.g. Castells et al. 2007); many population-level datasets are compiled by government or other record-keeping organizations, so inclusion is not biased by “self-selection”; and given our unprecedented reliance on technology for communication, information retrieval, and relationship formation and maintenance (Bohn et al. 2014; Rosenfeld and Thomas 2012; Sparrow et al. 2011), the sheer size of available data and the proportion of humanity to whom they pertain are remarkable.

  10. The term “paradigm” may be too strong for the social science disciplines, as they often lack a shared set of standards and questions. In fact, Thomas Kuhn regarded them as pre-paradigmatic (1996). That said, we maintain it is still reasonable to regard social science and engineering as entailing different research frameworks or distinct gestalts of epistemology and research activity.

  11. Distinct kinds of training and opportunities for employment are also important, but brevity requires this to remain a caricature of the two perspectives. We merely intend for the reader to grasp these differences on an intuitive level, so she can see incommensurability as one reason for less reciprocal forms of exchange in the trading zone of big data.

  12. Here is an example in which the study of online dating sites by engineers reveals which odd questions differentiate tastes (http://blog.okcupid.com/index.php/the-best-questions-for-first-dates/).

References

  • Abbott, A. (1988). Transcending general linear reality. Sociological Theory, 6(2), 169–86.

  • Agresti, A., & Finlay, B. (2009). Statistical methods for the social sciences (4th ed.). Upper Saddle River: Prentice Hall.

  • Alpaydin, E. (2004). Introduction to machine learning. Cambridge: MIT Press.

  • Anand, G. (2010). A weird way of thinking has prevailed worldwide. New York Times (August 25, 2010).

  • Anderson, M. J. (1988). The American census: a social history. New Haven: Yale University Press.

  • Anderson, A., McFarland, D. A., & Jurafsky, D. (2012). Towards a computational history of the ACL: 1980–2008. Association of Computational Linguistics, Workshop (ACL Workshop 2012).

  • Backstrom, L., Kleinberg, J., Lee, L., & Danescu-Niculescu-Mizil, C. (2013). Characterizing and curating conversation threads: expansion, focus, volume, re-entry. Proceedings of WSDM, 2013.

  • Bail, C. A. (2014). The cultural environment: measuring culture with big data. Theory and Society, 43, 465–482.

  • Barabasi, A. (2003). Linked: How everything is connected to everything else and what it means for business, science, and everyday life. New York: Plume.

  • Bender-deMoll, S., & McFarland, D. A. (2006). The art and science of dynamic network visualization. Journal of Social Structure, 7(2).

  • Berger, P., & Luckmann, T. (1966). The social construction of reality: a treatise in the sociology of knowledge. New York: Anchor.

  • Bishop, C. (2007). Pattern recognition and machine learning (information science and statistics). Cambridge: Springer.

  • Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.

  • Bohn, A., Buchta, C., Hornik, K., & Mair, P. (2014). Making friends and communicating on Facebook: implications for the access to social capital. Social Networks, 37, 29–41.

  • Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323, 892–95.

  • Boyd, D., & Crawford, K. (2012). Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15, 662–79.

  • Brandes, U., Robins, G., McCranie, A., & Wasserman, S. (2013). What is network science? Network Science, 1, 1–15.

  • Brown, J. S., & Duguid, P. (2002). The social life of information. Harvard Business Review Press.

  • Bruch, E. E., & Mare, R. D. (2012). Methodological issues in the analysis of residential preferences and residential mobility. Sociological Methodology, 42, 103–54.

  • Camic, C., & Xie, Y. (1994). The statistical turn in American social science: Columbia University, 1890 to 1915. American Sociological Review, 59(5), 773–805.

  • Castells, M., Fernández-Ardèvol, M., Qiu, J. L., & Sey, A. (2007). Mobile communication and society: a global perspective. Cambridge: MIT Press.

  • Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–97.

  • Coleman, J. S. (1986). Social theory, social research, and a theory of action. American Journal of Sociology, 91(6), 1309–35.

  • Coleman, J. S. (1994a). Foundations of social theory. Cambridge: Belknap Press.

  • Coleman, J. S. (1994b). A vision for sociology. Society, 30, 29–34.

  • Collins, H., Evans, R., & Gorman, M. (2007). Trading zones and interactional expertise. Studies in History and Philosophy of Science, 38(4), 657–66.

  • Converse, J. M. (1987). Survey research in the United States: roots and emergence 1890–1960. Berkeley: University of California Press.

  • Cukier, K., & Mayer-Schoenberger, V. (2013). The rise of big data: how it’s changing the way we think about the world. Foreign Affairs, 28–41.

  • Diehl, D., & McFarland, D. A. (2010). Towards a historical sociology of situations. American Journal of Sociology, 115(6), 1713–52.

  • Dodds, P. S., Muhamad, R., & Watts, D. (2003). An experimental study of search in global social networks. Science, 301(5634), 827–9.

  • Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: reasoning about a highly connected world. Cambridge: Cambridge University Press.

  • Einav, L., Levin, J., Popov, I., & Sundaresan, N. (2014). Growth, adoption and use of mobile e-commerce. American Economic Review: Papers and Proceedings, 104(5), 489–94.

  • Fleck, L. (1979). Genesis and development of a scientific fact. Chicago: University of Chicago Press.

  • Galison, P. (1997). Image and logic: a material culture of microphysics. Chicago: University of Chicago Press.

  • Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: strategies for qualitative research. Chicago: Aldine Pub. Co.

  • Goldberg, A. (in press). In defense of forensic social science. Big Data & Society.

  • Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep and daylength across diverse cultures. Science, 333(6051), 1878–81.

  • Golder, S. A., & Macy, M. W. (2014). Digital footprints: opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–52.

  • González-Bailón, S., Borge-Holthoefer, J., Rivero, A., & Moreno, Y. (2011). The dynamics of protest recruitment through an online network. Scientific Reports, 1, 197.

  • González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38, 16–27.

  • Grimmer, J., Westwood, S. J., & Messing, S. (2014). The impression of influence: legislator communication, representation, and democratic accountability. Princeton: Princeton University Press.

  • Hacking, I. (2006). The emergence of probability. Cambridge: Cambridge University Press.

  • Hilbert, M., & López, P. (2011). The world’s technological capacity to store, communicate, and compute information. Science, 332(6025), 60–5.

  • Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. New York: Prentice Hall.

  • Kagan, J. (2009). The three cultures: natural sciences, social sciences, and the humanities in the 21st century. New York: Cambridge University Press.

  • Kirchner, C., & Mohr, J. W. (2010). Meanings and relations: an introduction to the study of language, discourse, and networks. Poetics, 38(6), 555–66.

  • Kohavi, R., & Longbotham, R. (2007). Online experiments: lessons learned. Computer, 40(9), 103–5.

  • Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

  • Kuhn, T. S. (1996). The structure of scientific revolutions (3rd ed.). Chicago: University of Chicago Press.

  • Latour, B. (1988). Science in action. Cambridge: Harvard University Press.

  • Latour, B., & Woolgar, S. (1986). Laboratory life: the construction of scientific facts (2nd ed.). Princeton: Princeton University Press.

  • Laumann, E. O., Marsden, P., & Prensky, D. (1983). The boundary specification problem in network analysis. In R. S. Burt & M. J. Minor (Eds.), Applied network analysis: a methodological introduction. London: Sage Publications.

  • Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Alstyne, M. V. (2009). Computational social science. Science, 323(5915), 721–3.

  • Leskovec, J., & Horvitz, E. (2008). Planetary-scale views on a large instant-messaging network. International World Wide Web Conference (WWW).

  • Leskovec, J., Lang, K., & Mahoney, M. (2010). Empirical comparison of algorithms for network community detection. In WWW ’10: Proceedings of the 19th International Conference on World Wide Web. New York: ACM.

  • Levine, D. N. (1995). Visions of the sociological tradition. Chicago: University of Chicago Press.

  • Lewis, K. (2015). Studying online behavior: comment on Anderson et al. 2014. Sociological Science, 2, 20–31.

  • Lewis, K. (in press). Three fallacies of digital footprints. Big Data & Society.

  • Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: a new social network dataset using Facebook.com. Social Networks, 30(4), 330–42.

  • Lohr, S. (2012). The age of big data. New York Times (February 11, 2012).

  • Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge: MIT Press.

  • McCallum, A., Corrada-Emmanuel, A., & Wang, X. (2005). Topic and role discovery in social networks. IJCAI (International Joint Conferences on Artificial Intelligence).

  • McFarland, D. A., & McFarland, H. R. (in press). Big data and the danger of being precisely inaccurate. Big Data & Society.

  • McFarland, D. A., Diehl, D., & Rawlings, C. (2011). Methodological transactionalism and the sociology of education. In M. T. Hallinan (Ed.), Frontiers in sociology of education (pp. 87–109). New York: Springer.

  • McFarland, D. A., Manning, C. D., Ramage, D., Chuang, J., Heer, J., & Jurafsky, D. (2013a). Differentiating language usage through topic models. Poetics, 41(6), 607–25.

  • McFarland, D. A., Jurafsky, D., & Rawlings, C. (2013b). Making the connection: social bonding in courtship situations. American Journal of Sociology, 118(6), 1596–1649.

  • Menand, L. (2010). The marketplace of ideas: issues of our time. New York: W.W. Norton & Company.

  • National Research Council. (2014). Convergence: Facilitating transdisciplinary integration of life sciences, physical sciences, engineering and beyond. National Research Council.

  • Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98, 404–409.

  • Newman, M. E. J. (2009). Networks: an introduction. Oxford: Oxford University Press.

  • Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.

  • Pentland, A. (2014). Social physics: How good ideas spread--the lessons from a new science. New York: Penguin Press.

  • Platt, J. (1996). A history of sociological research methods in America, 1920–1960. Cambridge: Cambridge University Press.

  • Porter, T. M. (1995). Trust in numbers. Princeton: Princeton University Press.

  • Porter, T. M., & Ross, D. (Eds.). (2003). The modern social sciences. New York: Cambridge University Press.

  • Ranganath, R., Jurafsky, D., & McFarland, D. A. (2012). Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates. Computer Speech and Language, 27(1), 89–115.

  • Rogers, E. M. (1987). Progress, problems and prospects for network research: investigating relationships in the age of electronic communication technologies. Social Networks, 9, 285–310.

  • Rosenfeld, M. J., & Thomas, R. J. (2012). Searching for a mate: the rise of the internet as a social intermediary. American Sociological Review, 77(4), 523–47.

  • Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–6.

  • Shi, X., Leskovec, J., & McFarland, D. A. (2010). Citing for high impact. Joint Conference on Digital Libraries, (JCDL 2010).

  • Shwed, U., & Bearman, P. S. (2010). The temporal structure of scientific consensus formation. American Sociological Review, 75(6), 817–40.

  • Smith, A., & Duggan, M. (2013). Online dating & relationships. Washington: Pew Research Center.

  • Snow, C. P. (2001). The two cultures. London: Cambridge University Press. (Original work published 1959).

  • Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: cognitive consequences of having information at our fingertips. Science, 333, 776–8.

  • Stokes, D. E. (1997). Pasteur’s quadrant: basic science and technological innovation. Washington: Brookings Institution Press.

  • Stouffer, S. A. (1949). The American soldier (Studies in social psychology during World War II, 4 vols.). Princeton, NJ: Princeton University Press.

  • Szell, M., & Thurner, S. (2010). Measuring social dynamics in a massive multiplayer online game. Social Networks, 32, 313–29.

  • Talley, E., Newman, D., Herr, B., II, Wallach, H., Burns, G., Leenders, M., & McCallum, A. (2011). A database of National Institutes of Health (NIH) research using machine learned categories and graphically clustered grant awards. Nature Methods, 8, 443–4.

  • Vaisey, S. (2009). Motivation and justification: a dual-process model of culture in action. American Journal of Sociology, 114, 1675–1715.

  • Vaughan, D. (2014). Analogy, cases, and comparative social organization. In R. Swedberg (Ed.), Theorizing in social science: the context of discovery (pp. 61–84). Stanford: Stanford University Press.

  • Wang, D. J., Shi, X., McFarland, D. A., & Leskovec, J. (2012). Measurement error in social network data: a re-classification. Social Networks, 34(4), 396–409.

  • Wasserman, S., & Faust, K. (1994). Social network analysis: methods and applications. Cambridge: Cambridge University Press.

Author information

Corresponding author

Correspondence to Daniel A. McFarland.

Additional information

This paper has benefited from conversations with Jure Leskovec and Kylie Swall, who helped us formulate some of this material. This paper also benefited from the feedback of Ron Breiger and John Mohr at the annual conference of the American Sociological Association in 2014. The work presented here is in part supported by Stanford’s Center for Computational Social Science and NSF # 0835614.

About this article

Cite this article

McFarland, D.A., Lewis, K. & Goldberg, A. Sociology in the Era of Big Data: The Ascent of Forensic Social Science. Am Soc 47, 12–35 (2016). https://doi.org/10.1007/s12108-015-9291-8

  • DOI: https://doi.org/10.1007/s12108-015-9291-8

Keywords

  • Big data
  • Computational social science
  • Sociology of science
  • Forensic social science