Does ‘bigger’ mean ‘better’? Pitfalls and shortcuts associated with big data for social research

Quality & Quantity

Abstract

‘Big data is here to stay.’ This key statement has a double value: it is an assumption as well as the reason why a theoretical reflection is needed. Furthermore, Big data is gaining visibility and success even in the social sciences, overcoming the division between the humanities and computer science. This contribution outlines some considerations on the presence, and the certain persistence, of Big data as a socio-technical assemblage. It then develops the intriguing opportunities for social research linked to this interaction between practices and technological development. However, despite a promissory rhetoric fostered by several scholars since the birth of Big data as a labelled concept, some risks are just around the corner. Claims about the methodological power of ever bigger datasets, and about increasing speed in data collection and analysis, are creating real hype in social research. Particular attention is needed in order to avoid some pitfalls. These risks are analysed with regard to the validity of research results obtained through Big data. After a pars destruens, this contribution concludes with a pars construens: building on the preceding critiques, a mixed methods research design is described as a general proposal, with the aim of stimulating a debate on the integration of Big data into complex research designs.

Notes

  1. The book title, not by chance, is “Taming the Big Data Tidal Wave” (Franks 2012).

  2. Internet World Stats, usage and population statistics, accessed 5 June 2014.

  3. Comparison between the 2009 and 2012 editions of “Internet use in households and by individuals”; data from Eurostat, Statistics in Focus.

  4. Or five if we include value and veracity (Demchenko et al. 2013), but for the sake of brevity they are not considered in this paper.

  5. According to Hilbert and López (2011), communication capacity is certainly growing, but they admit that “(...) the digital age increased our capacity to store information [...] much more than our capacity to transmit information through broadcast and telecommunication networks.” This element will be analysed further later on as one of the main critical issues of BD.

  6. In the chapter in his edited book, Bijker analysed the fluorescent lamp; later on he dealt with Bakelite, bicycles and incandescent bulbs (1997).

  7. http://google.com/patents/US6317722, retrieved 12 February 2014.

  8. An attempt to apply the ANT approach to describing the rise of the electronic market, in the case of EUREX, was made by Baygeldi and Smithson (2004): The ability of Actor Network Theory (ANT) to model and interpret an electronic market. Creating Knowledge-based Organisations, 109–126.

  9. Or ‘Ecology’ as meant by Abbott (2005). In contrast with deterministic concepts such as mechanisms and organisms, he argued that “When we call a set of social relations an ecology, we mean that it is best understood in terms of interactions between multiple elements that are neither fully constrained nor fully independent.” p. 248.

  10. As defined in RFC 6265, published by the Internet Engineering Task Force (IETF) and authored by A. Barth of U.C. Berkeley, available at http://tools.ietf.org/html/rfc6265#section-3. Accessed 9 August 2014.

  11. Eric Schmidt’s speech to the Association of National Advertisers, 2005, retrieved from Google Press on 21 May 2014. http://www.google.com/press/podium/ana.html.

  12. Big data: the management revolution, McAfee and Brynjolfsson, in Big data’s management revolution, The Promise and Challenge of Big Data, Harvard Business Review Insight Center Report, September 11, 2014.

  13. In computer programming, an Application Programming Interface (API) is a software interface that regulates how pieces of software interact. In this case, APIs are interfaces that allow interaction between different web applications, for instance social network sites.

  14. Ginsberg et al. (2009) Detecting influenza epidemics using search engine query data, Nature vol. 457, 19 February 2009.

  15. Rogers actually quotes Mike Thelwall, an established webometrician, who said that the challenge “is to demonstrate the web data correlate significantly with some non web data in order to prove that web data are not wholly random” (p. 205).

  16. A global map produced by Eric Fischer based on Twitter is available here: https://www.mapbox.com/labs/twitter-gnip/locals/#.

  17. The well-known way of labelling content in order to ensure that it refers to a specific object, topic or event.

  18. Since 2010, 53 papers (57.6 % of the total) published in Sage’s Social Science Computer Review have based their empirical data on Twitter. Site last consulted on 17 July 2014. As Zeynep Tufekci noticed (2014), at the 2013 edition of the International Conference on Weblogs and Social Media (ICWSM) about thirty papers (half of the total accepted) were based on some kind of Twitter analysis.

  19. It should be noted that computer science applied to content analysis is advancing very fast. Information retrieval is a cutting-edge field of research for informatics: some of the key articles are about machine learning and sentiment analysis. For the former, see Blei et al. (2003) on so-called LDA (Latent Dirichlet Allocation), which automatically identifies, on a probabilistic basis, the topics contained in a complex corpus of texts; for the latter, see Pang and Lee (2008) on sentiment analysis, a way to automatically estimate the partisanship of texts about a certain topic.

  20. A pure mix of social sciences and informatics might be a new frontier for higher education, but this would open up another kind of problem in the academic world.

References

  • Abbott, A.: Linked ecologies: states and universities as environments for professions. Sociol. Theory 23(3), 245–274 (2005)

  • Akrich, M.: The description of technical objects. In: Bijker, W.E., Law, J. (eds.) Shaping Technology/Building Society: Studies in Sociotechnical Change, pp. 205–224. MIT Press, Cambridge (1992)

  • Anderson, C.: The end of theory. Wired Mag. 16 (2008)

  • Bainbridge, W.S.: The scientific research potential of virtual worlds. Science 317(5837), 472–476 (2007)

  • Barland, M.: Big Data Special Report: data-driven decision-making: interpreting the digital exhaust of everybody, everywhere. Volta: Science, Technology and Society in Europe 2013(5), 6–14 (2013)

  • Baygeldi, M., Smithson, S.: The ability of Actor Network Theory (ANT) to model and interpret an electronic market. Creat. Knowl.-Based Organ., 109–126 (2004)

  • Baym, N., boyd, d.: Socially mediated publicness: an introduction. J. Broadcast. Electron. Media 56(3), 320–329 (2012)

  • Beer, D., Burrows, R.: Consumption, prosumption and participatory web cultures. J. Consum. Cult. 10(1), 3 (2010)

  • Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  • Bijker, W.E., Law, J.: Shaping Technology/Building Society: Studies in Sociotechnical Change. MIT Press, Cambridge (1992)

  • Bijker, W.E.: Of Bicycles, Bakelites, and Bulbs: Toward a Theory of Sociotechnical Change. MIT Press, Cambridge (1997)

  • Boase, J., et al.: The strength of internet ties. Pew Internet and American Life Project (2006)

  • boyd, d.: Social network sites: the role of networked publics in teenage social life. In: Buckingham, D. (ed.) Youth, Identity, and Digital Media. The John D. and Catherine T. MacArthur Foundation Series on Digital Media and Learning, pp. 119–142. The MIT Press, Cambridge (2008)

  • boyd, d.: The politics of ‘Real Names’: power, context, and control in networked publics. Commun. ACM 55(8), 29–31 (2012)

  • boyd, d., Crawford, K.: Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 15(5), 662–679 (2012)

  • Brynjolfsson, E., McAfee, A.: Big data’s management revolution. The Promise and Challenge of Big Data, Harvard Business Review Insight Center Report, September 11 (2012)

  • Callon, M.: Society in the making: the study of technology as a tool for sociological analysis. In: Bijker, W.E., Hughes, T.P., Pinch, T.J. (eds.) The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology, pp. 83–103. MIT Press, Cambridge (1987)

  • Castells, M.: Communication Power. Oxford University Press, New York (2009)

  • Clarke, A.E., Fujimura, J. (eds.): The Right Tools for the Job: At Work in Twentieth-Century Life Sciences. Princeton University Press, Princeton (1992)

  • Consalvo, M., Ess, C. (eds.): The Handbook of Internet Studies. Wiley, Oxford (2011)

  • Crampton, J.W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M., Wilson, M.W., Zook, M.: Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. Cartogr. Geogr. Inf. Sci. 40(2), 130–139 (2013)

  • Crabu, S.: Give us a protocol and we will rise a lab: the shaping of the infra-structuring objects. In: Mongili, A., Pellegrino, G. (eds.) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity, pp. 120–144. Cambridge Scholars Publishing, Cambridge (2014)

  • Cukier, K.: Data, data everywhere: a special report on managing information. The Economist, 25th February 2010 (2010)

  • Demchenko, Y., Ngo, C., Membrey, P.: Architecture framework and components for the big data ecosystem. J. Syst. Netw. Eng., 1–31 (2013)

  • De Paoli, Stefano and T., Maurizio: New Groups and New Methods? The Ethnography and Qualitative Research of Online Groups. Special issue, Etnografia e Ricerca Qualitativa 4(2) (2011)

  • Ellison, N.B., Steinfield, C., Lampe, C.: The benefits of Facebook friends: social capital and college students’ use of online social network sites. J. Comput. Mediat. Commun. 12(4), 1143–1168 (2007)

  • Elwood, S.: Geographic information science: emerging research on the societal implications of the geospatial web. Prog. Hum. Geogr. 34(3), 349–357 (2010)

  • Franks, B.: Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, vol. 56. John Wiley and Sons, New York (2012)

  • Fuchs, C.: Social Media: A Critical Introduction. Sage, London (2014)

  • Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009)

  • González-Bailón, S.: Social science in the era of big data. Policy Internet 5(2), 147–160 (2013)

  • Geser, H.: Towards a sociological theory of the mobile phone. In: Zerdick, A., Picot, A., Schrape, K., Burgelman, J.-C., Silverstone, R. (eds.) E-merging Media Communication and the Media Economy of the Future. Springer, Heidelberg (2005)

  • Giardullo, P., Lazzer, G.P.: Digital texts: writing and reading at the digital turning point. In: Mongili, A., Pellegrino, G. (eds.) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity, pp. 166–189. Cambridge Scholars Publishing, Cambridge (2014)

  • Hampton, K., Wellman, B.: Neighboring in Netville: how the Internet supports community and social capital in a wired suburb. City Community 2(4), 277–311 (2003)

  • Hale, S.A.: Global connectivity and multilinguals in the Twitter network. In: Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, pp. 833–842 (2014)

  • Hilbert, M.: How much information is there in the information society? Significance 9(4), 8–12 (2012)

  • Hilbert, M., López, P.: The world’s technological capacity to store, communicate, and compute information. Science 332(6025), 60–65 (2011)

  • Hilbert, M., López, P.: Info capacity: how to measure the world’s technological capacity to communicate, store and compute information? Part I: results and scope. Int. J. Commun. 6, 24 (2012)

  • Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM, New York (2010)

  • Latour, B.: Science in Action: How to Follow Scientists and Engineers Through Society. Harvard University Press, Cambridge (1987)

  • Latour, B.: Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press, New York (2005)

  • Law, J.: Technology and heterogeneous engineering: the case of Portuguese expansion. In: Bijker, W.E., Hughes, T.P., Pinch, T.J. (eds.) The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology, pp. 111–134. MIT Press, Cambridge (1987)

  • Lazer, D., Pentland, A.S., Adamic, L., Aral, S., Barabasi, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Van Alstyne, M.: Computational social science. Science 323(5915), 721–723 (2009)

  • Lazer, D., Kennedy, R., King, G., Vespignani, A.: Big data. The parable of Google Flu: traps in big data analysis. Science 343(6176), 1203–1205 (2014)

  • Lieberman, E.S.: Nested analysis as a mixed-method strategy for comparative research. Am. Polit. Sci. Rev. 99(3), 435–452 (2005)

  • Lin, N.: Social Capital: A Theory of Structure and Action. Cambridge University Press, London (2001)

  • Lotan, G., Graeff, E., Ananny, M., Gaffney, D., Pearce, I.: The Arab Spring: the revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 31 (2011)

  • Magaudda, P.: Hacking Practices and their Relevance for Consumer Studies: The Example of the ‘Jailbreaking’ of the iPhone. Consumers, Commodities and Consumption (2010)

  • Magaudda, P.: When materiality ‘bites back’: digital music consumption practices in the age of dematerialization. J. Consum. Cult. 11(1), 15–36 (2011)

  • Magaudda, P.: What happens to materiality in digital virtual consumption? In: Denegri-Knott, J., Molesworth, M. (eds.) Digital Virtual Consumption. Routledge, London (2013)

  • Manovich, L.: Trending: the promises and the challenges of Big Social Data. In: Gold, M.K. (ed.) Debates in the Digital Humanities. University of Minnesota Press, Minneapolis (2012a)

  • Manovich, L.: How to compare one million images. In: Berry, D.M. (ed.) Understanding Digital Humanities, pp. 249–278. Palgrave Macmillan, Basingstoke (2012b)

  • MacKenzie, D.: A sociology of algorithms: high-frequency trading, boundary work, and market configurations. Paper discussed at MaxPo Conference on Monday, March 3rd, SCOOPs Seminar MaxPo (2014)

  • McAfee, A., Brynjolfsson, E.: Big data: the management revolution. Harv. Bus. Rev. 90(10), 61–67 (2012)

  • McKinsey Global Institute, Big data: the next frontier for innovation, competition, and productivity, May 2011 http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation. Accessed 29 May 2014

  • Morgan, D.L.: Paradigms lost and pragmatism regained: methodological implications of combining qualitative and quantitative methods. J. Mixed Methods Res. 1(1), 48–76 (2007)

  • Neuhaus, F., Webmoor, T.: Agile ethics for massified research and visualisation. Inf. Commun. Soc. 15(1), 43–65 (2012)

  • Ohm, P.: The underwhelming benefits of big data. University of Pennsylvania Law Review 161, 339 (2013)

  • Omernick, E., Sood, S.O.: The impact of anonymity in online communities. In: 2013 International Conference on Social Computing (SocialCom), pp. 526–535. IEEE (2013)

  • Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008)

  • Pew Research Center: Older Adults and Technology Use. Available at: http://www.pewinternet.org/2014/04/03/older-adults-and-technology-use/ (2014). Accessed 1 Aug 2014

  • Pickering, A.: The Mangle of Practice: Time, Agency, and Science. University of Chicago Press, Chicago (2010)

  • Rogers, R.: Digital Methods. MIT Press, Cambridge (2013)

  • Rogers, R., Marres, N.: Landscaping climate change: a mapping technique for understanding science and technology debates on the World Wide Web. Public Underst. Sci. 9(2), 141–163 (2000)

  • Schwarz, O.: Good young nostalgia: camera phones and technologies of self among Israeli youth. J. Consum. Cult. 9(3), 348–376 (2009)

  • Schwarz, O.: Going to bed with a camera: on the visualization of sexuality and the production of knowledge. Int. J. Cult. Stud. 13(6), 637–656 (2010)

  • Smith, A., Brenner, J.: Twitter use 2012. Pew Internet and American Life Project, 4. http://pewinternet.org/Reports/2012/Twitter-Use-2012.aspx (2012). Accessed 9 Aug 2014

  • Snijders, C., Matzat, U., Reips, U.: Big data: big gaps of knowledge in the field of internet science. Int. J. Internet Sci. 7(1), 1–5 (2012)

  • Tinati, R. et al.: Big data: methodological challenges and approaches for sociological analysis. Sociology 1–19 (2014)

  • Tufekci, Z.: Big questions for social media big data: representativeness, validity and other methodological pitfalls. In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (2014)

  • Twitter: One hundred million voices, Twitter blog, http://blog.twitter.com/2011/09/one-hundred-million-voices.html (2011). Accessed 12 Sep 2011

  • Venturini, T.: Diving in magma: How to explore controversies with actor-network theory. Public Underst. Sci. 19(3), 258–273 (2010)

  • Wellman, B.: Designing the Internet for a networked society. Commun. ACM 45(5), 91–96 (2002)

  • Wellman, B.: The three ages of internet studies: ten, five and zero years ago. New Media Soc. 6(1), 123–129 (2004)

  • Wellman, B.: Studying the internet through the ages. In: Consalvo, M., Ess, C. (eds.) The Handbook of Internet Studies. Wiley, Oxford (2011)

  • Wellman, B., Gulia, M.: Net surfers don’t ride alone: virtual communities as communities. Netw. Glob. Village 331–366 (1999)

  • Zikopoulos, P., Eaton, C.: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media, New York (2011)

Author information

Corresponding author

Correspondence to Paolo Giardullo.

Cite this article

Giardullo, P. Does ‘bigger’ mean ‘better’? Pitfalls and shortcuts associated with big data for social research. Qual Quant 50, 529–547 (2016). https://doi.org/10.1007/s11135-015-0162-8
