Abstract
‘Big data is here to stay.’ This key statement has a double value: it is both an assumption and the reason why a theoretical reflection is needed. Furthermore, Big data is gaining visibility and success even in the social sciences, overcoming the division between the humanities and computer science. In this contribution, some considerations on the presence and the certain persistence of Big data as a socio-technical assemblage will be outlined. The intriguing opportunities for social research arising from this interaction between practices and technological development will then be discussed. However, despite a promissory rhetoric fostered by several scholars since Big data emerged as a labelled concept, some risks are just around the corner. Claims about the methodological power of ever bigger datasets, as well as the increasing speed of data collection and analysis, are creating real hype in social research. Particular attention is needed to avoid some pitfalls. These risks will be analysed with regard to the validity of research results obtained through Big data. After a pars destruens, this contribution will conclude with a pars construens: building on the previous critiques, a mixed methods research design approach will be described as a general proposal, with the aim of stimulating a debate on the integration of Big data into complex research projects.
Notes
The book title, not by chance, is: “Taming the Big Data Tidal Wave”.
Internet World Stats, usage and population statistics, accessed on 5 June 2014.
Comparison between the 2009 and 2012 editions of “Internet use in households and by individuals”; data from Eurostat, Statistics in Focus.
Or five if we include value and veracity (Demchenko et al. 2013), but for the sake of brevity these two will not be considered in this paper.
According to Hilbert and López (2011), communication capacity is certainly growing, but they acknowledge that “(...) the digital age increased our capacity to store information [...] much more than our capacity to transmit information through broadcast and telecommunication networks.” This element will be analysed further later on as one of the main critical issues of BD.
In his chapter in the book he co-edited, Bijker analysed the fluorescent lamp; later he dealt with Bakelite, bicycles and incandescent bulbs (1997).
http://google.com/patents/US6317722. Retrieved on 12 February 2014.
An attempt to apply the ANT approach to describing the rise of an electronic market, in the case of EUREX, was made by Baygeldi and Smithson (2004).
Or ‘ecology’ in the sense used by Abbott (2005). In contrast with deterministic concepts such as mechanisms and organisms, he argued that “When we call a set of social relations an ecology, we mean that it is best understood in terms of interactions between multiple elements that are neither fully constrained nor fully independent” (p. 248).
As defined in RFC 6265 of the Internet Engineering Task Force (IETF), available at http://tools.ietf.org/html/rfc6265#section-3. Accessed on 9 August 2014.
Eric Schmidt’s speech to the Association of National Advertisers, 2005, retrieved from Google Press on 21 May 2014: http://www.google.com/press/podium/ana.html.
Brynjolfsson and McAfee, “Big data’s management revolution”, in The Promise and Challenge of Big Data, Harvard Business Review Insight Center Report, September 11, 2012.
In computer programming, an Application Programming Interface (API) is a software interface that regulates the interaction between pieces of software. In this case, APIs are interfaces that allow different web applications, such as social network sites, to interact with one another.
Ginsberg et al. (2009) Detecting influenza epidemics using search engine query data, Nature vol. 457, 19 February 2009.
Rogers in fact quotes Mike Thelwall, an established webometrician, who said that the challenge “is to demonstrate the web data correlate significantly with some non web data in order to prove that web data are not wholly random” (p. 205).
A global map based on Twitter data, produced by Eric Fischer, is available here: https://www.mapbox.com/labs/twitter-gnip/locals/#.
The well-known way of labelling content in order to ensure that it is associated with a specific object, topic or event.
Since 2010, 53 papers (57.6 % of the total) published in Sage’s Social Science Computer Review have based their empirical data on Twitter (site last consulted on 17 July 2014). As Zeynep Tufekci noted (2014), at the 2013 International Conference on Weblogs and Social Media (ICWSM) around thirty papers (half of all accepted papers) were based on some kind of Twitter analysis.
It should be noted that computer science applied to content analysis is advancing very fast. Information retrieval is a cutting-edge field of research in informatics, and some of its key articles concern machine learning and sentiment analysis. For the former, a key reference is Blei et al. (2003) on LDA (Latent Dirichlet Allocation), which makes it possible to identify automatically, on a probabilistic basis, the topics contained in a large corpus of texts; for the latter, Pang and Lee (2008) on sentiment analysis, a way to automatically estimate the orientation of the opinions that texts express about a certain topic.
A true blend of social sciences and informatics might well be a new frontier for higher education, but it would raise another kind of problem in the academic world.
References
Abbott, A.: Linked ecologies: states and universities as environments for professions. Sociol. Theory 23(3), 245–274 (2005)
Akrich, M.: The description of technical objects. In: Bijker, W.E., Law, J. (eds.) Shaping Technology/Building Society: Studies in Sociotechnical Change, pp. 205–224. MIT Press, Cambridge (1992)
Anderson, C.: The end of theory. Wired Mag. 16 (2008)
Bainbridge, W.S.: The scientific research potential of virtual worlds. Science 317(5837), 472–476 (2007)
Barland, M.: Big Data Special Report. Data-driven decision-making: interpreting the digital exhaust of everybody, everywhere. Volta: Science, Technology and Society in Europe 2013(5), 6–14 (2013)
Baygeldi, M., Smithson, S.: The ability of Actor Network Theory (ANT) to model and interpret an electronic market. Creat. Knowl.-Based Organ., 109–126 (2004)
Baym, N., boyd, d.: Socially mediated publicness: an introduction. J. Broadcast. Electron. Media 56(3), 320–329 (2012)
Beer, D., Burrows, R.: Consumption, prosumption and participatory web cultures. J. Consum. Cult. 10(1), 3 (2010)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Bijker, W.E., Law, J.: Shaping Technology/Building Society: Studies in Sociotechnical Change. MIT Press, Cambridge (1992)
Bijker, W.E.: Of Bicycles, Bakelites, and Bulbs: Toward a Theory of Sociotechnical Change. MIT Press, Cambridge (1997)
Boase, J., et al.: The strength of internet ties. Pew Internet and American Life Project (2006)
boyd, d.: Social network sites: the role of networked publics in teenage social life. In: Buckingham, D. (ed.) Youth, Identity, and Digital Media. The John D. and Catherine T. MacArthur Foundation Series on Digital Media and Learning, pp. 119–142. MIT Press, Cambridge (2008)
boyd, d.: The politics of ‘real names’: power, context, and control in networked publics. Commun. ACM 55(8), 29–31 (2012)
boyd, d., Crawford, K.: Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 15(5), 662–679 (2012)
Brynjolfsson, E., McAfee, A.: Big data’s management revolution. In: The Promise and Challenge of Big Data. Harvard Business Review Insight Center Report, September 11 (2012)
Callon, M.: Society in the making: the study of technology as a tool for sociological analysis. In: Bijker, W.E., Hughes, T.P., Pinch, T.J. (eds.) The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology, pp. 83–103. MIT Press, Cambridge (1987)
Castells, M.: Communication Power. Oxford University Press, New York (2009)
Clarke, A.E., Fujimura, J. (eds.): The Right Tools for the Job: At Work in Twentieth-Century Life Sciences. Princeton University Press, Princeton (1992)
Consalvo, M., Ess, C. (eds.): The Handbook of Internet Studies. Wiley, Oxford (2011)
Crampton, J.W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M., Wilson, M.W., Zook, M.: Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. Cartogr. Geogr. Inf. Sci. 40(2), 130–139 (2013)
Crabu, S.: Give us a protocol and we will rise a lab, The shaping of the infra-structuring objects. In: Mongili, A., Pellegrino, G. (eds.) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity, pp. 120–144. Cambridge Scholars Publishing, Cambridge (2014)
Cukier, K.: Data, data everywhere: a special report on managing information. The Economist, 25th February 2010 (2010)
Demchenko, Y., Ngo, C., Membrey, P.: Architecture framework and components for the big data ecosystem. J. Syst. Netw. Eng., 1–31 (2013)
De Paoli, S., Teli, M.: New groups and new methods? The ethnography and qualitative research of online groups. Special issue, Etnografia e Ricerca Qualitativa 4(2) (2011)
Ellison, N.B., Steinfield, C., Lampe, C.: The benefits of Facebook friends: social capital and college students’ use of online social network sites. J. Comput. Mediat. Commun. 12(4), 1143–1168 (2007)
Elwood, S.: Geographic information science: emerging research on the societal implications of the geospatial web. Prog. Hum. Geogr. 34(3), 349–357 (2010)
Franks, B.: Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, vol. 56. John Wiley and Sons, New York (2012)
Fuchs, C.: Social Media: A Critical Introduction. Sage, London (2014)
Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009)
González-Bailón, S.: Social science in the era of big data. Policy Internet 5(2), 147–160 (2013)
Geser, H.: Towards a sociological theory of the mobile phone. In: Zerdick, A., Picot, A., Schrape, K., Burgelman, J.-C., Silverstone, R. (eds.) E-merging Media Communication and the Media Economy of the Future. Springer, Heidelberg (2005)
Giardullo, P., Lazzer, G.P.: Digital texts: writing and reading at the digital turning point. In: Mongili, A., Pellegrino, G. (eds.) Information Infrastructure(s): Boundaries, Ecologies, Multiplicity, pp. 166–189. Cambridge Scholars Publishing, Cambridge (2014)
Hampton, K., Wellman, B.: Neighboring in Netville: how the Internet supports community and social capital in a wired suburb. City Community 2(4), 277–311 (2003)
Hale, S.A.: Global connectivity and multilinguals in the Twitter network. In: Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, pp. 833–842 (2014)
Hilbert, M.: How much information is there in the information society? Significance 9(4), 8–12 (2012)
Hilbert, M., López, P.: The world’s technological capacity to store, communicate, and compute information. Science 332(6025), 60–65 (2011)
Hilbert, M., López, P.: Info capacity: how to measure the world’s technological capacity to communicate, store and compute information? Part I: results and scope. Int. J. Commun. 6, 24 (2012)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM, New York (2010)
Latour, B.: Science in Action: How to Follow Scientists and Engineers through Society. Harvard University Press, Cambridge (1987)
Latour, B.: Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press, New York (2005)
Law, J.: Technology and heterogeneous engineering: the case of Portuguese expansion. In: Bijker, W.E., Hughes, T.P., Pinch, T.J. (eds.) The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology, pp. 111–134. MIT Press, Cambridge (1987)
Lazer, D., Pentland, A.S., Adamic, L., Aral, S., Barabasi, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Van Alstyne, M.: Computational social science. Science 323(5915), 721–723 (2009)
Lazer, D., Kennedy, R., King, G., Vespignani, A.: The parable of Google Flu: traps in big data analysis. Science 343(6176), 1203–1205 (2014)
Lieberman, E.S.: Nested analysis as a mixed-method strategy for comparative research. Am. Polit. Sci. Rev. 99(3), 435–452 (2005)
Lin, N.: Social Capital: A Theory of Structure and Action. Cambridge University Press, London (2001)
Lotan, G., Graeff, E., Ananny, M., Gaffney, D., Pearce, I.: The Arab Spring| the revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 31 (2011)
Magaudda, P.: Hacking Practices and their Relevance for Consumer Studies: The Example of the ‘Jailbreaking’ of the iPhone. Consumers, Commodities and Consumption (2010)
Magaudda, P.: When materiality ‘bites back’: digital music consumption practices in the age of dematerialization. J. Consum. Cult. 11(1), 15–36 (2011)
Magaudda, P.: What happens to materiality in digital virtual consumption? In: Denegri-Knott, J., Molesworth, M. (eds.) Digital Virtual Consumption. Routledge, London (2013)
Manovich, L.: Trending: the promises and the challenges of Big Social Data. In: Gold, M.K. (ed.) Debates in the Digital Humanities. University of Minnesota Press, Minneapolis (2012a)
Manovich, L.: How to compare one million images. In: Berry, D.M. (ed.) Understanding Digital Humanities, pp. 249–278. Palgrave Macmillan, Basingstoke (2012b)
MacKenzie, D.: A sociology of algorithms: high-frequency trading, boundary work, and market configurations. Paper discussed at the SCOOPs Seminar, MaxPo, March 3 (2014)
McAfee, A., Brynjolfsson, E.: Big data: the management revolution. Harv. Bus. Rev. 90(10), 61–67 (2012)
McKinsey Global Institute, Big data: the next frontier for innovation, competition, and productivity, May 2011 http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation. Accessed 29 May 2014
Morgan, D.L.: Paradigms lost and pragmatism regained: methodological implications of combining qualitative and quantitative methods. J. Mixed Methods Res. 1(1), 48–76 (2007)
Neuhaus, F., Webmoor, T.: Agile ethics for massified research and visualisation. Inf. Commun. Soc. 15(1), 43–65 (2012)
Ohm, P.: The underwhelming benefits of big data. Univ. Pa. Law Rev. 161, 339 (2013)
Omernick, E., Sood, S.O.: The impact of anonymity in online communities. In: 2013 International Conference on Social Computing (SocialCom), pp. 526–535. IEEE (2013)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1–2), 1–135 (2008)
Pew Research Center: Older adults and technology use. http://www.pewinternet.org/2014/04/03/older-adults-and-technology-use/ (2014). Accessed 1 Aug 2014
Pickering, A.: The Mangle of Practice: Time, Agency, and Science. University of Chicago Press, Chicago (2010)
Rogers, R.: Digital Methods. MIT Press, Cambridge (2013)
Rogers, R., Marres, N.: Landscaping climate change: a mapping technique for understanding science and technology debates on the World Wide Web. Public Underst. Sci. 9(2), 141–163 (2000)
Schwarz, O.: Good young nostalgia: camera phones and technologies of self among Israeli youth. J. Consum. Cult. 9(3), 348–376 (2009)
Schwarz, O.: Going to bed with a camera: on the visualization of sexuality and the production of knowledge. Int. J. Cult. Stud. 13(6), 637–656 (2010)
Smith, A., Brenner, J.: Twitter use 2012. Pew Internet and American Life Project, 4. http://pewinternet.org/Reports/2012/Twitter-Use-2012.aspx (2012). Accessed 9 Aug 2014
Snijders, C., Matzat, U., Reips, U.: Big data: big gaps of knowledge in the field of internet science. Int. J. Internet Sci. 7(1), 1–5 (2012)
Tinati, R. et al.: Big data: methodological challenges and approaches for sociological analysis. Sociology 1–19 (2014)
Tufekci, Z.: Big questions for social media big data: representativeness, validity and other methodological pitfalls. In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (2014)
Twitter: One hundred million voices, Twitter blog, http://blog.twitter.com/2011/09/one-hundred-million-voices.html (2011). Accessed 12 Sep 2011
Venturini, T.: Diving in magma: How to explore controversies with actor-network theory. Public Underst. Sci. 19(3), 258–273 (2010)
Wellman, B.: Designing the Internet for a networked society. Commun. ACM 45(5), 91–96 (2002)
Wellman, B.: The three ages of internet studies: ten, five and zero years ago. New Media Soc. 6(1), 123–129 (2004)
Wellman, B.: Studying the internet through the ages. In: Consalvo, M., Ess, C. (eds.) The Handbook of Internet Studies. Wiley, Oxford (2011)
Wellman, B., Gulia, M.: Net surfers don’t ride alone: virtual communities as communities. In: Networks in the Global Village, pp. 331–366 (1999)
Zikopoulos, P., Eaton, C.: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media, New York (2011)
Cite this article
Giardullo, P. Does ‘bigger’ mean ‘better’? Pitfalls and shortcuts associated with big data for social research. Qual Quant 50, 529–547 (2016). https://doi.org/10.1007/s11135-015-0162-8
DOI: https://doi.org/10.1007/s11135-015-0162-8