Academic knowledge building has progressed for the past few centuries using small data studies, characterized by sampled data generated to answer specific questions. This strategy has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds. The approach is presently being challenged by the development of big data. We argue, however, that small data studies will continue to be popular and valuable in the future because of their utility in answering targeted queries. Importantly, small data will increasingly be made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and reuse, and open them up to combination with big data and analysis using big data analytics. This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples.
Keywords: Big data; Small data; Data infrastructures; Cyber-infrastructures; Ontology; Epistemology
The research conducted for this paper was made possible with funding from the European Research Council (ERC-2012-AdG-323636) and Science Foundation Ireland.