GeoJournal, Volume 80, Issue 4, pp 463–475

Small data in the era of big data

Abstract

Academic knowledge building has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions. It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds. This approach is presently being challenged by the development of big data. We argue, however, that small data studies will continue to be popular and valuable in the future because of their utility in answering targeted queries. Importantly, small data will increasingly be made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and reuse, and open them up to combination with big data and analysis using big data analytics. This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples.

Keywords

Big data; Small data; Data infrastructures; Cyber-infrastructures; Ontology; Epistemology

Notes

Acknowledgments

The research conducted for this paper was made possible with funding from the European Research Council (ERC-2012-AdG-323636) and Science Foundation Ireland.

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. NIRSA, National University of Ireland Maynooth, County Kildare, Ireland