Abstract
We present preliminary findings from a three-year research project comprising longitudinal qualitative case studies of data practices in four large, distributed, highly multidisciplinary scientific collaborations. This project follows a \(2 \times 2\) research design: two of the collaborations are big science and two are little science; two have completed data collection and two are ramping up data collection. This paper centers on one of these collaborations, a project that brings together scientists to study subseafloor microbial life. This collaboration is little science, characterized by small teams using small amounts of data to address specific questions. Our case study employs participant observation in a laboratory, interviews (\(n=49\) to date) with scientists in the collaboration, and document analysis. We present a data workflow that is typical for many of the scientists working in the observed laboratory. In particular, we show that, although this workflow results in datasets that are apparently similar in form, the methods that scientists in this laboratory employ to produce these datasets are highly heterogeneous, even between scientists working on adjacent benches. To date, most studies of data in little science have focused on heterogeneity in the types of data produced; this paper adds another dimension of heterogeneity to existing knowledge about data in little science. This additional dimension complicates the management and curation of data for subsequent reuse. Furthermore, the nature of the factors that contribute to heterogeneity of methods suggests that this dimension of heterogeneity is a persistent and unavoidable feature of little science.
Acknowledgments
The work in this paper has been supported by the Sloan Foundation Award #20113194, The Transformation of Knowledge, Culture and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective. We also acknowledge the contributions of Milena Golshan, Irene Pasquetto, and Laura A. Wynholds for commenting on drafts of this paper, and Elaine Levia for technical and administrative support.
Cite this article
Darch, P.T., Borgman, C.L., Traweek, S. et al. What lies beneath?: Knowledge infrastructures in the subseafloor biosphere and beyond. Int J Digit Libr 16, 61–77 (2015). https://doi.org/10.1007/s00799-015-0137-3
DOI: https://doi.org/10.1007/s00799-015-0137-3