Abstract
Science and technology have always been interdependent, but never more so than with today’s highly instrumented data collection practices. We report on a long-term study of collaboration between environmental scientists (biology, ecology, marine sciences), computer scientists, and engineering research teams as part of a five-university distributed science and technology research center devoted to embedded networked sensing. The science and technology teams go into the field with mutual interests in gathering scientific data. “Data” are constituted very differently by the research teams: what are data to the science teams may be context to the technology teams, and vice versa. Interdependencies between the teams determine the ability to collect, use, and manage data in both the short and long term. We identified four types of data, which are managed separately, limiting both the reusability of data and the replication of research. Decisions on what data to curate, for whom, for what purposes, and for how long should consider the interdependencies between scientific and technical processes, the complexities of data collection, and the disposition of the resulting data.
Acknowledgements
Research reported here is supported in part by grants from the National Science Foundation (NSF): (1) The Center for Embedded Networked Sensing (CENS) is funded by NSF Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator; (2) CENS Education Infrastructure (CENSEI), under which much of this research was conducted, is funded by NSF grant #ESI-0352572, William A. Sandoval, Principal Investigator, and Christine L. Borgman, co-Principal Investigator; (3) Towards a Virtual Organization for Data Cyberinfrastructure, #OCI-0750529, C.L. Borgman, UCLA, PI; G. Bowker, Santa Clara University, Co-PI; Thomas Finholt, University of Michigan, Co-PI; (4) Monitoring, Modeling & Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures, #0827322, P.N. Edwards, UM, PI; Co-PIs C.L. Borgman, UCLA; G. Bowker, SCU; T. Finholt, UM; S. Jackson, UM; D. Ribes, Georgetown; S.L. Star, SCU.
We are also grateful to Microsoft Technical Computing and External Research for gifts in support of this research program. The authors would also like to thank Archer Batcheller, David Fearon, George Mood, Alberto Pepe, Katie Shilton, Elizabeth Rolando, and Laura Wynholds for their thoughtful comments on prior drafts of this paper.
Cite this article
Borgman, C.L., Wallis, J.C. & Mayernik, M.S. Who’s Got the Data? Interdependencies in Science and Technology Collaborations. Comput Supported Coop Work 21, 485–523 (2012). https://doi.org/10.1007/s10606-012-9169-z
DOI: https://doi.org/10.1007/s10606-012-9169-z