Who’s Got the Data? Interdependencies in Science and Technology Collaborations
Science and technology always have been interdependent, but never more so than with today’s highly instrumented data collection practices. We report on a long-term study of collaboration between environmental scientists (biology, ecology, marine sciences), computer scientists, and engineering research teams as part of a five-university distributed science and technology research center devoted to embedded networked sensing. The science and technology teams go into the field with mutual interests in gathering scientific data. “Data” are constituted very differently between the research teams. What are data to the science teams may be context to the technology teams, and vice versa. Interdependencies between the teams determine the ability to collect, use, and manage data in both the short and long terms. Four types of data were identified, which are managed separately, limiting both reusability of data and replication of research. Decisions on what data to curate, for whom, for what purposes, and for how long, should consider the interdependencies between scientific and technical processes, the complexities of data collection, and the disposition of the resulting data.
Key words: cyberinfrastructure, data curation, data practices, escience, scientific collaboration, scientific software development, technology research, sensor networks, environmental sciences
Research reported here is supported in part by grants from the National Science Foundation (NSF): (1) The Center for Embedded Networked Sensing (CENS) is funded by NSF Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator; (2) CENS Education Infrastructure (CENSEI), under which much of this research was conducted, is funded by NSF grant #ESI-0352572, William A. Sandoval, Principal Investigator, and Christine L. Borgman, co-Principal Investigator; (3) Towards a Virtual Organization for Data Cyberinfrastructure, #OCI-0750529, C.L. Borgman, UCLA, PI; G. Bowker, Santa Clara University, Co-PI; Thomas Finholt, University of Michigan, Co-PI; (4) Monitoring, Modeling & Memory: Dynamics of Data and Knowledge in Scientific Cyberinfrastructures, #0827322, P.N. Edwards, UM, PI; Co-PIs C.L. Borgman, UCLA; G. Bowker, SCU; T. Finholt, UM; S. Jackson, UM; D. Ribes, Georgetown; S.L. Star, SCU.
We also are grateful to Microsoft Technical Computing and External Research for gifts in support of this research program. The authors would also like to thank Archer Batcheller, David Fearon, George Mood, Alberto Pepe, Katie Shilton, Elizabeth Rolando, and Laura Wynholds for their thoughtful comments on prior drafts of this paper.