Computer Supported Cooperative Work (CSCW), Volume 21, Issue 6, pp. 485–523

Who’s Got the Data? Interdependencies in Science and Technology Collaborations

  • Christine L. Borgman
  • Jillian C. Wallis
  • Matthew S. Mayernik

Abstract

Science and technology have always been interdependent, but never more so than with today’s highly instrumented data collection practices. We report on a long-term study of collaboration between environmental scientists (biology, ecology, marine sciences), computer scientists, and engineering research teams as part of a five-university distributed science and technology research center devoted to embedded networked sensing. The science and technology teams go into the field with mutual interests in gathering scientific data. “Data” are constituted very differently by the research teams. What are data to the science teams may be context to the technology teams, and vice versa. Interdependencies between the teams determine the ability to collect, use, and manage data in both the short and long terms. We identified four types of data, each managed separately, which limits both the reusability of data and the replication of research. Decisions on what data to curate, for whom, for what purposes, and for how long should consider the interdependencies between scientific and technical processes, the complexities of data collection, and the disposition of the resulting data.

Key words

cyberinfrastructure, data curation, data practices, eScience, scientific collaboration, scientific software development, technology research, sensor networks, environmental sciences

Copyright information

© Springer 2012

Authors and Affiliations

  • Christine L. Borgman (1)
  • Jillian C. Wallis (1)
  • Matthew S. Mayernik (2)
  1. Department of Information Studies and Center for Embedded Networked Sensing, University of California, Los Angeles, Los Angeles, USA
  2. National Center for Atmospheric Research, Boulder, USA