Surfacing Data Change in Scientific Work

  • Drew Paine
  • Lavanya Ramakrishnan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11420)

Abstract

Data are essential products of scientific work that move among and through research infrastructures over time. Data constantly change due to evolving practices and knowledge, requiring improvisational work by scientists to determine the effects on analyses. Today, for end users of datasets, much of the information about changes, and the processes leading to them, is invisible, embedded elsewhere in the work of a collaboration. At the same time, scientists use increasing quantities of data, making ad hoc approaches to identifying change difficult to scale effectively. Our research investigates data change by examining how scientists make sense of change in datasets being created and sustained by the collaborative infrastructures they engage with. We examine two forms of change, then consider how trust and project rhythms influence a scientist’s notion that the newest available data are the best. We explore the opportunity to design tools and practices to support user examinations of data change and surface key provenance information embedded in research infrastructures.

Keywords

Data change, Invisible work, Research infrastructures

Notes

Acknowledgements

The authors thank the members of the Deduce project, the study participants, and the anonymous reviewers of this work. This work is supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR) under Contract No. DE-AC02-05CH11231.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Data Science and Technology Department, Lawrence Berkeley National Laboratory, Berkeley, USA