Archival Science

Volume 12, Issue 3, pp 319–339

Provenance and credibility in scientific data repositories

  • Kathleen Fear
  • Devan Ray Donaldson
Original Paper


Despite a long history of rich theoretical work on provenance, empirical research on how users interact with, and make judgments based on, provenance information in archives of scientific data is extremely limited. This article focuses on the relationship between provenance and credibility (i.e., trustworthiness and expertise) for scientists. To this end, the authors conducted semi-structured interviews with seventeen proteomics researchers who interact with data from a large online repository. To analyze the resulting interview data, the authors applied Brian Hilligoss and Soo Young Rieh’s empirically tested theoretical framework for user credibility assessment. Findings from this study suggest that, together with other information provided in the repository and subjects’ own experiences and prior knowledge, provenance allows users to determine the credibility of datasets. The implications of this study stress the importance of the archival perspective on provenance and the archival bond in aiding scientists’ credibility assessments of data housed in scientific data repositories.


Provenance, Credibility, Scientific data, Metadata



This material is based upon work supported by the National Science Foundation under Grant No. 090362. The authors would like to thank Ann Zimmerman and Margaret Hedstrom for their guidance on the development of this project, Elizabeth Yakel and members of the Archives Research Group for their feedback on earlier drafts of this paper, and Philip Andrews and the repository staff for their help and support.


  1. Bazeley P (2007) Qualitative data analysis with NVivo. Sage, Los Angeles
  2. Bearman DA, Lytle RH (1985) The power of the principle of provenance. Archivaria 21:14–27. Accessed 28 July 2011
  3. Bertino E, Dai C, Kantarcioglu M (2009) The challenge of assuring data trustworthiness. In: Proceedings of the 14th International Conference on Database Systems for Advanced Applications. doi: 10.1007/978-3-642-00887-0_2
  4. Bose R, Frew J (2005) Lineage retrieval for scientific data processing: a survey. ACM Comput Surv 37(1):1–28. doi: 10.1145/1057977.1057978
  5. Bowers S, McPhillips T, Ludäscher B, Cohen S, Davidson SB (2006) A model for user-oriented data provenance in pipelined scientific workflows. In: Proceedings of the International Provenance and Annotation Workshop. Accessed 28 July 2011
  6. Bowker GC (2005) Memory practices in the sciences. Inside technology. MIT Press, Cambridge, MA
  7. Brothman B (1991) Orders of value: probing the theoretical terms of archival practice. Archivaria 32:78–100. Accessed 28 July 2011
  8. Buneman P, Khanna S, Tan WC (2001) Why and where: a characterization of data provenance. In: Proceedings of the 8th International Conference on Database Theory. Accessed 28 July 2011
  9. Buneman P, Chapman A, Cheney J (2006) Provenance management in curated databases. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. doi: 10.1145/1142473.1142534
  10. Caplan P (2009) Understanding PREMIS. Accessed 28 July 2011
  11. Cook T (1993) The concept of the archival fonds: theory, description, and provenance in the post-custodial era. Archivaria 35:24–37. Accessed 28 July 2011
  12. Cook T (2001) Archival science and postmodernism: new formulations for old concepts. Arch Sci 1:3–24. doi: 10.1007/BF02435636
  13. Corti L (2007) Re-using archived qualitative data—where, how and why? Arch Sci 7:37–54. doi: 10.1007/s10502-006-9038-y
  14. Dai C, Lin D, Bertino E, Kantarcioglu M (2008) An approach to evaluate data trustworthiness based on data provenance. In: Jonker W, Petković M (eds) Secure data management. Lecture notes in computer science 5159:82–89. doi: 10.1007/978-3-540-85259-9_6
  15. Duranti L (1997) The archival bond. Arch Mus Inform 11:213–218. doi: 10.1023/A:1009025127463
  16. Duranti L (2001) The impact of digital technology on archival science. Arch Sci 1:39–55. doi: 10.1007/BF02435638
  17. Greenwood M, Goble C, Stevens R, Zhao J, Addis M, Marvin D, Moreau L et al (2003) Proceedings of the UK e-Science All Hands Meeting. doi:
  18. Heinis T, Alonso G (2008) Efficient lineage tracking for scientific workflows. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. doi: 10.1145/1376616.1376716
  19. Hilligoss B, Rieh SY (2008) Developing a unifying framework of credibility assessment: construct, heuristics, and interaction in context. Inform Process Manag 44(4):1467–1484. doi: 10.1016/j.ipm.2007.10.001
  20. Lauriault T, Craig B, Taylor D, Pulsifer P (2007) Today’s data are part of tomorrow’s research: archival issues in the sciences. Archivaria 64:123–179. Accessed 28 July 2011
  21. PREMIS Editorial Committee (2008) PREMIS data dictionary for preservation metadata version 2.0. Library of Congress, Washington, DC. Accessed 28 July 2011
  22. Rieh SY (2002) Judgment of information quality and cognitive authority in the Web. J Am Soc Inform Sci Technol 53(2):145. doi: 10.1002/asi.10017.abs
  23. Rodriguez H, Andrews P, Kinsinger C (2010) Share the (Proteomics) data. Bio-IT World, (September–October 2010). Accessed 28 July 2011
  24. Shankar K (2007) Order from chaos: the poetics and pragmatics of scientific recordkeeping. J Am Soc Inform Sci Technol 58(10):1457–1466. doi: 10.1002/asi.20625
  25. Simmhan YL, Plale B, Gannon D (2005) A survey of data provenance in e-science. ACM SIGMOD Rec 34(3):31. doi: 10.1145/1084805.1084812
  26. Smit E (2011) Abelard and Héloise: why data and publications belong together. D-Lib Mag 17(1/2). doi: 10.1045/january2011-smit
  27. Society of American Archivists (2004) Describing archives: a content standard. Society of American Archivists, Chicago, IL
  28. Taylor CF, Paton NW, Lilley KS et al (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25:887–893. doi: 10.1038/nbt1329
  29. Van House NA (2002) Digital libraries and practices of trust: networked biodiversity information. Soc Epistemol 16(1):99–114. doi: 10.1080/02691720210132833
  30. Vardigan M, Whiteman C (2007) ICPSR meets OAIS: applying the OAIS reference model to the social science archive context. Arch Sci 7:73–87. doi: 10.1007/s10502-006-9037-z
  31. Zimmerman AS (2008) New knowledge from old data: the role of standards in the sharing and reuse of ecological data. Sci Technol Human Values 33(5):631–652. doi: 10.1177/0162243907306704

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. School of Information, University of Michigan, Ann Arbor, USA
