Encoding Provenance Metadata for Social Science Datasets

  • Carl Lagoze
  • Jeremy Williams
  • Lars Vilhuber
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 390)


Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source data sets and to reproduce, and thereby validate, results. Provenance has become even more critical in the web environment, in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically rich, interoperable, and web-compatible provenance metadata. We apply that model to complex, but characteristic, provenance examples of social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a way to encode this provenance metadata within the widely used DDI metadata standard.
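
As a rough illustration of the kind of description the PROV model supports, the following sketch records a small derivation chain and prints it as PROV-N-style statements. It is not the encoding proposed in the paper: the `ProvDocument` class, the `ex:` identifiers, and the data sets and activities named here are all hypothetical, chosen only to show how entities, activities, and derivation relations fit together.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Tiny in-memory store for PROV-style statements (illustrative only)."""
    statements: list = field(default_factory=list)

    def add(self, kind, *args):
        # Each statement is stored as (relation name, arg1, arg2, ...).
        self.statements.append((kind, *args))

    def sources_of(self, ident):
        """Return every entity that `ident` derives from, directly or transitively."""
        found, stack = set(), [ident]
        while stack:
            current = stack.pop()
            for kind, *args in self.statements:
                if kind == "wasDerivedFrom" and args[0] == current and args[1] not in found:
                    found.add(args[1])
                    stack.append(args[1])
        return found

    def to_prov_n(self):
        """Serialize the statements, one PROV-N-style expression per line."""
        return "\n".join(f"{kind}({', '.join(args)})" for kind, *args in self.statements)


# Hypothetical example: a raw survey file is cleaned, then a public extract is produced.
doc = ProvDocument()
for name in ("ex:raw_survey", "ex:cleaned_survey", "ex:public_extract"):
    doc.add("entity", name)
for name in ("ex:cleaning", "ex:disclosure_review"):
    doc.add("activity", name)

doc.add("used", "ex:cleaning", "ex:raw_survey")
doc.add("wasGeneratedBy", "ex:cleaned_survey", "ex:cleaning")
doc.add("wasDerivedFrom", "ex:cleaned_survey", "ex:raw_survey")

doc.add("used", "ex:disclosure_review", "ex:cleaned_survey")
doc.add("wasGeneratedBy", "ex:public_extract", "ex:disclosure_review")
doc.add("wasDerivedFrom", "ex:public_extract", "ex:cleaned_survey")

print(doc.to_prov_n())
print(sorted(doc.sources_of("ex:public_extract")))
```

A query such as `sources_of` mirrors the scholarly use the abstract describes: given a derived data set, a researcher can trace back to the source data sets whose integrity they need to evaluate.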


Keywords: Metadata, Provenance, DDI, eSocial Science





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Carl Lagoze
    • 1
  • Jeremy Williams
    • 2
  • Lars Vilhuber
    • 3
  1. School of Information, University of Michigan, Ann Arbor, USA
  2. Cornell Institute for Social and Economic Research, Cornell University, Ithaca, USA
  3. School of Industrial and Labor Relations, Cornell University, Ithaca, USA
