Key components of data publishing: using current best practices to develop a reference model for data publishing

  • Claire C. Austin
  • Theodora Bloom
  • Sünje Dallmeier-Tiessen
  • Varsha K. Khodiyar
  • Fiona Murphy
  • Amy Nurnberger
  • Lisa Raymond
  • Martina Stockhause
  • Jonathan Tedds
  • Mary Vardigan
  • Angus Whyte

Abstract

The availability of workflows for data publishing could have an enormous impact on researchers, research practices and publishing paradigms, as well as on funding strategies and career and research evaluations. We present the generic components of such workflows to provide a reference model for these stakeholders. The RDA-WDS Data Publishing Workflows group set out to study the current data-publishing workflow landscape across disciplines and institutions. A diverse set of workflows was examined to identify common components and standard practices, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements. The results of this examination have been used to derive a data-publishing reference model comprising generic components. Our assessment shows that the current data-publishing landscape is varied and dynamic, and we highlight important gaps and challenges to consider, especially when dealing with more complex workflows and their integration into wider community frameworks. The different components of a data-publishing system need to work, to the greatest extent possible, in a seamless and integrated way to support the evolution of commonly understood and utilized standards and, eventually, increased reproducibility. We therefore advocate the implementation of existing standards for repositories and all parts of the data-publishing process, and the development of new standards where necessary. Effective and trustworthy data publishing should be embedded in documented workflows. As more research communities seek to publish the data associated with their research, they can build on one or more of the components identified in this reference model.

Keywords

  • Data publishing
  • Open data
  • Open Science
  • World Data System
  • Research Data Alliance

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Claire C. Austin (1, 2)
  • Theodora Bloom (3)
  • Sünje Dallmeier-Tiessen (4)
  • Varsha K. Khodiyar (5)
  • Fiona Murphy (6)
  • Amy Nurnberger (7)
  • Lisa Raymond (8)
  • Martina Stockhause (9)
  • Jonathan Tedds (10)
  • Mary Vardigan (11)
  • Angus Whyte (12)
  1. Research Data Canada, Toronto, Canada
  2. Carleton University, Ottawa, Canada
  3. BMJ, London, UK
  4. CERN, Geneva, Switzerland
  5. Nature Publishing Group, London, UK
  6. University of Reading, Reading, UK
  7. Columbia University, New York, USA
  8. Woods Hole Oceanographic Institution, Woods Hole, USA
  9. German Climate Computing Centre (DKRZ), Hamburg, Germany
  10. University of Leicester, Leicester, UK
  11. University of Michigan/ICPSR, Ann Arbor, USA
  12. Digital Curation Centre, Edinburgh, Scotland, UK
