Earth Science Informatics, Volume 9, Issue 1, pp 123–136

DOI for geoscience data - how early practices shape present perceptions

  • Jens Klump
  • Robert Huber
  • Michael Diepenbroek
Review Article


The first minting of Digital Object Identifiers (DOI) for research data happened in 2004 in the context of the project “Publication and citation of primary scientific data” (STD-DOI). Some of today's concepts and perceptions about DOI for data have their roots in the way this project implemented DOI for research data, and the decisions made in those early days still shape the discussion about the use of persistent identifiers for research data. This project also laid the foundation for a tighter integration of journal publications and data. Promoted by early adopters, such as PANGAEA, DOI registration for data has reached a high level of maturity and has become an integral part of scientific publishing. This paper discusses the fundamental concepts applied in identifying research data with DOI and how these can be interpreted for alternative and future applications of persistent identifiers for research data.


Persistent identifier; Data publication; Dynamic data; Semantic web



The authors would like to thank the German Research Foundation (DFG) for funding the projects STD-DOI I and II, and KOMFOR. Substantial support was also given by the European Commission via the FP7 projects COOPEUS, ENVRI and EUDAT. We would also like to thank our colleagues in these projects for their work and discussions in establishing DOI for the publication of research data, as well as our colleagues from other institutions who contributed to the many discussions we have had on persistent identifiers over the past years. In particular, we would like to thank Joachim Wächter for making the original study on DOI for research data available and for initiating the pilot study leading to the project STD-DOI. Last, but not least, we would like to thank the reviewers for their constructive comments.

Compliance with ethical standards

Research towards this article was funded by the German Research Foundation (DFG) through the projects STD-DOI I and II, and KOMFOR. Substantial support was also given by the European Commission via the FP7 projects COOPEUS, ENVRI and EUDAT. The authors report no conflicts of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. CSIRO, Mineral Resources Flagship, Kensington, Australia
  2. MARUM, University of Bremen, Bremen, Germany
