OpenAIRE’s DOIBoost - Boosting Crossref for Research

  • Sandro La Bruzzo
  • Paolo Manghi
  • Andrea Mannocci
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 988)


Research in information science and scholarly communication relies heavily on openly accessible datasets of metadata about scholarly entities and, where possible, their related payloads. Because such metadata is scattered across diverse, freely accessible online resources (e.g. Crossref, ORCID), researchers in this domain must struggle with (meta)data integration problems to produce custom datasets whose provenance is often undocumented and obscure. This practice wastes time, duplicates effort, and typically violates the open science best practices of transparency and reproducibility. In this article, we describe how to generate DOIBoost, a metadata collection that enriches Crossref with inputs from Microsoft Academic Graph, ORCID, and Unpaywall, with the aim of supporting high-quality and robust research experiments, saving researchers time, and enabling comparison of those experiments. To this end, we describe the dataset's value and its schema, analyse its actual content, and share the software toolkit and experimental workflow required to reproduce it. The DOIBoost dataset and Software Toolkit are made openly available via Zenodo [6, 7]. DOIBoost will become an input source to the OpenAIRE information graph.
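The enrichment idea described above can be illustrated with a minimal sketch. It is not the DOIBoost Software Toolkit itself. It only assumes that every source record carries a DOI that can serve as a join key. The function names (`normalize_doi`, `enrich`), the record layout, and the field names (`is_oa`, `abstract`, `provenance`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, Iterable, List

# Hypothetical helper: resolver prefixes that are commonly prepended to DOIs.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so records can be matched."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def enrich(crossref: Iterable[dict], sources: Dict[str, Iterable[dict]]) -> List[dict]:
    """Add fields from other sources to Crossref records that share a DOI.

    Assumption: each record is a flat dict with a "doi" key. Values already
    present (from Crossref or an earlier source) are never overwritten.
    """
    # Index every enrichment source by normalized DOI for constant-time lookup.
    indexes = {
        name: {normalize_doi(r["doi"]): r for r in records if r.get("doi")}
        for name, records in sources.items()
    }
    enriched = []
    for record in crossref:
        doi = normalize_doi(record["doi"])
        merged = dict(record, doi=doi, provenance=["crossref"])
        for name, index in indexes.items():
            extra = index.get(doi)
            if extra is None:
                continue  # this source knows nothing about the DOI
            for key, value in extra.items():
                if key != "doi" and key not in merged:
                    merged[key] = value
            merged["provenance"].append(name)  # record where the data came from
        enriched.append(merged)
    return enriched
```

Keeping a per-record `provenance` list is one simple way to avoid the undocumented-provenance problem the abstract mentions; a real pipeline would likely track provenance per field rather than per record.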


Keywords: Scholarly communication; Open science; Data science; Data integration; Crossref; ORCID; Unpaywall; Microsoft Academic Graph



This work was made possible by the Open Science policies enacted by Microsoft, Unpaywall, ORCID, and Crossref, which allow researchers to openly collect their metadata records for research purposes under CC-0 and CC-BY licenses. The MAG dataset is available under the ODC-BY license thanks to the Azure4research sponsorship signed between Microsoft Research and KMi. This work was partially funded by the EU projects OpenAIRE2020 (call: H2020-EINFRA-2014-1; grant agreement: 643410) and OpenAIRE-Advance (call: H2020-EINFRA-2017; grant number: 777541) [4].


  1. Chawla, D.S.: Unpaywall finds free versions of paywalled papers. Nature News (2017)
  2. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web (WWW 2015 Companion), pp. 243–246. ACM, New York (2015)
  3. Haak, L.L., Fenner, M., Paglione, L., Pentz, E., Ratner, H.: ORCID: a system to uniquely identify researchers. Learn. Publish. 25, 259–264 (2012)
  4. Manghi, P., Bolikowski, L., Manold, N., Schirrwagen, J., Smith, T.: OpenAIREplus: the European scholarly communication data infrastructure. D-Lib Mag. 18(9), 1 (2012)
  5. Fortunato, S., et al.: Science of science. Science 359(6379), eaao0185 (2018)
  6. La Bruzzo, S., Manghi, P., Mannocci, A.: DOIBoost Dataset Dump (Version 1.0) [Data set]. Zenodo (2018)
  7. La Bruzzo, S.: DOIBoost Software Toolkit (Version 1.0). Zenodo, 1 October 2018

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Information Science and Technology - CNR, Pisa, Italy
  2. Knowledge Media Institute – The Open University, Milton Keynes, UK
