OpenAIRE’s DOIBoost - Boosting Crossref for Research

Part of the Communications in Computer and Information Science book series (CCIS, volume 988)


Research in information science and scholarly communication strongly relies on the availability of openly accessible datasets of scholarly entity metadata and, where possible, their related payloads. Since such metadata are scattered across diverse, freely accessible online resources (e.g. Crossref, ORCID), researchers in this domain are doomed to struggle with (meta)data integration problems in order to produce custom datasets of often undocumented and rather obscure provenance. This practice wastes time, duplicates efforts, and typically violates the open science best practices of transparency and reproducibility. In this article, we describe how to generate DOIBoost, a metadata collection that enriches Crossref with inputs from Microsoft Academic Graph, ORCID, and Unpaywall. Its purpose is to support high-quality and robust research experiments, save researchers time, and make their experiments comparable. To this end, we describe the dataset's value and schema, analyse its actual content, and share the Software Toolkit and experimental workflow required to reproduce it. The DOIBoost dataset and Software Toolkit are openly available on Zenodo [6, 7]. DOIBoost will become an input source to the OpenAIRE information graph.
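Assume, as the name DOIBoost suggests, that the sources are matched on DOIs. Under that assumption, the enrichment described above can be pictured as a DOI-keyed join with Crossref as the backbone. The following is only a minimal sketch under that assumption. The record layouts, the field names (`doi`, `title`, `affiliations`, `orcid`, `oa_url`), the prefix list, and the functions `normalize_doi`, `index_by_doi` and `boost` are hypothetical illustrations, not the DOIBoost Software Toolkit's API. DOIs are lowercased before comparison because DOI names are case-insensitive.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Hypothetical resolver prefixes stripped before comparison; DOIBoost's own normalisation may differ.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """DOI names are case-insensitive, so compare them lowercased and without a resolver prefix."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


@dataclass
class BoostedRecord:
    """Hypothetical output shape: a Crossref record plus fields borrowed from other sources."""
    doi: str
    title: str
    affiliations: List[str] = field(default_factory=list)  # e.g. taken from MAG
    orcids: List[str] = field(default_factory=list)        # e.g. taken from ORCID works
    oa_url: Optional[str] = None                           # e.g. taken from Unpaywall


def index_by_doi(records: Iterable[dict]) -> Dict[str, dict]:
    """Map normalised DOI -> record (the last record wins if a DOI repeats)."""
    return {normalize_doi(r["doi"]): r for r in records}


def boost(crossref: Iterable[dict], mag: Iterable[dict],
          orcid: Iterable[dict], unpaywall: Iterable[dict]) -> List[BoostedRecord]:
    """Enrich every Crossref record with whatever the other sources hold for the same DOI."""
    mag_idx = index_by_doi(mag)
    unpaywall_idx = index_by_doi(unpaywall)
    # Several ORCID profiles can claim the same DOI, so collect them in a list.
    orcid_idx: Dict[str, List[str]] = {}
    for work in orcid:
        orcid_idx.setdefault(normalize_doi(work["doi"]), []).append(work["orcid"])

    boosted = []
    for rec in crossref:  # Crossref is the backbone: only its DOIs appear in the output
        key = normalize_doi(rec["doi"])
        out = BoostedRecord(doi=key, title=rec["title"])
        if key in mag_idx:
            out.affiliations = list(mag_idx[key].get("affiliations", []))
        out.orcids = list(orcid_idx.get(key, []))
        if key in unpaywall_idx:
            out.oa_url = unpaywall_idx[key].get("oa_url")
        boosted.append(out)
    return boosted


if __name__ == "__main__":
    # Illustrative placeholder values only.
    demo = boost(
        crossref=[{"doi": "10.1000/ABC", "title": "An example paper"}],
        mag=[{"doi": "10.1000/abc", "affiliations": ["Example University"]}],
        orcid=[{"doi": "https://doi.org/10.1000/abc", "orcid": "0000-0000-0000-0000"}],
        unpaywall=[{"doi": "10.1000/abc", "oa_url": "https://example.org/paper.pdf"}],
    )
    print(demo[0])
```

Because Crossref is the backbone in this sketch, a DOI known only to MAG, ORCID or Unpaywall is simply dropped. Whether DOIBoost treats such DOIs differently is not stated on this page.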


Keywords

  • Scholarly communication
  • Open science
  • Data science
  • Data integration
  • Crossref
  • Unpaywall
  • Microsoft Academic Graph

  • DOI: 10.1007/978-3-030-11226-4_11
  • Chapter length: 11 pages
Fig. 1.
Fig. 2.


Notes

  1. Crossref APIs.
  2. Microsoft Academic Graph.
  3.
  4.
  5.
  6. GRID database.
  7. The field “access-rights” can assume the values OPEN, EMBARGO, RESTRICTED, CLOSED, UNKNOWN.
  8. Apache Oozie.
  9. Affero General Public License.
  10. Crossref REST API - GitHub.
  11. MAG Schema.
  12. Unpaywall data format.
  13. Levenshtein Distance.
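Note 13 refers to the Levenshtein distance, but this page does not say how DOIBoost applies it. For reference, the sketch below is the standard dynamic-programming computation of that distance. It keeps only two rows of the table, so memory stays linear in the shorter string. The helper `normalized_similarity` is a hypothetical addition, not a documented part of DOIBoost. It case-folds both strings and scales the distance by the longer length, as one might do when comparing titles.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings (ignoring case), 0.0 for entirely different ones."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion. Any cut-off for deciding that two strings "match" would be a tuning choice that this page does not give.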


References

  1. Chawla, D.S.: Unpaywall finds free versions of paywalled papers. Nature News (2017)
  2. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web (WWW 2015 Companion), pp. 243–246. ACM, New York (2015)
  3. Haak, L.L., Fenner, M., Paglione, L., Pentz, E., Ratner, H.: ORCID: a system to uniquely identify researchers. Learn. Publish. 25, 259–264 (2012)
  4. Manghi, P., Bolikowski, L., Manola, N., Schirrwagen, J., Smith, T.: OpenAIREplus: the European scholarly communication data infrastructure. D-Lib Mag. 18(9), 1 (2012)
  5. Fortunato, S., et al.: Science of science. Science 359(6379), eaao0185 (2018)
  6. La Bruzzo, S., Manghi, P., Mannocci, A.: DOIBoost Dataset Dump (Version 1.0) [Data set]. Zenodo (2018)
  7. La Bruzzo, S.: DOIBoost Software Toolkit (Version 1.0). Zenodo, 1 October 2018


Acknowledgments

This work was made possible by the Open Science policies of Microsoft, Unpaywall, ORCID, and Crossref, which allow researchers to openly collect their metadata records for research purposes under CC0 and CC-BY licenses. The MAG dataset is available under the ODC-BY license thanks to the Azure4research sponsorship agreement between Microsoft Research and KMi. This work was partially funded by the EU projects OpenAIRE2020 (H2020-EINFRA-2014-1, grant agreement: 643410) and OpenAIRE-Advance (H2020 project, grant number: 777541; call: H2020-EINFRA-2017) [4].

Author information

Corresponding author

Correspondence to Andrea Mannocci.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

La Bruzzo, S., Manghi, P., Mannocci, A. (2019). OpenAIRE’s DOIBoost - Boosting Crossref for Research. In: Manghi, P., Candela, L., Silvello, G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11225-7

  • Online ISBN: 978-3-030-11226-4

  • eBook Packages: Computer Science, Computer Science (R0)