Missing institutions in OpenAlex: possible reasons, implications, and solutions

Scientometrics

Abstract

The advent of open science calls for open data platforms with high data quality. OpenAlex, a fully open catalog of the global research system launched in January 2022, offers two main advantages, easy data accessibility and broad data coverage, and has been widely used in quantitative science studies. Notably, OpenAlex has been adopted as an important data source for the CWTS Leiden Ranking. However, the journal article metadata in OpenAlex suffer from a severe data quality problem: missing institutions. This study investigates the possible reasons for the problem, its consequences, and possible solutions by defining three types of institutional information: full institutional information (FII), partially missing institutional information (PMII), and completely missing institutional information (CMII). Our results show that the problem of missing institutions occurs in more than 60% of the journal articles in OpenAlex. The problem is particularly widespread in metadata from the early years and in the social sciences and humanities. Using sub-samples of the data, we further explore the possible reasons for the problem, the risk of distorted results that it poses, and possible solutions. The aim is to raise awareness of the importance of data quality improvements in open resources, and thus to support the responsible use of open resources in quantitative science studies and in broader contexts.
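The three categories can be illustrated with a minimal Python sketch. It assumes, since the abstract does not spell out the definitions, that FII means every authorship of an article carries at least one linked institution, CMII means none does, and PMII covers everything in between. The field names `authorships` and `institutions` follow the OpenAlex work object; the function name `classify_work` and the sample records are hypothetical and serve only as illustration.

```python
from collections import Counter


def classify_work(work: dict) -> str:
    """Return 'FII', 'PMII', or 'CMII' for one OpenAlex-style work record.

    Assumed definitions (not taken verbatim from the article):
    FII  = every authorship has at least one institution,
    CMII = no authorship has an institution (or there are no authorships),
    PMII = some, but not all, authorships have an institution.
    """
    authorships = work.get("authorships") or []
    # Count authorships that carry at least one linked institution.
    with_inst = sum(1 for a in authorships if a.get("institutions"))
    if authorships and with_inst == len(authorships):
        return "FII"
    if with_inst == 0:
        return "CMII"
    return "PMII"


# Made-up records that mimic the shape of OpenAlex work objects.
works = [
    {"id": "W1", "authorships": [{"institutions": [{"id": "I1"}]},
                                 {"institutions": [{"id": "I2"}]}]},
    {"id": "W2", "authorships": [{"institutions": [{"id": "I1"}]},
                                 {"institutions": []}]},
    {"id": "W3", "authorships": [{"institutions": []}]},
]

counts = Counter(classify_work(w) for w in works)
# Share of articles affected by missing institutions (PMII or CMII).
missing_share = (counts["PMII"] + counts["CMII"]) / len(works)
```

Under these assumptions, the share of affected articles is simply the combined PMII and CMII count divided by the total number of articles, which is the kind of quantity behind the "more than 60%" figure reported above.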

Notes

  1. OpenAlex API overview: https://docs.openalex.org/how-to-use-the-api/api-overview.

  2. OpenAlex snapshot: https://docs.openalex.org/download-all-data/openalex-snapshot.

  3. The CWTS Leiden Ranking 2023: https://www.leidenmadtrics.nl/articles/the-cwts-leiden-ranking-2023.

  4. Information about the CWTS Leiden Ranking: https://www.leidenranking.com/information/general.

  5. We have also analysed missing institutions at the authorship level. The 121,872,819 journal articles cover 366,851,172 authors in total, of whom about 47% have missing institutions. In particular, 112,510,937 first authors are identified, of whom about 53% have missing institutions. The data deficiency is therefore prominent at the authorship level as well. However, since our focus is on missing institutions at the paper level, the author level is not discussed further in this study.

  6. The missing institutions we refer to are a phenomenon in journal article metadata. However, OpenAlex also provides a list of all institutions indexed in the database as entities, with information on each. Therefore, we can complete the PMII by matching against the institution entities in this list.

  7. Clarivate InCites Help - Citation Topics: https://incites.help.clarivate.com/Content/Research-Areas/citation-topics.htm.

  8. OpenAlex technical documentation - Concepts: https://docs.openalex.org/api-entities/concepts.

  9. OpenAlex technical documentation - Institutions: https://docs.openalex.org/api-entities/institutions.

  10. Web of Science Core Collection Help: https://images.webofknowledge.com/WOKRS535R52/help/WOS/hs_organizations_enhanced.html.

  11. How affiliation profiles work in Scopus: https://service.elsevier.com/app/answers/detail/a_id/36052/supporthub/scopus/.

Acknowledgements

The present study is an extended version of a paper presented at the 19th International Conference on Scientometrics and Informetrics, Bloomington (USA), 2–5 July (Cao et al., 2023). This work was supported by the National Natural Science Foundation of China (Grant nos. 71974150, 72374160, 72004169) and the National Laboratory Center for Library and Information Science at Wuhan University.

Author information

Corresponding author

Correspondence to Lin Zhang.

Ethics declarations

Conflict of interest

The first author (Lin Zhang) is Editor-in-Chief of Scientometrics.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, L., Cao, Z., Shang, Y. et al. Missing institutions in OpenAlex: possible reasons, implications, and solutions. Scientometrics (2024). https://doi.org/10.1007/s11192-023-04923-y

  • DOI: https://doi.org/10.1007/s11192-023-04923-y
