
Exploring the relevance of ORCID as a source of study of data sharing activities at the individual-level: a methodological discussion


ORCID is a scientific infrastructure created to solve the problem of author name ambiguity. Over the years, ORCID has also become a useful source for studying the academic activities that researchers report. Our objective in this research was to use ORCID to analyze one of these activities: the publication of datasets. We illustrate how identifying the datasets shared in researchers’ ORCID profiles makes it possible to study the characteristics of the researchers who produced them. To explore the relevance of ORCID for studying data sharing practices, we obtained all ORCID profiles reporting at least one dataset in their "works" list, together with information on the individual researchers who produced the datasets. The retrieved data were organized and analyzed in a SQL database hosted at CWTS. Our results indicate that DataCite is by far the most important source of information about the datasets recorded in ORCID. There is also a substantial overlap between DataCite records and records from other repositories (Figshare, Dryad, and Zenodo). The distribution of researchers producing datasets shows that the six countries with the most data producers also have a relatively higher share of all researchers with datasets than of all researchers in ORCID. By discipline, researchers in Natural Sciences and in Medicine and Life Sciences report the largest number of datasets. Finally, we observed that researchers who started their PhD around 2015 published their first dataset earlier than researchers who started their PhD before then. The work concludes with some reflections on the possibilities of ORCID as a relevant source for research on data sharing practices.
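The counting steps summarized in the abstract can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions and does not reproduce the CWTS setup. The two-table schema, its column names and the toy rows are all hypothetical. Dataset records are assumed to carry ORCID's `data-set` work type. The sketch does two things:

* It counts dataset records per source, for example DataCite versus other sources.
* For each country, it divides that country's share of dataset producers by its share of all profiles. A ratio above 1 marks a country that is over-represented among data producers.

```python
import sqlite3

# Hypothetical, simplified schema; the paper does not describe its actual tables.
SCHEMA = """
CREATE TABLE researchers (orcid_id TEXT PRIMARY KEY, country TEXT);
CREATE TABLE works (orcid_id TEXT REFERENCES researchers, work_type TEXT, source TEXT);
"""

def source_distribution(conn):
    """Number of dataset records per source, largest first."""
    rows = conn.execute(
        """SELECT source, COUNT(*) FROM works
           WHERE work_type = 'data-set'
           GROUP BY source
           ORDER BY COUNT(*) DESC, source"""
    ).fetchall()
    return dict(rows)

def country_share_ratio(conn):
    """Country share among dataset producers divided by its share among all profiles."""
    producers = dict(conn.execute(
        """SELECT r.country, COUNT(DISTINCT r.orcid_id)
           FROM researchers r JOIN works w ON w.orcid_id = r.orcid_id
           WHERE w.work_type = 'data-set'
           GROUP BY r.country ORDER BY r.country"""
    ).fetchall())
    everyone = dict(conn.execute(
        "SELECT country, COUNT(*) FROM researchers GROUP BY country"
    ).fetchall())
    n_prod = sum(producers.values())
    n_all = sum(everyone.values())
    # Every producer country also appears in `everyone`, so no division by zero.
    return {c: (producers[c] / n_prod) / (everyone[c] / n_all) for c in producers}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy rows for illustration only; they are not data from the study.
    conn.executemany("INSERT INTO researchers VALUES (?, ?)",
                     [("r1", "ES"), ("r2", "ES"), ("r3", "NL"),
                      ("r4", "NL"), ("r5", "ZA"), ("r6", "ZA")])
    conn.executemany("INSERT INTO works VALUES (?, ?, ?)",
                     [("r1", "data-set", "DataCite"), ("r1", "journal-article", "Crossref"),
                      ("r2", "data-set", "DataCite"), ("r3", "data-set", "Figshare")])
    print(source_distribution(conn))   # {'DataCite': 2, 'Figshare': 1}
    print(country_share_ratio(conn))   # {'ES': 2.0, 'NL': 1.0}
```

Producers are counted with `COUNT(DISTINCT ...)`. As a result, a researcher with several datasets contributes only once to the country comparison, while every record still counts toward the source distribution.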


Fig. 1
Fig. 2 … source information of datasets in ORCID
Fig. 3
Fig. 4 … source is either DataCite or other and the total number of datasets whose source is DataCite
Fig. 5
Fig. 6
Fig. 7


  1. This is the most informal form of data sharing, and also the most difficult to track and measure.

  2. To some extent, and of course with some limitations, we assume that researchers: (1) first connect their ORCID to DataCite records (which can be seen as a minimal “mindful” act towards incorporating datasets into their ORCID profiles); (2) second, allow DataCite (and other trusted data repositories) to automatically update their public profiles, where such updates typically require a basic acknowledgement by the user (e.g. authorizing trusted partners, notification e-mails, approvals by the user, etc.; see …), which distinguishes them from the more automatic, algorithmic updates of platforms such as Google Scholar or ResearchGate; and (3) arguably review their ORCID profiles from time to time to correct or update the information, which gives many researchers the chance to add (or not) their dataset records.

  3. We argue that studying data sharing practices as registered in ORCID represents a relatively more “active” perspective, in which researchers registered in ORCID to some extent facilitate the identification of these practices by recording them in their profiles. This differs from more “passive” types of engagement, in which researchers can be traced to dataset records (e.g., via records in DataCite; see Mongeon et al., 2017) but do not necessarily include them in their CVs or research profiles, or simply do not link them to their ORCID iDs. Thus, in this study we focus on relatively “mindful” forms of data sharing from an individual point of view, in which researchers with a dataset recorded in ORCID have at least minimally allowed datasets to be added to and recorded in their profiles, either mechanically (via automatic updates from trusted data repositories) or more manually.

  4. The NOWT classification is a grouping of WoS Journal Subject Categories (JSC), in which each JSC is assigned to a single category at each level (without overlaps). The NOWT classification was designed in the context of the Dutch Observatory of Science & Technology, and served as that instrument’s field classification system for over 30 years. The system has several levels of aggregation: the lowest consists of 37 scientific disciplines, and the highest consists of 7 larger domains of scholarly activity. This study uses the highest level.
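Because each JSC is placed in exactly one category per level, rolling a JSC up to a domain amounts to a chain of plain lookups. The sketch below makes two assumptions:

* It uses hypothetical placeholder labels (`JSC-A`, `Discipline-1`, `Domain-I`) rather than the real NOWT tables.
* Each item to be counted is assumed to already carry exactly one JSC. How researchers or datasets were assigned to fields is not described in this note.

The helper `build_mapping` (a hypothetical name) enforces the "without overlaps" property by rejecting any child listed under two parents.

```python
from collections import Counter

def build_mapping(pairs):
    """Build a child -> parent lookup, rejecting a child listed under two parents."""
    mapping = {}
    for child, parent in pairs:
        if mapping.get(child, parent) != parent:
            raise ValueError(f"{child!r} assigned to both {mapping[child]!r} and {parent!r}")
        mapping[child] = parent
    return mapping

# Hypothetical placeholder labels; the real NOWT tables (37 disciplines,
# 7 domains) are not reproduced here.
JSC_TO_DISCIPLINE = build_mapping([
    ("JSC-A", "Discipline-1"),
    ("JSC-B", "Discipline-1"),
    ("JSC-C", "Discipline-2"),
])
DISCIPLINE_TO_DOMAIN = build_mapping([
    ("Discipline-1", "Domain-I"),
    ("Discipline-2", "Domain-II"),
])

def jsc_to_domain(jsc):
    """Roll a JSC up to the highest aggregation level (the one used in the study)."""
    return DISCIPLINE_TO_DOMAIN[JSC_TO_DISCIPLINE[jsc]]

def count_per_domain(jscs):
    """Count items (e.g. dataset records) per domain, given one JSC label per item."""
    return Counter(jsc_to_domain(j) for j in jscs)

if __name__ == "__main__":
    print(count_per_domain(["JSC-A", "JSC-B", "JSC-C", "JSC-A"]))
    # Counter({'Domain-I': 3, 'Domain-II': 1})
```

Since the levels do not overlap, domain counts always sum to the number of items counted, with no double counting across domains.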




This work was supported by the Ministry of Science and Innovation of Spain (BES-2016-079394) and the European Social Fund, and was partially funded by the South African DST-NRF Centre of Excellence in Scientometrics and Science, Technology and Innovation Policy (SciSTIP). We thank an anonymous reviewer for insightful comments and recommendations on an early version of this paper.

Author information




ASC, NRG, TvL and RC contributed to the conception, design, and analysis of the study, as well as writing of the manuscript.

Corresponding author

Correspondence to Andrea Sixto-Costoya.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


About this article


Cite this article

Sixto-Costoya, A., Robinson-Garcia, N., van Leeuwen, T. et al. Exploring the relevance of ORCID as a source of study of data sharing activities at the individual-level: a methodological discussion. Scientometrics 126, 7149–7165 (2021).




  • Data sharing
  • Datasets
  • Researcher’s profiles