Insights from the co-authorship network of the Italian academic statisticians

Bacci, Silvia; Bertaccini, Bruno; Petrucci, Alessandra

doi:10.1007/s11192-023-04761-y


Open access
Published: 22 June 2023

Volume 128, pages 4269–4303, (2023)
Scientometrics


Abstract

Nowadays, new technologies have favored communication among scholars from different universities and countries, and huge amounts of data and scientific works have become increasingly accessible. This has led to an increase in the multidisciplinarity of research products, but often also to a more specialized level of knowledge among scholars. Therefore, while belonging to the same disciplinary field, scholars may present different working styles and willingness to collaborate according to their specific topics of interest. This plays a particularly relevant role in Italy, where tenured scholars in academic institutions are classified in sub-fields that, in turn, may be aggregated for purposes of recruitment and career advancement. The aim of this contribution is to propose a methodological approach to understand whether the work and collaborative styles of academic scholars belonging to different sub-fields are similar enough to justify their grouping. For illustrative purposes, we focus on the co-authorship network of Italian academic statisticians, relying on the database of scientific works published from 1990 to 2021 and downloaded from SCOPUS. From this database, we obtain a network composed of 758 nodes and 1730 edges. Some network measures at node level representing the work and collaborative style of scholars (i.e., number of publications, degree, degree strength, some centrality indices, transitivity, and external-internal index) are explained through quantile regression models. Results provide policy makers with useful insights on which sub-fields present significant differences in terms of research interests and collaborative style, thus not justifying their aggregation for recruitment and career advancement purposes.


Introduction

Scientific productivity strongly affects scholars' opportunities to advance in their academic careers. Thus, a fair evaluation of scholars’ research activities should always take into account the peculiarities of the scientific area in which they do research.

Since the last decade of the twentieth century, the dissemination of new technologies has fostered communication among scholars from different universities and countries, has made huge amounts of data easily available, and has made many scientific works increasingly accessible. At the same time, the knowledge of scholars has become more and more specialized. For these reasons, collaboration among scholars is nowadays a key element for the advancement of knowledge in many scientific fields. This is especially true for statisticians, because Statistics is, by its very nature, a multidisciplinary science that supports many different fields of knowledge (e.g., social and economic sciences, agricultural sciences, medicine, pharmaceutical sciences, psychology, biology, engineering, etc.). This peculiarity is well summarized by an aphorism of the famous statistician John Wilder Tukey: “The best thing about being a statistician is that you get to play in everyone’s backyard”.

In Italy, scholars employed in universities (i.e., researchers, associate professors, and full professors) are clustered in groups (named Scientific Disciplinary Sectors, SDSs) that identify the prominent orientation of each scholar's research profile. We expect scholars belonging to different SDSs to differ in their style of work and collaboration. In turn, these differences are expected to affect their scientific productivity (e.g., in terms of quantity of scientific works, editorial classification, number of co-authors, scientific field of the co-authors). Moreover, access to public competitions for recruitment or for advancement to the role of professor is conditional on obtaining a national scientific qualification. This qualification takes into account the scientific productivity related to the SDS in which the scholar does research. However, in many cases current legislation groups two or more SDSs in the same Competition Sector (CS) for academic recruitment and advancement purposes.

The research question arising from this context is whether grouping scholars belonging to different SDSs in the same CS truly reflects similar research interests and styles of work and collaboration, so as to guarantee a fair comparison among scholars. The aim of this contribution is to propose a methodological approach to assess the differences among scholars belonging to different SDSs. The approach is based on network analysis (Scott, 2000; Kolaczyk, 2009; Newman, 2010) and quantile regression models (Koenker & Bassett, 1978; Koenker, 2005; Davino et al., 2013). In particular, we test our approach by focusing on the work and collaborative style of Italian academic statisticians. As mentioned above, this class of scholars has, by its nature, an intrinsic propensity for multidisciplinarity. Moreover, since Statistics is the disciplinary field to which the authors of this paper belong, familiarity with its dynamics can help in interpreting the results obtained. However, it is worth noting that the proposed analysis can be applied to any other subset of Italian SDSs. It can also be generalized to non-Italian settings in which scholars are grouped by research field or by any other type of pre-established aggregation.

According to the current legislation, Italian statisticians are grouped in five SDSs and three CSs. For our purposes, in what follows we define the co-authorship network of Italian statisticians from the entries recorded in the SCOPUS web information system. SCOPUS is one of the largest multidisciplinary databases of peer-reviewed journal articles and other scientific contributions. We then detect and compute descriptive measures of this network. Finally, we estimate quantile regression models to assess the effect of the SDS on the network measures, after controlling for individual and university characteristics.
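For readers less familiar with the method, the standard linear quantile regression estimator of Koenker & Bassett (1978) is recalled below. Writing $y_i$ for a network measure of scholar $i$ and $\mathbf{x}_i$ for a vector collecting SDS indicators and individual and university controls is our own shorthand. The exact specification used in the paper is given in its dedicated section and may differ.

$$\begin{aligned} \hat{\boldsymbol\beta}(\tau) = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - \mathbf{x}_i^\top\boldsymbol\beta\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbb{1}\{u<0\}\right), \qquad 0<\tau<1. \end{aligned}$$

With $\tau = 0.5$, $\rho_\tau(u) = |u|/2$ and the estimator reduces to median (least absolute deviation) regression. Estimating the model at several values of $\tau$ shows whether an SDS effect changes across the distribution of a network measure, for instance between weakly and strongly collaborative scholars.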

Our contribution fits into the international literature on network analysis applied to bibliographic data sources and scientific collaborations. Recently, Baccini et al. (2022) implemented a multi-layer network analysis to identify homogeneous clusters of scientific journals where Italian statisticians usually publish. Differences among SDS specializations are reflected in the clusters they found (i.e., probability theory, theoretical statistics, applied statistics, economics). As concerns scientific collaboration in Italy, in recent years numerous contributions have focused on the effects and biases induced by the regulations adopted by the Italian Ministry of University and Research (MUR) to promote research assessment exercises (see, among others, Baccini and De Nicolao, 2016; Franceschini and Maisano, 2017; Demetrescu et al., 2020; De Stefano et al., 2022; Akbaritabar et al., 2021). By contrast, to the best of our knowledge, the consequences of aggregating SDSs for recruitment and career advancement purposes have not yet been studied scientifically. In this contribution we focus on this latter aspect.

This work ideally continues that of De Stefano et al. (2013), De Stefano and Zaccarin (2016), and De Stefano et al. (2019). De Stefano et al. (2013) analyzed the co-authorship networks of Italian academic statisticians over the period 1990–2009 resulting from three different databases (Web of Science, Current Index to Statistics, and a database retrieved from the MUR). Among other things, they investigated the collaborative styles of statisticians, finding that statisticians from different SDSs have different styles. Focusing on the same databases, De Stefano and Zaccarin (2016) investigated the relation between scholars’ h-index and some descriptive network measures at node level. De Stefano et al. (2019) analyzed the tendency of scholars to cluster in communities. Both studies corroborated the differences among scientific sectors. Unlike these works, our interest lies in how descriptive measures of the network at the node (i.e., scholar) level are affected by SDS membership. We pay special attention to those sectors that are aggregated by law for recruitment and career advancement purposes.

The remainder of this contribution is organized as follows. In Sect. “The Italian structure of academic scientific fields” the current Italian regulation that established SDSs and CSs is illustrated. In Sect. “Data collection and network characterization” details are provided on the web-based collection of data on the scientific publications of scholars in the field of Statistics. In Sect. “Network description” the collaborative network of Italian statisticians is described. In Sect. “Quantile regression models” the theoretical fundamentals of quantile regression models are illustrated. In Sect. “Results and discussion” evidence on the network of statisticians from a quantile regression analysis is provided and discussed. In Sect. “Conclusions” some final remarks conclude the contribution.

The Italian structure of academic scientific fields

The scientific classification of scholars working in the Italian university system plays an important role, because it drives several organizational aspects of academic life. These include the definition of bachelor's and master's degree programs, the constitution of university departments, and the recruitment of academic staff (i.e., researchers and professors). To date, the scientific classification of scholars is organized on three main levels, as stated by the Ministerial Decree DM 855/2015: 86 macro-fields, 190 fields (i.e., the Competition Sectors, CSs), and 383 sub-fields (i.e., the Scientific Disciplinary Sectors, SDSs). The legislative milestones that led to the current organizational set-up are listed on the MUR web page^{Footnote 1}.

The current grouping scheme originates from an earlier, simpler structure established by the Ministerial Decree DM 4/10/2000. This decree defined (annex A^{Footnote 2}) the 14 research areas and related SDSs in which academic scholars are still framed. It also stated (annex B^{Footnote 3}) the typology of research activity characterizing each SDS. Framing academic scholars in SDSs has a practical purpose, as the SDS classification is applied in the comparative assessment procedures, as stated in article 2 of the Ministerial Decree at issue.

In 2010 a radical reform of the Italian academic system was enacted with Law 240 of 30/12/2010. Among other things, this reform introduced a national scientific qualification as a necessary (but not sufficient) preliminary condition for career advancement in Italian academia (i.e., to progress from researcher to associate professor or from associate professor to full professor). In this regard, Law 240/2010 (article 15, paragraph 1) introduced the CSs. These are a hierarchical aggregation of SDSs (each CS comprises one or more SDSs and each SDS belongs to just one CS), and they must be linked to the procedures for the recognition of the national scientific qualification. The detailed list of CSs, with the related SDSs nested within them, is defined in the Ministerial Decree DM 855/2015 (annex A^{Footnote 4}). The same decree describes the typology of research activity characterizing each CS (annex B^{Footnote 5}).

In summary, Italian academic scholars are currently classified both in CSs and in SDSs according to their research activity. Regarding the career progression, the national scientific qualification is the first requirement, and those who attained this qualification may compete in a public comparative examination. The procedure for the national scientific qualification relies on the grouping in CSs, whereas the public comparative examinations rely on the grouping in SDSs.

The aggregation of SDSs in CSs followed criteria essentially linked to the areas of research activity characterizing a given SDS and to the number of scholars belonging to it. Therefore, with rare exceptions, the SDSs nested in the same CS are usually those with a low number of scholars and/or with similar or largely overlapping research topics. This can be deduced from the descriptive declaration of each SDS (see the above-cited Annex B of the Ministerial Decree DM 4/10/2000).

As anticipated in the previous Section, Italian academic statisticians are classified in three CSs that include five SDSs:

CS 13-D1: Statistics
- SDS S01: Methodological Statistics
- SDS S02: Statistics for Experimental and Technological Research
CS 13-D2: Economic Statistics
- SDS S03: Economic Statistics
CS 13-D3: Demography and social statistics
- SDS S04: Demography
- SDS S05: Social Statistics

The aggregation of sectors S01 and S02 in the same CS is mainly due to the very low number of scholars belonging to SDS S02. By contrast, the overlap of most research topics is the main reason for the aggregation of sectors S04 and S05 in the same CS.

However, the reasons mentioned above do not guarantee that scholars belonging to different SDSs aggregated in the same CS share the same working style. In particular, they may differ in their propensity to collaborate with (few or many) other scholars of the same or different SDSs. In turn, these elements affect a scholar's scientific productivity, such as the quantity of published papers and the typology of scientific outlets (e.g., national journals, international journals, journals with or without impact factor, monographs). The national scientific qualification is based on this productivity. Therefore, aggregating scholars from SDSs with substantially different styles of work should be avoided. A quantitative analysis of these differences can be a useful additional instrument to support decision makers in a possible critical review of the composition of the CSs.

Data collection and network characterization

In developing this work we had to gather and manipulate information from different sources. The starting point was the list of the 783 statisticians employed as tenured teaching staff in an Italian (public or private) university institution at the end of February 2021. This list can be publicly downloaded from the MUR website^{Footnote 6}. All the scholars in this list are classified in one of the five SDSs cited in Sect. “The Italian structure of academic scientific fields”, that is, S01, S02, S03, S04, and S05. Statisticians working within the Italian university system but without tenure, such as research fellows and PhD students in Statistics, as well as statisticians working outside of the Italian university system are excluded from the list.

As is well known, SCOPUS is one of the largest multidisciplinary registries of peer-reviewed journal articles. It covers more than 30 million publications from 1996 to the present. Authors with publications referenced in SCOPUS are automatically assigned a unique Author Identifier (named SCOPUSId) to avoid ambiguity when querying the registry. Unfortunately, the SCOPUSId is missing from the set of information downloadable from the MUR website. We were able to retrieve the SCOPUSId of 758 out of 783 statisticians thanks to the features of the SciVal (by Elsevier) web service^{Footnote 7}.

SciVal is an analytical insights tool designed for research performance evaluation. It is based on (and updated weekly with) data collected by SCOPUS. Inside SciVal, the SCOPUSId is obtainable in a semiautomatic way. The association is performed directly by the system if no ambiguity is detected; otherwise, possible ambiguities are highlighted and manual intervention is required to resolve them. Ambiguities arise in the rare cases where multiple SCOPUS profiles have been generated for the same scholar. This happens because the SCOPUS registry is updated with information gathered from published papers, which may contain old affiliations or incomplete surnames and/or given names (in the case of multiple surnames or first names, respectively). Obviously, the richer the scholar information passed to SciVal, the better the chances of identifying the right SCOPUS profile. In querying SciVal, we used the full set of information released by the MUR website. Manual intervention to resolve ambiguities was necessary for only about fifty scholars. For them, the SCOPUSId was retrieved by manually browsing the SCOPUS website or from the curricula published on their institutions’ websites.
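The semiautomatic association described above can be sketched as a simple decision rule. The snippet below is a hypothetical illustration and does not reproduce SciVal's actual matching logic, which is not documented here. The `Profile` record, the field names, and the exact-match comparison on normalized names are all assumptions. A record matching exactly one profile is associated automatically, and several candidates are flagged for manual checking. A record with no candidate corresponds to a scholar without a SCOPUS profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A hypothetical, simplified SCOPUS author profile."""
    scopus_id: str
    surname: str
    given_name: str
    affiliation: str


def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison key.
    return " ".join(text.lower().split())


def match_scholars(records, profiles):
    """Split MUR-style records into matched, ambiguous (manual check) and missing."""
    matched, ambiguous, missing = {}, {}, []
    for rec in records:
        # Candidate profiles sharing surname and given name(s).
        cands = [p for p in profiles
                 if _norm(p.surname) == _norm(rec["surname"])
                 and _norm(p.given_name) == _norm(rec["given_name"])]
        if len(cands) > 1:
            # Break ties with the affiliation, but only if it singles out one profile.
            same_aff = [p for p in cands
                        if _norm(p.affiliation) == _norm(rec["university"])]
            if len(same_aff) == 1:
                cands = same_aff
        if len(cands) == 1:
            matched[rec["id"]] = cands[0].scopus_id       # automatic association
        elif cands:
            ambiguous[rec["id"]] = [p.scopus_id for p in cands]  # needs manual review
        else:
            missing.append(rec["id"])                     # no SCOPUS profile found
    return matched, ambiguous, missing
```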

In the literature, authors analyzing similar sources of information on scientific collaboration have approached the disambiguation problem in different ways. De Stefano et al. (2013) compared the networks of collaborations between Italian statisticians obtained from three different bibliographic archives (one general, one thematic, and one national), each using its own key identifiers. The authors’ information gathered from the MUR list of tenured Italian academic statisticians was used to query each registry directly. However, this strategy required manual interventions in the querying phase. Final data-cleaning procedures were also needed to eliminate possible errors (duplicate records or wrong attributions). Fuccella et al. (2016) tried to derive a unified archive by merging different sources of bibliographic data relating to a bounded scientific community. In performing this task they faced two main challenges. The first was the implementation of a record linkage procedure to avoid (or minimize) duplication of data referring to the same paper. The second was the need to disambiguate authors, which was addressed with an unsupervised technique due to the lack of training data. Carchiolo et al. (2022) designed a special algorithm generating a list of queries to be submitted directly to SCOPUS on the basis of the information gathered from the MUR website. The queries shuffled the given names, if more than one, together with the initials of the first name and the affiliation; in case of failure, the condition on affiliation was discarded and the queries repeated. Unlike the two works mentioned above, which analyzed different sources of information, Carchiolo et al. (2022) used only SCOPUS as a source of bibliometric data. However, unlike our proposal, they omitted the preliminary step of retrieving the authors’ identifiers, which has proved extremely useful in reducing disambiguation issues.
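To make the query-generation strategy attributed to Carchiolo et al. (2022) more concrete, the following Python sketch reflects our reading of the description above and is not their code. The field codes `AUTHLASTNAME`, `AUTHFIRST` and `AFFIL` are SCOPUS advanced-search codes as we understand them; check the SCOPUS documentation before relying on them. `run_query` is a hypothetical callable standing in for whatever client actually submits a query and returns the matching author records.

```python
from itertools import permutations


def name_variants(given_names):
    """Every ordering of the (non-empty list of) given names, plus the leading initial."""
    variants = []
    for order in permutations(given_names):
        for v in (" ".join(order), order[0][0] + "."):
            if v not in variants:  # avoid submitting the same query twice
                variants.append(v)
    return variants


def find_author(surname, given_names, affiliation, run_query):
    """Return the first (query, hits) with a non-empty result, or (None, [])."""
    # First pass keeps the affiliation constraint; the second pass drops it.
    for use_affiliation in (True, False):
        for v in name_variants(given_names):
            query = f'AUTHLASTNAME("{surname}") AND AUTHFIRST("{v}")'
            if use_affiliation:
                query += f' AND AFFIL("{affiliation}")'
            hits = run_query(query)  # hypothetical client call
            if hits:
                return query, hits
    return None, []
```

For a scholar with two given names, the first pass tries four name variants with the affiliation. The second pass, without the affiliation, is reached only if all four return nothing.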

As anticipated above, the SCOPUSId was missing for 25 statisticians out of 783 from our initial list. They were scholars without scientific contributions indexed in SCOPUS when the list of statisticians was extracted (generally because they were very young researchers recently employed).

The SCOPUSIds of the 758 statisticians were then used to download the list of their research products from the SCOPUS website. The download was performed using the SCOPUS “advanced search” functionality and returned a dataset of 14,838 records. Each record is identified by a unique alphanumeric code (labeled EId) assigned by the SCOPUS bibliographic information system. Much additional information is also available for download from the SCOPUS registry: authorship information, bibliographical information, abstract and keywords, citation information, funding details, and other items of minor importance (e.g., the conference, if any, at which the paper was presented). To keep our database manageable and to limit computational effort in the later phases of the analysis, we restricted the query to a subset of these fields. Specifically, we retained the authorship information, the entire citation information set, and some bibliographical data, such as the serial identifier of the journal that published the paper and the language in which it was written.
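Once the identifiers are available, the download step reduces to building advanced-search strings from them. The sketch below assumes the SCOPUS `AU-ID(...)` field code and an arbitrary batch size, since the authors do not report how they split their query. A paper co-authored by statisticians who fall in different batches is returned more than once, so the batches are merged on EId.

```python
def au_id_queries(scopus_ids, batch_size=100):
    """OR-join author identifiers into AU-ID(...) search strings, batch_size per query."""
    ids = list(dict.fromkeys(scopus_ids))  # drop duplicate ids, keep first-seen order
    return [" OR ".join(f"AU-ID({i})" for i in ids[k:k + batch_size])
            for k in range(0, len(ids), batch_size)]


def merge_batches(batches):
    """Union the records returned by each query, keeping one record per EId."""
    merged = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(record["EId"], record)  # first occurrence wins
    return list(merged.values())
```

With the hypothetical default of 100 identifiers per query, the 758 SCOPUSIds would yield 8 query strings.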

It is worth noting that the two datasets (the authors list and the related works list) are interrelated sources of information in a specific domain of knowledge: scientific collaboration among scholars where at least one of the authors is a tenured Italian academic statistician. Such a framework can conveniently be represented with the notation of the Entity-Relationship model (first proposed by Chen, 1976). In this model, the relationship between authors and works belongs to the class of so-called “many-to-many” relationships: each scholar collaborates on at least one work, and each work can be co-authored by more than one scholar. The strength of the latter relationship is expected to be particularly high among statisticians because of the many fields of application that characterize Statistics. Consistent with this hypothesis, about 90% of the downloaded articles in the SCOPUS product list were written by two or more authors. Unfortunately, this list was released with the authorship information merged in a single field (a sequence of all the author identifiers separated by semicolons). To overcome this inconvenience, we developed a dedicated Visual Basic for Applications (VBA) routine to parse and decompose each authorship string. This routine resulted in a dataset of 65,797 distinct combinations of the two unique identifiers previously described (EId and SCOPUSId). Among these, only 18,813 pairs of key identifiers have a SCOPUSId corresponding to one of the 758 Italian statisticians in the MUR registry. The very high number of pairs not referable to scholars in the MUR list is another element supporting the multidisciplinary nature of Statistics. Part of them, however, could be attributable to statisticians working abroad or (to a very small extent) to PhD students or other non-tenured statisticians working within the Italian university system.
This dataset was passed as input to a specially devised algorithm, developed in the R environment, that builds the matrix of the number of products co-authored by each pair of authors. The Entity-Relationship model describing the relations between the various sources of information used to describe the network of collaborations among Italian statisticians is depicted in Fig. 1.
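The parsing step and the subsequent pair counting can be illustrated in a few lines of Python. This is a sketch of the logic only, not the authors' VBA routine or R algorithm. We assume that each record is an `(EId, author_ids)` pair whose second element is the semicolon-separated identifier string described above. The resulting set of `(EId, SCOPUSId)` pairs plays the role of the bridge table of the many-to-many relationship in the Entity-Relationship model.

```python
from collections import Counter
from itertools import combinations


def parse_authorship(records):
    """Explode each record's semicolon-separated author ids into distinct (EId, SCOPUSId) pairs."""
    pairs = set()
    for eid, author_ids in records:
        for aid in author_ids.split(";"):
            aid = aid.strip()
            if aid:  # skip empty pieces, e.g. from a trailing ';'
                pairs.add((eid, aid))
    return pairs


def coauthorship_counts(pairs):
    """Number of products co-authored by each unordered pair of authors."""
    authors_by_paper = {}
    for eid, aid in pairs:
        authors_by_paper.setdefault(eid, set()).add(aid)
    counts = Counter()
    for authors in authors_by_paper.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts
```

Filtering the pairs to those whose SCOPUSId is in the MUR list, for example with `{(e, a) for e, a in pairs if a in statisticians}`, gives the analogue of the 18,813 pairs mentioned above. Here `statisticians` is a hypothetical set holding the 758 identifiers.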

Obviously, starting from the MUR list of tenured Italian academic statisticians, the final number of scholars identified is much greater than the initial one. The resulting scientific collaboration network is composed of 23,339 nodes and 159,250 edges. The nodes correspond to the 758 Italian academic statisticians and their co-authors (non-statisticians as well as statisticians not belonging to the tenured staff of Italian academia). Each edge connects a pair of nodes representing two scholars who co-authored at least one of the 14,838 papers referenced in SCOPUS. It must be emphasized that the distinction between statisticians and non-statisticians cannot be retrieved from the SCOPUS database. Indeed, the lists of “topics” and “subject areas” provided for each author cover a wide range of subjects. Thus, it is not possible to unambiguously attribute a scholar to a specific discipline (i.e., statistics or other subjects).

Each edge of the network is weighted inversely to the number of co-authors of each paper, following the proposal of Newman (2001). We assume that the mutual acquaintance between co-authors i and j decreases as the overall number of scholars who collaborated on the same paper p increases. Under this assumption, the weight $w_{ij}$ of the edge connecting nodes i and j is defined as

$$\begin{aligned} w_{ij} = \sum_{p}\frac{1}{N_{p(ij)}-1}, \end{aligned} \tag{1}$$

where the sum runs over the papers p co-authored by i and j, and $N_{p(ij)}$ is the total number of co-authors of paper p. The assumption underlying this formulation is that each scientist divides his/her time equally among the other $N_{p(ij)}-1$ co-authors. We are aware that, when there are at least three co-authors, a scientist generally spends more time with some co-authors than with others. However, since no information on the time actually spent is available, we believe this is a reasonable approximation.
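Equation (1) translates directly into code. The minimal Python sketch below assumes that papers are given as a mapping from a paper identifier (e.g., the EId) to its author identifiers. Single-author papers contribute no edges.

```python
from collections import defaultdict
from itertools import combinations


def newman_weights(papers):
    """Edge weights w_ij = sum over shared papers p of 1 / (N_p - 1), as in Eq. (1)."""
    w = defaultdict(float)
    for authors in papers.values():
        authors = sorted(set(authors))  # canonical (i, j) ordering, no duplicate authors
        n = len(authors)
        if n < 2:
            continue  # a single-author paper creates no co-authorship edge
        for i, j in combinations(authors, 2):
            w[(i, j)] += 1.0 / (n - 1)
    return dict(w)
```

A useful check of the scheme is that each multi-author paper contributes exactly 1 to the weighted degree (strength) of each of its authors. This holds because $1/(N_{p(ij)}-1)$, summed over the other $N_{p(ij)}-1$ co-authors, equals 1. For example, with papers `{"p1": ["a", "b"], "p2": ["a", "b", "c"]}` we get $w_{ab} = 1.5$ and $w_{ac} = 0.5$. Author `a` therefore has strength 2, matching its two multi-author papers.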

For the aims of the present study, in what follows we focus on the sub-network composed of the 758 nodes corresponding to the Italian academic statisticians distributed among the five SDSs (as mentioned in Sect. “The Italian structure of academic scientific fields”). The sub-network contains the related 1730 edges, each connecting a pair of Italian academic statisticians who co-authored at least one work. Edges are weighted as described above. Their weights therefore account for the total number of co-authors of each paper, including co-authors who are not Italian academic statisticians.
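Extracting the sub-network amounts to keeping only edges whose endpoints are both Italian academic statisticians, without recomputing the weights. The sketch below assumes edge weights stored as a dictionary keyed by node pairs; the helper name is hypothetical.

```python
def induced_subnetwork(weights, keep):
    """Restrict {(i, j): w_ij} to edges whose endpoints both belong to `keep`.

    Weights are copied unchanged, so they still reflect the full author list
    of each paper (including co-authors outside `keep`).
    """
    keep = set(keep)
    edges = {(i, j): w for (i, j), w in weights.items() if i in keep and j in keep}
    # Every retained scholar is a node, even one without edges (an isolated node).
    return keep, edges
```

In this sketch the node set is the full set of retained scholars. A statistician with no co-author among the other retained scholars therefore stays in the sub-network as an isolated node.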

Relying on some specific network indices detailed in the next section that summarize the scholars’ work style, the present contribution will investigate the following two main research questions:

Q1: Does belonging to a certain SDS have a significant impact on a scholar’s work style?
Q2: Do SDSs aggregated in the same CS differ significantly from one another?
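Several of the node-level measures listed in the abstract (degree, degree strength, and the external-internal index) are straightforward to compute from a weighted edge list. The sketch below assumes a node-level version of the classical Krackhardt–Stern E-I index, $(E-I)/(E+I)$. Here $E$ and $I$ count a scholar's ties to members of other groups and of the same group, respectively. We also assume that the grouping is the SDS. The exact definitions adopted in the paper are given in the following section and may differ.

```python
from collections import defaultdict


def node_measures(edges, group):
    """Degree, strength and E-I index for every node in `group`.

    edges: {(i, j): w_ij}, undirected weighted edges.
    group: {node: label}, e.g. the SDS of each scholar (assumed grouping).
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    external = defaultdict(int)
    internal = defaultdict(int)
    for (i, j), w in edges.items():
        same = group[i] == group[j]
        for u in (i, j):  # undirected: update both endpoints
            degree[u] += 1
            strength[u] += w
            if same:
                internal[u] += 1
            else:
                external[u] += 1
    out = {}
    for u in group:
        d = degree[u]  # d = E + I
        ei = (external[u] - internal[u]) / d if d else None  # undefined for isolates
        out[u] = {"degree": d, "strength": strength[u], "ei_index": ei}
    return out
```

Values of the index range from -1 (all ties within the scholar's own SDS) to +1 (all ties outside it). The index is undefined for isolated scholars, which the sketch reports as `None`.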

Network description

Some descriptive statistics about the set of scholars involved in the analysis are reported in Table 1 (marginal distributions) and in Table 2 (conditional distributions by SDS).

Table 1 Distribution of Italian academic statisticians, by SDS, gender, academic role, geographical area, university size, university management type, type of delivered academic curricula (absolute and relative frequencies)
