Keywords

1 Introduction

Scientists are object of evaluation for funding allocation and career promotions. The discovery of leading scientists is an important task that simple statistics over long publication records may miss. In this demo, we present an online tool for analyzing researchers under a collaborative perspective by studying and ranking their capacity to maintain the same quality/quantity levels in different research environments.

Bibliometric indicators are increasingly used to evaluate scientific careers based on personal publication records. The simple number of papers published by an author rather than the received citations are still common ways to capture both the quantity and the impact of a scientist’s work. However, these measures represent only an evaluation of what is knowable from simple database searches. Still, these numbers actually make strong assumptions on the co-authorship of the research works in terms of how proportional the collaboration was among the co-authors. In a sense, scientists may look favorably good if working in a dynamic and active research environment. On the contrary, they may result unfairly below par due to modest research collaborators.

In literature, a number of related concepts have been presented, such as the undeserved co-authorship [4] and the scientific relevance [1]. Instead, our proposed application system is oriented to the study of what collaboration means. A research collaboration can be defined as a two-way process where individuals and/or organizations share learning, ideas and experiences to produce joint scientific outcomes. Collaborations are intrinsically necessary to the production of complete and groundbreaking research. In the light of this, one of the key aspect (and more demanded in recruitment scenarios) of a successful researcher is the development of a large and active network of collaborators that helps researchers bring new solutions within the research community.

The presented online tool is able to automatically compare scientists by deeply analyzing their local co-authorship networks and how they have been crucial in the production of research articles over time. Along the paper, we will present both the theoretical and algorithmic parts of the tool as well as the set of available features which can be freely tested on http://d-index.di.unito.it.

2 Background: Formalization of Scientific Collaborations

Based on the theoretical works proposed in [2, 3], for this demo application, we make use of a formalization of the co-authorship network that represents the environment in which a researcher has produced his/her scientific outcomes. Given two collaborating researchers (also called authors), \(r_i\), \(r_j\) and their common scientific network \(N^t_{r_i,r_j}\), defined as the set of researchers who collaborated with them, the autonomy of their collaboration \(a^t_{r_i, r_j}\) at time t is calculated as:

$$ a^t_{r_i, r_j} = \left\{ \begin{array}{l l} 0 &{} \quad \text {if}\ N^t_{r_i,r_j} = \emptyset \\ \frac{1}{\sum _{r_k \in N^t_{r_i,r_j}} \left( \sum _{x = 1}^{c(r_k, O^t_{r_i,r_j})} \frac{1}{x}\right) } &{} \quad \text {if}\ N^t_{r_i,r_j} \ne \emptyset \end{array} \right. $$

where the function \(c(r_k, O^t_{r_i,r_j})\) returns the number of times a researcher \(r_k\) co-authored a paper with both \(r_i\) and \(r_j\) at time t. The higher the autonomy the more independent the work of \(r_i\) and \(r_j\) is from their research environment. We then define the dependence value of \(r_i\) on the collaboration with \(r_j\) as \(d^t_{r_i \rightarrow r_j}\) as

$$ d^t_{r_i \rightarrow r_j} = \frac{p^t_{r_i,r_j}}{p^t_{r_i}} \times \frac{a^t_{r_i,r_j,N^t_{r_i}} + a^t_{r_j,\lnot r_i,N^t_{r_i}} }{a^t_{r_i,r_j,N^t_{r_i}} + a^t_{r_j,\lnot r_i,N^t_{r_i}} + a^t_{r_i,\lnot r_j,N^t_{r_i}}}, $$

where p is a productivity score (number of published works) of a is the autonomy score. The dependence value \(d^t_{r_j \rightarrow r_i}\) ranges from 0 to 1; in particular, \(d^t_{r_i \rightarrow r_j} \approx 0\) indicates that the dependence of \(r_i\) on \(r_j\), at the time t, is negligible, while a \(d^t_{r_i \rightarrow r_j} \approx 1\) highlights the contrary.

Thus, given the complete set of dependence values, for each year and relative to each co-author, we calculate the researcher’s dependence trajectory, by calculating the standard deviation, along the time, of each dependence value, for each co-author, from the optimal attended value of 0 (which would mean a dependence score of 0; i.e., the production of the considered researcher is independent from the collaboration with the considered co-author). In a sense, we aim at evaluating the overall independence of a researcher from the surrounding community. More formally, given a researcher \(r_i\), we define his/her dependence trajectory \(\overrightarrow{d_{r_i}} = \{sd^{t}_{r_i}, sd^{t+1}_{r_i},\cdots , sd^{t+n}_{r_i}\}\), where \(sd^{t}_{r_i}\) is calculated as

$$ sd^t_{r_i} = \sqrt{\frac{\sum _{r_k \in N_{r_i}} (d^t_{r_i \rightarrow r_k})^2}{|N_{r_i}|}}. $$

In words, the system detects anomalies in the collaboration patterns with respect to the attended behavior. Researchers, in fact, are expected to increment their collaboration network over time becoming independent from their single collaborations.

Fig. 1.
figure 1

Profile of a researcher in the presented application demo.

We can use these values to properly compare, and rank researchers with similar characteristics. More in detail, we provide a radar chart that can rank the independence performance of a considered researcher with respect to those who have (i) similar career length, (ii) similar number of publications, (iii) similar number of co-authors. We also provide a comparison with the active researchers and the whole community. Finally, we will integrate a feature to compare researchers with respect to topics automatically extracted from publication titles.

3 Application and Demo Scenario

In this section, we present our application, available at http://d-index.di.unito.it, for analyzing, comparing and ranking scientific collaboration patterns of researchers. As data input, we considered the DBLP data setFootnote 1, containing information about 1,717,211 authors and 3,268,812 scientific papersFootnote 2.

Fig. 2.
figure 2

Another screen-shot, taken from http://d-index.di.unito.it, that compares collaboration patters of Sir Tim Berners-Lee and Dr. Christos Faloutsos.

As shown in Fig. 1, the proposed application initially allows to search for any author indexed by DBLP and to analyze his/her scientific profile and her/his collaboration history over time (through several features and visualizations). Then, the online demo provides the following analyses:

  • Collaborations over time. The user can analyze the evolution over time of each scientific collaboration for a searched researcher.

  • Collaboration Pattern Analysis. The system can visualize the above-mentioned dependence patterns through a curve metaphor, mapping the evolution of the dependence of a researcher on the support of each co-author along the career. With this chart, it is also possible to select/deselect additional co-authors to make further analyses and comparisons.

  • Temporal Analysis. The demo provides a dynamic visualization chart (“time-lapse”) which allows the user to focus on a specific time interval and/or a subset of co-authors.

  • Ranking. This tool also allows to compare and rank the overall independence of an author, along his/her whole career, with the whole research community (Fig. 2). This visualization permits to focus on how much the entire production of a researcher can be considered dependent on the interactions with her/his local community.

The presented demo can be used to analyze each researcher in the entire DBLP community by also considering similar profiles (with parameters such as number of papers, number of co-authors, and length of career).