Keywords

1 Introduction

In contemporary science policy debates, one of the most heated discussions concerns the role and effects of research performance metrics in research assessment frameworks. According to the advocates of using these metrics, indicators based on citations and publications would be more objective than the traditional peer review system, hence allowing for breaking ‘old boys circles’ and hampering nepotism, cronyism, and other inappropriate academic practices (Geuna & Martin, 2003). Moreover, by setting measurable thresholds and benchmarks, performance metrics would stimulate both the quantity and quality of scientific production (Bonaccorsi, 2015; Geuna & Martin, 2003; Moed, 2017). Finally, research evaluation based on metrics would be less expensive than peer review, it would save taxpayers’ money (Geuna & Piolatto, 2016), and as recent evidence shows, it may provide comparable results, at least when the need is to assess research performance at the institutional level (Checchi et al., 2021). In addition, at the individual level, the predictive power of bibliometrics is superior to peer review in almost all disciplines (except medicine). On the other hand, critics insist on the ‘unintended consequences’ of using metrics and on the ‘constitutive effects’ that their pervasive presence has on the behaviour of researchers (Dahler-Larsen, 2014). These effects include goal displacement (scoring high on the metrics becoming a target in and of itself), promotion of the unethical use of citations (excessive self-citation, creation of citation cartels, the strategic exchange of citations, etc.), task reduction (academic activities that are not considered in the calculation of the indicators, such as teaching and public engagement, being avoided), and an artificial increase in productivity by ‘salami slicing’ (dividing one scientific work into multiple publications) (Fochler et al., 2016; de Rijcke et al., 2015). Even the recent rise in retractions and research misconduct (e.g. fabrication of results, plagiarism, ‘p-hacking’, etc.) has been linked to the increasing pressure of metrics (Biagioli et al., 2019).

One of the most interesting criticisms raised against metrics in research evaluations is that they would not only affect the behaviour of researchers, but also the epistemic content of the science they produce (i.e. ideas, research themes, methods, etc.). In particular, the excessive weight of metrics would damage the pluralism of scientific enquiry, rewarding the mainstream approaches not because of their scientific merit, but only because of their (transient) popularity or connection with academic power. For instance, in a recent joint declaration, the Académie des Sciences, Leopoldina, and the Royal Society (Académie des Sciences et al., 2017) write that ‘undue emphasis on bibliometric indicators […] may also hinder the appreciation of the work of excellent scientists outside the mainstream; it will also tend to promote those who follow current or fashionable research trends, rather than those whose work is highly novel and which might produce completely new directions of scientific research’ (p. 2). Metrics have been blamed for inducing risk avoidance in science: the researchers, under the pressure of scoring well on indicators, would focus on topics, research programmes, and methods that are more likely to be rewarded. By contrast, they would avoid revolutionary ideas, out-of-the-box innovation, and interdisciplinarity because these would be deemed to be too risky enterprises. Thus, orthodoxy and conformism would be promoted at the expense of critical thinking, damaging scientific progress.

Until recently, the presence and magnitude of the effects of metrics on scientists and science have been debated more often than empirically investigated. In recent years, the empirical study of the effects of research evaluations on research practices has begun. An increasing body of literature has started documenting how researchers react under competitive conditions, which may affect their likelihood of promotion, particularly how research evaluation frameworks based on ‘metrics’ have induced a change in the publishing behaviour of researchers. In Italy, for instance, recent evidence shows that the introduction of research evaluation procedures has promoted strategic behaviours among researchers via the creation of ‘citation clubs’ that are aimed at artificially inflating bibliometric outcomes (Baccini et al., 2019; Scarpa et al., 2018; Seeber et al., 2019).

However, studying the impact of metrics and the evaluation on the epistemic content of research, that is, on the theories and ideas that are produced by the scientific community under the regime of metrics-based evaluation, is still in its infancy (Muller & de Rijcke, 2017). In particular, the accuracy of the mainstream criticism outlined above is still to be addressed by empirical studies.

In this chapter, we propose a mining approach for the detection and analysis of ‘mainstream topics’. The proposed idea is that topics featuring mainstream research can emerge through an analysis of the epistemological aspects of scholarly publications extracted from conventional publication metadata, such as the title, the author-assigned keywords, and the abstract. As a first contribution, we provide a conceptual analysis of the notion of mainstream research that is exploited to enforce mainstream profiling based on peculiar behaviours/trends of research topics along a considered time interval. As a further contribution, we define a disciplined approach and the related techniques for topic mining based on the use of publication metadata and natural language processing (NLP) tools. Finally, a case study analysis is presented to assess i) the applicability of the proposed techniques to a real scenario based on a publication dataset of Italian scholars and ii) the scalability and reliability of some of the case study results when the proposed approach is based on a richer and comprehensive database of all international publications, as collected by Scopus Elsevier over 14 years.

The chapter is organised as follows: In Sect. 2, studies on the epistemic impacts of metrics and techniques for automatic topic extraction from scholarly publications are briefly presented. In Sect. 3, a conceptual analysis of mainstream and what it comprises is provided by highlighting the different and sometimes opposite meanings that are attached to this concept in the literature. In Sect. 4, we present our modelling considerations about mainstream profiling. The proposed approach and techniques to topic mining are illustrated in Sect. 5. In Sect. 6, the results obtained by applying the proposed techniques to both a real publication dataset of Italian scholars and to the whole Scopus publication set on the same disciplines are discussed. Finally, our concluding remarks are given in Sect. 7.

2 Literature Review

By the epistemic impacts of metrics, we are focusing on the array of changes induced by metrics-based evaluation regimes on the epistemic processes of knowledge production and their outputs (scientific ideas, theories, research programmes, etc.) (Muller & de Rijcke, 2017). Epistemic impacts should be analytically distinguished from the effects of metrics on the social structure of science and, specifically, on its reward system, even if, in concrete situations, both kinds of impacts are likely to occur together. The decline of interdisciplinary research and reduction of scientific pluralism are examples of the epistemic impacts of metrics. By contrast, the rise of self-citations and gift authorships are examples of changes in the reward system of science.

The reward system-related effects are relatively easier to capture using quantitative methods because they can be inferred from analysing publications and citations. Starting from the pivotal study of Butler (2003) on the effects of the Australian research evaluation system, most scientometric studies so far have focused on these quantitative indicators to investigate the changes in researcher behaviour under the pressure of metrics (Abramo et al., 2019; Abramo et al., 2021; Baccini et al., 2019). Epistemic impacts, on the other hand, are more difficult to track for three main reasons. First, epistemic concepts, such as interdisciplinarity, scientific pluralism, and scientific mainstream, do not have standard, uncontested definitions. Second, the quantitative operationalisations of these notions frequently run the risk of reducing complex phenomena to monodimensional measures that miss important epistemological nuances. Third, there is no consensus on what epistemic factors contribute the most to scientific progress. For instance, philosophers of science have long debated what degree and what kind of scientific pluralism is beneficial to scientific enquiry (see (Viola, 2018) for a detailed discussion of the literature on this topic). The epistemic deviations induced by metrics are difficult to point out because there is no universally accepted baseline normative epistemology that accounts for the correct functioning of science.

In light of these methodological and theoretical impasses, most of the research on the epistemic impacts of metrics so far has turned to methodologies, such as surveys and interviews, showing how researchers themselves perceive the pressure of metrics on their epistemic practices. In one of the first studies of this kind, Muller and de Rijcke (2017) interviews 38 Dutch and Austrian post-docs and junior group leaders in the life sciences, finding that researchers pervasively ‘think with indicators’. Indicators such as the journal impact factor do not intervene only in the evaluation of research after the fact, but also inform the entire research process, from the very conception of research projects to the choice of scientific collaborators and even the animal models used. Castellani et al. (2016) reach a similar conclusion after giving out a questionnaire to 12 Italian scientists from several disciplines. Their interviewees underline the risk that metrics in research evaluation can promote uniformity in the scientific community and discourage ground-breaking approaches. Also, the interviewees argue that metrics worsen the ‘publish or perish’ culture and induce scientists to publish low-quality material just to score better on productivity indicators. Feenstra and Lopez-Cozar (2021) interview 14 Spanish researchers in philosophy and ethics about the effects of metrics in their disciplines. Even though the interviewed researchers identify some positive effects, such as more transparent policies in the academic promotion process, they deem the impact on research agendas, publication language, and mental health as negative. In particular, metrics would hamper intellectual diversity in philosophical research and even lead to research misconduct. These studies highlight how metrics and indicators have gained a prominent place in the ‘epistemic living space’ of researchers, both in the natural and social sciences (Felt, 2009).

One limitation of these studies, however, is that they do not discuss the epistemic concepts used by the researchers to frame their experience of metrics. In this sense, then, they offer a valuable but partial perspective on epistemic impacts. By contrast, the present study is the first attempt to ground an investigation of an epistemic phenomenon, that is, the scientific mainstream, in a conceptual analysis of the related epistemic concept.

Further related work focuses on the methods and techniques for the classification of scholarly publications. Usually, a combination of automated procedures and manual activities/practices has been proposed (Glenisson et al., 2005). Solutions based on the use of human-assigned metadata, such as superimposed subject categories of articles and journals, represent a popular solution (Borner, 2010). This approach is effective when the choice of subject categories is shared by the final users and the classification results provide a scholarly picture in which the actors (i.e. the publication authors) can self-recognise the categorisation of their scientific products. However, manually defined subject categories are characterised by several well-known weaknesses. For instance, predefined categories are typically inadequate for dealing with publications about emerging topics characterised by recent formation and a new epistemic body (Suominen & Toivanen, 2016). Machine learning and unsupervised classification/clustering approaches have recently been proposed for overcoming such limitations. For instance, in Boyack et al. (2011) and Talley et al. (2011), topic modelling and clustering solutions are exploited to provide a visual, graph-based representation of a publication dataset extracted from the MEDLINE repository and the National Institutes of Health (NIH), respectively. Similar approaches have been investigated (Nichols, 2014; Yan et al., 2012) for the information retrieval field and for the National Science Foundation awards, respectively. On the other hand, the construction of a map of science merely derived from scholarly data by using automated classification algorithms is characterised by possible limitations, as well. For instance, automated solutions are generally weak in capturing the minor trends within a discipline, even if they provide a relevant contribution from the historical and epistemic point of view. A recent comparison between unsupervised learning and human-assigned approaches to classification of scholarly data has been provided (Suominen & Toivanen, 2016); in the study, a topic modelling solution based on the latent Dirichlet allocation (LDA) algorithm is exploited. The results show that it is difficult to argue the superiority of one method (human-based scholarly data classification) over the other (algorithm-based scholarly data classification) (Suominen & Toivanen, 2016). However, it is well recognised that machine-generated scholarly data classifications provide a strong contribution in terms of practicality (Castano et al., 2018). This means that the capability to rapidly generate thematic, interactive views of an underlying (large) scholarly publication dataset can be considered as a result, but it also represents a worth support/contribution for experts that aim to further refine/revise the obtained results to provide their own data views.

3 Conceptual Analysis of the Mainstream Notion

Etymologically, the term ‘mainstream’ refers to the main current of a river or a stream. According to the dictionary, the mainstream is the ‘prevailing current of thought, influence or activity’. As an adjective, ‘mainstream’ means ‘representing the prevalent attitudes, values, and practices of a society or a groupFootnote 1’. The term usually belongs to the context of artistic and cultural phenomena, where it is mainly used to denote trends in popular and media culture.Footnote 2 Sometimes, it takes a pejorative sense by subcultures who view the mainstream culture as artistically inferior.

When it is employed in a discussion about science, ‘mainstream’ preserves its nature as a common language term. However, a precise and widely accepted definition of what ‘mainstream’ means in reference to science is missing, as is an operational definition of how to measure it.

The term can be used as a noun (‘the mainstream in economics’), as well as an adjective (‘mainstream science’). In both cases, mainstream is said of many different aspects of the scientific enquiry, from the most abstract to the most practical. Mainstream can be the following:

  • A general theoretical framework or research programme (e.g. the neoclassical approach in economics)

  • A specific theory (e.g. the big bang theory in cosmology)

  • A position in a debate (e.g. the functionalism in the philosophy of mind)

  • A research object or topic (e.g. the quark bottom in high energy physics)

  • A research methodology (e.g. participant observation in cultural anthropology)

  • A technique, a research protocol, or a procedure (e.g. the PCR in molecular biology).

In the empirical study of what is mainstream, such variability must be considered to set an appropriate level for the analysis. Different empirical methods will capture the mainstream at different levels of ‘granularity’, depending on the scientific aspect being considered. However, the most important feature about the term and its usage is that it assumes different and sometimes opposite meanings in the literature. ‘Mainstream’ is used to reference not only different things, but also different and sometimes incompatible ways. By surveying the literature, we can analytically distinguish six key meanings, whose differences can be appreciated better when they are compared with their opposites.Footnote 3

  1. 1.

    ‘Orthodox’ vs. ‘heterodox’ or ‘fringe’. In this sense, the mainstream is the dominant school of thought within a certain discipline or field. The mainstream is characterised by adherence to certain scientific content (e.g., a theory, a research programme, a method, etc.). The nonmainstream schools, on the other hand, are characterised by their refusal of some of the mainstream’s tenets. In this sense, the term is used both positively and negatively. In the positive sense, the mainstream represents the standard view of the scientific community, whereas heterodox schools represent the margins of science, dangerously bordering pseudo-science (‘fringe science’) (Gottfredson, 1997). By contrast, when the term is used negatively, the closed-mindedness or refusal of the ‘pluralism’ of the mainstream is stressed (Colander et al., 2004). An instance of this meaning can be found in economics, where heterodox schools (e.g., Marxists, post-Keynesians, feminists, Old Institutionalists, and Austrians) are distinguished from the neoclassical mainstream. In Anglo-American philosophy, analytic philosophy may be considered the mainstream, whereas Continental philosophy can be thought of as the heterodox approach (Katzav & Vaesen, 2017).

  2. 2.

    ‘Normal’ vs. ‘Revolutionary’, ‘Ground-breaking’. The second meaning of mainstream refers to the distinction between normal and revolutionary science (Kuhn, 1996). Mainstream science would be characterised by a step-by-step, cumulative nature, whereas nonmainstream science would be more revolutionary, ground-breaking, and frequently not understood by the mainstream because of this. Mavericks and misunderstood geniuses would be the typical makers of nonmainstream science (Heinze, 2013). Compared with the first meaning, the focus is on the mode of scientific progress rather than on the adherence to some specific theory.

  3. 3.

    ‘Popular’ vs. ‘Niche’. The third meaning is neutral with respect to the scientific content of mainstream and nonmainstream science. Here, mainstream is purely characterised in quantitative terms as the research that is currently done by most of the researchers in a field. No judgement on the orthodoxy or progress of the mainstream is implied. Nonmainstream science is not considered as heterodox or ground-breaking, just as topics that are addressed by fewer researchers. This meaning of mainstream can be compared with the concept of ‘impact’ in a citation analysis. When the number of citations received by a publication is not considered a proxy of its scientific quality (a normative concept), it can nonetheless be considered a measure of popularity or influence (a descriptive concept).

  4. 4.

    ‘Trendy’, ‘Short-lived’, ‘Passing’ vs. ‘Stable’, ‘Long-Lasting’. In this sense, the focus is on the temporal extension of mainstream science. The mainstream is equated with the currently ‘hot’ areas of research. However, the short life of these areas is also implied. From this point of view, a mainstream researcher is a researcher who follows the trends, doing what everybody does at that moment. Mainstream topics are the ones that are currently fashionable in the research community, for whatever reason. These mainstream topics have the highest chance of being published in the highest-ranked journals and produce a high impact in terms of citations. In the literature on the perverse effects of research metrics, mainstream is frequently intended in this sense (e.g. de Rijcke et al., 2015).

  5. 5.

    ‘Supported by (academic) power’ vs. ‘Underground’. This meaning focuses on the socioacademic dimension of the mainstream, stressing the connection between mainstream and power. Mainstream science is what is defended by academic elites in prestigious universities and supported by economic and industrial powers. By contrast, nonmainstream science is seen as resistant. The underground approaches are not published in mainstream journals and are unlikely to receive funding through normal channels, even though they might receive funding from alternative sources. This meaning echoes the distinction between the underground or independent labels and the ‘majors’ in the music industry.Footnote 4 In economics, it is not unusual to describe the difference between mainstream and heterodox schools by pointing out not only the theoretical divergences, but also the different relationships that they entertain with economic powers (Cedrini & Fontana, 2018; Colander et al., 2004).

  6. 6.

    ‘Core’, ‘Western’ vs. ‘Periphery’, ‘Non-Western’. This meaning of mainstream is eccentric compared with the others and is found only in the bibliometric literature on the scientific production of developing countries (e.g. Gasparyan et al., 2017). In this literature, mainstream is used as a synonym of Western, and mainstream science is the scientific research either produced by highly developed countries or published in international outlets. By contrast, nonmainstream science is the science produced in developing countries and published in local journals.

Note that the six meanings, even if they are closely related, should not be considered synonyms. In fact, they are not mutually implied. For instance, a molecular biologist can deal with a niche topic (nonmainstream according to meaning 3) by applying a standard experimental method (mainstream according to 1). As a further example, an astrophysicist can investigate a ‘trendy’ celestial object (mainstream according to meaning 4) but in the context of a heterodox cosmological model (nonmainstream according to meaning 1). In Fig. 1, a summary of the six meanings, their opposites and the aspect of mainstreamness they highlight are provided.

Fig. 1
figure 1

The six different meanings of ‘mainstream’ in reference to science. The notion of mainstream has no universal meaning in discussions about science and science policy. Specifically, six different meanings can be analytically distinguished in the literature, noting that sometimes two or more meanings are intended at the same time. In the first column of the table, the key terms that capture each meaning are presented, along with their opposites (second column) that contribute to specifying their semantic content. Each meaning stresses a different dimension of the notion of mainstream, focusing on various aspects of the scientific activity. In the last column of the table, examples of studies that employ each meaning of the notion are provided

4 Modelling Mainstreams

Previous approaches to the mainstream definition have led to different operational definitions of it.

The meanings 1 and 5 (‘orthodox’ and ‘supported by power’) require considerable expert knowledge of the scientific fields to assess whether a publication belongs to the mainstream. To empirically investigate the mainstream that is intended in this sense implies gathering the opinion of several experts, with evident limitations in the number of publications that can be considered. Meaning 6 (‘core’) is easier to treat with quantitative methods because the geographical information can be retrieved automatically from the publications’ metadata. However, this meaning of mainstream is less interesting from the point of view of the debate on research metrics.

Hence, we remain with meanings 2, 3, and 4, that is, ‘normal science’, ‘popular’, and ‘trendy’. With relative ease, meanings 3 and 4 can be translated into quantitative measures. Popularity can be measured by the number of publications addressing a topic, whereas the trendiness of a topic can be measured by its temporal extension. Meaning 3 is particularly interesting because it refers to the epistemological concepts of normal versus revolutionary science advanced by Kuhn. Some observations by Kuhn and Lakatos can help us translate (partially) these notions into measures. According to Kuhn, during the normal science period, a paradigm is ‘articulated’ by the researchers, that is, it is expanded in different directions. Lakatos calls these paradigm articulations ‘progressive research programmes’ (1978). A progressive research programme can be recognised by its capacity to produce new research lines, that is, by its fruitfulness (Ivani, 2019). Thus, meaning 2 can be measured as a factor of productivity or the fruitfulness of a topic.Footnote 5

The proposed approach to mainstream detection integrates the following three meanings of the term: popularity (meaning 3), trendiness (meaning 4), and fruitfulness (meaning 2). They constitute the three dimensions of what is mainstream that will be considered in our study. Based on them, several profiles of mainstream can be outlined (Fig. 2):

  1. 1.

    Spot. This profile corresponds to a short-lived topic characterised by a short burst of attention from the research community that is focused in a limited interval of time. This profile mostly relies on meaning 4 (‘trendy’).

  2. 2.

    Persistent. This profile of mainstream is based on meaning 3, popularity. It describes a topic that enjoys stable attention from the research community but that has low productivity in terms of new research lines.

  3. 3.

    Impasse. This profile describes the behaviour of a research programme that progressively decreases in importance until it becomes marginally important.

  4. 4.

    Boosting. This mainstream profile corresponds to a fruitful research programme of the normal scientific phase, hence relying mostly on meaning 2. It is characterised by a long life with a high number of descendant topics.

Fig. 2
figure 2

Mainstream profiles. By combining the three meanings or aspects of the notion of mainstream that can be quantified (i.e. popularity, trendiness,, and fruitfulness), it is possible to delineate the various temporal profiles of a mainstream topic, that is, the various modes in which a mainstream topic may develop over time. The figure shows four of these modes. From the top to the bottom of the figure, they are as follows: spot topic (a short-lived topic that attracts a burst of attention in the research community), persistent topic (a topic that enjoys stable attention in the community but does not produce new research lines), impasse topic (a topic that branches in research lines, some of which decay), and boosting topic (a topic characterised by high fruitfulness that produces several new research lines). In the figure, the relation of filiation within a topic is represented by lines, whereas the size of the research lines (quantified in terms of publications) that form a topic is represented by circles

In different ways, each of these ideal profiles of mainstream integrates the three core aspects of meanings 2, 3, and 4. Our method aims at individuating instances of such profiles into the scientific production of our case studies.

5 Semiautomatic Topic Detection

Consider a dataset of scholarly publications P = {p1, p2, …, pn}. For topic detection in P, we propose the approach shown in Fig. 3 based on a pipeline characterised by dataset acquisition, keyword extraction, keyword graph construction, topic discovery, topic filtering, and topic analysis. In the following, we first present dataset acquisition, keyword extraction, and keyword graph construction as the preparation steps; then, we focus on the subsequent activities related to topic discovery, filtering, and analysis.

Fig. 3
figure 3

The proposed mining approach to topic detection. The proposed topic mining approach is based on a pipeline where the initial publication dataset with related metadata is first submitted to a keyword extraction stage aimed at extracting relevant tokens. The tokens are then organised in a graph based on keyword co-occurrences within publications (i.e. keyword graph construction). The subsequent steps of topic discovery and topic filtering are applied to generate the set of topics emerging from the publications. Finally, a topic analysis is enforced to determine trends over topics and mainstream behaviours

5.1 Dataset Preparation

Dataset preparation has the goal of extracting keywords from publications that are representative of the study’s focus. Moreover, once those keywords have been extracted, preparation aims at explicitly representing the distribution of keywords over publications so that the co-occurrence of the same keywords in publications is highlighted.

An initial step of dataset acquisition is extracting the metadata of each publication p ∈ P, namely the title and the author-assigned keywords. The keyword extraction step is then executed on the publication metadata by applying conventional NLP techniques, such as tokenisation, lemmatisation, and 2-gram recognition based on mutual information (Manning et al., 2008). A keyword set Kp is associated with each publication p ∈ P as a result. The step of keyword graph construction is finally executed to highlight when keywords co-occur in the publication descriptions, namely in the associated keyword sets. The result is a graph G = (N, E), where \( N=\bigcup_{i=1}^{i=n}{K}_{p_i} \) is the set of nodes constituted by the overall set of publication keywords extracted from the metadata and E is the set of graph edges connecting pairs of keyword nodes. An edge eij = (ni, nj, wij) denotes that the keyword represented by the node ni co-occurs with the keyword of node nj in the keyword sets of the publication descriptions. The weight wij denotes the strength/relevance of the ni, nj co-occurrence, namely the number of publications in which ni, nj co-occur.

Example

As an example of data preparation, we consider publication p1 with the title ‘Bologne et le Cardinal Légat Bertrand du Pouget’ and the following author-assigned keywords: avignon, bertrand du pouget, bologne, cardinal légat. The following keyword set Kp1 is extracted for the publication p1:

$${ {\begin{array}{l}{K}_{p1}=\left\{\underline{\mathrm{avignon}},\mathrm{bertrand},\mathrm{bertrand}\ \mathrm{du}\ \mathrm{pouget},\underline{\mathrm{bologne}},\mathrm{cardinal},\mathrm{cardinal}\ \mathrm{l}\acute{\mathrm{e}} \mathrm{gat},\mathrm{l}\acute{\mathrm{e}} \mathrm{gat},\mathrm{pouget}\right\} \end{array}}}$$

Similarly, consider a further publication p2 characterised by the following keyword set Kp2:

$$ {K}_{p2}=\left\{\underline{\mathrm{avignon}},\mathrm{bertrand}\ \mathrm{du}\ \mathrm{pouget},\underline{\mathrm{bologne}},\mathrm{histoire}\ \mathrm{de}\ {\mathrm{l}}^{\hbox{'}}\acute{\mathrm{E}} \mathrm{glise},\mathrm{jean}\ \mathrm{xxii}\right\} $$

In the keyword graph construction step, each item of the sets Kp1 and Kp2 becomes a node of the graph G = (N, E). Call na the graph node for the keyword avignon and nb the node for the keyword bologne. The edge eab = (na, nb, 2) is defined in G to denote that the keywords avignon and bologne co-occur in two publications (i.e. p1 and p2); thus, the weight of the edge between their respective nodes is 2.

5.2 Topic Discovery

The keywords used for describing publications are characterised by sparseness, meaning that the terms appearing in the keyword set Kp of a publication p are usually highly focused and are rarely employed in the keyword set of other publications. To reduce the impact of keyword sparseness and capture possible topic overlaps among publications, we exploit the idea that the keywords of a publication can be enriched with the keywords of other publications when these keywords are frequently co-occurring and, thus, when they are used within the same terminological context. For topic discovery, each publication p ∈ P is associated with an enriched keyword set\( \overline{K_p} \) that has the goal of describing the publication p with keywords that are general enough to reveal the publication topic instead of the publication focus.

For a publication p ∈ P, the construction of the set \( \overline{K_p} \) is described in the following way: Consider the keyword graph G = (N, E) built during dataset preparation and consider a keyword ki ∈ Kp. We call keyword co-occurrence context the set \( {K}_i^{\ast }=\Big\{{k}_j:\exists {e}_{ij}\left({n}_i,{n}_j,{w}_{ij}\right)\in E \)} such that there is at least one co-occurrence relation between ki and kj in G (i.e. the two keywords co-occur in the description of at least one publication). Given the publication p, we call the publication co-occurrence context the set \( {K}_p^{\ast }=\bigcup_{k_i\in {K}_p}{K}_i^{\ast } \). The set \( {K}_p^{\ast } \) contains keywords that are not directly used to describe the publication p but that co-occur with the keywords of Kp in other publications. Each keyword \( {k}_j\in {K}_p^{\ast } \) is associated with a weight ωj to denote the relevance of the keyword kj in describing the topic of the publication p. For a keyword \( {k}_j\in {K}_p^{\ast } \), the weight ωj is calculated as follows:

$$ {\omega}_j=\frac{1}{\genfrac{}{}{0pt}{}{\max {\omega}_z}{k_z\in {K}_p^{\ast }}}\ \sum_{k_i\in {K}_p}\sum_{k_j\in {K}_i^{\ast }}\alpha +{w}_{ij} $$

where α ∈  is a constant parameter and wij is the weight associated with the edge eij in the graph G, which denotes the number of co-occurrences in the publications of the keywords ki ∈ Kp and the keyword \( {k}_j\in {K}_p^{\ast } \). The α parameter is introduced to support a flexible definition of the weight ωj associated with a keyword \( {k}_j\in {K}_p^{\ast } \). In particular, the value of α is added to the weight ωj each time a keyword ki ∈ Kp co-occurs with a keyword \( {k}_j\in {K}_p^{\ast } \). When low values of α are considered (i.e. α = 0 or α = 1), the weight ωj mostly depends on the weight wij of the co-occurrences of the keyword kj with the keywords of ki ∈ Kp. When high values of α are considered, the weight ωj is increased each time a co-occurrence of the keyword kj is found with the keywords of ki ∈ Kp, despite the strength of the weight wij. This means that when α is high, we assign more importance to the keywords \( {k}_j\in {K}_p^{\ast } \) that have numerous co-occurrences with the keyword ki ∈ Kp and give less importance to the weight wij of such co-occurrences.

Finally, the enriched keyword set of a publication p is defined as \( \overline{K_p}=\left\{{k}_j:{k}_j\in {K}_p^{\ast}\wedge {\omega}_j\ge th\right\}, \) where th is a prefixed threshold to distinguish relevant versus nonrelevant keywords to include in Kp. Finally, a new graph \( \overline{G}=\left(N,\overline{E}\right) \) is generated according to the enriched keyword sets \( \overline{K} \). In \( \overline{G} \), the edges \( \overline{E} \) denote the keyword co-occurrence in the enriched keyword sets \( \overline{K} \)of the publications.

According to the enriched co-occurrence graph \( \overline{G}=\left(N,\overline{E}\right) \), we provide the following topic definition:

Topic

A topic Ts is a set of featuring keywords that describes a common research argument. A topic Ts is defined around a seed keyword ks that represents the label/name of the research argument. Given a seed keyword ks associated with a corresponding keyword node ns ∈ N in \( \overline{G} \), the topic Ts corresponds to the set of keywords associated with the nodes Ns ⊆ N connected with ks in the enriched co-occurrence graph \( \overline{G}=\left(N,\overline{E}\right) \), namely \( {N}_s=\Big\{{n}_j:\exists \overline{e_{sj}}\left({n}_s,{n}_j,{w}_{sj}\right)\in \overline{E} \)}.

We say that a publication p is about a topic Ts when at least one common keyword exists between the enriched keyword set Kp and topic Ts, namely KpTs ≠ ∅.

Example

As an example of topic discovery, in Fig. 4, we show the excerpt of a keyword set Kp and related co-occurrence context \( {K}_p^{\ast } \):

$$ {K}_p\subseteq \left\{\mathrm{avignon},\mathrm{papacy},\mathrm{xiv}\ \mathrm{century}\right\}\vspace*{-24pt} $$
$${ {\begin{array}{l}{K}_p^{\ast}\subseteq \left\{\mathrm{avignon},\mathrm{church}\ \mathrm{history},\mathrm{history},\mathrm{middle}\ \mathrm{ages},\mathrm{modern}\ \mathrm{era},\mathrm{papacy},\mathrm{xiv}\ \mathrm{century}\right\}\end{array}}} $$

On the left side of Fig. 4, we show the number of co-occurrences between any pair of keywords ki ∈ Kp and \( {k}_j\in {K}_p^{\ast } \). For instance, given ki = papacy and ki = modern era, the value wij = 5 is shown in Fig. 4 as denoting that papacy and modern era co-occur in five publications, namely eij = (ni, nj, 5) is set in the graph G.

Fig. 4
figure 4

Example of keywords and topics. An example of topic discovery. Given a publication p with keywords Kp and context \( {K}_p^{\ast } \), we show the keyword co-occurrences in the publications of the dataset (left side) and the weight ωj of each keyword \( {k}_j\in {K}_p^{\ast } \) with two different setting of the α parameter (right side)

On the right side of Fig. 4, we show the weight ωj of each keyword \( {k}_j\in {K}_p^{\ast } \) when two different settings of the α parameter are considered. When α = 1, the keywords \( {k}_j\in {K}_p^{\ast } \) with a higher weight ωj are middle ages and modern era, which are the keywords with highest wij value. It is interesting to note that modern era has a high ωj weight, even if this keyword only co-occurs with papacy in Kp. When α = 4, the keywords \( {k}_j\in {K}_p^{\ast } \) with a higher weight ωj are church history and middle ages, which are the keywords that co-occur with most of the publication keywords in Kp. It is interesting to note that the weight ωj of modern era is strongly reduced when α = 4. According to this example, by considering a threshold th = 0.8, the enriched keyword set \( \overline{K_p} \) is defined as follows:

$$ \overline{K_p}=\left\{\mathrm{middle}\ \mathrm{ages}\right\}\ \left(\mathrm{when}\ \alpha =1\right); \vspace*{-15pt}$$
$$ \overline{K_p}=\left\{\mathrm{church}\ \mathrm{history},\mathrm{middle}\ \mathrm{ages}\right\}\ \left(\mathrm{when}\ \alpha =4\right).\vspace*{-24pt} $$

5.3 Topic Filtering and Analysis

The ultimate goal of the proposed approach to topic detection is to analyse topics over time to highlight possible mainstream behaviours. To this end, topic filtering is executed to split the co-occurrence graph \( \overline{G} \) into a set of subgraphs \( \overline{G_Y}=\left({N}_Y,\overline{E_Y}\right) \), where each one is related to keyword co-occurrences in a specific year Y. A graph \( \overline{G_Y}\subseteq \overline{G} \) is constituted by i) the nodes \( {N}_Y=\bigcup_{i=1}^{i=k}\overline{K_{p_i}} \), where the sets \( \overline{K_{p_i}} \) are the enriched keyword sets of the publications from the year Y and ii) the edges \( \overline{E_Y} \), where an edge \( \overline{e_{ij}}\in \overline{E_Y} \) is defined as \( \overline{e_{ij}}=\left({n}_i,{n}_j,{w}_{ij}\right) \) and connects two keyword nodes ni, nj ∈ NY . The weight wij denotes the number of publications from the year Y in which the keywords ni, nj co-occur. We note that the number of publications per year can be (very) different from one year to another. As a result, for comparison of keyword co-occurrence weights across consecutive years, given an edge as \( \overline{e_{ij}}=\left({n}_i,{n}_j,{w}_{ij}\right) \) in the year Y, the number of co-occurrences wij is normalised by the overall number of publications from the year Y.

Given a seed keyword kswith the associated keyword node ns, a topic \( {T}_{Y_s} \) in the year Y corresponds to the set of keyword nodes \( {N}_{Y_s}\subseteq {N}_Y \) in the subgraph \( \overline{G_{Y_s}}\subseteq \overline{G_Y} \) where \( {N}_{Y_s}=\Big\{{n}_j:\exists \overline{e_{sj}}\left({n}_s,{n}_j,{w}_{sj}\right)\in \overline{E_Y} \)}.

As a result of topic filtering, a topic Ts can change over time because the set of keywords \( {T}_{Y_s} \) that characterises the topic can vary from one year to another. Moreover, when a topic is associated with a stable pair of keyword nodes ni, nj in two consecutive years Y and Y + 1, it is possible that the co-occurrence weight wij is different in the two considered years. As a result, the topic analysis step is executed to observe the behaviour of topics along time/years. In particular, the goal of this step is to recognise the possible mainstream topics according to the following definition:

Mainstream Topic

A mainstream topic M is a topic whose trend within a certain time interval of years [Y1, Y2] follows one of the mainstream profiles presented in Sect. 4, namely spot, persistent, impasse, or boosting.

Example

As an example, we consider the following enriched keyword sets associated with six publications in the time interval from 2015 to 2017:

  • \( \overline{K_0} \)(2015): church history, middle ages

  • \( -\overline{K_1} \)(2015): christianity, middle ages, history

  • \( -\overline{K_2} \)(2016): christianity, catholic church, philosophy

  • \( -\overline{K_3} \)(2016): middle ages, philosophy

  • \( -\overline{K_4} \)(2017): catholic church history, middle ages

  • \( -\overline{K_5} \)(2017): philosophy, middle ages

In Fig. 5, we show a tabular representation of the graph \( \overline{G} \) built according to the above keyword sets \( \overline{K} \). Consider a seed keyword ks=middle ages. All the keywords of Fig. 5 have at least one co-occurrence with the seed keyword; thus, they are all belonging to the considered topic Ts about ‘Middle Ages’. The strength of the co-occurrences between the seed keyword and keywords of Fig. 5 in the years 2015–2017 is shown in Fig. 6 (normalised value by the number of publications in each considered year). By observing the keyword strength in the considered time interval, we can envisage possible mainstream topic profiles. In particular, for the topic Tmiddle ages, the keyword history denotes a persistent topic behaviour (despite showing little fluctuation in 2016). We also note that the keywords church history and christianity denote an impasse topic behaviour, while the keyword philosophy denotes a boosting topic behaviour for the topic Tmiddle ages. As a final consideration of the observed mainstreams, we could claim that in the context of the Middle Ages studies, an initial interest in the History of the Church and Christianity shifted towards more philosophical studies about Catholicism.

Fig. 5
figure 5

Example of co-occurrence graph \( \overline{G} \)for a set of enriched keywords. As an example in the framework of topic discovery, the figure reports the number of co-occurrences within the publications for each pair of the considered keywords

Fig. 6
figure 6

Example of keyword weight in the years 2015–2017 about the topic ‘Middle Ages’. As an example of topic trend, we show the weight of the keywords associated with the topic ‘Middle Ages’ in the time interval of the years 2015–2017

6 Case Study Analysis

In this section, we present the results obtained by applying the proposed approach and related techniques for topic mining on a real publication dataset taken from selected institutional research archives of Italian universities. The main idea is to provide a clear description of the results we obtained, here by focusing on a few disciplines and using institutional publications data provided by four Italian universities. Regarding the case study analysis, an Online Appendix is provided with complementary figures and comments. The Appendix is available for download at the following link: http://islab.di.unimi.it/content/maverick_data/appendix.pdf.

6.1 Dataset Description

The proposed case study is based on a publication dataset collected from selected Italian universities. In the early 2000s, most Italian universities started to populate and maintain institutional research archives for persistently storing publications and research products. In particular, each university supported the creation of its own repository based on products published (and compulsorily uploaded) by its affiliated scholars. In selecting both universities and research areas to consider for building the dataset of the case study, we relied on the following recommendations: i) choose large, representative Italian universities, ii) choose a few selected research areas, and iii) compose a dataset that is representative of both bibliometric and nonbibliometric research areas according to the Italian regulation for research evaluation. As a result, the following four Italian universities have been selected: UNIBO—University of Bologna (with 2896 academic researchers—data consulted on April 15, 2021, from https://cercauniversita.cineca.it/php5/docenti/cerca.php), UNIMI—University of Milan (with 2258 academic researchers), UNIRM—University of Rome ‘La Sapienza’ (with 3350 academic researchers), and UNITO—University of Turin (with 2086 academic researchers). Moreover, among all the available disciplines, we focus on those publications authored by all scholars of the following research areas (as defined by The Italian National University Council—CUN): A01 (mathematics and informatics), A11 (history, philosophy, pedagogy, and psychology), and A13 (economics and statistics).

A summary view of the collected dataset is provided in Fig. 7. The dataset contains 123,504 publications labelled with 124,820 author-defined keywords.

Fig. 7
figure 7

Summary picture of the Maverick dataset. Number of publications and keywords in the Italian case study by university (Univ. of Bologna, Univ. of Milan, Univ. of Rome, and Univ. of Turin) and discipline (scientific area of study as classified by ANVUR)

For topic mining purposes, the keywords and publication titles are exploited. It is important to note that 58,585 publications of the dataset do not provide any keywords. For these publications, only the keywords extracted from the title are then used for topic mining. As a further remark, we observe that the number of publications per year is not constant. In fact, at the beginning of the 2000s, only a few publications were inserted into the selected archives by their authors, and this practice became a regular—and often compulsory—routine only around the year 2005. For this reason, the considered publications in this empirical exercise cover the years from 2005 to 2018. It is also important to stress that the number of publications per year is continuously increasing throughout the whole observed period because of the increasing role of performance-based exercises in Italy; thus, a normalisation step is required when the analysis focuses on the consistency of topics across different years.

6.2 General Results in Italian Academia

The results obtained on the considered dataset are briefly discussed by separately exploiting the publications of each considered research area, namely A01, A11, and A13.

First, to identify the mainstream profiles defined in Sect. 4 using the bibliometric data collected for this case study, we started defining a couple of synthetic operators to describe a topic’s behaviour over time. For any given topic k belonging to its G graph, we first generate a matrix of the number of links with all the other existing topics within the same discipline during the observed years, where in each of its cells, we have the corresponding number of papers. Then, we compute two simple correlation coefficients, ρkt, j, for any identified topic k: a pair of ρ-s (namely ρk, t and ρk, j) that represents the correlation coefficients of the topic’s number of publications published over time (t) and of the number of topic links that the selected topic (k) establishes with the other topics (j) in the discipline, respectively.

Figure 8 shows how a topic may behave according to different combinations of the values defined by its pair of ρ-s coefficients. Using the computed ρ coefficients, it is possible to broadly map the mainstream topic profiles defined in Sect. 4 in the area defined by the two ρ pairs.

Fig. 8
figure 8

Topic behaviour in the Italian case study according to ρ-s coefficients. Figure 8 shows how a topic may behave according to different combinations of the values defined by its pair of ρ-s coefficients. Examples of topic characteristics are provided in different zones of the provided space to show how the mapping of mainstream topic profiles defined in Sect. 4 can be obtained in the ρ-s coefficients space

A spot topic, for example, corresponds to a short-lived topic that, after a burst of attention from the research community in the past, is now abandoned. A ‘trendy’ topic within a discipline appears in the bottom left area of Fig. 8 (e.g. grey circle).

An impasse topic describes the development of a research programme having some topic links that died in recent years; this is in the bottom middle area of the graph (e.g. below the big yellow circle).

A persistent topic identifies a mainstream topic that enjoys stable attention from the research community but with low productivity in terms of links with new research lines. This topic appears in the bottom right part of Fig. 8 (e.g. purple circle).

A boosting topic, which has been described in Sect. 4 as a topic characterised by a long life and a high number of connections with other topics, is the top centre or top right corner of Fig. 8 (e.g. green and/or red circles).

In addition to this, Fig. 8 may be useful to identify niche topics, like the orange-like circle in the top left area, which is characterised by a decreasing number of papers published in the past few years along with an increasing number of topic links.

For each one of the disciplines considered in this empirical study, a figure has been created that visualises a map similar to Fig. 8, with a circle for each topic characterising the discipline during the period under investigation. Circle size and colour represent the number of topic links and topic size (e.g. number of publications), respectively. Each map describes the corresponding discipline using the proposed topic approach (Fig. 9).

Fig. 9
figure 9

Mathematics and informatics. Each of the considered disciplines in this empirical study is visualised on a map (Mathematics and informatics only is reported above), showing a circle for each topic characterising the discipline during the period under investigation. Circle size and colour represent the number of topic links and topic size (e.g., number of publications), respectively

Then, for each of the relevant topics of the identified mainstream categories, we compute a matrix describing the evolution over time of all the topic links generated by the topic itself, here based on the structure described in Fig. 6. Each measure of the evolution of a topic link over time is standardised by its dimension to generate a comparable heatmap that clearly visualises the temporal topic’s dynamics. The case study on A01 is reported here, including the figures mentioned above. For areas A11 and A13, the figures are reported and described in the Online Appendix (see Figs. 1–4 in the Appendix).

Area 1—Mathematics and Informatics

As a first example, Fig. 10a provides the heatmap of the topic ‘privacy’. This is an impasse topic with negative values for both the correlation coefficients (ρ-s). Each row of the heatmap represents the topics with which the topic ‘privacy’ reports links over time, and the colour indicates the intensity of such links. Red means few links, whereas white means many links. According to this heatmap, the topic ‘privacy’ used to be linked to topics like ‘access’, ‘security’, and ‘network’ in the past, whereas recently, it started to be associated with different topics such as ‘data’ and ‘systems’. This dynamic seems to be pretty much in line with the current increasing availability of new sources of (individual) data, for example, hospital individual data or credit bank transactions, which consequently challenges new issues related to data ‘privacy’ concerns.

Fig. 10
figure 10

Examples of a heatmap for mathematics and informatics. In (a) the topic privacy, in (b) the topic social. Examples of a heatmap for mathematics and informatics. Each measure of topic link evolution over time is standardised by its dimension to generate a comparable heatmap that clearly visualises the temporal topic’s dynamics in the discipline

Figure 10b provides the heatmap of the topic ‘social’. Both ρ-s are positive, which makes it a boosting topic. The corresponding heatmap suggests that this topic is now (in 2016–2018) very much linked with topics like ‘sentiment analysis’ and ‘social networks’ (e.g. Twitter).

This is again very much in line with the new and fast-growing literature that uses ‘big data’ to extract indicators to summarise, for example, users’ opinions. By contrast, at the beginning of the sample period, the topic ‘social’ was associated with more traditional topics like ‘education’, ‘participation’, ‘university’, and ‘discrimination’.

Area 11—History, Philosophy, Pedagogy, and Psychology

Turning to Area 11, the topic ‘ageing’, represented in detail in Fig. 2a of the Online Appendix, provides another interesting example of an impasse topic with both negative correlation coefficients (ρ-s).

In the most recent years, this topic has very much been associated with ‘experience’, ‘activity’, ‘creativity’, ‘life’, and ‘health’, while in the previous years, it used to be linked with discussions and studies more focused on the past (i.e. ‘history’ and ‘wars’).

Because the problem of the ageing of the population is increasingly and extremely relevant, the topic ‘ageing’ seems to be now more associated with discussions related to aged people’s quality of life (both in terms of health and wealth) and their occupations rather than their past historical memories.

In addition, the topic ‘female studies’ is a good example of a boosting topic (both rho-s are positive and high). According to graph Fig. 2b in the online supplementary material, this topic has been recently associated with topics like ‘child’ and ‘adolescent’ (which suggests an emerging focus on the relationship between mothers and children), ‘male’ (which points to gender-related studies), or ‘patient’ and ‘effect’ (which relates to the literature of causal analysis of health issues, which often may provide heterogeneous effects by gender).

By contrast, in the past, ‘female studies’ have been associated with topics related to women’s mental status (e.g. ‘mental health’, ‘stress’, ‘personality’, ‘attention’, ‘perception’, ‘memory’, ‘brain’, and ‘neuro’). In addition, it used to be associated with ‘work’ and ‘quality of life’, which may refer to work–life balance issues that appeared commonly in the literature.

Area 13—Economics and Statistics

Figure 3 in the Online Appendix shows a pretty different scenario for Area 13 compared with Area 11 and Area 1.

In fact, Fig. 4 in the Online Appendix shows three heatmaps for the three following topics: ‘development’, ‘taxation’, and ‘network analysis’.

Network analysis’ can be defined as “a set of integrated techniques to depict relations among actors and to analyse the social structures that emerge from the recurrence of these relations” (see Smelser & Baltes, 2001).

From our analysis, it may be characterised as a boosting topic that exhibits positive values of rho-s. Although in the past it focused on theory (being related with abstract analysis) and empirical analysis, it has been recently applied among economists, econometricians, and statisticians to topics such as ‘sentiment analysis’, ‘Twitter’, and ‘social media’, generating a new strand of literature studying a ‘network analysis’ taking advantage of the new sources of (big) data now available.

On the contrary, a clear example of an impasse topic in economics and statistics is represented by the topic ‘taxation’. The economics of taxation mainly collects studies regarding both the effects and consequences of taxes on economic decisions, as well as on how to efficiently design tax systems (e.g. income, capital, environmental taxes).

For this topic, both rho-s are negative, meaning that there has been a decreasing interest in this topic over the past decade. However, looking carefully at its development over the past few years (see Fig. 4c in the Online Appendix), it seems quite reasonable to identify dead links with topics such as ‘literature review’, ‘inequality measures’, ‘country taxation’, and ‘equity’ in favour of new emerging trends with topics like ‘income distribution’, ‘evidence’, and ‘effects’, which are very much in line with recent works on the global evolution of inequality, taxation top income dynamics, progressive wealth taxation, and so forth.

Finally, an example of a persistent topic is also identified in the economics and statistics area when looking at the topic ‘development’, for which both rho-s are almost close to zero. Development economics is a branch of economics that focuses on studies of economic, health, education, and social conditions in developing countries (especially low-income ones) compared with developed ones.

The heatmap for this topic, as represented by Fig. 4b in the Online Appendix, makes evident how this topic turns from being historically related with topics like ‘global’, ‘sustainability and growth’, or ‘industry’ in the past to new research frontiers aiming to explore how to estimate the ‘effects of policies’ and field interventions in emerging countries, often following—also in Italian academia—the studies winning Nobel Prizes in 2019 on the use of randomised clinical trials (RCTs) in this field to measure their ‘performances’, as well as on ‘innovation’ and ‘new perspectives’ in general.

6.3 Robustness of the Proposed Approach Using an International Dataset over 14 Years

To show the ability of the proposed approach to identify the existing publication topics, their evolution over time and their topic links in a broader (not only restricted to the national context as in the Italian case described before) and international context, we rely on different data sources: Scopus Elsevier. Through the Elsevier API service, we downloaded all the Scopus research products published between 2005 and 2018 that are classified as instances of at least one of the following subject areas (each journal may belong to more than one subject area): business, economics and econometrics, decision sciences, statistics and probability, and demography.

The dataset contains 1,700,286 unique papers published as articles (articles in press, editorial, erratum, and business articles), chapters, books, conference papers, notes, reviews, letters, and short surveys between 2005 and 2018 written by 1,433,297 different authors and labelled with 1,168,680 author-defined keywords. The obtained dataset is 12 times larger than the database analysed in the Italian case and covers almost all papers published in the selected disciplines by all the authors who are active in these research fields around the world. This database, even if not perfect in terms of its coverage for all the existing disciplines (it is well known how social sciences and humanities or medicine are not perfectly represented by Scopus, see for example Archambault et al., 2006), provides a good set of information to explore in depth the ability of the approach to identify and group the topics in the literature.

From these data, we have selected two topics (‘development’ and ‘taxation’)—out of the several topics analysed before for the Italian case—to demonstrate the scalability of the proposed approach to a broader data source and the ability of the method to go deeper into the identification of the relevant topic links. As a matter of fact, the main contribution of this chapter is to propose a methodological approach to topic mining, showing its applicability to different disciplines and its reliability—in terms of the obtained results—when datasets of different richness and size are considered.

Obviously, the topic description may be slightly different depending on the different sets of authors and publications considered. For example, consider that in a given discipline, the Italian authors could have a different publishing behaviour in the 14 years analysed when compared with the authors who are active in the international literature. In this case, the two analyses may not perfectly overlap.

As for the ‘development’ topic, the approach applied to the Scopus dataset can identify several different topics within the ‘development’ field. From the scholarly publications in ‘development’ journals, the proposed approach identifies eight (sub)topics. A first topic, which is the most relevant in terms of publications, is broadly named ‘development’ and is classified as a boosting topic (both rho-s are positive and high) in the Scopus dataset. That is, it has around 3000 papers published (increasing over time) with an increasing number of links with other topics. This topic is a persistent one in the Italian case study. In addition to this, seven additional subfields in the ‘development’ area have been identified, describing a relevant heterogeneity within the field. Topics like ‘development finance’ and ‘development funding’ are both classified as impasse topics (with negative rho-s), exhibiting ended links with other topics and reduced attention from the researchers publishing in the discipline over the considered years. At the same time, this approach provides evidence of some new boosting topics (with both rho-s positive and large) in subfields like ‘development economics’ and ‘development strategies’. Moreover, a ‘development’ heatmap (Fig. 6 in the Online Appendix) shows how the emerging topics over the past few years of the analysed sample have focused on studying cultural, educational, agricultural, trade, and migration issues, along with managerial, institutional, and governance strategies, with special attention given to sustainability, climate change, and the evaluation of the effects of policies in the context of African and Asian countries (like China and India). This more detailed description of the field is in line with the one offered in the Italian case study but with an improved degree of available details on both the thematic issues and specific countries.

As for the ‘taxation’ topic, we have now a richer set of subtopics identified by our approach. Although the overall ‘taxation’ topic has received decreasing interest over the past decade in the Italian case study (as shown in the previous section), the Scopus dataset shows how this field is highly heterogenous. Some topics show a decreasing interest from scholars (like ‘tax competition’), while other topics are clearly emerging (‘boosting topics’) in terms of both the number of papers and the number of topic links. Examples of these boosting topics are ‘tax incentives’, ‘tax havens’, ‘tax compliance’, and ‘tax morale’, which are identified as new emerging topics over the past 15 years.

A first heatmap on ‘taxation’ as a whole shows how the evolution of the international literature spans from links with topics like ‘regulation’, ‘redistribution’, and ‘welfare’ analysis in the early years to more recent developments in the field focused first on ‘income inequality’ (from 2011 to 2015) and then on ‘income distribution’, ‘tax reforms’, and ‘policy evaluation’ of interventions in countries like USA, Australia, and United Kingdom (Fig. 7 in the Online Appendix). Taking advantage of the richness of the considered Scopus dataset, we can go deeper in the analysis of these topics by looking at the heatmaps representing the links that even smaller subtopics in ‘taxation’ have established over time. For example, if we focus on the smaller topics identified by the proposed approach within the taxation field, we can identify ‘tax havens’ as a new emerging topic (which exhibits positive and large rho-s) with a rising number of publications from 2008 onwards. In addition, this subtopic has been interlinked since the very beginning (2008/2010) with topics such as ‘tax competition’, while from 2014 to 2018, it has been associated with topics like ‘tax avoidance’ and ‘tax evasion’, probably reflecting very recent contributions in the literature following the international debate on tax havens (e.g., the ‘Panama papers’ debate). Note that a similar degree of precision in describing the emerging or declining fields of research within a more general discipline like taxation studies could not be found when dealing with national subsamples of publications like the ones described in the Italian case study.

To conclude, the representation and discussion of some selected topics of the three disciplines analysed here shows how the proposed approach may be useful in describing both the geography of topics and the evolution and interlinkages of topics within a discipline by means of two datasets: i) institutional publications provided by four Italian universities and ii) international publications over 14 years. Moreover, the examples show how having a more comprehensive database of the worldwide production of papers is essential to prove the scalability of the proposed approach and reliability of the obtained results. All in all, richer and sizable datasets allow clearer and in-depth analyses to be provided. In particular, for a given number of papers available from the literature, the larger the number of the analysed topics, the sparser the topic links matrix. The cell size of this matrix is crucial to obtain reliable information on the temporal evolution of detailed topics and their interlinkage with new emerging or declining ones. Therefore, large bibliometric databases with millions of records can provide enough information to make this possible.

7 Concluding Remarks

The results from the case study show that the proposed approach to topic mining is capable of revealing trends of publication keywords and changes of these trends over time both when country-level data are available and when—even better—the larger international literature in a given field is considered. In combination with the contribution about mainstream modelling, this result represents a promising achievement, allowing us to recognise topic behaviours that can be associated with one of the profiles of the mainstream research defined in the project (i.e. spot, persistent, impasse, and boosting). In the following, we provide some considerations about possible extensions and applications based on our results.

Possible Research Extensions

Future research activities could focus on i) the extension of the case study dataset in the Italian context to get complete coverage of the topic evolution of a given discipline in Italian academia, ii) the use of a discipline-specific keyword dictionary for a more refined topic cleaning, and iii) the comparison of case study results against third-party datasets similar to the case described in Sect. 6.2. The extension of the Italian case study dataset requires including the institutional research archive of additional Italian universities and may be a powerful tool to comprehensively analyse the topic’s evolution and interlinkages across the different disciplines of Italian academia. On this point, we note that institutional research archives started to be populated at the beginning of the 2000s for almost all Italian universities. This means that i) the dataset adopted for the case study cannot be improved in terms of the size of the considered time interval and ii) the initial years of the considered time interval (i.e. the years from 2000 to 2004) are marginally useful for topic mining because few publications are present in the archives. As a result, through the extension of the available data, we aim to improve the richness of the publication corpus and increase the relevance of the case study in providing meaningful insights into the Italian picture in the period 2005–2018. A progressive inclusion of very recent publications from the considered universities is also required to keep the case study up to date. This may allow researchers from the various fields in the social sciences to study the evolution of their disciplines and the temporal changes that occurred in relation to a number of features, for example, new generations of researchers being more open towards international academia, the introduction of the research assessment exercises on the studied topics, and so forth. Moreover, the proposed approach applied to a complete country-specific bibliometric database may enable policy makers, such as ANVUR (Italian Agency for Evaluation of the University and Research System) or MIUR (Italian Ministry for Education, University, and Research) to design new policy interventions (which is their institutional mission) based on a solid analysis of ‘what worked’ (or not) in the past (as described in more detail in the possible applications to research evaluation provided below). A further issue for future research activities is the specification of a keyword dictionary for topic cleaning, possibly with a more detailed approach for each specific discipline. Sometimes, very general and poorly relevant keywords are included in the results of topic mining activities. A manually defined dictionary of keywords can be set up to refine the results of keyword extraction and improve the quality of the discovered topics. Finally, a comparison of the obtained results against a third-party dataset can be also envisaged, here following the lines described in Sect. 6.2, to compare the topics found in the case study with the Italian community against a dataset that is representative of the international academic community. The goal is to observe possible similarities and/or peculiar behaviours of the Italian community compared with a larger, international group of scholars.

Possible Applications to Research Evaluation

The topic trends that have emerged by applying the proposed techniques can be exploited to analyse changes in the publication practices of researchers along the temporal dimension. It is possible to apply these techniques to a publication dataset that is representative of the overall Italian Academy, meaning that almost all the institutional research archives of the Italian universities will be considered. A possible application scenario is to consider a ‘median scholar’ and the corresponding set of authored publications. By extracting the featured keywords from the ‘median scholar’ publications, it is possible to compare and correlate their research production against the topic trends associated with the scholarly keywords. In this way, shifts in the ‘median scholar’ interests can be tracked, as well as possible changes in terms of publication practices over time so that it is possible to observe whether the scholar’s behaviour endorses a topic whose trend can be recognised as mainstream according to specific time intervals. Similarly, one can focus on identifying heterogeneous publication patterns along the ability distribution, for example, studying if ‘top scholars’ behave differently than median or bottom ones. As a further application scenario, a similar approach can be enforced to analyse the changes that occur over time regarding a reference publication source (e.g. top journals) within a specific research area. In this way, it is possible to observe the evolution of ‘hot research topics’ in certain publication sources in correlation with the topic trends emerging from the already available results in that research area.

In addition to this, having access to the relevant data for the worldwide production of papers belonging to a specific discipline (as collected by standard bibliometric sources such as Scopus or Web of Science) may also enable a comparison of the national evolution of a discipline in a specific country (e.g. in Italy in our case) with respect to its own international benchmark. Moreover, a similar approach may also be adopted to study the effects of introducing a performance-based assessment exercises—as has happened in several countries around the world over the past decades—on the topics’ evolution in different disciplines at the local level.

Does the system of incentives provided by a performance-based assessment exercise have an impact on the evolution and choice of topics studied by academic scholars in their disciplines? Is there any evidence of a temporal shift towards international mainstream research (e.g. leaving niche topics aside) following the introduction of this type of assessment exercise? If so, is it socially optimal? All these research questions (and probably many others) will be part of the future research agenda in this strand of the literature.