Strategy
Topics as collectively shared perspectives on knowledge cannot be used as ground truths for assessing the outcomes of bibliometric topic reconstruction exercises because the former are interpretive schemes embedded in human consciousness while the latter are structured sets of publications. To overcome this incompatibility, the ground truths must be constructed as sets of publications that validly represent topics. These ground truths need to be derived from researchers’ perspectives. Therefore, we utilize a bottom-up strategy for the construction of ground truths (Fig. 1). We first obtain individual researchers’ perspectives on sets of their own publications, which represent their ‘research trails’, i.e. the sequence of topics they have worked on (step 1). These ‘elementary topics’ of researchers have all the important properties of a topic as defined in the previous section except one. What researchers consider as ‘their topics’ are foci on theoretical, methodological or empirical knowledge that serve as frames of reference and guide individual research. They are not, however, collectively shared perspectives. Thus, we can consider them as elementary models of topics but not as topics according to our definition. They can serve to construct our micro-level ground truths because these ground truths should be easier to reconstruct than those derived from ‘true’ topics, which only exist on the meso-level.
In step 2 we construct meso-level ground truths by integrating the publications researchers assigned to their individual topics with a keyword-based approach. This process leads to two types of meso-level ground truths, namely ‘pure’ publication sets that represent only one topic and ‘mixed’ publication sets that represent several overlapping topics. In step 3 we create a set of publications representing AMO (atomic and molecular optics) research, create two data models (direct citation and bibliographic coupling), and cluster the networks.
With these clustering solutions we conduct two experiments. In experiment 1, we test whether the four combinations of data models and algorithms can reconstruct our individual-level ground truths, i.e. the sets of publications that were described as belonging to the same topic by researchers in interviews. In experiment 2, we test whether the four combinations can reconstruct the meso-level ground truths. We select the easiest task for the algorithms and only evaluate the reconstruction of the ‘pure’ meso-level ground truth publication sets, i.e. those publication sets that contain keywords from only one topic.
Steps 1 and 2: Constructing Ground Truths
Data
For the construction of the micro-level ground truths (step 1), we used data from a project on the emergence of experimental BEC in the 1990s and 2000s (Laudel et al., 2014). In this project, 38 researchers working in AMO physics who switched or did not switch topics by turning to experimental Bose–Einstein condensation (BEC) at different points in their careers were interviewed. Bose–Einstein condensates occur when gases of atoms are cooled to temperatures very close to absolute zero, at which point the particles lose their individual identities and coalesce into a single blob. Their existence was theoretically predicted in 1924 by Bose and Einstein. Although BEC was widely accepted as a theoretical possibility, its experimental realisation was regarded by many physicists as very difficult, if not impossible, to achieve for both theoretical and technological reasons. In 1995 the first BEC was experimentally produced by two US groups, an achievement that was later honoured with the Nobel Prize. The scientific community was initially undecided whether BEC would be the end of a long quest or whether it would open up new research opportunities. However, it soon recognised that BEC can be used for a wide range of fundamental research in several subfields of physics, and BEC research grew rapidly to become an established field of research (Fallani & Kastberg, 2015).
For the construction of micro-level and meso-level ground truths, we used interviews with twelve researchers who switched and two researchers who did not switch topics by turning to experimental BEC. By including AMO researchers who switched to experimental BEC, we can further specify one of our ground truths – publications on experimental BEC began in 1995 – and can ask whether clustering solutions reconstruct the emergence of experimental BEC.
Prior to the interviews with these AMO researchers, their publication metadata were downloaded from the Web of Science (WoS), and publication clusters representing their research biographies were constructed. These bibliometric data were discussed in interviews. Researchers confirmed or corrected the correspondence of clusters of their publications with topics they worked on. In the present study, we used data from fourteen of these AMO physicists.
The construction of the meso-level ground truths (step 2) was based on the results from step 1, namely the clusters of researchers’ publications that represented topics researchers worked on. From these publications, we extracted specific keywords and created topical keyword lists.
Methods
In the project conducted by Laudel et al. (2014), face-to-face semi-structured interviews were used. A major thematic focus of the interviews was the interviewees’ research topics, beginning with their PhD topic, with an emphasis on thematic changes and the reasons for them. In addition, developments in the interviewees’ national and international communities were discussed. This part of the interview centred on graphical representations of the interviewees’ research trails. Research trails were constructed by downloading their publications from the Web of Science, constructing bibliographic coupling networks (using Salton’s cosine for bibliographic coupling strength) and choosing a threshold for the strength of bibliographic coupling at which the network disaggregates into components (Gläser & Laudel, 2015). Although this ‘manual’ approach also produces several unassigned publications, it is preferable to algorithmic clustering for the construction of an input to interviews. The visual representations of research trails serve as means of ‘graphic solicitation’ in interviews, for which instant visual recognition of different topics is essential. The components are intended to provide a ‘draft’ of the topics a researcher has worked on over time (see Fig. 2 for an example). Their visualisations were used to prompt narratives about the content of the research at the beginning of the interview (for an extended description of the approach see Gläser & Laudel, 2015). During these narratives, researchers confirmed and sometimes corrected the picture by combining or separating clusters because they perceived research topics as belonging together or being separate (see the micro-level ground truths layer in Fig. 1). The interviews lasted on average 90 min and were fully transcribed. Transcripts were analysed by qualitative content analysis, i.e. we extracted relevant information from the transcripts by assigning it to categories derived from our conceptual framework (Gläser & Laudel, 2013, 2019).
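A minimal sketch of this thresholding step, assuming the pairwise coupling strengths have already been computed (all names are hypothetical) and using the ‘networkx’ package:

```python
import networkx as nx

def trail_components(coupling, threshold):
    """Connected components of a bibliographic coupling network at a threshold.

    coupling: dict mapping publication pairs (i, j) to Salton's cosine strength.
    """
    g = nx.Graph()
    g.add_edges_from(
        (i, j) for (i, j), w in coupling.items() if w >= threshold
    )
    return list(nx.connected_components(g))

# The threshold is raised until the network disaggregates into components
# that visually suggest distinct topics (cf. the research trail in Fig. 2).
```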
There was no straightforward way in which individual-level ground truths – the sets of publications researchers identified as representing one of the topics they worked on – could be aggregated to meso-level ground truths.
As was to be expected, the perspectives of researchers on their topics differed. For example, central methods used by AMO physicists were considered a separate topic by some researchers, while others integrated them in the theoretical topics they worked on. Furthermore, interviewees assigned publications to disjunct topics while ‘true’ meso-level topics overlap. This is why simply combining all publications from the 14 researchers and applying the method for the construction of research trails to the combined set of publications did not produce any meaningful results, i.e. no threshold could be found at which components represented researchers’ topics. We tested this with the publications on the BEC topic and found that setting a low threshold led to BEC being combined with other topics in one component, while increasing the threshold made the component disintegrate into sub-graphs that did not represent researchers’ topics.
We therefore used the information from the interviews and a keyword-based approach to construct meso-level ground truths:
1. Information from interviews was used to distinguish between ‘pure’ individual topics – those that addressed only one set of problems – and ‘mixed’ topics – those combining topics that were considered separate by other researchers.
2. Publications assigned by the researchers to ‘pure’ topics and contained in the AMO dataset described below (see the "Step 3: Obtaining Clustering Solutions" section) were aggregated. This resulted in four sets of more than ten publications each, which were used for the extraction of keywords: Bose–Einstein condensation (71 publications), cooling (25), atom interferometry (24) and precision spectroscopy (22).
3. Keywords were extracted from all publications linked to each topic. We used author keywords, Keywords Plus, and terms we extracted from titles and abstracts. The term extraction was conducted with a rule-based approach using a noun phrase chunker from Python’s ‘nltk’ package (see the first sketch after this list). Based on the information from the interviews, groups of semantically equivalent terms could be identified. For example, ‘laser cooled rubidium’ and ‘ultracold rubidium’ could be subsumed under ‘rubidium’ because all researchers worked at ultra-low temperatures, and laser cooling was the only method to achieve these temperatures.
4. In order to determine sets of topic-specific terms, we treated each publication linked to one of the four ‘pure’ topics as a cluster and calculated the normalized mutual information (NMI) value for each keyword (Koopman & Wang, 2017; see the second sketch below). Due to the large differences in the sizes of the topics, the results also included very general, frequently occurring terms. We manually excluded these terms (e.g. ‘atoms’, ‘gas’) from the lists of specific terms for each topic.
5. We ranked terms according to their NMI values and iteratively selected terms for each topic until the set of terms allowed us to retrieve most of the publications of each topic. This led to a minimal set of keywords that occur in a large proportion of the publications linked to a topic, and mostly in these publications, and can hence be considered specific to this topic (Table 1).
6. These keyword lists were applied to retrieve researchers’ publications. Not surprisingly, many publications contained keywords from more than one list and were thus linked to more than one topic.
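First sketch (term extraction, item 3): a minimal noun phrase chunker using nltk; the grammar pattern is an illustrative assumption, not the exact rule set used in the study:

```python
import nltk
# One-time downloads: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

# Illustrative rule: optional adjectives followed by one or more nouns.
chunker = nltk.RegexpParser("NP: {<JJ.*>*<NN.*>+}")

def noun_phrases(text):
    """Return the noun phrases found in a title or abstract."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    tree = chunker.parse(tagged)
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(lambda t: t.label() == "NP")
    ]

print(noun_phrases("Laser cooled rubidium atoms form a Bose-Einstein condensate."))
```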
Table 1 List of the four ‘pure’ topics with the recall achieved by searching for the specific keywords (right column) in publications’ metadata

This way, the disjunct assignment of publications by interviewees could be replaced with sets of publications in which topics overlap (Fig. 3). The areas bounded by bold lines in Fig. 3 represent ‘pure’ parts of ground truths, i.e. sets of publications that contain keywords from only one topical list. Reconstructing them can be considered the easiest task, which is why we used only these ‘pure’ meso-level ground truths in our experiments.
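Second sketch (keyword specificity, item 4): one plausible operationalization of the per-keyword NMI value, computed between the topic partition and the binary partition induced by a term’s presence; whether Koopman and Wang (2017) define the measure exactly this way is not shown here, so treat this as an assumption:

```python
from sklearn.metrics import normalized_mutual_info_score

def keyword_nmi(term, pub_terms, pub_topics):
    """NMI between topic labels and a term's presence across publications.

    pub_terms: list of term sets, one per publication.
    pub_topics: topic label per publication (same order).
    """
    presence = [int(term in terms) for terms in pub_terms]
    return normalized_mutual_info_score(pub_topics, presence)
```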
Step 3: Obtaining Clustering Solutions
Data
To construct the macro-level AMO dataset, we first delimited the Web of Science by selecting all publications published 1975–2005 in journals of the subject category 'Physics, Atomic, Molecular & Chemical', excluding physical chemistry journals (identified by searching for ‘chemi’ in the journal titles). We then expanded and refined this dataset by:
1. Including publications from all other physics subject categories from 1990–2005 that cited at least two publications from the initial dataset;
2. Limiting the dataset to publications from 1990 to 2005; and
3. Including publications from all other physics subject categories from 1990–2005 that were co-cited with at least two papers from the 1990–2005 dataset.
This resulted in 369,188 publications, whose metadata were downloaded from the 2018 stable version of the Web of Science database hosted by the ‘Kompetenzzentrum Bibliometrie’ (KB). From this dataset we created a subset containing all AMO physics research relevant to our 14 researchers. To achieve this reduction, we built and clustered the direct citation network of this dataset; its giant component contained 366,480 publications, including all relevant publications of the micro-level research trails. We applied the Leiden algorithm (Traag et al., 2018) for a coarse clustering (resolution 8e-6) of the giant component, and then extracted the largest cluster with 96,137 publications, which included 415 (78%) of the research trails’ publications. The missing 22% of the researchers’ publications mainly represent research not belonging to AMO physics or research at the borders of the field (e.g. interdisciplinary collaborations, applications of common AMO physics methods in other fields). These 96,137 publications served as our macro-level AMO dataset. For this dataset, two data models were created.
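A minimal sketch of this coarse clustering step, assuming the citation links have been exported to an edge list file (file name and format are hypothetical) and using the ‘igraph’ and ‘leidenalg’ Python packages:

```python
import igraph as ig
import leidenalg as la

# Hypothetical input: one "citing cited" pair per line, publication IDs as strings.
g = ig.Graph.Read_Ncol("amo_citations.txt", directed=True)

# Work on the giant component of the (undirected) citation network.
giant = g.as_undirected().components().giant()

# Coarse CPM clustering with the parameters reported above.
partition = la.find_partition(
    giant,
    la.CPMVertexPartition,
    resolution_parameter=8e-6,
    seed=0,
)
largest_cluster = max(partition, key=len)  # list of vertex indices
```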
The direct citation network was created using only the ‘internal’ links, i.e. only citations from and to publications in the AMO dataset. Weights were attached to the links according to the normalization formula in Waltman and van Eck (2012: 2380). For constructing the bibliographic coupling network, only references that are themselves source items in the Web of Science were used. Here, weights were attached to the links by calculating Salton’s cosine.
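A minimal sketch of the bibliographic coupling weights, assuming a binary publication-by-reference incidence matrix (the toy matrix below is purely illustrative); Salton’s cosine here is the number of shared references divided by the geometric mean of the two reference list lengths:

```python
import numpy as np
from scipy import sparse

# Toy incidence matrix: rows are publications, columns are cited source items.
R = sparse.csr_matrix(np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]))

C = (R @ R.T).toarray()                        # C[i, j] = number of shared references
lengths = np.asarray(R.sum(axis=1)).ravel()    # reference list length per publication

# Salton's cosine: shared references normalized by reference list lengths.
S = C / np.sqrt(np.outer(lengths, lengths))
np.fill_diagonal(S, 0.0)                       # no self-coupling
```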
Methods
In order to detect communities in both networks, we selected two algorithms popular in the scientometrics community, namely the Leiden algorithm (v. 1.0.0) (Traag et al., 2018), which further develops the widely used Louvain algorithm, and the Infomap algorithm (v. 0.21.0) (Rosvall & Bergstrom, 2008), which has also been applied successfully in bibliometrics (Šubelj et al., 2016; Velden, Boyack, et al., 2017; Velden, Yan, et al., 2017).
Both algorithms allow for parameter settings that create coarser or more fine-grained solutions. We varied this parameter in order to allow both algorithms to reconstruct smaller or larger topics in our dataset. The Leiden algorithm requires the specification of a resolution parameter. This parameter is included in its quality function, the Constant Potts Model (CPM), whose optimization for the chosen resolution value results in a particular partition of the network. The seed parameter was set to ‘0’ for all runs. Varying the resolution parameter leads to different numbers of clusters in a partition.
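For reference, the CPM quality function maximized by the Leiden algorithm can be written as follows (in the form used in the leidenalg documentation, where $e_c$ is the total weight of links inside cluster $c$, $n_c$ the number of nodes in $c$, and $\gamma$ the resolution parameter):

$$\mathcal{H} = \sum_{c} \left[ e_c - \gamma \binom{n_c}{2} \right]$$

A higher $\gamma$ penalizes large clusters more strongly, which is why raising the resolution parameter yields more, smaller clusters.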
Infomap finds the minimum description length of a random walk on a given network by creating modules. The parameter Markov random time has a standard value of 1. Changing this parameter changes the number of steps of the random walker that are encoded (Kheirkhahzadeh et al., 2016), which results in a more or less fine-grained solution. Furthermore, Infomap allows for ‘multilevel compression’, meaning that the ‘modular’ organization of a network can be analyzed on several levels of a hierarchy. Here, we decided always to analyze the second lowest level of the hierarchy, since this seemed a fair decision considering the sizes of our topics (the lowest level of the hierarchy consists of individual nodes, the highest level represents a rather coarse clustering).
The ‘granularity’ intervals were varied by changing the resolution parameter of the Leiden algorithm (higher values lead to more granular solutions) and the Markov random time of the Infomap algorithm (higher values lead to less granular solutions). To ensure that each solution could contain a certain number of topics of a certain minimum size, each parameter range was chosen so as to obtain a result with at least 20 clusters of at least 40 publications each. This constraint sets a lower and an upper bound for the resulting granularity level of the clustering results. Clusters smaller than 20 publications were disregarded in our analysis because the dataset covers 16 years and we assume a topic to be represented by several publications per year. Table 2 shows the upper and lower bounds of the parameter settings, and Fig. 4 displays the resulting size distributions of the coarsest (least granular) and finest (most granular) solutions of the four combinations. For the applications of the Leiden algorithm to the data models – solutions Leiden DC (direct citation model) and Leiden BC (bibliographic coupling model) – 100 solutions each were produced (100 steps from lowest to highest resolution). For Infomap DC and Infomap BC, we produced 33 and 14 solutions, respectively. A sketch of this parameter sweep follows below.
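A minimal sketch of the Leiden part of this sweep, assuming the graph ‘g’ is given; the bounds and the geometric spacing of the steps are assumptions (see Table 2 for the actual values), and the check mirrors the constraint of at least 20 clusters with at least 40 publications:

```python
import numpy as np
import leidenalg as la

res_min, res_max = 1e-5, 1e-2  # hypothetical bounds; the real ones are in Table 2
solutions = []
for res in np.geomspace(res_min, res_max, num=100):  # 100 steps, as reported above
    partition = la.find_partition(
        g, la.CPMVertexPartition, resolution_parameter=res, seed=0
    )
    # Keep only solutions with at least 20 clusters of at least 40 publications.
    if sum(1 for size in partition.sizes() if size >= 40) >= 20:
        solutions.append((res, partition))
```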
Table 2 Minimum and maximum granularity levels for the Leiden algorithm (resolution parameter) and Infomap (Markov random time), as well as the steps taken (clustering solutions produced) from minimum to maximum

Experiments 1 and 2: Assessing the Clustering Solutions
In order to evaluate whether an individual researcher’s research topic has been reconstructed by the macro-level clustering, we took the publications that were assigned to a topic by the researchers and searched for them in the clusters produced by the four combinations of data models and algorithms for all parameter settings. To qualify as a successful reconstruction of a ground truth, a cluster had to fulfil two criteria:
1. All publications of a topic belong to the same cluster, and
2. No publications from the researcher’s other topics belong to this cluster.
If the publications of a ground truth fall into one cluster, and no publications from other ground truths are in the cluster, the evaluation gets the value 0 (= correct reconstruction). If several ground truths fall into one cluster, it is assigned the value 1 (ground truth lumping). If the publications of one ground truth are spread over several clusters, it is assigned the value -1 (ground truth spread). A minimal sketch of this assessment follows below.
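A minimal sketch of the assessment, assuming a mapping from publications to cluster labels (all names are hypothetical); the precedence of ‘spread’ over ‘lumping’ when both occur is our assumption, as the text does not specify that case:

```python
def assess_reconstruction(gt_pubs, other_gt_pubs, cluster_of):
    """Return 0 (correct), 1 (lumping) or -1 (spread) for one ground truth.

    gt_pubs: publications of the ground truth under assessment.
    other_gt_pubs: publications of the same researcher's other ground truths.
    cluster_of: dict mapping publication IDs to cluster labels.
    """
    clusters = {cluster_of[p] for p in gt_pubs}
    if len(clusters) > 1:
        return -1  # ground truth spread over several clusters
    cluster = clusters.pop()
    if any(cluster_of[p] == cluster for p in other_gt_pubs):
        return 1   # several ground truths lumped into one cluster
    return 0       # correct reconstruction
```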
This assessment was applied to all individual-level ground truths and to the ‘pure’ meso-level ground truths. In addition, we checked whether clustering solutions could reconstruct the emergence of experimental BEC by analysing the dynamics of clusters in which individual-level and meso-level BEC publication sets were placed.