Introduction

Due to the continuous increase in the number of journals and conferences, as well as the widespread use of preprint servers such as arXivFootnote 1, a massive number of academic papers has accumulated. Revealing new research topics can bring significant benefits to people involved in research, including researchers, research administrators, publishers, and funding bodies. However, it is hard even for experts to keep up with the large number of papers published.

To facilitate an understanding of an arbitrary research field, visualization of emerging research topics using papers published in the target field has attracted much attention (Callon et al. 1983; Assefa and Rorissa 2013; Muñoz-Leiva et al. 2012; Ronda-Pupo and Guerras-Martin 2012; Zhang et al. 2012; Hu et al. 2013; Li et al. 2010; Topalli and Ivanaj 2016). One widely used technique is co-word analysis (Callon et al. 1983), which maps the knowledge structure from word co-occurrences in the papers. In a traditional approach, the paper collection is first divided into temporal subsets (e.g., single or multiple years) according to the publication dates of the papers. Then, focusing on words in the content of papers such as titles and abstracts, the co-occurrence frequency of each word pair is calculated by counting the number of papers in which the two words appear together. Finally, the knowledge structure of each time period is visualized as a co-word network whose nodes correspond to words and whose edges reflect high co-occurrence frequencies between words, and in which the number of edges is generally determined by thresholding edge weights. A series of co-word networks over time (called dynamic co-word networks) helps us investigate how a target field has developed (Assefa and Rorissa 2013; Muñoz-Leiva et al. 2012; Ronda-Pupo and Guerras-Martin 2012; Zhang et al. 2012; Hu et al. 2013; Li et al. 2010; Topalli and Ivanaj 2016). However, the traditional approach has no means of highlighting how emerging research trends change from one time period to the next; it may depict edges even if the corresponding word associations are stable or gradually become stronger over time. Such stable word pairs can hinder the discovery of discriminative or emerging research topics that characterize a time period. Although there are several methods for visualizing temporal changes of word clusters or topics (Wang et al. 2014; Song et al. 2014; Mei and Zhai 2005; Hall et al. 2008), these usually describe dominant research topics in a time period using general and frequent single words, rather than specific technical terms that suddenly attract attention.

To make it easier to understand the evolution of research topics in a target field, this paper presents TrendNets, a novel approach to visualizing emerging research trends that detects rapid changes in the edge weights of dynamic co-word networks. Our aim is to find "bursty research topics," i.e., topics that are growing rapidly rather than topics that are merely popular. That is, a research topic is considered bursty in a given time period if it is heavily discussed in that period but not in the preceding ones. To measure the burstiness of a research topic, we calculate the difference in word co-occurrence frequency between two successive time periods. In the proposed approach, we first arrange the vectorized edge weights of the co-word networks as the columns of a single matrix in temporal order. Then, to detect bursty research topics, we formulate a convex optimization framework that decomposes the matrix into a smooth part corresponding to stationary topics and a sparse part corresponding to bursty topics in a computationally efficient manner. We term this framework Fast Sparse–Smooth Matrix Decomposition. After the optimization, the non-zero entries of the resulting sparse networks are considered to represent bursty topics for the corresponding time periods. Experiments on both toy data and real publication data show that TrendNets can effectively detect emerging research trends.

The main contributions of this paper can be summarized as follows:

  • We introduce a novel bursty research topic detection scheme for traditional co-word analysis to visualize emerging research trends. The proposed method is computationally efficient, even for a large matrix calculated from dynamic co-word networks.

  • We perform extensive experiments based on both synthetic and real-world datasets, which show that TrendNets effectively describes the bursty topics when compared with the conventional co-word representations.

  • Our method can automatically detect technical phrases from a pool of title words without a predefined vocabulary. This is useful because obtaining a set of titles is much easier than collecting the author-provided keywords of papers.

  • We have made our code available to encourage science mapping in all disciplines.

The remainder of this paper is organized as follows: the next section provides a brief review of the related studies; the third section presents the details of TrendNets; the results of the experiments on the synthetic dataset and paper collections are presented in the fourth and fifth sections, respectively; finally, the paper is summarized and some possible directions for future work are suggested.

Related work

As scholarly big data grows, science mapping has become increasingly important for supporting the activities of researchers, research administrators, and science policymakers. Many studies construct knowledge networks from the bibliographic metadata of papers to visualize the relationships between academic entities such as authors, papers, and technical terms. For example, co-authorship networks have been exploited to find influential researchers and research groups (Börner et al. 2005). Constructing citation or co-citation networks is also a popular approach for visualizing content-based relevance among research subfields (Shibata et al. 2008). However, it is difficult for citation-based methods to grasp the latest trends because papers usually require time to be cited by other papers. On the other hand, co-word networks have the advantage of being able to discover emerging technologies in a timely manner because they can be constructed as soon as new papers are published. Using textual words can directly map the knowledge structure of a research field into a conceptual space, and the visualization of word associations in network form is known to provide intuitive understanding (Motter et al. 2002; Drieger 2013; Doerfel and Barnett 1999). Thus, co-word analysis has been widely used to reveal research advances in several fields, including science education (Assefa and Rorissa 2013), strategic management (Ronda-Pupo and Guerras-Martin 2012), patient adherence (Zhang et al. 2012), and information retrieval (Hu et al. 2013). It has also been used to characterize the publication trends of journals (Ravikumar et al. 2015) and conferences (Liu et al. 2014). In conventional visualization, the number of edges in a network is generally determined by thresholding edge weights (i.e., word co-occurrence frequencies). However, this simple approach tends to produce many word pairs that are stable over time.

To visualize the transition of research topics in dynamic co-word networks, some methods detect clusters in each co-word network separately and then track the clusters over time (Wang et al. 2014; Song et al. 2014). Specifically, Wang et al. (2014) applied community detection to the co-word network of each time period and connected communities across successive periods based on their node-based or edge-based similarity. Song et al. (2014) clustered words via a Markov Random Field and tracked each cluster using word similarity to show its development. These methods can visualize the evolution of representative semantic clusters in a target research field. However, because the clusters are usually labeled with general words, it is difficult to highlight specific technical terms that are emerging and likely to form a new technology.

Another line of work on topic evolution identification has leveraged probabilistic topic models such as Latent Dirichlet Allocation (LDA) (Blei et al. 2003). For example, Mei and Zhai (2005) applied a topic model to the papers published in each time period separately and then linked the obtained topics across successive periods based on their similarities. Hall et al. (2008) applied LDA to an entire collection of papers to model research topics and then plotted the number of papers assigned to each topic within each time period. There are also advanced models that directly treat papers' time stamps in topic modeling (Wang and McCallum 2006; Blei and Lafferty 2006; Wang et al. 2008). These topic model-based methods usually label the resulting clusters with their most frequent words, which does not emphasize emergent technical terms. In contrast, TrendNets considers how the co-occurrence frequency of a word pair suddenly increases relative to previous time periods, directly extracting temporally representative word pairs.

Several studies have regarded words displaying a sudden increase in usage as indicative of emerging research trends (Mane and Börner 2004; Kim and Chen 2015). Traditional burst detection approaches include the thresholding of occurrences (e.g., the moving average (Vlachos et al. 2004)) and state transition modeling (e.g., the two-state automaton proposed by Kleinberg (2003)). One of the most famous science mapping tools, CiteSpace (Chen 2006), implements Kleinberg's burst detection method to show representative words in a specific time period. However, Kleinberg's method requires two parameters that are difficult to tune simultaneously. To increase the usefulness of science mapping, our method for constructing TrendNets is formulated to work with only a single parameter. In addition, related works usually assume that the research keywords of papers are available as input data. However, author-provided keywords of conference papers are not always available as open bibliographic information. Thus, we opted to use paper titles only, which are much easier to collect than keywords. Because our method focuses on bursts in the edge weights of co-word networks rather than in single-word occurrences, it can automatically extract emergent research phrases/terms composed of multiple words in network form. In our experiments, we demonstrate that TrendNets can effectively organize meaningful phrases from the single words within paper titles.

Proposed method

This section describes how to construct TrendNets to map emerging research trends from dynamic co-word networks. An overview of the proposed method is shown in Fig. 1. We first construct a co-word network for each time period and arrange its edge weights in a single matrix whose rows and columns correspond to word pair indexes and time period indexes, respectively. Then, we decompose the matrix into a smooth part and a burst part. According to our definition of bursty research topics (see Introduction), the matrix decomposition step calculates the difference in word co-occurrence frequency between two successive time periods; the burst part, represented as a sparse matrix, corresponds to large increases in the weights of word pairs. Finally, the resulting sparse matrix is rearranged into a time series of sparse co-word networks, each of which visualizes the bursty topics of the corresponding period. The details of each procedure are described below.

Fig. 1 An overview of the proposed method that constructs TrendNets

Co-word network construction

Let \(\varOmega\) be a collection of papers published in an arbitrary research field. We denote the number of unique words in \(\varOmega\) by M. According to the publication dates of the papers, we divide the paper collection \(\varOmega\) into exclusive temporal subsets \(\varOmega (1), \varOmega (2), \ldots , \varOmega (T)\) such that \(\varOmega = \cup _{t=1}^T \varOmega (t)\). To construct a co-word network for the t-th time period (\(t\in \{1,2,\ldots , T\}\)), following Liu et al. (2014), we calculate how often two words appear in the same paper on average during the period as follows:

$${\tilde{W}}_t(i, j) = \frac{n_t(i, j)}{|\varOmega (t)|}, i\ne j, i, j\in \{1, 2, \ldots , M\},$$
(1)

where \(n_t(i, j)\) is the number of papers in which the i-th and j-th words appear together during the t-th time period and \(|\cdot |\) represents the cardinality of a set. This forms a symmetric matrix \(\tilde{{\mathbf {W}}}_t\in {\mathbb {R}}^{M\times M}\), which corresponds to the adjacency matrix of a conventional co-word network. Then, as shown in Fig. 2, we turn the upper triangular part of \(\tilde{{\mathbf {W}}}_t\) into a column vector \(\tilde{{\mathbf {w}}}_t\in {\mathbb {R}}^{N}\), where \(N = M(M-1)/2\). Finally, we arrange all vectors over time as the columns of a single matrix \({\mathbf {W}} = [\tilde{{\mathbf {w}}}_1, \tilde{{\mathbf {w}}}_2, \ldots , \tilde{{\mathbf {w}}}_T] \in {\mathbb {R}}^{N\times T}\).

Fig. 2 Matrix construction from a time series of co-word graphs
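As a concrete illustration, the following NumPy sketch (variable and function names are ours, not those of the released code) builds \({\mathbf {W}}\) from per-period co-occurrence counts:

```python
import numpy as np

def coword_matrix(counts, num_docs):
    """Eq. (1): normalize pairwise counts n_t(i, j) by the subset size |Omega(t)|."""
    return counts / num_docs

def vectorize_upper(W_tilde):
    """Stack the strict upper-triangular part of an M x M matrix into a length-N vector."""
    i, j = np.triu_indices(W_tilde.shape[0], k=1)
    return W_tilde[i, j]

# counts_per_period: list of T symmetric (M x M) co-occurrence count matrices
# sizes_per_period:  list of T subset sizes |Omega(t)| (both assumed precomputed)
def build_W(counts_per_period, sizes_per_period):
    cols = [vectorize_upper(coword_matrix(C, n))
            for C, n in zip(counts_per_period, sizes_per_period)]
    return np.column_stack(cols)   # N x T, with N = M(M-1)/2
```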

Fast sparse–smooth matrix decomposition

In what follows, we extract sparse graphs from the time series of co-word networks by solving a newly formulated convex optimization problem. Specifically, we decompose a given \({\mathbf {W}}\) into the sum of a sparse matrix and a smooth matrix, where the sparse matrix is expected to contain only rapidly changing entries that are closely related to bursty topics. The convex optimization problem is given as follows:

$$\min _{{\mathbf {S}}} \frac{1}{2}\Vert D({\mathbf {W}} - {\mathbf {S}})\Vert _F^2 + \lambda \Vert {\mathbf {S}}\Vert _1,$$
(2)

where \({\mathbf {S}}\) contains the extracted sparse graphs in its columns, D is a linear operator that computes column differences, and \(\Vert \cdot \Vert _1\) and \(\Vert \cdot \Vert _F\) are the \(\ell _1\) and Frobenius norms, respectively. The operator D corresponds to calculating the difference in word co-occurrence frequency between two successive time periods. In Prob. (2), the second term promotes the sparsity of \({\mathbf {S}}\) (the \(\ell _1\) norm is the tightest convex relaxation of the \(\ell _0\) pseudo-norm, i.e., a reasonable sparsity measure). Meanwhile, the first term keeps the remaining matrix \({\mathbf {W}}-{\mathbf {S}}\) smooth in the column direction, implying that this matrix is expected to consist of entries that do not change greatly across periods.

Because Prob. (2) is a convex but nonsmooth optimization problem with no closed-form solution, we have to use an iterative algorithm to solve it. In our framework, the fast iterative shrinkage–thresholding algorithm (FISTA) (Beck and Teboulle 2009) is adopted. FISTA solves convex optimization problems of the following form:

$$\min _{{\mathbf {x}}} f({\mathbf {x}}) + g({\mathbf {x}}),$$
(3)

where f is a smooth convex function with a Lipschitzian gradient, and g is a possibly nonsmooth convex function whose proximity operatorFootnote 2 (Moreau 1962) is available. The algorithm is given as follows: for an initial point \({\mathbf {x}}_0={\mathbf {y}}_0\) and \(z_0=1\), iterate

$$\left\lfloor \begin{array}{l} {\mathbf {x}}_{k+1}:={{\,\mathrm{prox}\,}}_{\frac{1}{L} g}({\mathbf {y}}_{k}-\frac{1}{L}\nabla f({\mathbf {y}}_{k})),\\ z_{k+1}:= \frac{1 + \sqrt{1 + 4z_{k}^2}}{2},\\ {\mathbf {y}}_{k+1}:= {\mathbf {x}}_{k+1} + \frac{z_{k}-1}{z_{k+1}}({\mathbf {x}}_{k+1}-{\mathbf {x}}_{k}), \end{array}\right.$$
(4)

where k is the iteration number and L is the Lipschitz constant of \(\nabla f\).

Let \(f:=\frac{1}{2}\Vert D({\mathbf {W}} - \cdot )\Vert _F^2\) and \(g:=\lambda \Vert \cdot \Vert _1\). Then, f is clearly differentiable, and its gradient

$$\nabla f({\mathbf {S}}) = -D^\top D({\mathbf {W}}-{\mathbf {S}})$$

is Lipschitz continuous with constant \(L = \Vert D\Vert _{op}^2\), where \(\Vert \cdot \Vert _{op}\) is the operator norm. Meanwhile, since g is the \(\ell _1\) norm, its proximity operator boils down to the soft-thresholding operation:

$$\left[ {{\,\mathrm{prox}\,}}_{\tfrac{1}{L} g}({\mathbf {S}})\right] _{i,j} = \text {sgn}(S_{i,j})\max \{|S_{i,j}|-\tfrac{\lambda }{L}, 0\}.$$
(5)

As a result, FISTA can be readily applied to Prob. (2), which is summarized in Alg. 1.

Algorithm 1 FISTA for solving Prob. (2)
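The following is a minimal NumPy sketch of Alg. 1 under the above definitions: D and its adjoint are implemented directly as column differences, and the default L = 4.0 follows the setting stated below. All names are ours rather than those of the released code.

```python
import numpy as np

def D(X):
    """Column-difference operator: differences between successive time periods."""
    return X[:, 1:] - X[:, :-1]

def Dt(Y):
    """Adjoint of D."""
    Z = np.zeros((Y.shape[0], Y.shape[1] + 1))
    Z[:, :-1] -= Y
    Z[:, 1:] += Y
    return Z

def soft_threshold(X, tau):
    """Proximity operator of tau * ||.||_1, i.e., Eq. (5)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def fista_decompose(W, lam, L=4.0, n_iter=500):
    """Minimize 0.5 * ||D(W - S)||_F^2 + lam * ||S||_1 (Prob. (2)) via FISTA."""
    S = np.zeros_like(W)   # x_k
    Y = S.copy()           # y_k
    z = 1.0
    for _ in range(n_iter):
        grad = -Dt(D(W - Y))                          # gradient of f at y_k
        S_new = soft_threshold(Y - grad / L, lam / L) # x_{k+1}
        z_new = (1.0 + np.sqrt(1.0 + 4.0 * z * z)) / 2.0
        Y = S_new + ((z - 1.0) / z_new) * (S_new - S) # momentum step
        S, z = S_new, z_new
    return S   # sparse (bursty) part; W - S is the smooth part
```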

FISTA is a reasonable choice for solving Prob. (2) for two reasons: (1) its convergence rate is \({\mathcal {O}}(1/k^2)\) (Beck and Teboulle 2009), i.e., very fast; and (2) it uses only the gradient of f and the proximity operator of g, both of which can be computed efficiently in our case, as explained above (the computational cost is linear in the number of entries of \({\mathbf {W}}\)). These properties are important in our framework because the matrix \({\mathbf {W}}\) is large, and we must avoid expensive procedures such as singular value decomposition. In all experiments, we set the Lipschitz constant L in Eq. (4) to 4.0.Footnote 3

Visualization of bursty research topics

After solving Prob. (2), we transform the t-th column of the resulting sparse matrix \({\mathbf {S}}\) into a symmetric matrix \(\tilde{{\mathbf {S}}}_t\in {\mathbb {R}}^{M\times M}\). The positive elements of \(\tilde{{\mathbf {S}}}_t\) form a new co-word network for the t-th time period, in which nodes represent words and edges associate the words that form bursty research topics. For effective visualization of the research topics, nodes in the graph are generally grouped according to the graph structure (Liu et al. 2014; Silva et al. 2016), often via spectral clustering or community detection techniques. In this paper, we use the Louvain method (Blondel et al. 2008) to find semantic clusters of words.
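A minimal sketch of this visualization step, assuming networkx (the Louvain step uses nx.community.louvain_communities, available in networkx 2.8 and later, as a stand-in for the original Louvain implementation):

```python
import numpy as np
import networkx as nx

def trendnet_graph(s_t, vocab):
    """Build the sparse co-word network for period t from column s_t of S.

    s_t:   length-N vector (vectorized upper triangle, as in Fig. 2)
    vocab: list of M words
    """
    M = len(vocab)
    i_idx, j_idx = np.triu_indices(M, k=1)
    keep = s_t > 0                      # positive entries mark bursty word pairs
    G = nx.Graph()
    for i, j, w in zip(i_idx[keep], j_idx[keep], s_t[keep]):
        G.add_edge(vocab[i], vocab[j], weight=float(w))
    return G

# Semantic clusters of words for coloring the map:
# communities = nx.community.louvain_communities(G, weight="weight")
```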

Experiment I: simulation on synthetic data

Synthetic dataset construction

We first quantitatively evaluated the performance of our matrix decomposition method that enables TrendNets. Because a ground truth set for bursty research topic detection is unavailable, we constructed a synthetic dataset using the following two steps:

  • Constructing stable co-word networks We first generated 50 "stable" co-word networks, which are mostly similar to each other, on the basis of a statistical model. Specifically, we used the subset of documents in the Reuters-21578 corpusFootnote 4 tagged with "trade" to train a 5-gram language model. For each time period \(t \in \{1, 2, \ldots , 50\}\), the 5-gram language model produced a set of 5000 documents statistically identical to the given corpus, in which the maximum number of words per document was set to 12. The document set constructed for t is denoted by \(\varOmega (t)\). Discarding words that appear fewer than 30 times in \(\{\varOmega (t)\}_{t=1}^{50}\) resulted in a vocabulary of \(M=6696\) unique words. We counted the co-occurrence frequency of the i-th and j-th words in \(\varOmega (t)\), denoted by \(G_t(i, j)\), producing a co-word network with adjacency matrix \(G_t\in {\mathbb {R}}^{M\times M}\). The resulting time series of networks can be considered stable over time. For each word pair (i, j), the mean and standard deviation of \(\{G_t(i, j)\}_{t=1}^{50}\) are denoted by \(\mu (i, j)\) and \(\sigma (i, j)\), respectively.

  • Generating synthetic bursts Next, we artificially generated bursts on the time series of networks to construct ground truth data. As shown in Fig. 3, we considered the following two types of bursts: "Type-A bursts" (Fig. 3a) are word pairs whose frequency peaks at a certain time, and "Type-B bursts" (Fig. 3b) are word pairs that keep a high frequency after a sudden increase. To generate these bursts, for each time period t, we randomly picked each word pair \((i', j')\) with probability \(p=0.05\) from all pairs such that \(\mu (i', j')>3\) and regarded the tuple \((i', j', t)\) as a burst point. We further randomly classified each picked word pair as Type-A or Type-B with probabilities 0.95 and 0.05, respectively. For a burst point \((i', j', t)\) labeled Type-A, we increased its edge weight as follows:

    $$G_t(i', j') \leftarrow \mu (i', j') + u_t(i', j')\sigma (i', j'),$$
    (6)

    where \(u_t(i', j')\) is a random number drawn uniformly from [3, 6]. For a burst point \((i', j', t)\) labeled Type-B, we iteratively computed \(G_{t'}(i', j')\) using Eq. (6) for \(t'=t,t+1,\ldots , 50\). Finally, we ran the proposed method starting from Eq. (1), with \(n_t(i, j)=G_t(i,j)\) and \(|\varOmega (t)|=50\). A code sketch of this burst-injection procedure is given after this list.
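A sketch of the burst-injection procedure under the above description, assuming the counts are stored as a T × M × M array; the names and the fixed random seed are ours:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def inject_bursts(G, mu, sigma, p_burst=0.05, p_type_a=0.95):
    """Inject Type-A/Type-B bursts into a T x M x M stack of co-word counts.

    mu, sigma: per-pair temporal mean/std; only upper-triangular pairs with
    mu > 3 are eligible, following the construction in the text.
    """
    T, M, _ = G.shape
    iu, ju = np.triu_indices(M, k=1)
    eligible = [(i, j) for i, j in zip(iu, ju) if mu[i, j] > 3]
    truth = []                                    # ground-truth burst points (i, j, t)
    for t in range(T):
        for i, j in eligible:
            if rng.random() >= p_burst:
                continue
            # Type-A: single peak at t; Type-B: sustained from t onward
            periods = [t] if rng.random() < p_type_a else range(t, T)
            for s in periods:
                G[s, i, j] = mu[i, j] + rng.uniform(3, 6) * sigma[i, j]   # Eq. (6)
            truth.append((i, j, t))
    return G, truth
```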

Fig. 3 Illustrations of the two types of bursts generated for the synthetic dataset. a Type-A bursts: word pairs whose frequency suddenly increases and later disappears; b Type-B bursts: word pairs whose frequency suddenly increases and remains high afterward. Deep blue bars indicate time periods that should be detected as bursts. (Color figure online)

Baselines

In the experiment, the proposed method was compared with the following four baseline methods:

  • Baseline 1: Thresholding This approach identifies word pairs whose co-occurrence frequencies exceed a predetermined threshold value as bursts. That is, if \({\tilde{W}}_t(i, j)\) in Eq. (1) is larger than threshold \(\tau _1\), then the word pair (ij) is detected as a burst at t. Baseline 1 corresponds to the traditional co-word network visualization.

  • Baseline 2: Thresholding of time derivative This approach regards word pairs whose time derivatives of frequency exceed a predetermined threshold value as bursts. That is, if \({\tilde{W}}_t(i, j) - {\tilde{W}}_{t-1}(i, j)\) is larger than threshold \(\tau _2\), then the word pair is marked as a burst at t.

  • Baseline 3: Thresholding of differences from the average over time This approach detects word pairs whose difference from their average frequency over time exceeds a predetermined threshold value. That is, if \({\tilde{W}}_t(i, j) - \mu (i,j)\) is larger than threshold \(\tau _3\), then the word pair is marked as a burst at t.

  • Baseline 4: Kleinberg's method This approach exploits the well-known burst detection model presented by Kleinberg (2003), which has been implemented to visualize emerging research trends in conventional studies (Chen 2006; Katsurai 2017). It models the time series of a word pair's co-occurrence frequency as a finite-state automaton that consists of a base state and a burst state. The burst degree of the word pair at each time period is calculated from the state transition sequence. This model requires two parameters, s and \(\gamma\); we used \(s=2.0\) and varied the value of \(\gamma\). A sketch of Baselines 1–3 is given after this list.
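For reference, Baselines 1–3 amount to simple elementwise tests on the matrix \({\mathbf {W}}\) of vectorized frequencies; a minimal NumPy sketch (function names are ours) is shown below. Baseline 4 is omitted because it requires Kleinberg's automaton model.

```python
import numpy as np

def baseline1(W, tau1):
    """Thresholding of raw co-occurrence frequency (traditional visualization)."""
    return W > tau1                                   # boolean N x T burst mask

def baseline2(W, tau2):
    """Thresholding of the time derivative; the first period has no predecessor."""
    mask = np.zeros_like(W, dtype=bool)
    mask[:, 1:] = (W[:, 1:] - W[:, :-1]) > tau2
    return mask

def baseline3(W, tau3):
    """Thresholding of deviation from each pair's temporal mean."""
    return (W - W.mean(axis=1, keepdims=True)) > tau3
```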

Fig. 4 PR curve for burst detection results on synthetic data

Table 1 AUC measured using burst detection results on synthetic data

Simulation results

After applying each method to synthetic data, we calculated the precision and recall measures as follows:

$$\text {Recall} = \frac{\# \text { of correctly detected bursts}}{\# \text { of synthetic bursts}},$$
(7)
$$\text {Precision} = \frac{\# \text { of correctly detected bursts}}{\# \text { of detected bursts}}.$$
(8)
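Assuming the detected bursts and the ground truth are stored as boolean N × T masks, Eqs. (7) and (8) can be computed as in the following sketch (names are ours; the guards merely avoid division by zero):

```python
import numpy as np

def precision_recall(detected, truth):
    """Eqs. (7)-(8): detected and truth are boolean N x T burst masks."""
    tp = np.logical_and(detected, truth).sum()   # correctly detected bursts
    recall = tp / max(truth.sum(), 1)
    precision = tp / max(detected.sum(), 1)
    return precision, recall

# Sweeping the threshold (or lambda) traces the PR curve; the AUC in Table 1
# can then be obtained by numerical integration, e.g., np.trapz over the recall axis.
```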

Figure 4 shows the Precision–Recall (PR) curve for each method, while Table 1 shows the Area Under the PR Curve (AUC). As shown, the proposed method achieved the best burst detection performance. Baseline 1 provided the worst performance, which implies that traditional co-word visualization might not highlight an emerging trend in a certain time period. Baseline 4, an application of Kleinberg's burst detection, achieved relatively good performance; however, there is unfortunately no automatic method for determining its two parameters s and \(\gamma\). Our burst detection method is more practical because it works with only a single parameter \(\lambda\), which controls the sparsity of the networks. Baseline 2, which calculates the time derivative, is the most similar to our method because both aim to detect a sudden increase in the edge weights of dynamic co-word networks. However, Baseline 2 often falsely detects word pairs with large values of \(\mu (i, j)\) because the time derivatives of such word pairs tend to be relatively large. Our method overcomes this problem by considering the smoothness along the time direction for each word pair.

Experiment II: TrendNets construction using conference papers

This section presents the results of experiments on real-world datasets to verify the effectiveness of TrendNets. For dataset construction, we chose the following international conferences: CVPR, INFOCOM, and ICASSP. These conferences are considered prominent in the fields of computer vision, networking, and signal processing, respectively. For each conference, we collected the titles of papers presented over the past 16 years from DBLP.Footnote 5 DBLP is an open bibliographic database for the fields of computer science and engineering, which provides titles, author names, publication dates, and conference/journal names, but not abstracts. Then, from the set of words extracted from each title, we removed stop words (e.g., "a", "the", "from") and unnecessary symbols (e.g., ":", "?", "!"). Finally, we performed word stemming (Porter 1980). Table 2 summarizes the details of the three datasets; a preprocessing sketch follows the table.

Table 2 Details of each conference’s dataset
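The preprocessing pipeline can be sketched as follows, assuming NLTK's PorterStemmer; the stop-word list shown here is a tiny illustrative subset, not the one used in our experiments:

```python
import re
from nltk.stem import PorterStemmer   # Porter (1980)

STOP = {"a", "an", "the", "of", "for", "and", "in", "on", "from", "with", "to", "via"}
stemmer = PorterStemmer()

def preprocess_title(title):
    """Lowercase, strip symbols, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z0-9-]+", title.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP]

# e.g., preprocess_title("Learning from Noisy Labels: A Survey")
#       -> ['learn', 'noisi', 'label', 'survey']
```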

We divided each conference’s dataset into \(T = 8\) temporal subsets so that each time period consists of two years. Our method was computationally efficient: for example, matrix decomposition with \(\lambda =4.0\times 10^{-4}\) on the ICASSP dataset took approximately 2.31 seconds running on a workstation with a 3.1 GHz Intel Xeon E5-1680 Processor.

Owing to space limitations, Fig. 5 shows TrendNets constructed for only the latest four periods of the CVPR dataset, using \(\lambda =8.5\times 10^{-6}\). In the figure, nodes of the same color belong to the same cluster, as found via the Louvain method. To make the research topics easier to understand, each stemmed word was replaced with the non-stemmed form that co-occurred most frequently with the other words on the map. As shown in Fig. 5, TrendNets effectively captured the characteristics of each time period in CVPR. During 2013–2014, the appearance of "human action recognition" and a burst of "pose tracking" are captured (Fig. 5b). Figure 5c contains a meaningful cluster consisting of "convolutional neural networks" and "object detection," which characterizes the period from 2015 to 2016. We can see from Fig. 5d that recent emerging trends in learning are "adversarial learning" and "reinforcement learning". A new emerging term, "visual question," also appears as an emerging trend of CVPR 2017–2018. These results suggest that TrendNets helps us understand topic evolution in the target conference.

Fig. 5 TrendNets constructed using the CVPR dataset (\(\lambda =8.5\times 10^{-6}\)). a CVPR 2011–2012, b CVPR 2013–2014, c CVPR 2015–2016, d CVPR 2017–2018

Fig. 6 TrendNets (\(\lambda =1.7\times 10^{-3}\)) and the original co-word networks for the INFOCOM dataset, comparing the proposed and traditional methods. a TrendNets for 2005–2006, b original co-word network for 2005–2006, c TrendNets for 2017–2018, d original co-word network for 2017–2018. The numbers of edges in (b) and (d) were set to the same as those in (a) and (c), respectively, via the thresholding of edge weights

Comparison with traditional co-word representation

Finally, we show how TrendNets can differ from the original co-word networks, using INFOCOM and ICASSP as examples. We applied the proposed method to the INFOCOM dataset using \(\lambda =1.7\times 10^{-3}\). Figure 6 shows the two dynamic networks corresponding to TrendNets and the original co-word networks for INFOCOM during 2005–2006 and 2017–2018. For a fair comparison, the number of edges in each original network was set to the same as that of the corresponding TrendNets via thresholding. Compared with the original co-word networks in Fig. 6b, d, TrendNets found multiple semantic clusters that explain the details of the bursty research topics, as shown in Fig. 6a, c. Specifically, the conventional visualization (Fig. 6b, d) linked many words to the central term "networks," which makes it hard to grasp the research topics. Our method solved this problem of conventional co-word visualization and made it possible to show popular clusters such as "scheduling control" and "learning," as shown in Fig. 6a, c, respectively.

Interestingly, as shown in Figs. 5c, d and 6a, b, the word "networks" commonly appears in the TrendNets of the two conferences. Thanks to the network-form visualization, we can infer the meaning of each conference's "networks" on the basis of the surrounding words. In future work, we will detect and analyze such differences in word usage across conferences in other fields.

Fig. 7 TrendNets (\(\lambda =4.0\times 10^{-4}\)) and the original co-word networks for the ICASSP dataset, comparing the proposed and traditional methods. a TrendNets for 2015–2016, b original co-word network for 2015–2016, c TrendNets for 2017–2018, d original co-word network for 2017–2018. The numbers of edges in (b) and (d) were set to the same as those in (a) and (c), respectively, via the thresholding of edge weights

We used \(\lambda =4.0\times 10^{-4}\) for mapping the emerging research trends of ICASSP. Figure 7 shows the results on the ICASSP dataset during 2015–2016 and 2017–2018. A large cluster of "neural networks" is connected to clusters of "speech" and "estimation" in Fig. 7a, while the terms "image," "adversarial," and "generative" appear near "neural networks" in Fig. 7c. In this way, TrendNets tells us that recent emerging trends of ICASSP are similar to those of CVPR, implying that our method has the potential to find relevance between conferences.

In contrast to the INFOCOM results, the original co-word networks for ICASSP seem to visualize emerging topics relatively well. Even so, the static term "source separation" exists in both Fig. 7b, d, whereas more temporally representative terms can be seen in TrendNets. Since the results on the ICASSP dataset show only a slight difference between the two representations, we can assume that conferences differ in how their research topics evolve over time, and it would be valuable to investigate whether the proposed method can estimate this.

In summary, the proposed method enables the visualization of bursty research trends using only a single parameter \(\lambda\), even when those trends are inconspicuous in the original co-word networks. To encourage science mapping in all research fields, the code for the proposed method is available on the Web.Footnote 6 In addition, TrendNets effectively revealed emerging research topics composed of multiple words, even though we only used the titles of papers without manually assigned keywords. Note that using abstracts in addition to titles has the potential to cover additional relevant words that are absent from the titles.

Conclusion and future work

We have developed a novel approach to mapping emerging research trends that detects rapid changes in the edge weights of dynamic co-word networks. We formulated a convex optimization problem and provided an efficient solution that yields the sparse networks we call TrendNets. Our algorithm runs in time linear in the number of entries of the matrix constructed from the original networks. In addition, the proposed method works with a single parameter \(\lambda\), which is practical and user friendly. In experiments using a synthetic dataset, the proposed method achieved better burst detection performance than the baseline methods. Experiments conducted on three conference datasets (CVPR, INFOCOM, and ICASSP) showed the advantage of TrendNets in detecting hot topics that are inconspicuous in the traditional co-word representation. Interestingly, the degree of effectiveness of TrendNets differs among the conferences: we can assume that the temporal stability of a word distribution depends on the research area. Quantifying this type of characteristic is part of our future work.

Further room for investigation and improvement exists in our work. We have so far considered only rapid changes in edge weights, ignoring the node degrees of the original dynamic networks. To make full use of the original network structure for research trend visualization, more sophisticated models need to be developed. It is also necessary to design more effective visualizations of TrendNets; for example, we should examine how to set the colors and sizes of network components such as nodes, edges, and labels. We will continue to improve the proposed method and conduct additional experiments using a larger dataset consisting of multiple conferences/disciplines.