TrendNets: mapping emerging research trends from dynamic co-word networks via sparse representation

Mapping the knowledge structure from word co-occurrences in a collection of academic papers has been widely used to provide insight into the topic evolution in an arbitrary research field. In a traditional approach, the paper collection is first divided into temporal subsets, and then a co-word network is independently depicted in a 2D map to characterize each period’s trend. To effectively map emerging research trends from such a time-series of co-word networks, this paper presents TrendNets, a novel visualization methodology that highlights the rapid changes in edge weights over time. Specifically, we formulated a new convex optimization framework that decomposes the matrix constructed from dynamic co-word networks into a smooth part and a sparse part: the former represents stationary research topics, while the latter corresponds to bursty research topics. Simulation results on synthetic data demonstrated that our matrix decomposition approach achieved the best burst detection performance over four baseline methods. In experiments conducted using papers published in the past 16 years at three conferences in different fields, we showed the effectiveness of TrendNets compared to the traditional co-word representation. We have made our codes available on the Web to encourage scientific mapping in all research fields.


Introduction
Due to the continuous increase in journals and conferences, as well as the widespread use of preprint servers such as arXiv, a massive number of academic papers has accumulated. Revealing new research topics can bring significant benefits for people involved in the research environment, including researchers, research administrators, publishers, and funding bodies. However, it is hard even for experts to keep up with the large number of papers published.
To facilitate an understanding of an arbitrary research field, visualization of emerging research topics using papers published in the target field has attracted much attention (Callon et al., 1983; Assefa and Rorissa, 2013; Muñoz-Leiva et al., 2012; Ronda-Pupo and Guerras-Martin, 2012; Zhang et al., 2012; Hu et al., 2013; Li et al., 2010; Topalli and Ivanaj, 2016). One widely used technique is co-word analysis (Callon et al., 1983), which maps the knowledge structure from word co-occurrences in the papers. In a traditional approach, the paper collection is first divided into temporal subsets (e.g., yearly or multiple years) according to the publication dates of the papers. Then, focusing on words in the content of papers such as titles and abstracts, the co-occurrence frequency of each word pair is calculated by counting the number of papers in which the two words appear together. Finally, the knowledge structure of each time period is visualized as a co-word network whose nodes correspond to words and whose edges reflect high co-occurrence frequencies between words; the number of edges in the network is generally determined by thresholding edge weights. A series of co-word networks over time (called dynamic co-word networks) helps us to investigate how a target field has developed (Assefa and Rorissa, 2013; Muñoz-Leiva et al., 2012; Ronda-Pupo and Guerras-Martin, 2012; Zhang et al., 2012; Hu et al., 2013; Li et al., 2010; Topalli and Ivanaj, 2016). However, the traditional approach has no means to highlight how the research trends change from one time period to the next; it might depict edges even if the corresponding word associations are stable or only gradually become stronger over time. Such stable word pairs can obscure the discriminative or emerging research topics that characterize the time period.
To make it easier to understand the research topic evolution of a target field, this paper presents TrendNets, a novel research trend visualization approach that detects rapid changes in edge weights in dynamic co-word networks. Our aim here is to find "bursty research topics" that have been growing rapidly, rather than topics that are just popular. In the proposed approach, we first arrange the vectorized edge weights of co-word networks in columns of a single matrix in the order of time. Then, to detect bursty research topics, we formulate a convex optimization framework that decomposes the matrix into a smooth part corresponding to stationary topics and a sparse part corresponding to bursty topics in a computationally efficient manner. We term this framework Fast Sparse-Smooth Matrix Decomposition. After this optimization, the non-zero entries of the resulting sparse networks are considered to represent bursty topics for the corresponding time periods. Experiments on both toy data and real publication data show that TrendNets can effectively detect research trends.
The main contributions of this paper can be summarized as follows:
- We introduce a novel bursty research topic detection scheme for a traditional co-word analysis for visualizing research trends. The proposed method is computationally efficient, even for a large matrix calculated from dynamic co-word networks.
- We perform extensive experiments based on both synthetic and real-world datasets, which show that TrendNets effectively describes the bursty topics when compared with the conventional co-word representations.
- We have made our codes available to encourage science mapping in all disciplines.
The remainder of this paper is organized as follows: the next section provides a brief review of the related studies; the third section presents the details of TrendNets; the results of the experiments on the synthetic dataset and paper collections are presented in the fourth and fifth sections, respectively; finally, the paper is summarized and some possible directions for future work are suggested.

Related Work
As scholarly big data is growing, science mapping has become more important for encouraging the activities of researchers, research administrators, and science policymakers. Many studies construct knowledge networks using the bibliographic metadata of papers to visualize the relationships between academic-related entities such as authors, papers, and technical terms. For example, co-authorship networks have been exploited to find influential researchers and research groups (Börner et al., 2005). Constructing citation or co-citation networks is also a popular approach for visualizing content-based relevance among research subfields (Shibata et al., 2008). However, it is difficult for citation-based methods to grasp the latest trend because papers usually require time to be cited by other papers. On the other hand, co-word networks have the advantage of being able to discover emerging technologies in a timely manner because they can be constructed as quickly as new papers are published. Using textual words can directly map the knowledge structure of a research field into a conceptual space. The visualization of word associations in a network form is known to provide intuitive understanding (Motter et al., 2002; Drieger, 2013; Doerfel and Barnett, 1999). Thus, co-word analysis has been widely used to reveal research advances in several fields, including science education (Assefa and Rorissa, 2013), strategic management (Ronda-Pupo and Guerras-Martin, 2012), patient adherence (Zhang et al., 2012), and information retrieval (Hu et al., 2013). It has also been used to characterize the publication trends of journals (Ravikumar et al., 2015) or conferences (Liu et al., 2014). In conventional visualization, the number of edges in a network is generally determined via the thresholding of edge weights (i.e., word co-occurrence frequency). However, this simple approach tends to produce a lot of word pairs that are stable over time.
To visualize the transition of research topics in dynamic co-word networks, there are methods that detect clusters in each co-word network separately and then track the clusters over time (Wang et al., 2014; Song et al., 2014). Specifically, Wang et al. (2014) applied community detection to a co-word network at each time period and connected communities based on their node-based or edge-based similarity across successive time periods. Song et al. (2014) clustered words via the Markov Random Field and tracked each cluster using word similarity to show its development. These conventional methods can visualize the evolution of representative semantic clusters in a target research field. However, because the clusters are usually labeled with general words, it is difficult to highlight specific technical terms that are emerging and likely to form a new technology.
Another line of topic evolution identification has leveraged probabilistic topic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003). For example, Mei and Zhai (2005) applied a topic model to papers published in each time period separately and then linked the obtained topics across successive periods based on their similarities. Hall et al. (2008) applied LDA to an entire collection of papers to model research topics and then plotted the number of papers assigned to each topic within each time period. There are also advanced models that can directly treat papers' time stamps in topic modeling (Wang and McCallum, 2006; Blei and Lafferty, 2006; Wang et al., 2008). These topic model-based methods usually label the resulting clusters with the most frequent words within the clusters, which does not emphasize emergent technical terms. In contrast, TrendNets considers how the co-occurrence frequency of a word pair suddenly increases from previous time periods, directly extracting temporally representative word pairs.
Detecting sudden increases in word occurrences has played an important role in topic mining and monitoring in text streams. Traditional burst detection approaches include the thresholding of occurrences (e.g., the moving average (Vlachos et al., 2004)) and state transition modeling (e.g., a two-state automaton proposed by Kleinberg (2003)). One of the most famous science mapping tools, CiteSpace (Chen, 2006), implements Kleinberg's burst detection method to show representative words in a specific time period. Kleinberg's method requires two parameters that are difficult to tune simultaneously. To increase the usefulness of science mapping, our method for constructing TrendNets is formulated to work with only a single parameter.

Proposed Method
This section describes how to construct TrendNets to map research trends from dynamic co-word networks. An overview of the proposed method is shown in Fig. 1. We first construct dynamic co-word networks and convert their edge weights into a single matrix. Then we decompose the matrix into a smooth part and a sparse part. Finally, the resulting sparse matrix is rearranged into a time series of sparse co-word networks in which each network visualizes bursty topics during the corresponding period. The details of each procedure are described below.

Co-word network construction
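Let $\Omega$ be a collection of papers published in an arbitrary research field. We denote the number of unique words in $\Omega$ by $M$. According to the publication dates of the papers, we divide the paper collection $\Omega$ into exclusive temporal subsets $\Omega(1), \Omega(2), \ldots, \Omega(T)$ such that $\Omega = \bigcup_{t=1}^{T} \Omega(t)$. To construct a co-word network at the $t$-th time period ($t \in \{1, 2, \ldots, T\}$), following Liu et al. (2014), we calculate how often two words appear in the same paper on average during the period as follows:

$$\mathbf{W}_t(i, j) = \frac{n_t(i, j)}{|\Omega(t)|}, \tag{1}$$

where $n_t(i, j)$ is the number of papers in which the $i$-th and $j$-th words appear together during the $t$-th time period and $|\cdot|$ represents the cardinality of the set. This forms a symmetrical matrix $\mathbf{W}_t \in \mathbb{R}^{M \times M}$, which corresponds to an adjacency matrix of a conventional co-word network. Then, as shown in Fig. 2, we turn the upper triangular part of $\mathbf{W}_t$ into a column vector $\mathbf{w}_t \in \mathbb{R}^{N}$, in which $N = M(M - 1)/2$. Finally, we arrange all vectors over time in the columns to construct a single matrix as

$$\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_T] \in \mathbb{R}^{N \times T}.$$

In what follows, we extract sparse graphs from the time series of co-word networks by solving a newly formulated convex optimization problem. Specifically, we decompose a given $\mathbf{W}$ into the sum of a sparse matrix and a smooth matrix, in which the sparse matrix is expected to only contain rapidly changed entries that are closely related to bursty topics. The convex optimization problem is given as follows:

$$\min_{\mathbf{S} \in \mathbb{R}^{N \times T}} \; \lambda \|\mathbf{S}\|_1 + \frac{1}{2} \|D(\mathbf{W} - \mathbf{S})\|_F^2, \tag{2}$$

where $\mathbf{S}$ contains extracted sparse graphs in its columns, $\lambda > 0$ is a parameter that controls the sparsity of $\mathbf{S}$, $D$ is a linear operator computing column differences, and $\|\cdot\|_1$ and $\|\cdot\|_F$ are the $\ell_1$ and Frobenius norms, respectively. The first term plays a role in promoting the sparsity of $\mathbf{S}$ (note that the $\ell_1$ norm is the tightest convex relaxation of the $\ell_0$ pseudo-norm, i.e., a reasonable sparsity measure). Meanwhile, the second term keeps the remaining matrix $\mathbf{W} - \mathbf{S}$ smooth in the column direction, implying that the matrix is expected to consist of entries that do not change greatly during any period.

Because Prob. (2) is a convex but nonsmooth optimization problem and has no closed-form solution, we have to use an iterative algorithm to solve it. In our framework, the fast iterative shrinkage-thresholding algorithm (FISTA) (Beck and Teboulle, 2009) is adopted. FISTA can solve convex optimization problems in the following form:

$$\min_{\mathbf{x}} \; f(\mathbf{x}) + g(\mathbf{x}), \tag{3}$$

where $f$ is a smooth convex function with a Lipschitzian gradient, and $g$ is a possibly nonsmooth convex function whose proximity operator (Moreau, 1962), $\mathrm{prox}_{\gamma g}(\mathbf{x}) := \arg\min_{\mathbf{y}} \, g(\mathbf{y}) + \frac{1}{2\gamma}\|\mathbf{x} - \mathbf{y}\|_2^2$, is available. The algorithm is given as follows: for an initial vector $\mathbf{y}_0$ and $z_0 = 1$, iterate

$$\begin{aligned}
\mathbf{x}_{k+1} &= \mathrm{prox}_{\frac{1}{L} g}\!\left(\mathbf{y}_k - \tfrac{1}{L} \nabla f(\mathbf{y}_k)\right), \\
z_{k+1} &= \frac{1 + \sqrt{1 + 4 z_k^2}}{2}, \\
\mathbf{y}_{k+1} &= \mathbf{x}_{k+1} + \frac{z_k - 1}{z_{k+1}} (\mathbf{x}_{k+1} - \mathbf{x}_k),
\end{aligned} \tag{4}$$

where $k$ is the iteration number and $L$ is the Lipschitz constant of $\nabla f$. Let $f := \frac{1}{2}\|D(\mathbf{W} - \cdot)\|_F^2$ and $g := \lambda \|\cdot\|_1$. Then, $f$ is clearly differentiable, and its gradient

$$\nabla f(\mathbf{S}) = D^{*} D(\mathbf{S} - \mathbf{W}),$$

where $D^{*}$ is the adjoint of $D$, has the Lipschitz constant $L$ equal to $\|D\|_{\mathrm{op}}^2$, where $\|\cdot\|_{\mathrm{op}}$ is the operator norm. Meanwhile, since $g$ is the $\ell_1$ norm, its proximity operator boils down to the entrywise soft-thresholding operation:

$$\left[\mathrm{prox}_{\gamma \|\cdot\|_1}(\mathbf{X})\right]_{i,j} = \mathrm{sgn}(X_{i,j}) \max\{|X_{i,j}| - \gamma, 0\},$$

with $\gamma = \lambda / L$ in our case. As a result, FISTA can be readily applied to Prob. (2), which is summarized in Alg. 1.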
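The two ingredients that FISTA needs for Prob. (2), the gradient of the smooth term and the soft-thresholding step, can be sketched in NumPy as follows. This is only an illustrative sketch: the function names are hypothetical, and the convention that the difference operator subtracts each column from its successor is an assumption, since the text only states that D computes column differences.

```python
import numpy as np

def col_diff(X):
    """D: difference between successive columns (assumed convention: x_{t+1} - x_t)."""
    return X[:, 1:] - X[:, :-1]

def col_diff_adjoint(Y):
    """Adjoint of col_diff, mapping an N x (T-1) array back to N x T."""
    out = np.zeros((Y.shape[0], Y.shape[1] + 1))
    out[:, 1:] += Y
    out[:, :-1] -= Y
    return out

def grad_f(S, W):
    """Gradient of f(S) = 0.5 * ||D(W - S)||_F^2, i.e. D^* D (S - W)."""
    return col_diff_adjoint(col_diff(S - W))

def soft_threshold(X, tau):
    """Entrywise proximity operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```

Both operations touch each entry of the matrix a constant number of times. For a first-difference operator of this kind the squared operator norm never exceeds 4, so a step size of 1/4 is always admissible.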
Algorithm 1: Fast sparse-smooth matrix decomposition by FISTA

FISTA is a reasonable choice for solving Prob. (2) for two reasons: (i) its convergence rate is $O(1/k^2)$ (Beck and Teboulle, 2009), i.e., very fast, and (ii) it uses only the gradient of $f$ and the proximity operator of $g$, which, in our case, can be computed efficiently as explained above (the computational cost is linear in the number of entries in $\mathbf{W}$). These properties are important in our framework because the matrix $\mathbf{W}$ is large, and we must avoid expensive procedures such as singular value decomposition. In all experiments, we set the Lipschitz constant $L$ in Eq. (4) to 4.0.
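Putting the pieces together, the decomposition might be implemented along the following lines. This is a minimal sketch under stated assumptions rather than the authors' released code: the function name, the zero initialization, the fixed iteration count, and the forward-difference convention for D are all illustrative choices. The default L = 4.0 is used because the squared operator norm of a first-difference operator is bounded by 4.

```python
import numpy as np

def sparse_smooth_decompose(W, lam, L=4.0, n_iter=500):
    """FISTA sketch for: min_S lam * ||S||_1 + 0.5 * ||D(W - S)||_F^2.

    W is an N x T array whose columns are time periods; D takes
    differences between successive columns (assumed convention).
    """
    W = np.asarray(W, dtype=float)
    S = np.zeros_like(W)      # current iterate x_k
    Y = S.copy()              # extrapolated point y_k
    z = 1.0
    for _ in range(n_iter):
        # Gradient of the smooth term at Y: D^* D (Y - W).
        R = Y - W
        diff = R[:, 1:] - R[:, :-1]
        grad = np.zeros_like(W)
        grad[:, 1:] += diff
        grad[:, :-1] -= diff
        # Gradient step followed by soft-thresholding with threshold lam / L.
        V = Y - grad / L
        S_next = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)
        # Momentum update of FISTA.
        z_next = (1.0 + np.sqrt(1.0 + 4.0 * z * z)) / 2.0
        Y = S_next + ((z - 1.0) / z_next) * (S_next - S)
        S, z = S_next, z_next
    return S
```

On a toy matrix in which one row is constant and another is constant except for a single spike, the returned matrix should be close to zero everywhere except at the spike, which is the behavior the sparse part is meant to capture.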

Visualization of bursty research topics
After solving Prob. (2), we transform the $t$-th column in the resulting sparse matrix $\mathbf{S}$ into a symmetrical matrix $\mathbf{S}_t \in \mathbb{R}^{M \times M}$. The non-zero, positive elements of $\mathbf{S}_t$ form a new co-word network for the $t$-th time period in which nodes represent words and edges associate the words to form bursty research topics. For the effective visualization of the research topics, nodes in the graph are generally grouped according to the graph structure (Liu et al., 2014; Silva et al., 2016), in which spectral clustering or community detection techniques are often exploited. For this paper, we use the Louvain method (Blondel et al., 2008) to find semantic clusters of words.
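The conversion from one column of the sparse matrix back to a network can be sketched as below. The helper name is hypothetical, and the sketch assumes that the upper triangular part was vectorized in the row-major order returned by `np.triu_indices`; the ordering actually used in Fig. 2 may differ. Community detection (e.g., the Louvain method) would then be applied to the resulting edge list and is not shown here.

```python
import numpy as np

def column_to_network(s_col, vocab):
    """Rebuild the symmetric matrix S_t and list its positive edges.

    s_col: length M*(M-1)/2 vector (one column of the sparse matrix S).
    vocab: list of M words; index k corresponds to the k-th word.
    """
    s_col = np.asarray(s_col, dtype=float)
    M = len(vocab)
    rows, cols = np.triu_indices(M, k=1)   # assumed vectorization order
    S_t = np.zeros((M, M))
    S_t[rows, cols] = s_col
    S_t = S_t + S_t.T                      # symmetrize (diagonal stays zero)
    keep = s_col > 0                       # only positive entries become edges
    edges = [(vocab[i], vocab[j], float(w))
             for i, j, w in zip(rows[keep], cols[keep], s_col[keep])]
    return S_t, edges
```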

Synthetic dataset construction
We first quantitatively evaluated the performance of the matrix decomposition method that enables TrendNets. Because a ground truth set for bursty research topic detection is unavailable, we constructed a synthetic dataset using the following two steps:
- Constructing stable co-word networks. We first generated 50 "stable" co-word networks, which are mostly similar to each other, on the basis of a statistical model. Specifically, we used a subset of documents in the Reuters-21578 corpus that were tagged with "trade" to train a 5-gram language model. For each time period $t \in \{1, 2, \ldots, 50\}$, the 5-gram language model produced a set of 5,000 documents that were statistically identical to the given corpus, in which the maximum number of words in each document was set to 12. The document set constructed for $t$ is denoted by $\Omega(t)$. Discarding words that appear fewer than 30 times in $\{\Omega(t)\}_{t=1}^{50}$ resulted in a vocabulary of $M = 6{,}696$ unique words. We counted the co-occurrence frequency of the $i$-th and $j$-th words in $\Omega(t)$, which is denoted by $G_t(i, j)$, producing a co-word network with adjacency matrix $\mathbf{G}_t \in \mathbb{R}^{M \times M}$. The resulting time series of networks can be considered as being stable over time. For each word pair $(i, j)$, the mean and standard deviation values of $\{G_t(i, j)\}_{t=1}^{50}$ are denoted by $\mu(i, j)$ and $\sigma(i, j)$, respectively.
- Generating synthetic bursts. Next, we artificially generated bursts on the time series of networks to construct ground truth data. As shown in Fig. 3, we considered the following two types of bursts: "Type-A bursts" (Fig. 3a) are word pairs that have a peak of frequency at a certain time, and "Type-B bursts" (Fig. 3b) are word pairs that keep the high frequency after a sudden increase. To generate these bursts, for each time period $t$, we randomly picked a word pair $(i', j')$ with probability $p = 0.05$ from all pairs such that $\mu(i', j') > 3$ and regarded the tuple $(i', j', t)$ as a burst point. We further randomly classified each word pair into Type-A or Type-B with probability $p = 0.95$ and $p = 0.05$, respectively. For a burst point $(i', j', t)$ labeled with Type-A, we increased its edge weight $G_t(i', j')$ according to Eq. (6), where $u_t(i', j')$ is a random number drawn uniformly from $[3, 6]$. For a burst point $(i', j', t)$ labeled with Type-B, we iteratively computed $G_{t'}(i', j')$ using Eq. (6) for $t' = t, t + 1, \ldots, 50$. Finally, we ran the proposed method starting from Eq. (1), in which $n_t(i, j) = G_t(i, j)$ and $|\Omega(t)| = 50$, respectively.

Baselines
In the experiment, the proposed method was compared with the following four baseline methods:
- Baseline 1: Thresholding. This approach identifies word pairs whose co-occurrence frequencies exceed a predetermined threshold value as bursts. That is, if $\mathbf{W}_t(i, j)$ in Eq. (1) is larger than threshold $\tau_1$, then the word pair $(i, j)$ is detected as a burst at $t$. Baseline 1 corresponds to the traditional co-word network visualization.
- Baseline 2: Thresholding of time derivative. This approach regards word pairs whose time derivatives of frequency exceed a predetermined threshold value as bursts. That is, if $\mathbf{W}_t(i, j) - \mathbf{W}_{t-1}(i, j)$ is larger than threshold $\tau_2$, then the word pair is marked as a burst at $t$.
- Baseline 3: Thresholding of differences from the average over time. This approach detects word pairs if the difference between their frequencies and the average frequency over time exceeds a predetermined threshold value. That is, if $\mathbf{W}_t(i, j) - \mu(i, j)$ is larger than threshold $\tau_3$, then the word pair is marked as a burst at $t$.
- Baseline 4: Kleinberg's method. This approach exploits a well-known burst detection model presented by Kleinberg (2003) and is implemented to visualize research trends in conventional studies (Chen, 2006; Katsurai, 2017). It models a time series of a word pair's co-occurrence frequency as a finite-state automaton that consists of a base state and a burst state. The burst degree of the word pair at each time period was calculated from the state transition sequence. The parameters required in this model are $s$ and $\gamma$. We used $s = 2.0$ and changed the value of $\gamma$.
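The three thresholding baselines are simple enough to sketch directly. In the sketch below, `W` is assumed to be an N x T array with one row per word pair and one column per time period; the function names are hypothetical, the average for Baseline 3 is computed from `W` itself, and the first period is never flagged by Baseline 2 because it has no predecessor. These are illustrative assumptions, and Kleinberg's automaton (Baseline 4) is not covered.

```python
import numpy as np

def baseline_threshold(W, tau1):
    """Baseline 1: flag entries whose frequency exceeds tau1."""
    return np.asarray(W) > tau1

def baseline_time_derivative(W, tau2):
    """Baseline 2: flag entries whose increase from the previous period exceeds tau2."""
    W = np.asarray(W, dtype=float)
    out = np.zeros(W.shape, dtype=bool)
    out[:, 1:] = (W[:, 1:] - W[:, :-1]) > tau2
    return out

def baseline_mean_difference(W, tau3):
    """Baseline 3: flag entries exceeding the row's average over time by more than tau3."""
    W = np.asarray(W, dtype=float)
    mu = W.mean(axis=1, keepdims=True)
    return (W - mu) > tau3
```

Each function returns a boolean mask with the same shape as `W`, so the outputs can be compared entry by entry against a ground-truth burst mask.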

Simulation results
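After applying each method to the synthetic data, we calculated the precision and recall measures as follows:

$$\mathrm{Precision} = \frac{\#\text{ of correctly detected bursts}}{\#\text{ of detected bursts}}, \qquad \mathrm{Recall} = \frac{\#\text{ of correctly detected bursts}}{\#\text{ of ground-truth bursts}}.$$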
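Given a boolean mask of detected bursts and a boolean mask of ground-truth bursts, both measures reduce to a few counts, with recall taken in its standard sense as the fraction of ground-truth bursts that were detected. The following sketch is illustrative only: the function names are hypothetical, and returning 1.0 when a denominator is zero is just one possible convention.

```python
import numpy as np

def precision_recall(detected, truth):
    """Precision and recall of a boolean detection mask against a ground-truth mask."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_correct = np.count_nonzero(detected & truth)
    n_detected = np.count_nonzero(detected)
    n_truth = np.count_nonzero(truth)
    precision = n_correct / n_detected if n_detected else 1.0  # assumed convention
    recall = n_correct / n_truth if n_truth else 1.0           # assumed convention
    return precision, recall

def pr_points(scores, truth, thresholds):
    """Sweep thresholds over a score array to obtain points of a PR curve."""
    scores = np.asarray(scores, dtype=float)
    return [precision_recall(scores > th, truth) for th in thresholds]
```

Sweeping the threshold (or, for the proposed method, the parameter that controls sparsity) and collecting the resulting pairs yields the points of a precision-recall curve.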
Figure 4 shows the Precision-Recall (PR) curve for each method, while Table 1 shows the Area Under the PR Curve (AUC). As shown, the proposed method achieved the best burst detection performance. Baseline 1 provided the worst performance, which implies that the traditional co-word visualization might not highlight a trend in a certain time period. Baseline 4, an application of Kleinberg's burst detection, achieved relatively good performance; however, there is unfortunately no automatic method for determining its two parameters $s$ and $\gamma$. Our burst detection method is more practical because it works with only a single parameter $\lambda$, which changes the sparsity of networks. Baseline 2, which calculates the time derivative, is the most similar to our method because both aim to detect a sudden increase in edge weights of dynamic co-word networks. However, Baseline 2 often gives false detections of word pairs associated with large values of $\mu(i, j)$ because the time derivative of such word pairs tends to become relatively large. Our method solved this problem by considering the smoothness along the time direction for each pair.

Experiment II: TrendNets Construction Using Conference Papers
This section presents the results of the experiments on real-world datasets to verify the effectiveness of TrendNets. For dataset construction, we chose the following international conferences: CVPR, INFOCOM, and ICASSP. These conferences are considered prominent in the fields of computer vision, networking, and signal processing, respectively. For each conference, we collected the titles of papers presented over the past 16 years from DBLP. Then, from the set of words extracted from each title, we removed stop words (e.g., "a", "the", "from") and unnecessary symbols (e.g., ":", "?", "!"). Finally, we performed word stemming (Porter, 1980). Table 2 summarizes the details of the three datasets.
We divided each conference's dataset into $T = 8$ temporal subsets so that each time period consists of two years. Our method was computationally efficient: for example, matrix decomposition with $\lambda = 4.0 \times 10^{-4}$ on the ICASSP dataset took approximately 2.31 seconds on a workstation with a 3.1 GHz Intel Xeon E5-1680 processor.
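The construction of the input matrix from dated titles can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function name and arguments are hypothetical, tokenization is a plain regular expression, stemming is omitted, and a dense M x M count matrix is used for clarity even though a realistic vocabulary would call for a sparse representation.

```python
import re
from itertools import combinations
import numpy as np

def build_coword_matrix(papers, start_year, years_per_period, n_periods,
                        stop_words=frozenset()):
    """Build the N x T matrix of averaged co-occurrence counts.

    papers: iterable of (year, title) pairs.
    Returns (W, vocab), where column t of W holds n_t(i, j) / |Omega(t)|
    for every word pair i < j in row-major upper-triangular order.
    """
    docs = [[] for _ in range(n_periods)]
    for year, title in papers:
        t = (year - start_year) // years_per_period
        if 0 <= t < n_periods:
            words = {w for w in re.findall(r"[a-z0-9]+", title.lower())
                     if w not in stop_words}
            docs[t].append(words)
    vocab = sorted(set().union(*(ws for period in docs for ws in period)))
    index = {w: k for k, w in enumerate(vocab)}
    M = len(vocab)
    rows, cols = np.triu_indices(M, k=1)
    W = np.zeros((len(rows), n_periods))
    for t, period in enumerate(docs):
        counts = np.zeros((M, M))
        for words in period:
            ids = sorted(index[w] for w in words)
            for i, j in combinations(ids, 2):   # each pair once per paper
                counts[i, j] += 1
        if period:
            W[:, t] = counts[rows, cols] / len(period)
    return W, vocab
```

With two-year periods, a call such as `build_coword_matrix(papers, start_year, 2, 8)` would produce one column per period, matching the T = 8 setting described above.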
Considering the limited space of the paper, Figure 5 shows TrendNets constructed with $\lambda = 8.5 \times 10^{-6}$ for the latest four periods of the CVPR dataset. In the figure, nodes of the same color belong to the same cluster, as found via the Louvain method. To make it easier to understand research topics, each stemmed word was replaced with the non-stemmed word whose co-occurrences with other words on the map were the highest. As shown in Fig. 5, TrendNets effectively captured the characteristics of each time period in CVPR. During 2013-2014, the appearance of "human action recognition" and a burst of "pose tracking" are captured (Fig. 5b). Figure 5c contains a meaningful cluster consisting of "convolutional neural networks" and "object detection," which characterizes the period from 2015 to 2016. We can see from Fig. 5d that recent trends in learning are "adversarial learning" and "reinforcement learning". A new emerging term, "visual question," is also shown as a trend of CVPR 2017-2018. The results suggest that using TrendNets helps us understand topic evolution in the target conference.
Comparison with traditional co-word representation
Finally, we show how TrendNets can differ from the original co-word networks using INFOCOM and ICASSP as examples. We performed the proposed method on the INFOCOM dataset using $\lambda = 1.7 \times 10^{-3}$. Figure 6 shows the two dynamic networks corresponding to TrendNets and the original co-word networks. Specifically, the conventional visualization (Figs. 6b and d) linked a lot of words to the central term "networks," which makes it hard to capture the research topics. Our method solved this problem of the conventional co-word visualization and made it possible to show popular clusters such as "scheduling control" and "learning," as shown in Figs. 6a and c, respectively. We used $\lambda = 4.0 \times 10^{-4}$ for mapping the research trends of ICASSP. Figure 7 shows the results on the ICASSP dataset during 2015-2016 and 2017-2018. A large cluster of "neural networks" is connected to clusters of "speech" and "estimation" in Fig. 7a, while the terms "image," "adversarial," and "generative" appear near neural networks in Fig. 7c. In this way, TrendNets tell us that recent research trends of ICASSP are similar to those of CVPR, implying that our method has the potential to find the relevance between conferences.
Compared to the INFOCOM results, the original co-word networks for ICASSP seem to visualize trend topics successfully. Unfortunately, a static term "source separation" exists in both Figs. 7b and d, and more temporally representative terms can be seen in TrendNets. Since these results of the ICASSP dataset show only a slight difference, we can assume that there are differences between conferences in how research topics evolve over time, and it would be valuable to investigate whether the proposed method can estimate such differences.
In summary, the proposed method enables the visualization of research trends using only a single parameter $\lambda$, even if the trends are inconspicuous in the original co-word networks. To encourage science mapping in all research fields, the codes for the proposed method are available on the Web.

Conclusion and Future Work
We have developed a novel research trend mapping approach that detects rapid changes in the edge weights of dynamic co-word networks. We have formulated a convex optimization problem and provided an efficient solution that yields sparse networks named TrendNets. The computational cost of each iteration of our algorithm is linear in the number of entries in the matrix constructed from the original networks. In addition, the proposed method works with a single parameter $\lambda$, which is practical and user friendly. According to experiments using a synthetic dataset, the proposed method achieved better burst detection performance than the baseline methods. Experiments conducted on three conference datasets (CVPR, INFOCOM, and ICASSP) showed the advantage of TrendNets in detecting hot topics that are inconspicuous in the traditional co-word representation. Interestingly, the degree of effectiveness of TrendNets differs among the conferences: we can assume that the temporal stability of a word distribution depends on the research area. Quantifying this type of characteristic is a part of our future work.
Further room for investigation and improvement exists in our work. We currently considered only the rapid changes of the edge weights, ignoring the node degrees of original dynamic networks. To fully make use of the original network structure for research trend visualization, more sophisticated models need to be developed. It is also necessary to design more effective visualization of TrendNets. For example, we should examine how to set the colors and sizes of network components such as nodes, edges, and labels. We will continually improve the proposed method and conduct additional experiments using a larger dataset consisting of multiple conferences/disciplines.

Fig. 1: An overview of the proposed method that constructs TrendNets.

Fig. 2: Matrix construction from a time series of co-word graphs.

Fig. 3: Illustrations of two types of bursts generated for the synthetic dataset. (a) Type-A bursts: word pairs that suddenly increase and disappear later; (b) Type-B bursts: word pairs that suddenly increase and keep the co-occurrence frequency after that. Deep blue bars indicate time periods that should be detected as bursts.

Fig. 4: PR curve for burst detection results on synthetic data.

Fig. 6: TrendNets ($\lambda = 1.7 \times 10^{-3}$) and their original co-word networks for the INFOCOM dataset in a comparison of the proposed method and the traditional method. (a) TrendNets for 2005-2006, (b) original co-word network for 2005-2006, (c) TrendNets for 2017-2018, (d) original co-word network for 2017-2018. The numbers of edges of (b) and (d) were set to the same as (a) and (c) via the thresholding of edge weights, respectively.

Fig. 7: TrendNets ($\lambda = 4.0 \times 10^{-4}$) and the original co-word networks for the ICASSP dataset in a comparison of the proposed method and the traditional method. (a) TrendNets for 2005-2006, (b) original co-word network for 2005-2006, (c) TrendNets for 2017-2018, (d) original co-word network for 2017-2018. The numbers of edges of (b) and (d) were set to the same as (a) and (c) via the thresholding of edge weights, respectively.

Table 1: AUC measured using burst detection results on synthetic data.

Table 2: Details of each conference's dataset.