Having described the circumstances of our case study, we now present the actual topic space trajectories that we calculated from our paper corpus, together with interpretations and visualizations of our findings. While we conducted our study on a particular research field that we are familiar with, our approach could be applied to other fields and text domains. Hence, our concrete analysis simultaneously serves as an example of a generalizable approach.
Found topics
Table 1 Topics identified through NMF. Each topic is represented here by its top-ranked terms, given in order of rank (i.e., from highest to lowest word weight). Topics are numbered arbitrarily for reference. We manually assigned names to topics based on our own interpretation; sometimes these interpretations reflect a tendency or focus of a topic
Table 1 shows the topics identified through NMF. Each topic is represented by its top ten terms, determined from the word weights in its topic vector. We manually assigned a number and a name to each topic. The name is based on our own interpretation of the top words in a topic. For this, we considered up to 50 terms per topic. We may refer to some of these additional top words where we consider it useful for the interpretation of a topic.
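To illustrate this procedure, the following minimal sketch derives topics and their top-ranked terms with scikit-learn. The toy documents and the number of topics are placeholders, not our actual corpus or settings:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [  # toy stand-ins for full paper texts
    "bayesian inference with markov chain monte carlo sampling",
    "convolutional neural network layers for image recognition",
    "support vector machines and kernel methods for classification",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)      # document-term matrix

nmf = NMF(n_components=2, random_state=0)    # n_components = number of topics
W = nmf.fit_transform(X)                     # document-topic weights
H = nmf.components_                          # topic-term weights (the topic vectors)

terms = vectorizer.get_feature_names_out()
for j, topic in enumerate(H, start=1):
    top = topic.argsort()[::-1][:10]         # indices of the ten highest word weights
    print(f"Topic {j}:", ", ".join(terms[i] for i in top))
```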
Most topics found through NMF are readily interpretable and often clearly correspond to one specific research area. Among others, we found topics related to Bayesian inference, neural networks, nonlinear control, optimization, social media, clustering, semantic web, recommender systems, graphs, reinforcement learning and image recognition. However, we also found a few topics that allow for ambiguous interpretations. As an example, topic 7 seems to be a mixture of pattern mining and classification methods. Topic 11 appears to be related to different types of learning, e.g. (based on more than the top ten words), transfer learning, online learning, representation learning, reinforcement learning and machine learning. As such, it could be interpreted as a general topic on different concepts of machine learning, i.e., dealing with different approaches to how and what to learn. Additionally, this topic is related to knowledge bases. For some research areas, we found several topics with different focusses. As an example, we found two topics on neural networks (topics 3 and 6). Topic 3 is more concerned with the architecture of networks (e.g., containing the words layers, architecture, structure, convolutional, feedforward). Topic 6 puts more emphasis on neurons and the biological motivation of neural networks (e.g., neuron, firing, stimulus, synapses, brain, signal, cortical), as well as on advanced, dynamic neural networks (spiking, temporal, recurrent, memory).
Topic 17 is a mixture of document retrieval and XML tags from the Standards Tag Suite (STS), e.g., tex, math, formula. STS is an XML format used by publishers to exchange documents; in our data, it was used for publications from one venue. Optimally, only the content of these tags would be parsed and added to the document representation. We omitted this step since only a few documents are affected and it would have required a complicated format check of all documents with a subsequent parsing process. Topic 18 is a mixture of feature extraction and methods related to dimensionality reduction (further top terms are manifold, subspace, embedding). We consider this a sensible mixture since these two topics are strongly related, i.e., dimensionality reduction methods are often used to extract features from data.
Altogether, we found that NMF yields topics with good interpretability. We also found some limitations of the method. First, in some cases two different topics are mixed together although they are not strongly related semantically, e.g., pattern mining and classification in topic 7. We surmise this behavior of NMF is encouraged when certain third terms frequently co-occur with both topics, e.g., here algorithm and data mining. It could especially be encouraged by polysemous or homonymous terms (i.e., terms that, in a different context, have slightly or totally different meanings). Second, NMF sometimes learns two topics that could be one, e.g., two on neural networks. This behavior tends to occur for topics that are overrepresented in the training corpus. Where desired, under- or oversampling based on (research) categories and the paper counts of venues could mitigate such results. A third limitation of NMF is its flat structure: NMF fails to convey the taxonomy of topics, e.g., search engines being a subtopic of information retrieval. However, this lack of complexity is at the same time an advantage, since it improves comprehensibility.
Topic similarities
In Fig. 6 we depict the cosine similarities between the calculated topic vectors \(t_j\) with \(j \in \{1 \ldots t\}\). The cosine similarity is a method often used in information retrieval to compare the word weights of document vectors. It measures the cosine of the angle between two vectors; for vectors with non-negative components it lies between zero and one. Its maximum is reached for an angle of zero, i.e., when both vectors point in the same direction, and its minimum here when the vectors are orthogonal. From the plot we can see how topics are related to each other. More specifically, brighter cells show that two topics have more similar word weights. Because the original topics have a dimensionality of more than 14,000, it would be laborious to analyze this through a direct comparison of the word weights.
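As a sketch, the pairwise similarities underlying Fig. 6 can be computed as follows, assuming a matrix of topic vectors such as the hypothetical H from the sketch above:

```python
import numpy as np

def cosine_similarities(topics):
    """Pairwise cosine similarities between topic vectors (one vector per row)."""
    normalized = topics / np.linalg.norm(topics, axis=1, keepdims=True)
    return normalized @ normalized.T

# topics could be the NMF topic-term matrix H from the sketch above.
# A plot in the style of Fig. 6 is then, e.g.:
#   import matplotlib.pyplot as plt
#   plt.imshow(cosine_similarities(H)); plt.colorbar(); plt.show()
```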
In the plot we see that most topic pairs have a low similarity. This indicates that most learned topics are strong features in their own right and hence should enable us to represent the variety of our data in topic space without redundancies. We notice that the two topics on neural networks are most strongly related (“network, networks” and “neurons, neural”), but still have a low cosine similarity of about 0.3. We further see that two topics on information retrieval are closely related (“retrieval, topics” and “document, documents”). Such observations show that topics which come from the same research field (or supercategory) lead to more similar topic vectors. Turning this argument around, we can to some degree confirm or rebut our interpretations of the topic vectors, since similar topic vectors indicate similar research fields. It is interesting to note that in some cases a topic bears a comparatively high similarity to a variety of different topics. As an example, the topic “tree, classifiers” is similar to almost every other topic. The topic “convex, optimization” is similar to topics such as “tree, classifiers”, “policy, reinforcement”, “feature, multi”, “density, regression” and “kernel, svm”. Presumably, such cases occur due to a strong co-occurrence with other research topics, and sometimes these co-occurrences are also related semantically. As an example, optimization methods are often applied to a loss function for the training of classifiers such as tree-based methods and support vector machines (SVMs), for fitting regression models, and in reinforcement learning. Hence, the related topics also occur together. The same holds for the topic on Bayesian inference, which is applied to a plethora of machine learning problems.
The lesson learned here is that research topics can be similar for at least three reasons: 1) they share the same supercategory, 2) one is a subcategory of the other, or 3) one field is often applied to the other. Hence, calculating topic similarities enables us to interpret topics semantically to some degree. Note that the three cases can sometimes be hard to distinguish, since learned topics do not always reflect a pure, single research field. Methods for an automated analysis of these relations would open interesting directions for further research.
Overall historical interest in topics
In Fig. 7 (top), we illustrate the topic trajectory for all papers in our data set from 1987 up to 2018. For this, we calculated one centroid per publication year. We depict the trajectory as chronologically sorted stacked bars, where each column shows the topic space representation of one year. For Fig. 7 (bottom), we calculated the sum of the document vectors without dividing by the number of publications, as done for the centroids. Through this method, the topic weights are proportional to the total number of papers in a year instead of being normalized to a total sum of 1. Note that topic names and numbers correspond to the topics in Table 1. We sometimes use shortened names to improve the overview in our visualizations and will continue to do so in the remainder of this paper. However, we emphasize that topics might comprise further notions than visible from the short names alone.
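A minimal sketch of this computation, assuming hypothetical arrays doc_topic (one topic vector per paper, e.g., the row-normalized NMF weights) and years (one publication year per paper):

```python
import numpy as np

def yearly_trajectory(doc_topic, years, normalize=True):
    """One topic space vector per publication year.

    doc_topic: array of shape (n_papers, n_topics) with one topic vector per paper.
    years:     array of shape (n_papers,) with each paper's publication year.
    With normalize=True we obtain centroids (Fig. 7, top); with False, plain
    sums whose magnitude scales with the number of papers (Fig. 7, bottom).
    """
    trajectory = {}
    for year in np.unique(years):
        vectors = doc_topic[years == year]
        trajectory[year] = vectors.mean(axis=0) if normalize else vectors.sum(axis=0)
    return trajectory
```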
We now analyze some striking results, referring to topics by our manually assigned topic names. Observing the topics from 1987 up to 2000 and comparing them with the years from 2001 to 2018, it becomes apparent that the three topics with the initially largest weights lose their relative importance over the years. These topics are topic 16 (Planning & Reasoning), topic 3 (Neural Networks) and topic 6 (Neurons & Dynamic Neural Networks). Note that the bars are influenced by the number of venues that published papers in a specific year. This is why some bars grow and shrink in a biennial pattern: the IJCAI conference was held every second year in odd-numbered years, i.e., those where the aforementioned topics have more weight. We also have venues that did not yet exist at the beginning of our analysis. In 2001, for example, the Journal of Machine Learning Research (JMLR) was introduced (cf. Fig. 2). 2001 is also the first year in which a considerable number of publications from the WWW conference appears in our data set (although the conference was founded earlier). Both facts led to a larger number of publications in their specific research areas. Despite these influences, the overall tendencies of decline and rise in topic weights are visible. The mentioned decline of interest in neural networks took place at a time when support vector machines (SVMs) became popular as a more efficient alternative (starting around the mid-1990s). SVMs remained a widely used machine learning method. Neural networks, however, gained more interest again in recent years. These facts become visible from our visualization and coincide with our personal background knowledge.
Further topics that have largely gained in interest include Social Media, Recommender Systems, Optimization, Matrix Methods and Bayesian Inference, among others. Social media became more popular through platforms like Facebook or Twitter. The interest in recommender systems has been promoted through the Netflix Prize, a public contest on recommender systems that started in 2006, and through the introduction of the RecSys conference in 2007. Besides this, the growth of online platforms such as Amazon further promoted this interest. Optimization, Bayesian inference and matrix methods have proven to be useful techniques that can be applied to a plethora of machine learning approaches. Topics that have recently lost weight in research are Semantic Web and Search Engines, as well as Clustering and Classification & Pattern Mining. Again, this coincides with our personal intuition. The latter two, very general topics have already been explored deeply in research and were often replaced by more specific problems, applications and methods.
Altogether, we notice that some machine learning topics gain relative importance while others lose it. Sometimes, topics are almost invisible at the beginning and grow over time. This hints that these research topics had just emerged or became more popular due to a certain event. Usually, popularity of machine learning methods is triggered by some milestone in their performance, i.e., when a benchmark on a data set is beaten by a large margin over previous methods. Sometimes other events trigger interest, such as public contests or the release of platforms.
We also aggregated topics over all papers in our data set without differentiating by year. This gives us the overall prevalence of topics in our data set. We found that the most prevalent topics are Optimization, Classification & Pattern Mining and the two topics on Neural Networks. Each of these topics accounts for about 6-7% of our corpus. This is expected due to the many venues with large publication numbers on these topics in our data set. The prevalence of the least prevalent topics, Document Retrieval/STS-Tags and Clustering, is only about half as high (about 3%). This is partly due to their loss in popularity in recent years, when the overall publication numbers were higher.
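This aggregation amounts to averaging all paper vectors; a minimal sketch, reusing the hypothetical doc_topic matrix from above:

```python
import numpy as np

# doc_topic is assumed to hold one (row-normalized) topic vector per paper.
prevalence = doc_topic.mean(axis=0)
prevalence /= prevalence.sum()                 # relative weights summing to 1

for j in prevalence.argsort()[::-1]:           # topics in decreasing prevalence
    print(f"Topic {j + 1}: {100 * prevalence[j]:.1f}%")
```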
Analyzing venue similarities through topical maps
Topic space embeddings and trajectories have a dimensionality that, in general, cannot be depicted directly in a coordinate system. On the other hand, it is often useful to visualize data in such a way, because it allows one to find similarities and differences between entities (e.g., venues), indicated by nearness (or distance) in the plot. To visualize high-dimensional data in such a way, we employ the well-known technique of multidimensional scaling (MDS), cf. Mead (1992). The idea behind MDS is to lay out high-dimensional vectors in a low-dimensional space while preserving pairwise distances as well as possible. The low-dimensional representation of each input vector is found based on the squared differences between the pairwise distances of vectors in the input and the output space. More precisely, starting from random coordinates, points are aligned in the low-dimensional space such that the following objective function is minimized:
$$\begin{aligned} \min _{{\hat{x}}_1, \ldots , {\hat{x}}_{n}} \sum _{i = 1}^n {\sum _{j = 1}^n {\left( d(x_i, x_j) - ||{\hat{x}}_{i} - {\hat{x}}_j||_2\right) ^2 }} \end{aligned}$$
In this, n is the number of vectors, \(x_i\) is one (high-dimensional) input vector and \({\hat{x}}_{i}\) is the corresponding low-dimensional output vector to be determined through MDS. The function d is a measure of distance in the input space. Here, we use the Euclidean distance, i.e., \(d(x,y) {:=}||x-y||_2\). Note that in theory a distance measure for compositional data such as the Aitchison distance (Martín-Fernández et al. 1998) or one between probability distributions, e.g., those presented in Schaefermeier et al. (2019), would be more suitable for our data. In our investigation, however, we found that the Euclidean distance leads to a better separation of venues into three different research fields, namely neural networks, information retrieval and general machine learning.
We use a dimensionality of two for the vectors \({\hat{x}}_{i}\), as this allows for good visualizations. In the resulting space, the two dimensions are not interpretable as topics. Nonetheless, in this space we can analyze the topical similarity of venues based on their distance.
In Fig. 8 we depict topical representations of venues, calculated as the centroids of all their papers’ topic vectors. We projected these centroids into two-dimensional space using MDS. In this figure, the closer two points are together, the more similar the research topics of their corresponding venues. We therefore call a visualization like Fig. 8 a topical map of the venues of a publication corpus.
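A minimal sketch of such a topical map with scikit-learn; the venue_centroids dictionary holds hypothetical toy values, not our actual representations:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

venue_centroids = {                            # hypothetical toy topic vectors
    "NIPS":   np.array([0.5, 0.3, 0.2]),
    "SIGIR":  np.array([0.1, 0.2, 0.7]),
    "RecSys": np.array([0.1, 0.1, 0.8]),
}

names = list(venue_centroids)
X = np.vstack([venue_centroids[name] for name in names])

# Metric MDS with Euclidean distances in the input space, as described above.
coords = MDS(n_components=2, dissimilarity="euclidean", random_state=0).fit_transform(X)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), name in zip(coords, names):
    plt.annotate(name, (x, y))                 # label each venue on the map
plt.show()
```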
An interesting observation is that the venues cluster into three areas: in the top area, we have venues that fall into the information retrieval category; in the middle part, the more general machine learning venues. These two clusters are separated by CIKM (Conference on Information and Knowledge Management), which appears to fall somewhere between both worlds. In the bottom area, we have conferences and journals specializing in artificial neural networks. RecSys (Recommender Systems Conference) in the top right of the plot is a conference with a strong topical focus on recommender systems and hence appears to be a category by itself. However, RecSys lies closest to the information retrieval world and is most dissimilar from the neural networks cluster.
Further interesting patterns emerge once we look at specific venues. NIPS (Neural Information Processing Systems), for example, was founded as a conference situated closer to the neural networks topic. Over time, however, it developed into a more general machine learning conference, as can be seen by looking at recent conference proceedings. Note that, as a general tendency, the number of published papers per venue grows from year to year, as noticeable from Figs. 2 and 7 (bottom). Hence, more recent publication years have a stronger influence on the centroid of a venue. This explains why NIPS falls into the general machine learning category. However, it clearly is the conference from this category that is closest to the neural networks cluster. IJCAI (International Joint Conference on Artificial Intelligence) and ILP (Inductive Logic Programming, right middle) are both conferences that lie closest to the general machine learning cluster. Nonetheless, they put more emphasis on specialized topics, such as knowledge representation and logic-based systems. Hence, they lie at the border of this cluster, with a small but visible gap to its central part. Similarly, COLT (Conference on Learning Theory) lies at the left border of the same cluster.
In summary, the topical map shows that topics are captured well and as expected by our venue representations. Besides this, topical maps lead to interesting insights once we analyze (in this case, visually perceived) clusters and edge cases, i.e., outliers and points which lie between several clusters. A natural enhancement of this method would be an analysis of trajectories in topical maps, i.e., of how venues drift apart or together over time.
Visualizing topic space trajectories through projection
Topic space trajectories exhibit too many dimensions (i.e., topics) for direct visualization in a coordinate system. To analyze trajectories, we hence project venue representations onto their two most relevant topics. We determine the relevance of a topic through its average weight in the trajectory, i.e., the topic weight averaged over all years. We demonstrate this method on the NIPS conference, which we selected due to its interesting trajectory. We selected some additional venues, most of which are related to neural networks, for comparison. Figure 9 depicts the trajectories created through this process. We marked the first and last year of each trajectory; trajectories drift in the direction of the arrows. Through measuring the average weight in the trajectory, we identified the two topics on neural networks as the most relevant ones. The topic on the x-axis is the one we previously identified as being more concerned with the architecture of neural networks (we simply called it Neural Networks). The topic on the y-axis is more concerned with the biological motivation of neural networks and with dynamic neural networks (i.e., recurrent and spiking neural networks). We called this one Neurons & Dynamic Neural Networks.
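A sketch of this projection, assuming a trajectory dict as computed earlier and a hypothetical list topic_names of manually assigned names:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_trajectory(trajectory, topic_names):
    """Plot a venue trajectory projected onto its two most relevant topics.

    trajectory: dict mapping year -> topic vector, as computed earlier.
    """
    years = sorted(trajectory)
    T = np.vstack([trajectory[year] for year in years])
    relevance = T.mean(axis=0)               # average topic weight over all years
    i, j = relevance.argsort()[::-1][:2]     # indices of the two most relevant topics
    plt.plot(T[:, i], T[:, j], marker="o")
    plt.annotate(str(years[0]), (T[0, i], T[0, j]))     # mark first year
    plt.annotate(str(years[-1]), (T[-1, i], T[-1, j]))  # mark last year
    plt.xlabel(topic_names[i])
    plt.ylabel(topic_names[j])
    plt.show()
```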
We observe that initially all venues drift towards the origin of the coordinate system, i.e., lose their relative interest in both topics. In recent years, however, research on neural network architecture has gained traction again. By backtracking the trajectory of NIPS, we can see that this started around 2010. Around these years, the field of deep learning (Bengio 2009) gained much interest. This interest was motivated and accelerated by several discoveries and improvements in the field, e.g., the ReLU (rectified linear unit) activation function in Glorot et al. (2011), largely improved training times through GPU programming (Raina et al. 2009) and breakthroughs in performance on benchmark data sets, e.g., on the MNIST data set of handwritten digits in Ciresan et al. (2010) and on the ImageNet data set through convolutional neural networks in Krizhevsky et al. (2012). At this point we note that other topics that gained interest, such as optimization, reinforcement learning and image recognition (cf. Fig. 7), are strongly connected to neural networks. Hence, the total interest in topics involving neural networks has increased even more than apparent from the trajectory in Fig. 9.
The IJCAI conference, which was held every two years from 1969 and yearly from 2015, is almost stationary most of the time. It exhibits only a small proportion of papers on neural networks. In recent years it shows a movement to the right, i.e., an increasing relevance of research on neural network architecture. The NIPS conference has a comparatively smooth, easy to follow trajectory, which again ends close to the origin. This result is supported by the fact that NIPS has become a more general machine learning conference. Neural Processing Letters starts close to NIPS but ends at a different location, with more relevance on both neural network topics. This is a reasonable result, since it is a journal focussed specifically on this research area. An interesting case is IEEE Transactions on Neural Networks, which was renamed IEEE Transactions on Neural Networks and Learning Systems in 2012. In our data set these two are handled as separate venues. We noticed the name change through the behavior of the trajectories: the endpoint of the trajectory under the first venue name lies close to the starting point of the trajectory under the second name.
Altogether, we see that topic space trajectories are an effective method for a human-interpretable analysis of topic drift. A drawback resulting from high-dimensional data such as topic vectors is that we can only visualize the trajectory for up to three topics. This problem, however, can be mitigated through the selection of relevant topics. One possibility is a manual topic selection by the user. Alternatively, an automated solution can be established through a measure of topic relevance. For our example in Fig. 9, this measure is the average weight of the topic. We can imagine further measures for different applications, as sketched below. Using the maximal topic weight over all years would yield trajectories for topics that were strongly relevant, even if only for a short time. Measures based on the absolute difference between topic weights of different years open another promising direction: such measures would return trajectories with strong movement in topic space. This can be fine-tuned based on which years are considered (e.g., the difference between the first and the last year, between all consecutive years or between all pairs of years) and how these differences are aggregated (e.g., using their average or maximum).
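The following lines sketch these candidate relevance measures side by side; the trajectory matrix T is a random stand-in for a real venue trajectory:

```python
import numpy as np

# Stand-in trajectory: one topic vector per year, rows in chronological order.
T = np.random.default_rng(0).dirichlet(np.ones(5), size=10)

avg_relevance  = T.mean(axis=0)                          # measure used for Fig. 9
max_relevance  = T.max(axis=0)                           # strongly relevant at least once
endpoint_drift = np.abs(T[-1] - T[0])                    # change from first to last year
step_drift     = np.abs(np.diff(T, axis=0)).max(axis=0)  # largest year-to-year movement
```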
Visualizing topic space trajectories as heat maps
We found that heat maps as depicted in Fig. 10 are an effective method for interpretable topic visualizations. In such plots, topic weights can effortlessly be compared across venues. Each row here visualizes the topic space representation of one venue. In this specific instance, we calculated the centroid of papers from 2018. The topic weights are visualized through different color shades in the columns, with brighter colors indicating stronger topic weights.
Building upon this idea, in Fig. 11 (a) and (b) we visualize all topic space trajectories of our data set through one heat map per venue. In these heat maps, each row represents the topic space representation of a venue for a specific year. By following the rows from top to bottom, we see how interest in specific topics evolves over time. We only calculated and displayed centroids for years in which at least ten papers were published at a venue. We do this for two reasons: first, our data set occasionally contains papers with a wrong year or venue; second, it sometimes contains very few papers for a year. Both lead to venue representations that do not reflect reality well. We hence only calculate trajectories over years with more samples, i.e., papers. As a consequence, we have no trajectory at all for the venue DMKD.
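A sketch of how one such per-venue heat map can be produced, again assuming hypothetical doc_topic and years arrays restricted to a single venue:

```python
import numpy as np
import matplotlib.pyplot as plt

def venue_heatmap(doc_topic, years, min_papers=10):
    """Heat map of one venue's trajectory: one row per year with enough papers."""
    kept_years, rows = [], []
    for year in np.unique(years):
        mask = years == year
        if mask.sum() >= min_papers:          # skip years with too few papers
            kept_years.append(year)
            rows.append(doc_topic[mask].mean(axis=0))
    plt.imshow(np.vstack(rows), aspect="auto")
    plt.yticks(range(len(kept_years)), kept_years)
    plt.xlabel("topic")
    plt.colorbar(label="topic weight")
    plt.show()
```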
In the resulting heat maps we make two particularly interesting general observations. First, the topics evolve smoothly from year to year, despite the fact that each row was calculated from completely different papers. Second, venues with similar research exhibit similar patterns in their heat maps. The heat maps can thus be regarded as distinguishable fingerprints left by research areas. As an example, the heat maps of different venues on information retrieval (e.g., Information Retrieval and SIGIR) have a visually similar appearance. This is due to their topic weights being stronger in the same columns as well as being similar in their development over time (i.e., over the rows). Such a similarity is likewise strongly visible for conferences and journals on artificial neural networks (e.g., NIPS, Neural Computation and Neural Networks). In particular, we have two important topics on neural networks with a similar development over time.
Altogether, the strongly visible patterns show that topic space trajectories capture topical specifics of venues as well as of their historical development. Our heat map visualizations make these specifics visually perceivable. We argue that heat maps are among the best visualization methods for these kinds of trajectories, since they capture all dimensions, i.e., topics, while providing a good and comparable overview.
We now analyze the trajectories of some particular venues. One important machine learning conference is the International Joint Conference on Artificial Intelligence (IJCAI). What stands out most for this venue is its strong focus on one particular topic, Planning & Reasoning, Logic & Association Rules. This focus, however, has decreased since 1995 and the conference has become more diverse. In particular, Reinforcement Learning, a different approach to solving similar tasks, has gained importance since then.
Another interesting case is that of the European Conference on Machine Learning (ECML) and the conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Both conferences, similarly to IJCAI, started with a high interest in planning & reasoning that declined from 1995 on (for PKDD some years later). Both additionally exhibit a strong focus on Classification & Pattern Mining throughout their existence until 2007, as well as some interest in Clustering. What distinguishes these conferences is how weights are distributed across other topics: PKDD is concerned with web pages, graphs and knowledge bases, while ECML is concerned with matrix and kernel methods as well as SVMs and feature extraction. In 2008 the two conferences were merged and have since been called ECML/PKDD. The heat map of the merged conference thus exhibits even more diversely distributed topics than either of the two alone.
Topic diversity
Table 2 Ranking of venues by topical diversity. Topical diversity is measured as the effective number of species, i.e., the exponential of the Shannon entropy. Journals are printed in italics. Journals show a tendency to be more focussed
In this part, we analyze the topical diversity of venues based on their topic space representations. For this, we calculate a measure of diversity from each of these vectors. As the components of the vectors are all non-negative and sum up to 1, we can interpret each topic space representation as a probability distribution over topics. The more evenly distributed these topics are, the higher the diversity should be. This can be achieved through the Shannon entropy. The Shannon entropy of a probability distribution p over a discrete random variable x with outcomes X is calculated as follows:
$$\begin{aligned} H(p) = -\sum _{x \in X}{p(x) \cdot \ln {p(x)}} \end{aligned}$$
While this measure becomes larger the more evenly distributed the topics are, its concrete value is not well interpretable. In Jost (2006), the entropy is therefore converted into a more interpretable measure by taking its exponential, i.e., by calculating exp(H(p)). This measure is often used in biology to calculate the effective number of species, i.e., the number of evenly distributed species that would be necessary to obtain the same entropy. Hence, the maximum possible value of this measure for a distribution over n outcomes is exactly n, reached when all outcomes have probability 1/n. In information theory, the measure is also referred to as the perplexity of a distribution, although with a different connotation and interpretation.
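As a worked example, the following sketch (the helper name and the zero-guard eps are our own) computes this quantity and verifies the maximum for a uniform distribution:

```python
import numpy as np

def effective_number_of_topics(p, eps=1e-12):
    """Exponential of the Shannon entropy (Jost 2006) of a topic distribution p."""
    p = np.asarray(p)
    entropy = -np.sum(p * np.log(p + eps))    # eps guards against log(0)
    return np.exp(entropy)

# A uniform distribution over n topics attains the maximum value n:
print(effective_number_of_topics(np.full(20, 1 / 20)))  # approximately 20.0
```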
We calculate the effective number of species for the topic space representation of every venue and rank the venues in decreasing order of diversity. The results are given in Table 2 and agree with our background knowledge about these venues. More general conferences and journals on knowledge discovery, which contain research from almost every machine learning field, are ranked highly in the list. Ranked first is ECML/PKDD with a diversity of 19.30, closely followed by other general conferences and journals on machine learning and knowledge discovery, such as KDD and CIKM. In contrast, in the lower part of the table we see venues which are more specialized. As an extreme example, RecSys, with its strong focus on recommender systems, is ranked lowest with a diversity of 7.87. Ranked second lowest is Neural Computation with a diversity of 11.73. COLT and ILP are comparatively specialized conferences as well. Almost all other venues in the lower part of the table (ranks 19-28) are specialized conferences or journals on neural networks or information retrieval.
In the middle part of the table we find the cases in between. Interesting examples are ECML and PKDD, which merged into the single conference ECML/PKDD in 2008. While both already exhibited a strong topical diversity, it increased even further through the merge. Another interesting case is NIPS (Conference on Neural Information Processing Systems). Although this originally was a conference on neural networks, as its name indicates, it later evolved into a more general conference on machine learning and artificial intelligence. Hence, its topical diversity is considerably higher than that of all other conferences and journals focussing on neural networks.
In a second analysis, we look at the development of diversity over the years, depicted as a heat map in Fig. 12. In this figure, each row shows the topic diversity of a venue for all years our data set spans. The diversity is indicated by the color (or shade) of a cell in a row. The last row contains the average diversity over all venues. Diversities were only calculated where at least ten papers were available for a venue and year. Note that in some cases we still have erroneous data; as an example, SIGIR did not take place before 1978. Analyzing the results, it is interesting to note that many venues start out with an increase in diversity (e.g., NIPS, WWW, KDD). In later years, however, and especially in the last few years, the diversity often is lower than before (e.g., NIPS, ICML, IEEE Transactions on Knowledge and Data Engineering, AISTATS). The tendency of an early increase is also noticeable in the last row containing the average diversity. There, however, we see that the diversity later remains close to constant. It seems that during the last years some venues have put more focus on specific topics again, while on average the diversity does not change. One reason for this could lie in the growing number of different conferences and journals: venues sometimes might want to distinguish themselves more from others, which leads to more focussed single venues while the average diversity remains nearly constant. Venues could also have become more selective in their review process, choosing only papers that fit the venue well. This could be a consequence of the increasing popularity of machine learning in recent years, which has also led to a larger number of submitted papers that venues can choose from.
Another interesting case is the WWW conference. From 2001 to 2006, this conference has a diversity between 9.7 and 10.7. In 2007, there is a sudden increase to a diversity of 12.9. Our topic trajectory heat map in Fig. 11 indicates that in this year the previously strongest topic, on web pages, lost popularity. Instead, interest in search engines and social media grew. Additionally, the topic on recommender systems starts to gain relevance, beginning one year earlier in 2006. We assume that two events played a big role in this result: in 2006, the social media platform Twitter was released and quickly became a popular resource for research due to its large, global user base and public API. The second event was a competition on recommender systems, the Netflix Prize. In this competition, which started in 2006, participants were invited to develop a recommender system that predicted user ratings for films. The prize money of one million dollars led to an increasing interest in recommender systems, with more than 5000 teams actively participating in the competition.
Topic densities
In this part, we analyze the topical density of some venues. For this, we use the topic space representations of the documents in our data set. For each venue, we project the topic space representations of its papers into two-dimensional space using MDS. If we have more than 1000 papers for a venue, we only use a random sample of 1000 papers. We do this to speed up our calculations, since MDS has a complexity of \(O(n^2)\) for calculating the distances between the sample pairs. After MDS, we perform a kernel density estimation (KDE) to estimate a probability density of papers in the two-dimensional space. The KDE is performed with a Gaussian kernel and a grid search for the optimal bandwidth, a hyperparameter of this method. Finally, we plot the density as a heat map together with the locations of the projected papers.
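A condensed sketch of this pipeline with scikit-learn; the function name, the subsampling details and the bandwidth grid are our own placeholders:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def topic_density(doc_topic, max_papers=1000, seed=0):
    """MDS projection of paper topic vectors followed by a 2D Gaussian KDE."""
    rng = np.random.default_rng(seed)
    if len(doc_topic) > max_papers:            # subsample to keep O(n^2) MDS tractable
        doc_topic = doc_topic[rng.choice(len(doc_topic), max_papers, replace=False)]
    coords = MDS(n_components=2, random_state=seed).fit_transform(doc_topic)

    # Grid search over the KDE bandwidth, the method's main hyperparameter.
    search = GridSearchCV(KernelDensity(kernel="gaussian"),
                          {"bandwidth": np.logspace(-2, 1, 20)})
    search.fit(coords)
    return coords, search.best_estimator_      # score_samples then gives log-densities
```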
While the dimensions in the projected space are not directly interpretable as topics, we can interpret distance in this space as topical distance. An advantage of representing venues by their full distribution instead of only their centroid is that we can see the whole distribution of topics instead of only an aggregated mean value. This distribution captures additional information such as various topical hot spots, i.e., dense areas in the projected space.
Figure 13 shows the results of this process for some selected venues. We notice that venues with a broader focus (ECML, KDD) tend to have several “blobs” at the margin of the distribution. We suspect that these blobs stem from different topical focusses. Conferences with a stronger single topical focus (COLT, RecSys) do not exhibit this behavior, or only slightly. For RecSys, we have a considerably dense area in the right part of the plot, while to its left papers drift apart from each other. This indicates one single, very strong topical focus; papers deviating from this focus are distributed evenly across other topics, i.e., without any further strongly visible cluster.
The lower four plots all show venues with research on neural networks. Our heat maps in Fig. 11 showed that their topic space trajectories exhibit similar patterns. In the density plots, however, we see that their distributions nonetheless differ. Topic densities hence reveal information additional to the topic space representations of venues, because the venue representations only give an average of the paper vectors in topic space instead of the full distribution. For Neural Processing Letters, two distinct topical focusses are visible, one being broader (i.e., with more variance) than the other. Neural Computation has a clear focus, similar to RecSys. This is backed by the fact that it has the second lowest topic diversity, directly after RecSys (cf. Table 2). NIPS has a broad distribution and its papers are focussed on different topics, visible through blobs distributed around the margin. Neural Networks, similar to Neural Processing Letters, seems to have two main focusses. However, there is an area where these two clusters almost merge, which indicates a smooth thematic transition between both focusses. Hence, we suspect that these blobs correspond to the two topics on neural networks.
We saw in this section that topic densities provide supplementary information to topic space embeddings. However, in the resulting space, we do not know which area of a density belongs to which topic. We leave the development of a method for a more specific topical interpretation of areas open for further research. One possibility would be to create a “pseudo-paper” vector for each topic, with full weight on that topic. One could then project these pseudo-papers into the two-dimensional space together with the real papers and mark their positions. It might also be beneficial to apply MDS to the papers of several venues together before estimating the density of each. Finally, we envision that topic space trajectories be extended to such densities, i.e., that the chronological development of densities be investigated.