Topic modeling in software engineering research

Silva, Camila Costa; Galster, Matthias; Gilson, Fabian

doi:10.1007/s10664-021-10026-0

Topic modeling in software engineering research

Open access
Published: 06 September 2021

Volume 26, article number 120, (2021)
Cite this article

Download PDF

You have full access to this open access article

Empirical Software Engineering Aims and scope Submit manuscript

Topic modeling in software engineering research

Download PDF

7459 Accesses
38 Citations
1 Altmetric
Explore all metrics

Abstract

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducting names based on frequent words in a topic) is common.

Semantic topic models for source code analysis

Article 22 November 2016

A survey on the use of topic models when mining software repositories

Article 10 September 2015

Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Text mining is about searching, extracting and processing text to provide meaningful insights from the text based on a certain goal. Techniques for text mining include natural language processing (NLP) to process, search and understand the structure of text (e.g., part-of-speech tagging), web mining to discover information resources on the web (e.g., web crawling), and information extraction to extract structured information from unstructured text and relationships between pieces of information (e.g., co-reference, entity extraction) (Miner et al. 2012). Text mining has been widely used in software engineering research (Bi et al. 2018), for example, to uncover architectural design decisions in developer communication (Soliman et al. 2016) or to link software artifacts to source code (Asuncion et al. 2010).

Topic modeling is a text mining and concept extraction method that extracts topics (i.e., coherent word clusters) from large corpora of textual documents to discovery hidden semantic structures in text (Miner et al. 2012). An advantage of topic modeling over other techniques is that it helps analyzing long texts (Treude and Wagner 2019; Miner et al. 2012), creates clusters as “topics” (rather than individual words) and is unsupervised (Miner et al. 2012).

Topic modeling has become popular in software engineering research (Sun et al. 2016; Chen et al. 2016). For example, Sun et al. (2016) found that topic modeling had been used to support source code comprehension, feature location and defect prediction. Additionally, Chen et al. (2016) found that many repository mining studies apply topic modeling to textual data such as source code and log messages to recommend code refactoring (Bavota et al. 2014b) or to localize bugs (Lukins et al. 2010).

Probabilistic topic models such as Latent Semantic Indexing (LSI) (Deerwester et al. 1990) and Latent Dirichlet Allocation (LDA) (Blei et al. 2003b) discover topics in a corpus of textual documents, using the statistical properties of word frequencies and co-occurrences (Lin et al. 2014). However, Agrawal et al. (2018) warn about systematic errors in the analysis of LDA topic models that limit the validity of topics. Lin et al. (2014) also advise that classical topic models usually generate sub-optimal topics when applied “as is” to small amounts or short text documents.

Considering the limitations of topic modeling techniques and topic models on the one hand and their potential usefulness in software engineering on the other hand, our goal is to describe how topic modeling has been applied in software engineering research. In detail, we explore the following research questions:

RQ1. Which topic modeling techniques have been used and for what purpose? There are different topic modeling techniques (see Section 2), each with their own limitations and constraints (Chen et al. 2016). This RQ aims at understanding which topic modeling techniques have been used (e.g., LDA, LSI) and for what purpose studies applied such techniques (e.g., to support software maintenance tasks). Furthermore, we analyze the types of contributions in studies that used topic modeling (e.g., a new approach as a solution proposal, or an exploratory study).
RQ2. What are the inputs into topic modeling? Topic modeling techniques accept different types of textual documents and require the configuration of parameters (see Section 2.1). Carefully choosing parameters (such as the number of topics to be generated) is essential for obtaining valuable and reliable topics (Agrawal et al. 2018; Treude and Wagner 2019). This RQ aims at analysing types of textual data (e.g., source code), actual documents (e.g., a Java class or an individual Java method) and configured parameters used for topic modeling to address software engineering problems.
RQ3: How are data pre-processed for topic modeling? Topic modeling requires that the analyzed text is pre-processed (e.g., by removing stop words) to improve the quality of the produced output (Aggarwal and Zhai 2012; Bi et al. 2018). This RQ aims at analysing how previous studies pre-processed textual data for topic modeling, including the steps for cleaning and transforming text. This will help us understand if there are specific pre-processing steps for a certain topic modeling technique or types of textual data.
RQ4. How are generated topics named? This RQ aims at analyzing if and how topics (word clusters) were named in studies. Giving meaningful names to topics may be difficult but may be required to help humans comprehend topics. For example, naming topics can provide a high-level view on topics discussed by developers in Stack Overflow (a Q&A website) (Barua et al. 2014) or by end mobile app users in tweets (Mezouar et al. 2018). Analysts (e.g., developers interested in what topics are discussed on Stack Overflow or app reviews) can then look at the name of the topic (i.e., its “label”) rather than the cluster of words. These labels or names must capture the overarching meaning of all words in a topic. We describe different approaches to naming topics generated by a topic model, such as manual or automated labeling of clusters with names based on the most frequent words of a topic (Hindle et al. 2013).

In this paper, we provide an overview of the use of topic modeling in 111 papers published between 2009 and 2020 in highly ranked venues of software engineering (five journals and five conferences). We identify characteristics and limitations in the use of topic models and discuss (a) the appropriateness of topic modeling techniques, (b) the importance of pre-processing, (c) challenges related to defining meaningful topics, and (d) the importance of context when manually naming topics.

The rest of the paper is organized as follows. In Section 2 we provide an overview of topic modeling. In Section 3 we describe other literature reviews on the topic as well as “meta-studies” that discuss topic modeling more generally. We describe the research method in Section 4 and present the results in Section 5. In Section 6, we summarize our findings and discuss implications and threats to validity. Finally, in Section 7 we present concluding remarks and future work.

2 Topic Modeling

Topic modeling aims at automatically finding topics, typically represented as clusters of words, in a given textual document (Bi et al. 2018). Unlike (supervised) machine learning-based techniques that solve classification problems, topic modeling does not use tags, training data or predefined taxonomies of concepts (Bi et al. 2018). Based on the frequencies of words and frequencies of co-occurrence of words within one or more documents, topic modeling clusters words that are often used together (Barua et al. 2014; Treude and Wagner 2019). Figure 1 illustrates the general process of topic modeling, from a raw corpus of documents (“Data input”) to topics generated for these documents (“Output”). Below we briefly introduce the basic concepts and terminology of topic modeling (based on Chen et al. (2016)):

Word w: a string of one or more alphanumeric characters (e.g., “software” or “management”);
Document d: a set of n words (e.g., a text snippet with five words: w₁ to w₅);
Corpus C: a set of t documents (e.g., nine text snippets: d₁ to d₉);
Vocabulary V: a set of m unique words that appear in a corpus (e.g., m = 80 unique words across nine documents);
Term-document matrix A: an m by t matrix whose A_i,j entry is the weight (according to some weighting function, such as term-frequency) of word w_i in document d_j. For example, given a matrix A with three words and three documents as

A_1,1 = 5 indicates that “code” appears five times in d₁, etc.;
Topic z: a collection of terms that co-occur frequently in the documents of a corpus. Considering probabilistic topic models (e.g., LDA), z refers to an m-length vector of probabilities over the vocabulary of a corpus. For example, in a vector z₁ = (code : 0.35;test : 0.17;bug : 0.08),

0.35 indicates that when a word is picked from a topic z₁, there is a 35% chance of drawing the word “code”, etc.;
Topic-term matrix ϕ (or T): a k by m matrix with k as the number of topics and ϕ_i,j the probability of word w_j in topic z_i. Row i of ϕ corresponds to z_i. For example, given a matrix ϕ as

0.05 in the first column indicates that the word “code” appears with a probability of 0.5% in topic z₃, etc.;
Topic membership vector 𝜃_d: for document d_i, a k-length vector of probabilities of the k topics. For example, given a vector \(\theta _{d_{i}} = (z_{1}: 0.25; z_{2}: 0.10; z_{3}: 0.08)\),

0.25 indicates that there is a 25% chance of selecting topic z₁ in d_i;
Document-topic matrix 𝜃 (or D): an n by k matrix with 𝜃_i,j as the probability of topic z_j in document d_i. Row i of 𝜃 corresponds to \(\theta _{d_{i}}\). For example, given a matrix 𝜃 as

0.10 in the first column indicates that document d₂ contains topic z₁ with probability of 10%, etc.

2.1 Data Input

Data used as input into topic modeling can take many forms. This requires decisions on what exactly are documents and what the scope of individual documents is (Miner et al. 2012). Therefore, we need to determine which unit of text shall be analyzed (e.g., subject lines of e-mails from a mailing list or the body of e-mails).

To model topics from raw text in a corpus C (see Fig. 1), the data needs to be converted into a structured vector-space model, such as the term-document matrix A. This typically also requires some pre-processing. Although each text mining approach (including topic modeling) may require specific pre-processing steps, there are some common steps, such as tokenization, stemming and removing stop words (Miner et al. 2012). We discuss pre-processing for topic modeling in more detail when presenting the results for RQ3 in Section 5.4.

2.2 Modeling

Different models can be used for topic modeling. Models typically differ in how they model topics and underlying assumptions. For example, besides LDA and LSI mentioned before, other examples of topic modeling techniques include Probabilistic Latent Semantic Indexing (pLSI) (Hofmann 1999). LSI and pLSI reduce the dimensionality of A using Singular Value Decomposition (SVD) (Hofmann 1999). Furthermore, variants of LDA have been proposed, such as Relational Topic Models (RTM) (Chang and Blei 2010) and Hierarchical Topic Models (HLDA) (Blei et al. 2003a). RTM finds relationships between documents based on the generated topics (e.g., if document d₁ contains the topic “microservices”, document d₂ contains the topic “containers” and document d_n contains the topic “user interface”, RTM will find a link between documents d₁ and d₂ (Chang and Blei 2010)). HLDA discovers a hierarchy of topics within a corpus, where each lower level in the hierarchy is more specific than the previous one (e.g., a higher topic “web development” may have subtopics such as “front-end” and “back-end”).

Topic modeling techniques need to be configured for a specific problem, objectives and characteristics of the analyzed text (Treude and Wagner 2019; Agrawal et al. 2018). For example, Treude and Wagner (2019) studied parameters, characteristics of text corpora and how the characteristics of a corpus impact the development of a topic modeling technique using LDA. Treude and Wagner (2019) found that textual data from Stack Overflow (e.g., threads of questions and answers) and GitHub (e.g., README files) require different configurations for the number of generated topics (k). Similarly, Barua et al. (2014) argued that the number of topics depends on the characteristics of the analyzed corpora. Furthermore, the values of modeling parameters (e.g., LDA’s hyperparameters α and β which control an initial topic distribution) can also be adjusted depending on the corpus to improve the quality of topics (Agrawal et al. 2018).

2.3 Output

By finding words that are often used together in documents in a corpus, a topic modeling technique creates clusters of words or topicsz_k. Words in such a cluster are usually related in some way, therefore giving the topic a meaning. For example, we can use a topic modeling technique to extract five topics from unstructured document such as a combination of Stack Overflow posts. One of the clusters generated could include the co-occurring words “error”, “debug” and “warn”. We can then manually inspect this cluster and by inference suggest the label “Exceptions” to name this topic (Barua et al. 2014).

3 Related Work

3.1 Previous Literature Reviews

Sun et al. (2016) and Chen et al. (2016), similar to our study, surveyed software engineering papers that applied topic modeling. Table 1 shows a comparison between our study and prior reviews. As shown in the table, Sun et al. (2016) focused on finding which software engineering tasks have been supported by topic models (e.g., support source code comprehension, feature location, traceability link recovery, refactoring, software testing, developer recommendations, software defects prediction and software history comprehension), and Chen et al. (2016) focused on characterizing how studies used topic modeling to mine software repositories.

Table 1 Comparison to previous reviews

Year	Venue	Title	Reference
2010	ICSE	Software Traceability with Topic Modeling	(Asuncion et al. 2010)
2017	ICSE	An Unsupervised Approach for Discovering Relevant Tutorial Fragments for APIs	(Jiang et al. 2017)
2013	ICSE	How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms	(Panichella et al. 2013)
2013	ICSE	Analysis of User Comments: An Approach for Software Requirements Evolution	(Galvis Carreno and Winbladh 2012)
2014	ICSE	AR-miner: Mining Informative Reviews for Developers from Mobile App Marketplace	(Chen et al. 2014)
2012	ICSE	Semi-automatically extracting FAQs to improve accessibility of software development knowledge	(Henß et al. 2012)
2019	MSR	Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools	(Chatterjee et al. 2019)
2014	MSR	Mining Questions Asked by Web Developers	(Bajaj et al. 2014)
2016	MSR	Topic Modeling of NASA Space System Problem Reports: Research in Practice	(Layman et al. 2016)
2013	MSR	Using citation influence to predict software defects	(Hu and Wong 2013)
2013	MSR	Bug report assignee recommendation using activity profiles	(Naguib et al. 2013)
2018	MSR	Feature Location Using Crowd-Based Screencasts	(Moslehi et al. 2018)
2016	MSR	On Mining Crowd-Based Speech Documentation	(Moslehi et al. 2016)
2015	MSR	The App Sampling Problem for App Store Mining	(Martin et al. 2015)
2009	MSR	Mining search topics from a code search engine usage log	(Bajracharya and Lopes 2009)
2012	ASE	Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling	(Nguyen et al. 2012)
2011	ASE	A Topic-based Approach for Narrowing the Search Space of Buggy Files from a Bug Report	(Nguyen et al. 2011)
2019	FSE	Going Big: A Large-scale Study on What Big Data Developers Ask	(Bagherzadeh and Khatchadourian 2019)
2017	FSE	Bayesian Specification Learning for Finding API Usage Errors	(Murali et al. 2017)
2013	MSR	Bug report assignee recommendation using activity profiles	(Naguib et al. 2013)
2018	MSR	Feature Location Using Crowd-Based Screencasts	(Moslehi et al. 2018)
2016	MSR	On Mining Crowd-Based Speech Documentation	(Moslehi et al. 2016)
2015	MSR	The App Sampling Problem for App Store Mining	(Martin et al. 2015)
2009	MSR	Mining search topics from a code search engine usage log	(Bajracharya and Lopes 2009)
2012	ASE	Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling	(Nguyen et al. 2012)
2011	ASE	A Topic-based Approach for Narrowing the Search Space of Buggy Files from a Bug Report	(Nguyen et al. 2011)
2019	FSE	Going Big: A Large-scale Study on What Big Data Developers Ask	(Bagherzadeh and Khatchadourian 2019)
2017	FSE	Bayesian Specification Learning for Finding API Usage Errors	(Murali et al. 2017)
2018	ESEM	What Do Concurrency Developers Ask About?: A Large-scale Study Using Stack Overflow	(Ahmed and Bagherzadeh 2018)
2017	TSE	Improving Automated Bug Triaging with Specialized Topic Model	(Xia et al. 2017b)
2014	TSE	Methodbook: Recommending move method refactorings via relational topic models	(Bavota et al. 2014b)
2018	TSE	Predicting Future Developer Behavior in the IDE Using Topic Models	(Damevski et al. 2018)
2013	EMSE	Integrating information retrieval, execution and link analysis algorithms to improve feature location in software	(Dit et al. 2013)
2013	EMSE	Automated topic naming: supporting cross-project analysis of software maintenance activities	(Hindle et al. 2013)
2017	EMSE	What do developers search for on the web?	(Xia et al. 2017a)
2013	EMSE	How do open source communities blog?	(Pagano and Maalej 2013)
2014	EMSE	How changes affect software entropy: an empirical study	(Canfora et al. 2014)
2019	EMSE	Towards prioritizing user-related issue reports of mobile applications	(Noei et al. 2019)
2019	EMSE	CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues	(Ahasanuzzaman et al. 2019)
2019	EMSE	Studying the consistency of star ratings and reviews of popular free hybrid Android and iOS apps	(Hu et al. 2019)

Metric	Definition	Context-specific	Used in
A measure	Measures difference between two populations (Vargha and Delaney 2000)	No	(Thomas et al. 2014)
Adjusted mutual information (AMI)	Compare two sets of clusters of a clustering technique, e.g., to compare gold standard labeled clusters and the clusters discovered by a technique	No	(Rosenberg and Moonen 2018)
Anomaly score	Defining program behavior as a statistical distribution, this metric represents the distance between the distribution of expected behavior and the actual program behavior (Murali et al. 2017)	Yes	(Murali et al. 2017)
Area Under the Curve (AUC)	Evaluates performance of a scoring classifier using the Receiver Operating Characteristic curve (ROC) which plots recall (true positive rate) against the fraction of false positives out of the negatives (false positive rate) (Kakas et al. 2011)	No	(Fowkes et al. 2016)
Average overlap	Average overlap between labels generated manually and labels automatically generated by the tested topic models (De Lucia et al. 2014)	Yes	(De Lucia et al. 2014)
Average percentage of faults detected (APFD)	Average percentage of faults detected by a prioritized test suite (Rothermel et al. 2001)	Yes	(Thomas et al. 2014)
Completeness	Extent to which all members of a given gold standard label set are assigned to the same cluster (Rosenberg and Moonen 2018)	Yes	(Rosenberg and Moonen 2018)
Homogeneity	Extent to which members of a proposed word cluster come from the same gold standard label set (Rosenberg and Moonen 2018)	Yes	(Rosenberg and Moonen 2018)
Effectiveness	Number of methods that must be investigated before the first method relevant to a feature is located (Poshyvanyk et al. 2007)	Yes	(Biggers et al. 2014; Poshyvanyk et al. 2012)
Effort reduction	Ratio between created clusters and clustered documents (log files) as a measure for the the reduced effort by analyzing clusters of log files rather than individual log files (Rosenberg and Moonen 2018)	Yes	(Rosenberg and Moonen 2018)
Precision	Fraction of documents retrieved that are relevant to the user’s information need (total number of documents retrieved that are relevant divided by the total number of documents that are retrieved) (Zeugmann et al. 2011)	No	(Silva et al. 2016; Murali et al. 2017; Cao et al. 2017; Zhang et al. 2016; Demissie et al. 2020; Blasco et al. 2020; Poshyvanyk et al. 2012)
Average Precision	Average precision value for a recalled value (Zhang and Zhang 2009)	No	(Liu et al. 2020)
Mean Average Precision (MAP)	Average of the aggregated average precision (Beitzel et al. 2009)	No	(Abdellatif et al. 2019; Rao and Kak 2011)
Maximum possible precision gain (MPG)	Precision of the best possible scenarios (e.g., in a tree of concepts, the user should navigate the shortest path between the root and the node with the relevant concept) that might be obtained with a technique (Poshyvanyk et al. 2012)	Yes	(Poshyvanyk et al. 2012)
Recall	Fraction of relevant documents that are successfully retrieved (total number of documents retrieved that are relevant divided by the total number of relevant documents in the corpus) (Zeugmann et al. 2011)	No	(Silva et al. 2016; Murali et al. 2017; Cao et al. 2017; Zhang et al. 2016; Demissie et al. 2020; Blasco et al. 2020; Poshyvanyk et al. 2012)
Recall @k	Fraction of relevant documents that are successfully retrieved in top k results (Yan et al. 2016b)	No	(Yan et al. 2016b)
F-measure	Weighted harmonic mean of precision and recall (Brank et al. 2011)	No	(Silva et al. 2016; Cao et al. 2017; Zhang et al. 2016; Blasco et al. 2020)
Mann-Whitney-Wilcoxon test	Non-parametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X (Mann and Whitney 1947)	No	(Thomas et al. 2014)
Mean Reciprocal Rank (MRR)	Reciprocal rank is calculated using precision @k: given a rank k, precision @k is the precision calculated over the set of retrieved documents with a rank of k. Thus, MRR is the average of the reciprocal rank of a set of queries. The set of queries refer to a list of documents of interest that may be found in the ranked list of retrieved documents) (Craswell 2009)	No	(Binkley et al. 2015; Zhang et al. 2016)
Minimal browsing area (MBA)	Shortest path between root node from a tree of concepts and the node containing the relevant results of a search in such tree (Poshyvanyk et al. 2012)	No	(Poshyvanyk et al. 2012)
Hit ratio	When recommending software functionalities (e.g., features for mobile apps), evaluates how many functionalities can be successfully recommended based on a list of hit functionalities (Hariri et al. 2013)	Yes	(Jiang et al. 2019)
Actual assignee hit ratio	In the context of bug assignment to developers (referred as assignees), evaluates how much the list of recommended assignees contains the actual assignee (Naguib et al. 2013)	Yes	(Naguib et al. 2013)
Top-k hit	In the context of bug assignment to developers (referred as assignees), measures if the ranked list of recommended assignees contains any assignee who has performed either assigning, reviewing, or resolving a bug report (Naguib et al. 2013)	Yes	(Naguib et al. 2013)
Normalized Discounted Cumulative Gain (NDCG)	Quality of Top-k Accuracy ranking (Croft and Metzler 2010)	No	(Jiang et al. 2019; Chen et al. 2014)
SCORE	Ranking-based metric that calculates the proportion of bugs versus the proportion of the code that must be examined for the localization of the bugs (Jones and Harrold 2005)	Yes	(Rao and Kak 2011)
Perplexity	Measure of performance for statistical models of natural language, which indicates the uncertainty in predicting a single word (Blei et al. 2003b)	No	(Yan et al. 2016b)
Purity	Extent to which clusters (from a clustering technique) contain a single label (Manning et al. 2008)	No	(Cao et al. 2017)
Term Entropy	Measure of uncertainty associated with a random variable (Shannon 1948). Studies calculated entropy for distribution of terms in documents. A document with lower entropy indicates that it has few dominant terms, while a document with higher entropy presents more dominant terms	No	(De Lucia et al. 2014; Cao et al. 2017)
Top-k Accuracy	Percentage of bug reports in which at least one relevant source code entity was returned in the top k results (e.g., a top-10 accuracy value of 0.15 indicates that for 15% of the bug reports at least one relevant source code entity was returned in the top 10 results) (Nguyen et al. 2011)	No	(Thomas et al. 2013; Tantithamthavorn et al. 2018; Abdellatif et al. 2019; Xia et al. 2017b)

Topic modeling in software engineering research

Abstract

Similar content being viewed by others

Semantic topic models for source code analysis

A survey on the use of topic models when mining software repositories

Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model

1 Introduction

2 Topic Modeling

2.1 Data Input

2.2 Modeling

2.3 Output

3 Related Work

3.1 Previous Literature Reviews

3.2 Meta-studies on Topic Modeling

4 Research Method

4.1 Search Procedure

4.2 Study Selection Criteria

4.3 Data Extraction and Synthesis

5 Results

5.1 Overview

5.2 RQ1: Topic Models Used

5.2.1 Topic Modeling Techniques

5.2.2 Supported Tasks

5.2.3 Types of Contribution

5.3 RQ2: Topic Model Inputs

5.3.1 Types of Data

5.3.2 Documents

5.3.3 Model Parameters

5.4 RQ3: Pre-processing Steps

5.5 RQ4: Topic Naming

6 Discussion

6.1 RQ1: Topic Modeling Techniques

6.1.1 Summary of Findings

6.1.2 Comparative Studies

6.2 RQ2: Inputs to Topic Models

6.2.1 Summary of Findings

6.2.2 Documents and Parameters for Topic Models

6.2.3 Supported Tasks, Types of Data and Types of Contribution

6.3 RQ3: Data Pre-processing

6.3.1 Summary of Findings

6.3.2 Pre-processing Different Types of Data

6.4 RQ4: Assigning Names to Topics

6.5 Implications

6.6 Threats to Validity

7 Conclusions

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of Interests

Additional information

Publisher’s note

Appendix A

Appendix A

1.1 A.1 Papers Reviewed

1.2 A.2 Metrics Used in Comparative Studies

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation