Abstract
Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles on topic modeling and have applied it in fields such as software engineering, political science, medicine, and linguistics. Among the various methods for topic modeling, Latent Dirichlet Allocation (LDA) is one of the most popular, and researchers have proposed many models based on it. Building on this previous work, this paper aims to provide a useful introduction to LDA-based approaches in topic modeling. We investigate scholarly articles published between 2003 and 2016 on LDA-based topic modeling to discover the research development, current trends, and intellectual structure of the field. In addition, we summarize open challenges and introduce well-known tools and datasets for LDA-based topic modeling.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61170035, 61272420, 81674099, 61502233), the Fundamental Research Fund for the Central Universities (30916011328, 30918015103), and the Nanjing Science and Technology Development Plan Project (201805036).
Cite this article
Jelodar, H., Wang, Y., Yuan, C. et al. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl 78, 15169–15211 (2019). https://doi.org/10.1007/s11042-018-6894-4
DOI: https://doi.org/10.1007/s11042-018-6894-4