Abstract
Topic modeling methods such as Latent Dirichlet Allocation (LDA) can find topics in large text collections. To employ this information efficiently, a method is needed that automatically analyzes the topics with respect to their usefulness for applications such as the detection of innovations. This paper presents a novel method to automatically evaluate topics produced by LDA. The approach focuses on finding topics whose topic words are not only coherent but also specific. By using the documents associated with each word to calculate background topics, a baseline can be set for each topic word that helps assess whether its context fits the topic well. Experiments indicate that the resulting topics are more manageable in terms of interpretability. Moreover, we show that the approach can be used to detect weak signals.
Notes
1. Publicly available at https://www.webhose.io/datasets; retrieved on August 27th, 2018.
2. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.
3. In all our experiments we used common preprocessing techniques such as stemming and custom stopword lists to enhance the topic quality.
4. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.
5.
Acknowledgements
This research and development project is funded by the German Federal Ministry of Education and Research (BMBF) within the Program Concept "Innovations for Tomorrow's Production, Services, and Work" (02K16C190) and managed by the Project Management Agency Forschungszentrum Karlsruhe (PTKA). The authors are responsible for the contents of this publication.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kölbl, L., Grottke, M. (2020). Obtaining More Specific Topics and Detecting Weak Signals by Topic Word Selection. In: Pham, H. (ed.) Reliability and Statistical Computing. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-43412-0_12
DOI: https://doi.org/10.1007/978-3-030-43412-0_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43411-3
Online ISBN: 978-3-030-43412-0
eBook Packages: Engineering (R0)