Obtaining More Specific Topics and Detecting Weak Signals by Topic Word Selection

Kölbl, Laura; Grottke, Michael

doi:10.1007/978-3-030-43412-0_12

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

With topic modeling methods such as Latent Dirichlet Allocation (LDA), we can find topics in large text collections. To employ this information efficiently, we need a method that automatically analyzes the topics with respect to their usefulness for applications such as the detection of innovations. This paper presents a novel method to automatically evaluate topics produced by LDA. The new approach focuses on finding topics whose topic words are not only coherent but also specific. By using the documents associated with each word to calculate background topics, a baseline can be set for each topic word that helps assess whether its context fits the topic well. Experiments indicate that the resulting topics are easier to interpret. Moreover, we show that the approach can be used to detect weak signals.
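
The abstract does not spell out how the background topics or the baseline are computed, so the following is only a rough sketch of the general idea, not the chapter's method. It assumes, purely for illustration, that the "background" of a topic word can be approximated by the word-frequency distribution of the documents containing that word, and that a word fits its topic when the other topic words are more frequent in that background than in the whole corpus. The function names, the toy corpus, and the threshold `min_fit` are all hypothetical.

```python
from collections import Counter


def word_distribution(docs):
    """Relative word frequencies over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def context_fit(word, topic_words, docs):
    """Ratio of the other topic words' probability mass in the documents
    containing `word` (a stand-in for a background topic) to their mass
    in the whole corpus (the baseline). This ratio is an assumption made
    for illustration only."""
    others = [w for w in topic_words if w != word]
    background = word_distribution([d for d in docs if word in d])
    baseline = word_distribution(docs)
    mass_background = sum(background.get(w, 0.0) for w in others)
    mass_baseline = sum(baseline.get(w, 0.0) for w in others)
    return mass_background / mass_baseline if mass_baseline else 0.0


def select_topic_words(topic_words, docs, min_fit=1.0):
    """Keep only topic words whose context fits the topic better than the baseline."""
    return [w for w in topic_words if context_fit(w, topic_words, docs) > min_fit]


# Toy corpus: three documents about 3D printing, three about markets.
docs = [
    ["printer", "filament", "nozzle"],
    ["printer", "filament", "resin"],
    ["nozzle", "filament", "printer"],
    ["market", "price", "new"],
    ["new", "price", "printer"],
    ["new", "market", "stock"],
]
topic = ["printer", "filament", "nozzle", "new"]

selected = select_topic_words(topic, docs)
print(selected)  # -> ['printer', 'filament', 'nozzle']
```

In this toy example the generic word "new" is dropped because the documents it appears in rarely contain the other topic words, which mirrors the abstract's goal of keeping topic words that are specific as well as coherent.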

Notes

1. Publicly available at https://www.webhose.io/datasets; retrieved on August 27th, 2018.
2. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.
3. In all our experiments we used common preprocessing techniques such as stemming and custom stopword lists to enhance the topic quality.
4. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.
5. www.gartner.com/doc/2803426/hype-cycle-d-printing; www.gartner.com/doc/3100228/hype-cycle-d-printing; www.gartner.com/doc/3383717/hype-cycle-d-printing; viewed on February 15th, 2019.

Acknowledgements

This research and development project is funded by the German Federal Ministry of Education and Research (BMBF) within the Program Concept "Innovations for Tomorrow's Production, Services, and Work" (02K16C190) and managed by the Project Management Agency Forschungszentrum Karlsruhe (PTKA). The authors are responsible for the contents of this publication.

Author information

Authors and Affiliations

Friedrich-Alexander-Universität Erlangen-Nürnberg, Nürnberg, Germany
Laura Kölbl & Michael Grottke
GfK SE, Global Data Science, Nürnberg, Germany
Michael Grottke

Corresponding author

Correspondence to Laura Kölbl.

Editor information

Editors and Affiliations

Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ, USA
Hoang Pham

About this chapter

Cite this chapter

Kölbl, L., Grottke, M. (2020). Obtaining More Specific Topics and Detecting Weak Signals by Topic Word Selection. In: Pham, H. (eds) Reliability and Statistical Computing. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-43412-0_12

DOI: https://doi.org/10.1007/978-3-030-43412-0_12
Published: 29 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43411-3
Online ISBN: 978-3-030-43412-0
eBook Packages: Engineering, Engineering (R0)
