Obtaining More Specific Topics and Detecting Weak Signals by Topic Word Selection

Reliability and Statistical Computing

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

Topic modeling methods such as Latent Dirichlet Allocation (LDA) find topics in large text collections. To use this information efficiently, a method is needed that automatically assesses how useful the topics are for applications such as the detection of innovations. This paper presents a novel method for automatically evaluating topics produced by LDA. The approach focuses on finding topics whose topic words are not only coherent but also specific. By using the documents associated with each word to calculate background topics, a baseline can be set for each topic word that helps assess whether its context fits the topic well. Experiments indicate that the resulting topics are easier to interpret. Moreover, we show that the approach can be used to detect weak signals.
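
The idea of a per-word baseline can be illustrated with a small sketch. It is one loose reading of the abstract, not the chapter's actual procedure: the chapter computes background topics with LDA, whereas this sketch swaps in plain token counts. The function names `share_of` and `context_fit`, the toy corpus, and the scoring rule (topic share in a word's documents minus the corpus-wide share) are all assumptions made for illustration.

```python
from collections import Counter


def share_of(words, docs):
    """Return the fraction of all tokens in `docs` that are in `words`."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return sum(counts[w] for w in words) / total if total else 0.0


def context_fit(corpus, topic_words):
    """Score each topic word by how topic-heavy its documents are.

    Assumption (not from the chapter): a word fits its topic if the
    documents containing it use the *other* topic words more often than
    the corpus as a whole does. The corpus-wide share is the baseline.
    """
    scores = {}
    for word in topic_words:
        others = [w for w in topic_words if w != word]
        # "Documents associated with the word": here, those containing it.
        associated = [doc for doc in corpus if word in doc]
        baseline = share_of(others, corpus)
        scores[word] = share_of(others, associated) - baseline
    return scores


# Hypothetical toy corpus of already tokenized documents.
corpus = [
    ["printer", "filament", "nozzle", "filament"],
    ["filament", "nozzle", "layer"],
    ["printer", "layer", "filament"],
    ["market", "growth", "layer"],
    ["market", "price", "growth"],
    ["price", "market", "layer"],
]
topic = ["printer", "filament", "nozzle", "market"]

scores = context_fit(corpus, topic)
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word}: {score:+.3f}")
```

In this toy corpus, "market" never co-occurs with the other three topic words, so its score falls below the baseline, while the other words score above it. A word flagged this way would be a candidate for removal from the topic's word list under the assumed rule.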

Notes

  1. Publicly available at https://www.webhose.io/datasets; retrieved on August 27th, 2018.

  2. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.

  3. In all our experiments we used common preprocessing techniques such as stemming and custom stopword lists to enhance the topic quality.

  4. All the paper data sets were downloaded from Scopus API between January 4th, 2019 and February 4th, 2019 via http://www.api.elsevier.com and http://www.scopus.com.

  5. www.gartner.com/doc/2803426/hype-cycle-d-printing; www.gartner.com/doc/3100228/hype-cycle-d-printing; www.gartner.com/doc/3383717/hype-cycle-d-printing; viewed on February 15th, 2019.

Acknowledgements

This research and development project is funded by the German Federal Ministry of Education and Research (BMBF) within the Program Concept “Innovations for Tomorrow’s Production, Services, and Work” (02K16C190) and managed by the Project Management Agency Forschungszentrum Karlsruhe (PTKA). The authors are responsible for the contents of this publication.

Author information

Corresponding author

Correspondence to Laura Kölbl.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kölbl, L., Grottke, M. (2020). Obtaining More Specific Topics and Detecting Weak Signals by Topic Word Selection. In: Pham, H. (eds) Reliability and Statistical Computing. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-43412-0_12

  • DOI: https://doi.org/10.1007/978-3-030-43412-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43411-3

  • Online ISBN: 978-3-030-43412-0

  • eBook Packages: Engineering, Engineering (R0)
