A Temperature-Modified Dynamic Embedded Topic Model

Kumar, Amit; Esmaili, Nazanin; Piccardi, Massimo

doi:10.1007/978-981-19-8746-5_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1741)

Included in the following conference series:

Australasian Conference on Data Mining


Abstract

Topic models are natural language processing models that can parse large collections of documents and automatically discover their main topics. However, conventional topic models fail to capture how such topics change as the collections evolve. To address this, various researchers have proposed dynamic versions that extract sequences of topics from timestamped document collections. Moreover, a recently proposed model, the dynamic embedded topic model (DETM), combines such a dynamic analysis with the representational power of word and topic embeddings. In this paper, we propose modifying its word probabilities with a temperature parameter that controls the smoothness/sharpness trade-off of the distributions, in an attempt to increase the coherence of the extracted topics. Experimental results over a selection of the COVID-19 Open Research Dataset (CORD-19), the United Nations General Debate Corpus, and the ACL Title and Abstract dataset show that the proposed model, nicknamed DETM-tau after the temperature parameter, has improved both perplexity and topic coherence on all datasets.
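The abstract does not state exactly where the temperature enters the model. As an illustration only, assuming DETM-tau divides the DETM topic-word logits by τ before the softmax (that is, β = softmax(ρᵀα / τ), with ρ the word embeddings and α a topic embedding, following the DETM notation), the sketch below shows how τ < 1 sharpens and τ > 1 smooths a word distribution. The helper names and the toy embeddings are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def tempered_softmax(logits: Sequence[float], tau: float) -> List[float]:
    """Softmax of logits / tau (assumed form): tau < 1 sharpens, tau > 1 smooths."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [z / tau for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_logits(rho: Sequence[Sequence[float]], alpha: Sequence[float]) -> List[float]:
    """Hypothetical helper: inner product rho_v . alpha for every vocabulary word v."""
    return [sum(r * a for r, a in zip(row, alpha)) for row in rho]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; a lower value means a sharper distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Toy example: 3-word vocabulary with made-up 2-dimensional embeddings.
rho = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
alpha = [2.0, 0.0]
logits = topic_word_logits(rho, alpha)  # [2.0, 1.0, 0.0]

for tau in (0.5, 1.0, 2.0):
    beta = tempered_softmax(logits, tau)
    print(f"tau={tau}: beta={[round(b, 3) for b in beta]}, entropy={entropy(beta):.3f}")
```

Because dividing all logits by the same positive τ preserves their ordering, a topic's most probable words stay the same under this assumed form; only how much probability mass concentrates on them changes, and τ = 1 recovers the plain softmax.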

Supported by funding from Food Agility CRC Ltd, funded under the Commonwealth Government CRC Program. The CRC Program supports industry-led collaborations between industry, researchers and the community.


Notes

1.
Otherwise known as the multinomial distribution. The recent literature on variational inference seems to prefer the term "categorical distribution".
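To make the terminology concrete: a categorical distribution over V outcomes coincides with a multinomial distribution with a single trial. The plain-Python sketch below (function names are hypothetical) checks that the multinomial probability of a one-hot count vector equals the categorical probability of the corresponding outcome.

```python
import math
from typing import Sequence


def categorical_pmf(k: int, p: Sequence[float]) -> float:
    """Probability of outcome k under a categorical distribution p."""
    return p[k]


def multinomial_pmf(counts: Sequence[int], p: Sequence[float]) -> float:
    """Probability of a count vector under a multinomial with n = sum(counts) trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    prob = 1.0
    for c, pk in zip(counts, p):
        prob *= pk ** c
    return coef * prob


p = [0.2, 0.5, 0.3]  # made-up word probabilities over a 3-word vocabulary
for k in range(len(p)):
    one_hot = [1 if i == k else 0 for i in range(len(p))]
    print(k, categorical_pmf(k, p), multinomial_pmf(one_hot, p))
```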

References

Alvarez-Melis, D., Saveski, M.: Topic modeling in Twitter: aggregating tweets by conversations. In: The 10th International Conference on Web and Social Media, pp. 519–522 (2016)
Arnold, C., El-Saden, S., Bui, A., Taira, R.: Clinical case-based retrieval using latent topic analysis. In: AMIA Annual Symposium Proceedings, vol. 2010, pp. 26–30 (2010)
Baturo, A., Dasandi, N., Mikhaylov, S.J.: Understanding state preferences with text as data: introducing the UN General Debate Corpus. Research & Politics (2017)
Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Bird, S., et al.: The ACL anthology reference corpus: a reference dataset for bibliographic research in computational linguistics. In: International Conference on Language Resources and Evaluation, pp. 1755–1759 (2008)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Cecchini, M., Aytug, H., Koehler, G.J., Pathak, P.: Making words work: using financial text as a predictor of financial events. Decis. Support Syst. 50(1), 164–175 (2010)
Devyatkin, D., Nechaeva, E., Suvorov, R., Tikhomirov, I.: Mapping the research landscape of agricultural sciences. Foresight STI Govern. 12(1), 57–76 (2018)
Dieng, A.B., Ruiz, F.J.R., Blei, D.M.: The dynamic embedded topic model (2019)
Dieng, A.B., Ruiz, F.J.R., Blei, D.M.: Topic modeling in embedding spaces. Trans. Assoc. Comput. Linguist. 8, 439–453 (2020)
Kim, H., Drake, B., Endert, A., Park, H.: ArchiText: interactive hierarchical topic modeling. IEEE Trans. Vis. Comput. Graphics 27(9), 3644–3655 (2021)
Lafferty, J.D., Blei, D.M.: The dynamic topic model. In: The 23rd International Conference on Machine Learning, pp. 113–120 (2006)
Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: The 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 530–539 (2014)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: The 31st International Conference on Machine Learning, vol. 32, pp. 1188–1196 (2014)
Liu, T., Zhang, N.L., Chen, P.: Hierarchical latent tree analysis for topic detection. CoRR, vol. 8725, pp. 256–272 (2014)
Minsky, M.: Steps toward artificial intelligence. Proc. IRE 49(1), 8–30 (1961)
Nguyen, T.H., Shirai, K.: Topic modeling based sentiment analysis on social media for stock market prediction. In: The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), pp. 1354–1364 (2015)
Peng, M., et al.: Neural sparse topical coding. In: The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pp. 2332–2340 (2018)
Platt, J.C.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press (1999)
Rodrigues, F., Lourenco, M., Ribeiro, B., Pereira, F.C.: Learning supervised topic models for classification and regression from crowds. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2409–2422 (2017)
Sarioglu, E., Choi, H.-A., Yadav, K.: Clinical report classification using natural language processing and topic modeling. In: The 11th International Conference on Machine Learning and Applications, vol. 2, pp. 204–209 (2012)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press (2018)
Wang, L.L., et al.: CORD-19: the COVID-19 open research dataset. In: 1st Workshop on NLP for COVID-19 at ACL 2020, vol. 1, pp. 1–12 (2020)
Xu, G., Meng, Y., Chen, Z., Qiu, X., Wang, C., Yao, H.: Research on topic detection and tracking for online news texts. IEEE Access 7, 58407–58418 (2019)
Zhang, A., Zhu, J., Zhang, B.: Sparse online topic models. In: The 22nd International World Wide Web Conference (WWW 2013), pp. 1489–1500 (2013)
Zhang, R., Pakhomov, S., Gladding, S., Aylward, M., Borman-Shoap, E., Melton, G.: Automated assessment of medical training evaluation text. In: AMIA Annual Symposium Proceedings, vol. 2012, pp. 1459–1468 (2012)
Zhu, J., Xing, E.P.: Sparse topical coding. In: The 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pp. 831–838 (2011)


Author information

Authors and Affiliations

University of Technology Sydney, Broadway, Sydney, NSW, 2007, Australia
Amit Kumar, Nazanin Esmaili & Massimo Piccardi
Food Agility CRC Ltd., Pitt St., Sydney, NSW, 2000, Australia
Amit Kumar


Corresponding author

Correspondence to Amit Kumar.

Editor information

Editors and Affiliations

Western Sydney University, Sydney, NSW, Australia
Laurence A. F. Park
Victoria University of Wellington, Wellington, New Zealand
Heitor Murilo Gomes
Auckland University of Technology, Auckland, New Zealand
Maryam Doborjeh
RMIT University, Melbourne, VIC, Australia
Yee Ling Boo
University of Auckland, Auckland, New Zealand
Yun Sing Koh
CSIRO Scientific Computing, Canberra, ACT, Australia
Yanchang Zhao
Australian National University, Canberra, ACT, Australia
Graham Williams
Western Sydney University, Sydney, NSW, Australia
Simeon Simoff


About this paper

Cite this paper

Kumar, A., Esmaili, N., Piccardi, M. (2022). A Temperature-Modified Dynamic Embedded Topic Model. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_2


DOI: https://doi.org/10.1007/978-981-19-8746-5_2
Published: 05 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8745-8
Online ISBN: 978-981-19-8746-5
eBook Packages: Computer Science, Computer Science (R0)
