Lifelong Hierarchical Topic Modeling via Non-negative Matrix Factorization

Lin, Zhicheng; Yan, Jiaxing; Lei, Zhiqi; Rao, Yanghui

doi:10.1007/978-981-97-2421-5_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14334)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Hierarchical topic modeling has been widely used to mine the latent topic hierarchy of documents. However, most such models are limited to a one-shot scenario because they do not use previously identified topic information to guide the subsequent mining of topics. To store and exploit such prior knowledge, we propose a lifelong hierarchical topic model based on Non-negative Matrix Factorization (NMF) that improves topic quality over a text stream. In particular, we construct a knowledge graph from the accumulated topic hierarchy information and use it to guide the training of our model on future documents. Moreover, the structural information in the knowledge graph is completed by supervised learning. Experiments on real-world corpora validate the effectiveness of our approach under lifelong learning paradigms.
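The abstract does not give the model's objective, so the Python/NumPy sketch below only illustrates the general idea of letting accumulated topic knowledge guide a new NMF run. It assumes a hypothetical word-level knowledge graph (an edge between two words whenever they appear together among the top words of a previously learned topic) and swaps in a standard graph-regularized NMF penalty, lam * tr(H L H^T) with graph Laplacian L = D - A, as the guidance term. The names `build_word_graph`, `graph_regularized_nmf`, `top_n`, and `lam` are illustrative assumptions, not the authors' method, and the supervised completion of the knowledge graph is omitted.

```python
import itertools

import numpy as np


def build_word_graph(past_topics, vocab_size, top_n=5):
    """Hypothetical knowledge graph: link every pair of words that appear
    together among the top_n words of a previously learned topic (from any
    level of the accumulated hierarchy); edge weights count co-occurrences."""
    A = np.zeros((vocab_size, vocab_size))
    for topic in past_topics:  # topic: length-V vector of topic-word weights
        top = np.argsort(topic)[::-1][:top_n]
        for i, j in itertools.combinations(top, 2):
            A[i, j] += 1.0
            A[j, i] += 1.0
    return A


def graph_regularized_nmf(X, k, A, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Factorize X (docs x words) as W @ H with W, H >= 0, minimizing
    ||X - WH||_F^2 + lam * tr(H L H^T) with L = D - A, which pushes words
    linked in the graph toward similar weights within each topic."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    D = np.diag(A.sum(axis=1))
    L = D - A
    history = []
    for _ in range(n_iter):
        # Multiplicative update for the document-topic matrix (Lee & Seung).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Multiplicative update for the topic-word matrix; the lam terms
        # come from splitting the Laplacian penalty into its A and D parts.
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
        loss = np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(H @ L @ H.T)
        history.append(loss)
    return W, H, history


# Toy run: 20 documents over a 12-word vocabulary, 3 "past" topics.
rng = np.random.default_rng(1)
X = rng.random((20, 12))
past_topics = [rng.random(12) for _ in range(3)]
A = build_word_graph(past_topics, vocab_size=12)
W, H, history = graph_regularized_nmf(X, k=3, A=A)
print(history[0], history[-1])
```

The multiplicative form keeps W and H non-negative without projection, and the penalty only couples words that the past topics linked, so a new corpus can still move topics elsewhere when the data disagree with prior knowledge.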

Notes

1.
Note that the original CluHTM used several dummy root topics. However, we observed experimentally that such a model tended to concentrate on a few topics, especially for relatively small corpora. To obtain a reasonable topic structure in lifelong learning paradigms, we discard these dummy root topics.
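As a rough illustration of a hierarchy whose first level consists of real topics rather than dummy roots, the Python/NumPy sketch below builds a two-level tree by recursive NMF: level-1 topics are factorized directly from the whole corpus, and each is then split by factorizing only the documents assigned to it. This generic topic-splitting scheme is an assumption for illustration, not the CluHTM procedure or the authors' model; `two_level_hierarchy`, `k_top`, `k_sub`, and `min_docs` are hypothetical names, and the hard argmax assignment of documents is a simplifying choice.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF, X ~ W @ H, with Lee & Seung multiplicative updates
    for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H


def two_level_hierarchy(X, k_top, k_sub, min_docs=2):
    """Hypothetical two-level topic tree without dummy roots: level-1 topics
    are learned directly from the whole corpus, and each is split by running
    NMF again on the documents assigned to it."""
    W, H = nmf(X, k_top)
    assignment = W.argmax(axis=1)  # hard-assign each document to one topic
    tree = []
    for t in range(k_top):
        docs = np.flatnonzero(assignment == t)
        children = None
        if len(docs) >= min_docs:
            _, children = nmf(X[docs], min(k_sub, len(docs)))
        tree.append({"topic": H[t], "children": children})
    return tree


rng = np.random.default_rng(0)
X = rng.random((30, 10))  # 30 documents, 10-word vocabulary
tree = two_level_hierarchy(X, k_top=3, k_sub=2)
for t, node in enumerate(tree):
    n_children = 0 if node["children"] is None else node["children"].shape[0]
    print(f"topic {t}: top word {node['topic'].argmax()}, {n_children} subtopics")
```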

References

Ahmed, A., Hong, L., Smola, A.: Nested Chinese restaurant franchise process: applications to user tracking and document modeling. In: ICML, pp. 1426–1434 (2013)
Alvarez-Melis, D., Jaakkola, T.S.: Tree-structured decoding with doubly-recurrent neural networks. In: ICLR (2017)
Andrieu, C., De Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Mach. Learn. 50(1), 5–43 (2003)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Card, D., Tan, C., Smith, N.A.: Neural models for documents with metadata. In: ACL, pp. 2031–2040 (2018)
Carlson, A., Betteridge, J., Wang, R.C., Hruschka Jr, E.R., Mitchell, T.M.: Coupled semi-supervised learning for information extraction. In: WSDM, pp. 101–110 (2010)
Chen, X.: Learning with sparsity: Structures, optimization and applications. Ph.D. thesis, Carnegie Mellon University (2013)
Chen, Y., Wu, J., Lin, J., Liu, R., Zhang, H., Ye, Z.: Affinity regularized non-negative matrix factorization for lifelong topic modeling. IEEE Trans. Knowl. Data Eng. 32(7), 1249–1262 (2020)
Chen, Y., Zhang, H., Wu, J., Wang, X., Liu, R., Lin, M.: Modeling emerging, evolving and fading topics using dynamic soft orthogonal NMF with sparse representation. In: ICDM, pp. 61–70 (2015)
Chen, Z., Liu, B.: Topic modeling using topics from many domains, lifelong learning and big data. In: ICML, vol. 32, pp. 703–711 (2014)
Chen, Z., Ma, N., Liu, B.: Lifelong learning for sentiment classification. arXiv preprint arXiv:1801.02808 (2018)
Chen, Z., Ding, C., Zhang, Z., Rao, Y., Xie, H.: Tree-structured topic modeling with nonparametric neural variational inference. In: ACL/IJCNLP, pp. 2343–2353 (2021)
Choo, J., Lee, C., Reddy, C.K., Park, H.: Weakly supervised nonnegative matrix factorization for user-driven clustering. Data Min. Knowl. Discov. 29(6), 1598–1621 (2015)
Dai, L., Zhu, R., Wang, J.: Joint nonnegative matrix factorization based on sparse and graph Laplacian regularization for clustering and co-differential expression genes analysis. Complex. 2020, 3917812:1–3917812:10 (2020)
Duan, Z., et al.: Sawtooth factorial topic embeddings guided gamma belief network. In: ICML, pp. 2903–2913 (2021)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS, vol. 15, pp. 315–323 (2011)
Greene, D., O’Callaghan, D., Cunningham, P.: How many topics? Stability analysis for topic models. In: ECML/PKDD, vol. 8724, pp. 498–513 (2014)
Griffiths, T., Jordan, M., Tenenbaum, J., Blei, D.: Hierarchical topic models and the nested Chinese restaurant process. In: NIPS, vol. 16, pp. 17–24 (2003)
Gupta, P., Chaudhary, Y., Runkler, T.A., Schütze, H.: Neural topic modeling with continual lifelong learning. In: ICML, vol. 119, pp. 3907–3917 (2020)
Isonuma, M., Mori, J., Bollegala, D., Sakata, I.: Tree-structured neural topic model. In: ACL, pp. 800–806 (2020)
Kim, J.H., Kim, D., Kim, S., Oh, A.: Modeling topic hierarchies with the recursive Chinese restaurant process. In: CIKM, pp. 783–792 (2012)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. CoRR abs/1612.00796 (2016)
Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: EACL, pp. 530–539 (2014)
Lee, D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: NIPS, vol. 13, pp. 556–562 (2000)
Lin, T., Hu, Z., Guo, X.: Sparsemax and relaxed Wasserstein for topic sparsity. In: WSDM, pp. 141–149 (2019)
Liu, R., Wang, X., Wang, D., Zuo, Y., Zhang, H., Zheng, X.: Topic splitting: a hierarchical topic model based on non-negative matrix factorization. J. Syst. Sci. Syst. Eng. 27(4), 479–496 (2018)
Miao, Y., Grefenstette, E., Blunsom, P.: Discovering discrete latent topics with neural variational inference. In: ICML, pp. 2410–2419 (2017)
Mimno, D., Li, W., McCallum, A.: Mixtures of hierarchical topics with pachinko allocation. In: ICML, pp. 633–640 (2007)
Ming, Z.Y., Wang, K., Chua, T.S.: Prototype hierarchy based clustering for the categorization and navigation of web collections. In: SIGIR, pp. 2–9 (2010)
Mitchell, T., et al.: Never-ending learning. Commun. ACM 61(5), 103–115 (2018)
Paisley, J.W., Wang, C., Blei, D.M., Jordan, M.I.: Nested hierarchical Dirichlet processes. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 256–270 (2015)
Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
Qin, X., Lu, Y., Chen, Y., Rao, Y.: Lifelong learning of topics and domain-specific word embeddings. In: ACL/IJCNLP (Findings), pp. 2294–2309 (2021)
Rohe, K., Qin, T., Yu, B.: Co-clustering directed graphs to discover asymmetries and directional communities. PNAS 113(45), 12679–12684 (2016)
Sethuraman, J.: A constructive definition of Dirichlet priors. Statistica Sinica, 639–650 (1994)
Silver, D.L.: Machine lifelong learning: challenges and benefits for artificial general intelligence. In: AGI, vol. 6830, pp. 370–375 (2011)
Tan, C., Card, D., Smith, N.A.: Friendships, rivalries, and trysts: characterizing relations between ideas in texts. In: ACL, pp. 773–783 (2017)
Teh, Y., Jordan, M., Beal, M., Blei, D.: Sharing clusters among related groups: hierarchical Dirichlet processes. In: NIPS, vol. 17, pp. 1385–1392 (2004)
Viegas, F., et al.: CluWords: exploiting semantic word clustering representation for enhanced topic modeling. In: WSDM, pp. 753–761 (2019)
Viegas, F., Cunha, W., Gomes, C., Pereira, A., Rocha, L., Goncalves, M.: CluHTM - semantic hierarchical topic modeling based on CluWords. In: ACL, pp. 8138–8150 (2020)
Wu, J., et al.: Neural mixed counting models for dispersed topic discovery. In: ACL, pp. 6159–6169 (2020)
Xu, Z., Chang, X., Xu, F., Zhang, H.: \(L_{1/2}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Networks Learn. Syst. 23(7), 1013–1027 (2012)
Zhao, H., Phung, D., Huynh, V., Le, T., Buntine, W.L.: Neural topic model via optimal transport. In: ICLR (2021)

Acknowledgements

This work has been supported by the National Natural Science Foundation of China (61972426).

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Zhicheng Lin, Jiaxing Yan, Zhiqi Lei & Yanghui Rao

Corresponding author

Correspondence to Yanghui Rao.

Editor information

Editors and Affiliations

Xiangyu Song, Peng Cheng Laboratory, Shenzhen, China
Ruyi Feng, China University of Geosciences, Wuhan, China
Yunliang Chen, China University of Geosciences, Wuhan, China
Jianxin Li, Deakin University, Burwood, VIC, Australia
Geyong Min, University of Exeter, Exeter, UK

About this paper

Cite this paper

Lin, Z., Yan, J., Lei, Z., Rao, Y. (2024). Lifelong Hierarchical Topic Modeling via Non-negative Matrix Factorization. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_11

DOI: https://doi.org/10.1007/978-981-97-2421-5_11
Published: 12 May 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science, Computer Science (R0)