Abstract
Hierarchical topic modeling has been widely used to mine the latent topic hierarchy of documents. However, most such models are limited to a one-shot scenario because they do not use previously identified topic information to guide the subsequent mining of topics. By storing and exploiting previously acquired knowledge, we propose a lifelong hierarchical topic model based on Non-negative Matrix Factorization (NMF) that boosts topic quality over a text stream. In particular, we construct a knowledge graph from the accumulated topic hierarchy information and use this graph to guide the training of our model on future documents. Moreover, the structural information in the knowledge graph is completed via supervised learning. Experiments on real-world corpora validate the effectiveness of our approach under lifelong learning paradigms.
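The abstract builds on the standard NMF decomposition of a document-term matrix into document-topic and topic-word factors. As background only, the sketch below shows that plain NMF step with Lee and Seung's multiplicative updates; the function name nmf_topics, the toy matrix, and all hyperparameters are illustrative assumptions, and the paper's knowledge-graph guidance, hierarchy construction, and lifelong updates are not reproduced here.

```python
# Minimal background sketch of NMF topic factorization (NOT the paper's
# full lifelong hierarchical model): factorize a document-term matrix V
# into non-negative document-topic weights W and topic-word weights H
# using the classic multiplicative updates of Lee & Seung.
import numpy as np

def nmf_topics(V, n_topics=10, n_iter=200, eps=1e-10, seed=0):
    """Return (W, H) with V (docs x terms) ~= W @ H, all entries >= 0."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, n_topics))
    H = rng.random((n_topics, n_terms))
    for _ in range(n_iter):
        # Multiplicative updates minimizing ||V - W H||_F^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

if __name__ == "__main__":
    # Toy bag-of-words matrix: 5 documents over an 8-term vocabulary.
    V = np.random.default_rng(1).integers(0, 5, size=(5, 8)).astype(float)
    W, H = nmf_topics(V, n_topics=3)
    # Top-3 term indices per topic, read off the rows of H.
    print(np.argsort(-H, axis=1)[:, :3])
```

A lifelong variant would additionally constrain W and H with knowledge accumulated from earlier batches of the stream, which is the role the paper assigns to its knowledge graph.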
Notes
- 1.
Note that several dummy root topics were used in the original CluHTM. However, we experimentally observed that such a model concentrates heavily on only a few topics, especially for a relatively small corpus. To achieve a reasonable topic structure under lifelong learning paradigms, we discard these dummy root topics.
References
Ahmed, A., Hong, L., Smola, A.: Nested Chinese restaurant franchise process: applications to user tracking and document modeling. In: ICML, pp. 1426–1434 (2013)
Alvarez-Melis, D., Jaakkola, T.S.: Tree-structured decoding with doubly-recurrent neural networks. In: ICLR (2017)
Andrieu, C., De Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Mach. Learn. 50(1), 5–43 (2003)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Card, D., Tan, C., Smith, N.A.: Neural models for documents with metadata. In: ACL, pp. 2031–2040 (2018)
Carlson, A., Betteridge, J., Wang, R.C., Hruschka Jr, E.R., Mitchell, T.M.: Coupled semi-supervised learning for information extraction. In: WSDM, pp. 101–110 (2010)
Chen, X.: Learning with sparsity: Structures, optimization and applications. Ph.D. thesis, Carnegie Mellon University (2013)
Chen, Y., Wu, J., Lin, J., Liu, R., Zhang, H., Ye, Z.: Affinity regularized non-negative matrix factorization for lifelong topic modeling. IEEE Trans. Knowl. Data Eng. 32(7), 1249–1262 (2020)
Chen, Y., Zhang, H., Wu, J., Wang, X., Liu, R., Lin, M.: Modeling emerging, evolving and fading topics using dynamic soft orthogonal NMF with sparse representation. In: ICDM, pp. 61–70 (2015)
Chen, Z., Liu, B.: Topic modeling using topics from many domains, lifelong learning and big data. In: ICML, vol. 32, pp. 703–711 (2014)
Chen, Z., Ma, N., Liu, B.: Lifelong learning for sentiment classification. arXiv preprint arXiv:1801.02808 (2018)
Chen, Z., Ding, C., Zhang, Z., Rao, Y., Xie, H.: Tree-structured topic modeling with nonparametric neural variational inference. In: ACL/IJCNLP, pp. 2343–2353 (2021)
Choo, J., Lee, C., Reddy, C.K., Park, H.: Weakly supervised nonnegative matrix factorization for user-driven clustering. Data Min. Knowl. Discov. 29(6), 1598–1621 (2015)
Dai, L., Zhu, R., Wang, J.: Joint nonnegative matrix factorization based on sparse and graph Laplacian regularization for clustering and co-differential expression genes analysis. Complex. 2020, 3917812:1–3917812:10 (2020)
Duan, Z., et al.: Sawtooth factorial topic embeddings guided gamma belief network. In: ICML, pp. 2903–2913 (2021)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS, vol. 15, pp. 315–323 (2011)
Greene, D., O’Callaghan, D., Cunningham, P.: How many topics? Stability analysis for topic models. In: ECML/PKDD, vol. 8724, pp. 498–513 (2014)
Griffiths, T., Jordan, M., Tenenbaum, J., Blei, D.: Hierarchical topic models and the nested Chinese restaurant process. In: NIPS, vol. 16, pp. 17–24 (2003)
Gupta, P., Chaudhary, Y., Runkler, T.A., Schütze, H.: Neural topic modeling with continual lifelong learning. In: ICML, vol. 119, pp. 3907–3917 (2020)
Isonuma, M., Mori, J., Bollegala, D., Sakata, I.: Tree-structured neural topic model. In: ACL, pp. 800–806 (2020)
Kim, J.H., Kim, D., Kim, S., Oh, A.: Modeling topic hierarchies with the recursive Chinese restaurant process. In: CIKM, pp. 783–792 (2012)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. CoRR abs/1612.00796 (2016)
Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: EACL, pp. 530–539 (2014)
Lee, D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: NIPS, vol. 13, pp. 556–562 (2000)
Lin, T., Hu, Z., Guo, X.: Sparsemax and relaxed Wasserstein for topic sparsity. In: WSDM, pp. 141–149 (2019)
Liu, R., Wang, X., Wang, D., Zuo, Y., Zhang, H., Zheng, X.: Topic splitting: a hierarchical topic model based on non-negative matrix factorization. J. Syst. Sci. Syst. Eng. 27(4), 479–496 (2018)
Miao, Y., Grefenstette, E., Blunsom, P.: Discovering discrete latent topics with neural variational inference. In: ICML, pp. 2410–2419 (2017)
Mimno, D., Li, W., McCallum, A.: Mixtures of hierarchical topics with pachinko allocation. In: ICML, pp. 633–640 (2007)
Ming, Z.Y., Wang, K., Chua, T.S.: Prototype hierarchy based clustering for the categorization and navigation of web collections. In: SIGIR, pp. 2–9 (2010)
Mitchell, T., et al.: Never-ending learning. Commun. ACM 61(5), 103–115 (2018)
Paisley, J.W., Wang, C., Blei, D.M., Jordan, M.I.: Nested hierarchical Dirichlet processes. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 256–270 (2015)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
Qin, X., Lu, Y., Chen, Y., Rao, Y.: Lifelong learning of topics and domain-specific word embeddings. In: ACL/IJCNLP (Findings), pp. 2294–2309 (2021)
Rohe, K., Qin, T., Yu, B.: Co-clustering directed graphs to discover asymmetries and directional communities. PNAS 113(45), 12679–12684 (2016)
Sethuraman, J.: A constructive definition of Dirichlet priors. Statistica Sinica 639–650 (1994)
Silver, D.L.: Machine lifelong learning: challenges and benefits for artificial general intelligence. In: AGI, vol. 6830, pp. 370–375 (2011)
Tan, C., Card, D., Smith, N.A.: Friendships, rivalries, and trysts: characterizing relations between ideas in texts. In: ACL, pp. 773–783 (2017)
Teh, Y., Jordan, M., Beal, M., Blei, D.: Sharing clusters among related groups: hierarchical Dirichlet processes. In: NIPS, vol. 17, pp. 1385–1392 (2004)
Viegas, F., et al.: CluWords: exploiting semantic word clustering representation for enhanced topic modeling. In: WSDM, pp. 753–761 (2019)
Viegas, F., Cunha, W., Gomes, C., Pereira, A., Rocha, L., Goncalves, M.: CluHTM: semantic hierarchical topic modeling based on CluWords. In: ACL, pp. 8138–8150 (2020)
Wu, J., et al.: Neural mixed counting models for dispersed topic discovery. In: ACL, pp. 6159–6169 (2020)
Xu, Z., Chang, X., Xu, F., Zhang, H.: L1/2 regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Networks Learn. Syst. 23(7), 1013–1027 (2012)
Zhao, H., Phung, D., Huynh, V., Le, T., Buntine, W.L.: Neural topic model via optimal transport. In: ICLR (2021)
Acknowledgements
This work has been supported by the National Natural Science Foundation of China (61972426).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lin, Z., Yan, J., Lei, Z., Rao, Y. (2024). Lifelong Hierarchical Topic Modeling via Non-negative Matrix Factorization. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_11
DOI: https://doi.org/10.1007/978-981-97-2421-5_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science, Computer Science (R0)