Hierarchical neural topic modeling with manifold regularization

Abstract

Topic models are widely used to learn explainable latent representations of documents, but most existing approaches discover topics in a flat structure. In this study, we propose an effective hierarchical neural topic model with strong interpretability. Unlike previous neural topic models, we explicitly model the dependency between layers of the network and combine the latent variables of different layers to reconstruct documents. Using this network structure, our model exploits the dependency matrices to extract a tree-shaped topic hierarchy with low redundancy and good explainability. Furthermore, we introduce manifold regularization into the proposed method to improve the robustness of topic modeling. Experiments on real-world datasets show that our model outperforms other topic models on several widely used metrics at a much lower computational cost.
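The abstract does not spell out the form of the manifold regularizer. As a rough illustration only, the sketch below uses the common graph-Laplacian smoothness penalty, which penalizes differences between the topic proportions of neighboring documents. The k-nearest-neighbor graph over cosine similarity, the function names (`knn_graph`, `manifold_penalty`), and the toy data are all assumptions made for this example and are not taken from the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def knn_graph(bows, k=1):
    """Symmetric 0/1 adjacency matrix (hypothetical construction):
    documents i and j are linked if either is among the other's
    k most similar documents."""
    n = len(bows)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(bows[i], bows[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def manifold_penalty(thetas, adj):
    """0.5 * sum_ij W_ij * ||theta_i - theta_j||^2, where theta_i holds
    the topic proportions of document i and W is the adjacency matrix."""
    n = len(thetas)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                sq_dist = sum((a - b) ** 2 for a, b in zip(thetas[i], thetas[j]))
                total += adj[i][j] * sq_dist
    return 0.5 * total


# Toy corpus: documents 0 and 1 share vocabulary, as do documents 2 and 3.
bows = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
adj = knn_graph(bows, k=1)

# Topic proportions that agree with the neighborhood structure...
smooth = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
# ...and ones that assign neighboring documents to different topics.
rough = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]

smooth_penalty = manifold_penalty(smooth, adj)
rough_penalty = manifold_penalty(rough, adj)
```

In a setup like this, the penalty would typically be added to the model's training loss with a weighting coefficient, so that documents close to each other in the input space are pushed toward similar topic proportions.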

Notes

  1. The code of our model is publicly available at: https://github.com/hostnlp/HNTM.

  2. https://github.com/uilab-github/rCRP

  3. https://github.com/joewandy/hlda

  4. https://github.com/misonuma/tsntm

  5. https://github.com/hostnlp/nTSNTM

  6. https://github.com/sophieburkhardt/dirichlet-vae-topic-models

  7. https://github.com/mxiny/NB-NTM

  8. https://github.com/ysmiao/nvdm

Acknowledgements

We are grateful to the reviewers for their constructive comments and suggestions on this study. This work has been supported in part by the National Natural Science Foundation of China (61972426), Guangdong Basic and Applied Basic Research Foundation (2020A1515010536), the Faculty Research Grants (DB21B6 and DB21A9) of Lingnan University, Hong Kong, and Research Grants Council of Hong Kong SAR, China (UGC/FDS16/E01/19). The work has also been supported in part by the One-off Special Fund from Central and Faculty Fund in Support of Research from 2019/20 to 2021/22 (MIT02/19-20), the Research Cluster Fund (RG 78/2019-2020R), The Education University of Hong Kong.

Author information

Corresponding author

Correspondence to Yanghui Rao.

Additional information

This article belongs to the Topical Collection: Special Issue on Explainability in the Web

Guest Editors: Guandong Xu, Hongzhi Yin, Irwin King, and Lin Li

About this article

Cite this article

Chen, Z., Ding, C., Rao, Y. et al. Hierarchical neural topic modeling with manifold regularization. World Wide Web 24, 2139–2160 (2021). https://doi.org/10.1007/s11280-021-00963-7

Keywords

  • Neural topic modeling
  • Hierarchical structure
  • Tree network
  • Manifold regularization