Improving Neural Topic Models with Wasserstein Knowledge Distillation

  • Conference paper
Advances in Information Retrieval (ECIR 2023)

Abstract

Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy between the soft labels produced by the teacher and the student models, as well as the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model and even surpasses the teacher, despite having far fewer parameters. The distilled model also outperforms several other competitive topic models on topic coherence.
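
As a concrete illustration of the objective described in the abstract, the following is a minimal sketch (in PyTorch) of how the two distillation terms could be combined. It assumes diagonal-Gaussian latent posteriors, for which the squared 2-Wasserstein distance has the closed form ||mu_t - mu_s||^2 + ||sigma_t - sigma_s||^2, and uses a hypothetical weighting coefficient alpha; it is not the authors' released implementation, which is linked in the Notes below.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          mu_s, sigma_s, mu_t, sigma_t,
                          temperature=2.0, alpha=1.0):
        # Cross-entropy between the teacher's and the student's softened
        # (temperature-scaled) output distributions.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        ce_loss = -(soft_targets * log_student).sum(dim=-1).mean()

        # Squared 2-Wasserstein distance between the diagonal-Gaussian
        # latent posteriors (means mu_*, standard deviations sigma_*).
        w2_sq = ((mu_t - mu_s).pow(2).sum(dim=-1)
                 + (sigma_t - sigma_s).pow(2).sum(dim=-1)).mean()

        # Hypothetical weighting of the two terms.
        return ce_loss + alpha * w2_sq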

Notes

  1. https://github.com/AdhyaSuman/CTMKD.

Author information

Corresponding author

Correspondence to Debarshi Kumar Sanyal.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Adhya, S., Sanyal, D.K. (2023). Improving Neural Topic Models with Wasserstein Knowledge Distillation. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_21

  • DOI: https://doi.org/10.1007/978-3-031-28238-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28237-9

  • Online ISBN: 978-3-031-28238-6

  • eBook Packages: Computer Science, Computer Science (R0)
