Abstract
Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters. The distilled model also outperforms several other competitive topic models on topic coherence.
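To make the distillation objective concrete, the following is a minimal PyTorch sketch of how such a combined loss could be computed. It assumes the teacher and student are VAE-based topic models with diagonal Gaussian latent posteriors, for which the squared 2-Wasserstein distance between the two latents has a simple closed form; the function name, temperature, and weighting coefficients below are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          mu_s, logvar_s, mu_t, logvar_t,
                          temperature=2.0, alpha=1.0, beta=1.0):
        # Soft labels: temperature-scaled output distributions, and the
        # cross-entropy of the student's soft predictions vs. the teacher's.
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        cross_entropy = -(p_teacher * log_p_student).sum(dim=-1).mean()

        # Squared 2-Wasserstein distance between the diagonal Gaussians
        # N(mu_s, diag(sigma_s^2)) and N(mu_t, diag(sigma_t^2)):
        #   W2^2 = ||mu_s - mu_t||^2 + ||sigma_s - sigma_t||^2
        sigma_s = torch.exp(0.5 * logvar_s)
        sigma_t = torch.exp(0.5 * logvar_t)
        w2_squared = ((mu_s - mu_t) ** 2).sum(dim=-1) \
                   + ((sigma_s - sigma_t) ** 2).sum(dim=-1)

        # Weighted combination of the two distillation terms.
        return alpha * cross_entropy + beta * w2_squared.mean()

Minimizing the Wasserstein term pulls the student's latent Gaussian toward the teacher's in both location and spread, while the cross-entropy term aligns the softened output distributions of the two models.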
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Adhya, S., Sanyal, D.K. (2023). Improving Neural Topic Models with Wasserstein Knowledge Distillation. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_21
DOI: https://doi.org/10.1007/978-3-031-28238-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science, Computer Science (R0)