Abstract
Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters. The distilled model also outperforms several other competitive topic models on topic coherence.
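To make the distillation objective concrete, the following is a minimal PyTorch sketch of how such a combined loss could be computed. It assumes the teacher and student are VAE-based topic models with diagonal Gaussian latent posteriors, for which the squared 2-Wasserstein distance between the two latents has a simple closed form; the function name, temperature, and weighting coefficients below are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          mu_s, logvar_s, mu_t, logvar_t,
                          temperature=2.0, alpha=1.0, beta=1.0):
        # Soft labels: temperature-scaled output distributions, and the
        # cross-entropy of the student's soft predictions vs. the teacher's.
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        cross_entropy = -(p_teacher * log_p_student).sum(dim=-1).mean()

        # Squared 2-Wasserstein distance between the diagonal Gaussians
        # N(mu_s, diag(sigma_s^2)) and N(mu_t, diag(sigma_t^2)):
        #   W2^2 = ||mu_s - mu_t||^2 + ||sigma_s - sigma_t||^2
        sigma_s = torch.exp(0.5 * logvar_s)
        sigma_t = torch.exp(0.5 * logvar_t)
        w2_squared = ((mu_s - mu_t) ** 2).sum(dim=-1) \
                   + ((sigma_s - sigma_t) ** 2).sum(dim=-1)

        # Weighted combination of the two distillation terms.
        return alpha * cross_entropy + beta * w2_squared.mean()

Minimizing the Wasserstein term pulls the student's latent Gaussian toward the teacher's in both location and spread, while the cross-entropy term aligns the softened output distributions of the two models.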
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Adhya, S., Sanyal, D.K. (2023). Improving Neural Topic Models with Wasserstein Knowledge Distillation. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_21
DOI: https://doi.org/10.1007/978-3-031-28238-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science, Computer Science (R0)