Abstract
Topic models and embedding models are two of the most popular categories of techniques for learning latent semantics from text. In topic models, each word is generated according to its global context (the document), whereas in embedding models each word occurrence is characterized by its surrounding words. It is therefore natural to train the two models jointly, exploiting both kinds of context to learn better representations. In this paper, we propose a flexible method named CoTE to achieve this goal, which can integrate a variety of topic and embedding models. We also design a general three-stage learning procedure that optimizes the parameters of CoTE with a rotation (alternating) optimization scheme. We chose and combined two pairs of de-facto standard topic and embedding models to implement the CoTE-PD and CoTE-LW algorithms. Experimental results show that CoTE improves the accuracy of both individual components.
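To make the rotation scheme concrete, the sketch below alternates between a topic-model update and an embedding-coupled update on toy data. Everything in it is an illustrative assumption rather than the actual CoTE algorithm: the topic component is a plain PLSA-style EM step, the embeddings come from an SVD of a PPMI co-occurrence matrix, and the coupling (mixing the count-based topic-word distribution with a softmax over topic-embedding and word-embedding dot products) is an ETM-like choice made only for this example.

```python
# A minimal sketch of a rotation (alternating) optimization scheme for joint
# topic/embedding learning. Illustrative only: the components and the coupling
# below are assumptions, not the CoTE algorithm described in the paper.
import numpy as np

rng = np.random.default_rng(0)
D, V, K, dim = 20, 50, 4, 8                        # docs, vocab, topics, embedding dim
X = rng.poisson(0.3, size=(D, V)).astype(float)    # toy bag-of-words counts

# Topic-model parameters: theta[d, k] = p(topic k | doc d), phi[k, w] = p(word w | topic k)
theta = rng.dirichlet(np.ones(K), size=D)
phi = rng.dirichlet(np.ones(V), size=K)

def topic_em_step(X, theta, phi):
    """One EM sweep of a PLSA-style topic model."""
    joint = theta[:, :, None] * phi[None, :, :]        # (D, K, V)
    resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
    weighted = X[:, None, :] * resp                    # expected topic counts
    theta = weighted.sum(axis=2)
    theta /= theta.sum(axis=1, keepdims=True) + 1e-12
    phi = weighted.sum(axis=0)
    phi /= phi.sum(axis=1, keepdims=True) + 1e-12
    return theta, phi

def ppmi_svd_embeddings(X, dim):
    """Word vectors from an SVD of a PPMI matrix of document-level co-occurrences."""
    C = X.T @ X                                        # (V, V) co-occurrence counts
    total = C.sum()
    pw = C.sum(axis=1) / total
    pmi = np.log((C / total + 1e-12) / (np.outer(pw, pw) + 1e-12))
    U, S, _ = np.linalg.svd(np.maximum(pmi, 0.0))      # positive PMI
    return U[:, :dim] * np.sqrt(S[:dim])               # (V, dim) word vectors

W = ppmi_svd_embeddings(X, dim)   # embedding component (constant here because X is
                                  # fixed; a full joint model would also refresh it
                                  # inside the loop below)

for it in range(5):               # rotation: alternate the component updates
    theta, phi = topic_em_step(X, theta, phi)          # topic-model stage
    topic_vecs = phi @ W                               # (K, dim) topic embeddings
    logits = topic_vecs @ W.T                          # topic-word affinities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    phi_emb = np.exp(logits)
    phi_emb /= phi_emb.sum(axis=1, keepdims=True)      # softmax over the vocabulary
    phi = 0.5 * phi + 0.5 * phi_emb                    # coupling stage: mix estimates

print("theta:", theta.shape, "phi:", phi.shape, "W:", W.shape)
```

The 0.5/0.5 mixing weight and the five outer iterations are arbitrary; the point of the sketch is only the control flow, in which each component is updated in turn while the other is held fixed.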
Acknowledgement
This work has been supported by the National Natural Science Foundation of China (No. U181461).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhao, B., Yuan, C., Huang, Y. (2024). CoTE: A Flexible Method for Joint Learning of Topic and Embedding Models. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds.) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol. 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_27