Interpretable Probabilistic Embeddings: Bridging the Gap Between Topic Models and Neural Networks

  • Anna Potapenko
  • Artem Popov
  • Konstantin Vorontsov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)


We consider probabilistic topic models and more recent word embedding techniques from the perspective of learning hidden semantic representations. Inspired by a striking similarity between the two approaches, we merge them and learn probabilistic embeddings with an online EM algorithm on word co-occurrence data. The resulting embeddings perform on par with Skip-Gram Negative Sampling (SGNS) on word similarity tasks, while their components are more interpretable. Next, we learn probabilistic document embeddings that outperform paragraph2vec on a document similarity task and require less memory and time for training. Finally, we employ multimodal Additive Regularization of Topic Models (ARTM) to obtain high sparsity and to learn embeddings for other modalities, such as timestamps and categories. We observe a further improvement in word similarity performance and meaningful inter-modality similarities.
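The core idea sketched in the abstract — learning probabilistic word embeddings by running EM on word co-occurrence counts — can be illustrated with a minimal PLSA-style factorization. This is a toy sketch, not the authors' actual BigARTM-based implementation: the co-occurrence matrix, the number of topics, and the iteration budget below are all hypothetical, and regularization and online updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-context co-occurrence counts (rows: words, columns: contexts).
N = np.array([[5., 2., 0., 0.],
              [4., 3., 0., 1.],
              [0., 0., 6., 2.],
              [0., 1., 5., 3.]])
W, C = N.shape
T = 2  # number of topics (hidden components)

# Random stochastic initialization of the factors.
phi = rng.random((W, T)); phi /= phi.sum(axis=0)      # p(w|t), columns sum to 1
theta = rng.random((T, C)); theta /= theta.sum(axis=0)  # p(t|c), columns sum to 1

for _ in range(100):
    # E-step: posterior p(t|w,c) proportional to phi[w,t] * theta[t,c].
    p = phi[:, :, None] * theta[None, :, :]       # shape (W, T, C)
    p /= p.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate factors from expected counts n(w,t) and n(t,c).
    n_wt = (N[:, None, :] * p).sum(axis=2)        # shape (W, T)
    n_tc = (N[:, None, :] * p).sum(axis=0)        # shape (T, C)
    phi = n_wt / n_wt.sum(axis=0)
    theta = n_tc / n_tc.sum(axis=0)

# Probabilistic word embeddings: rows are p(t|w), a distribution over topics.
emb = n_wt / n_wt.sum(axis=1, keepdims=True)
print(np.round(emb, 2))
```

Each embedding is a probability distribution over topics, so every component has a direct interpretation as topic membership — this is the interpretability advantage over dense SGNS vectors that the abstract refers to. Word similarity can then be scored with cosine similarity or a probabilistic divergence between these distributions.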



The work was supported by Government of the Russian Federation (agreement 05.Y09.21.0018) and the Russian Foundation for Basic Research grants 17-07-01536, 16-37-00498.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Anna Potapenko (1)
  • Artem Popov (2)
  • Konstantin Vorontsov (3)

  1. National Research University Higher School of Economics, Moscow, Russia
  2. Lomonosov Moscow State University, Moscow, Russia
  3. Moscow Institute of Physics and Technology, Moscow, Russia
