Spherical Paragraph Model

  • Ruqing Zhang
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)


Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful representations of words can be efficiently acquired by distributed models, making it possible to build text representations on a better foundation: the Bag-of-Word-Embeddings (BoWE) representation. However, existing text representation methods using BoWE often lack sound probabilistic foundations or cannot fully capture the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model over BoWE, for text representation. SPM has good probabilistic interpretability and can fully leverage the rich semantics of words, word co-occurrence information, and corpus-wide information to help the representation learning of texts. Experimental results on topical classification and sentiment analysis demonstrate that SPM achieves new state-of-the-art performance on several benchmark datasets.
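To make the BoWE idea concrete: each document becomes the multiset of its words' embedding vectors, and a "spherical" generative model treats those vectors as points on the unit hypersphere. The sketch below is purely illustrative and is not the authors' implementation; the toy `embeddings` dictionary is a placeholder, and the von Mises-Fisher density is used here only as the standard distribution for unit-norm directional data, which the "spherical" framing suggests.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind


def to_bowe(doc_tokens, embeddings):
    """Bag-of-Word-Embeddings: map each in-vocabulary word to its
    embedding, normalized onto the unit hypersphere."""
    vecs = np.stack([embeddings[w] for w in doc_tokens if w in embeddings])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def vmf_log_density(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the (p-1)-sphere,
    with mean direction mu (unit vector) and concentration kappa > 0."""
    p = x.shape[-1]
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - np.log(iv(p / 2 - 1, kappa)))
    return log_c + kappa * (x @ mu)
```

Under such a model, word vectors drawn near a document's mean direction `mu` score higher, so cosine-style semantic relatedness between words falls naturally out of the probability model rather than being bolted on afterwards.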



This work was funded by the 973 Program of China under Grant No. 2014CB340401, the National Natural Science Foundation of China (NSFC) under Grants No. 61232010, 61433014, 61425016, 61472401, 61203298 and 61722211, the Youth Innovation Promotion Association CAS under Grants No. 20144310 and 2016102, and the National Key R&D Program of China under Grants No. 2016QY02D0405.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

  1. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing, China
