Sentiment Classification with Supervised Sequence Embedding

  • Dmitriy Bespalov
  • Yanjun Qi
  • Bing Bai
  • Ali Shokoufandeh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


In this paper, we introduce a novel approach for modeling n-grams in a latent space learned from supervised signals. The proposed procedure uses only unigram features to model short phrases (n-grams) in the latent space. The phrases are then combined to form a document-level latent representation of a given text, where the position of each n-gram in the document determines its combining weight. The resulting two-stage supervised embedding is coupled with a classifier to form an end-to-end system, which we apply to large-scale sentiment classification. The proposed model does not require feature selection during pre-processing to retain effective features, and its parameter space grows linearly with the n-gram size. We present comparative evaluations of this method on two large-scale datasets for sentiment classification of online reviews (Amazon and TripAdvisor). The proposed method outperforms standard baselines that rely on a bag-of-words representation populated with n-gram features.
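
The two-stage embedding described in the abstract can be illustrated with a minimal forward-pass sketch. The abstract does not give the exact functional forms, so everything specific here is an assumption made for illustration: the position-specific projections `P`, the `tanh` nonlinearity, the Gaussian position weighting, the logistic output, and all function names and dimensions are hypothetical. Supervised training of the parameters is omitted.

```python
import math
import random


def init_params(vocab_size, m, d, n, seed=0):
    """Random word table E (vocab_size x m) and n position-specific
    projections P[k] (m x d). Shapes are illustrative assumptions."""
    rng = random.Random(seed)
    E = [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(vocab_size)]
    P = [[[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(m)]
         for _ in range(n)]
    return E, P


def param_count(vocab_size, m, d, n):
    """Embedding parameters in this sketch: linear in the n-gram size n."""
    return vocab_size * m + n * m * d


def embed_ngram(word_ids, E, P):
    """Stage 1: embed one n-gram using only unigram (word) vectors.
    Each word vector is projected by the matrix for its slot in the n-gram."""
    d = len(P[0][0])
    z = [0.0] * d
    for k, w in enumerate(word_ids):
        for i, x in enumerate(E[w]):
            row = P[k][i]
            for j in range(d):
                z[j] += x * row[j]
    return [math.tanh(v) for v in z]


def position_weights(count, center=0.5, width=0.25):
    """Assumed weighting: Gaussian bump over relative position, summing to 1."""
    if count == 0:
        return []
    raw = [math.exp(-(((i + 0.5) / count - center) ** 2) / (2 * width ** 2))
           for i in range(count)]
    total = sum(raw)
    return [r / total for r in raw]


def embed_document(token_ids, E, P):
    """Stage 2: position-weighted combination of all n-gram embeddings."""
    n, d = len(P), len(P[0][0])
    ngrams = [token_ids[i:i + n] for i in range(len(token_ids) - n + 1)]
    doc = [0.0] * d
    for w, g in zip(position_weights(len(ngrams)), ngrams):
        for j, v in enumerate(embed_ngram(g, E, P)):
            doc[j] += w * v
    return doc


def classify(doc_vec, w, b=0.0):
    """Assumed classifier: logistic regression on the document vector."""
    score = sum(a * c for a, c in zip(doc_vec, w)) + b
    return 1.0 / (1.0 + math.exp(-score))


# Usage with made-up token ids and classifier weights.
E, P = init_params(vocab_size=50, m=4, d=3, n=2)
doc = embed_document([3, 17, 8, 42, 5], E, P)
prob = classify(doc, w=[0.5, -0.2, 0.1])
```

In this sketch, moving from bigrams to trigrams adds only one more `m x d` projection matrix, which is one way a parameter space can grow linearly with the n-gram size; a bag-of-n-grams vocabulary, by contrast, grows with the number of distinct n-grams observed.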


Sentiment Classification · Large-Scale Text Mining · Supervised Feature Learning · Supervised Embedding



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dmitriy Bespalov (1)
  • Yanjun Qi (2)
  • Bing Bai (2)
  • Ali Shokoufandeh (1)

  1. Drexel University, Philadelphia, USA
  2. NEC Labs America, Princeton, USA