Bootstrapping Large Scale Polarity Lexicons through Advanced Distributional Methods

  • Giuseppe Castellucci
  • Danilo Croce
  • Roberto Basili
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9336)


Recent interest in Sentiment Analysis has drawn attention to effective methods for detecting opinions and sentiments in texts. Many approaches in the literature rely on hand-coded resources that model the prior polarity of words or multi-word expressions. Developing such resources is expensive and language dependent, so they cannot fully cover linguistic sentiment phenomena. This paper presents an automatic method for deriving large-scale polarity lexicons based on Distributional Models of Lexical Semantics. Given a set of heuristically annotated sentences from Twitter, we transfer the sentiment information from sentences to words. The approach is mostly unsupervised, and experiments on different Sentiment Analysis tasks in English and Italian show the benefits of the generated resources.
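
As a rough illustration of what transferring sentiment from sentences to words can look like, the sketch below assumes that words and sentences share one vector space (a sentence is taken here as the mean of its word vectors) and uses a simple difference-of-centroids direction as a stand-in for a trained classifier. The toy vectors, the labels, and the function names are all hypothetical, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical 2-d word vectors; a real distributional model would be
# learned from a large corpus and have many more dimensions.
WORD_VECTORS = {
    "good": np.array([1.0, 0.2]),
    "great": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.1]),
    "awful": np.array([-0.8, 0.3]),
    "movie": np.array([0.0, 1.0]),
}


def sentence_vector(tokens):
    """Represent a sentence as the mean of its known word vectors (assumption)."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    return np.mean(vecs, axis=0)


def fit_direction(sentences, labels):
    """Return a polarity direction: positive centroid minus negative centroid."""
    X = np.array([sentence_vector(s) for s in sentences])
    y = np.array(labels)
    return X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)


def build_lexicon(direction):
    """Score every word by projecting its vector onto the polarity direction."""
    return {w: float(v @ direction) for w, v in WORD_VECTORS.items()}


# Hypothetical heuristically labeled sentences: +1 positive, -1 negative.
SENTENCES = [["good", "movie"], ["great", "movie"],
             ["bad", "movie"], ["awful", "movie"]]
LABELS = [1, 1, -1, -1]

lexicon = build_lexicon(fit_direction(SENTENCES, LABELS))
for word, score in sorted(lexicon.items(), key=lambda kv: -kv[1]):
    print(f"{word:>6s} {score:+.3f}")
```

Because sentences and words are scored in the same space, a model fitted only on sentence labels can assign a polarity score to individual words, which is the sense in which supervision is needed only at the sentence level in this sketch.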


Keywords: Polarity lexicon generation, Distributional semantics





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Giuseppe Castellucci (1)
  • Danilo Croce (2)
  • Roberto Basili (2)
  1. Department of Electronic Engineering, University of Roma Tor Vergata, Roma, Italy
  2. Department of Enterprise Engineering, University of Roma Tor Vergata, Roma, Italy
