Semi-supervised Acquisition of Croatian Sentiment Lexicon

  • Goran Glavaš
  • Jan Šnajder
  • Bojana Dalbelo Bašić
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7499)


Sentiment analysis aims to recognize subjectivity expressed in natural language texts. Subjectivity analysis determines whether a text unit is subjective or objective, while polarity analysis determines whether a subjective text is positive or negative. The sentiment of sentences and documents is often determined using a sentiment lexicon. In this paper we present three semi-supervised methods for automated acquisition of a sentiment lexicon that do not depend on pre-existing language resources: latent semantic analysis, graph-based propagation, and topic modelling. The methods are language-independent and corpus-based, and hence especially suitable for languages for which resources are scarce. We use the presented methods to acquire a sentiment lexicon for the Croatian language. The performance of the methods was evaluated on two tasks: determining both subjectivity and polarity (subjectivity + polarity task) and determining the polarity of subjective words (polarity only task). The results indicate that the methods are especially suitable for the polarity only task.
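The abstract names graph-based propagation but does not describe how the graph is built or how scores are spread, so the following is only a generic sketch of the idea, not the authors' algorithm. It assumes an undirected word graph with positive edge weights (for example, corpus co-occurrence strengths) and a small set of manually labelled seed words. The function names, the `damping` parameter, the `neutral_band` threshold, and the toy Croatian word graph are all hypothetical.

```python
from collections import defaultdict


def propagate_polarity(edges, seeds, iterations=20, damping=0.85):
    """Spread seed polarity scores over an undirected weighted word graph.

    edges: iterable of (word_a, word_b, weight) tuples with weight > 0
    seeds: dict mapping seed words to +1.0 (positive) or -1.0 (negative)
    Returns a dict mapping every word in the graph to a score in [-1, 1].
    """
    neighbours = defaultdict(dict)
    for a, b, w in edges:
        neighbours[a][b] = w
        neighbours[b][a] = w

    # Unlabelled words start at 0.0; seed words start at their label.
    scores = {word: seeds.get(word, 0.0) for word in neighbours}
    for _ in range(iterations):
        updated = {}
        for word, adjacent in neighbours.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed scores stay clamped
                continue
            total = sum(adjacent.values())
            weighted = sum(w * scores[n] for n, w in adjacent.items())
            # Damped weighted average of the neighbours' current scores.
            updated[word] = damping * weighted / total
        scores = updated
    return scores


def label(score, neutral_band=0.1):
    """Map a propagated score to a lexicon label (threshold is an assumption)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Toy graph, invented for illustration only.
toy_edges = [
    ("dobar", "izvrstan", 3.0),   # good / excellent
    ("loš", "grozan", 3.0),       # bad / terrible
    ("stol", "dobar", 1.0),       # table co-occurs equally with both seeds
    ("stol", "loš", 1.0),
]
toy_seeds = {"dobar": 1.0, "loš": -1.0}

toy_scores = propagate_polarity(toy_edges, toy_seeds)
toy_lexicon = {word: label(s) for word, s in toy_scores.items()}
```

In this toy graph a word connected only to a positive seed drifts positive, while a word connected equally to both seeds stays neutral, which mirrors the distinction between the subjectivity + polarity task and the polarity only task described above.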


Topic Modelling, Latent Dirichlet Allocation, Sentiment Analysis, Neutral Word, Latent Semantic Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Goran Glavaš (1)
  • Jan Šnajder (1)
  • Bojana Dalbelo Bašić (1)
  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
