A Simple and Efficient Algorithm for Lexicon Generation Inspired by Structural Balance Theory

  • Anis Yazidi
  • Aleksander Bai
  • Hugo Hammer
  • Paal Engelstad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9101)

Abstract

Sentiment lexicon generation is a major task in the field of Sentiment Analysis. In contrast to the bulk of research, which has focused almost exclusively on Label Propagation as the primary tool for lexicon generation, we introduce a simple yet efficient algorithm for lexicon generation inspired by Structural Balance Theory. Our algorithm is shown to outperform the classical Label Propagation algorithm.

A major drawback of Label Propagation is that words situated many hops away from the seed words tend to receive low sentiment values, because the inaccuracy of the synonym relationship is not properly taken into account: the label of a word is simply the average of its neighbours' labels. To circumvent this problem, we propose a novel algorithm, based on Structural Balance Theory, that better supports the transitive transfer of sentiment polarity from seed words to target words.
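
To make the averaging rule concrete, here is a minimal sketch of iterative label propagation, assuming a simple setup that the abstract does not spell out: seed labels are clamped to +1 or -1, every other word starts at 0 and is repeatedly replaced by the mean of its neighbours' scores. The toy graph, the word names `w1` to `w4` and the function `label_propagation` are hypothetical and serve only to show how scores shrink with hop distance.

```python
from collections import defaultdict


def label_propagation(edges, seeds, iterations=1000):
    """Hypothetical sketch: seeds stay clamped, every other word is
    repeatedly replaced by the mean score of its neighbours."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Unlabelled words start at 0 (no sentiment information yet).
    scores = {w: seeds.get(w, 0.0) for w in neighbours}
    for _ in range(iterations):
        updated = {}
        for w, nbrs in neighbours.items():
            if w in seeds:
                updated[w] = seeds[w]  # clamp seed labels
            else:
                updated[w] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical synonym chain running from a positive to a negative seed.
chain = ["good", "w1", "w2", "w3", "w4", "bad"]
edges = list(zip(chain, chain[1:]))
scores = label_propagation(edges, {"good": 1.0, "bad": -1.0})
for word in chain:
    print(f"{word:>5}: {scores[word]:+.2f}")
```

On this chain the scores settle at the linear interpolation +0.60, +0.20, -0.20 and -0.60 for `w1` to `w4`: words two hops from the nearest seed end up with a much weaker polarity than words one hop away, simply because each score is an average.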

The premise of the algorithm is exemplified by the adage "the enemy of my enemy is my friend", which preserves the transitive structure captured by antonyms and synonyms. Thus, a low sentiment score indicates sentiment neutrality rather than the mere fact that the word in question is located far from the seed words.
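
The balance principle can be illustrated with a small sketch. It is a hypothetical illustration of the idea, not the authors' algorithm: synonym edges are assumed to carry sign +1 and antonym edges sign -1, polarity is pushed outward from the seeds breadth-first and multiplied by the edge sign at every hop, and a word's score is the mean of the signed votes it receives from words one layer closer to the seeds. Under these assumptions a score does not decay with distance, and a score near 0 arises only from conflicting evidence. The function name, word list and edge signs are all made up for the example.

```python
from collections import defaultdict

SYNONYM, ANTONYM = +1, -1  # assumed edge signs


def balance_propagation(edges, seeds):
    """Hypothetical sketch of sign propagation in the spirit of
    Structural Balance Theory: friend of a friend is a friend,
    enemy of an enemy is a friend."""
    graph = defaultdict(list)
    for u, v, sign in edges:
        graph[u].append((v, sign))
        graph[v].append((u, sign))
    scores = dict(seeds)
    frontier = list(seeds)
    while frontier:
        votes = defaultdict(list)
        for u in frontier:
            for v, sign in graph[u]:
                if v not in scores:  # only words not yet scored receive votes
                    votes[v].append(scores[u] * sign)
        # Mean of the signed votes: conflicting paths pull the score toward 0.
        for v, vs in votes.items():
            scores[v] = sum(vs) / len(vs)
        frontier = list(votes)
    return scores


edges = [
    ("good", "fine", SYNONYM),
    ("fine", "poor", ANTONYM),
    ("poor", "inferior", SYNONYM),
    ("inferior", "superior", ANTONYM),
    ("ok", "good", SYNONYM),
    ("ok", "bad", SYNONYM),
]
scores = balance_propagation(edges, {"good": 1.0, "bad": -1.0})
for word in ["fine", "poor", "inferior", "superior", "ok"]:
    print(f"{word:>8}: {scores[word]:+.2f}")
```

Here `superior`, four hops from `good`, still receives a full +1.00 because its path crosses two antonym edges, while `ok`, linked as a synonym to both seeds, lands at 0.00 and is treated as neutral.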

Lexicons based on thesauruses were built using different variants of the proposed algorithm. The lexicons were evaluated by classifying product and movie reviews, and the results show satisfactory classification performance that outperforms Label Propagation. We consider Norwegian as a case study, but the algorithm can easily be applied to other languages.
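
As a rough illustration of how a generated lexicon can be used to classify reviews, the sketch below assumes the simplest lexicon-based rule: sum the scores of the words found in the lexicon and label the review positive if the total exceeds a threshold. The abstract does not state the exact classification rule used in the evaluation; the function `classify_review`, the threshold and the Norwegian lexicon entries (god = good, bra = good, dårlig = bad, kjedelig = boring) with their scores are hypothetical.

```python
import re


def classify_review(text, lexicon, threshold=0.0):
    """Hypothetical lexicon-based classifier: sum the scores of the words
    present in the lexicon and compare the total with a threshold."""
    tokens = re.findall(r"\w+", text.lower())  # \w also matches æ, ø, å
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return "positive" if total > threshold else "negative"


# Hypothetical Norwegian lexicon entries with made-up scores.
lexicon = {"god": 1.0, "bra": 0.8, "dårlig": -1.0, "kjedelig": -0.7}

print(classify_review("Filmen var god, men litt kjedelig.", lexicon))
print(classify_review("Dårlig lyd og kjedelig meny.", lexicon))
```

Words missing from the lexicon contribute nothing, so lexicon coverage directly limits how many reviews can be classified on the basis of actual evidence.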

Keywords

Sentiment analysis, Structural Balance Theory

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Anis Yazidi (1)
  • Aleksander Bai (1)
  • Hugo Hammer (1)
  • Paal Engelstad (1)
  1. Institute of Information Technology, Oslo and Akershus University College of Applied Sciences, Oslo, Norway