A Simple and Efficient Algorithm for Lexicon Generation Inspired by Structural Balance Theory
Sentiment lexicon generation is a major task in the field of Sentiment Analysis. In contrast to the bulk of research, which has focused almost exclusively on Label Propagation as the primary tool for lexicon generation, we introduce a simple yet efficient algorithm for lexicon generation that is inspired by Structural Balance Theory. Our algorithm is shown to outperform the classical Label Propagation algorithm.
A major drawback of Label Propagation is that words situated many hops away from the seed words tend to get low sentiment values, since the inaccuracy in the synonym relationship is not properly taken into account. In fact, the label of a word is simply the average of the labels of its neighbours. To circumvent this problem, we propose a novel algorithm that better supports the transitive transfer of sentiment polarity from seed words to target words, using Structural Balance Theory.
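The averaging behaviour described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the function name `label_propagation`, the five-word English chain, the single positive seed, and the fixed number of synchronous sweeps are all assumptions made for illustration.

```python
from typing import Dict, List


def label_propagation(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    iterations: int,
) -> Dict[str, float]:
    """Clamp seed labels; set every other word to the mean of its neighbours."""
    # Unlabelled words start at 0.0 (neutral); seeds start at their given value.
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]      # seeds never change
            elif neighbours:
                updated[word] = sum(scores[n] for n in neighbours) / len(neighbours)
            else:
                updated[word] = scores[word]     # isolated word keeps its score
        scores = updated                         # synchronous update
    return scores


# Hypothetical synonym chain: excellent - great - good - fine - okay
chain = ["excellent", "great", "good", "fine", "okay"]
graph = {
    word: [chain[j] for j in (i - 1, i + 1) if 0 <= j < len(chain)]
    for i, word in enumerate(chain)
}

lp_scores = label_propagation(graph, seeds={"excellent": 1.0}, iterations=3)
for word in chain:
    print(f"{word:10s} {lp_scores[word]:.3f}")
```

In this toy setting the score shrinks with every hop away from the seed, even though all five words are linked only by synonym edges.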
The premise of the algorithm is exemplified by the adage "the enemy of my enemy is my friend", which preserves the transitivity structure captured by antonyms and synonyms. Thus, a low sentiment score indicates sentimental neutrality rather than the fact that the word in question is located far from the seeds.
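The following sketch shows one way the "enemy of my enemy" premise could be realised on a signed word graph. The abstract does not specify the algorithm, so this is an assumption-laden illustration and not the authors' method: synonym edges carry sign +1, antonym edges carry sign -1, a word reached from a seed inherits the seed's polarity multiplied by the edge signs along a shortest path, and votes from several seeds are averaged. The function name `propagate_signs` and the example words are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

SYNONYM, ANTONYM = 1, -1


def propagate_signs(
    edges: List[Tuple[str, str, int]],
    seeds: Dict[str, int],
) -> Dict[str, float]:
    """Average, over all seeds, the seed polarity times the path's edge signs."""
    adjacency: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, sign in edges:
        adjacency.setdefault(a, []).append((b, sign))
        adjacency.setdefault(b, []).append((a, sign))

    votes: Dict[str, List[int]] = {word: [] for word in adjacency}
    for seed, polarity in seeds.items():
        # Breadth-first search: each word takes its sign from the first
        # (shortest) path that reaches it from this seed.
        reached = {seed: polarity}
        queue = deque([seed])
        while queue:
            word = queue.popleft()
            for neighbour, sign in adjacency.get(word, []):
                if neighbour not in reached:
                    reached[neighbour] = reached[word] * sign
                    queue.append(neighbour)
        for word, value in reached.items():
            votes.setdefault(word, []).append(value)

    # Words that no seed reaches, or on which seeds disagree, end up near 0.
    return {w: (sum(v) / len(v) if v else 0.0) for w, v in votes.items()}


# Hypothetical thesaurus fragment.
edges = [
    ("good", "fine", SYNONYM),
    ("good", "bad", ANTONYM),
    ("bad", "awful", SYNONYM),
    ("awful", "wonderful", ANTONYM),
]
sb_scores = propagate_signs(edges, seeds={"good": 1})
print(sb_scores)
```

Here "wonderful" is three hops from the seed, yet it receives full positive polarity because the two antonym edges cancel. In this sketch a score near zero arises only when seeds disagree or no seed reaches the word, not from distance alone.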
The lexicons, based on thesauruses, were built using different variants of our proposed algorithm. The lexicons were evaluated by classifying product and movie reviews, and the results show satisfactory classification performance that exceeds that of Label Propagation. We consider Norwegian as a case study, but the algorithm can easily be applied to other languages.
Keywords: Sentiment analysis, Structural Balance Theory