On Enhancing the Label Propagation Algorithm for Sentiment Analysis Using Active Learning with an Artificial Oracle

  • Anis Yazidi
  • Hugo Lewi Hammer
  • Aleksander Bai
  • Paal Engelstad
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9120)

Abstract

A core component of sentiment analysis is the generation of sentiment lists. Label propagation is unequivocally one of the most widely used approaches for generating sentiment lists from manually annotated seed words. Words situated many hops away from the seed words tend to receive low sentiment values. This inherent property of the Label Propagation algorithm poses a challenge in sentiment analysis. In this paper, we propose an iterative approach based on the theory of Active Learning [1] that attempts to remedy this problem without any need for additional manual labeling. Our algorithm is bootstrapped with a limited number of seeds. Then, at each iteration, a fixed number of “informative words” are selected as new seeds for labeling according to different criteria that we elucidate in the paper. Subsequently, the Label Propagation algorithm is retrained in the next iteration with the additional labeled seeds. A major contribution of this article is that, unlike standard Active Learning, which prompts the user for additional labels, we generate the labels for the additional seeds with an Artificial Oracle. This differs radically from mainstream Active Learning theory, which resorts to a human user as the oracle for labeling those additional seeds. Consequently, we relieve the user of the cumbersome task of manual annotation while still achieving high performance. The lexicons were evaluated by classifying product and movie reviews. Most of the sentiment lexicons generated using Active Learning perform better than those generated by the Label Propagation algorithm alone.
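The loop described in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: the abstract does not specify the graph, the selection criteria, or how the Artificial Oracle assigns labels, so the damped neighbour averaging, the "weakest non-zero score" selection rule, and the sign-based oracle below are all assumptions made purely for illustration. The function names `propagate` and `active_label_propagation` and the toy `chain` graph are hypothetical.

```python
# Illustrative sketch only. The damping factor, the selection rule and the
# oracle are assumptions; the paper's actual criteria are not given here.


def propagate(graph, seeds, damping=0.5, sweeps=50):
    """Spread seed scores over an undirected word graph.

    Every non-seed word repeatedly takes the damped mean of its
    neighbours' scores; seed words stay clamped to their labels.
    """
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(sweeps):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]
            elif neighbours:
                mean = sum(scores[n] for n in neighbours) / len(neighbours)
                updated[word] = damping * mean
            else:
                updated[word] = 0.0  # isolated word: nothing to propagate
        scores = updated
    return scores


def active_label_propagation(graph, seeds, rounds=1, batch=1, damping=0.5):
    """Alternate propagation with automatic labeling of new seeds."""
    seeds = dict(seeds)  # do not mutate the caller's seed set
    for _ in range(rounds):
        scores = propagate(graph, seeds, damping)
        candidates = [w for w in graph if w not in seeds and scores[w] != 0.0]
        # Assumed selection rule: the weakest non-zero score is treated
        # as the most "informative" word.
        candidates.sort(key=lambda w: abs(scores[w]))
        for word in candidates[:batch]:
            # Assumed artificial oracle: keep the propagated sign and
            # restore full magnitude, with no human in the loop.
            seeds[word] = 1.0 if scores[word] > 0 else -1.0
    return propagate(graph, seeds, damping), seeds


# Toy chain w0 - w1 - w2 - w3 - w4 - w5 with one positive seed at w0.
chain = {
    "w0": ["w1"],
    "w1": ["w0", "w2"],
    "w2": ["w1", "w3"],
    "w3": ["w2", "w4"],
    "w4": ["w3", "w5"],
    "w5": ["w4"],
}

baseline = propagate(chain, {"w0": 1.0})
enhanced, final_seeds = active_label_propagation(chain, {"w0": 1.0})

for word in chain:
    print(word, round(baseline[word], 4), round(enhanced[word], 4))
```

On this toy chain the baseline scores shrink with every hop away from `w0`, which is the effect the abstract describes. After one round the assumed oracle promotes the most distant word to a seed, and the scores of its neighbours rise accordingly. Whether such automatic labels are reliable on a real thesaurus graph depends on the selection criteria, which the paper itself addresses.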

Keywords

Sentiment analysis, Label propagation, Active learning

References

  1. Settles, B.: Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 6(1), 1–114 (2012)
  2. Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)
  3. Hu, M., Liu, B.: Mining opinion features in customer reviews. In: Proceedings of AAAI, pp. 755–760 (2004)
  4. Mohammad, S., Dunne, C., Dorr, B.: Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 2, pp. 599–608. Association for Computational Linguistics (2009)
  5. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177 (2004)
  6. Kamps, J., Marx, M., Mokken, R.J., De Rijke, M.: Using WordNet to measure semantic orientations of adjectives (2004)
  7. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (1995)
  8. Blair-Goldensohn, S., Hannan, K., McDonald, R., Neylon, T., Reis, G.A., Reynar, J.: Building a sentiment summarizer for local service reviews. In: WWW Workshop on NLP in the Information Explosion Era, p. 14 (2008)
  9. Rao, D., Ravichandran, D.: Semi-supervised polarity lexicon induction. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 675–682. Association for Computational Linguistics (2009)
  10. Blum, A., Lafferty, J., Rwebangira, M.R., Reddy, R.: Semi-supervised learning using randomized mincuts. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 13. ACM (2004)
  11. Hassan, A., Radev, D.: Identifying text polarity using random walks. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 395–403. Association for Computational Linguistics (2010)
  12. Kim, S.M., Hovy, E.: Automatic identification of pro and con reasons in online reviews. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 483–490. Association for Computational Linguistics (2006)
  13. Peng, W., Park, D.H.: Generate adjective sentiment dictionary for social media sentiment analysis using constrained nonnegative matrix factorization. Urbana 51, 61801 (2004)
  14. Hassan, A., Abu-Jbara, A., Jha, R., Radev, D.: Identifying the semantic orientation of foreign words. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, vol. 2, pp. 592–597. Association for Computational Linguistics (2011)
  15. Hammer, H., Bai, A., Yazidi, A., Engelstad, P.: Building sentiment lexicons applying graph theory on information from three Norwegian thesauruses. In: Norwegian Informatics Conference (2014)
  16. Nielsen, F.Å.: A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. CoRR abs/1103.2903 (2011)
  17. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer (2011)
  18. Hammer, H.L., Solberg, P.E., Øvrelid, L.: Sentiment classification of online political discussions: a comparison of a word-based and dependency-based method. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 90–96. Association for Computational Linguistics (2014)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Anis Yazidi (1)
  • Hugo Lewi Hammer (1)
  • Aleksander Bai (1)
  • Paal Engelstad (1)
  1. Department of Computer Science, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
