Active Learning Enhanced Document Annotation for Sentiment Analysis

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8127)


Sentiment analysis is a popular research area devoted to methods for the automatic analysis of subjectivity in textual content. Many of these methods are based on machine learning, and they usually depend on manually annotated training corpora. However, creating such corpora is a time-consuming task, which creates a need for methods that facilitate the process. Active learning methods, which select the most informative examples for a given classification task, can be used to make annotation more effective. There is currently a lack of systematic research on the application of active learning to the creation of corpora for sentiment analysis. Hence, the aim of this work is to survey several active learning strategies applicable in annotation tools used in the context of sentiment analysis. We evaluated the compared strategies on the domain of product reviews. The experimental results confirmed an increase in corpus quality, measured as higher classification accuracy on the test set, for most of the evaluated strategies (more than 20% higher accuracy compared with the random strategy).
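The abstract does not name the individual strategies, so this sketch illustrates only one widely used family, uncertainty (margin) sampling, next to the random baseline the results are compared against. It is a minimal sketch, assuming a binary positive/negative classifier that outputs a probability for the positive class for each unlabelled review; the function names `uncertainty_sample` and `random_sample` and the example probabilities are hypothetical and are not taken from the paper or from any annotation tool.

```python
import random
from typing import List, Sequence


def uncertainty_sample(probs: Sequence[float], k: int) -> List[int]:
    """Pick the k unlabelled documents the classifier is least sure about.

    probs[i] is the (assumed) predicted probability that document i is
    positive. For two classes the margin between the class probabilities
    is |p - (1 - p)| = |2p - 1|; a small margin means high uncertainty.
    """
    if not 0 <= k <= len(probs):
        raise ValueError("k must be between 0 and the pool size")
    # Sort document indices by margin, smallest (most uncertain) first.
    ranked = sorted(range(len(probs)), key=lambda i: abs(2 * probs[i] - 1))
    return ranked[:k]


def random_sample(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Baseline strategy: pick k document indices uniformly at random."""
    return random.Random(seed).sample(range(pool_size), k)


if __name__ == "__main__":
    # Hypothetical positive-class probabilities for six unlabelled reviews.
    probs = [0.95, 0.52, 0.10, 0.47, 0.80, 0.61]
    print(uncertainty_sample(probs, 2))  # [1, 3]
    print(random_sample(len(probs), 2))
```

In an annotation loop, the indices returned by `uncertainty_sample` would be shown to the annotator, the newly labelled reviews added to the training set, and the classifier retrained before the next query; the random strategy needs no classifier at all, which is what makes it the natural baseline.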


Keywords: sentiment analysis · active learning · semi-automatic annotation · text mining


Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  1. Dept. of Cybernetics and Artificial Intelligence, Technical University of Košice, Slovak Republic