Active Learning Enhanced Document Annotation for Sentiment Analysis

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8127)


Sentiment analysis is a popular research area devoted to methods for the automatic analysis of subjectivity in textual content. Many of these methods are based on machine learning and usually depend on manually annotated training corpora. However, creating such corpora is time-consuming, which creates a need for methods that facilitate the process. Active learning methods, which select the examples that are most informative for a given classification task, can be used to make annotation more effective. There is currently a lack of systematic research on the application of active learning to the creation of corpora for sentiment analysis. Hence, the aim of this work is to survey some of the active learning strategies applicable in annotation tools used for sentiment analysis. We evaluated the compared strategies in the domain of product reviews. The experimental results confirmed an increase in corpus quality, measured as higher classification accuracy on the test set, for most of the evaluated strategies (more than 20% higher accuracy in comparison to the random strategy).
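As an illustration of what selecting the most informative examples can mean in practice, here is a minimal sketch of one common active learning strategy, least-confidence (uncertainty) sampling, for a binary classifier. The abstract does not say which strategies were evaluated, so this is not necessarily one of them; the function name `least_confident`, the document ids, and the probabilities are all hypothetical.

```python
from typing import Dict, List


def least_confident(probabilities: Dict[str, float], batch_size: int) -> List[str]:
    """Return the ids of the documents the classifier is least sure about.

    `probabilities` maps a document id to the predicted P(positive).
    A prediction near 0.5 means the classifier cannot decide, so that
    document is assumed to be the most informative one to annotate next.
    """
    # Sort by distance from 0.5: a smaller distance means a less confident prediction.
    ranked = sorted(probabilities, key=lambda doc_id: abs(probabilities[doc_id] - 0.5))
    return ranked[:batch_size]


# Hypothetical classifier outputs for four unlabeled product reviews.
pool = {"r1": 0.97, "r2": 0.52, "r3": 0.10, "r4": 0.41}
selected = least_confident(pool, batch_size=2)
print(selected)  # ['r2', 'r4']
```

In an annotation tool, the selected documents would be shown to the annotator, added to the training corpus with their labels, and the classifier retrained before the next batch is chosen. The random strategy mentioned in the abstract would instead draw the batch uniformly from the pool.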


  • sentiment analysis
  • active learning
  • semi-automatic annotation
  • text mining




Copyright information

© 2013 IFIP International Federation for Information Processing

About this paper

Cite this paper

Koncz, P., Paralič, J. (2013). Active Learning Enhanced Document Annotation for Sentiment Analysis. In: Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L. (eds) Availability, Reliability, and Security in Information Systems and HCI. CD-ARES 2013. Lecture Notes in Computer Science, vol 8127. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40510-5

  • Online ISBN: 978-3-642-40511-2

  • eBook Packages: Computer Science, Computer Science (R0)