Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis

  • Robert Remus
  • Sven Rill
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8105)

Abstract

We address the question of which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: purely data-driven word n-gram feature induction, or word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in 2 languages, English and German, for 4 analogous domains each: user-written product reviews of books, DVDs, electronics and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models.
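
The contrast between the two induction strategies can be illustrated with a minimal sketch, assuming scikit-learn; the documents, labels and polarity dictionary below are hypothetical placeholders, not the corpora or lexica used in the paper, and the linear SVM merely stands in for a generic discriminative learner.

```python
# Minimal sketch (not the authors' implementation): contrasting purely
# data-driven word n-gram features with dictionary-restricted n-gram
# features for document-level polarity classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["great sound quality", "the battery died after a week"]  # toy review texts
labels = [1, 0]                                                   # 1 = positive, 0 = negative
polarity_dictionary = {"great", "died"}                           # toy polarity lexicon entries

# (a) Data-driven induction: the n-gram vocabulary is read off the corpus itself.
data_driven = CountVectorizer(ngram_range=(1, 2), binary=True)
X_data = data_driven.fit_transform(docs)

# (b) Dictionary-based induction: only n-grams listed in a polarity dictionary
#     become features, which shrinks the feature space considerably.
dictionary_based = CountVectorizer(ngram_range=(1, 2), binary=True,
                                   vocabulary=sorted(polarity_dictionary))
X_dict = dictionary_based.fit_transform(docs)

print(X_data.shape[1], "data-driven features vs.", X_dict.shape[1], "dictionary features")

# Both feature sets would then feed the same discriminative learner, e.g. a
# linear SVM, and be compared via cross-validated classification accuracy.
clf = LinearSVC()
# clf.fit(X_data, labels)  # requires a realistically sized training corpus
```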

Keywords

Sentiment analysis · Feature induction

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Robert Remus (1)
  • Sven Rill (2, 3)

  1. Natural Language Processing Group, Department of Computer Science, University of Leipzig, Germany
  2. Goethe University Frankfurt, Germany
  3. Institute of Information Systems, University of Applied Sciences Hof, Hof, Germany