Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis

Remus, Robert; Rill, Sven

doi:10.1007/978-3-642-40722-2_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

We address the question of which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: purely data-driven word n-gram feature induction, or word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in two languages, English and German, for four analogous domains each: user-written product reviews of books, DVDs, electronics, and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models.
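
The contrast between the two induction strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents, the toy dictionary, and in particular the rule that a dictionary-based feature is any n-gram containing at least one dictionary entry are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(tokens: List[str], max_n: int) -> List[NGram]:
    """Return all word n-grams of order 1..max_n, grouped by order n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def induce_data_driven(docs: List[List[str]], max_n: int) -> Set[NGram]:
    """Data-driven induction: every n-gram seen in the training data is a feature."""
    return {gram for doc in docs for gram in word_ngrams(doc, max_n)}


def induce_dictionary_based(docs: List[List[str]],
                            polarity_dict: Set[str],
                            max_n: int) -> Set[NGram]:
    """Dictionary-based induction (assumed rule): keep only n-grams that
    contain at least one word listed in the polarity dictionary."""
    return {gram for gram in induce_data_driven(docs, max_n)
            if any(token in polarity_dict for token in gram)}


def vectorize(doc: List[str], vocab: List[NGram], max_n: int) -> List[int]:
    """Binary presence vector of a document over an ordered feature list."""
    present = set(word_ngrams(doc, max_n))
    return [1 if gram in present else 0 for gram in vocab]


# Hypothetical toy data, not taken from the paper.
train_docs = ["a great book".split(), "a boring plot".split()]
toy_dictionary = {"great", "boring"}

data_driven = induce_data_driven(train_docs, max_n=2)
dictionary_based = induce_dictionary_based(train_docs, toy_dictionary, max_n=2)

print(len(data_driven), len(dictionary_based))
print(vectorize("a great film".split(), sorted(dictionary_based), max_n=2))
```

On this toy input the dictionary-based feature set is a strict subset of the data-driven one, which mirrors the dimensionality reduction the abstract mentions. Either feature set would then feed a discriminative classifier, and the resulting accuracies are what the paper compares.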

Author information

Authors and Affiliations

Natural Language Processing Group, Department of Computer Science, University of Leipzig, Germany
Robert Remus
Goethe University Frankfurt, Germany
Sven Rill
Institute of Information Systems, University of Applied Sciences Hof, Hof, Germany
Sven Rill

Editor information

Editors and Affiliations

Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Iryna Gurevych
Technical University Darmstadt, 64289 Darmstadt, Germany
Chris Biemann
Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Torsten Zesch

Cite this paper

Remus, R., Rill, S. (2013). Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_18

DOI: https://doi.org/10.1007/978-3-642-40722-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)
