Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets

  • Nicola Burns
  • Yaxin Bi
  • Hui Wang
  • Terry Anderson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6881)

Abstract

More people are buying products online and expressing their opinions on these products through online reviews. Sentiment analysis can be used to extract valuable information from reviews, and the results can benefit both consumers and manufacturers. This paper presents a study comparing two well-known machine learning algorithms, namely a dynamic language model and a naïve Bayes classifier. Experiments were carried out to determine how consistent the results are across datasets of different sizes, and how a balanced or unbalanced dataset affects them. The experimental results indicate that both algorithms can achieve better results on a realistic unbalanced dataset than on the balanced datasets commonly used in research.

Keywords

Sentiment analysis, opinion mining, naïve Bayes, language model

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nicola Burns
    • 1
  • Yaxin Bi
    • 1
  • Hui Wang
    • 1
  • Terry Anderson
    • 1
  1. University of Ulster, UK
