Predicting the Political Sentiment of Web Log Posts Using Supervised Machine Learning Techniques Coupled with Feature Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4811)

Abstract

As the number of web logs grows dramatically, readers are turning to them as an important source of information. Automatic techniques that identify the political sentiment of web log posts would help bloggers categorize and filter this rapidly expanding information source. In this paper we demonstrate the effectiveness of supervised learning for sentiment classification of web log posts. We show that a Naïve Bayes classifier coupled with a forward feature selection technique can correctly predict a post's sentiment 89.77% of the time on average, with a standard deviation of 3.01. It significantly outperforms Support Vector Machines (SVMs) at the 95% confidence level, with a confidence interval of [1.5, 2.7]. On average, feature selection increases accuracy by 11.84% for Naïve Bayes and by 12.18% for SVMs. For comparison, previous sentiment classification research on a movie review corpus achieved 81% accuracy using Naïve Bayes and 82.9% using SVMs.
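The abstract names the pipeline (Naïve Bayes plus forward feature selection) but not its details. The following is a minimal sketch, not the authors' implementation, assuming that each post is a bag of binary word-presence features, that the classifier is a Bernoulli Naïve Bayes model with Laplace smoothing, and that forward selection greedily adds the word that most improves k-fold cross-validated accuracy. The function names, the interleaved fold scheme, and the toy corpus with its "left"/"right" labels are hypothetical; the SVM comparison is not shown.

```python
import math


def train_nb(docs, labels, features):
    """Fit a Bernoulli Naive Bayes model over binary word-presence features."""
    model = {}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n = len(class_docs)
        # Laplace-smoothed probability that word w appears in a class-c post
        p = {w: (sum(w in d for d in class_docs) + 1) / (n + 2) for w in features}
        model[c] = (math.log(n / len(docs)), p)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one tokenized post."""
    words = set(doc)

    def score(c):
        log_prior, p = model[c]
        # Present words contribute log p, absent words contribute log(1 - p)
        return log_prior + sum(math.log(pw if w in words else 1 - pw)
                               for w, pw in p.items())

    return max(model, key=score)


def cv_accuracy(docs, labels, features, k=3):
    """Pooled accuracy over k interleaved folds (fold i holds every k-th post)."""
    correct = 0
    for i in range(k):
        train = [j for j in range(len(docs)) if j % k != i]
        test = [j for j in range(len(docs)) if j % k == i]
        model = train_nb([docs[j] for j in train], [labels[j] for j in train], features)
        correct += sum(predict_nb(model, docs[j]) == labels[j] for j in test)
    return correct / len(docs)


def forward_select(docs, labels, candidates, k=3):
    """Greedy forward selection: add the word that most improves CV accuracy,
    stopping as soon as no remaining word improves it."""
    selected = set()
    best = cv_accuracy(docs, labels, selected, k)
    remaining = set(candidates)
    while remaining:
        acc, word = max((cv_accuracy(docs, labels, selected | {w}, k), w)
                        for w in sorted(remaining))
        if acc <= best:
            break
        selected.add(word)
        remaining.discard(word)
        best = acc
    return selected, best


# Hypothetical toy corpus; "left"/"right" are placeholder labels.
docs = [["tax", "cut", "the"], ["tax", "market", "the"], ["cut", "market", "a"],
        ["union", "health", "the"], ["union", "equal", "a"], ["health", "equal", "the"]]
labels = ["right", "right", "right", "left", "left", "left"]
vocab = sorted({w for d in docs for w in d})

selected, best = forward_select(docs, labels, vocab)
print(sorted(selected), round(best, 3))
```

A Bernoulli model is used here rather than a multinomial one because a multinomial model restricted to a single word assigns that word probability 1 in every class, so greedy selection starting from an empty feature set could never make progress. Each forward step retrains the classifier once per remaining candidate, so the search needs on the order of |V|² cross-validation runs for a vocabulary V; in practice it is often run over a pre-filtered candidate list.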

References

  1. Beineke, P., Hastie, T., Vaithyanathan, S.: The Sentimental Factor: Improving Review Classification via Human-Provided Information. In: ACL 2004. Proceedings ACL: Association for Computational Linguistics, Barcelona, pp. 263–270 (2004)

  2. Carroll, J.: Local TV and Newspapers Remain Most Popular News Sources, Increased use of Internet news this year. The Gallup Poll. poll.gallup.com/content/default.aspx?CI=14389 (December 2004)

  3. Das, S., Chen, M.: Yahoo! for Amazon: Extracting Marketing Sentiment from Stock Message Boards. In: APFA 2001. Proceedings of the 8th Asia Pacific Finance Association Annual Conference (2001)

  4. Dube, J.: Blog Readership up 58% in 2004. CyberJournalist.net (January 2005), www.cyberjournalist.net/news/001819.php

  5. Engström, C.: Topic Dependence in Sentiment Classification. Master's thesis, St Edmund's College, University of Cambridge (2004)

  6. Gard, L.: The Business of Blogging. Business Week Online (December 2004)

    Google Scholar 

  7. Hatzivassiloglou, V., McKeown, K.: Predicting the Semantic Orientation of Adjectives. In: Proceedings of the ACL-EACL 1997 Joint Conference: 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 174–181 (1997)

    Google Scholar 

  8. Hearst, M.: Direction-based text interpretation as an information access refinement. In: Jacobs, P. (ed.) Text-Based Intelligent Systems, Lawrence Erlbaum Associates (1992)

  9. Hu, M., Liu, B.: Mining and Summarizing Customer Reviews. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining KDD 2004, pp. 168–174 (2004)

  10. Huettner, A., Subasic, P.: Fuzzy typing for document management. In: ACL 2000 Companion Volume: Tutorial Abstracts and Demonstration Notes, pp. 26–27 (2000)

    Google Scholar 

  11. Dave, K., Lawrence, S., Pennock, D.: Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In: WWW 2003. Proceedings of the Twelfth International World Wide Web Conference, pp. 519–553 (2003)

  12. Madden, M.: Online Pursuits: The Changing Picture of Who’s Online and What They Do. Pew Internet and the American Life Project Report (2003), www.pewinternet.org/PPF/r/106/report_display.asp

  13. Nasukawa, T., Yi, J.: Sentiment Analysis: Capturing Favorability Using Natural Language Processing. In: Proceedings of the K-CAP-03, 2nd International Conference on Knowledge Capture, pp. 70–77 (2003)

    Google Scholar 

  14. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment Classification using Machine Learning Techniques. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86 (2002)

  15. Pang, B., Lee, L.: A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In: Proceedings of the 42nd ACL, pp. 271–278 (2004)

  16. Pew Internet and the American Life Project (2004), www.pewinternet.org/trends/Internet%20Activities_12.21.04.htm

  17. Pew Internet and the American Life Project (2005), www.pewinternet.org/trends/Internet_Activities_12.05.05.htm

  18. Rainie, L.: The State of Blogging. Pew Internet and the American Life Project Report (2005), www.pewinternet.org/PPF/r/144/report_display.asp

  19. Rainie, L., Shermak, J.: Search engine use shoots up in the past year and edges towards email as the primary internet application. Pew Internet and the American Life Project Report in conjunction with comScore Media Metrix (2005), www.pewinternet.org/pdfs/PIP_SearchData_1105.pdf

  20. Sack, W.: On the computation of point of view. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI), p. 1488. Student Abstract (1994)

  21. Tong, R.M.: An Operational System for Detecting and Tracking Opinions in On-line Discussion. In: SIGIR 2001 Workshop on Operational Text Classification (2001)

  22. Turney, P.D., Littman, M.L.: Unsupervised Learning of Semantic Orientation from a Hundred-billion-word Corpus. Technical Report EGB-1094, National Research Council Canada (2002)

    Google Scholar 

  23. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Academic Press, San Diego, CA (2000)

Editor information

Olfa Nasraoui, Myra Spiliopoulou, Jaideep Srivastava, Bamshad Mobasher, Brij Masand

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Durant, K.T., Smith, M.D. (2007). Predicting the Political Sentiment of Web Log Posts Using Supervised Machine Learning Techniques Coupled with Feature Selection. In: Nasraoui, O., Spiliopoulou, M., Srivastava, J., Mobasher, B., Masand, B. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2006. Lecture Notes in Computer Science, vol. 4811. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77485-3_11

  • DOI: https://doi.org/10.1007/978-3-540-77485-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77484-6

  • Online ISBN: 978-3-540-77485-3

  • eBook Packages: Computer Science, Computer Science (R0)
