Predicting the Political Sentiment of Web Log Posts Using Supervised Machine Learning Techniques Coupled with Feature Selection

  • Kathleen T. Durant
  • Michael D. Smith
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4811)

Abstract

As the number of web logs dramatically grows, readers are turning to them as an important source of information. Automatic techniques that identify the political sentiment of web log posts will help bloggers categorize and filter this exploding information source. In this paper we illustrate the effectiveness of supervised learning for sentiment classification on web log posts. We show that a Naïve Bayes classifier coupled with a forward feature selection technique can on average correctly predict a posting’s sentiment 89.77% of the time with a standard deviation of 3.01. It significantly outperforms Support Vector Machines at the 95% confidence level with a confidence interval of [1.5, 2.7]. The feature selection technique provides on average an 11.84% and a 12.18% increase for Naïve Bayes and Support Vector Machines results respectively. Previous sentiment classification research achieved an 81% accuracy using Naïve Bayes and 82.9% using SVMs on a movie domain corpus.

Keywords

Sentiment Classification Blogs Web Logs Naïve Bayes Support Vector Machines WEKA feature selection 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Kathleen T. Durant
    • 1
  • Michael D. Smith
    • 1
  1. 1.Harvard University, Harvard School of Engineering and Applied Sciences, Cambridge MAUSA

Personalised recommendations