R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

  • Ben Blamey
  • Tom Crick
  • Giles Oatley
Conference paper


Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We investigate the application of the character n-gram model to text classification of corpora from online social networks, where text is known to be rich in so-called unnatural language; this is the first such documented study. We also introduce a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
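
To make the distinction between the two feature types concrete, the following is a minimal sketch of how character n-grams and word n-grams might be extracted from a short comment. It is an illustration only, not the implementation used in the paper: the function names `char_ngrams` and `word_ngrams`, the whitespace tokenisation, and the example comment are all assumptions introduced here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams, keeping spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, using naive whitespace tokenisation."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical comment, invented for illustration.
comment = "gr8 pic :-)"
char_features = char_ngrams(comment, 3)   # character trigrams
word_features = word_ngrams(comment, 1)   # word unigrams

# An elongated emoticon is a different word token, but it still shares
# a character trigram with the standard form.
shared = set(char_ngrams(":-)", 3)) & set(char_ngrams(":-))", 3))
```

In this sketch a word-level model treats `:-)` and `:-))` as unrelated features, whereas the character-level features overlap. That overlap is the kind of flexibility the abstract refers to, although the study reports that it yields little improvement in practice.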


Online Social Network, Sentiment Analysis, Computational Linguistics, National Happiness, Sentiment Classification





Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  1. Cardiff Metropolitan University, Cardiff, UK
