
Sensing Social Media: A Range of Approaches for Sentiment Analysis

Chapter in: Cyberemotions

Part of the book series: Understanding Complex Systems (UCS)

Abstract

Sentiment analysis deals with the computational detection and extraction of opinions, beliefs and emotions in written text. It combines theories and methodologies from a diverse set of scientific domains, such as psychology, natural language processing and machine learning. It plays an important role in transforming the unstructured textual communication between social media users into quantifiable and informed estimates of expressed sentiment, which physicists, sociologists and complex-systems researchers can subsequently use to study the collective properties of such phenomena. The problem has been addressed from two different but often complementary directions: lexicon-based solutions, which rely on sentiment dictionaries (i.e., lists of words in which each token is annotated with an indication of the affective content it typically conveys), and machine learning solutions, which automatically or semi-automatically learn to detect the affective content of text. In this chapter, we discuss a range of solutions and their strengths and weaknesses in different environments and settings. We conclude that, depending on the application environment and the desired output, different types of analysis are appropriate, with varying levels of predictive accuracy.
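The two directions described in the abstract can be contrasted in a few lines of Python. This is a minimal, hypothetical sketch and not the chapter's actual systems: `TOY_LEXICON`, `tokenize`, `lexicon_polarity` and `NaiveBayesPolarity` are illustrative names, the eight-word lexicon stands in for a real sentiment dictionary, and multinomial naive Bayes with add-one smoothing is simply one compact supervised learner chosen for brevity. The training sentences are made up.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (word -> polarity). Real sentiment dictionaries
# are far larger and usually carry graded, not binary, scores.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0, "happy": 1.0,
               "bad": -1.0, "awful": -1.0, "hate": -1.0, "sad": -1.0}


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Lexicon-based approach: sum the scores of known words; the sign gives the polarity."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayesPolarity:
    """Machine learning approach: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        # Log prior probability of each class.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Per-class word counts learned from the annotated examples.
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Words never seen in training are ignored.
            return self.priors[c] + sum(
                math.log((self.counts[c][tok] + 1) / denom)
                for tok in tokenize(text) if tok in self.vocab)
        return max(self.classes, key=log_posterior)


if __name__ == "__main__":
    # Made-up annotated examples, for illustration only.
    train_texts = ["i love this, great day", "such a happy result",
                   "awful service, i hate it", "sad and bad news"]
    train_labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayesPolarity().fit(train_texts, train_labels)
    print(lexicon_polarity("What a great, happy day"))  # positive
    print(model.predict("i hate this awful service"))   # negative
```

The lexicon-based function needs no training data but only knows the words in its dictionary; the classifier needs annotated examples but learns which words matter for the domain it was trained on, which mirrors the trade-off the chapter discusses.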


Notes

  1. Refer to Chap. 1.5 of Pang and Lee (2008) for a detailed discussion of the terminology.

  2. Most information retrieval systems and algorithms explicitly use this information when estimating the relevance of a document to a user query (Robertson et al. 2004).

  3. Valence and arousal are discussed in more depth in Sect. 6.2.2.

  4. In contrast, ‘unsupervised’ machine learning algorithms do not require annotated data.

  5. http://www.bbc.co.uk/messageboards/.

  6. http://www.digg.com.

  7. The BBC message boards are closely administered and moderated, whereas Digg posts are not.

  8. Studies (Bradley and Lang 1999) have shown that emotions can be characterised as a “coincidence of values on a number of different strategic dimensions”, such as valence and arousal.

  9. http://www.livejournal.com.

  10. Stratification ensures that the class distribution in each training and testing split matches the distribution in the full dataset as closely as possible.

  11. http://www.twitter.com.

  12. The extraction took place in 2009, before the website changed its focus to the promotion of music.

  13. The Weka .arff files for all datasets are available upon request.

  14. The full list of discussion threads can be found in the Appendix material at http://doi.ieeecomputersociety.org/10.1109/T-AFFC.2012.26.

  15. This may be because the geometric mean is, by definition, less susceptible to outliers than the arithmetic mean.

  16. http://www.cyberemotions.eu.
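Notes 10 and 15 mention two evaluation details: stratified splitting of data into training and testing folds, and the geometric versus the arithmetic mean. The sketch below illustrates both under stated assumptions. `stratified_folds` is a hypothetical helper that deals each class's shuffled examples round-robin across k folds (one simple way to stratify, not necessarily how Weka does it internally), and the averaged scores are made up, since this preview does not show which quantities the chapter averages.

```python
import random
from collections import defaultdict
from statistics import fmean, geometric_mean  # Python 3.8+


def stratified_folds(labels, k, seed=0):
    """Hypothetical helper: split example indices into k folds whose class
    proportions match the full dataset as closely as possible."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's examples across the folds, continuing where the
        # previous class stopped so that fold sizes also stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


if __name__ == "__main__":
    labels = ["pos"] * 6 + ["neg"] * 3
    for fold in stratified_folds(labels, k=3):
        print(sorted(labels[i] for i in fold))  # each fold: ['neg', 'pos', 'pos']

    # Made-up scores with one outlier: the geometric mean moves far less.
    scores = [1, 1, 1, 1, 100]
    print(fmean(scores))           # 20.8
    print(geometric_mean(scores))  # about 2.51
```

The geometric mean damps a single large value, but it only makes sense for strictly positive quantities and collapses towards zero when any value is close to zero.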

References

  • Ahn, J., Gobron, S., Silvestre, Q., Thalmann, D.: Asymmetrical facial expressions based on an advanced interpretation of two-dimensional Russell’s emotional model. In: ENGAGE 2010, pp. 1–12 (2010)

  • Asur, S., Huberman, B.A.: Predicting the future with social media. In: Huang, X.J., King, I., Raghavan, V., Rueger, S. (eds.) Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 01, pp. 492–499. IEEE Computer Society, Washington (2010). doi:10.1109/WI-IAT.2010.63

  • Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), Valletta, pp. 2200–2204 (2010)

  • Barrett, L.F., Russell, J.A.: The structure of current affect: controversies and emerging consensus. Curr. Dir. Psychol. Sci. 8 (1), 10–14 (1999). doi:10.1111/1467-8721.00003

  • Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)

  • Bradley, M.M., Lang, P.J.: Affective norms for English words (ANEW): instruction manual and affective ratings. Tech. Rep. C-1, University of Florida: Center for Research in Psychophysiology (1999)

  • Carvalho, P., Sarmento, L., Silva, M.J., de Oliveira, E.: Clues for detecting irony in user-generated contents: oh…!! it’s “so easy”;-). In: Jiang, M., Yu, B. (eds.) Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pp. 53–56. ACM, New York (2009). doi:10.1145/1651461.1651471

  • Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  • Cornelius, R.R.: The Science of Emotion. Prentice Hall, Upper Saddle River (1996)

  • Dodds, P., Danforth, C.: Measuring the happiness of large-scale written expression: songs, blogs, and presidents. J. Happiness Stud. 11 (4), 441–456 (2010). doi:10.1007/s10902-009-9150-9

  • Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9 (Aug), 1871–1874 (2008)

  • Fox, E.: Emotion Science. Palgrave Macmillan, London (2008)

  • González-Bailón, S., Banchs, R.E., Kaltenbrunner, A.: Emotions, public opinion, and U.S. presidential approval rates: a 5-year analysis of online political discussions. Hum. Commun. Res. 38 (2), 121–143 (2012). doi:10.1111/j.1468-2958.2011.01423.x

  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11 (1), 10–18 (2009). doi:10.1145/1656274.1656278

  • Jelinek, F., Merialdo, B., Roukos, S., Strauss, M.: A dynamic language model for speech recognition. In: Marcus, M.P. (ed.) Proceedings of the Workshop on Speech and Natural Language, pp. 293–295. Association for Computational Linguistics, Stroudsburg (1991). doi:10.3115/112405.112464

  • Jijkoun, V., de Rijke, M., Weerkamp, W.: Generating focused topic-specific sentiment lexicons. In: Hajic, J., Carberry, S., Clark, S. (eds.) ACL 2010, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 585–594. Association for Computational Linguistics, Stroudsburg (2010)

  • Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 169–184. MIT Press, Cambridge (1999)

  • John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Besnard, P., Hanks, S. (eds.) Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)

  • Keerthi, S.S., Sundararajan, S., Chang, K.W., Hsieh, C.J., Lin, C.J.: A sequential dual method for large scale multi-class linear SVMs. In: Li, Y., Liu, B., Sarawagi, S. (eds.) Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 408–416. ACM, New York (2008). doi:10.1145/1401890.1401942

  • Kramer, A.D.: An unobtrusive behavioral model of “gross national happiness”. In: Mynatt, E., Fitzpatrick, G., Hudson, S., Edwards, K., Rodden, T. (eds.) CHI’10 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 287–290. ACM, New York (2010). doi:10.1145/1753326.1753369

  • Le Cessie, S., Van Houwelingen, J.C.: Ridge estimators in logistic regression. Appl. Stat. J. R. Stat. Soc. C 41 (1), 191–201 (1992). doi:10.2307/2347628

  • Lee, L., Pang, B.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Knight, K., Ng, H.T., Oflazer, K. (eds.) ACL 2005 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics, Stroudsburg (2005)

  • MacDonald, C., Ounis, I.: The TREC Blogs06 collection: creating and analysing a blog test collection. Tech. Rep. TR-2006-24, Department of Computer Science, University of Glasgow (2006)

  • MacDonald, C., Ounis, I., Soboroff, I.: Overview of the TREC 2008 Blog Track. In: The Seventeenth Text REtrieval Conference (TREC 2008) Proceedings, NIST Special Publication SP 500-277, p. 1 (2008)

  • Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing, 1st edn. MIT Press, Cambridge (1999)

  • Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, Cambridge (2008)

  • Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38 (11), 39–41 (1995). doi:10.1145/219717.219748

  • Mishne, G.: Experiments with mood classification in blog posts. In: Proceedings of ACM SIGIR 2005 Workshop on Stylistic Analysis of Text for Information Access (2005)

  • Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill, New York (1997)

  • Mitrović, M., Paltoglou, G., Tadić, B.: Networks and emotion-driven user communities at popular blogs. Eur. Phys. J. B 77 (4), 597–609 (2010). doi:10.1140/epjb/e2010-00279-x

  • Owsley, S., Sood, S., Hammond, K.J.: Domain specific affective classification of documents. In: Computational Approaches to Analyzing Weblogs, Papers from the 2006 AAAI Spring Symposium, Technical Report SS-06-03, pp. 181–183. AAAI Press, Menlo Park (2006)

  • Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), Valletta, pp. 1320–1326 (2010)

  • Paltoglou, G.: Sentiment analysis in social media. In: Agarwal, N., Wigand, R.T., Lim, M. (eds.) Online Collective Action: Dynamics of the Crowd in Social Media. Lecture Notes in Social Networks, pp. 3–18. Springer, Wien (2014). doi:10.1007/978-3-7091-1340-0_1

  • Paltoglou, G., Buckley, K.: Subjectivity annotation of the microblog 2011 realtime adhoc relevance judgments. In: Serdyukov, P., Braslavski, P., Kuznetsov, S.O., Kamps, J., Rüger, S., Agichtein, E., Segalovich, I., Yilmaz, E. (eds.) Advances in Information Retrieval, 35th European Conference on IR Research, ECIR 2013, Moscow, March 2013. Proceedings, pp. 344–355. Springer, Berlin/Heidelberg (2013). doi:10.1007/978-3-642-36973-5_29

  • Paltoglou, G., Thelwall, M.: Twitter, MySpace, Digg: unsupervised sentiment analysis in social media. ACM Trans. Intell. Syst. Technol. 3 (4), 66:1–66:19 (2012). doi:10.1145/2337542.2337551

  • Paltoglou, G., Thelwall, M.: Seeing stars of valence and arousal in blog posts. IEEE Trans. Affect. Comput. 4 (1), 116–123 (2013). doi:10.1109/T-AFFC.2012.36

  • Paltoglou, G., Thelwall, M., Buckley, K.: Online textual communication annotated with grades of emotion strength. In: Proc. EMOTION, pp. 25–31 (2010)

  • Paltoglou, G., Theunis, M., Kappas, A., Thelwall, M.: Predicting emotional responses to long informal text. IEEE Trans. Affect. Comput. 4 (1), 106–115 (2013). doi:10.1109/T-AFFC.2012.26

  • Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2 (1–2), 1–135 (2008). doi:10.1561/1500000011

  • Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, EMNLP ’02, vol. 10, pp. 79–86. ACL, Stroudsburg (2002)

  • Pennebaker, J.W., Francis, M.E.: Linguistic Inquiry and Word Count, 1st edn. Lawrence Erlbaum, Mahwah (1999)

  • Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)

  • Ponomareva, N., Thelwall, M.: Do neighbours help?: an exploration of graph-based algorithms for cross-domain sentiment classification. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, pp. 655–665. Association for Computational Linguistics, Stroudsburg (2012)

  • Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning, 1st edn. Morgan Kaufmann, San Francisco (1993)

  • Quirk, R., Greenbaum, S., Leech, G., Svartvik, J.: A Comprehensive Grammar of the English Language. Longman, New York (1985)

  • Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)

  • Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 extension to multiple weighted fields. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pp. 42–49. ACM, New York (2004)

  • Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39 (6), 1161–1178 (1980). doi:10.1037/h0077714

  • Russell, J.A.: Pancultural aspects of the human conceptual organization of emotions. J. Pers. Soc. Psychol. 45 (6), 1281–1288 (1983). doi:10.1037/0022-3514.45.6.1281

  • Scherer, K.R.: What are emotions? And how can they be measured? Soc. Sci. Inf. 44 (4), 695–729 (2005). doi:10.1177/0539018405058216

  • Schimmack, U.: Pleasure, displeasure, and mixed feelings: are semantic opposites mutually exclusive? Cognit. Emot. 15 (1), 81–97 (2001). doi:10.1080/02699930126097

  • Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34 (1), 1–47 (2002). doi:10.1145/505282.505283

  • Shimada, K., Endo, T.: Seeing several stars: a rating inference task for a document containing several evaluation criteria. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) Advances in Knowledge Discovery and Data Mining, 12th Pacific-Asia Conference, PAKDD 2008 Osaka, May 2008 Proceedings. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence), vol. 5012, pp. 1006–1014. Springer, Berlin/Heidelberg (2008). doi:10.1007/978-3-540-68125-0_106

  • Strapparava, C., Mihalcea, R.: Learning to identify emotions in text. In: Wainwright, R.L., Haddad, H. (eds.) Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), pp. 1556–1560. ACM, New York (2008). doi:10.1145/1363686.1364052

  • Strapparava, C., Valitutti, A.: WordNet-Affect: an affective extension of WordNet. In: Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R. (eds.) Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC’04), pp. 1083–1086. European Language Resources Association, Paris (2004)

  • Thelwall, M., Wilkinson, D.: Public dialogs in social network sites: What is their purpose? J. Am. Soc. Inf. Sci. Technol. 61 (2), 392–404 (2010). doi:10.1002/asi.21241

  • Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 61 (12), 2544–2558 (2010). doi:10.1002/asi.21416

  • Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment in Twitter events. J. Am. Soc. Inf. Sci. Technol. 62 (2), 406–418 (2011). doi:10.1002/asi.21462

  • Whitelaw, C., Garg, N., Argamon, S.: Using appraisal groups for sentiment analysis. In: Herzog, O., Scheck, H.J., Fuhr, N., Chowdhury, A., Teiken, W. (eds.) Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 625–631. ACM, New York (2005). doi:10.1145/1099554.1099714

  • Wiebe, J.M., Bruce, R.F., O’Hara, T.P.: Development and use of a gold-standard data set for subjectivity classifications. In: Dale, R., Church, K.W. (eds.) Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pp. 246–253. Association for Computational Linguistics, Stroudsburg (1999). doi:10.3115/1034678.1034721

  • Witten, I.H., Bell, T.C.: The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Trans. Inf. Theory 37 (4), 1085–1094 (1991). doi:10.1109/18.87000


Acknowledgements

This work was supported by a European Union grant under the 7th Framework Programme, Theme 3: Science of complex systems for socially intelligent ICT. It is part of the CyberEmotions project (contract 231323).

Author information


Correspondence to Georgios Paltoglou.


Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Paltoglou, G., Thelwall, M. (2017). Sensing Social Media: A Range of Approaches for Sentiment Analysis. In: Holyst, J. (ed.) Cyberemotions. Understanding Complex Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-43639-5_6
