Sensing Social Media: A Range of Approaches for Sentiment Analysis

Paltoglou, Georgios; Thelwall, Mike

doi:10.1007/978-3-319-43639-5_6

Part of the book series: Understanding Complex Systems (UCS)

Abstract

Sentiment analysis deals with the computational detection and extraction of opinions, beliefs and emotions in written text. It combines theories and methodologies from a diverse set of scientific domains, such as psychology, natural language processing and machine learning. It transforms the unstructured textual communication between social media users into quantifiable and informed estimates of expressed sentiment, which physicists, sociologists and complex systems experts can subsequently use to study the collective properties of such phenomena. The problem has been addressed from two different but often complementary directions: lexicon-based solutions that rely on sentiment dictionaries (i.e., lists of words in which each token is annotated with an indication of the affective content it typically conveys) and machine learning solutions that automatically or semi-automatically learn to detect the affective content of text. In this chapter, we discuss a range of solutions and their strengths and weaknesses in different environments and settings. We conclude that different types of analyses, with varying levels of predictive accuracy, are appropriate depending on the application environment and the desired output.
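
As a rough illustration of the lexicon-based direction described in the abstract, the following minimal Python sketch sums per-token scores from a sentiment dictionary. The lexicon entries, the scoring scale and the function names (TOY_LEXICON, lexicon_score, classify) are assumptions made up for this example; they are not taken from the chapter or from any published lexicon.

```python
import re

# Hypothetical toy lexicon: token -> affective score
# (positive values for positive affect, negative values for negative affect).
TOY_LEXICON = {
    "good": 2, "great": 3, "love": 3,
    "bad": -2, "awful": -3, "hate": -3,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens in `text`; unknown tokens count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ("I love this great phone", "awful service, I hate it"):
        print(post, "->", classify(post))
```

A machine learning solution would instead learn such token weights from annotated examples; real lexicon-based systems also typically handle negation, intensifiers and similar effects, which this sketch ignores.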

Notes

1. Refer to Chap. 1.5 of Pang and Lee (2008) for a detailed discussion about the terminology.
2. Most information retrieval systems and algorithms explicitly use this information in estimating the relevancy of a document in regard to a user query (Robertson et al. 2004).
3. Valence and arousal are discussed in more depth in Sect. 6.2.2.
4. In contrast, ‘unsupervised’ machine learning algorithms do not require annotated data.
5. http://www.bbc.co.uk/messageboards/.
6. http://www.digg.com.
7. The BBC message boards are closely administered and moderated, while Digg posts are not.
8. Studies (Bradley and Lang 1999) have shown that emotions can be characterised as “coincidence of values on a number of different strategic dimensions”, such as valence and arousal.
9. http://www.livejournal.com.
10. Stratification ensures that the splits for training and testing are as equal as possible for every class.
11. http://www.twitter.com.
12. The extraction took place in 2009, before the website changed its focus to promotion of music.
13. All the Weka .arff files for all datasets are available upon request.
14. The full list of discussion threads can be found in the Appendix material at http://doi.ieeecomputersociety.org/10.1109/T-AFFC.2012.26.
15. That may be due to the fact that by definition the geometric mean is less susceptible to outliers than the arithmetic mean.
16. http://www.cyberemotions.eu.
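
The remark in note 15 about the geometric mean can be checked with a small numeric example. The scores below are hypothetical and chosen only to show the effect of one large outlier; this page does not say which quantities the chapter averages. The geometric mean is defined only for positive values, and the comparison shown here concerns unusually large values.

```python
from statistics import fmean, geometric_mean

# Hypothetical positive scores; the last value is a deliberate outlier.
scores = [1.0, 1.0, 1.0, 100.0]

arithmetic = fmean(scores)           # (1 + 1 + 1 + 100) / 4
geometric = geometric_mean(scores)   # (1 * 1 * 1 * 100) ** (1 / 4)

print(f"arithmetic mean: {arithmetic:.2f}")  # 25.75
print(f"geometric mean:  {geometric:.2f}")   # 3.16
```

The single outlier pulls the arithmetic mean to 25.75, while the geometric mean stays near 3.16, much closer to the typical value of 1.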

References

Ahn, J., Gobron, S., Silvestre, Q., Thalmann, D.: Asymmetrical facial expressions based on an advanced interpretation of two-dimensional Russell’s emotional model. In: ENGAGE 2010, pp. 1–12 (2010)
Asur, S., Huberman, B.A.: Predicting the future with social media. In: Huang, X.J., King, I., Raghavan, V., Rueger, S. (eds.) Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 01, pp. 492–499. IEEE Computer Society, Washington (2010). doi:10.1109/WI-IAT.2010.63
Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), Valletta, pp. 2200–2204 (2010)
Barrett, L.F., Russell, J.A.: The structure of current affect: controversies and emerging consensus. Curr. Dir. Psychol. Sci. 8 (1), 10–14 (1999). doi:10.1111/1467-8721.00003
Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
Bradley, M.M., Lang, P.J.: Affective norms for English words (ANEW): instruction manual and affective ratings. Tech. Rep. C-1, University of Florida: Center for Research in Psychophysiology (1999)
Carvalho, P., Sarmento, L., Silva, M.J., de Oliveira, E.: Clues for detecting irony in user-generated contents: oh…!! it’s “so easy”;-). In: Jiang, M., Yu, B. (eds.) Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pp. 53–56. ACM, New York (2009). doi:10.1145/1651461.1651471
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cornelius, R.R.: The Science of Emotion. Prentice Hall, Upper Saddle River (1996)
Dodds, P., Danforth, C.: Measuring the happiness of large-scale written expression: songs, blogs, and presidents. J. Happiness Stud. 11 (4), 441–456 (2010). doi:10.1007/s10902-009-9150-9
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Fox, E.: Emotion Science. Palgrave Macmillan, London (2008)
González-Bailón, S., Banchs, R.E., Kaltenbrunner, A.: Emotions, public opinion, and U.S. presidential approval rates: a 5-year analysis of online political discussions. Hum. Commun. Res. 38 (2), 121–143 (2012). doi:10.1111/j.1468-2958.2011.01423.x
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11 (1), 10–18 (2009). doi:10.1145/1656274.1656278
Jelinek, F., Merialdo, B., Roukos, S., Strauss, M.: A dynamic language model for speech recognition. In: Marcus, M.P. (ed.) Proceedings of the Workshop on Speech and Natural Language, pp. 293–295. Association for Computational Linguistics, Stroudsburg (1991). doi:10.3115/112405.112464
Jijkoun, V., de Rijke, M., Weerkamp, W.: Generating focused topic-specific sentiment lexicons. In: Hajic, J., Carberry, S., Clark, S. (eds.) ACL 2010, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 585–594. Association for Computational Linguistics, Stroudsburg (2010)
Joachims, T.: Making large-scale SVM learning practical. In: Schoelkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 169–184. MIT Press, Cambridge (1999)
John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Besnard, P., Hanks, S. (eds.) Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)
Keerthi, S.S., Sundararajan, S., Chang, K.W., Hsieh, C.J., Lin, C.J.: A sequential dual method for large scale multi-class linear SVMs. In: Li, Y., Liu, B., Sarawagi, S. (eds.) Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 408–416. ACM, New York (2008). doi:10.1145/1401890.1401942
Kramer, A.D.: An unobtrusive behavioral model of “gross national happiness”. In: Mynatt, E., Fitzpatrick, G., Hudson, S., Edwards, K., Rodden, T. (eds.) CHI’10 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 287–290. ACM, New York (2010). doi:10.1145/1753326.1753369
Le Cessie, S., Van Houwelingen, J.C.: Ridge estimators in logistic regression. Appl. Stat. J. R. Stat. Soc. C 41 (1), 191–201 (1992). doi:10.2307/2347628
Lee, L., Pang, B.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Knight, K., Ng, H.T., Oflazer, K. (eds.) ACL 2005 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics, Stroudsburg (2005)
MacDonald, C., Ounis, I.: The TREC Blogs06 collection: creating and analysing a blog test collection. Tech. Rep. TR-2006-24, Department of Computer Science, University of Glasgow (2006)
MacDonald, C., Ounis, I., Soboroff, I.: Overview of the TREC 2008 Blog Track. In: The Sixteenth Text REtrieval Conference (TREC 2008) Proceedings, NIST Special Publication SP 500-277, p. 1 (2008)
Manning, C.D., Schuetze, H.: Foundations of Statistical Natural Language Processing, 1st edn. MIT Press, Cambridge (1999)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, Cambridge (2008)
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38 (11), 39–41 (1995). doi:10.1145/219717.219748
Mishne, G.: Experiments with mood classification in blog posts. In: Proceedings of ACM SIGIR 2005 Workshop on Stylistic Analysis of Text for Information Access (2005)
Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill, New York (1997)
Mitrović, M., Paltoglou, G., Tadić, B.: Networks and emotion-driven user communities at popular blogs. Eur. Phys. J. B 77 (4), 597–609 (2010). doi:10.1140/epjb/e2010-00279-x
Owsley, S., Sood, S., Hammond, K.J.: Domain specific affective classification of documents. In: Computational Approaches to Analyzing Weblogs, Papers from the 2006 AAAI Spring Symposium, Technical Report SS-06-03, pp. 181–183. AAAI Press, Menlo Park (2006)
Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), Valletta, pp. 1320–1326 (2010)
Paltoglou, G.: Sentiment analysis in social media. In: Agarwal, N., Wigand, R.T., Lim, M. (eds.) Online Collective Action: Dynamics of the Crowd in Social Media. Lecture Notes in Social Networks, pp. 3–18. Springer, Wien (2014). doi:10.1007/978-3-7091-1340-0_1
Paltoglou, G., Buckley, K.: Subjectivity annotation of the microblog 2011 realtime adhoc relevance judgments. In: Serdyukov, P., Braslavski, P., Kuznetsov, S.O., Kamps, J., Rüger, S., Agichtein, E., Segalovich, I., Yilmaz, E. (eds.) Advances in Information Retrieval, 35th European Conference on IR Research, ECIR 2013, Moscow, March 2013. Proceedings, pp. 344–355. Springer, Berlin/Heidelberg (2013). doi:10.1007/978-3-642-36973-5_29
Paltoglou, G., Thelwall, M.: Twitter, Myspace, Digg: unsupervised sentiment analysis in social media. ACM Trans. Intell. Syst. Technol. 3 (4), 66:1–66:19 (2012). doi:10.1145/2337542.2337551
Paltoglou, G., Thelwall, M.: Seeing stars of valence and arousal in blog posts. IEEE Trans. Affect. Comput. 4 (1), 116–123 (2013). doi:10.1109/T-AFFC.2012.36
Paltoglou, G., Thelwall, M., Buckley, K.: Online textual communication annotated with grades of emotion strength. In: Proc. EMOTION, pp. 25–31 (2010)
Paltoglou, G., Theunis, M., Kappas, A., Thelwall, M.: Predicting emotional responses to long informal text. IEEE Trans. Affect. Comput. 4 (1), 106–115 (2013). doi:10.1109/T-AFFC.2012.26
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2 (1–2), 1–135 (2008). doi:10.1561/1500000011
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, EMNLP ’02, vol. 10, pp. 79–86. ACL, Stroudsburg (2002)
Pennebaker, J.W., Francis, M.E.: Linguistic Inquiry and Word Count, 1st edn. Lawrence Erlbaum, Mahwah (1999)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Schoelkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Ponomareva, N., Thelwall, M.: Do neighbours help?: an exploration of graph-based algorithms for cross-domain sentiment classification. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics, Jeju Island, pp. 655–665. ACL, Stroudsburg (2012)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning, 1st edn. Morgan Kaufmann, San Francisco (1993)
Quirk, R., Greenbaum, S., Leech, G., Svartvik, J.: A Comprehensive Grammar of the English Language. Longman, New York (1985)
Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)
Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 extension to multiple weighted fields. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pp. 42–49. ACM, New York (2004)
Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39 (6), 1161–1178 (1980). doi:10.1037/h0077714
Russell, J.A.: Pancultural aspects of the human conceptual organization of emotions. J. Pers. Soc. Psychol. 45 (6), 1281–1288 (1983). doi:10.1037/0022-3514.45.6.1281
Scherer, K.R.: What are emotions? And how can they be measured? Soc. Sci. Inf. 44 (4), 695–729 (2005). doi:10.1177/0539018405058216
Schimmack, U.: Pleasure, displeasure, and mixed feelings: are semantic opposites mutually exclusive? Cognit. Emot. 15 (1), 81–97 (2001). doi:10.1080/02699930126097
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34 (1), 1–47 (2002). doi:10.1145/505282.505283
Shimada, K., Endo, T.: Seeing several stars: a rating inference task for a document containing several evaluation criteria. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) Advances in Knowledge Discovery and Data Mining, 12th Pacific-Asia Conference, PAKDD 2008 Osaka, May 2008 Proceedings. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence), vol. 5012, pp. 1006–1014. Springer, Berlin/Heidelberg (2008). doi:10.1007/978-3-540-68125-0_106
Strapparava, C., Mihalcea, R.: Learning to identify emotions in text. In: Wainwright, R.L., Haddad, H. (eds.) Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), pp. 1556–1560. ACM, New York (2008). doi:10.1145/1363686.1364052
Strapparava, C., Valitutti, A.: WordNet-Affect: an affective extension of WordNet. In: Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R. (eds.) Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC’04), pp. 1083–1086. European Language Resources Association, Paris (2004)
Thelwall, M., Wilkinson, D.: Public dialogs in social network sites: What is their purpose? J. Am. Soc. Inf. Sci. Technol. 61 (2), 392–404 (2010). doi:10.1002/asi.21241
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 61 (12), 2544–2558 (2010). doi:10.1002/asi.21416
Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment in Twitter events. J. Am. Soc. Inf. Sci. Technol. 62 (2), 406–418 (2011). doi:10.1002/asi.21462
Whitelaw, C., Garg, N., Argamon, S.: Using appraisal groups for sentiment analysis. In: Herzog, O., Scheck, H.J., Fuhr, N., Chowdhury, A., Teiken, W. (eds.) Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 625–631. ACM, New York (2005). doi:10.1145/1099554.1099714
Wiebe, J.M., Bruce, R.F., O’Hara, T.P.: Development and use of a gold-standard data set for subjectivity classifications. In: Dale, R., Church, K.W. (eds.) Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pp. 246–253. Association for Computational Linguistics, Stroudsburg (1999). doi:10.3115/1034678.1034721
Witten, I.H., Bell, T.C.: The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Trans. Inf. Theory 37 (4), 1085–1094 (1991). doi:10.1109/18.87000

Acknowledgements

This work was supported by a European Union grant under the 7th Framework Programme, Theme 3: Science of complex systems for socially intelligent ICT. It is part of the CyberEmotions project (contract 231323).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, University of Wolverhampton, Wulfruna Street, Wolverhampton, WV1 1LY, UK
Georgios Paltoglou & Mike Thelwall

Corresponding author

Correspondence to Georgios Paltoglou.

Editor information

Editors and Affiliations

Faculty of Physics, Warsaw University of Technology, Warsaw, Poland
Janusz A. Holyst

About this chapter

Cite this chapter

Paltoglou, G., Thelwall, M. (2017). Sensing Social Media: A Range of Approaches for Sentiment Analysis. In: Holyst, J. (ed.) Cyberemotions. Understanding Complex Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-43639-5_6

DOI: https://doi.org/10.1007/978-3-319-43639-5_6
Published: 25 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43637-1
Online ISBN: 978-3-319-43639-5
eBook Packages: Physics and Astronomy (R0)
