You Are What You Tweet: A New Hybrid Model for Sentiment Analysis

Huang, Arthur; Ebert, David; Rider, Parker

doi:10.1007/978-3-319-62416-7_29


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

Abstract

The rise of social media has provided new opportunities to study human emotions through self-reported information such as text, emojis/emoticons, and geo-locations. Research has shown that hybrid models that integrate lexicons with machine learning methods can improve the accuracy of sentiment prediction. We propose the Normalized Difference Sentiment Index (NDSI) to identify frequently occurring words that are predictive of positive or negative sentiment. Furthermore, we propose e-senti, a new hybrid model that feeds three attributes (lexicons, a new NDSI word rank list, and tweet features) into a random forest classifier. We contribute to the methodology of sentiment analysis by introducing a model that is easy to implement, efficient, and accurate. We compare four widely used lexicons and find the AFINN lexicon to be the most effective and efficient for our model. We test the e-senti model on the sentiment140 data and on tweets from Los Angeles County, California. Our results show a maximum accuracy of 86.1% on the sentiment140 data and 74.6% on our Los Angeles County data, outperforming most existing methods. Our future work will link the geo-tagged sentiment data to land use data to reveal how emotions and the built environment are connected.
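
The abstract names the NDSI but does not give its formula. As a rough illustration only, the sketch below assumes the index follows the usual normalized-difference pattern, (p - n) / (p + n), where p and n are a word's counts in positive and negative training tweets. The function names, the `min_count` frequency cutoff, and the `alpha` smoothing term are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ndsi_scores(pos_tweets, neg_tweets, min_count=2, alpha=0.0):
    """Assumed normalized-difference score per word (not the paper's exact formula).

    score(w) = (p - n) / (p + n + alpha), where p and n are the counts of w
    in positive and negative tweets. Words seen fewer than min_count times
    in total are skipped, so only frequently occurring words are scored.
    """
    # Naive whitespace tokenization; a real pipeline would clean tweets first.
    pos_counts = Counter(w for t in pos_tweets for w in t.lower().split())
    neg_counts = Counter(w for t in neg_tweets for w in t.lower().split())

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        if p + n < min_count:
            continue
        scores[word] = (p - n) / (p + n + alpha)
    return scores


def rank_words(scores):
    """Order words by absolute score (most predictive first), ties alphabetical."""
    return sorted(scores, key=lambda w: (-abs(scores[w]), w))


# Toy data, invented for illustration.
pos = ["love this sunny day", "love the beach", "great day"]
neg = ["hate this traffic", "hate the rain", "bad day"]

scores = ndsi_scores(pos, neg)
ranking = rank_words(scores)
```

Under this assumed definition, scores near +1 or -1 mark words that occur almost exclusively in one class, while scores near 0 mark words that carry little sentiment signal. A ranked list of this kind could then serve as one of the inputs to a classifier alongside lexicon scores and tweet features.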

Author information

Authors and Affiliations

Department of Engineering and Computer Science, Tarleton State University, Stephenville, TX, 76402, USA
Arthur Huang, David Ebert & Parker Rider

Corresponding author

Correspondence to Arthur Huang.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Huang, A., Ebert, D., Rider, P. (2017). You Are What You Tweet: A New Hybrid Model for Sentiment Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_29

DOI: https://doi.org/10.1007/978-3-319-62416-7_29
Published: 02 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)