You Are What You Tweet: A New Hybrid Model for Sentiment Analysis

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Abstract

The rise of social media has provided new opportunities to study human emotions through self-reported information such as text, emojis/emoticons, and geo-locations. Research has shown that hybrid models, which integrate lexicons with machine learning methods, can improve the accuracy of sentiment prediction. We propose the Normalized Difference Sentiment Index (NDSI) to identify frequently occurring words that are predictive of positive or negative sentiment. We further propose e-senti, a new hybrid model that feeds three groups of attributes (lexicon scores, a new NDSI word rank list, and tweet features) into a random forest classifier. We contribute to the methodology of sentiment analysis by introducing a model that is easy to implement, efficient, and accurate. We compare four widely used lexicons and find the AFINN lexicon to be the most effective and efficient for our model. We test the e-senti model on the sentiment140 data and on tweets from Los Angeles County, California. The model reaches a maximum accuracy of 86.1% on the sentiment140 data and 74.6% on our Los Angeles County data, outperforming most existing methods. Our future work will link the geo-tagged sentiment data to land use data to reveal how emotions and the built environment are connected.
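The abstract does not state the NDSI formula, so the following is only a minimal sketch of how a normalized-difference word score could be computed. It assumes the usual normalized-difference form, (positive count - negative count) / (positive count + negative count), which may differ from the definition used in the paper. The function names `ndsi_scores` and `rank_words`, the whitespace tokenization, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def ndsi_scores(pos_tweets, neg_tweets, min_count=2):
    """Score words by a normalized difference of their class frequencies.

    ASSUMPTION: the score is (pos - neg) / (pos + neg), ranging from -1
    (word seen only in negative tweets) to +1 (only in positive tweets).
    The paper's exact NDSI definition is not given in the abstract.
    """
    # Naive lowercase whitespace tokenization, for illustration only.
    pos = Counter(w for t in pos_tweets for w in t.lower().split())
    neg = Counter(w for t in neg_tweets for w in t.lower().split())

    scores = {}
    for word in pos.keys() | neg.keys():
        total = pos[word] + neg[word]
        if total >= min_count:  # keep only frequently occurring words
            scores[word] = (pos[word] - neg[word]) / total
    return scores


def rank_words(scores):
    """Order words from most to least polarized (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-abs(scores[w]), w))


pos_tweets = ["love this sunny day", "love the beach"]
neg_tweets = ["hate this traffic", "hate the rain today"]
scores = ndsi_scores(pos_tweets, neg_tweets)
ranking = rank_words(scores)
```

In this toy example, words that occur in only one class ("love", "hate") receive scores of +1 and -1 and rise to the top of the ranking, while words spread evenly across both classes ("this", "the") score 0. A ranked list of this kind could then serve as one group of input features for a classifier such as a random forest.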

Corresponding author

Correspondence to Arthur Huang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Huang, A., Ebert, D., Rider, P. (2017). You Are What You Tweet: A New Hybrid Model for Sentiment Analysis. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_29

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science (R0)
