Electronic Commerce Research, Volume 18, Issue 1, pp 181–199

A model for sentiment and emotion analysis of unstructured social media text

  • Jitendra Kumar Rout
  • Kim-Kwang Raymond Choo
  • Amiya Kumar Dash
  • Sambit Bakshi
  • Sanjay Kumar Jena
  • Karen L. Williams


Sentiment analysis has applications in diverse contexts, such as gathering and analyzing individuals' opinions about products, issues, and social and political events. Understanding public opinion can help improve decision making. Opinion mining is a way of retrieving such information from search engines, blogs, microblogs and social networks. Opinions are unique to each person, and Twitter tweets are an invaluable source of this type of data. However, the huge volume and unstructured nature of text/opinion data make it challenging to analyze efficiently. Accordingly, efficient algorithms and computational strategies are required for mining and condensing tweets and for finding sentiment-bearing words. Most existing computational methods in the literature for identifying sentiment in such unstructured data rely on machine learning techniques built on the bag-of-words approach. In this work, we use both unsupervised and supervised approaches on various datasets. The unsupervised approach is used to automatically identify the sentiment of tweets acquired from the Twitter public domain. Machine learning algorithms such as Multinomial Naive Bayes (MNB), Maximum Entropy and Support Vector Machines are applied to identify the sentiment of tweets and to examine the effectiveness of various feature combinations. In our experiment on tweets, the proposed unsupervised approach achieves an accuracy of 80.68%, compared with 75.20% for the lexicon-based approach. In our experiments, the supervised approach that combines unigram, bigram and Part-of-Speech features is efficient in finding the emotion and sentiment of unstructured data. For short message services, using the unigram feature with the MNB classifier yields an accuracy of 67%.
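As an illustration of the unigram-plus-MNB setup mentioned above, the following is a minimal sketch of a Multinomial Naive Bayes classifier over unigram (bag-of-words) counts with Laplace smoothing. It is not the authors' implementation: the class name, the whitespace tokenizer, and the four training sentences are hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase whitespace tokenisation (a deliberately simple assumption)."""
    return text.lower().split()


class MultinomialNB:
    """Toy Multinomial Naive Bayes over unigram counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1.0 corresponds to Laplace (add-one) smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Smoothed denominator: class token total + alpha * vocabulary size
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in unigrams(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from the paper's datasets.
train_texts = [
    "love this phone",
    "great service and great price",
    "hate the delay",
    "terrible support bad experience",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = MultinomialNB(alpha=1.0).fit(train_texts, train_labels)
```

With smoothing, a word that never occurred in a class still receives a small non-zero probability, so a single unseen word does not drive the class score to negative infinity. Adding bigram or Part-of-Speech features would, under the same assumptions, amount to extending `unigrams` to emit those extra tokens.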


Sentiment analysis, Bag-of-words, Lexicon, Laplace smoothing, Parts-of-Speech (POS), Machine learning



This research is partially supported by the following projects: (1) Information Security Education & Awareness Project (Phase II), Ministry of Electronics and Information Technology (MeitY), Government of India, and (2) Grant No. ETI/359/2014 by Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (FIST) Program 2016, Department of Science and Technology, Government of India.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Jitendra Kumar Rout (1)
  • Kim-Kwang Raymond Choo (2)
  • Amiya Kumar Dash (1)
  • Sambit Bakshi (1)
  • Sanjay Kumar Jena (1)
  • Karen L. Williams (2)

  1. Department of Computer Science, National Institute of Technology, Rourkela, India
  2. Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, USA