Cluster Computing, Volume 22, Supplement 3, pp 7149–7164

Opinion mining on large scale data using sentiment analysis and k-means clustering

  • Sumbal Riaz
  • Mehvish Fatima
  • M. Kamran
  • M. Wasif Nisar


With the rapid growth of web technology and easy access to the internet, online shopping has increased. People now express their opinions and share their experiences online, which greatly influences new buyers' purchasing decisions and generates large data sets. These data are very helpful for analyzing customers' preferences, needs, and behavior toward a product, but companies face the challenge of analyzing this sheer volume of data to extract customer opinion. To address this challenge, in this paper we performed phrase-level sentiment analysis on real-world customer review data to find out customer preference by analyzing subjective expressions. We then calculated the strength of each sentiment word to find the intensity of each expression, and applied clustering to place the words in clusters based on their intensity. We also compared the results of our technique with the star ranking given on the same dataset and found that our results differ drastically from it. Finally, we provide a visual representation of our results that gives clear insight into customer preference and behavior and supports better decision making.
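The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sentiment strength is computed, so the `STRENGTHS` scores below are invented purely for illustration, and the helper names `kmeans_1d` and `cluster_words` are hypothetical. The sketch only shows how a plain k-means (Lloyd's algorithm) might group words by a single intensity value.

```python
# Hypothetical sentiment strengths in [-1, 1]; these numbers are invented for
# illustration and are NOT taken from the paper or from any real lexicon.
STRENGTHS = {
    "excellent": 0.9, "great": 0.8, "good": 0.6, "okay": 0.1,
    "poor": -0.5, "bad": -0.6, "terrible": -0.9,
}


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's k-means on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    def assign(cents):
        # Each value goes to the nearest centroid (lowest index wins a tie).
        return [min(range(k), key=lambda c: abs(v - cents[c])) for v in values]

    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assign(centroids)


def cluster_words(strengths, k=3):
    """Group words by intensity; clusters are ordered from lowest to highest centroid."""
    words = list(strengths)
    centroids, labels = kmeans_1d([strengths[w] for w in words], k)
    order = sorted(range(k), key=lambda c: centroids[c])
    return [sorted(w for w, lab in zip(words, labels) if lab == c) for c in order]


for group in cluster_words(STRENGTHS, k=3):
    print(group)
```

With three clusters, the groups can be read roughly as negative, neutral, and positive intensity bands. On real review data one would more likely use a library implementation of k-means and a principled choice of `k`; the pure-Python version here only keeps the example self-contained.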


Heterogeneous data processing · Imbalanced learning · Intelligent computing


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Computer Science, COMSATS Institute of Information Technology, Wah Cantt, Pakistan
