Combining similarity and sentiment in opinion mining for product recommendation

Abstract

In the world of recommender systems, so-called content-based methods are an important approach that relies on the availability of detailed product or item descriptions to drive the recommendation process. For example, recommendations can be generated for a target user by selecting unseen products that are similar to the products that the target user has liked or purchased in the past. To do this, content-based methods must be able to compute the similarity between pairs of products (unseen products and liked products, for example), and typically this is achieved by comparing product features or other descriptive elements. The approach works well when product descriptions are readily available and when they are detailed enough to afford an effective similarity comparison. But this is not always the case. Detailed product descriptions may not be available since they can be expensive to create and maintain. In this article we consider another source of product descriptions in the form of the user-generated reviews that frequently accompany products on the web. We ask whether it is possible to mine these reviews, unstructured and noisy as they are, to produce useful product descriptions that can be used in a recommendation system. In particular, we describe a novel approach to product recommendation that harnesses not only the features that can be mined from user-generated reviews but also the expressions of sentiment that are associated with these features. We present a recommendation ranking strategy that combines similarity and sentiment to suggest products that are similar but superior to a query product according to the opinion of reviewers, and we demonstrate the practical benefits of this approach across a variety of Amazon product domains.

Notes

  1. The sentiment lexicon from Hu and Liu (2004a) contains lists of 2,009 positive words and 4,783 negative words. There are no weights assigned to words in the sentiment lexicon; rather, all words are considered to equally reflect either positive or negative sentiment.

  2. The range of both B1 and B2 is [-1, +1]. Since the range of Sim(Q,C) is [0, +1], Sent(Q,C) is normalised to [0, +1] in (8).

  3. In this case, related products are those as suggested by Amazon’s “Customers who viewed this item also viewed these items” approach to recommendation.

  4. All terms in sentences are first converted to lowercase, stop words are removed and the remaining terms are stemmed to their root form.

  5. http://glaros.dtc.umn.edu/gkhome/views/cluto

  6. http://www.tweetfeel.com/
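
The normalisation mentioned in Note 2 can be illustrated with a small sketch. Equation (8) itself is not reproduced on this page, so the code below only assumes one plausible form: a linear rescaling of Sent(Q,C) from [-1, +1] to [0, +1], followed by a weighted sum with Sim(Q,C). The weight `w` and the function names `normalise_sentiment`, `combined_score` and `rank_candidates` are hypothetical and introduced purely for illustration; they are not taken from the article.

```python
from typing import Iterable, List, Tuple


def normalise_sentiment(sent: float) -> float:
    """Linearly rescale a sentiment score from [-1, +1] to [0, +1]."""
    if not -1.0 <= sent <= 1.0:
        raise ValueError("sentiment score must lie in [-1, +1]")
    return (sent + 1.0) / 2.0


def combined_score(sim: float, sent: float, w: float = 0.5) -> float:
    """Blend similarity in [0, +1] with rescaled sentiment.

    ASSUMPTION: a weighted sum with weight w in [0, 1]; the exact form of
    equation (8) is not given in this excerpt.
    """
    if not 0.0 <= sim <= 1.0:
        raise ValueError("similarity must lie in [0, +1]")
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - w) * sim + w * normalise_sentiment(sent)


def rank_candidates(
    candidates: Iterable[Tuple[str, float, float]], w: float = 0.5
) -> List[Tuple[str, float, float]]:
    """Sort (name, sim, sent) triples by combined score, best first."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], w),
        reverse=True,
    )
```

Because both components share the range [0, +1] after rescaling, neither dominates the sum merely through its scale; under this assumed form, `w` alone controls the trade-off between being similar to the query product and being better regarded by reviewers.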

References

  1. Archak, N., Ghose, A., & Ipeirotis, P.G. (2011). Deriving the pricing power of product features by mining consumer reviews. Management Science, 57(8), 1485–1509.

  2. Baccianella, S., Esuli, A., & Sebastiani, F. (2009). Multi-facet rating of product reviews. In Advances in Information Retrieval, 31th European Conference on Information Retrieval Research (ECIR 2009) (pp. 461–472). Toulouse, France: Springer.

  3. Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., & Goldstein, G. (2011). Identifying and following expert investors in stock microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11 (pp. 1310–1319). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=2145432.2145569.

  4. Boiy, E., & Moens, M.F. (2009). A machine learning approach to sentiment analysis in multilingual web texts. Information Retrieval, 12(5), 526–558.

  5. Bridge, D., Göker, M.H., McGinty, L., & Smyth, B. (2005). Case-based recommender systems. Knowledge Engineering Review, 20(03), 315–320.

  6. Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4), 331–370. doi:10.1023/A:1021240730564.

  7. Burke, R., Hammond, K., & Young, B. (1997). The FindMe approach to assisted browsing. IEEE Expert, 12(4), 32–40. doi:10.1109/64.608186.

  8. Dasgupta, S., & Ng, V. (2009). Mine the easy, classify the hard: A semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, ACL ’09 (pp. 701–709). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1690219.1690244.

  9. De Francisci Morales, G., Gionis, A., & Lucchese, C. (2012). From chatter to headlines: Harnessing the real-time web for personalized news recommendation. In Proceedings of the fifth ACM International Conference on Web Search and Data Mining, WSDM ’12, pp. 153–162. NY, USA: ACM. doi:10.1145/2124295.2124315.

  10. Desrosiers, C., & Karypis, G. (2011). A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook (pp. 107–144): Springer.

  11. Ding, X., Liu, B., & Yu, P.S. (2008). A holistic lexicon-based approach to opinion mining. In Proceedings of the 1st ACM International Conference on Web Search and Data Mining (pp. 231–240): ACM.

  12. Dong, R., O’Mahony, M.P, Schaal, M., McCarthy, K., & Smyth, B. (2013). Sentimental product recommendation. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13. (pp. 411–414). NY, USA: ACM. doi:10.1145/2507157.2507199.

  13. Dong, R., O’Mahony, M.P., & Smyth, B. (2014). Further experiments in opinionated product recommendation. In Proceedings of the 22nd International Conference on Case-Based Reasoning, ICCBR ’14 (pp. 110–124): Springer.

  14. Dong, R., Schaal, M., O’Mahony, M.P., McCarthy, K., & Smyth, B. (2013). Opinionated product recommendation. In Proceedings of the 21st International Conference on Case-Based Reasoning, ICCBR ’13 (pp. 44–58). Heidelberg: Springer.

  15. Dong, R., Schaal, M., O’Mahony, M.P., & Smyth, B. (2013). Topic extraction from online reviews for classification and recommendation. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI ’13. Menlo Park, California: AAAI Press.

  16. Dooms, S., De Pessemier, T., & Martens, L. (2013). Movietweetings: a movie rating dataset collected from twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys, Vol. 13.

  17. Feldman, R., Rosenfeld, B., Bar-Haim, R., & Fresko, M. (2011). The stock sonar sentiment analysis of stocks based on a hybrid approach. In Proceedings of the 23rd IAAI Conference.

  18. Garcia Esparza, S., O’Mahony, M.P., & Smyth, B. (2010). On the real-time web as a source of recommendation knowledge. In Proceedings of the fourth ACM Conference on Recommender Systems, RecSys ’10. (pp. 305–308). NY, USA: ACM. doi:10.1145/1864708.1864773.

  19. Herlocker, J.L., Konstan, J.A., & Riedl, J. (2000). Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, CSCW ’00. (pp. 241–250). NY, USA: ACM. doi:10.1145/358916.358995.

  20. Hsu, C.F., Khabiri, E., & Caverlee, J. (2009). Ranking comments on the social web. In Proceedings of the 2009 IEEE International Conference on Social Computing (SocialCom-09) (pp. 90–97). Vancouver, Canada.

  21. Hu, M., & Liu, B. (2004a). Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04. (pp. 168–177). NY, USA: ACM. doi:10.1145/1014052.1014073.

  22. Hu, M., & Liu, B. (2004b). Mining opinion features in customer reviews. In Proceedings of the 19th National Conference on Artificial Intelligence, AAAI ’04. (pp. 755–760): AAAI Press. http://dl.acm.org/citation.cfm?id=1597148.1597269.

  23. Huang, J., Etzioni, O., Zettlemoyer, L., Clark, K., & Lee, C. (2012). Revminer: An extractive interface for navigating reviews on a smartphone. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, UIST ’12. (pp. 3–12). NY, USA: ACM. doi:10.1145/2380116.2380120.

  24. Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent twitter sentiment classification (pp. 151–160): ACL.

  25. Justeson, J.S., & Katz, S.M. (1995). Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1(1), 9–27.

  26. Kim, H.D, & Zhai, C. (2009). Generating comparative summaries of contradictory opinions in text. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09. (pp. 385–394). NY, USA: ACM. doi:10.1145/1645953.1646004.

  27. Kim, S.M., Pantel, P., Chklovski, T., & Pennacchiotti, M. (2006). Automatically assessing review helpfulness. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2006) (pp. 423–430). Sydney, Australia.

  28. Kim, S.M., Pantel, P., Chklovski, T., & Pennacchiotti, M. (2006). Automatically assessing review helpfulness. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP ’06. (pp. 423–430). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1610075.1610135.

  29. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

  30. Kruskal, W.H., & Wallis, W.A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.

  31. Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

  32. Liu, B., Hu, M., & Cheng, J. (2005). Opinion observer: Analyzing and comparing opinions on the web. In Proceedings of the 14th International Conference on World Wide Web, WWW ’05. (pp. 342–351). NY, USA: ACM. doi:10.1145/1060745.1060797.

  33. Liu, J., Cao, Y., Lin, C.Y., Huang, Y., & Zhou, M. (2007). Low-quality product review detection in opinion summarization. In EMNLP-CoNLL (pp. 334–342).

  34. Liu, Y., Huang, X., An, A., & Yu, X. (2008). Modeling and predicting the helpfulness of online reviews. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008) (pp. 443–452). Pisa, Italy: IEEE Computer Society.

  35. Lops, P., De Gemmis, M., & Semeraro, G. (2011). Content-based recommender systems: State of the art and trends. In Recommender Systems Handbook (pp. 73–105): Springer.

  36. Manning, C.D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.

  37. McGlohon, M., Glance, N.S., & Reiter, Z. (2010). Star quality: Aggregating reviews to rank products and merchants. In Proceedings of 4th International AAAI Conference on Weblogs and Social Media, ICWSM ’10.

  38. Mishne, G. (2006). Multiple ranking strategies for opinion retrieval in blogs. In Online Proceedings of TREC: Citeseer.

  39. Moghaddam, S., & Ester, M. (2010). Opinion digger: An unsupervised opinion miner from unstructured product reviews. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10. (pp. 1825–1828). NY, USA: ACM. doi:10.1145/1871437.1871739.

  40. Na, S.H., Lee, Y., Nam, S.H., & Lee, J.H. (2009). Improving opinion retrieval based on query-specific sentiment lexicon. In Advances in Information Retrieval (pp. 734–738): Springer.

  41. Nigam, K., & Hurst, M. (2004). Towards a robust metric of opinion. In AAAI Spring Symposium on Exploring Attitude and Affect in Text (pp. 598–603).

  42. O’Mahony, M.P., & Smyth, B. (2009). Learning to recommend helpful hotel reviews. In Proceedings of the 3rd ACM Conference on Recommender Systems, RecSys ’09. NY, USA.

  43. O’Mahony, M.P., & Smyth, B. (2010). A classification-based review recommender. Knowledge-Based Systems, 23(4), 323–329.

  44. Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the 2nd ACL Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP ’02. (pp. 79–86). PA, USA: Association for Computational Linguistics. doi:10.3115/1118693.1118704.

  45. Paul, M.J., Zhai, C., & Girju, R. (2010). Summarizing contrastive viewpoints in opinionated text. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10. (pp. 66–76). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1870658.1870665.

  46. Pazzani, M., & Billsus, D. (2007). Content-based recommendation systems. In Brusilovsky, P., Kobsa, A., & Nejdl, W. (Eds.) The Adaptive Web, Lecture Notes in Computer Science. (Vol. 4321 pp. 325–341): Springer Berlin Heidelberg. doi:10.1007/978-3-540-72079-9_10.

  47. Pazzani, M., & Billsus, D. (2007). Content-based recommendation systems. In The Adaptive Web, Lecture Notes in Computer Science, (Vol. 4321 pp. 325–341): Springer Berlin Heidelberg.

  48. Phelan, O., McCarthy, K., & Smyth, B. (2009). Using twitter to recommend real-time topical news. In Proceedings of the 3rd ACM conference on Recommender systems (pp. 385–388): ACM.

  49. Poirier, D., Tellier, I., Fessant, F., & Schluth, J. (2010). Towards text-based recommendations. In Adaptivity, Personalization and Fusion of Heterogeneous Information, RIAO ’10 (pp. 136–137). Paris, France. http://dl.acm.org/citation.cfm?id=1937055.1937089.

  50. Popescu, A.M., & Etzioni, O. (2007). Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining (pp. 9–28). London: Springer.

  51. Qiu, G., Liu, B., Bu, J., & Chen, C. (2009). Expanding domain sentiment lexicon through double propagation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI ’09, (Vol. 9 pp. 1199–1204).

  52. Reilly, J., McCarthy, K., McGinty, L., & Smyth, B. (2004). Dynamic critiquing. Advances in Case-Based Reasoning, 37–50.

  53. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94. (pp. 175–186). NY, USA: ACM. doi:10.1145/192844.192905.

  54. Reyes, A., & Rosso, P. (2012). Making objective decisions from subjective data: Detecting irony in customer reviews. Decision Support Systems, 53(4), 754–760.

  55. Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01. (pp. 285–295). NY, USA: ACM. doi:10.1145/371920.372071.

  56. Shardanand, U., & Maes, P. (1995). Social information filtering: algorithms for automating “word of mouth”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 210–217): ACM Press/Addison-Wesley Publishing Co.

  57. Smyth, B. (2007). Case-based recommendation. In Brusilovsky, P., Kobsa, A., & Nejdl, W. (Eds.) The Adaptive Web, Lecture Notes in Computer Science. (Vol. 4321 pp. 342–376): Springer Berlin Heidelberg. doi:10.1007/978-3-540-72079-9_11.

  58. Tata, S., & Di Eugenio, B. (2010). Generating fine-grained reviews of songs from album reviews. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10 (pp. 1376–1385). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1858681.1858821.

  59. Tintarev, N., & Masthoff, J. (2007). Effective explanations of recommendations: User-centered design. In Proceedings of the 1st ACM Conference on Recommender Systems, RecSys ’07. (pp. 153–156). NY, USA: ACM. doi:10.1145/1297231.1297259.

  60. Tsur, O., Davidov, D., & Rappoport, A. (2010). Icwsm - a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Proceedings of the International AAAI Conference on Weblogs and Social Media. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1495/1851.

  61. Tumasjan, A., Sprenger, T.O., Sandner, P.G., & Welpe, I.M. (2010). Predicting elections with twitter: What 140 characters reveal about political sentiment. In Proceedings of 4th International AAAI Conference on Weblogs and Social Media, ICWSM ’10.

  62. Turney, P.D. (2002). Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02 (pp. 417–424). PA, USA: Association for Computational Linguistics. doi:10.3115/1073083.1073153.

  63. Wiebe, J.M., Bruce, R.F., & O’Hara, T.P. (1999). Development and use of a gold-standard data set for subjectivity classifications. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, ACL ’99. PA, USA: Association for Computational Linguistics. doi:10.3115/1034678.1034721.

  64. Yessenalina, A., Yue, Y., & Cardie, C. (2010). Multi-level structured models for document-level sentiment classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10 (pp. 1046–1056). PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1870658.1870760.

  65. Zhai, Z., Liu, B., Xu, H., & Jia, P. (2011). Clustering product features for opinion mining. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM ’11. (pp. 347–354). NY, USA: ACM. doi:10.1145/1935826.1935884.

  66. Zhang, K., Narayanan, R., & Choudhary, A. (2010). Voice of the customers: Mining online customer reviews for product feature-based ranking. In Proceedings of the 3rd Workshop on Online Social Networks, WOSN ’10. CA, USA. http://dl.acm.org/citation.cfm?id=1863190.1863201.

  67. Zhang, M., & Ye, X. (2008). A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 411–418): ACM.

  68. Zhang, W., Jia, L., Yu, C., & Meng, W. (2008). Improve the effectiveness of the opinion retrieval and opinion polarity classification. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 1415–1416): ACM.

  69. Zhang, Z., & Varadarajan, B. (2006). Utility scoring of product reviews. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM ’06. (pp. 51–57). NY, USA: ACM. doi:10.1145/1183614.1183626.

  70. Zhuang, L., Jing, F., & Zhu, X.Y. (2006). Movie review mining and summarization. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM ’06. (pp. 43–50). NY, USA: ACM. doi:10.1145/1183614.1183625.

Author information

Correspondence to Ruihai Dong.

Additional information

This work is supported by Science Foundation Ireland under grant 07/CE/I1147. The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant Number SFI/12/RC/2289.

About this article

Cite this article

Dong, R., O’Mahony, M.P., Schaal, M. et al. Combining similarity and sentiment in opinion mining for product recommendation. J Intell Inf Syst 46, 285–312 (2016). https://doi.org/10.1007/s10844-015-0379-y

Keywords

  • User-generated Reviews
  • Opinion Mining
  • Sentiment-based Product Recommendation