Opinion-based entity ranking

Abstract

The deployment of Web 2.0 technologies has led to rapid growth of opinions and reviews on the web, such as product reviews and opinions about people. Such content can help people find interesting entities like products, businesses and people based on their individual preferences or tradeoffs. Most existing work on leveraging opinionated content has focused on integrating and summarizing opinions on entities to help users better digest them. In this paper, we propose a different way of leveraging opinionated content: directly ranking entities based on a user’s preferences. Our idea is to represent each entity with the text of all the reviews of that entity. Given a user’s keyword query that expresses the desired features of an entity, we can then rank all the candidate entities based on how well the opinions on these entities match the user’s preferences. We study several methods for solving this problem, including both standard text retrieval models and some extensions of these models. Experimental results on ranking entities based on opinions in two different domains (hotels and cars) show that the proposed extensions are effective and improve ranking accuracy over the standard text retrieval models for this task.
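The core idea in the abstract (one pseudo-document per entity, built from all of its reviews, ranked against a keyword query) can be illustrated with a minimal sketch. It assumes Okapi BM25 as the standard text retrieval model; the abstract does not say which models were used, and this is not the authors' implementation or any of their proposed extensions. The function names, the tokenizer, the parameter defaults (`k1`, `b`) and the hotel reviews are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_entity_docs(reviews_by_entity):
    """Represent each entity by the concatenated tokens of all its reviews."""
    return {
        entity: [tok for review in reviews for tok in tokenize(review)]
        for entity, reviews in reviews_by_entity.items()
    }


def bm25_rank(query, entity_docs, k1=1.2, b=0.75):
    """Rank entities by the BM25 score of their review text against a keyword query."""
    n = len(entity_docs)
    if n == 0:
        return []
    avg_len = sum(len(doc) for doc in entity_docs.values()) / n
    term_counts = {entity: Counter(doc) for entity, doc in entity_docs.items()}
    query_terms = tokenize(query)
    # Document frequency: number of entities whose reviews mention the term.
    df = {
        term: sum(1 for counts in term_counts.values() if term in counts)
        for term in set(query_terms)
    }
    scores = {}
    for entity, counts in term_counts.items():
        doc_len = len(entity_docs[entity])
        score = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * doc_len / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[entity] = score
    # Highest score first; ties keep insertion order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reviews, invented purely for illustration.
    reviews = {
        "Hotel A": ["clean rooms and friendly staff", "very clean, great location"],
        "Hotel B": ["noisy street, rude staff", "room was dirty"],
        "Hotel C": ["friendly staff but small rooms"],
    }
    docs = build_entity_docs(reviews)
    for entity, score in bm25_rank("clean friendly staff", docs):
        print(f"{entity}: {score:.3f}")
```

In this toy setup the query expresses the desired features ("clean", "friendly staff"), and entities whose reviews mention those features more often, relative to review length, rank higher. Plain keyword matching of this kind ignores sentiment polarity and synonyms, so a review saying "not clean" would still count as a match.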

Notes

  1. http://dir.yahoo.com/.
  2. http://www.amazon.com.
  3. http://www.tripadvisor.com.
  4. http://www.mediapost.com/publications/.
  5. http://www.edmunds.com/.
  6. http://www.tripadvisor.com/.
  7. Thesaurus.com.
  8. http://www.google.com/products.
  9. http://www.bing.com.
  10. http://www.yahoo.com.
  11. http://www.google.com.
  12. http://www.independent.co.uk/life-style/motoring/motoring-news/japanese-cars-are-still-the-most-reliable-2016405.html.
  13. http://www.cnn.com.
  14. http://www.bbc.com.
  15. http://www.amazon.com.
  16. http://www.bestbuy.com.
  17. http://www.walmart.com.

Acknowledgments

We would like to thank the anonymous reviewers for their useful comments which have helped improve the evaluation part of the work. This paper is based upon work supported in part by an Alfred P. Sloan Research Fellowship, an AFOSR MURI Grant FA9550-08-1-0265, and by the National Science Foundation under grants IIS-0713581, CNS-0834709, and CNS-1028381.

Author information

Correspondence to Kavita Ganesan.

About this article

Cite this article

Ganesan, K., Zhai, C. Opinion-based entity ranking. Inf Retrieval 15, 116–150 (2012). https://doi.org/10.1007/s10791-011-9174-8

Keywords

  • Opinion matching
  • Entity oriented search
  • Preference based entity search
  • Product search
  • Vertical search
  • Ad-hoc faceted navigation