Information Retrieval, Volume 15, Issue 2, pp 116–150

Opinion-based entity ranking


Abstract

The deployment of Web 2.0 technologies has led to the rapid growth of opinions and reviews on the web, such as reviews of products and opinions about people. Such content can be very useful for helping people find interesting entities like products, businesses and people based on their individual preferences or tradeoffs. Most existing work on leveraging opinionated content has focused on integrating and summarizing opinions on entities to help users better digest all the opinions. In this paper, we propose a different way of leveraging opinionated content, by directly ranking entities based on a user’s preferences. Our idea is to represent each entity with the text of all the reviews of that entity. Given a user’s keyword query that expresses the desired features of an entity, we can then rank all the candidate entities based on how well opinions on these entities match the user’s preferences. We study several methods for solving this problem, including both standard text retrieval models and some extensions of these models. Experimental results on ranking entities based on opinions in two different domains (hotels and cars) show that the proposed extensions are effective and improve ranking accuracy over the standard text retrieval models for this task.
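To make the ranking idea concrete, the following is a minimal sketch, not the paper’s actual system: each entity is represented by the concatenated text of its reviews, and entities are ranked by a Dirichlet-smoothed query-likelihood score against the user’s keyword query, one of the standard retrieval models this line of work builds on. The entity names, toy reviews, and the smoothing parameter mu are illustrative assumptions.

```python
# Sketch: opinion-based entity ranking with a query-likelihood language model.
# Each entity is one "document" formed by concatenating all of its reviews.
import math
from collections import Counter

def rank_entities(reviews_by_entity, query, mu=2000.0):
    """Rank entities by log P(query | entity's review text), Dirichlet-smoothed."""
    # Represent each entity as a bag of words over all of its reviews.
    entity_tf = {e: Counter(" ".join(revs).lower().split())
                 for e, revs in reviews_by_entity.items()}
    # Collection statistics used for smoothing.
    coll_tf = Counter()
    for tf in entity_tf.values():
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())

    scores = {}
    for entity, tf in entity_tf.items():
        doc_len = sum(tf.values())
        score = 0.0
        for term in query.lower().split():
            p_coll = coll_tf[term] / coll_len if coll_len else 0.0
            if p_coll == 0.0:
                continue  # term unseen in the collection: ignored in this sketch
            p = (tf[term] + mu * p_coll) / (doc_len + mu)
            score += math.log(p)
        scores[entity] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: rank hotels by how well their reviews match a preference query.
reviews = {
    "hotel_a": ["clean rooms and friendly staff", "great location near downtown"],
    "hotel_b": ["noisy at night", "rooms were small but clean"],
}
print(rank_entities(reviews, "clean quiet rooms"))
```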

Keywords

Opinion matching · Entity-oriented search · Preference-based entity search · Product search · Vertical search · Ad-hoc faceted navigation


Acknowledgments

We would like to thank the anonymous reviewers for their useful comments, which have helped improve the evaluation part of this work. This paper is based upon work supported in part by an Alfred P. Sloan Research Fellowship, an AFOSR MURI Grant FA9550-08-1-0265, and by the National Science Foundation under grants IIS-0713581, CNS-0834709, and CNS-1028381.


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
