The deployment of Web 2.0 technologies has led to a rapid growth of opinions and reviews on the web, such as reviews of products and opinions about people. Such content can help people find entities of interest, such as products, businesses, and people, according to their individual preferences or tradeoffs. Most existing work on leveraging opinionated content has focused on integrating and summarizing opinions on entities to help users better digest all the opinions. In this paper, we propose a different way of leveraging opinionated content: directly ranking entities based on a user's preferences. Our idea is to represent each entity with the text of all the reviews of that entity. Given a user's keyword query that expresses the desired features of an entity, we can then rank all the candidate entities based on how well the opinions on these entities match the user's preferences. We study several methods for solving this problem, including both standard text retrieval models and some extensions of these models. Experimental results on ranking entities based on opinions in two different domains (hotels and cars) show that the proposed extensions are effective and improve ranking accuracy over the standard text retrieval models for this task.
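The core idea can be illustrated with one standard retrieval model. Each entity is represented by the pooled text of its reviews, and entities are ranked by how well that text matches a keyword query. The sketch below is a minimal illustration and makes several assumptions not taken from the paper. It uses Okapi BM25 as the baseline model, with typical parameter values (k1=1.2, b=0.75), and a naive tokenizer. The function names and toy hotel reviews are hypothetical. The paper's proposed extensions are not shown.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Naive tokenizer (an assumption): split on whitespace, strip punctuation, lowercase.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def build_entity_docs(reviews):
    """Pool all reviews of each entity into one bag-of-words pseudo-document."""
    docs = {}
    for entity, text in reviews:
        docs.setdefault(entity, Counter()).update(tokenize(text))
    return docs

def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Rank entities by the Okapi BM25 score of their pooled reviews against the query.

    BM25 stands in here for "a standard text retrieval model"; k1 and b are
    illustrative defaults, not values reported in the paper.
    """
    if not docs:
        return []
    n = len(docs)
    lengths = {e: sum(tf.values()) for e, tf in docs.items()}
    avg_len = sum(lengths.values()) / n
    q_terms = tokenize(query)
    # Document frequency: number of entity pseudo-documents containing each query term.
    df = {t: sum(1 for tf in docs.values() if tf[t] > 0) for t in set(q_terms)}
    scores = {}
    for entity, tf in docs.items():
        score = 0.0
        for t in q_terms:
            f = tf[t]  # Counter returns 0 for absent terms without inserting them
            if f == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = f + k1 * (1 - b + b * lengths[entity] / avg_len)
            score += idf * f * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: two hotels, two reviews each.
reviews = [
    ("Hotel A", "Very clean rooms and friendly staff."),
    ("Hotel A", "Clean, quiet, great location."),
    ("Hotel B", "Noisy rooms but cheap."),
    ("Hotel B", "Staff was rude and the room was dirty."),
]
docs = build_entity_docs(reviews)
ranking = bm25_rank("clean quiet", docs)
print(ranking[0][0])  # prints "Hotel A"
```

Pooling reviews per entity is what turns entity ranking into ordinary document ranking. Any document-level model, such as a smoothed query-likelihood language model, could be substituted for `bm25_rank` without changing how the entity documents are built.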
We would like to thank the anonymous reviewers for their useful comments, which helped improve the evaluation part of the work. This paper is based upon work supported in part by an Alfred P. Sloan Research Fellowship, an AFOSR MURI Grant FA9550-08-1-0265, and by the National Science Foundation under grants IIS-0713581, CNS-0834709, and CNS-1028381.
Cite this article
Ganesan, K., Zhai, C. Opinion-based entity ranking. Inf Retrieval 15, 116–150 (2012). https://doi.org/10.1007/s10791-011-9174-8
Keywords
- Opinion matching
- Entity oriented search
- Preference based entity search
- Product search
- Vertical search
- Ad-hoc faceted navigation