1 Introduction

The era of Social Computing has kindled massive growth of opinions and reviews on the web, including reviews of businesses and products as well as opinions about people. Consider movie reviews alone: the number of movie review sites listed in Yahoo's directory is nearing two hundred. This number does not even include the growing number of blogs or social networking sites where people can freely express opinions about movies.

The vast amount of opinions expressed by experts and ordinary users can be very useful in helping people make all kinds of decisions, ranging from what to buy to what treatment to choose for a disease. For example, shoppers at Amazon typically read the reviews of a product before buying it, and travelers may rely on opinions about hotels on Tripadvisor to help them choose an appropriate hotel at their destination. It has been shown that 77% of online shoppers use reviews and ratings when making a purchase decision.

Unfortunately, the abundance of opinions also poses challenges in digesting all the opinions about an entity or a topic. For example, a popular product such as the iPhone may have hundreds of reviews on Amazon.com, and popular hotels like Marriott or Hilton may have over five hundred reviews on Tripadvisor. Thus, the task of developing computational techniques to help users digest and exploit all the opinions is a very important and interesting research challenge.

Most existing work on tackling this general challenge has focused on integrating and summarizing opinions to help users better digest all the opinions (see Sect. 2 for a detailed review of related work). In this paper, we propose a different way of leveraging opinionated content: directly ranking entities based on how well the opinions on these entities match a user's preferences. Since a user is often interested in choosing an entity based on the opinions about it, ranking entities in this way provides more direct support for the user's decision-making task. For example, the decision-making task of a user shopping for a product is to decide which product out of the many to buy. Thus, it would be very helpful if we could take a keyword query expressing the user's preferences (e.g., "comfortable seats, cheap and reliable" for a car) and return a ranked list of cars in order of the likelihood that a car matches the user's preferences. With such a capability, the user is no longer overwhelmed by all the reviews available on all cars; instead, the user can analyze a much smaller set of cars that roughly matches his/her preferences based on the judgment of other users. Further, this type of ranking is flexible in that it can be applied to any entity for which opinionated content is available.

To rank entities in this way, our idea is to represent each entity with the text of all the reviews of that particular entity, often available from various websites. Given a user's keyword query that expresses the desired features of an entity, we can then rank the relevant entities based on how well their reviews match the user's preferences. An ideal setup for an Opinion-Based Entity Ranking system is shown in Fig. 1, where the user can freely express preferences as a natural keyword query.

Fig. 1 An ideal opinion-based entity ranking system that accepts keyword preferences as a natural keyword query

It is natural for a user to specify preferences on various aspects of an entity in the envisioned entity ranking task. Thus we can expect a user's query to consist of preferences on multiple aspects; for example, a preference query for a car might be "good gas mileage, cheap, reliable", which consists of preferences on three different aspects (i.e., efficiency, price, and reliability). In general, if a user enters a query in a single query box, we would need to parse the query to obtain preferences on different aspects. In this paper, we focus on studying the effectiveness of different ranking methods, so we assume that the multiple aspects in a user's query have already been segmented, in order to factor out the influence of query segmentation on retrieval accuracy. Such a segmented preference query can also be obtained naturally by providing a multi-aspect query form or by asking the user to use a delimiter (e.g., a comma) to separate multiple preferences. For example, in Fig. 2, we show a system interface where users can find hotels in any city by stating their preferences on the various aspects of hotels.

Fig. 2 One scenario of opinion-based entity ranking applications where keyword preferences are expressed on a set of aspects

Although this ranking problem closely resembles an information retrieval problem where the reviews of an entity can be regarded as an “entity document,” there are two important differences. First, the query is meant to express a user’s preferences in keywords; thus it is expected to be longer than regular keyword queries on the Web. More importantly, the query generally would contain preferences on multiple aspects of an entity. As we will show later in the paper, modeling these aspects can improve ranking accuracy. Second, the ranking criteria are to capture how well an entity satisfies a user’s preferences rather than the relevance of a document to a query as in the case of regular retrieval. Therefore, the matching of opinionated words or sentiment would be important. We will show that although traditional query expansion works reasonably well in some cases, expanding a query with similar opinion words can significantly improve ranking accuracy on different types of data.

In addition to studying the effectiveness of standard text retrieval models for this task, we further propose several extensions of these models to better solve this special ranking problem. Specifically, we propose two heuristics: (1) query aspect modeling, where we use each query aspect to rank entities and then aggregate the ranked results from the multiple aspects of the query; and (2) opinion expansion, where we expand a query with related opinion words found in an online thesaurus. Our approach is lightweight, scalable and flexible, as we avoid the need for costly information extraction and data mining.

Evaluation of this ranking task is a challenge since no existing test collection can be used for evaluation. We created the first benchmark data set for this task by leveraging existing rating information. While it is not hard to collect reviews for different entities, it is a significant challenge to obtain reasonable queries and to evaluate ranking accuracy quantitatively. We solve this problem by leveraging the ratings of different aspects of cars and hotels available on Edmunds.com and Tripadvisor.com, and we created two data sets that serve as a gold standard for quantitative evaluation. The data sets are available at http://sifaka.cs.uiuc.edu/ir/downloads.html.

Experimental results on these two data sets show that the proposed extensions over standard retrieval models are effective for the task of opinion-based entity ranking. The focused expansion technique (i.e., opinion expansion) is shown to be particularly effective. Modeling the aspects in a user's query, as opposed to treating the query as one long keyword query, is also beneficial, especially for longer queries with more aspects.

We also conducted a small-scale user study to further evaluate the effectiveness of the proposed methods, and the results confirm that the proposed methods can return useful and meaningful ranking lists of entities based on keyword preferences.

2 Related work

To the best of our knowledge, no previous study has leveraged opinionated content to rank entities the way we have proposed. However, there are several lines of related work which we briefly describe in this section.

2.1 Sentiment analysis

Sentiment analysis involves classifying opinions in text into categories such as "positive" or "negative", often with an implicit "neutral" category. Methods in this line of work can be categorized as supervised (requiring labeled training data) (Dave et al. 2003; Gamon 2004; Pang and Lee 2004; Pang et al. 2002), unsupervised (relying on a lexicon and external knowledge) (Nasukawa and Yi 2003; Turney and Littman 2003), or hybrid approaches (Pang and Lee 2005; Prabowo and Thelwall 2009). While sentiment analysis provides a means to generate polarity ratings at different levels of granularity (document, sentence or phrase), it does not provide direct support for matching a user's preference on an aspect with the polarity ratings on the aspect of interest. Moreover, since these ratings are categorical, it would be ineffective to rank entities based on whether their aspects are "positive" or "negative".

2.2 Rating prediction and decomposition

In recent years, there has been work on decomposing reviews to make aspect-based rating predictions (Lu et al. 2009; Wang et al. 2010; Snyder and Barzilay 2007). This line of work is closely related to ours since, once we obtain ratings on different aspects, we would be able to rank entities based on their ratings on the aspects of interest to a user. This approach, however, has some practical limitations. First, these approaches assume a fixed number of aspects for a given entity. Not only is it impractical to define or mine a set of aspects for each category of entities (e.g., politicians: approval rating, character; laptops: battery life, screen), but a fixed number of aspects would also severely limit the type of queries a user could issue. More importantly, all the work in this line requires some supervision in that it relies on the availability of ratings associated with reviews, which may not always be present. We take a more general stance: we assume limited knowledge about the opinions and the aspects being queried and focus on leveraging robust retrieval models to match a user's preferences for an entity with the opinions on that entity.

2.3 Expert finding

Another relevant area of research is expert finding. Expert finding is about finding people rather than documents, and the goal is to retrieve a ranked list of experts with expertise on a given topic (Balog et al. 2009; Fang and Zhai 2007; Krulwich and Burkey 1996). The techniques used range from standard retrieval methods (Krulwich and Burkey 1996), such as the vector space model, to state-of-the-art techniques (Balog et al. 2009; Fang and Zhai 2007) that use probabilistic and language modeling approaches. Although our work is conceptually related in that we use information about an entity to rank entities, unlike expert finding, we can rank any type of entity for which opinionated content is available. Also, instead of ranking entities based on how well they match a topic, we focus on ranking entities based on how well a user's preferences are matched by the opinions on each entity.

2.4 Opinion retrieval

Opinion retrieval was first explored in the TREC Enterprise Track (on email search). The goal of opinion retrieval is to locate documents (primarily blog posts) that have opinionated content. The idea here is to test the ability to find opinion expressing posts as this is essential in specialized searches like blog search. An opinion retrieval system (He et al. 2008; Yang et al. 2007) is usually built on top of standard retrieval models where relevant content is first retrieved, and then opinion analysis is done on the retrieved content to return only opinionated documents. In contrast, our idea assumes that we already have the opinionated content for a given category of entities (e.g. reviews for all hotels in San Francisco). The goal is thus to rank the entities in the order of likelihood that the entity matches the user’s preferences.

2.5 Multifaceted search

Multifaceted search is highly related to our general goal. Faceted search, also called faceted navigation or faceted browsing, allows users to explore and find the information they need by filtering or navigating with the help of some pre-determined facets (Tunkelang 2009). The users often provide a very general query (some systems do not support queries at all), and then use the various facets to navigate through the results until the items of interest are found. In other words, the goal is to connect users to the items that are of most interest to them. While our goal is similar, the paradigm is different. First, in our setup, users find entities based on unstructured text containing opinions of other users rather than the structured or categorical data often used in faceted navigation. In addition, our focus is more on keyword queries that allow users to specify their interest in various facets. For example, a user who is looking for a laptop with specific criteria would provide a query such as 'Lenovo, very light, bright screen'. In such a query, the facets are implicit; in this case the facets being queried are brand, weight and screen. In traditional faceted navigation, these facets are explicitly defined and are usually fixed. Thus, our idea can be considered an ad hoc or personalized faceted navigation (Koren et al. 2008) system. Our idea can also be combined with traditional faceted navigation to provide a powerful search system that can greatly improve user productivity.

3 Methods for opinion-based entity ranking

In this section, we present several methods for ranking entities based on how well the opinions on them match a user's preferences, including both standard retrieval models, which we treat as baselines, and some extensions of these models that we propose. To facilitate the discussion, we first introduce some notation. Let \(E=\{e_1,{\ldots},e_n\}\) be a set of entities to be ranked. For each entity \(e_i\), we assume that we can collect a set of review documents \(R_i=\{r_{i1},{\ldots},r_{in_i}\}\) that contain the opinions about the entity expressed by users or reviewers, where \(r_{ij}\) is a review document. Let \(D_i\) be the concatenation of all the review documents of an entity \(e_i\). For convenience, we call \(D_i\) the opinion document for entity \(e_i\). To solve the entity ranking problem, we cast it as a text retrieval problem where the text collection \({{\mathcal{C}}}\) consists of all the opinion documents for all the entities. That is, \({{\mathcal{C}}=\{D_1, {\ldots}, D_n\}. }\)
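
For concreteness, the following Python sketch illustrates this formulation: each entity's reviews are concatenated into an opinion document, and the opinion documents form the collection \({{\mathcal{C}}}\). It is only an illustration of the setup described above; the function and variable names are ours and not part of any system described in this paper.

```python
from collections import Counter

def build_collection(reviews_by_entity):
    """reviews_by_entity: dict mapping an entity id to a list of review strings.
    Returns the collection C: entity id -> opinion document D_i (concatenated reviews)."""
    collection = {}
    for entity_id, reviews in reviews_by_entity.items():
        collection[entity_id] = " ".join(reviews)  # opinion document D_i
    return collection

def term_counts(text):
    """Crude whitespace tokenization into lowercase term counts, i.e. c(t, D)."""
    return Counter(text.lower().split())
```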

From a user’s perspective, the easiest way to express preferences for an entity would be to use keywords to describe desirable properties in various aspects. For example, a query for cars may look like “good gas mileage, small size, reliable.” We denote such a keyword query by Q. On the surface, our problem is very similar to a regular retrieval problem. However, as discussed in Sect. 1, there are some important differences, which we will leverage to extend a regular retrieval model to improve ranking accuracy. In particular, our queries semantically consist of a set of sub-queries each describing preferences for one separate aspect of an entity, and we will show that it is indeed beneficial to model these semantic aspects. We will also show that emphasizing matching of opinion words through opinion expansion is very effective because it captures the desired matching criteria of relevance better for this ranking task. We now present three baseline standard retrieval models and then we present the two extensions mentioned.

3.1 Standard retrieval models

By casting the entity ranking problem as a problem of preference matching, we can directly use any standard retrieval model to solve it. Here we present the three state-of-the-art standard retrieval models that we experiment with; they are known to be among the most effective (Amati and van Rijsbergen 2002; Fang et al. 2004) for the task of text retrieval.

3.1.1 BM25 (Okapi)

The BM25 (or Okapi) retrieval function was proposed by Robertson et al. (1994) and has been shown to be quite effective and robust for many tasks. Although it was derived based on probabilistic models, it can also be regarded as a variant of the popular vector space model since it provides a term frequency-inverse document frequency (TF-IDF) weighting-based ranking formula. Formally, the score of an opinion document D in collection \({{\mathcal{C}}}\) (with n documents) and a query Q is given by:

$$ \begin{aligned} S_{BM25}(D,Q) & = \sum_{t \in Q \cap D} \frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)} \times \frac{k_1 c(t,D)}{c(t,D) + k_1 (1-b+b{\ast} |D|/|\widetilde{{\bf D}}|)} \\ & \quad \times \log \frac{n+1}{n_t} \end{aligned} $$

where c(t, D) and c(t, Q) are the counts of term t in document D and query Q, respectively, |D| is the length of document D, \(|\widetilde{{\bf D}}|\) is the average document length in the collection, \(n_t\) is the number of documents containing term t, and b, \(k_1\), and \(k_3\) are parameters that are typically set as b = 0.75, \(k_1\) between 1.0 and 2.0, and \(k_3\) between 0 and 1000. We replaced the IDF in the original Okapi formula with the normal IDF because the original one can produce negative weights (Fang et al. 2004) and also performed significantly worse than the normal one in our experiments.
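
A small Python sketch of this scoring function is shown below. It is an illustration only, not Terrier's implementation; the inputs (query_counts, doc_counts, doc_freq) are assumed precomputed term-count maps, and the default parameter values follow the settings reported in Sect. 5.2.1.

```python
import math

def bm25_score(query_counts, doc_counts, doc_len, avg_doc_len, doc_freq, n_docs,
               k1=1.2, b=0.75, k3=8.0):
    """Sketch of the BM25 formula above; doc_freq maps a term to the number of
    opinion documents containing it (n_t), n_docs is the collection size n."""
    score = 0.0
    for t, qtf in query_counts.items():
        tf = doc_counts.get(t, 0)
        if tf == 0 or t not in doc_freq:
            continue                                  # sum runs over t in Q and D
        query_part = (k3 + 1) * qtf / (k3 + qtf)
        doc_part = k1 * tf / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        idf = math.log((n_docs + 1) / doc_freq[t])    # "normal" IDF, avoids negative weights
        score += query_part * doc_part * idf
    return score
```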

3.1.2 Dirichlet prior

The Dirichlet prior retrieval function is one of the most effective language models for retrieval (Zhai and Lafferty 2004). It is derived based on query likelihood scoring (Ponte and Croft 1998) and Bayesian estimation of document language model (Lafferty and Zhai 2001), but its weighting formula also resembles TF-IDF weighting and document length normalization. Formally, the score of document D and query Q is:

$$ S_{Dir}(D,Q)=\sum_{t \in Q \cap D} c(t,Q) \log \left(1+ \frac{c(t,D)}{\mu p(t|{{\mathcal{C}}})}\right) + |Q| \log \frac{\mu}{\mu+|D|} $$

where the notations are as in Okapi, \({p(t|{\mathcal{C}})}\) is the probability of term t according to a background collection language model, and μ is a smoothing parameter to be empirically set.
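
A corresponding sketch of the Dirichlet prior score, again purely illustrative: coll_prob is an assumed precomputed map from a term t to \({p(t|{\mathcal{C}})}\), the background collection language model.

```python
import math

def dirichlet_score(query_counts, doc_counts, doc_len, coll_prob, mu=1000.0):
    """Sketch of the Dirichlet prior formula above; mu=1000 follows Sect. 5.2.1."""
    score = 0.0
    query_len = sum(query_counts.values())            # |Q|
    for t, qtf in query_counts.items():
        tf = doc_counts.get(t, 0)
        if tf > 0 and coll_prob.get(t, 0.0) > 0:
            score += qtf * math.log(1 + tf / (mu * coll_prob[t]))
    score += query_len * math.log(mu / (mu + doc_len))  # length normalization term
    return score
```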

3.1.3 PL2

PL2 is one of the most effective functions in the family of divergence from randomness retrieval (DFR) models (Amati and van Rijsbergen 2002). Its scoring formula is based on basic statistics similar to those used in other retrieval functions and is formally defined as:

$$ \begin{aligned} S_{PL2}(D,Q) & = \sum\nolimits_{t \in Q \cap D} c(t,Q)\\ &\quad \times \frac{ tfn_t^{D} \cdot \log_2(tfn_t^{D} \cdot \lambda_t) + \log_2e \cdot \left(\frac{1}{\lambda_t}-tfn_t^D\right) + 0.5 \cdot \log_2(2\pi \cdot tfn_t^D)}{tfn_t^D+1} \end{aligned} $$

where \({tfn_t^D = c(t,D) + \log_2\left(1 + c \cdot \frac{|\widetilde{{\bf D}}|}{|D|}\right), \lambda_t=\frac{n}{c(t,{\mathcal{C}})}}\) (\({c(t,{\mathcal{C}})}\) is the count of term t in the collection \({{\mathcal{C}}}\)) and c > 0 is a retrieval parameter.
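
The sketch below transcribes the PL2 formula exactly as displayed above (including the term frequency normalization as written); the default c = 1000 follows the experimental setup in Sect. 5.2.1, and the input maps are assumed precomputed.

```python
import math

def pl2_score(query_counts, doc_counts, doc_len, avg_doc_len, coll_counts, n_docs, c=1000.0):
    """Illustrative transcription of the PL2 formula; coll_counts maps a term t to c(t, C)."""
    log2 = lambda x: math.log(x, 2)
    score = 0.0
    for t, qtf in query_counts.items():
        tf = doc_counts.get(t, 0)
        if tf == 0 or coll_counts.get(t, 0) == 0:
            continue
        tfn = tf + log2(1 + c * avg_doc_len / doc_len)   # tfn_t^D as written above
        lam = n_docs / coll_counts[t]                    # lambda_t = n / c(t, C)
        inner = (tfn * log2(tfn * lam)
                 + log2(math.e) * (1.0 / lam - tfn)
                 + 0.5 * log2(2 * math.pi * tfn))
        score += qtf * inner / (tfn + 1)
    return score
```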

All three standard retrieval models have corresponding pseudo feedback methods that treat some top-ranked documents in an initial retrieval result as if they were relevant and extract additional terms from them to expand the query. Since we use the Terrier (Ounis et al. 2006) toolkit for our experiments, we leverage the pseudo feedback mechanism implemented in this toolkit. Terrier provides various DFR (Amati and van Rijsbergen 2002) based term weighting models for query expansion. We specifically use the Bose-Einstein 1 (Bo1) model, which is based on Bose-Einstein statistics (Hannah and Macdonald 2007) and is similar to Rocchio (Salton and Buckley 1997).

Although standard retrieval models can be used to solve the opinion-based entity ranking problem, they do not consider the multiple aspects in the query, nor do they consider the special notion of "relevance" involved in matching an opinion document with a query. Below, we present two extensions of a standard retrieval function: one models query aspects and the other expands a query with opinion words.

3.2 Query aspect modeling (QAM)

In our setup, we assume that separate query fields are provided for each aspect, so the query naturally consists of multiple aspects. However, a standard retrieval model would not distinguish these multiple aspects; as a result, an entity may be scored high just because it matches one of the many aspects extremely well. Thus, one way to improve a standard retrieval function is to use each aspect query to score an opinion document (equivalently, an entity) and then combine the entity's scores over all the query aspects. This way, we can ensure that an entity matches all the aspects. Another potential advantage of modeling aspects in a query, though not explored in this paper, is the ability to add expansion terms that are relevant to a specific aspect. For example, consider a two-aspect query consisting of 'good gas mileage' and 'extremely comfy'. If we distinguish the aspects of this query, then for 'good gas mileage' terms like 'mpg', 'mileage' and 'fuel' can potentially be added. However, if we treat the user's preferences as one long query without distinguishing aspects, we have to be very careful about the type of terms added, as we may end up retrieving items that are better in one aspect than in the others.

While we have assumed separate query fields for different aspects, the aspects in a query can also be obtained explicitly by asking a user to use a special delimiter such as a comma to separate multiple aspects. These aspect queries can also be obtained from a regular keyword query using query parsing or segmentation techniques as shown in the work of Tan and Peng (2008). Thus, by capturing multiple aspects in the query, we may now denote a query with \(Q=\{Q_1, {\ldots}, Q_k\}\) where k ≥ 1 and \(Q_i\) is a keyword query for an aspect of the entity, which we will refer to as an aspect query.

We now present several methods for leveraging this aspect structure. Let S(D, Q) be any retrieval function. We can use the function to compute a score for each document with respect to each aspect query \(Q_i\) (i.e., \(S(D,Q_i)\)), and then combine the scores to generate an overall score for each document. Depending on how we combine the scores, we have several variants of this query aspect modeling (QAM) strategy. In particular, we can either combine the scores directly or combine the ranks of documents according to their scores in each query aspect. Moreover, we can also use different ways to aggregate the scores or ranks. In our experiments, we tested the following QAM scoring methods:

  • Average Score: \(S_{AvgScore}(D,Q) = \frac{1}{k} \sum_{i=1}^k S(D,Q_i)\)

  • Average Rank: \(S_{AvgRank}(D,Q) = \frac{1}{k} \sum_{i=1}^k Rank(D,Q_i)\)

  • Median Rank: \(S_{MedRank}(D,Q) = Median_{i \in [1,k]} Rank(D,Q_i)\)

  • Min Rank: \(S_{MinRank}(D,Q)= Min_{i \in [1,k]} Rank(D,Q_i)\)

  • Max Rank: \(S_{MaxRank}(D,Q)= Max_{i \in [1,k]} Rank(D,Q_i)\)

Here, \(Rank(D,Q_i)\) refers to the rank of document D in the ranked list of documents for aspect query \(Q_i\). Note that we did not consider other variations of score combination because of the concern that scores of a document in different aspects may not be comparable.
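
The sketch below illustrates how these aggregation strategies can be layered on top of any retrieval function. The names (score_fn, entities, the method strings) are illustrative placeholders, not part of an existing toolkit.

```python
import statistics

def qam_rank(entities, aspect_queries, score_fn, method="avg_score"):
    """Query aspect modeling sketch: score each entity per aspect query with an
    arbitrary retrieval function score_fn(entity, aspect_query), then aggregate."""
    per_aspect = {e: [score_fn(e, q) for q in aspect_queries] for e in entities}
    if method == "avg_score":
        agg = {e: sum(s) / len(s) for e, s in per_aspect.items()}
        return sorted(entities, key=lambda e: agg[e], reverse=True)  # higher is better
    # Rank-based aggregation: convert each aspect's scores into ranks first.
    ranks = {e: [] for e in entities}
    for i in range(len(aspect_queries)):
        ordered = sorted(entities, key=lambda e: per_aspect[e][i], reverse=True)
        for r, e in enumerate(ordered, start=1):
            ranks[e].append(r)
    combine = {"avg_rank": lambda rs: sum(rs) / len(rs),
               "median_rank": statistics.median,
               "min_rank": min,
               "max_rank": max}[method]
    agg = {e: combine(rs) for e, rs in ranks.items()}
    return sorted(entities, key=lambda e: agg[e])  # smaller aggregated rank is better
```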

3.3 Opinion expansion

Another limitation of the standard retrieval models for opinion-based entity ranking is that matching an opinion word and matching an ordinary topic word are not distinguished. Intuitively, since we would like to reward an opinion document where a query aspect is favorably reviewed, it is important to match opinion words in the user’s query. However, since topic words are expected to be much more common in review documents and have less variation than opinion words, we hypothesized that expanding a query with additional “equivalent” opinion words may help in emphasizing the matching of opinion words.

Consider a query like 'fantastic battery life'. Because people express opinions in non-uniform ways, some may say 'awesome battery life' while others may say something brief such as 'good battery' to convey the same sentiment. Therefore, it would be beneficial to expand such a query by adding synonyms of the word fantastic.

We thus propose the following opinion expansion method to expand a query with related opinion words. We use a controlled online dictionary to first extract two classes of words from the query: (1) intensifiers, which are adverbs such as very, really, extremely; and (2) common praise words, which are adjectives such as good, great, fantastic. In the case of intensifiers, we use only words that are neutral, where the orientation of the word depends on the word or phrase that follows. This is to avoid changing the intended orientation of the query. For example, for the query 'extremely comfortable car', related intensifiers such as exaggeratedly and excessively can change the actual meaning of the user's preference, as both of these words have negative connotations. Such words are thus not included in our list or expanded on when opinion expansion is performed. Table 1 shows the complete list of intensifiers and praise words used for opinion expansion.

Table 1 List of praise words and intensifiers used for opinion expansion

For a given query Q, we can add synonyms of query terms to the query to enrich the representation of opinions and accommodate flexible matching of opinions. Formally, let \(t_i\) be a term in a given query Q. Let \(syn_{a_{1}},{\ldots},syn_{a_{35}}\) be the set of synonyms for praise words and \(syn_{b_{1}},{\ldots},syn_{b_{23}}\) be the set of synonyms for intensifier words. If \(t_i\) matches an intensifier term or a praise term, the corresponding synonyms will be appended to the query. Even if there are multiple praise words or intensifiers in a query, the expansion is done only once.
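
The sketch below illustrates this expansion step. The synonym lists shown are small illustrative samples rather than the full lists of Table 1, and applying the expansion once per word class is our reading of the single-expansion rule described above.

```python
# Illustrative (partial) intensifier and praise-word synonym lists; see Table 1 for the full lists.
INTENSIFIERS = {"very": ["really", "extremely", "highly"],
                "really": ["very", "extremely", "highly"]}
PRAISE_WORDS = {"good": ["great", "excellent", "fantastic"],
                "fantastic": ["awesome", "great", "excellent"]}

def opinion_expand(aspect_query):
    """If the aspect query contains a praise word or a neutral intensifier,
    append its synonyms; expansion is applied at most once per word class."""
    terms = aspect_query.lower().split()
    expanded = list(terms)
    added_praise = added_intensifier = False
    for t in terms:
        if t in PRAISE_WORDS and not added_praise:
            expanded += PRAISE_WORDS[t]
            added_praise = True
        elif t in INTENSIFIERS and not added_intensifier:
            expanded += INTENSIFIERS[t]
            added_intensifier = True
    return " ".join(expanded)

# e.g. opinion_expand("fantastic battery life")
# -> "fantastic battery life awesome great excellent"
```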

4 Data set

Since the task of opinion-based entity ranking as we have defined it has not been studied previously, no test collection exists for this task. This makes it a challenge to quantitatively evaluate the proposed methods. In this section, we describe how we address this challenge by creating a benchmark data set from two different domains. While review documents are easy to obtain from the Web, it is unclear how to obtain queries and create a gold standard to quantitatively evaluate the proposed methods for entity ranking. We propose to use seed aspect queries to generate synthetic longer queries and to leverage the available numerical aspect ratings as if they were relevance judgments. We believe that the creation of this first test data set, together with the associated evaluation methodology for ranking entities, is one of the important contributions of this work. The data set is available at http://sifaka.cs.uiuc.edu/ir/downloads.html.

4.1 Review document collection

Our task is to return a set of entities based on how well the user's keyword preferences match the opinions on these entities. Therefore, we need a reasonably sized opinion data set for each entity. Although our idea is to allow the retrieval of any entity with supporting opinions, we chose to limit ourselves to sources that have free-text opinions accompanied by numerical ratings on individual aspects. We restricted our search to such sources to facilitate the evaluation of our task (explained in detail in Sect. 4.3).

With careful analysis, we chose to use reviews from two different domains that represent different types of reviews. The first is car reviews from Edmunds.com and the second is hotel reviews from Tripadvisor.com. Both sources have free-text reviews accompanied by numerical ratings on several aspects (all provided by users).

The nature of car reviews on Edmunds.com differs from hotel reviews on Tripadvisor.com. The hotel reviews are far more verbose than the car reviews. Most reviews on cars are only 4–5 sentences long, while the hotel reviews can span several paragraphs with detailed explanation of the reviewer’s experience. Figure 3 shows an example of a car review from Edmunds.com. The section titled Detailed Ratings provides us with the discrete aspect ratings for each review.

Fig. 3 A sample car review from Edmunds.com

To construct our data set, we collected reviews of cars for model-years 2007, 2008, and 2009 and hotel reviews for hotels in 10 major cities internationally: London, Beijing, Shanghai, Montreal, New Delhi, Dubai, New York City, Chicago, San Francisco and Las Vegas. In creating our data set, we avoided reviews that were too sparse, as there would not be sufficient opinion text to test the effectiveness of a ranking method. Thus, we only considered cars/hotels that had at least 10 reviews.

The accompanying aspect ratings on Edmunds.com cover 8 different aspects, namely fuel economy, comfort, performance, reliability, interior design, exterior design, build and fun to drive; these ratings are on a scale of 1–10. The hotel reviews have 5 aspects: cleanliness, value, service, location and room; these ratings are on a scale of 1–5.

Table 2 provides a summary of the collected data. Columns labeled min and max show the absolute minimum and maximum aspect ratings for a given model-year/city, where the aspect ratings have been averaged across reviews of the same entity. The mean aspect ratings and variance are also shown in this table. Overall, the variance in ratings in both data sets is small.

Table 2 Basic statistics on collected review data used in experiments

4.2 Query generation

The queries expected in an opinion-based entity ranking system are very different from the regular queries one would issue to a typical vertical search engine, such as a product search engine. If a user were looking for a laptop on Google Product Search, the user would typically type short keywords like laptop or dell laptops. Such systems generally return a list of entities without any specific order to start with, allowing the user to narrow down to the items of interest using different filters or through faceted navigation.

In our case, assuming that the type of entity (e.g. people, cars, hotels, restaurants) being searched for is known, users can then state their preferences for that entity using a set of descriptive keywords. These keywords would indicate what the user desires in the different aspects of that entity. For example, for a laptop we can have a query such as `dell, good battery life, bright screen, very portable’. The system would then return a ranked list of entities in the order of likelihood that the entity matches the user’s preferences. Queries issued to a system such as this would thus have two important properties: (1) the query lengths can vary greatly—from short queries like `good battery life’ to longer queries like `excellent battery life, bright screen, lightweight’ and (2) the queries may contain opinion indicating words and intensifiers (e.g. very, extremely, good, super, excellent).

While there are many vertical search systems like Google Product Search, there exists no system that currently takes a set of keyword based preferences as shown in Fig. 1. This makes it hard for us to obtain a natural sample of queries. We thus constructed our test queries from a set of seed queries. Since we expect the user to express his/her preferences on a fixed number of aspects, for the purpose of evaluation, we assume that these aspects would correspond to the aspects that have associated numerical ratings in our data set. We manually obtained a set of seed queries for each of these aspects and then we randomly combined the seed queries from different aspects to form longer multi-aspect queries that we call generated queries.

Specifically, we asked three average users to provide a few queries that they would issue on the various aspects of entities in our data set in order to find those that match their preferences. So, a user who desires a comfortable car with good gas mileage may issue a query such as 'comfortable seats, excellent mpg', where 'comfortable seats' corresponds to the comfort aspect and 'excellent mpg' corresponds to the fuel economy aspect. The user thus specifies both the aspect being queried and the query keywords for that aspect. This simulates the behavior of obtaining queries from a query interface such as the one in Fig. 2. With this, we obtained an average of six seed queries per aspect (5 for hotels and 7 for cars). We ignored one aspect, 'exterior design', as it was not a popular topic of discussion within the car reviews and hence may not help in evaluating retrieval methods that rely on keyword matching. In Table 3, we show the estimated aspect mentions in the car data set. These numbers were obtained by counting the number of times the representative words of each aspect were mentioned.

Table 3 Approximate aspect mentions in the car dataset

Through random combination of seed queries from different aspects, we generated 10,000 queries per data set. These queries are used with the entities in each city (for hotels) and model-year (for cars). The shortest query covers a single aspect and the longest query can touch every aspect of the car/hotel. Each generated query has at most one seed query from a given aspect. Table 4 shows some sample seed queries defined on two different aspects of cars and hotels, and Table 5 shows some sample generated queries for the car data set.
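
A minimal sketch of this generation procedure is shown below; the seed_queries structure, the uniform choice of query length, and the random seed are illustrative assumptions rather than the exact procedure used to build the released data set.

```python
import random

def generate_queries(seed_queries, n_queries=10000, seed=0):
    """seed_queries: dict mapping an aspect name to its list of seed queries.
    Each generated query combines at most one seed query per aspect."""
    rng = random.Random(seed)
    aspects = list(seed_queries)
    generated = []
    for _ in range(n_queries):
        k = rng.randint(1, len(aspects))        # number of aspects this query touches
        chosen = rng.sample(aspects, k)
        generated.append({a: rng.choice(seed_queries[a]) for a in chosen})
    return generated

# e.g. generate_queries({"fuel economy": ["excellent mpg"],
#                        "comfort": ["comfortable seats"]}, n_queries=5)
```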

Table 4 Sample seed queries used to generate longer multi-aspect queries
Table 5 Example of generated queries for the car data set

Since the seed queries were obtained without a real system in place, it is important to ensure that they indeed represent typical user queries in our evaluation domains. Queries submitted to a car or a hotel search engine would not be useful because such systems are typically very structured and have limited support for natural keyword queries. However, users tend to use major search engines such as Bing, Yahoo! and Google as a starting point for many of their search activities. Since the query suggestion feature of search engines is based on what other users have searched for, and the related searches feature is typically mined from query logs (Sadikov et al. 2010), we use both of these features to determine how representative our seed queries are in these two domains.

We append the entity type to each seed query (e.g., 'very clean' + 'hotel' for the cleanliness aspect of hotels) and use that as a query to the major search engines. We then note the related searches and query suggestions for each seed query. We call these the common aspect queries. For example, a query like 'clean hotels' may yield common aspect queries like 'clean hotels in Las Vegas' and 'clean hotels NYC cheap'. With this, we know that the seed query indeed reflects a natural user query. Almost all seed queries (in both domains) returned a set of common aspect queries on the major search engines. Table 6 shows some of the seed queries with corresponding common aspect queries for each aspect in the two domains. The build aspect from the cars domain and the service aspect from the hotels domain are the only ones that had limited or no related queries (in all three search engines). This makes sense as some aspects are relatively more subjective or opinion oriented. For example, it is not very likely that users would search for 'hotels with polite staff' on a major search engine. However, given a system like the one we envision, it would be more likely that such queries would be encountered. Therefore, these seed queries provide a nice mix of what a user typically looks for in these domains and what users could potentially search for in the future given an opinion-based search system.

For further analysis, we looked into the Microsoft Live Labs query logs (released in 2006) to see which aspects of preferences are mentioned most frequently in these two domains. This query log has 15 million queries from US users, sampled over one month. Although this is a relatively small query log, it is sufficient to show some word distributions in these domains. We used the words 'cars' and 'hotels' to retrieve all related queries from the query logs. For each domain, we then collected the counts of terms in these retrieved queries and sorted them in decreasing order of frequency. The top 50 query words related to the purchasing of a car and the top 30 query words related to finding a place to stay are shown in Table 7. We see that all these words can be mapped to the aspects that we considered in generating our queries (the mappings are shown in parentheses in the table). Furthermore, in both domains, most of the aspects that we used for evaluation (i.e., aspects with known ratings from reviewers on Tripadvisor.com and Edmunds.com) were indeed queried by users. The aspects not well covered in these top query words are the fun and comfort aspects for cars and the cleanliness aspect for hotels. We believe that this does not necessarily indicate a lack of interest by users in these aspects; rather, it is likely that users would not expect current search engines to return meaningful results for such aspects, so they would not even try such queries. Overall, the query log analysis indicates that the queries we generated indeed represent typical aspects of preferences that users are interested in when ranking cars and hotels.

Table 6 Seed queries and corresponding related user queries on major search engines like Yahoo!, Bing and Google
Table 7 List of most frequent co-occurring terms in queries “cars” and “hotels” in the Microsoft Live Labs query logs. The table also shows the corresponding aspects of preferences

4.3 Relevance judgments generation

One of the most important tasks in our evaluation is to determine how well the retrieved entities match the user's preferences. Ideally, for a subjective task like this, given a user's preference query, we would need a human judge to read the related reviews and provide a judgment score of how well each retrieved entity matches the user's preferences. This would involve understanding the underlying opinions in the reviews of each retrieved entity for each aspect involved in the user's query. This process is not only time consuming but can also be overwhelming, and it may be hard for human judges to keep track of the 'key opinions'. We thus need a reasonable way to approximate human judgment. To solve this problem, we propose to leverage the existing aspect ratings that come with the user reviews in our two data sets.

Both our data sets come with free-text reviews accompanied by a set of numerical ratings on several aspects. Some of the mentions in the free-text reviews directly reflect the aspect score that an entity receives. Figure 4 shows a car review with corresponding aspect ratings. In this review, there are mentions of the car being 'comfortable and quiet' and, accordingly, a very high score was given to the comfort aspect. There was also a mention of the 'car being not too exciting' and, accordingly, a moderate rating was given to the fun aspect. As in most user reviews, users tend to write about the aspects that stand out most to them, either in a good way or a bad way. In our two data sets, users are also allowed to provide aspect scores that may reflect some of their free-text comments. These aspect scores can thus serve as relevance judgment scores that indicate how well an entity performs on each of its aspects. We believe that this is a good approximation of human judgment. For example, if most users find that a particular car has excellent gas mileage, then the fuel economy aspect would have a high aspect score. At the other extreme, if most mentions of the fuel economy are negative, the score for this aspect would be low. So, if a user is looking for a car with 'very good mpg', then ideally we should return all cars that have very high scores on the fuel economy aspect, or otherwise the system should be penalized. However, such a judgment is based on the average ratings of a group of users, so it may not reflect the real preferences of any particular user. As a result, the evaluation results based on such judgments are only meaningful for relative comparison of different ranking methods, which is our goal.

Fig. 4 A car review with accompanying aspect based score ratings. There are mentions of the car being comfortable and quiet and accordingly a high score was given to the comfort aspect. There is also a mention of the car not being very exciting and as can be observed only a moderate rating was given to the fun aspect

Judgment scores are needed on individual aspects (to evaluate how well an entity matches one query aspect) and also on a combined set of aspects (to assess how well an entity matches the entire query). To compute the judgment score for an individual aspect, we average the ratings provided by the users on that aspect; we call this score the Average Aspect Rating (AAR). For queries that span multiple aspects, we average the AAR scores of the aspects involved; we call this the Multi-Aspect AAR (MAAR). Let \(Q=Q_1,{\ldots},Q_k\) be a query with k aspects and E be an entity. Let \(r_i(E)\) be the AAR of E in aspect i. Then, \(MAAR(E,Q)\) is defined as:

$$ MAAR(E,Q) =\frac{1}{k}\sum\nolimits_{i=1}^kr_i(E) $$

We assume that an ideal ranking of entities for query Q would correspond to ranking the entities in descending order of \(MAAR(E,Q)\), and this enables us to quantify how close a retrieval result is to this ideal ranking.
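
The AAR and MAAR computations can be sketched as follows, assuming each review is represented as a dict of numerical aspect ratings (an illustrative data layout, not the released file format).

```python
def average_aspect_rating(reviews, aspect):
    """AAR: average of the per-review ratings of one aspect for an entity.
    reviews: list of dicts mapping aspect name -> numeric rating."""
    ratings = [r[aspect] for r in reviews if aspect in r]
    return sum(ratings) / len(ratings)

def maar(reviews, query_aspects):
    """MAAR(E, Q): mean of the AARs over the aspects touched by the query."""
    return sum(average_aspect_rating(reviews, a) for a in query_aspects) / len(query_aspects)
```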

5 Experiments

In this section, we describe our experimental setup and present the experiment results on the two test sets.

5.1 Experimental setup

5.1.1 Evaluation measures

Since our gold standard has multiple levels of ratings for a car, we used the Normalized Discounted Cumulative Gain (nDCG) (Järvelin and Kekäläinen 2002) measure as the evaluation metric of our ranking task. In an opinion-based entity ranking system, only the top-k items (k = 10 in our case) that closely match the user’s preferences are deemed critical. Thus, we used nDCG of the top 10 entities (denoted as nDCG@10) as a main measure.

The Discounted Cumulative Gain (DCG) accumulated at a particular rank position p is defined as:

$$ DCG_{p} = MAAR(E_{1},Q) + \sum\nolimits_{i=2}^{p} \frac{MAAR(E_{i},Q)}{\log_{2}i} $$

where \(E_i\) denotes the entity ranked at position i.

To allow the DCG to be comparable across queries and search results, it is normalized by the DCG of the ideal ranking, which is obtained by sorting entities by their MAAR values available from our gold standard. Let the DCG at position p of the ideal ranking be denoted by \(IDCG_p\). The nDCG is then computed as:

$$ nDCG_{p} = \frac{DCG_{p}}{IDCG_{p}} $$
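
A direct transcription of these formulas into Python, for illustration only; gains are the MAAR values of the ranked entities.

```python
import math

def dcg_at_p(gains, p):
    """DCG_p with the top item's gain undiscounted, as in the formula above."""
    top = gains[:p]
    return top[0] + sum(g / math.log2(i) for i, g in enumerate(top[1:], start=2))

def ndcg_at_p(ranked_gains, p=10):
    """nDCG_p = DCG_p / IDCG_p, where IDCG_p uses the ideal (descending MAAR) order."""
    ideal = sorted(ranked_gains, reverse=True)
    idcg = dcg_at_p(ideal, p)
    return dcg_at_p(ranked_gains, p) / idcg if idcg > 0 else 0.0
```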

5.1.2 Data pre-processing

To evaluate the effectiveness of the proposed methods, we retained only the text segments of the reviews, dropping all HTML overhead and numerical ratings. The ratings were removed from our data set so that our experiments are in no way influenced by them. So, in essence, each document in our collection is a concatenation of text based reviews about a car/hotel. The length of each document varies greatly based on the number of reviews and also the size of individual reviews.

5.1.3 Implementation of retrieval methods

We use the three retrieval models (i.e., BM25, language modeling, and PL2) implemented in the Terrier 2.2 (Ounis et al. 2006) toolkit for our experiments. We, however, had to make a few implementation changes to support the Dirichlet prior language model (Zhai and Lafferty 2004) and to fix the IDF problem of the Okapi BM25 model discussed in Fang et al. (2004).

5.2 Experiment results

5.2.1 Standard retrieval models

We first look into the performance of the three state-of-the-art standard text retrieval models. We used the default model parameters for Okapi BM25 (b = 0.75, \(k_3\) = 8, \(k_1\) = 1.2) on both data sets, as varying them did not make much difference in performance. PL2 uses a parameter c that controls term frequency normalization. This value was set to 1000 for both the car and hotel data sets; we varied this value and found that a large value works well for the type of collection that we have. For the language modeling based retrieval, we set μ = 1000 for both data sets, as has been done in some previous work (Zhai and Lafferty 2001), and this value works well in our experiments.

The nDCG values based on 10,000 queries (for each data set), averaged across queries, are reported in Table 8, where, in addition to comparing the three methods, we also compare these methods with the pseudo feedback mechanism explained in Sect. 3. Based on Table 8, we can make several observations. (1) Overall, PL2 appears to be most effective, followed by Dirichlet prior LM and then BM25; interestingly, as we will show later, BM25 performs the best with the proposed extensions. (2) Pseudo feedback consistently helps improve the ranking of hotels but degrades the ranking performance on cars. Since the hotel reviews are much denser, pseudo feedback is effective there because the terms added to expand the query are more meaningful for the ranking process. An analysis of the pseudo feedback on the car data makes it clear why performance is degraded: for the query 'good fuel efficiency', some of the words added are 4cycl, jeep and kia, and these words have no relation to fuel efficiency being good, resulting in the wrong cars being ranked highly. Even though pseudo feedback seems promising for this task, it only helps when the reviews are verbose. We will show later that our proposed opinion expansion is consistently effective and improves performance on both data sets.

Table 8 nDCG@10 using standard (Std and StdNoFb) retrieval models

5.2.2 Opinion expansion

We now look into the question of whether the proposed opinion expansion method helps improve ranking accuracy. To test the idea, we alter a query if it contains a praise word or an intensifier, adding the corresponding opinion synonyms to expand the query (explained in Sect. 3.3). Table 9 shows the results obtained using opinion expansion on top of the standard models and the models that use query aspect modeling (to be discussed in the next section). From this table, it is clear that opinion expansion helps all models generate better rankings of hotels and cars. The performance improvement for BM25 is especially clear: with opinion expansion, BM25 proves to be the most effective of the three retrieval models (we further compare the three retrieval models in Sect. 5.2.4). The Wilcoxon signed rank test (Wilcoxon 1945) shows that all the improvements in Table 9 are statistically significant with a very low p-value (\(p < 10^{-6}\)). This indicates that enriching the opinion words in the query can indeed accommodate the flexible matching of opinions needed for the opinion-based entity ranking task; in contrast, standard pseudo feedback-based query expansion is only effective in some cases (see Table 8). Moreover, the improvements observed with pseudo feedback are not as high as those achieved with opinion expansion.

Table 9 nDCG@10 using opinion expansion

It is possible that the improvement from opinion expansion comes simply from favoring entities with more 'positive' reviews. That is, the system might select entities that are positive overall, which would naturally have higher MAAR scores, thus yielding better nDCG than the baseline method. To analyze the actual behavior, we look into the performance on two subgroups of queries: short queries and long queries. Short queries are those that touch 1–2 aspects, while long queries are those touching 4–5 aspects for hotels and 6–7 aspects for cars. If the system were only picking out entities that are more positive in general, the improvements on shorter queries should be just as high or in fact higher (since they are less affected by score combination across aspect queries). This is however not the case, as can be seen in Fig. 5. The graphs show that the improvements achieved on longer queries are considerably higher than those achieved on shorter queries, which means that the system is not just favoring entities that are simply more positive.

Fig. 5 Performance improvements over the AvgScoreQAM model with the use of opinion expansion for long and short queries. Better improvements are achieved on longer queries than shorter queries

5.2.3 Query aspect modeling

Another extension we proposed is to model the multiple aspects in the query explicitly and then combine the scores from multiple aspects to generate an overall score for a document. We now examine the effectiveness of this extension.

Table 10 summarizes the results obtained with the query aspect modeling approach when the aggregation method is "Average Score" (i.e., \(S_{AvgScore}(D,Q)\)), which, as will be shown later, is the best of the aggregation strategies when used with opinion expansion. From this table, we see that query aspect modeling improves ranking performance on both data sets. Even though opinion expansion significantly improves the performance of the standard method (as shown in Table 9), introducing query aspect modeling provides further improvements. The Wilcoxon signed rank test (Wilcoxon 1945) shows that all the improvements above 0.1% in Table 10 are statistically significant with a very low p-value (\(p < 10^{-6}\)).

Table 10 nDCG@10 using standard models against QAM models

In Fig. 6, we further provide a comparison of performance results using the different ranking strategies. This comparison is essential as the ranking strategy has a direct impact on how the entities are ranked. Based on this graph, we can say that the average score (AvgScore) based strategy works the best on the whole. The use of the actual ranks like AvgRank only works well in some cases as can be seen in the graph.

Fig. 6 nDCG@10 using different ranking strategies with QAM+OpinExp

One advantage of our evaluation method is that we can easily analyze queries with different numbers of aspects. Since this factor is intuitively related to the effectiveness of query aspect modeling, we further looked into how the base method compares to the aspect modeling method on queries with different numbers of aspects.

Users who provide short queries are typically flexible users who have limited preferences. Queries that such users issue could be short, like 'good mpg'. There are also the "picky" or "rich" users who have very specific preferences on many aspects. These users will typically issue long queries like "excellent fuel economy, comfortable interior, solid build, highly reliable". For both data sets, we manually selected some of the shortest queries (covering 1–2 aspects) and some of the longest queries (covering 6–7 aspects for cars and 4–5 aspects for hotels). We compare the performance of the QAM runs with their corresponding standard runs on these queries. The percentage change in performance is shown in Fig. 7.

Fig. 7 Performance change of AvgScoreQAM over StdNoFb and AvgScoreQAM+OpinExp over StdNoFb+OpinExp on queries of different length

On the car data set, aspect modeling of queries consistently yields performance improvements on very short queries; on longer queries, however, improvements can only be seen with the LM and BM25 models. The reverse is the case for hotels: modeling aspects in short queries seems to be effective only with BM25, while on longer queries all three models benefit from query aspect modeling. Overall, QAM proves most beneficial with the BM25 model, with consistent performance improvements on both data sets and for both long and short queries.

5.2.4 Behavior of retrieval models with opinion expansion

While all three retrieval models show performance improvements with the use of opinion expansion, BM25 consistently outperforms its counterparts with this expansion technique. To understand why, we looked deeper into the details of the rankings. Specifically, we compared the three models on two subgroups of queries (short vs. long) and on three subsets of review documents with different sizes. Each city (for hotels) and model-year (for cars) has a set of review documents, where each review document represents a distinct real-world entity. For the purpose of this discussion, we will refer to all review documents in a given city or model-year as a collection. As shown in Table 2, each collection can have a varying number of review documents.

Figure 8 shows the performance of the AvgScoreQAM and AvgScoreQAM+OpinExp models on the hotels data set at different collection sizes for both long queries and short queries. Here, we see that for both types of queries, when no opinion expansion is used, the LM approach is most stable to variation in the collection size, but as the collection size grows, the other two models suffer a degradation in performance. In particular, BM25 is worse than the other two methods in all cases. With the use of opinion expansion, it is interesting that we now see a different pattern: the BM25 model performs the best overall, and in particular, it does much better than the other two models when the collection size is large (i.e., more entities to rank). A similar behavior was also observed with the cars data set. This means that BM25 gains much more than the other two models from opinion expansion.

Fig. 8 Performance of AvgScoreQAM and AvgScoreQAM+OpinExp versus the number of review documents in each city from the hotels data set

Analytically, a major difference between BM25 and the other two models is that BM25 places an upper bound on the score contribution that can be made by each matched query term, no matter how frequently the term occurs in the document (Robertson 2009), while the other two models do not have this property. Intuitively, BM25 would thus favor documents that match more query terms, while the other two models would be more prone to favoring non-relevant documents that match just a few query terms many times. Since opinion expansion introduces many additional opinion and intensifier words, we hypothesize that BM25 gains more from opinion expansion because PL2 and LM cannot properly handle the additional words added to the query, which can occur frequently in the review documents. The mistakes that they make in terms of ranking become far more apparent when the collection size is large. With BM25, however, any one term's contribution to the document score cannot exceed a saturation point.
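
The following toy computation (our illustration, not taken from the paper) shows this saturation property: the BM25 term frequency component approaches an upper bound as the raw frequency grows, so a single over-frequent expansion term such as 'very' cannot dominate the score.

```python
def bm25_tf_component(tf, k1=1.2, b=0.75, doc_len=1000, avg_doc_len=1000):
    """Per-term TF component of the BM25 formula in Sect. 3.1.1 (bounded by k1)."""
    return k1 * tf / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

for tf in (1, 5, 50, 500):
    print(tf, round(bm25_tf_component(tf), 3))  # 0.545, 0.968, 1.172, 1.197 -> approaches k1 = 1.2
```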

To validate this hypothesis, we looked into the result set of a query that yielded high discrepancies in the rankings between the competing paradigms. The query is 'very clean, cozy rooms, excellent staff'. For this query, we took the first-ranked entity of each result set (PL2 and LM ranked the same entity first) and plotted the query terms (after expansion) against the average term frequencies of the query terms in the corresponding entity document. The resulting graph is shown in Fig. 9. The MAAR score of the entity ranked first by PL2 and LM is 4.54 (denoted by A), while that of the entity ranked first by BM25 is 4.83 (denoted by B). The highest MAAR in the gold standard for this query is 4.87.

Fig. 9 Average term frequency of query words of the first ranked entity for the query 'very clean, cozy rooms, excellent staff'. The labeled terms are the original query terms. All other terms are the result of opinion expansion. Note that PL2 and LM ranked the same entity as the first

Figure 9 shows that the top ranked entity by BM25 indeed has a more balanced matching of all query terms, while the top ranked entity by PL2 and LM has more skewed frequencies of query terms. For example, A has a very large number of occurrences of the term ‘very’, while an important original query term ‘cozy’ has a very low average frequency. In contrast, B matches the query terms in a more balanced fashion, where the original query terms (labeled in the graph) and the expanded terms have average frequencies that are not extremely high or extremely low.

Such skewness in the matched query terms becomes a more serious concern after opinion expansion, since an expanded query contains many redundant words, increasing the chance that a non-relevant document dominates the ranking result. Similarly, when the collection size is large, the problem also becomes more serious, as there is a higher chance of encountering such a distracting non-relevant document.

5.2.5 Influence of the availability of review data

One assumption in our problem setup is that we have enough review data to represent the opinions about an entity. We now try to understand how much data we actually need to obtain a reasonable ranking of entities. This will also help us understand whether the proposed extensions can be expected to perform better and better as we accumulate more review data. To this end, we varied the amount of review data by selecting different percentages of the reviews for ranking. We ran the best performing configuration (which is by far the AvgScoreQAM+OpinExp run) on these different amounts of review data.

Figure 10 plots performance against the amount of review data used. Notice that for the hotel data set, performance peaked when only 60–70% of the data was used, after which it degraded slightly. On the car data set, performance kept improving once more than about 60% of the data was used.

Fig. 10

Performance versus % review data used

The quick performance improvement on the hotel data set is likely due to the verbose nature of that data set, whereas for the car data set, which is sparser, almost the entire data set was needed for performance to peak. The trend of the curve suggests that there could be further improvement if more reviews were added. The quality of the reviews used would likely also play a role in how much review data is actually needed for this task.

5.2.6 Sample results

To illustrate some sample results, we show ranked hotels and cars from the two domains. First, we show how a ranked list of hotels changes as aspect queries are added; then we show the top-ranked cars for an interesting query. All results shown were obtained using the AvgScoreQAM+OpinExp configuration.

Table 11 shows the top 10 ranked hotels in Dubai (with their corresponding AAR) that match the query ‘very clean’. Table 12 then shows how this ranked list changes when a new aspect query, ‘great views’, is added to the original query. From Table 11 we can see that, across all hotels in Dubai, the lowest AAR for the cleanliness aspect is 2.71 and the highest is 4.951. The AAR scores of all top 10 hotels matching this query are above the average AAR for this aspect, which shows that users are indeed getting reasonable matches. However, the ordering of these entities is still not perfect; for example, the first-ranked hotel, Hatta Fort Hotel, has an AAR score that is lower than that of Burj Al Arab, the hotel ranked second in this list.

Table 11 Top 10 ranked hotels for the query ‘very clean’
Table 12 Top 10 ranked hotels for the query ‘very clean’ and ‘great views’. This ranking has an nDCG of 0.944

Next, when the new aspect query, ‘great views’, is added to the current query, there is a noticeable change in the ranking of hotels (as shown in Table 12). The Burj Al Arab, which previously ranked second, now ranks first with the addition of this new aspect query. The Le Royal Meridien Beach Resort, which ranked third, now ranks tenth, and the Hatta Fort Hotel, which previously ranked first, is not even in the top 10 of the new ranked list. This is reasonable because the AAR of the Hatta Fort Hotel on the location aspect is only 4.107, compared to 4.745 for the Burj Al Arab. Most entities in this list have AAR scores that are well above the average for their respective aspects.

Below are some interesting review snippets for Burj Al Arab with regard to cleanliness and location.

“The rooms are really huge and spotlessly clean, the gym is state of the art with great sea views from the tread mills and the Spa is fantastic…”

“…The rooms are all suites and very spacious. They are all 2 floors with beautiful views… The rooms are clean and the hotel is well situated.”

“…the hotel itself is just beautiful, and in a lovely location, with fantastic views from all the floor to ceiling windows in our suite (13th fllor) across the marina…”

The second illustration is based on the query ‘very reliable’ on the car data set, a query that most people can relate to. The top 10 cars that match this query are shown in Table 13. As can be seen, the cars returned are mostly Japanese cars, which are known for their reliability. Footnote 12 While these cars have high AAR scores on the reliability aspect, their overall ratings are not necessarily high, which shows that the system is not simply retrieving cars that are rated positively overall. The following snippets show some of the supporting comments for the first-ranked car, the 2007 Honda Accord.

Table 13 Top 10 ranked cars from model-year 2007 that match the query ‘very reliable’. Most cars have AAR scores that are above average

“…Solid, reliable car with low cost of ownership. Nice computerized maintenance notification system. Comfortable heated leather seating…”

“…I had to find something reliable with good resale. This car is incredible…”

“…My experience with this vehicle has been as follows—the engine & transmission provide a smooth, powerful and reliable ride. The suspension is awful though…”

6 User study

We performed a small user study to further understand the effectiveness of our proposed method in retrieving entities and to assess the effectiveness of our evaluation strategy. In this study, we asked users to judge the relevance of entities retrieved by our best-performing system (BM25 with AvgScoreQAM+OpinExp); these relevance judgments were then used for various analyses.

6.1 Procedure

We recruited two undergraduate students (referred to as User1 and User2) who were asked to act as ‘real users’ of a system that enables them to search for entities based on a set of preferences. The users were presented with a query and the corresponding results (i.e., the ranked list of entities that satisfy the query), along with the entities' reviews. They were informed that the query is meant to be a set of user preferences and that the entities presented as results should ideally match these preferences based on the reviews. With this in mind, for each query, the users were asked to analyze the reviews of the top 10 entities and assign each entity a relevance score based on how well it satisfies the query. The judgment uses a 3-point rating scale defined as follows:

  • Score 1: Poor match. The entity does not satisfy the query well.

  • Score 2: Reasonable match. The entity satisfies the query reasonably well.

  • Score 3: Good match. The entity is a very good match for the query.

For each relevance score assigned, the user was also asked to provide a brief justification, for example, ‘does not match most preferences’ for a score of 1 or ‘matches only some preferences really well’ for a score of 2. The study was performed on 25 queries randomly selected from both our car and hotel data sets, with the goal of obtaining a representative set of queries of different characteristics. In total, we had 12 long queries (touching more than 2 aspects) and 13 short queries (touching 1–2 aspects). The entities presented as results were generated by our best-performing system (BM25 with AvgScoreQAM+OpinExp).

6.2 Analysis of relevance ratings

In Table 14 we report the average relevance ratings assigned by User1 and User2. On average, both users judged the entities retrieved by the system to be a reasonable match to the queries. Notice that in the majority of cases, both users considered the entities either a reasonable match (User1: 110 entities; User2: 81 entities) or a good match (User1: 84 entities; User2: 140 entities), rather than a poor match (User1: 56 entities; User2: 29 entities). This shows that our proposed retrieval-based method for this task is quite effective, with an average rating above 2.0.

Table 14 Average user judgment scores

We further examined the entities that were assigned a low score. In Table 15, we summarize the most common justifications provided by User1 and User2 for their ratings. As can be seen, a score of 1 is typically assigned when the reviews contain no mention of one or more preferences in the query. A score of 2 is assigned when (1) there is limited evidence in the reviews about the preferences, (2) only some preferences are matched well, or (3) there are conflicting opinions about a preference. A score of 3 is assigned only when most of the preferences are matched well, with sufficient evidence.

Table 15 Summary of relevance score justification given by User1 and User2

The agreement between the relevance ratings assigned by User1 and User2 is shown in Table 16. As can be seen, the kappa scores show that agreement is quite low, with most of the disagreement occurring when the users had to choose between a rating of 2 and a rating of 3. The disagreement is also higher on longer queries than on shorter ones, possibly because longer queries involve more preference criteria, which amplifies the variance of subjective judgments. The results also suggest that User1 used a different rating strategy than User2, which is also apparent from the justification summary in Table 15.
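As a reference for this kind of agreement analysis, a minimal sketch of unweighted Cohen's kappa is shown below; the exact kappa variant behind Table 16 may differ, and the example ratings are purely illustrative.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two annotators over the same items
    (observed agreement corrected for chance agreement)."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Two raters using the 3-point scale on five entities:
print(cohens_kappa([3, 2, 2, 1, 3], [3, 3, 2, 2, 3]))   # -> 0.33 (rounded)
```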

Table 16 Agreement on relevance ratings between User1 and User2

Deeper analysis of the rating assignments reveals that User1's strategy is to consider both the number of matched aspects and how many people praised the relevant aspect: the user first checks whether all preferences in the query are matched in the reviews, and assigns a rating of 3 only if all preferences are matched and there is ‘enough’ evidence for each of them; otherwise the user assigns a rating of 2. User2's strategy is to look at the bigger picture. On short queries, if all preferences are matched well, a rating of 3 is assigned; if all preferences are matched well but there are some conflicting opinions, a score of 2 or 1 is assigned depending on the severity of the conflict. On longer queries, however, if just one preference is not matched well, the entity is still considered a good match and a score of 3 is assigned; a score of 2 or 1 is assigned only when there are conflicting opinions or more than one preference is not matched well.

These differences are very interesting, as they tell us that different users apply different criteria when judging the relevance of an entity. Some users may prefer entities ranked by the amount of evidence (positive mentions) on an aspect, while others may prefer entities with no conflicting opinions even if not all preferences are matched well. This suggests that the ranking of entities could be further personalized according to what matters most to each user.

While the individual ratings provided by User1 and User2 do not agree all that well, it is quite possible that their relative preferences over entities are correlated. We therefore measured rank correlation using the relevance ratings provided by both users; in particular, we computed the average Gamma correlation coefficient (Siegel and Castellan 1988) between the rankings. The Gamma statistic was preferred over Kendall's τ because it handles ties explicitly, and ties are common in the rankings of User1 and User2 since they were only allowed a 3-point rating scale. The correlation ranges between −1 and +1, where 0 means no correlation, +1 perfect positive correlation, and −1 perfect negative correlation. Over the 25 queries, we obtained an average correlation of 0.69, which shows that the two users agree reasonably well on the relative ranking of entities even though their actual score assignments may differ.
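A minimal sketch of the Goodman-Kruskal Gamma statistic is shown below: it counts concordant and discordant pairs while ignoring pairs tied on either rating, which is exactly why it suits ratings on a coarse 3-point scale. The example values are illustrative only.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant),
    where pairs tied on either variable are excluded from both counts."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        # dx == 0 or dy == 0: tied pair, ignored
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

# Ratings of the same ranked entities by two users (illustrative values):
print(goodman_kruskal_gamma([3, 3, 2, 2, 1], [3, 2, 2, 1, 1]))   # -> 1.0
```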

6.3 Effectiveness of gold standard rankings

In our evaluation, we have assumed that the average numerical ratings provided by review writers (on various aspects) reflect the best ordering of entities; these ratings were thus used as the gold standard rankings. To validate this assumption, we compare the nDCG of the gold standard rankings and the system rankings using the relevance ratings provided by User1 and User2. Specifically, we assume that the true ideal ordering of entities is the one induced by the ratings of User1 and User2 (as opposed to our gold standard rankings). To compute the system nDCG, the relevance ratings provided by User1 and User2 are reordered according to the system rankings; similarly, to compute the nDCG of our gold standard rankings, these relevance ratings are reordered according to the gold rankings. The intuition is that if our gold standard ranking is indeed an accurate measure of relevance, it should agree more strongly with the human rankings than the system rankings do. In other words, compared to the system, the gold standard should be better at recovering human rankings.
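This comparison can be sketched as follows, assuming the common exponential-gain form of nDCG (the paper's exact gain and discount functions may differ) and treating the user's own ratings, sorted in decreasing order, as the ideal ranking.

```python
import math

def dcg(relevances):
    """DCG with the common (2^rel - 1) / log2(rank + 2) form (rank is 0-based)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 1.0

def ndcg_of_ordering(ordering, user_ratings):
    """nDCG of a given ordering (system or gold) scored with a user's 1/2/3 ratings."""
    return ndcg([user_ratings[entity] for entity in ordering])

# If ndcg_of_ordering(gold_top10, ratings) exceeds ndcg_of_ordering(system_top10, ratings),
# the gold standard recovers the user's ranking better than the system does.
```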

Figure 11 shows the resulting nDCG scores of the system rankings and the gold standard rankings using the relevance ratings provided by User1 and User2, and Table 17 reports the average scores. Based on Fig. 11, we see that in many cases (especially for User1), the nDCG scores of the gold standard rankings are higher than those of the system rankings. The cases where the scores overlap almost perfectly are due to ties in the ratings; for example, when a rating of 3 is assigned to all entities, the system rankings and the gold standard rankings receive the same nDCG score regardless of ordering, which mainly happens with entities rated by User2. On average, however (see Table 17), the gold standard clearly agrees more with the two users than the system does. Thus, our assumption that the average numerical ratings given by web users are a good approximation to human judgment is indeed reasonable.

Fig. 11

nDCG @ 10 scores of system rankings and gold standard rankings using judgments provided by User1 and User2

Table 17 Average nDCG @ 10 scores of system rankings versus gold standard rankings using judgments provided by User1 and User2

7 Discussion

Overall, our experiments show that the idea of ranking entities based on a user’s keyword preferences and the opinions of other users is promising and opens up a new application area of retrieval models. Even the simple extensions that we made to the standard retrieval models have already shown promising results, and there are many possibilities to further optimize a retrieval model for this task.

In this paper, we studied the effectiveness of our proposed method only in two specific domains and on a fixed set of aspects (to facilitate evaluation). However, the idea itself can be extended to a variety of real-world domains, including ranking people, products, businesses and services using keyword-based preferences expressed on arbitrary aspects. The basic requirement for setting up such an opinion-based entity ranking system is a large number of opinion-containing documents. For example, using all the mentions of different politicians in blog articles, news articles from CNN Footnote 13 and BBC, Footnote 14 and micro-blogging sites such as Twitter, we could rank politicians based on a user's preferences; these preferences could be attributes such as ‘honest’ and ‘liberal’ or the politician's promises such as ‘better health care plan’ and ‘against child abortion’. Similarly, using the reviews from e-commerce sites like Amazon.com, Footnote 15 BestBuy.com Footnote 16 and Walmart.com, Footnote 17 we could rank products based on the user's preferences. For example, a user interested in purchasing a laptop could find laptops matching his/her personal tradeoffs using a set of keywords such as ‘lightweight’, ‘bright screen’, ‘highly reliable’, ‘long battery life’ and so on. Thus, instead of reading many reviews for a large number of laptops to check whether each laptop actually satisfies the user's preferences, the entity ranking system shortlists a set of laptops that match these preferences, and the user only needs to analyze the laptops ranked by the system.

In terms of accepting a user's preferences, different types of user interfaces may be used. The most general interface would be a single text field that allows users to express preferences as a natural keyword query; aspects in the query can then be obtained using query segmentation techniques. Another approach is to ask users to use a special delimiter to separate their preferences; while this requires only one additional character between two preferences, users may find the requirement unnatural compared to their usual browsing and searching behavior. A more practical user interface would provide separate text fields for the different preferences. While all of these are reasonable options, the question of the best user interface for an entity ranking task such as this remains open until a full user study has been performed.
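As a concrete illustration of the delimiter-based option, a preference query can be segmented into aspect queries with a simple split; this is a sketch of the idea, not the interface used in our experiments.

```python
def segment_preferences(query, delimiter=","):
    """Split a free-text preference query into aspect queries on a delimiter,
    e.g. 'very clean, cozy rooms, excellent staff' -> three aspect queries."""
    return [aspect.strip() for aspect in query.split(delimiter) if aspect.strip()]

print(segment_preferences("very clean, cozy rooms, excellent staff"))
# ['very clean', 'cozy rooms', 'excellent staff']
```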

Our use of retrieval models for this task represents a shallow but general solution to the problem. If we assume that users will only express preferences on a set of common aspects, it is possible to leverage existing work on rating prediction (Lu et al. 2009; Snyder and Barzilay 2007; Wang et al. 2010) to rank entities more accurately based on a user's preferences. Although such a refined approach could lead to more accurate ranking, as mentioned in Sect. 2, these approaches have practical limitations. With the rating prediction approach, scaling up to different domains would involve far more text processing than our retrieval-based approach: aspect discovery would be needed in each domain, and once the aspects are identified, users are tied to this limited set of aspects. Further, rating prediction approaches require some form of supervision, such as the presence of overall ratings, which severely limits the type of textual content that can be utilized.

8 Conclusions and future work

In this paper, we proposed a novel way of utilizing opinion data: directly ranking entities such as people, businesses and products based on a user's preferences and existing opinions on those entities. We studied the use of several state-of-the-art retrieval models for this task and proposed new extensions to these models, including query aspect modeling and opinion expansion. We also leveraged the rating information associated with car and hotel reviews to create a benchmark data set for quantitative evaluation of opinion-based entity ranking.

Experimental results show that opinion expansion is especially effective for improving the ranking of entities according to the user's preferences. We also show that modeling the aspects of a query, as opposed to treating the query as a flat set of keywords, is effective on longer queries. While all three state-of-the-art retrieval models improve with the proposed extensions, the BM25 retrieval model is the most consistent and works especially well with them.

Our evaluation in two very different domains (cars and hotels) shows that the proposed methods can be directly applied to rank different types of entities for which reviews are available. We thus believe that this is a very promising line of study with good prospects for practical applications. Our user study shows that the entity rankings produced by the proposed methods have high nDCG values based on human judgments and can be very useful in helping users choose entities based on opinions.

Our work opens up many interesting future research directions. First, in this paper we only explored techniques that are unique to the problem of opinion-based entity ranking; we believe that many existing techniques and refinements in information retrieval, especially in areas such as expert finding, could further improve performance on this task. Also, in both query aspect modeling and opinion expansion we explored relatively simple ideas. The fact that these simple techniques are effective suggests that more sophisticated methods, such as structured query language models (Zhai 2008) and sentiment analysis techniques, could be leveraged to further improve performance. The data set and evaluation methodology introduced here should greatly facilitate further exploration in this direction.

Second, it would be very interesting to study how to obtain further clarification from users about their preferences through opinion feedback; for example, a user can indicate which query aspect is already matched well and which is still unsatisfactory, and the system can learn from such feedback to improve ranking.