Efficient Confident Search in Large Review Corpora

  • Theodoros Lappas
  • Dimitrios Gunopulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6322)


Given an extensive corpus of reviews on an item, a potential customer goes through the expressed opinions and collects information in order to form an educated opinion and, ultimately, make a purchase decision. This task is often hindered by false reviews that fail to capture the true quality of the item’s attributes. Such reviews may be based on insufficient information or may even be fraudulent, submitted to manipulate the item’s reputation. In this paper, we formalize the Confident Search paradigm for review corpora. We then present a complete search framework that, given a set of item attributes, efficiently searches a large corpus and selects a compact set of high-quality reviews that accurately captures the overall consensus of the reviewers on the specified attributes. We also introduce CREST (Confident REview Search Tool), a user-friendly implementation of our framework and a valuable tool for anyone dealing with large review corpora. The efficacy of our framework is demonstrated through a rigorous experimental evaluation.


Search Engine, Sentiment Analysis, Inverted Index, Skyline Point, Customer Review
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Theodoros Lappas (UC Riverside)
  • Dimitrios Gunopulos (University of Athens)
