Information Retrieval Journal, Volume 18, Issue 6, pp. 530–558

OpinoFetch: a practical and efficient approach to collecting opinions on arbitrary entities



The abundance of opinions on the Web has made them a critical source of information in a variety of application areas such as business intelligence, market research and online shopping. Unfortunately, due to the rapid growth of online content, there is no single source from which to obtain a comprehensive set of opinions about a specific entity or topic, which severely limits access to such content. While previous work has focused on mining and summarizing online opinions, there has been limited work on automatically collecting opinion content from the Web. In this paper, we propose a lightweight and practical approach to collecting opinion-containing pages, namely review pages, on the Web for arbitrary entities. We leverage existing Web search engines and use a novel information network called the FetchGraph to efficiently obtain review pages for entities of interest. Our experiments in three different domains show that our method is more effective than plain search engine results and that it collects entity-specific review pages efficiently with reasonable precision and accuracy.
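The abstract describes FetchGraph only at a high level. As a minimal sketch under stated assumptions, the Python below shows one way a graph could link an entity to candidate pages gathered from search results and then keep the pages that look both entity-specific and review-like. The class name `FetchGraphSketch`, the Jaccard title score, the small `REVIEW_TERMS` vocabulary, and the default thresholds are all hypothetical illustrations, not the paper's actual scoring.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical review vocabulary chosen for illustration only (not from the paper).
REVIEW_TERMS = {"review", "reviews", "rating", "stars", "pros", "cons", "recommend"}


def tokens(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class FetchGraphSketch:
    """Toy graph linking each entity to candidate pages (assumed design)."""

    def __init__(self) -> None:
        self.entity_pages: dict[str, set[str]] = defaultdict(set)  # entity -> URLs
        self.page_title: dict[str, str] = {}  # URL -> page title
        self.page_text: dict[str, str] = {}  # URL -> page body text

    def add_page(self, entity: str, url: str, title: str, text: str) -> None:
        """Attach a candidate page to an entity.

        In a real collector the pages would come from search-engine results for
        the entity (and possibly pages linked from them); here they are given.
        """
        self.entity_pages[entity].add(url)
        self.page_title[url] = title
        self.page_text[url] = text

    def entity_score(self, entity: str, url: str) -> float:
        """Jaccard overlap between entity-name tokens and page-title tokens."""
        return jaccard(tokens(entity), tokens(self.page_title[url]))

    def review_score(self, url: str) -> float:
        """Fraction of REVIEW_TERMS that occur in the page text."""
        return len(tokens(self.page_text[url]) & REVIEW_TERMS) / len(REVIEW_TERMS)

    def review_pages(
        self, entity: str, min_entity: float = 0.3, min_review: float = 0.2
    ) -> list[str]:
        """Sorted URLs that pass both (arbitrary, hypothetical) thresholds."""
        return sorted(
            url
            for url in self.entity_pages.get(entity, set())
            if self.entity_score(entity, url) >= min_entity
            and self.review_score(url) >= min_review
        )


if __name__ == "__main__":
    g = FetchGraphSketch()
    g.add_page("iPhone 6", "https://example.com/a", "iPhone 6 review",
               "Pros and cons, rating 4 stars, I recommend it")
    g.add_page("iPhone 6", "https://example.com/b", "Galaxy S5 specs",
               "battery size screen")
    print(g.review_pages("iPhone 6"))  # ['https://example.com/a']
```

The two scores are kept separate so that a page must look both entity-specific and review-like to be kept; weighting or combining them into a single ranking score would be an equally plausible design.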


Keywords: Opinion crawling, Opinion aggregation, Opinion analysis, Review crawling, Opinion collection, Review aggregation



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Urbana, USA
