
OpinoFetch: a practical and efficient approach to collecting opinions on arbitrary entities

Information Retrieval Journal


Abstract

The abundance of opinions on the Web is becoming a critical source of information in a variety of application areas such as business intelligence, market research and online shopping. Unfortunately, because online content is growing so rapidly, there is no single source from which to obtain a comprehensive set of opinions about a specific entity or topic, which severely limits access to such content. While previous work has focused on mining and summarizing online opinions, there has been limited work on automatically collecting opinion content from the Web. In this paper, we propose a lightweight and practical approach to collecting opinion-containing pages, namely review pages, on the Web for arbitrary entities. We leverage existing Web search engines and use a novel information network called the FetchGraph to efficiently obtain review pages for entities of interest. Our experiments in three different domains show that our method is more effective than plain search engine results and that it collects entity-specific review pages efficiently with reasonable precision and accuracy.
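As a rough illustration of how an information network can make this kind of filtering cheap, the Python sketch below stores each candidate page once as a node, links each entity to the pages its search results returned, and scores pages from their stored title terms. The class name `FetchGraphSketch`, the whitespace tokenizer, the Jaccard-based relevance score, the `REVIEW_TERMS` vocabulary and the 0.3 threshold are all hypothetical choices made for illustration; they are not the FetchGraph or the scoring functions described in the paper.

```python
from collections import defaultdict

# Hypothetical review vocabulary; an assumption, not taken from the paper.
REVIEW_TERMS = {"review", "reviews", "rating", "ratings", "pros", "cons"}


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return set(text.lower().split())


class FetchGraphSketch:
    """Hypothetical graph linking entity nodes to page nodes and page nodes to title terms."""

    def __init__(self):
        self.entity_pages = defaultdict(set)  # entity name -> URLs returned for it
        self.page_terms = {}                  # URL -> set of title terms (one node per page)

    def add_page(self, entity, url, title):
        """Link a search-result page to an entity; a page seen before is not re-tokenized."""
        self.entity_pages[entity].add(url)
        if url not in self.page_terms:
            self.page_terms[url] = tokenize(title)

    def relevance(self, entity, url):
        """Jaccard overlap between entity-name terms and the page's title terms."""
        a, b = tokenize(entity), self.page_terms[url]
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def is_review(self, url):
        """Crude review-page check: the title contains a review-vocabulary term."""
        return bool(self.page_terms[url] & REVIEW_TERMS)

    def review_pages(self, entity, threshold=0.3):
        """Pages linked to the entity that pass both checks, best-scoring first."""
        scored = [
            (url, self.relevance(entity, url))
            for url in self.entity_pages[entity]
            if self.is_review(url)
        ]
        kept = [(url, s) for url, s in scored if s >= threshold]
        return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    graph = FetchGraphSketch()
    graph.add_page("canon powershot sx50", "http://a.example/review", "Canon PowerShot SX50 Review")
    graph.add_page("canon powershot sx50", "http://b.example/news", "Camera news roundup")
    print(graph.review_pages("canon powershot sx50"))  # [('http://a.example/review', 0.75)]
```

Because page nodes are shared across entities, a page returned by searches for several entities is tokenized only once, and each later score is computed from the stored term set rather than by re-fetching or re-parsing the page.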


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Author information


Correspondence to Kavita Ganesan.


About this article


Cite this article

Ganesan, K., Zhai, C. OpinoFetch: a practical and efficient approach to collecting opinions on arbitrary entities. Inf Retrieval J 18, 530–558 (2015). https://doi.org/10.1007/s10791-015-9272-0



  • DOI: https://doi.org/10.1007/s10791-015-9272-0
