Integrating Proximity to Subjective Sentences for Blog Opinion Retrieval

  • Rodrygo L. T. Santos
  • Ben He
  • Craig Macdonald
  • Iadh Ounis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)

Abstract

Opinion finding is a challenging retrieval task: it has been shown to be especially difficult to improve over a strongly performing topic-relevance baseline. In this paper, we propose a novel approach for opinion finding, which takes into account the proximity of query terms to subjective sentences in a document. We adapt two state-of-the-art opinion detection techniques to identify subjective sentences in the retrieved documents. Our first technique uses the OpinionFinder toolkit to classify the subjectivity of sentences in a document. Our second technique uses a dictionary of subjective terms, automatically generated from the document collection itself, to identify the most subjective sentences in a document. We extend the Divergence From Randomness (DFR) proximity model to integrate the proximity of query terms to the subjective sentences identified by either of the proposed techniques. We evaluate these techniques on five different strong baselines across two different query datasets from the TREC Blog track. We show that we can significantly improve over the baselines and that, in several settings, our proposed techniques can at least match the top performing systems at the TREC Blog track.
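The core idea, counting query terms that occur close to subjective sentences, can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the DFR proximity model or the method evaluated in the paper: the term dictionary `SUBJECTIVE_TERMS`, the naive sentence splitter, the `window` parameter, and all function names are hypothetical and chosen only for this example.

```python
import re

# Hypothetical toy dictionary. The paper derives its dictionary
# automatically from the document collection; this set is made up.
SUBJECTIVE_TERMS = {"love", "hate", "awful", "great", "terrible"}


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and extract simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def subjective_sentence_ids(sentences, min_hits=1):
    """Indices of sentences containing at least `min_hits` dictionary terms."""
    ids = set()
    for i, sentence in enumerate(sentences):
        hits = sum(1 for t in tokenize(sentence) if t in SUBJECTIVE_TERMS)
        if hits >= min_hits:
            ids.add(i)
    return ids


def proximity_count(text, query_terms, window=1):
    """Count query-term occurrences in sentences that lie within
    `window` sentences of a sentence flagged as subjective."""
    sentences = split_sentences(text)
    subjective = subjective_sentence_ids(sentences)
    query = {q.lower() for q in query_terms}
    count = 0
    for i, sentence in enumerate(sentences):
        if any(abs(i - j) <= window for j in subjective):
            count += sum(1 for t in tokenize(sentence) if t in query)
    return count


doc = (
    "I bought the new phone yesterday. The phone is great! "
    "Shipping took a week. It arrived by mail. "
    "The box was brown. Another phone was inside."
)
print(proximity_count(doc, ["phone"], window=1))
```

In a retrieval setting, a count of this kind would be one input to a scoring function combined with the topic-relevance score; how the paper performs that combination within the DFR framework is not reproduced here.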

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Rodrygo L. T. Santos (1)
  • Ben He (1)
  • Craig Macdonald (1)
  • Iadh Ounis (1)
  1. Department of Computing Science, University of Glasgow, UK
