Proximity-Based Reference Resolution to Improve Text Retrieval

Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 285)

Abstract

Queries that contain named entities are very common, especially in blog retrieval. Current approaches to document retrieval rely on the frequency of query terms in documents. These methods may underestimate how often a query entity occurs in a document, because named entities are often referred to by anaphoric expressions rather than by name. In this paper we focus on pronouns as anaphoric expressions and propose a method for determining the type of a query entity (female, male, or non-person), which identifies the set of pronouns that can refer to each query. We also propose a proximity-based method for estimating the frequency of the anaphoric expressions that refer to a query entity in a document. Experimental results on a standard blog collection show that the proposed method is effective and yields a significant improvement over the term-frequency-based baseline.
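The abstract does not give the exact estimators, so the Python sketch below only illustrates the two ideas under explicit assumptions; it is not the authors' implementation. The pronoun lists in `PRONOUNS`, the voting heuristic in `guess_entity_type` (with its `window` parameter), and the Gaussian proximity kernel of width `sigma` in `entity_frequency` are all hypothetical choices made for illustration. The idea shown is that a pronoun compatible with the entity type counts as a fractional mention of the query entity, weighted by how close it lies after the most recent explicit mention, and that this estimate is added to the ordinary term frequency.

```python
import math
from typing import Dict, List, Set

# Hypothetical pronoun sets per entity type (an assumption; the paper's sets may differ).
PRONOUNS: Dict[str, Set[str]] = {
    "female": {"she", "her", "hers", "herself"},
    "male": {"he", "him", "his", "himself"},
    "non-person": {"it", "its", "itself"},
}


def find_mentions(tokens: List[str], query: List[str]) -> List[int]:
    """Start positions of exact, case-insensitive matches of the query phrase."""
    q = [t.lower() for t in query]
    low = [t.lower() for t in tokens]
    n = len(q)
    return [i for i in range(len(low) - n + 1) if low[i:i + n] == q]


def guess_entity_type(docs: List[List[str]], query: List[str], window: int = 10) -> str:
    """Assumed heuristic: vote by pronoun type within `window` tokens after each mention."""
    votes = {etype: 0 for etype in PRONOUNS}
    n = len(query)
    for tokens in docs:
        low = [t.lower() for t in tokens]
        for i in find_mentions(tokens, query):
            for tok in low[i + n:i + n + window]:
                for etype, prons in PRONOUNS.items():
                    if tok in prons:
                        votes[etype] += 1
    best = max(votes, key=votes.get)  # ties go to the earlier key in PRONOUNS
    return best if votes[best] > 0 else "non-person"  # no pronoun evidence: fall back


def entity_frequency(tokens: List[str], query: List[str], entity_type: str,
                     sigma: float = 5.0) -> float:
    """Query frequency plus a proximity-weighted count of compatible pronouns.

    Assumed kernel: a pronoun at position j adds exp(-d^2 / (2 * sigma^2)), where d is
    its token distance to the end of the nearest preceding query mention. Pronouns
    with no preceding mention add nothing.
    """
    mentions = find_mentions(tokens, query)
    n = len(query)
    pronouns = PRONOUNS[entity_type]
    estimate = 0.0
    for j, tok in enumerate(tokens):
        if tok.lower() not in pronouns:
            continue
        ends = [i + n - 1 for i in mentions if i + n - 1 < j]
        if ends:
            d = j - max(ends)
            estimate += math.exp(-d * d / (2 * sigma * sigma))
    return len(mentions) + estimate


if __name__ == "__main__":
    doc = "Obama spoke today . He said he would return .".split()
    etype = guess_entity_type([doc], ["Obama"])
    print(etype, round(entity_frequency(doc, ["Obama"], etype), 2))
```

For the token list `"Obama spoke today . He said he would return .".split()`, `guess_entity_type` returns `"male"` and `entity_frequency` gives 1 + e^(-0.32) + e^(-0.72) ≈ 2.21 instead of the raw term frequency of 1. A decaying kernel, rather than a hard window, lets a distant pronoun still contribute, only less, reflecting the greater uncertainty about what it refers to.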

Keywords

Term Frequency, Query Term, Entity Frequency, Coreference Resolution, Opinion Retrieval

Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2013

Authors and Affiliations

Faculty of Informatics, University of Lugano, Lugano, Switzerland