Experiments with Geo-Filtering Predicates for IR

  • Jochen L. Leidner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)

Abstract

This paper describes a set of experiments for monolingual English retrieval at Geo-CLEF 2005, evaluating a technique for spatial retrieval based on named entity tagging, toponym resolution, and re-ranking by means of geographic filtering. To this end, a series of systematic experiments in the Vector Space paradigm is presented. Plain bag-of-words versus phrasal retrieval and the potential of meronymy query expansion as a recall-enhancing device are investigated, and three alternative geo-spatial filtering techniques based on spatial clipping are compared and evaluated on 25 monolingual English queries. Preliminary results show that always choosing toponym referents with a simple “maximum population” heuristic, used to approximate the salience of a referent, fails to outperform TF*IDF baselines on the Geo-CLEF 2005 dataset when combined with three geo-filtering predicates. Conservative geo-filtering outperforms more aggressive predicates. The evidence further suggests that query expansion with WordNet meronyms is not effective in combination with the method described. A post-hoc analysis indicates that factors responsible for the low performance include the sparseness of available population data, gaps in the gazetteer that associates Minimum Bounding Rectangles with geo-terms in the query, and the composition of the Geo-CLEF 2005 dataset itself.
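The pipeline summarised in the abstract (resolve each toponym to its most populous candidate, then clip the ranked list against the query's Minimum Bounding Rectangle) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Referent`, `MBR`, `resolve_max_population`, `geo_filter`, `any_inside` and `all_inside` are hypothetical, and the two predicates are assumed stand-ins for a "conservative" and an "aggressive" filter, since the abstract does not define the three predicates that were actually evaluated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Referent:
    """One candidate location for a toponym (hypothetical gazetteer record)."""
    name: str
    lat: float
    lon: float
    population: Optional[int]  # None when the gazetteer has no figure


@dataclass(frozen=True)
class MBR:
    """Minimum Bounding Rectangle in decimal degrees (assumed not to cross
    the antimeridian)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, r: Referent) -> bool:
        return (self.south <= r.lat <= self.north
                and self.west <= r.lon <= self.east)


def resolve_max_population(candidates: List[Referent]) -> Optional[Referent]:
    """Pick the candidate with the largest known population.

    Missing population data is treated as 0 here, which is an assumption;
    it is one way sparse population data can distort the choice.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.population or 0)


def any_inside(refs: List[Referent], mbr: MBR) -> bool:
    """Assumed 'conservative' predicate: one resolved toponym inside suffices."""
    return any(mbr.contains(r) for r in refs)


def all_inside(refs: List[Referent], mbr: MBR) -> bool:
    """Assumed 'aggressive' predicate: every resolved toponym must be inside."""
    return bool(refs) and all(mbr.contains(r) for r in refs)


def geo_filter(ranked: List[Tuple[str, List[Referent]]], query_mbr: MBR,
               predicate) -> List[str]:
    """Keep documents that satisfy the predicate, preserving baseline order."""
    return [doc_id for doc_id, refs in ranked if predicate(refs, query_mbr)]
```

Given a baseline ranking of `(doc_id, resolved_referents)` pairs, `geo_filter(ranked, query_mbr, any_inside)` discards only documents with no toponym inside the query rectangle, whereas `all_inside` also discards documents that mention any location outside it. Documents without resolved toponyms are dropped by both predicates in this sketch.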

Keywords

Digital Library, Query Expansion, Relevance Assessment, Minimum Bounding Rectangle, Aggressive Predicate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jochen L. Leidner (1, 2, 3)
  1. Linguit GmbH, Bad Bergzabern, Germany
  2. University of the Saarland, FR 7.4 – Speech Signal Processing, Saarbrücken, Germany
  3. School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK