Experiments with Geo-Filtering Predicates for IR
This paper describes a set of experiments in monolingual English retrieval at Geo-CLEF 2005 that evaluate a technique for spatial retrieval based on named entity tagging, toponym resolution, and re-ranking by means of geographic filtering. To this end, a series of systematic experiments in the Vector Space paradigm is presented. Plain bag-of-words retrieval is compared with phrasal retrieval, the potential of meronymy query expansion as a recall-enhancing device is investigated, and three alternative geo-spatial filtering techniques based on spatial clipping are compared and evaluated on 25 monolingual English queries. Preliminary results show that always choosing a toponym's referent by a simple “maximum population” heuristic, used to approximate the salience of a referent, fails to outperform TF*IDF baselines on the Geo-CLEF 2005 dataset when combined with the three geo-filtering predicates. Conservative geo-filtering outperforms more aggressive predicates. The evidence further seems to suggest that query expansion with WordNet meronyms is not effective in combination with the method described. A post-hoc analysis indicates that the factors responsible for the low performance include the sparseness of available population data, gaps in the gazetteer that associates Minimum Bounding Rectangles with geo-terms in the query, and the composition of the Geo-CLEF 2005 dataset itself.
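The pipeline summarized above (grounding tagged toponyms with a maximum-population heuristic, then filtering a ranked list against the query's Minimum Bounding Rectangle) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: NER tagging is assumed to have already produced the toponym strings for each document, and the `Referent` and `MBR` types, the gazetteer layout, and the three predicates `any_inside`, `majority_inside` and `all_inside` are all hypothetical, since the abstract does not define the paper's actual predicates. Populations, coordinates and the bounding box for France are rough illustrative values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


@dataclass(frozen=True)
class MBR:
    """Axis-aligned Minimum Bounding Rectangle (no antimeridian wrap-around)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, p: Point) -> bool:
        lon, lat = p
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


@dataclass(frozen=True)
class Referent:
    """One candidate gazetteer entry for a toponym (hypothetical structure)."""
    name: str
    location: Point
    population: int  # 0 when the gazetteer has no population figure


def resolve_max_population(candidates: Sequence[Referent]) -> Referent:
    """Maximum-population heuristic: pick the most populous candidate (ties: first listed)."""
    return max(candidates, key=lambda r: r.population)


def resolve_document(toponyms: Sequence[str],
                     gazetteer: Dict[str, List[Referent]]) -> List[Point]:
    """Ground each tagged toponym; toponyms missing from the gazetteer are skipped."""
    return [resolve_max_population(gazetteer[t]).location
            for t in toponyms if gazetteer.get(t)]


# Hypothetical predicates, ordered from conservative to aggressive.
def any_inside(points: Sequence[Point], box: MBR) -> bool:
    """Keep a document if at least one grounded toponym lies inside the box."""
    return any(box.contains(p) for p in points)


def majority_inside(points: Sequence[Point], box: MBR) -> bool:
    """Keep a document if more than half of its grounded toponyms lie inside."""
    return sum(box.contains(p) for p in points) * 2 > len(points)


def all_inside(points: Sequence[Point], box: MBR) -> bool:
    """Keep a document only if it has grounded toponyms and all lie inside."""
    return bool(points) and all(box.contains(p) for p in points)


Predicate = Callable[[Sequence[Point], MBR], bool]


def geo_filter(ranking: Sequence[str], doc_points: Dict[str, List[Point]],
               box: MBR, predicate: Predicate) -> List[str]:
    """Keep the baseline (e.g. TF*IDF) order, dropping documents that fail the predicate."""
    return [d for d in ranking if predicate(doc_points.get(d, []), box)]


# --- Illustrative usage (all figures are rough demo values) ---
gazetteer = {
    "Paris": [Referent("Paris, Texas", (-95.56, 33.66), 25_000),
              Referent("Paris, France", (2.35, 48.86), 2_100_000)],
    "Lyon": [Referent("Lyon, France", (4.84, 45.76), 500_000)],
    "Houston": [Referent("Houston, Texas", (-95.37, 29.76), 2_300_000)],
}
# Toponyms per document, as an NER tagger might have produced them.
tagged = {"d1": ["Paris", "Lyon"], "d2": ["Paris", "Houston"],
          "d3": [], "d4": ["Paris", "Lyon", "Houston"]}
doc_points = {d: resolve_document(ts, gazetteer) for d, ts in tagged.items()}
france = MBR(-5.2, 41.3, 9.6, 51.1)  # approximate box for a query about France
ranking = ["d3", "d2", "d4", "d1"]    # hypothetical TF*IDF baseline order

for pred in (any_inside, majority_inside, all_inside):
    print(pred.__name__, geo_filter(ranking, doc_points, france, pred))
# any_inside ['d2', 'd4', 'd1']
# majority_inside ['d4', 'd1']
# all_inside ['d1']
```

Ordered this way, the predicates range from conservative (one grounded toponym inside the query MBR suffices) to aggressive (every grounded toponym must fall inside). Two limitations are visible even in this toy: a toponym missing from the gazetteer is silently dropped, and a document that actually concerns Paris, Texas would still be grounded to France, because the heuristic always prefers the more populous referent.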
Keywords: Digital Library, Query Expansion, Relevance Assessment, Minimum Bounding Rectangle, Aggressive Predicate