Abstract
This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve the effectiveness of Geographical Information Retrieval. Geographical Knowledge Re-Ranking uses Geographical Gazetteers and conservative Toponym Disambiguation techniques to boost the ranking of the geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: (1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to the text collections and topics to detect toponyms; (2) Stemming (Porter’s algorithm) and Lemmatization are applied in combination with default stopword filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments were performed on the English monolingual test collections of the GeoCLEF evaluations (2005, 2006, 2007, and 2008), using the TF-IDF, BM25, and InL2 Information Retrieval algorithms over unprocessed texts as baselines. Each GeoCLEF test collection (25 topics per evaluation) was evaluated separately, and the four collections were also evaluated together as a single fused set (100 topics). When Geographical Knowledge Re-Ranking, Linguistic Processing (lemmatization, stemming, and the combination of both), and Query Expansion are each evaluated separately on the fused set of all topics, every one of these processes improves the Mean Average Precision (MAP) and R-Precision effectiveness measures in all the experiments, and the improvement over the baselines is statistically significant in most of them. The best MAP and R-Precision results are obtained with the InL2 algorithm combined with Geographical Knowledge Re-Ranking, Lemmatization with Stemming, and Kullback-Leibler Query Expansion.
Some configurations combining Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion improve on the MAP of the best official results at the GeoCLEF evaluations of 2005, 2006, and 2007.
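For readers unfamiliar with the two effectiveness measures reported above, the following is a minimal sketch of how Mean Average Precision and R-Precision are commonly computed from ranked result lists and binary relevance judgments. It follows the standard textbook definitions and is not taken from the paper; the function names, the `runs` and `qrels` dictionaries, and the toy document identifiers are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears,
    divided by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents for the topic."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranking[:r] if doc_id in relevant) / r


def mean_over_topics(
    metric: Callable[[List[str], Set[str]], float],
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
) -> float:
    """Mean of a per-topic metric over all judged topics (MAP when metric is average_precision).
    A topic with no retrieved documents contributes a score of 0."""
    scores = [metric(runs.get(topic, []), rel) for topic, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two topics with made-up document identifiers.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["a", "b"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"b"}}

map_score = mean_over_topics(average_precision, runs, qrels)
mean_rprec = mean_over_topics(r_precision, runs, qrels)
print(f"MAP = {map_score:.4f}, mean R-Precision = {mean_rprec:.4f}")
```

With 100 fused topics, as in the experiments described above, `qrels` would hold 100 entries and the same averaging would apply; evaluation campaigns normally compute these figures with a dedicated tool rather than hand-written code.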
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ferrés, D., Rodríguez, H. (2015). Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_30
DOI: https://doi.org/10.1007/978-3-319-23826-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23825-8
Online ISBN: 978-3-319-23826-5
eBook Packages: Computer Science, Computer Science (R0)