Normalizing Spatial Information to Improve Geographical Information Indexing and Retrieval in Digital Libraries

  • Damien Palacio
  • Christian Sallaberry
  • Mauro Gaio
Conference paper
Part of the Lecture Notes in Geoinformation and Cartography book series (LNGC)


This contribution addresses geographic information contained in unstructured textual documents. We propose a general indexing strategy that is dedicated to spatial information but could also be applied to temporal and thematic information. More specifically, we have developed a process flow that indexes the spatial information contained in textual documents: it interprets that information and computes accurate corresponding footprints. Our goal is to normalize spatial information of heterogeneous granularity and scale (points, polylines, polygons). This normalization is carried out at the index level by grouping spatial information into spatial areas and by using statistics to compute frequencies for those areas and weights for the retrieved documents.
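To make the idea of index-level normalization concrete, the following is a minimal sketch, not the authors' implementation. It assumes that footprints are simplified to bounding boxes, that the "spatial areas" are cells of a regular grid, and that document weights follow a TF-IDF-style formula borrowed from the vector space model. The function names (`tiles_for_bbox`, `tile_frequencies`, `tile_weights`), the `tile_size` parameter, and the weighting formula are all illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter


def tiles_for_bbox(bbox, tile_size):
    """Return the set of (col, row) grid cells overlapped by a bounding box.

    bbox is (min_x, min_y, max_x, max_y); a point is a degenerate box.
    Assumption: a regular square grid stands in for the "spatial areas".
    """
    min_x, min_y, max_x, max_y = bbox
    col_lo, col_hi = math.floor(min_x / tile_size), math.floor(max_x / tile_size)
    row_lo, row_hi = math.floor(min_y / tile_size), math.floor(max_y / tile_size)
    return {
        (col, row)
        for col in range(col_lo, col_hi + 1)
        for row in range(row_lo, row_hi + 1)
    }


def tile_frequencies(footprints, tile_size):
    """Count, for one document, how many footprints touch each tile."""
    counts = Counter()
    for bbox in footprints:
        counts.update(tiles_for_bbox(bbox, tile_size))
    return counts


def tile_weights(corpus, tile_size):
    """Compute a TF-IDF-style weight per (document, tile).

    corpus maps a document id to its list of footprint bounding boxes.
    Assumption: weight = tile frequency * log(N / document frequency).
    """
    tf = {doc: tile_frequencies(fps, tile_size) for doc, fps in corpus.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each document counts once per tile
    n_docs = len(corpus)
    return {
        doc: {tile: freq * math.log(n_docs / df[tile]) for tile, freq in counts.items()}
        for doc, counts in tf.items()
    }


# Hypothetical corpus: two documents with footprints in arbitrary map units.
corpus = {
    "d1": [(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 1.5, 0.8)],  # a point and a box
    "d2": [(0.1, 0.1, 0.3, 0.3)],
}
weights = tile_weights(corpus, tile_size=1.0)
```

Under these assumptions, a point, a polyline, and a polygon all reduce to counts over the same set of tiles, so footprints of different granularity become comparable in a single index. A tile that appears in every document receives a weight of zero, mirroring how very common terms are discounted in text retrieval.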


Spatial Information, Vector Space Model, Spatial Index, Thematic Information, Tile Frequency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • Damien Palacio (1)
  • Christian Sallaberry (1)
  • Mauro Gaio (1)
  1. LIUPPA, Université de Pau, avenue de l’Université, Pau, France
