Geocoding Textual Documents Through a Hierarchy of Linear Classifiers

  • Fernando Melo
  • Bruno Martins
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9273)

Abstract

In this paper, we empirically evaluate an automated technique for assigning geospatial coordinates to previously unseen documents, using only their raw text as evidence. The technique relies on a hierarchical representation of the Earth's surface and on linear classifiers. We measured the results obtained with models based on Support Vector Machines over collections of geo-referenced Wikipedia articles in four languages, namely English, German, Spanish and Portuguese. The best-performing models obtained state-of-the-art results on the English Wikipedia collection, with an average prediction error of 83 kilometers and a median error of just 9 kilometers.
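
The pipeline summarized in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The two-level region hierarchy, the toy training texts, the representative leaf coordinates, and all names (`Node`, `train`, `predict`, `haversine_km`) are hypothetical. A multiclass perceptron stands in for the Support Vector Machines evaluated in the paper. Each internal node holds one linear classifier over bag-of-words features, and a document is placed by greedy top-down descent to a leaf cell. Errors are measured as great-circle distances using the haversine formula on a sphere, which may differ from the distance computation the authors used.

```python
import math
from collections import Counter
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def features(text):
    """Bag-of-words term counts: the raw text is the only evidence."""
    return Counter(text.lower().split())


class Node:
    """A region in a hypothetical hierarchy of the Earth's surface."""

    def __init__(self, name, children=(), coord=None):
        self.name = name
        self.children = list(children)
        self.coord = coord  # assumed representative (lat, lon) for leaf cells
        # One sparse weight vector per child: a multiclass linear classifier.
        self.weights = {c.name: Counter() for c in self.children}

    def score(self, child, x):
        w = self.weights[child.name]
        return sum(w[t] * v for t, v in x.items())

    def best_child(self, x):
        # max() returns the first child among tied scores.
        return max(self.children, key=lambda c: self.score(c, x))


def find_path(node, leaf_name):
    """Nodes from `node` down to the named leaf, or None if it is absent."""
    if node.name == leaf_name:
        return [node]
    for child in node.children:
        sub = find_path(child, leaf_name)
        if sub:
            return [node] + sub
    return None


def train(root, docs, epochs=5):
    """Perceptron updates at every internal node on each document's gold path."""
    for _ in range(epochs):
        for text, leaf_name in docs:
            x = features(text)
            path = find_path(root, leaf_name)
            for parent, gold in zip(path, path[1:]):
                pred = parent.best_child(x)
                if pred is not gold:
                    parent.weights[gold.name].update(x)     # reward gold child
                    parent.weights[pred.name].subtract(x)   # penalize wrong child


def predict(root, text):
    """Greedy top-down descent: follow the best-scoring child at each level."""
    x, node = features(text), root
    while node.children:
        node = node.best_child(x)
    return node


lisbon = Node("lisbon", coord=(38.7223, -9.1393))
berlin = Node("berlin", coord=(52.5200, 13.4050))
new_york = Node("new_york", coord=(40.7128, -74.0060))
sao_paulo = Node("sao_paulo", coord=(-23.5505, -46.6333))
root = Node("earth", [Node("europe", [lisbon, berlin]),
                      Node("americas", [new_york, sao_paulo])])

train(root, [
    ("tram fado tagus river", "lisbon"), ("fado alfama tagus", "lisbon"),
    ("spree brandenburg gate", "berlin"), ("reichstag spree museum", "berlin"),
    ("hudson manhattan broadway", "new_york"), ("manhattan subway hudson", "new_york"),
    ("paulista avenue samba", "sao_paulo"), ("samba ibirapuera paulista", "sao_paulo"),
])

# Hypothetical held-out documents with their true coordinates.
test_docs = [("brandenburg gate at night", (52.5163, 13.3777)),
             ("samba on paulista", (-23.5614, -46.6559))]
errors = [haversine_km(predict(root, t).coord, gold) for t, gold in test_docs]
print(f"mean error: {mean(errors):.1f} km, median error: {median(errors):.1f} km")
```

Greedy descent keeps the cost of a prediction proportional to the depth of the hierarchy times its branching factor, rather than to the total number of leaf cells; the trade-off is that a mistake at a coarse level cannot be corrected further down the tree.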

Keywords

Text mining, Document geocoding, Hierarchical text classification

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Instituto Superior Técnico and INESC-ID, Universidade de Lisboa, Lisbon, Portugal