
GeoJournal, Volume 80, Issue 3, pp 375–392

Using machine learning methods for disambiguating place references in textual documents

  • João Santos
  • Ivo Anastácio
  • Bruno Martins

Abstract

This paper presents a machine learning method for disambiguating place references in text. Solving this task can have important applications in the digital humanities and the computational social sciences, since it supports the geospatial analysis of large document collections. Candidate disambiguations are obtained from a knowledge base that contains geospatial coordinates and textual descriptions for places from all around the world. We combine multiple features that capture the similarity between each candidate, the place reference, and the context in which the reference occurs, and use them to rank the candidates and choose among them. The proposed method was evaluated on English corpora used in previous work in this area, and on a subset of the English Wikipedia. Experimental results show that the method is effective: out-of-the-box learning algorithms combined with relatively simple features achieve high accuracy on this task.
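The abstract describes the approach only at a high level: compute features that relate each knowledge-base candidate to the place reference and its context, then learn to rank the candidates. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The `Candidate` record, the three features (name overlap, overlap between the context and the candidate description, and proximity to other places already resolved in the same document), the spherical-Earth distance, and the toy gazetteer entries are all hypothetical. A pairwise ranking perceptron is swapped in as a simple stand-in for the out-of-the-box learning algorithms the paper uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical knowledge-base entry: a name, coordinates, and a short description."""
    name: str
    lat: float
    lon: float
    description: str


def tokens(text):
    """Lower-cased bag of words with simple punctuation stripping."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a, b):
    """Set overlap in [0, 1]; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def features(reference, context, cand, anchors):
    """Three illustrative (hypothetical) features for one reference/candidate pair."""
    name_sim = jaccard(tokens(reference), tokens(cand.name))          # reference vs. candidate name
    context_sim = jaccard(tokens(context), tokens(cand.description))  # context vs. candidate text
    if anchors:  # assumed: coordinates of places already resolved elsewhere in the document
        nearest = min(haversine_km(cand.lat, cand.lon, lat, lon) for lat, lon in anchors)
        proximity = 1.0 / (1.0 + nearest / 100.0)  # decays with distance in 100 km units
    else:
        proximity = 0.0
    return [name_sim, context_sim, proximity]


def score(w, x):
    """Linear score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(queries, epochs=10, lr=1.0):
    """Pairwise ranking perceptron: push the correct candidate above each wrong one.

    queries: list of (feature_vectors, index_of_correct_candidate).
    """
    w = [0.0] * len(queries[0][0][0])
    for _ in range(epochs):
        for vectors, gold in queries:
            for j, x_neg in enumerate(vectors):
                if j == gold:
                    continue
                diff = [p - q for p, q in zip(vectors[gold], x_neg)]
                if score(w, diff) <= 0:  # pair misordered or tied: move weights toward diff
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w, reference, context, candidates, anchors):
    """Return candidates sorted by descending learned score."""
    return sorted(candidates,
                  key=lambda c: score(w, features(reference, context, c, anchors)),
                  reverse=True)


# Toy usage with approximate, illustrative coordinates.
lisbon = [Candidate("Lisbon", 38.72, -9.14, "capital of Portugal"),
          Candidate("Lisbon", 44.03, -70.10, "town in Androscoggin County, Maine")]
train = [([features("Lisbon", "Lisbon is the capital of Portugal", c, [(41.15, -8.61)])
           for c in lisbon], 0)]
w = train_pairwise(train)

paris = [Candidate("Paris", 33.66, -95.56, "city in Lamar County, Texas"),
         Candidate("Paris", 48.86, 2.35, "capital of France")]
best = rank(w, "Paris", "Paris is the capital of France", paris, [(45.76, 4.84)])[0]
print(best.description)  # capital of France
```

Because both candidates in each toy query share the same name, the name feature cannot separate them and its learned weight stays at zero; the context and proximity features do the work. A real system would use many more features and training queries, and a stronger learner.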

Keywords

Place reference disambiguation; Geographic text mining and retrieval; Entity linking in text; Learning to rank


Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Instituto Superior Técnico, Lisbon, Portugal
