GEO-NASS: A Semantic Tagging Experience from Geographical Data on the Media

  • Angel Luis Garrido
  • Maria G. Buey
  • Sergio Ilarri
  • Eduardo Mena
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8133)

Abstract

From a documentary point of view, rigorous labeling requires considering the geographic locations related to each document. Although tools and geographic databases exist, it is not easy to find an automated labeling system for multilingual texts that specializes in this type of recognition and can also be adapted to a particular context.

This paper proposes a method that combines geographic location techniques with Natural Language Processing and with statistical and semantic disambiguation tools to label documents appropriately in a general way. The method can be configured and fine-tuned for a given context in order to optimize the results. The paper also describes the experience of applying the proposed method in the content management system of a real organization (a major Spanish newspaper). The experimental results show an overall accuracy of around 80%, which indicates the potential of the proposal.

Keywords

Geographic IR; gazetteer; semantic tagging; NLP; ontologies; text classification; media; news

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Angel Luis Garrido (1)
  • Maria G. Buey (1)
  • Sergio Ilarri (1)
  • Eduardo Mena (1)
  1. IIS Department, University of Zaragoza, Zaragoza, Spain
