Automatically Geotagging Articles in the Welsh Newspapers Online Collection

Conference paper

Abstract

The National Library of Wales’ (NLW) Welsh Newspapers Online collection comprises over 16 million articles from historic newspapers. Stored in NLW’s institutional repository, it is a rich source of historic text; the text of the articles was extracted from the digitised page images using OCR. This project investigates methods for determining which articles can be automatically associated with places within Wales. We use machine learning, text mining, and OpenStreetMap data as a gazetteer.
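The abstract does not describe the pipeline in detail, so the following is only a minimal sketch of the gazetteer-lookup step, written under explicit assumptions: `GAZETTEER` is a tiny hand-made dictionary standing in for place names extracted from OpenStreetMap, `WALES_BBOX` is a rough bounding box rather than the real Welsh boundary, and the rule "keep the first candidate that falls inside Wales" is a hypothetical disambiguation heuristic. None of these names, coordinates, or rules come from the paper, and the machine-learning component is not shown.

```python
"""Minimal gazetteer-based geotagging sketch (illustrative only).

Assumptions: GAZETTEER is a hypothetical stand-in for OpenStreetMap
place names; WALES_BBOX is a crude box, not the national boundary.
"""
import re
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Hypothetical mini-gazetteer: one name may map to several places.
GAZETTEER: Dict[str, List[Coord]] = {
    "aberystwyth": [(52.415, -4.083)],
    "cardiff": [(51.481, -3.179)],
    "swansea": [(51.621, -3.944)],
    "bangor": [(53.227, -4.129), (54.653, -5.669)],   # Wales, Northern Ireland
    "newport": [(51.588, -2.998), (50.701, -1.292)],  # Wales, Isle of Wight
    "port talbot": [(51.592, -3.780)],
}

# Crude (lat_min, lat_max, lon_min, lon_max) box around Wales (assumption).
WALES_BBOX = (51.35, 53.45, -5.35, -2.65)
MAX_NGRAM = 3  # longest multi-word place name we try to match


def in_wales(coord: Coord) -> bool:
    """True if the coordinate lies inside the rough Wales bounding box."""
    lat, lon = coord
    lat_min, lat_max, lon_min, lon_max = WALES_BBOX
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def find_toponyms(text: str) -> List[str]:
    """Return gazetteer names found in the text, preferring longer matches."""
    tokens = re.findall(r"\w+", text.lower())  # \w keeps Welsh diacritics
    found: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                found.append(candidate)
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no match starting at this token
    return found


def geotag(text: str) -> List[Tuple[str, Coord]]:
    """Resolve each toponym to one Welsh coordinate, dropping non-Welsh ones.

    Hypothetical heuristic: keep the first candidate inside WALES_BBOX.
    """
    results: List[Tuple[str, Coord]] = []
    for name in find_toponyms(text):
        welsh = [c for c in GAZETTEER[name] if in_wales(c)]
        if welsh:
            results.append((name, welsh[0]))
    return results


def is_locatable(text: str) -> bool:
    """Treat an article as locatable if it mentions at least one Welsh place."""
    return bool(geotag(text))


if __name__ == "__main__":
    article = "The steamer left Port Talbot for Newport; news from Bangor."
    for name, (lat, lon) in geotag(article):
        print(f"{name}: {lat:.3f}, {lon:.3f}")
```

Running the script prints the three places with their Welsh coordinates: `port talbot`, `newport` and `bangor`, the latter two resolved away from their Isle of Wight and Northern Ireland namesakes. A plain lookup like this over-generates on OCR text, since some Welsh towns (for example Mold or Flint) are also ordinary English words, which is one place where a learned filter over the candidate matches could help.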

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, Aberystwyth University, Aberystwyth, Wales, UK
  2. National Library of Wales, Aberystwyth, Wales, UK