Geoparsing of Czech RSS News and Evaluation of Its Spatial Distribution

  • Jiří Horák
  • Pavel Belaj
  • Igor Ivan
  • Peter Nemec
  • Jiří Ardielli
  • Jan Růžička

Abstract

Geoparsing assigns geographic identifiers to words and phrases in text documents. A specific problem is how to apply geoparsing to languages in which word endings change through inflection. An appropriate method requires a flexible solution that reflects different strategies and priorities. Sixteen Czech RSS news channels were evaluated according to ten criteria, and three selected channels were monitored for more than two years. The applied geoparsing consisted of successive filtering steps and used generated grammatical cases of the recognized entities. Various problems with geographical names are classified and documented. The quality assessment shows satisfactory results, especially for the identification of names in domiciles (94%). The pessimistic strategy is applied to analyze the geographical balance of news distribution. The results show significant differences in the distribution of news among the monitored channels and document a high concentration of cultural and national news in several locations.
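
The idea of matching inflected place names against generated grammatical cases can be illustrated with a minimal sketch. It is not the authors' implementation: the three-entry gazetteer, the hand-listed case forms, and the names `GAZETTEER`, `build_index`, and `geoparse` are all assumptions made for illustration, and the filtering steps mentioned in the abstract are not modelled.

```python
import re

# Hypothetical mini-gazetteer: base place name -> hand-listed Czech case forms.
# A real system would generate these forms and use a full gazetteer.
GAZETTEER = {
    "Praha": ["Praha", "Prahy", "Praze", "Prahu", "Prahou"],
    "Ostrava": ["Ostrava", "Ostravy", "Ostravě", "Ostravu", "Ostravou"],
    "Brno": ["Brno", "Brna", "Brnu", "Brně", "Brnem"],
}


def build_index(gazetteer):
    """Map every inflected form back to its base (nominative) name."""
    index = {}
    for base, forms in gazetteer.items():
        for form in forms:
            index[form] = base
    return index


def geoparse(text, index):
    """Return (matched form, base name) pairs for single-word place names."""
    tokens = re.findall(r"\w+", text)  # \w is Unicode-aware for str in Python 3
    return [(tok, index[tok]) for tok in tokens if tok in index]


INDEX = build_index(GAZETTEER)
matches = geoparse("V Praze a Ostravě pršelo, v Brně ne.", INDEX)
```

In this sketch the locative forms "Praze", "Ostravě", and "Brně" resolve to their base names through an exact, case-sensitive lookup. Multi-word names and ambiguous names would need additional handling.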

Keywords

RSS, Geoparsing, Geocoding, News, Czech TV

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jiří Horák (1)
  • Pavel Belaj (1)
  • Igor Ivan (1)
  • Peter Nemec (2)
  • Jiří Ardielli (1)
  • Jan Růžička (1)

  1. Institute of Geoinformatics, VSB Technical University of Ostrava, Ostrava, Czech Republic
  2. Software602 a. s., Praha 4, Czech Republic
