Named Entities in Czech: Annotating Data and Developing NE Tagger

  • Magda Ševčíková
  • Zdeněk Žabokrtský
  • Oldřich Krůza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4629)

Abstract

This paper deals with the treatment of Named Entities (NEs) in Czech. We introduce a two-level NE classification. We used this classification to manually annotate two thousand sentences, yielding more than 11,000 NE instances. Using the annotated data and machine-learning techniques (namely, top-down induction of decision trees), we developed and evaluated a software system for the automatic detection and classification of NEs in Czech texts.
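The abstract names top-down induction of decision trees as the learning technique. The paper's actual learner, feature set and NE types are not given here, so the following is only a minimal ID3-style sketch under stated assumptions. Each token is described by three hypothetical binary features (capitalized, directly preceded by a trigger word such as "město", sentence-initial) and carries a hypothetical coarse label (`geo`, `person`, or `O` for a non-entity). At each node the learner greedily picks the feature with the highest information gain and recurses on the two resulting subsets.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

# A training example: hypothetical binary token features plus an NE label.
Example = Tuple[Dict[str, bool], str]
# A tree is either a leaf label or (feature, subtree_if_true, subtree_if_false).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a non-empty list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def induce(examples: List[Example], features: List[str]) -> Tree:
    """Top-down (ID3-style) induction over binary features."""
    labels = [label for _, label in examples]
    # Stop when the node is pure or no features are left: emit a leaf.
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def gain(feature: str) -> float:
        yes = [lab for feats, lab in examples if feats[feature]]
        no = [lab for feats, lab in examples if not feats[feature]]
        remainder = sum(len(part) / len(labels) * entropy(part) for part in (yes, no) if part)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    # A split with no information gain would not separate the labels: stop here.
    if gain(best) <= 0:
        return majority(labels)
    rest = [f for f in features if f != best]
    yes = [ex for ex in examples if ex[0][best]]
    no = [ex for ex in examples if not ex[0][best]]
    return (best, induce(yes, rest), induce(no, rest))


def classify(tree: Tree, feats: Dict[str, bool]) -> str:
    """Walk from the root to a leaf following the token's feature values."""
    while not isinstance(tree, str):
        feature, if_true, if_false = tree
        tree = if_true if feats[feature] else if_false
    return tree


# Hypothetical toy data; not taken from the annotated corpus described above.
FEATURES = ["capitalized", "after_trigger", "sentence_initial"]
TOY_DATA: List[Example] = [
    ({"capitalized": True, "after_trigger": True, "sentence_initial": False}, "geo"),
    ({"capitalized": True, "after_trigger": False, "sentence_initial": False}, "person"),
    ({"capitalized": True, "after_trigger": False, "sentence_initial": True}, "O"),
    ({"capitalized": False, "after_trigger": False, "sentence_initial": False}, "O"),
    ({"capitalized": False, "after_trigger": True, "sentence_initial": False}, "O"),
    ({"capitalized": True, "after_trigger": True, "sentence_initial": True}, "geo"),
]

tree = induce(TOY_DATA, FEATURES)

if __name__ == "__main__":
    token = {"capitalized": True, "after_trigger": True, "sentence_initial": False}
    print(classify(tree, token))  # → geo
```

Greedy top-down splitting keeps the learner simple and the resulting tree easy to inspect. Production learners in this family, such as C4.5, additionally prune the tree and handle multi-valued and continuous attributes, which this sketch omits.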

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Magda Ševčíková (1)
  • Zdeněk Žabokrtský (1)
  • Oldřich Krůza (1)

  1. Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, CZ-11800 Prague, Czech Republic

Personalised recommendations