DESAM — Annotated corpus for Czech

  • Karel Pala
  • Pavel Rychlý
  • Pavel Smrž
Contributed Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1338)

Abstract

This paper describes DESAM, a disambiguated corpus of Czech. It is a tagged corpus that has been manually disambiguated and can be used in a variety of applications. We discuss the structure of the corpus, the tools used to manage it, its linguistic applications, and the possible use of machine learning techniques that rely on the disambiguated data. We also consider possible ways of developing procedures for fully automatic disambiguation.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Karel Pala
  • Pavel Rychlý
  • Pavel Smrž

Faculty of Informatics, Masaryk University Brno, Brno, Czech Republic
