Computers and the Humanities, Volume 31, Issue 2, pp 115–133

An Estonian Morphological Analyser and the Impact of a Corpus on Its Development

  • Heiki-Jaan Kaalep

Abstract

The paper describes a morphological analyser for Estonian and how using a text corpus influenced the process of creating it and the resulting program itself. The influence is not limited to the lexicon, but is also noticeable in the resulting algorithm and implementation. When work on the analyser began, there was no computational treatment of Estonian derivatives and compounds. After some cycles of development and testing on the corpus, we came up with an acceptable algorithm for their treatment. Both the morphological analyser and the speller based on it have been successfully marketed.

computer implementation, Estonian, language engineering, morphology, text corpora

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Heiki-Jaan Kaalep (1)
  1. University of Tartu, Estonia
