Machine Translation

Volume 18, Issue 4, pp 275–297

A Morphological Tagger for Korean: Statistical Tagging Combined with Corpus-Based Morphological Rule Application


Abstract

This paper describes a novel approach to morphological tagging for Korean, an agglutinative language with a very productive inflectional system. The tagger takes raw text as input and returns a lemmatized and morphologically disambiguated output for each word: the lemma is labeled with a part-of-speech (POS) tag and the inflections are labeled with inflectional tags. Unlike the standard approach to tagging for morphologically complex languages, in our proposed approach the tagging phase precedes the analysis phase. The tagger comprises a trigram-based tagging component followed by a morphological rule application component, and it obtains 95% precision and recall on unseen test data.
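The ordering described in the abstract (tag first, analyze second) can be illustrated with a minimal sketch. It is not the authors' implementation. The lookup table stands in for the trigram-based tagger, and the tag names, the suffix inventory, and the function names `tag_words`, `apply_rules`, and `analyze` are all hypothetical, chosen only for illustration.

```python
# Toy sketch of a "tag first, analyze second" pipeline.
# Assumption: every table, tag name, and function here is illustrative only.

# Stand-in for the statistical tagger: maps each word to one composite tag.
# A real system would pick tags with a trigram model over the whole sentence.
TOY_TAGGER = {
    "학교에서": "NNC+PAD",
    "먹었다": "VV+EPF+EFN",
}

# Hypothetical suffix inventory for each inflectional tag.
SUFFIXES = {
    "PAD": ["에서", "에"],
    "EPF": ["었", "았"],
    "EFN": ["다"],
}


def tag_words(words):
    """Assign one composite tag per word; unknown words get 'UNK'."""
    return [(w, TOY_TAGGER.get(w, "UNK")) for w in words]


def apply_rules(word, composite_tag):
    """Split `word` into (morpheme, tag) pairs guided by its composite tag."""
    parts = composite_tag.split("+")
    lemma_tag, inflection_tags = parts[0], parts[1:]
    rest = word
    found = []
    # Strip suffixes right to left, one per inflectional tag.
    for tag in reversed(inflection_tags):
        for suffix in SUFFIXES.get(tag, []):
            if rest.endswith(suffix) and len(rest) > len(suffix):
                found.append((suffix, tag))
                rest = rest[: -len(suffix)]
                break
        else:
            # No suffix matched for this tag: leave the word unanalyzed.
            return [(word, composite_tag)]
    return [(rest, lemma_tag)] + list(reversed(found))


def analyze(sentence):
    """Tag every word first, then apply the tag-specific splitting rules."""
    return [apply_rules(w, t) for w, t in tag_words(sentence.split())]


if __name__ == "__main__":
    for analysis in analyze("학교에서 먹었다"):
        print(" + ".join(f"{m}/{t}" for m, t in analysis))
```

Because the tag is fixed before analysis, the rule component only has to consider suffixes compatible with that tag, rather than enumerating every possible segmentation of the word and disambiguating afterwards.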

Keywords

agglutinative morphology; Korean; morphological rules; morphological tagger; statistical tagging; Treebank



Copyright information

© Springer 2005

Authors and Affiliations

  1. Department of Linguistics, Simon Fraser University, Burnaby, Canada
  2. Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, USA
