Abstract
Unlike compound noun terms in English and French, whose component words are separated by white space, Korean compound noun terms are written without spaces between their components. In addition, some compound noun terms found in real-world text result from spacing errors. The analysis of compound noun terms is therefore a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus have shown good performance on Korean compound noun analysis. However, if the domain of the deployed system is expanded beyond the domain on which it was trained, its performance on compound noun analysis may not remain consistent. In this paper, we describe the analysis of Korean compound noun terms based on a longest substring algorithm and an agenda-based chart parsing technique, with a simple heuristic method to resolve ambiguities among the analyses. The system successfully analysed 95.6% of the test data (6024 compound noun terms), which ranged from 2 to 11 syllables. Ambiguity ranged from 1 to 33 analyses per compound noun term.
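To make the segmentation problem concrete, the following is a minimal Python sketch of lexicon-driven compound noun analysis. It is not the authors' implementation: the abstract does not specify the longest substring algorithm, the agenda mechanism, or the heuristic, so the sketch substitutes a plain bottom-up chart (a table indexed by syllable position) and an assumed "fewest components" preference. The function names `analyse` and `prefer_fewest_components`, and the four-entry lexicon, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set


def analyse(term: str, lexicon: Set[str]) -> List[List[str]]:
    """Enumerate every split of `term` into nouns found in `lexicon`.

    chart[i] stores all analyses of the suffix term[i:], so each
    suffix is analysed once and reused (the core idea of a chart).
    """
    n = len(term)
    chart: Dict[int, List[List[str]]] = {n: [[]]}  # empty suffix: one empty analysis
    for i in range(n - 1, -1, -1):
        analyses: List[List[str]] = []
        for j in range(i + 1, n + 1):
            piece = term[i:j]
            if piece in lexicon:
                # Extend every analysis of the remainder with this noun.
                for rest in chart[j]:
                    analyses.append([piece] + rest)
        chart[i] = analyses
    return chart[0]


def prefer_fewest_components(analyses: List[List[str]]) -> Optional[List[str]]:
    """Assumed heuristic: pick the analysis with the fewest (hence longest) nouns."""
    return min(analyses, key=len) if analyses else None


# Hypothetical lexicon: information, search, system, information-retrieval.
lexicon = {"정보", "검색", "시스템", "정보검색"}
candidates = analyse("정보검색시스템", lexicon)
chosen = prefer_fewest_components(candidates)
```

Here the 7-syllable term has two candidate analyses, and the assumed heuristic selects the one with two components. A term containing a substring absent from the lexicon yields no analysis at all, which is one way an analyser of this kind can fail on unseen vocabulary.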
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Min, K., Wilson, W.H., Moon, YJ. (2003). Korean Compound Noun Term Analysis Based on a Chart Parsing Technique. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_16
DOI: https://doi.org/10.1007/978-3-540-24581-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive