Korean Compound Noun Term Analysis Based on a Chart Parsing Technique

Min, Kyongho; Wilson, William H.; Moon, Yoo-Jin

doi:10.1007/978-3-540-24581-0_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2903)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Unlike compound noun terms in English and French, where words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms in the real world result from a spacing error. Thus the analysis of compound noun terms is a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus have shown good performance on Korean compound noun analysis. However, if the domain of the actual system is expanded beyond that of the training system, then the performance on the compound noun analysis would not be consistent. In this paper, we will describe the analysis of Korean compound noun terms based on a longest substring algorithm and an agenda-based chart parsing technique, with a simple heuristic method to resolve the analyses’ ambiguities. The system successfully analysed 95.6% of the testing data (6024 compound noun terms) which ranged from 2 to 11 syllables. The average ambiguities ranged from 1 to 33 for each compound noun term.
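The abstract names the ingredients (longest-substring lookup, an agenda-based chart, and a simple disambiguation heuristic) without giving details, so the following is only a minimal sketch of how such a pipeline could be arranged, not the authors' implementation. The lexicon `TOY_LEXICON`, the function names, and the tie-breaking heuristic (fewest segments, then longest first segment) are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy lexicon; a real system would use a large noun dictionary.
TOY_LEXICON = {"한국", "한국어", "어", "정보", "처리"}


def lexical_edges(term, lexicon):
    """Build a chart of lexical edges: start index -> end indices, longest first."""
    n = len(term)
    chart = {}
    for start in range(n):
        ends = [end for end in range(n, start, -1) if term[start:end] in lexicon]
        if ends:
            chart[start] = ends
    return chart


def analyses(term, lexicon):
    """Enumerate every segmentation of `term` into lexicon entries."""
    chart = lexical_edges(term, lexicon)
    # Agenda of partial analyses: (position reached, segments so far).
    agenda = deque([(0, ())])
    complete = []
    while agenda:
        pos, segments = agenda.popleft()
        if pos == len(term):
            complete.append(segments)
            continue
        # Extend the partial analysis with each lexical edge starting at `pos`.
        for end in chart.get(pos, []):
            agenda.append((end, segments + (term[pos:end],)))
    return complete


def best_analysis(term, lexicon):
    """Assumed heuristic: fewest segments wins; ties go to the longer first segment."""
    candidates = analyses(term, lexicon)
    if not candidates:
        return None  # the term cannot be covered by lexicon entries
    return min(candidates, key=lambda segs: (len(segs), -len(segs[0])))


if __name__ == "__main__":
    term = "한국어정보처리"
    for segs in analyses(term, TOY_LEXICON):
        print(" + ".join(segs))
    print("preferred:", best_analysis(term, TOY_LEXICON))
```

With this toy lexicon the seven-syllable term has two competing analyses, which illustrates why a disambiguation step is needed even for short inputs. Because the sketch enumerates every analysis, its cost grows with the number of ambiguities; a production chart parser would share sub-analyses instead of copying them.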

Author information

Authors and Affiliations

School of Computer and Information Sciences, Auckland University of Technology, Private Bag 92006, Auckland, 1020, New Zealand
Kyongho Min
School of Computer Science and Engineering, UNSW, Sydney, 2052, Australia
William H. Wilson
Department of MIS, Hankook University of Foreign Studies, Yongin, Kyonggi, 449-791, Korea
Yoo-Jin Moon

Editor information

Editors and Affiliations

Department of Computer Science, Australian National University, ACT 0200, Acton, Australia
Tamás (Tom) Domonkos Gedeon
Murdoch University,
Lance Chun Che Fung

About this paper

Cite this paper

Min, K., Wilson, W.H., Moon, YJ. (2003). Korean Compound Noun Term Analysis Based on a Chart Parsing Technique. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_16

DOI: https://doi.org/10.1007/978-3-540-24581-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive
