Abstract
The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the areas of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, especially for proper nouns, and the lack of a standardized orthography, especially in Japanese. This paper summarizes some of the major linguistic issues in the development of NLP applications that are dependent on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Halpern, J. (2006). The Contribution of Lexical Resources to Natural Language Processing of CJK Languages. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_77
DOI: https://doi.org/10.1007/11939993_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)