The Contribution of Lexical Resources to Natural Language Processing of CJK Languages

Halpern, Jack

doi:10.1007/11939993_77

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the areas of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, especially for proper nouns, and the lack of a standardized orthography, especially in Japanese. This paper summarizes some of the major linguistic issues in the development of NLP applications that are dependent on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.

Author information

Authors and Affiliations

The CJK Dictionary Institute (CJKI), 34-14, 2-chome, Tohoku, Niiza-shi, Saitama, 352-0001, Japan
Jack Halpern

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Halpern, J. (2006). The Contribution of Lexical Resources to Natural Language Processing of CJK Languages. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_77

DOI: https://doi.org/10.1007/11939993_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
