The Contribution of Lexical Resources to Natural Language Processing of CJK Languages

  • Jack Halpern
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, particularly in the areas of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, especially for proper nouns, and by the lack of a standardized orthography, most notably in Japanese. This paper summarizes some of the major linguistic issues in the development of NLP applications that depend on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.


Keywords: Natural Language Processing, Machine Translation, Word Segmentation, Proper Noun, Lexical Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jack Halpern, The CJK Dictionary Institute (CJKI), Saitama, Japan
