The Contribution of Lexical Resources to Natural Language Processing of CJK Languages

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in word segmentation (WS), information retrieval (IR), named entity recognition (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, particularly for proper nouns, and by the absence of a standardized orthography, most notably in Japanese. This paper summarizes some of the major linguistic issues in developing NLP applications that depend on lexical resources, and discusses the central role such resources should play in enhancing the accuracy of NLP tools.
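Word segmentation is one place where the dependence on lexical resources is easy to see. The following is a minimal sketch, not the author's method: a greedy forward maximum matching segmenter over a hypothetical toy lexicon. The function name `forward_max_match` and the lexicon contents are illustrative assumptions. The sketch shows how a missing proper-noun entry (北京大学, Peking University) changes the output for the string 北京大学生活 ("life at Peking University").

```python
from __future__ import annotations


def forward_max_match(text: str, lexicon: set[str], max_len: int | None = None) -> list[str]:
    """Greedy left-to-right longest-match segmentation against a lexicon.

    Characters not covered by any lexicon entry are emitted as
    single-character tokens.
    """
    if max_len is None:
        # Longest lexicon entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit
        # or a single character (which is always accepted).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real system would rely on a large, curated resource.
lexicon = {"北京", "大学", "北京大学", "生活", "大学生", "学生"}

print(forward_max_match("北京大学生活", lexicon))
# → ['北京大学', '生活']

# Without the proper noun 北京大学, the same algorithm mis-segments the string.
print(forward_max_match("北京大学生活", lexicon - {"北京大学"}))
# → ['北京', '大学生', '活']
```

The algorithm is unchanged between the two calls; only lexicon coverage differs. Greedy matching is used here purely for brevity, and practical segmenters typically combine a lexicon with statistical disambiguation.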

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halpern, J. (2006). The Contribution of Lexical Resources to Natural Language Processing of CJK Languages. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_77

  • DOI: https://doi.org/10.1007/11939993_77

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
