Korean Stochastic Word-Spacing with Dynamic Expansion of Candidate Words List

  • Mi-young Kang
  • Sung-ja Choi
  • Ae-sun Yoon
  • Hyuk-chul Kwon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)

Abstract

The main aim of this work is to implement a stochastic Korean word-spacing system that is equally robust for both inner data and external data. Word spacing in Korean plays an important role in determining semantic and syntactic scope. To cope with the various problems caused by word-spacing errors in Korean text processing, this study (a) presents a simple stochastic word-spacing system with only two parameters, using relative word-unigram frequencies and the odds favoring the inner-spacing probability of disyllables located at the boundaries of stochastically identified words; (b) endeavors to reduce dependency on the training data by dynamically creating a candidate-word list with the longest-radix-selecting algorithm; and (c) removes noise from the training data by refining the training procedure. The system thus becomes robust against unseen words and offers similar performance for both inner and external data: it obtained 98.35% and 97.47% precision in word-unit correction on the inner test data and the external test data, respectively.
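
To make point (a) concrete, the following is a minimal sketch of how word-unigram probabilities and boundary-disyllable spacing odds could be combined to segment an unspaced string. The abstract does not give the scoring formula, so everything here is an assumption for illustration: the function name `segment`, the additive log-score, the dynamic-programming search, the `max_len` window, the `unseen` probability floor, and the toy numbers. It does not implement the longest-radix-selecting algorithm or the refined training procedure.

```python
import math

def segment(text, word_prob, space_odds, max_len=6, unseen=1e-8):
    """Return the highest-scoring word sequence for an unspaced string.

    word_prob:  word -> relative unigram frequency (toy values, assumed)
    space_odds: two-syllable string -> odds that a space falls inside it
    max_len, unseen: hypothetical search window and floor for unknown words
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            score = best[start] + math.log(word_prob.get(word, unseen))
            if end < n:
                # Disyllable straddling the proposed word boundary.
                pair = text[end - 1:end + 1]
                score += math.log(space_odds.get(pair, 1.0))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the words in order.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

# Toy data: both readings of the string have equal unigram scores,
# so the spacing odds of the boundary disyllable decide the outcome.
toy_probs = {"아버지가": 0.2, "방에": 0.2, "들어가신다": 0.1,
             "아버지": 0.2, "가방에": 0.2}
toy_odds = {"가방": 4.0}
print(" ".join(segment("아버지가방에들어가신다", toy_probs, toy_odds)))
```

With these toy values the two candidate readings tie on unigram probability alone, and the odds assigned to the disyllable "가방" tip the result toward "아버지가 방에 들어가신다". In a real system both tables would be estimated from a training corpus.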

Keywords

Data Sparseness, Word Boundary, Word Probability, Dynamic Expansion, Candidate Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Mi-young Kang (1)
  • Sung-ja Choi (1)
  • Ae-sun Yoon (1)
  • Hyuk-chul Kwon (1)

  1. Korean Language Processing Lab, School of Electrical & Computer Engineering, Pusan National University, Busan, Korea
