Chunking Using Conditional Random Fields in Korean Texts

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We present a method for chunking Korean texts using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence data. In agglutinative languages such as Korean and Japanese, rule-based chunking is predominantly used for its simplicity and efficiency. A hybrid of rule-based and machine learning methods has also been proposed to handle exceptions to the rules. In this paper, we show how CRFs can be applied to the task of chunking Korean texts. Experiments on the STEP 2000 dataset show that the proposed method significantly improves performance and outperforms previous systems.
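
To make the idea of chunking as sequence labeling concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain model over chunk tags. It is not the authors' implementation: the tag set (B-NP, I-NP, O), the function name viterbi, and all scores are hypothetical values chosen for illustration, and a trained CRF would derive the emission and transition scores from weighted feature functions rather than hand-set numbers.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO-style tag set; the paper's Korean chunk types are not shown here.
TAGS = ["B-NP", "I-NP", "O"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the (unnormalized) score of `tag` at position t;
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`.
    Missing entries default to 0.0.
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, 0.0) for tag in TAGS}]
    back: List[Dict[str, str]] = []  # back[t-1][tag] = best predecessor of `tag` at t

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(TAGS,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t].get(cur, 0.0))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Toy scores for a three-token sentence (assumed values, not from the paper).
toy_emissions = [
    {"B-NP": 0.8, "I-NP": 0.0, "O": 1.0},
    {"B-NP": 0.2, "I-NP": 1.5, "O": 0.3},
    {"B-NP": 0.0, "I-NP": 0.0, "O": 2.0},
]
# Penalize the ill-formed transition O -> I-NP.
toy_transitions = {("O", "I-NP"): -5.0}

print(viterbi(toy_emissions, toy_transitions))
```

In this toy setting the first token scores slightly higher as O on its own, but the penalty on O followed by I-NP makes the decoder prefer starting a chunk with B-NP. This kind of interaction between neighboring labels is what a sequence model can capture and an independent per-token classifier cannot.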

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, YH., Kim, MY., Lee, JH. (2005). Chunking Using Conditional Random Fields in Korean Texts. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_14

  • DOI: https://doi.org/10.1007/11562214_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
