Chunking Using Conditional Random Fields in Korean Texts

Lee, Yong-Hun; Kim, Mi-Young; Lee, Jong-Hyeok

doi:10.1007/11562214_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

We present a method of chunking Korean texts using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence data. In agglutinative languages such as Korean and Japanese, rule-based chunking is predominantly used because of its simplicity and efficiency, and a hybrid of rule-based and machine learning methods has also been proposed to handle exceptions to the rules. In this paper, we show how CRFs can be applied to chunking Korean texts. Experiments on the STEP 2000 dataset show that the proposed method significantly improves chunking performance and outperforms previous systems.
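The abstract describes CRFs only as a model "for labeling and segmenting sequence data". As a rough illustration, and not the authors' system, the Python sketch below shows the core mechanics of a linear-chain CRF applied to chunking. Chunks are encoded as a hypothetical BIO tag sequence (NP chunks only). A tag sequence is scored as a sum of weighted transition and emission features, the forward algorithm computes the normalizer log Z(x), and Viterbi decoding picks the best sequence. The tag set, the hand-set feature weights, and the example sentence 나는 새 책을 샀다 ("I bought a new book", romanized and split into eojeols) are all assumptions. A real system learns its weights from a tagged corpus such as STEP 2000, which this sketch does not attempt.

```python
import math

# Toy tag set (an assumption for illustration): BIO encoding of NP chunks only.
TAGS = ["B-NP", "I-NP", "O"]


def emission(weights, token, tag):
    """Weight of the state feature (token, tag); unseen pairs contribute 0."""
    return weights.get(("emit", token, tag), 0.0)


def transition(weights, prev_tag, tag):
    """Weight of the transition feature (prev_tag, tag); 'START' precedes position 0."""
    return weights.get(("trans", prev_tag, tag), 0.0)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(weights, tokens, tags):
    """Unnormalized log-score of one tag sequence: sum of transition + emission weights."""
    score, prev = 0.0, "START"
    for token, tag in zip(tokens, tags):
        score += transition(weights, prev, tag) + emission(weights, token, tag)
        prev = tag
    return score


def log_partition(weights, tokens):
    """log Z(x): forward algorithm, summing over every possible tag sequence."""
    alpha = {y: transition(weights, "START", y) + emission(weights, tokens[0], y) for y in TAGS}
    for token in tokens[1:]:
        alpha = {
            y: logsumexp([alpha[p] + transition(weights, p, y) for p in TAGS])
            + emission(weights, token, y)
            for y in TAGS
        }
    return logsumexp(list(alpha.values()))


def viterbi(weights, tokens):
    """Highest-scoring tag sequence, i.e. argmax_y p(y | x)."""
    delta = {y: transition(weights, "START", y) + emission(weights, tokens[0], y) for y in TAGS}
    backpointers = []
    for token in tokens[1:]:
        new_delta, bp = {}, {}
        for y in TAGS:
            best_prev = max(TAGS, key=lambda p: delta[p] + transition(weights, p, y))
            new_delta[y] = delta[best_prev] + transition(weights, best_prev, y) + emission(weights, token, y)
            bp[y] = best_prev
        delta = new_delta
        backpointers.append(bp)
    path = [max(TAGS, key=lambda y: delta[y])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


def bio_to_chunks(tokens, tags):
    """Group BIO-tagged tokens into (chunk_type, tokens) spans; a stray I- opens a new chunk."""
    chunks, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(token)
        else:
            current = (tag[2:], [token])
            chunks.append(current)
    return chunks


# Hand-set weights (hypothetical; a real CRF learns them from a tagged corpus).
WEIGHTS = {
    ("emit", "naneun", "B-NP"): 2.0,     # 나는
    ("emit", "sae", "B-NP"): 1.5,        # 새
    ("emit", "chaegeul", "B-NP"): 0.5,   # 책을
    ("emit", "chaegeul", "I-NP"): 1.0,
    ("emit", "satda", "O"): 2.0,         # 샀다
    ("trans", "START", "I-NP"): -5.0,    # a chunk should not start with I-
    ("trans", "O", "I-NP"): -5.0,
    ("trans", "B-NP", "I-NP"): 1.0,
}

tokens = ["naneun", "sae", "chaegeul", "satda"]  # "I bought a new book"
best = viterbi(WEIGHTS, tokens)
prob = math.exp(sequence_score(WEIGHTS, tokens, best) - log_partition(WEIGHTS, tokens))
print(best)
print(bio_to_chunks(tokens, best))
print(f"p(y|x) = {prob:.3f}")
```

With these toy weights, Viterbi returns B-NP B-NP I-NP O, which `bio_to_chunks` groups into the chunks [naneun] and [sae chaegeul]. The verb satda is tagged O only because the toy tag set has no verb-phrase chunk. Because the model normalizes over whole sequences, the transition penalties on I-NP discourage ill-formed chunk sequences globally rather than token by token.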

Similar content being viewed by others

Chunking in Turkish with Conditional Random Fields

Chunker for Gujarati Language Using Hybrid Approach

Gut, Besser, Chunker – Selecting the Best Models for Text Chunking with Voting

References

Abney, S.: Parsing by chunks. In: Berwick, R., Abney, S., Tenny, C. (eds.) Principle-based Parsing. Kluwer Academic Publishers, Dordrecht (1991)
Ramshaw, L.A., Marcus, M.P.: Text chunking using transformation-based learning. In: Proceedings of the Third ACL Workshop on Very Large Corpora (1995)
Tjong Kim Sang, E.F., Buchholz, S.: Introduction to the CoNLL-2000 shared task: Chunking. In: Proceedings of CoNLL-2000, pp. 127–132 (2000)
Kudo, T., Matsumoto, Y.: Chunking with support vector machines. In: Proceedings of NAACL 2001, ACL (2001)
Park, S.-B., Zhang, B.-T.: Combining a Rule-based Method and a k-NN for Chunking Korean Text. In: Proceedings of the 19th International Conference on Computer Processing of Oriental Languages, pp. 225–230 (2001)
Park, S.-B., Zhang, B.-T.: Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 497–504 (2003)
Shin, H.-P.: Maximally Efficient Syntactic Parsing with Minimal Resources. In: Proceedings of the Conference on Hangul and Korean Language Information Processing, pp. 242–244 (1999)
Kim, M.-Y., Kang, S.-J., Lee, J.-H.: Dependency Parsing by Chunks. In: Proceedings of the 27th KISS Spring Conference, pp. 327–329 (1999)
Yoon, J.-T., Choi, K.-S.: Study on KAIST Corpus, CS-TR-99-139, KAIST CS (1999)
Berger, A.L., Della Pietra, S.A., Della Pietra, V.J.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71 (1996)
McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of International Conference on Machine Learning, Stanford, California, pp. 591–598 (2000)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289 (2001)
Sha, F., Pereira, F.: Shallow Parsing with Conditional Random Fields. In: Proceedings of Human Language Technology-NAACL, Edmonton, Canada (2003)
Wallach, H.: Efficient Training of Conditional Random Fields. Master's thesis, School of Cognitive Science, Division of Informatics, University of Edinburgh (2002)
Tan, Y., Yao, T., Chen, Q., Zhu, J.: Applying conditional random fields to Chinese shallow parsing. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 167–176. Springer, Heidelberg (2005)
Hammersley, J., Clifford, P.: Markov fields on finite graphs and lattices. Unpublished manuscript (1971)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large-scale optimization. Mathematical Programming 45, 503–528 (1989)
Phan, H.X., Nguyen, M.L.: FlexCRFs: A Flexible Conditional Random Fields Toolkit (2004), http://www.jaist.ac.jp/~hieuxuan/flexcrfs/flexcrfs.html
Pinto, D., McCallum, A., Wei, X., Croft, W.B.: Table extraction using conditional random fields. In: Proceedings of the ACM SIGIR (2003)
Chen, S.F., Rosenfeld, R.: A Gaussian prior for smoothing maximum entropy models. Technical Report CMU-CS-99-108, Carnegie Mellon University (1999)

Author information

Authors and Affiliations

Div. of Electrical and Computer Engineering POSTECH and AITrc, San 31, Hyoja-dong, Nam-gu, Pohang, 790-784, R. of Korea
Yong-Hun Lee, Mi-Young Kim & Jong-Hyeok Lee

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Lee, YH., Kim, MY., Lee, JH. (2005). Chunking Using Conditional Random Fields in Korean Texts. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_14

DOI: https://doi.org/10.1007/11562214_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)