Automatic Extraction of Collocations From Korean Text

Kim, Seonho; Yoon, Juntae; Song, Mansuk

doi:10.1023/A:1017507019909

Published: August 2001

Volume 35, pages 273–297, (2001)

Computers and the Humanities

Abstract

In this paper, we propose a statistical method to automatically extract collocations from a Korean POS-tagged corpus. Since a large portion of language is represented by collocation patterns, collocational knowledge provides a valuable resource for NLP applications. One difficulty of collocation extraction is that Korean has a partially free word order, which also appears in collocations. In this work, we exploit four statistics, ‘frequency’, ‘randomness’, ‘convergence’, and ‘correlation’, in order to take into account the flexible word order of Korean collocations. We separate meaningful bigrams using an evaluation function based on the four statistics and extend the bigrams to n-gram collocations using a fuzzy relation. Experiments show that this method works well for Korean collocations.

Author information

Authors and Affiliations

Department of Computer Science, College of Engineering, Yonsei University, Seoul, 120-749, Korea
Seonho Kim, Juntae Yoon & Mansuk Song

About this article

Cite this article

Kim, S., Yoon, J. & Song, M. Automatic Extraction of Collocations From Korean Text. Computers and the Humanities 35, 273–297 (2001). https://doi.org/10.1023/A:1017507019909

Issue Date: August 2001
DOI: https://doi.org/10.1023/A:1017507019909
