Automatic Extraction of Collocations From Korean Text

Computers and the Humanities

Abstract

In this paper, we propose a statistical method to automatically extract collocations from a Korean POS-tagged corpus. Since a large portion of language is represented by collocation patterns, collocational knowledge provides a valuable resource for NLP applications. One difficulty of collocation extraction is that Korean has a partially free word order, which also appears in collocations. In this work, we exploit four statistics, ‘frequency’, ‘randomness’, ‘convergence’, and ‘correlation’, in order to take into account the flexible word order of Korean collocations. We separate meaningful bigrams using an evaluation function based on the four statistics and extend the bigrams to n-gram collocations using a fuzzy relation. Experiments show that this method works well for Korean collocations.
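The abstract does not define the four statistics, so the following is only a minimal sketch of the kind of raw data such statistics could be computed from under a partially free word order: co-occurrence counts of word pairs at each signed offset inside a small window. The function names `offset_counts` and `pair_frequency`, the window size, and the toy tokens are all illustrative assumptions, not the authors' definitions or data.

```python
from collections import Counter, defaultdict


def offset_counts(sentences, window=3):
    """Count how often each ordered word pair co-occurs at each signed offset.

    Hypothetical helper: `sentences` is a list of token lists, and `window`
    is the maximum distance (in tokens) considered on either side.
    Returns a dict mapping (w1, w2) -> Counter({offset: count}), where
    offset = position(w2) - position(w1).
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, w1 in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w1, sent[j])][j - i] += 1
    return counts


def pair_frequency(counts, w1, w2):
    """Total co-occurrences of w2 around w1, summed over all offsets."""
    return sum(counts.get((w1, w2), Counter()).values())


# Toy input (placeholder tokens, not real corpus data): "A" and "B"
# co-occur in both orders and at different distances.
toy = [["A", "x", "B"], ["B", "A"], ["A", "B"]]
toy_counts = offset_counts(toy, window=3)
print(dict(toy_counts[("A", "B")]))
print(pair_frequency(toy_counts, "A", "B"))
```

Keeping the counts separate per offset, rather than only counting adjacent pairs, is what lets a later scoring step see a pair that occurs in either order or with intervening words; how those per-offset counts are combined into an evaluation function is left open here.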

Cite this article

Kim, S., Yoon, J. & Song, M. Automatic Extraction of Collocations From Korean Text. Computers and the Humanities 35, 273–297 (2001). https://doi.org/10.1023/A:1017507019909

  • DOI: https://doi.org/10.1023/A:1017507019909
