A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean

Kim, Jee-Hyub; Kwak, Byung-Kwan; Lee, Seungwoo; Lee, Geunbae; Lee, Jong-Hyeok

doi:10.1023/A:1011466928139

Published: July 2001

Information Retrieval, Volume 4, pages 115–132 (2001)

Abstract

In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limited coverage of diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort than the manual linguistic approach, while achieving a similar level of performance. We also present a new filtering method to address the problems of compound noun over-generation and data sparseness.

Author information

Authors and Affiliations

Biological Research Information Center (BRIC), Pohang, South Korea
Jee-Hyub Kim
Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea
Byung-Kwan Kwak, Seungwoo Lee, Geunbae Lee & Jong-Hyeok Lee

Cite this article

Kim, JH., Kwak, BK., Lee, S. et al. A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean. Information Retrieval 4, 115–132 (2001). https://doi.org/10.1023/A:1011466928139

Issue Date: July 2001
DOI: https://doi.org/10.1023/A:1011466928139
