Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language

Jung, Yuchul; Lee, Joo-Young; Kim, Youngho; Park, Jaehyun; Myaeng, Sung-Hyon; Rim, Hae-Chang

doi:10.1007/978-3-540-70939-8_3


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing one in English, called ConceptNet. The English commonsense knowledge base is essentially a huge network of concepts and relations. Its Concept-Relation-Concept triplets were extracted from English sentences entered through a Web site by volunteers interested in contributing commonsense knowledge. We attempt to obtain a Korean version of it by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy-concept filtering and concept generalization methods. Out of the 2.4 million assertions (i.e., concept-relation-concept triplets) in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgment of a 5% sample, the accuracy was 84.4%.
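The conversion described above operates on Concept-Relation-Concept assertions. The sketch below is a minimal, hypothetical illustration of that data flow, not the authors' system. It represents an assertion as a triplet and translates each concept through a toy dictionary (`TOY_LEXICON`, a hypothetical stand-in for the morphological analysis, machine translation, and special-purpose translation the abstract mentions). It assumes the relation label can be kept unchanged and simply discards any assertion containing an untranslatable concept. That dropping step is a crude placeholder: the paper's own out-of-vocabulary handling, noisy-concept filtering, and concept generalization are not described here and are not reproduced. It also computes accuracy over a manually judged sample. All names (`Assertion`, `convert`, `sample_accuracy`) are illustrative; `IsA` and `CapableOf` are ConceptNet relation names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """A ConceptNet-style triplet: Concept-Relation-Concept."""
    concept1: str
    relation: str
    concept2: str


# Hypothetical toy English-to-Korean concept lexicon (illustrative only).
TOY_LEXICON = {
    "dog": "개",
    "animal": "동물",
    "bark": "짖다",
}


def translate_concept(concept: str, lexicon: dict[str, str]) -> Optional[str]:
    """Return the Korean concept, or None if it is out of vocabulary."""
    return lexicon.get(concept)


def convert(assertions: list[Assertion], lexicon: dict[str, str]):
    """Translate both concepts of each assertion, keeping the relation label.

    Assertions with an out-of-vocabulary concept are dropped. This is a crude
    stand-in, not the paper's OOV handling or noisy-concept filtering.
    """
    converted: list[Assertion] = []
    dropped: list[Assertion] = []
    for a in assertions:
        c1 = translate_concept(a.concept1, lexicon)
        c2 = translate_concept(a.concept2, lexicon)
        if c1 is None or c2 is None:
            dropped.append(a)
        else:
            converted.append(Assertion(c1, a.relation, c2))
    return converted, dropped


def sample_accuracy(judgments: list[bool]) -> float:
    """Fraction of manually judged sample assertions marked correct."""
    return sum(judgments) / len(judgments)


if __name__ == "__main__":
    english = [
        Assertion("dog", "IsA", "animal"),
        Assertion("dog", "CapableOf", "bark"),
        Assertion("cat", "IsA", "animal"),  # "cat" is not in the toy lexicon
    ]
    korean, dropped = convert(english, TOY_LEXICON)
    print(korean)   # the two translatable assertions, with Korean concepts
    print(dropped)  # the "cat" assertion, which has an OOV concept
```

Dropping untranslatable assertions keeps the sketch short; it also shows why coverage matters, since every OOV concept removes every assertion that mentions it.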



Author information

Authors and Affiliations

School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, Daejeon, 305-732, Korea
Yuchul Jung, Youngho Kim & Sung-Hyon Myaeng
Department of Computer Science and Engineering, Korea University, 1, 5-ka, Anam-dong, Seongbuk-Gu, Seoul 136-701, Korea
Joo-Young Lee, Jaehyun Park & Hae-Chang Rim


Editor information

Alexander Gelbukh


About this paper

Cite this paper

Jung, Y., Lee, JY., Kim, Y., Park, J., Myaeng, SH., Rim, HC. (2007). Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_3


DOI: https://doi.org/10.1007/978-3-540-70939-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)
