Abstract
This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting an existing English one, ConceptNet. The English commonsense knowledge base is essentially a large network of concepts and relations. Its triplets, each of the form Concept-Relation-Concept, were extracted from English sentences that volunteers interested in contributing commonsense knowledge entered through a Web site. We attempt to obtain a Korean version of this network by using a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy-concept filtering and concept generalization methods. Out of the 2.4 million assertions (i.e., concept-relation-concept triplets) in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgment of a 5% sample, the accuracy was 84.4%.
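As a rough illustration of the data involved, the sketch below models an assertion as a concept-relation-concept triplet and converts it with a simple lookup. It is not the method of the paper, which relies on machine translation software, a morphological analyzer, and dedicated out-of-vocabulary handling. The names `Assertion`, `TOY_LEXICON`, and `convert`, the three sample assertions, and the three-entry lexicon are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Assertion:
    """One concept-relation-concept triplet (hypothetical container)."""
    left: str
    relation: str
    right: str


# Hypothetical toy English-to-Korean concept lexicon; it stands in for the
# translation resources described in the abstract.
TOY_LEXICON: Dict[str, str] = {
    "dog": "개",
    "animal": "동물",
    "bark": "짖다",
}


def convert(assertion: Assertion, lexicon: Dict[str, str]) -> Optional[Assertion]:
    """Translate both concepts; return None if either one is out of vocabulary."""
    left = lexicon.get(assertion.left)
    right = lexicon.get(assertion.right)
    if left is None or right is None:
        return None  # this sketch simply drops assertions it cannot translate
    return Assertion(left, assertion.relation, right)


# Illustrative English assertions (not taken from the paper's data).
english = [
    Assertion("dog", "IsA", "animal"),
    Assertion("dog", "CapableOf", "bark"),
    Assertion("dog", "Desires", "bone"),  # "bone" is missing from the toy lexicon
]

# Keep only the assertions whose two concepts were both translated.
korean = [k for a in english if (k := convert(a, TOY_LEXICON)) is not None]
```

Dropping untranslatable assertions is a simplification made here for brevity. It shows one reason a converted knowledge base can end up with fewer assertions than its source, without claiming that this accounts for the figures reported above.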
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jung, Y., Lee, JY., Kim, Y., Park, J., Myaeng, SH., Rim, HC. (2007). Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_3
DOI: https://doi.org/10.1007/978-3-540-70939-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)