Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting an existing English one, ConceptNet. The English knowledge base is essentially a huge net of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences that volunteers interested in contributing commonsense knowledge entered through a Web site. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions (triplets of concept-relation-concept) in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgment of a 5% sample, the accuracy was 84.4%.
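
To make the Concept-Relation-Concept form concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy English-to-Korean lexicon (`TOY_LEXICON`) in place of the morphological analyzer and machine translation software mentioned above, and it simply drops any assertion containing a concept the lexicon does not cover, which is a much cruder stand-in for the paper's out-of-vocabulary handling and noisy concept filtering. The names `Assertion`, `convert`, and `convert_all` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Assertion:
    """One Concept-Relation-Concept triplet."""
    left: str
    relation: str
    right: str


# Hypothetical toy lexicon; a real system would rely on MT and a
# morphological analyzer rather than a hand-written dictionary.
TOY_LEXICON: Dict[str, str] = {"dog": "개", "animal": "동물", "bark": "짖다"}


def convert(a: Assertion, lexicon: Dict[str, str]) -> Optional[Assertion]:
    """Translate both concepts; return None if either is out of vocabulary."""
    left = lexicon.get(a.left)
    right = lexicon.get(a.right)
    if left is None or right is None:
        return None  # assumption: unknown concepts are simply filtered out
    return Assertion(left, a.relation, right)


def convert_all(assertions: Iterable[Assertion],
                lexicon: Dict[str, str]) -> List[Assertion]:
    """Convert every assertion, keeping only those fully translated."""
    converted = (convert(a, lexicon) for a in assertions)
    return [k for k in converted if k is not None]


english = [
    Assertion("dog", "IsA", "animal"),
    Assertion("dog", "CapableOf", "bark"),
    Assertion("dog", "IsA", "quadruped"),  # "quadruped" is not in the toy lexicon
]
korean = convert_all(english, TOY_LEXICON)
print(len(korean), "of", len(english), "assertions converted")
```

In this sketch the relation label is carried over unchanged and only the two concepts are translated; whether the original system treats relations the same way is not stated in the abstract.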

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jung, Y., Lee, JY., Kim, Y., Park, J., Myaeng, SH., Rim, HC. (2007). Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_3

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
