Multilingual Conceptual Access to Lexicon Based on Shared Orthography: An Ontology-Driven Study of Chinese and Japanese

Language Production, Cognition, and the Lexicon

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

In this paper we propose a model for ontology-driven conceptual access to a multilingual lexicon that takes advantage of the cognitive-conceptual structure of the radical system embedded in the shared orthography of Chinese and Japanese. Our proposal relies crucially on two facts. First, both Chinese and Japanese use Chinese characters (hanzi/kanji) in their orthography. Second, the Chinese character orthography is anchored on a system of radical parts which encode basic concepts. Each character, as an orthographic unit, contains radicals which indicate the broad semantic class of the meaning of that unit. Our study utilizes the homomorphism between the Chinese hanzi and Japanese kanji systems, but goes beyond the character-to-character mapping of kanji-hanzi conversion to identify bilingual word correspondences. We use bilingual dictionaries, including WordNets, to verify the semantic relations between the cross-lingual pairs. These bilingual pairs are then mapped to an ontology of characters structured according to the organization of the basic concepts of radicals. The conceptual structure of the radical ontology is proposed as the model for simultaneous conceptual access to both languages. A study based on words containing characters composed of the “口 (mouth)” radical is given to illustrate the proposal and the actual model. We suggest that the proposed model has the conceptual robustness to be applied to other languages, for two reasons: it already works for two typologically very different languages, and it contains Generative Lexicon (GL)-like coercive links that account for a wide range of possible cross-lingual semantic relations.
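The idea of using radicals as a shared index over both lexicons can be sketched in a few lines of Python. This is only a minimal illustration, not the chapter's implementation: the character-to-radical table, the candidate word pairs, and the function names `radicals_in` and `group_pairs_by_radical` are all hypothetical and hand-entered here. The dictionary-based verification of semantic relations described in the abstract is left out entirely.

```python
from collections import defaultdict

# Toy character-to-radical table. Assumption: hand-entered for illustration,
# not taken from any resource used in the chapter.
RADICAL_OF = {
    "呼": "口",
    "吸": "口",
    "味": "口",
    "海": "水",
}

# Toy candidate pairs (Chinese word, Japanese word) written with the same
# characters. Assumption: illustrative only, with no semantic verification.
CANDIDATE_PAIRS = [
    ("呼吸", "呼吸"),
    ("味", "味"),
    ("海", "海"),
]


def radicals_in(word):
    """Return the set of known radicals for the characters in a word."""
    return {RADICAL_OF[ch] for ch in word if ch in RADICAL_OF}


def group_pairs_by_radical(pairs):
    """Index each bilingual pair under every radical that both words share."""
    index = defaultdict(list)
    for zh, ja in pairs:
        for radical in sorted(radicals_in(zh) & radicals_in(ja)):
            index[radical].append((zh, ja))
    return dict(index)


index = group_pairs_by_radical(CANDIDATE_PAIRS)
print(index["口"])
```

Looking up `index["口"]` then returns the word pairs from both languages that are reachable through the "mouth" concept, which is the kind of simultaneous access the radical ontology is meant to support. A real system would draw the radical assignments from a curated character resource rather than a hand-written table.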

Notes

  1. http://www2.nict.go.jp/r/r312/EDR/index.html

  2. The categories of the concepts represented by radicals are extended based on Qualia structure. Each category in Table 10 and appendix was manually analyzed and assigned.

Acknowledgments

An earlier version of this paper was presented at CogALex 2008, and we would like to thank Michael Zock and the other participants for their helpful comments. We would also like to thank the anonymous reviewers of this volume for their constructive suggestions. Work on this paper was partially supported by grant GRF-543512 from the Hong Kong RGC.

Author information

Corresponding author

Correspondence to Chu-Ren Huang.

Appendix A: The Dimension of “口(Mouth) Conceptual Extension”

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Huang, CR., Chou, YM. (2015). Multilingual Conceptual Access to Lexicon Based on Shared Orthography: An Ontology-Driven Study of Chinese and Japanese. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_9

  • DOI: https://doi.org/10.1007/978-3-319-08043-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08042-0

  • Online ISBN: 978-3-319-08043-7

  • eBook Packages: Computer Science, Computer Science (R0)
