Abstract
In this paper we propose a model for ontology-driven conceptual access to a multilingual lexicon that takes advantage of the cognitive-conceptual structure of the radical system embedded in the shared orthography of Chinese and Japanese. Our proposal relies crucially on two facts. First, both Chinese and Japanese use Chinese characters (hanzi/kanji) in their orthography. Second, the Chinese character orthography is anchored on a system of radical parts which encode basic concepts. Each character, as an orthographic unit, contains radicals which indicate the broad semantic class of the meaning of that unit. Our study utilizes the homomorphism between the Chinese hanzi and Japanese kanji systems, but goes beyond the character-to-character mapping of kanji-hanzi conversion to identify bilingual word correspondences. We use bilingual dictionaries, including WordNets, to verify the semantic relations between the cross-lingual pairs. These bilingual pairs are then mapped to an ontology of characters structured according to the organization of the basic concepts of radicals. The conceptual structure of the radical ontology is proposed as the model for simultaneous conceptual access to both languages. A study based on words containing characters composed of the “口 (mouth)” radical is given to illustrate the proposal and the actual model. We suggest that the proposed model has the conceptual robustness to be applied to other languages, given that it already works for two typologically very different languages and that it contains Generative Lexicon (GL)-like coercive links to account for a wide range of possible cross-lingual semantic relations.
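The idea of filing bilingual word pairs under the radicals they share can be illustrated with a minimal sketch. It is not the authors' implementation: the table `RADICAL_OF`, the list `PAIRS`, and the functions `radicals` and `index_by_radical` are hypothetical names, the data is a hand-picked toy sample, and the pairs are simply assumed to have already been verified against a bilingual dictionary.

```python
from collections import defaultdict

# Hypothetical toy table: character -> semantic radical (assumed, hand-picked).
RADICAL_OF = {"呼": "口", "吸": "口", "味": "口", "海": "水", "洋": "水"}

# Hypothetical (Chinese word, Japanese word) pairs, assumed to be
# already verified as semantically related by a bilingual dictionary.
PAIRS = [("呼吸", "呼吸"), ("味道", "味"), ("海洋", "海洋")]


def radicals(word):
    """Return the set of known radicals among the characters of a word."""
    return {RADICAL_OF[ch] for ch in word if ch in RADICAL_OF}


def index_by_radical(pairs):
    """Group each bilingual pair under every radical both words share."""
    index = defaultdict(list)
    for zh, ja in pairs:
        for radical in sorted(radicals(zh) & radicals(ja)):
            index[radical].append((zh, ja))
    return dict(index)


index = index_by_radical(PAIRS)
```

Looking up `index["口"]` then retrieves the Chinese and Japanese entries together, which is the sense in which a radical can serve as a shared conceptual access point; the sketch leaves out the ontology structure and the GL-like links that the model places on top of this grouping.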
Notes
- 1.
- 2. The categories of the concepts represented by radicals are extended based on Qualia structure. Each category in Table 10 and the appendix was manually analyzed and assigned.
References
Chou, Y. M., & Huang, C. R. (2007). Hantology: A linguistic resource for Chinese language processing and studying. Proceedings of the 5th LREC (pp. 587–590).
Chou, Y. M., & Huang, C. R. (2010). Hantology: Conceptual system discovery based on orthographic convention. In C. R. Huang (Ed.), Ontology and the lexicon: A natural language processing perspective (pp. 122–143). Cambridge: Cambridge University Press.
Chou, Y. M. (2012). The application of Chinese-Japanese characters and words knowledgebase: DaoZhaiSuiBi and Sheng WenZuanKao as examples. Journal of Chinese Literature of National Taipei University, 12, 41–56.
Chou, Y. M., & Huang, C. R. (2013). The formal representation for Chinese characters. In C. R. Huang, Y. M. Sophia, & Y. Lee (Eds.), Special issues on ontology and Chinese language processing. Contemporary Linguistics (pp. 142–161) (in Chinese).
Chu, C., Nakazawa, T., & Kurohashi, S. (2012). Chinese character mapping table of Japanese, traditional Chinese, and simplified Chinese. Proceedings of the 8th LREC (pp. 2149–2152).
Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge: MIT Press.
Goh, C.-L., Asahara, M., & Matsumoto, Y. (2005). Building a Japanese-Chinese dictionary using Kanji/Hanzi conversion. Natural language processing-IJCNLP 2005. In Lecture notes in computer science (Vol. 3651, pp. 670–681). Heidelberg: Springer.
Hiroyuki, S., Shoichi, Y., & Eric, Long. (2003). Linguistic ecology of kanji variants in contemporary Japan. Tokyo: Sanseido Co., Ltd. (in Japanese).
Hsieh, C. C., & Lin, S. (1997). A survey of full-text data bases and related techniques for Chinese ancient documents in Academia Sinica. International Journal of Computational Linguistics and Chinese Language Processing, 2(1), 105–130 (in Chinese).
Huang, C. R., & Ahrens, K. (2003). Individuals, kind and events: Classifier coercion of nouns. Language Sciences, 25(4), 353–373.
Huang, C. R., Prévot, L., Su, I. L., & Hong, J. F. (2007). Towards a conceptual core for multicultural processing: A multilingual ontology based on the Swadesh list. In T. Ishida, S. R. Fussell, & P. T. J. M. Vossen (Eds.), Intercultural collaboration I. Lecture notes in computer science, state-of-the-art survey (pp. 17–30). Springer-Verlag.
Huang, C. R., Chiyo, H., Kuo, T. Y., Su, I. L., & Hsieh, S. K. (2008). WordNet-anchored comparison of Chinese-Japanese kanji word. Proceedings of the 4th Global WordNet Conference. Szeged, Hungary.
Huang, C. R., Chang, R. Y., & Li, S. (2010). Sinica BOW: Integration of bilingual wordnet and SUMO. In C. R. Huang, N. Calzolari, A. Gangemi, A. Lenci, A. Oltramari, & L. Prévot (Eds.), Ontology and the lexicon (pp. 201–211). Cambridge: Cambridge University Press.
Huang, C. R., Yang, Y. J., & Chen, S. Y. (2013). Radicals as ontologies: Concept derivation and knowledge representation of Four-Hoofed mammals as semantic symbols. In Guangshun Cao, Hilary Chappell, Redouane Djamouri, & Thekla Wiebusch (Eds.), Breaking down the barriers: Interdisciplinary studies in Chinese linguistics and beyond. A Festschrift for Professor Alain Peyraube (pp. 1117–1133). Taipei: Institute of Linguistics, Academia Sinica.
Li, X. D. (1997). Hanzi de Qi Yuan Yu Yan Bian Lun Cong (The origin and evolution of Han characters). Taipei: Lian-Keng. (in Chinese).
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Morohashi, T. (1960). Dai Kan-Wa Jiten (The great Han-Japanese dictionary). Tokyo: Taishukan Publishing Co., Ltd. (in Japanese).
Nakada, N., & Hayashi, C. (2000). Nihon no Kanji (Japanese Kanji). Tokyo: Chuo Koron new company.
Niles, I., & Pease, A. (2001). Towards a standard upper ontology. Proceedings of the 2nd International Conference on Formal Ontology in Information Systems, Ogunquit, Maine, October 17–19.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: The MIT Press.
Shirai, K., Tokunaga, T., Huang, C.-R., Hsieh, S.-K., Kuo, T.-Y., Sornlertlamvanich, V., & Charoenporn, T. (2008). Constructing taxonomy of numerative classifiers for Asian languages. Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP2008), Hyderabad, India.
Soria, C., Monachini, M., Bertagna, F., Calzolari, N., Huang, C. R., Hsieh, S. K., Marchetti, A., & Tesconi, M. (2009). Exploring interoperability of language resources: The case of cross-lingual semi-automatic enrichment of wordnets. In S. Gilles et al. (Eds.), Special issue on interoperability of multilingual language processing. Language Resources and Evaluation (Vol. 43, No. 1, pp. 87–96).
Sugimoto, T. (1998). Nihon Mojishi no Kenkyu (Research on the history of Japanese Kanji). Tokyo: Yasakashobo. (in Japanese).
Tan, C. L., & Nagao, M. (1995). Automatic alignment of Japanese-Chinese bilingual texts. IEICE Transactions on Information and Systems, E78-D(1), 68–76.
Xyu, S. (121/2004). ShuoWenJieZi (The explanation of words and the parsing of Chinese characters). This edition. Beijing: ZhongHua. (in Chinese).
Yokoi, T. (1995). The EDR electronic dictionary. Communications of the ACM, 38(11), 42–44.
Acknowledgments
An earlier version of this paper was presented at CogALex 2008, and we would like to thank Michael Zock and the other participants for their helpful comments. We would also like to thank the anonymous reviewers of this volume for their constructive suggestions. Work on this paper was partially supported by grant GRF-543512 from the Hong Kong RGC.
Appendix A: The Dimension of “口(Mouth) Conceptual Extension”
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Huang, CR., Chou, YM. (2015). Multilingual Conceptual Access to Lexicon Based on Shared Orthography: An Ontology-Driven Study of Chinese and Japanese. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_9
DOI: https://doi.org/10.1007/978-3-319-08043-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)