Abstract
Currently, in the field of information processing, characters are defined and shared using coded character sets. Character processing based on coded character sets, however, has two problems: (1) coded character sets may lack some necessary characters; (2) characters in coded character sets have fixed semantics. These problems can prevent the implementation of classical text databases for philological studies, and they are especially serious for Kanji (Chinese characters) when digitizing classical texts. To resolve them, we proposed the “Chaon” model, a new model of character processing based on character ontology. Realizing this model requires a character ontology, and for Kanji it must be a large-scale one. We therefore built a large-scale character ontology covering 98 thousand characters, including both Unicode and non-Unicode characters. This paper focuses on the design and principles of a large-scale character ontology based on the Chaon model, and gives an overview of its implementation, named CHISE (Character Information Service Environment).
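The contrast drawn in the abstract can be illustrated with a minimal sketch. It assumes that a character is represented as a set of named features, with a code point being just one optional feature, so that a character absent from every coded character set can still be stored and searched. The class names, the feature names (`name`, `ucs`, `total_strokes`, `source`) and the second example character are hypothetical illustrations, not the CHISE data model or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CharacterObject:
    """A character described by features rather than by a single code point."""
    features: Dict[str, Any] = field(default_factory=dict)


class CharacterStore:
    """Hypothetical in-memory store for illustration; not the CHISE API."""

    def __init__(self) -> None:
        self._chars: List[CharacterObject] = []

    def add(self, **features: Any) -> CharacterObject:
        char = CharacterObject(dict(features))
        self._chars.append(char)
        return char

    def find(self, **query: Any) -> List[CharacterObject]:
        """Return every character whose features match all query items."""
        return [
            c for c in self._chars
            if all(c.features.get(k) == v for k, v in query.items())
        ]


store = CharacterStore()
# A coded character: U+4E00 (one stroke). The code point is one feature among others.
store.add(name="example-coded", ucs=0x4E00, total_strokes=1)
# A character missing from the coded character set: it simply has no "ucs" feature.
# The stroke count and source are made-up illustrative values.
store.add(name="example-uncoded", total_strokes=12, source="hypothetical-manuscript")

coded = store.find(ucs=0x4E00)
uncoded = store.find(total_strokes=12)
```

In this sketch a lookup by code point and a lookup by any other feature go through the same `find` call, which is the point of treating the code point as a feature instead of as the character's identity.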
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morioka, T. (2008). CHISE: Character Processing Based on Character Ontology. In: Tokunaga, T., Ortega, A. (eds) Large-Scale Knowledge Resources. Construction and Application. LKR 2008. Lecture Notes in Computer Science, vol 4938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78159-2_14
DOI: https://doi.org/10.1007/978-3-540-78159-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78158-5
Online ISBN: 978-3-540-78159-2
eBook Packages: Computer Science, Computer Science (R0)