CHISE: Character Processing Based on Character Ontology

Morioka, Tomohiko

doi:10.1007/978-3-540-78159-2_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4938)

Included in the following conference series:

International Conference on Large-Scale Knowledge Resources

Abstract

Currently, in the field of information processing, characters are defined and shared using coded character sets. Character processing based on coded character sets, however, has two problems: (1) coded character sets may lack some necessary characters, and (2) characters in coded character sets have fixed semantics. These problems can make it difficult to implement classical text databases for philological studies. For Kanji (Chinese characters) in particular, they are serious obstacles to digitizing classical texts. To resolve these problems, we proposed the “Chaon” model, a new model of character processing based on character ontology. Realizing the model requires a character ontology, and for Kanji in particular a large-scale one. We therefore built a large-scale character ontology of 98 thousand characters, covering both Unicode and non-Unicode characters. This paper focuses on the design principles of a large-scale character ontology based on the Chaon model and gives an overview of its implementation, named CHISE (Character Information Service Environment).
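The two problems named in the abstract can be illustrated with a small sketch. It assumes, for illustration only, that a character is represented as a bundle of named features and that a code point is just one optional feature among them. The class names, the feature names (`ucs`, `radical`, `total_strokes`, `source`), and the in-memory store are hypothetical and are not taken from CHISE or its database format.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterObject:
    """A character described by named features instead of a single code point."""
    name: str  # hypothetical label, used only for display
    features: dict = field(default_factory=dict)


class CharacterOntology:
    """Toy in-memory store (an assumption for this sketch, not the CHISE format)."""

    def __init__(self):
        self._chars = []

    def define(self, name, **features):
        """Register a character from whatever features are known about it."""
        char = CharacterObject(name, dict(features))
        self._chars.append(char)
        return char

    def find(self, **query):
        """Return every character whose features match all query items."""
        return [
            c for c in self._chars
            if all(c.features.get(k) == v for k, v in query.items())
        ]


ontology = CharacterOntology()

# An encoded character: the Unicode code point U+6728 is one feature among several.
ontology.define("tree", ucs=0x6728, radical=75, total_strokes=4)

# A character with no Unicode code point can still be defined and searched.
ontology.define("unencoded-variant", radical=75, total_strokes=4,
                source="hypothetical-corpus")

by_shape = ontology.find(radical=75, total_strokes=4)
by_code = ontology.find(ucs=0x6728)
```

Here `by_shape` contains both characters, while `by_code` contains only the encoded one, so a character missing from a coded character set remains addressable through its other features.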

Author information

Authors and Affiliations

Documentation and Information Center for Chinese Studies, Institute for Research in Humanities, Kyoto University
Tomohiko Morioka

Editor information

Takenobu Tokunaga, Antonio Ortega

About this paper

Cite this paper

Morioka, T. (2008). CHISE: Character Processing Based on Character Ontology. In: Tokunaga, T., Ortega, A. (eds) Large-Scale Knowledge Resources. Construction and Application. LKR 2008. Lecture Notes in Computer Science, vol 4938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78159-2_14

DOI: https://doi.org/10.1007/978-3-540-78159-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78158-5
Online ISBN: 978-3-540-78159-2
eBook Packages: Computer Science, Computer Science (R0)
