CHISE: Character Processing Based on Character Ontology

  • Conference paper
Large-Scale Knowledge Resources. Construction and Application (LKR 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4938)

Abstract

In information processing today, characters are defined and shared through coded character sets. Character processing based on coded character sets, however, has two problems: (1) a coded character set may lack characters that are needed, and (2) the characters in a coded character set have fixed semantics. These problems can prevent the implementation of classical text databases for philological studies, and they are especially serious for Kanji (Chinese characters) when digitizing classical texts. To resolve them, we proposed the “Chaon” model, a new model of character processing based on character ontology. Realizing this model requires a character ontology, and for Kanji in particular a large-scale one. We therefore built a large-scale character ontology covering 98 thousand characters, including both Unicode and non-Unicode characters. This paper describes the design principles of a large-scale character ontology based on the Chaon model and gives an overview of its implementation, named CHISE (Character Information Service Environment).

Editor information

Takenobu Tokunaga, Antonio Ortega

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morioka, T. (2008). CHISE: Character Processing Based on Character Ontology. In: Tokunaga, T., Ortega, A. (eds) Large-Scale Knowledge Resources. Construction and Application. LKR 2008. Lecture Notes in Computer Science, vol 4938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78159-2_14

  • DOI: https://doi.org/10.1007/978-3-540-78159-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78158-5

  • Online ISBN: 978-3-540-78159-2

  • eBook Packages: Computer Science, Computer Science (R0)
