SCUT-COUCH2009—a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper introduces SCUT-COUCH2009, a comprehensive online unconstrained Chinese handwriting database. As a revision of SCUT-COUCH2008 [1], the SCUT-COUCH2009 database consists of more datasets with larger vocabularies and more writers. The database is built to facilitate research on unconstrained online Chinese handwriting recognition. It is comprehensive in the sense that it consists of 11 datasets of different vocabularies, named GB1, GB2, TradGB1, Big5, Pinyin, Letters, Digit, Symbol, Word8888, Word17366 and Word44208. In particular, the SCUT-COUCH2009 database contains handwritten samples of 6,763 single Chinese characters in the GB2312-80 standard, 5,401 traditional Chinese characters of the Big5 standard, 1,384 traditional Chinese characters corresponding to the level 1 characters of the GB2312-80 standard, 8,888 frequently used Chinese words, 17,366 daily-used Chinese words, 44,208 complete words from the Fourth Edition of “The Contemporary Chinese Dictionary”, 2,010 Pinyin and 184 daily-used symbols. The samples were collected using PDAs (personal digital assistants) and smartphones with touch screens and were contributed by more than 190 persons. The total number of character samples is over 3.6 million. The SCUT-COUCH2009 database is the first publicly available large-vocabulary online Chinese handwriting database containing multi-type character/word samples. We report evaluation results on the database using state-of-the-art recognizers for benchmarking.


References

  1. Li, Y.Y., Jin, L.W., Zhu, X.H., Long, T.: SCUT-COUCH2008: a comprehensive online unconstrained Chinese handwriting dataset. In: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition, ICFHR08, pp. 165–170 (2008)

  2. Liu, C.L., Jaeger, S., Nakagawa, M.: Online recognition of Chinese characters: the state-of-the-art. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 198–213 (2004)

  3. Jaeger, S., Nakagawa, M.: Two on-line Japanese character databases in Unipen format. In: Proceedings of Sixth International Conference on Document Analysis and Recognition, ICDAR01, pp. 566–570 (2001)

  4. The UNIPEN Project, http://hwr.nici.kun.nl/unipen/

  5. Suen, C.Y., Nadal, C., Legault, R., Mai, T.A., Lam, L.: Computer recognition of unconstrained handwritten numerals. Proc. IEEE 80(7), 1162–1180 (1992)

  6. Hull, J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994)

  7. Viard-Gaudin, C., Lallican, P.M., Knerr, S., Binter, P.: The IRESTE On/Off (IRONOFF) dual handwriting database. In: Proceedings of the fifth International Conference on Document Analysis and Recognition, ICDAR99, pp. 455–458 (1999)

  8. Bhattacharya, U., Chaudhuri, B.B.: Databases for research on recognition of handwritten characters of Indian scripts. In: Proceedings of the eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 789–793 (2005)

  9. Matsumoto, K., Fukushima, T., Nakagawa, M.: Collection and analysis of on-line handwritten Japanese character patterns. In: Proceedings of the sixth International Conference on Document Analysis and Recognition, ICDAR01, pp. 496–500 (2001)

  10. Mori, S., Yamamoto, K., Yamada, H., Saito, T.: On a hand printed kyoiku-kanji character database. Bull. Electrotech. Lab 43(11–12), 752–773 (1979)

  11. Liu, Y.J., Tai, J.W., Liu, J.: An introduction to the 4 million handwriting Chinese character samples library. In: Proceedings of the International Conference on Chinese Computing and Processing of Orient Language, ICCPOL89, pp. 94–97 (1989)

  12. Ge, Y., Huo, Q.: A comparative study of several modeling approaches for large vocabulary offline recognition of handwritten Chinese characters. In: Proceedings of the 16th International Conference on Pattern Recognition, ICPR02, pp. 85–88 (2002)

  13. Su, T.H., Zhang, T.W., Guan, D.J.: Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text. Int. J. Document Anal. Recogn., IJDAR07 10(1), 27–38 (2007)

  14. Wang, D.H., Liu, C.L., Yu, J.L., Zhou, X.D.: CASIA-OLHWDB1: a database of online handwritten Chinese characters. In: Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 1206–1210 (2009)

  15. Alamri, H., Sadri, J., Nobile, N., Suen, C.Y.: A novel comprehensive database for Arabic off-line handwriting recognition. In: Proceedings of 11th International Conference on Frontiers in Handwriting Recognition, ICFHR08, pp. 664–669 (2008)

  16. Perez, D., Tarazon, L., Serrano, N., Castro, F., Terrades, O.R., Juan, A.: The GERMANA database. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 301–305 (2009)

  17. Ziaratban, M., Faez, K., Bagheri, F.: FHT: An unconstraint Farsi handwritten text database. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 281–285 (2009)

  18. Ge, Y., Guo, F.J., Zhen, L.X., Chen, Q.S.: Online Chinese character recognition system with handwritten Pinyin input. In: Proceedings of eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 1265–1269 (2005)

  19. Sogou Internet Word corpus, http://www.sogou.com/labs/dl/w.html

  20. PowerWord Website, http://cp.iciba.com/

  21. Long, T., Jin, L.W.: Building compact MQDF classifier for large character set recognition by subspace distribution sharing. Pattern Recogn. 41(9), 2916–2925 (2008)

  22. Bai, Z.L., Huo, Q.: A study on the use of 8-directional features for online handwritten Chinese character recognition. In: Proceedings of eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 232–236 (2005)

  23. Ding, K., Jin, L.W., Gao, X.: A new method for rotation free online unconstrained handwritten Chinese word recognition: a holistic approach. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 1131–1135 (2009)

  24. Long, T., Jin, L.W.: A novel orientation free method for online unconstrained cursive handwritten Chinese word recognition. In: Proceedings of the 19th International Conference on Pattern Recognition, ICPR08, pp. 1–4 (2008)

  25. Krzanowski, W.J., Jonathan, P., et al.: Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Appl. Stat. 44(2), 101–115 (1995)

  26. Huang, Z.B., Ding, K., Jin, L.W., Gao, X.: Writer adaptive online handwriting recognition using incremental linear discriminant analysis. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 91–95 (2009)

  27. ZPen, http://www.danedigital.com/6-Zpen/

Author information

Corresponding author

Correspondence to Lianwen Jin.

About this article

Cite this article

Jin, L., Gao, Y., Liu, G. et al. SCUT-COUCH2009—a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation. IJDAR 14, 53–64 (2011). https://doi.org/10.1007/s10032-010-0116-6

  • DOI: https://doi.org/10.1007/s10032-010-0116-6
