Abstract
This paper introduces SCUT-COUCH2009, a comprehensive online unconstrained Chinese handwriting database. As a revision of SCUT-COUCH2008 [1], SCUT-COUCH2009 consists of more datasets with larger vocabularies and more writers. The database is built to facilitate research on unconstrained online Chinese handwriting recognition. It is comprehensive in the sense that it consists of 11 datasets of different vocabularies, named GB1, GB2, TradGB1, Big5, Pinyin, Letters, Digit, Symbol, Word8888, Word17366 and Word44208. In particular, the SCUT-COUCH2009 database contains handwritten samples of the 6,763 single Chinese characters in the GB2312-80 standard, 5,401 traditional Chinese characters of the Big5 standard, 1,384 traditional Chinese characters corresponding to the level 1 characters of the GB2312-80 standard, 8,888 frequently used Chinese words, 17,366 daily-used Chinese words, 44,208 complete words from the Fourth Edition of “The Contemporary Chinese Dictionary”, 2,010 Pinyin and 184 daily-used symbols. The samples were collected using PDAs (personal digital assistants) and smartphones with touch screens and were contributed by more than 190 writers. The total number of character samples is over 3.6 million. The SCUT-COUCH2009 database is the first publicly available large-vocabulary online Chinese handwriting database containing multi-type character/word samples. We report benchmark evaluation results on the database using state-of-the-art recognizers.
References
Li, Y.Y., Jin, L.W., Zhu, X.H., Long, T.: SCUT-COUCH2008: a comprehensive online unconstrained Chinese handwriting dataset. In: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition, ICFHR08, pp. 165–170 (2008)
Liu, C.L., Jaeger, S., Nakagawa, M.: Online recognition of Chinese characters: the state-of-the-art. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 198–213 (2004)
Jaeger, S., Nakagawa, M.: Two on-line Japanese character databases in Unipen format. In: Proceedings of the sixth International Conference on Document Analysis and Recognition, ICDAR01, pp. 566–570 (2001)
The UNIPEN Project, http://hwr.nici.kun.nl/unipen/
Suen, C.Y., Nadal, C., Legault, R., Mai, T.A., Lam, L.: Computer recognition of unconstrained handwritten numerals. Proc. IEEE 80(7), 1162–1180 (1992)
Hull, J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994)
Viard-Gaudin, C., Lallican, P.M., Knerr, S., Binter, P.: The IRESTE On/Off (IRONOFF) dual handwriting database. In: Proceedings of the fifth International Conference on Document Analysis and Recognition, ICDAR99, pp. 455–458 (1999)
Bhattacharya, U., Chaudhuri, B.B.: Databases for research on recognition of handwritten characters of Indian scripts. In: Proceedings of the eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 789–793 (2005)
Matsumoto, K., Fukushima, T., Nakagawa, M.: Collection and analysis of on-line handwritten Japanese character patterns. In: Proceedings of the sixth International Conference on Document Analysis and Recognition, ICDAR01, pp. 496–500 (2001)
Mori, S., Yamamoto, K., Yamada, H., Saito, T.: On a hand printed kyoiku-kanji character database. Bull. Electrotech. Lab 43(11–12), 752–773 (1979)
Liu, Y.J., Tai, J.W., Liu, J.: An introduction to the 4 million handwriting Chinese character samples library. In: Proceedings of the International Conference on Chinese Computing and Processing of Orient Language, ICCPOL89, pp. 94–97 (1989)
Ge, Y., Huo, Q.: A comparative study of several modeling approaches for large vocabulary offline recognition of handwritten Chinese characters. In: Proceedings of the 16th International Conference on Pattern Recognition, ICPR02, pp. 85–88 (2002)
Su, T.H., Zhang, T.W., Guan, D.J.: Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text. Int. J. Document Anal. Recogn. 10(1), 27–38 (2007)
Wang, D.H., Liu, C.L., Yu, J.L., Zhou, X.D.: CASIA-OLHWDB1: a database of online handwritten Chinese characters. In: Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 1206–1210 (2009)
Alamri, H., Sadri, J., Nobile, N., Suen, C.Y.: A novel comprehensive database for Arabic off-line handwriting recognition. In: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition, ICFHR08, pp. 664–669 (2008)
Perez, D., Tarazon, L., Serrano, N., Castro, F., Terrades, O.R., Juan, A.: The GERMANA database. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 301–305 (2009)
Ziaratban, M., Faez, K., Bagheri, F.: FHT: An unconstraint Farsi handwritten text database. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 281–285 (2009)
Ge, Y., Guo, F.J., Zhen, L.X., Chen, Q.S.: Online Chinese character recognition system with handwritten Pinyin input. In: Proceedings of eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 1265–1269 (2005)
Sogou Internet Word corpus, http://www.sogou.com/labs/dl/w.html
PowerWord Website, http://cp.iciba.com/
Long, T., Jin, L.W.: Building compact MQDF classifier for large character set recognition by subspace distribution sharing. Pattern Recogn. 41(9), 2916–2925 (2008)
Bai, Z.L., Huo, Q.: A study on the use of 8-directional features for online handwritten Chinese character recognition. In: Proceedings of eighth International Conference on Document Analysis and Recognition, ICDAR05, pp. 232–236 (2005)
Ding, K., Jin, L.W., Gao, X.: A new method for rotation free online unconstrained handwritten Chinese word recognition: a holistic approach. In: Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 1131–1135 (2009)
Long, T., Jin, L.W.: A novel orientation free method for online unconstrained cursive handwritten Chinese word recognition. In: Proceedings of the 19th International Conference on Pattern Recognition, ICPR08, pp. 1–4 (2008)
Krzanowski, W.J., Jonathan, P., et al.: Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Appl. Stat. 44(2), 101–115 (1995)
Huang, Z.B., Ding, K., Jin, L.W., Gao, X.: Writer adaptive online handwriting recognition using incremental linear discriminant analysis. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR09, pp. 91–95 (2009)
Cite this article
Jin, L., Gao, Y., Liu, G. et al. SCUT-COUCH2009—a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation. IJDAR 14, 53–64 (2011). https://doi.org/10.1007/s10032-010-0116-6