A Corpus of Word-Level Offline Handwritten Numeral Images from Official Indic Scripts
Dataset development is one of the most essential tasks in document image processing research. The problem becomes more challenging when it comes to a Numeral Image Database (NIdb) for official Indic scripts. A few efforts have been made so far, but each was restricted to a single script, typically the local script of the researchers who prepared the database. In this paper, a technique for developing a handwritten NIdb of four popular Indic scripts, namely Bangla, Devanagari, Roman and Urdu, is proposed. Initially, data were collected in an unconstrained manner at word level from different writers of varying age, sex and educational qualification. All images are stored as grey-level .jpg files so that the data can be used in various ways as needed. A benchmark result on the present dataset is reported using a novel hybrid approach to the Handwritten Numeral Script Identification (HNSI) problem.
Keywords: Document image analysis, Numeral image database, Handwritten numeral script identification, Wavelet radon transform, Benchmarking
The authors are very thankful to Mr. Tousif Jaman and Mr. Sahaniaj Dhukra, students of Aliah University, for their immense help during the data collection process.