A Corpus of Word-Level Offline Handwritten Numeral Images from Official Indic Scripts

  • Sk Md Obaidullah
  • Chayan Halder
  • Nibaran Das
  • Kaushik Roy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 379)


Dataset development is one of the most important tasks in document image processing research. The problem becomes more challenging for a Numeral Image Database (NIdb) covering official Indic scripts. The few efforts made so far have each been restricted to a single script, typically the local script of the researchers who prepared the database. This paper proposes a technique for developing a handwritten NIdb of four popular Indic scripts, namely Bangla, Devanagari, Roman and Urdu. Initially, data were collected in an unconstrained manner at word level from writers of varying age, sex and educational qualification. All images are stored as grey-level .jpg files so that the data can be used in various ways as needed. A benchmark result on the present dataset is reported for the Handwritten Numeral Script Identification (HNSI) problem using a novel hybrid approach.


Document image analysis; Numeral image database; Handwritten numeral script identification; Wavelet Radon transform; Benchmarking



The authors are very thankful to Mr. Tousif Jaman and Mr. Sahaniaj Dhukra, students of Aliah University, for their immense help during the data collection process.



Copyright information

© Springer India 2016

Authors and Affiliations

  • Sk Md Obaidullah (1)
  • Chayan Halder (2)
  • Nibaran Das (3)
  • Kaushik Roy (2)
  1. Department of Computer Science & Engineering, Aliah University, Kolkata, India
  2. Department of Computer Science, West Bengal State University, Kolkata, India
  3. Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
