Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 404)

Abstract

We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 words distributed equally across the thirteen scripts (2,000 per script) and collected by an automated process. We propose a realistic classification framework based on four major regions of India, which distinguishes this work from earlier efforts. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are encouraging given the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 fills the gap left by the lack of a complete document image dataset covering all official Indic scripts and is freely available to researchers for noncommercial use.

Author information

Corresponding author

Correspondence to Sk. Md. Obaidullah.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Obaidullah, S.M., Halder, C., Das, N., Roy, K. (2016). PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts. In: Das, S., Pal, T., Kar, S., Satapathy, S., Mandal, J. (eds) Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015. Advances in Intelligent Systems and Computing, vol 404. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2695-6_21

  • DOI: https://doi.org/10.1007/978-81-322-2695-6_21

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2693-2

  • Online ISBN: 978-81-322-2695-6

  • eBook Packages: Engineering, Engineering (R0)
