PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts

Obaidullah, Sk. Md.; Halder, Chayan; Das, Nibaran; Roy, Kaushik

doi:10.1007/978-81-322-2695-6_21

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 404)

Abstract

We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 words, distributed equally across the thirteen script types and collected by an automated process. We propose a realistic classification framework based on four major regions of India, which sets this work apart. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are impressive considering the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 fills the gap left by the lack of a complete document image dataset covering all official Indic scripts, and it is freely available to researchers for noncommercial use.
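The figures in the abstract imply 26,000 / 13 = 2,000 word images per script. The following is a minimal sketch of how that equal split could be checked, assuming a hypothetical flat list of per-image script labels. The function names and the `script_01`-style labels are placeholders for illustration only; the abstract does not describe the corpus's actual labelling scheme or file layout.

```python
from collections import Counter

TOTAL_WORDS = 26_000   # corpus size stated in the abstract
NUM_SCRIPTS = 13       # number of scripts stated in the abstract


def expected_per_script(total: int = TOTAL_WORDS, scripts: int = NUM_SCRIPTS) -> int:
    """Return the number of words per script under an exactly equal split."""
    if total % scripts != 0:
        raise ValueError("total is not evenly divisible by the number of scripts")
    return total // scripts


def is_balanced(labels: list[str], scripts: int = NUM_SCRIPTS) -> bool:
    """True if all `scripts` labels are present and occur equally often."""
    counts = Counter(labels)
    return len(counts) == scripts and len(set(counts.values())) == 1


# Hypothetical labels standing in for the real script names.
placeholder_labels = [
    f"script_{i:02d}"
    for i in range(1, NUM_SCRIPTS + 1)
    for _ in range(expected_per_script())
]
```

A check of this kind is useful before benchmarking, since class imbalance would otherwise skew per-script identification accuracy.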

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Aliah University, Kolkata, West Bengal, India
Sk. Md. Obaidullah
Department of Computer Science, West Bengal State University, Kolkata, West Bengal, India
Chayan Halder & Kaushik Roy
Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
Nibaran Das

Corresponding author

Correspondence to Sk. Md. Obaidullah.

Editor information

Editors and Affiliations

Machine Intelligence Unit, ISI, Kolkata, West Bengal, India
Swagatam Das
Computer Science and Engineering, National Institute of Technology, Durgapur, West Bengal, India
Tandra Pal
National Institute of Technology, Durgapur, West Bengal, India
Samarjit Kar
Department of CSE, Anil Neerukonda Ins. of Tech. & Sci., Vishakapatnam, India
Suresh Chandra Satapathy
Kalyani University, Nadia, West Bengal, India
Jyotsna Kumar Mandal

About this paper

Cite this paper

Obaidullah, S.M., Halder, C., Das, N., Roy, K. (2016). PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts. In: Das, S., Pal, T., Kar, S., Satapathy, S., Mandal, J. (eds) Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015. Advances in Intelligent Systems and Computing, vol 404. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2695-6_21

DOI: https://doi.org/10.1007/978-81-322-2695-6_21
Published: 25 October 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2693-2
Online ISBN: 978-81-322-2695-6
eBook Packages: Engineering, Engineering (R0)
