A Robust OCR for Degraded Documents

Dhingra, Kapil Dev; Sanyal, Sudip; Sharma, Pramod Kumar

doi:10.1007/978-0-387-74938-9_34

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 4)

Over the last two decades, document image analysis and recognition has advanced considerably, and several methods for recognizing Latin, Chinese, Japanese, and Arabic scripts have been proposed [7–9]. Most OCR work so far has concentrated on high-quality images, where character recognition systems have achieved great success. Despite these successes, two challenging problems remain open. The first is optical character recognition (OCR) for low-quality images: images with luminance variations, noise, and random degradation of text are difficult for OCR systems to read. The second is off-line recognition of cursive handwritten characters [15]. Our work concentrates on the first problem, specifically for Devanagari, the script used for Hindi, Nepali, Marathi, and several other Indic languages. Together, these languages have a user base exceeding 500 million people.

References

1. Bansal V, Sinha RMK (2001) A Devanagari OCR and a brief review of OCR research for Indian scripts. Proceedings of STRANS01
2. Chaudhuri BB, Pal U (1997) An OCR system to read two Indian language scripts. Proc. of 4th Int. Conf. on Document Analysis and Recognition, 1011–1015
3. Negi A, Bhagvati C, Krishna B (2001) An OCR System for Telugu. ICDAR, 1110
4. Jawahar CV, Pavan Kumar MNSSK, Ravi Kiran SS (2003) A Bilingual OCR for Hindi-Telugu Documents and its Applications. ICDAR, 408–412
5. Wang X, Ding X, Liu C (2002) Optimized Gabor Filters Based Feature Extraction for Character Recognition. Proc. 16th International Conference on Pattern Recognition, 223–226
6. Huo Q, Ge Y, Feng ZD (2001) High Performance Chinese OCR Based on Gabor Features, Discriminative Feature Extraction and Model Training. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1517–1520
7. Mantas J (1986) An Overview of Character Recognition Methodologies. Pattern Recognition 19:425–430
8. Bozinovic RM, Srihari SN (1989) Offline Cursive Script Word Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 11:68–83
9. Mori S, Suen CY, Yamamoto K (1992) Historical Review of OCR Research and Development. Proc. of IEEE 80:1029–1058
10. Nagy G (2000) Twenty Years of Document Image Analysis in Pattern Analysis and Machine Intelligence. IEEE Trans. on Pattern Analysis and Machine Intelligence 22:38–62
11. Zhang J, Yan Y, Lades M (1997) Face recognition: Eigenface, Elastic Matching, and Neural Nets. Proc. of IEEE 85:1423–1435
12. Juang BH, Katagiri S (1992) Discriminative Learning for Minimum Error Classification. IEEE Trans. on Signal Processing 4:3043–3054
13. Wang X, Ding X, Liu C (2005) Gabor Based Feature Extraction for Character Recognition. Pattern Recognition 38:369–379
14. Biem A (2006) Minimum Classification Error Training for Online Handwriting Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 28:1041–1051
15. Plamondon R, Srihari SN (2000) On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Trans. on Pattern Analysis and Machine Intelligence 22:63–84
16. Kanungo T et al. (2000) A Statistical, Nonparametric Methodology for Document Degradation Model Validation. IEEE Trans. on Pattern Analysis and Machine Intelligence 20:1209–1223
17. Chaudhuri BB, Pal U (1997) Skew Angle Detection of Digitized Indian Script Documents. IEEE Trans. on Pattern Analysis and Machine Intelligence 19:182–186
18. Maurer CR, Qi R, Raghavan V (2003) A Linear Time Algorithm for Computing Exact Euclidean Distance Transforms of Binary Images in Arbitrary Dimensions. IEEE Trans. on Pattern Analysis and Machine Intelligence 25:265–270

Author information

Authors and Affiliations

Indian Institute of Information Technology Allahabad, Universal Digital Library Research Lab, 211012, UP, India
Kapil Dev Dhingra, Sudip Sanyal & Pramod Kumar Sharma

Editor information

Editors and Affiliations

University of Canberra, ACT 2601, Canberra, Australia
Xu Huang
National Taipei University, 151 University Rd, Taiwan, China
Yuh-Shyan Chen
IAENG Secretariat, 37-39 Hung To Road, Hong Kong, China
Sio-Iong Ao

About this chapter

Cite this chapter

Dhingra, K.D., Sanyal, S., Sharma, P.K. (2008). A Robust OCR for Degraded Documents. In: Huang, X., Chen, YS., Ao, SI. (eds) Advances in Communication Systems and Electrical Engineering. Lecture Notes in Electrical Engineering, vol 4. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74938-9_34

DOI: https://doi.org/10.1007/978-0-387-74938-9_34
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74937-2
Online ISBN: 978-0-387-74938-9
eBook Packages: Engineering, Engineering (R0)
