Working of the Tesseract OCR on Different Fonts of Gujarati Language

Joshi, Kartik; Arolkar, Harshal

doi:10.1007/978-981-97-0744-7_15

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 916)

Included in the following conference series:

International Conference on Information and Communication Technology for Competitive Strategies

Abstract

An optical character recognition (OCR) engine offers a technological means of preserving books and manuscripts that may soon be lost to deterioration. In digital form, documents are editable, searchable, and shareable. To protect them from destruction, documents need to be scanned into digital images and passed to an OCR engine, which generates a digital text file. For large volumes of material, manual typing and conversion is nearly impossible. In this paper, the authors analyze the working of the Tesseract OCR engine on images that contain Gujarati text.

Author information

Authors and Affiliations

GLS University, Ahmedabad, Gujarat, 380006, India
Kartik Joshi & Harshal Arolkar

Corresponding author

Correspondence to Kartik Joshi.

Editor information

Editors and Affiliations

Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
Amit Joshi
Nottingham Trent University, Nottingham, UK
Mufti Mahmud
University of Peradeniya, Delthota, Sri Lanka
Roshan G. Ragel
Department of CSE, SNS College of Technology, Coimbatore, Tamil Nadu, India
S. Kartik

About this paper

Cite this paper

Joshi, K., Arolkar, H. (2024). Working of the Tesseract OCR on Different Fonts of Gujarati Language. In: Joshi, A., Mahmud, M., Ragel, R.G., Kartik, S. (eds) ICT: Cyber Security and Applications. ICTCS 2022. Lecture Notes in Networks and Systems, vol 916. Springer, Singapore. https://doi.org/10.1007/978-981-97-0744-7_15

DOI: https://doi.org/10.1007/978-981-97-0744-7_15
Published: 14 May 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0743-0
Online ISBN: 978-981-97-0744-7
eBook Packages: Engineering, Engineering (R0)
