A survey of mono- and multi-lingual character recognition using deep and shallow architectures: indic and non-indic scripts

Artificial Intelligence Review

Abstract

The cultural and regional diversity of the world, and of India in particular, has produced a large number of writing systems and scripts with diverse character sets. For scripts with large character sets, a standard keyboard with a limited set of keys is not an efficient way to enter text into a computer. Variations in individual handwriting caused by mood swings, changes in the writing medium, changes in writing style, etc. pose a challenge for the character recognition (CR) research community. Visually similar symbols across scripts and languages are a major barrier to multilingual CR, and the lack of benchmark results and corpora for multilingual CR hinders research in the area. Only a limited number of articles address the optimal combination of features and classifiers for processing multilingual data, and Indic scripts are the least explored in multilingual CR. This paper presents a detailed review and analysis of the work done in multilingual online and offline CR for Indic and non-Indic scripts. The paper makes two main contributions: first, it provides a clear perspective on the various phases of monolingual and multilingual CR; second, it identifies the major deficiencies in monolingual and multilingual CR for printed and handwritten text. It gives an in-depth view of the work done at each phase of CR, including data acquisition, pre-processing, segmentation, feature extraction, recognition and post-processing, and discusses the issues still to be resolved at each phase. Recent work using deep and shallow architectures is analysed, and the tools used for these architectures are compared to highlight their pros and cons. The paper also suggests directions for further research in monolingual and multilingual CR. Problems that future research needs to tackle include CR in hybrid documents, identification of more reliable features, resolution of confusion between similar characters, and identification of optimal combination strategies for deep and shallow architectures.
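The six CR phases listed in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. Every concrete choice here is an assumption made for brevity, not the method of this survey or of any surveyed work: ASCII-art rows stand in for data acquisition, a global threshold for pre-processing, vertical projection (cutting at ink-free columns) for segmentation, 2 x 2 zoning densities for feature extraction, 1-nearest-neighbour template matching as a shallow recogniser, and Levenshtein-distance lexicon lookup for post-processing. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # rows of pixel values (hypothetical toy representation)

def acquire(rows: Sequence[str]) -> Image:
    """Data acquisition stand-in: '#' is dark ink (0), '.' is white paper (255)."""
    return [[0 if ch == "#" else 255 for ch in row] for row in rows]

def preprocess(img: Image, threshold: int = 128) -> Image:
    """Pre-processing: global-threshold binarization (1 = ink, 0 = background)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]

def segment(binary: Image) -> List[Image]:
    """Segmentation: cut a text line into glyphs at ink-free columns."""
    ink = [any(row[c] for row in binary) for c in range(len(binary[0]))]
    glyphs, start = [], None
    for c, has_ink in enumerate(ink + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            glyphs.append([row[start:c] for row in binary])
            start = None
    return glyphs

def extract_features(glyph: Image, zones: int = 2) -> List[float]:
    """Feature extraction: ink density in each cell of a zones x zones grid."""
    h, w = len(glyph), len(glyph[0])
    ink = [[0] * zones for _ in range(zones)]
    area = [[0] * zones for _ in range(zones)]
    for r in range(h):
        for c in range(w):
            zr, zc = r * zones // h, c * zones // w
            ink[zr][zc] += glyph[r][c]
            area[zr][zc] += 1
    return [ink[i][j] / area[i][j] if area[i][j] else 0.0
            for i in range(zones) for j in range(zones)]

def classify(features: List[float], templates: Dict[str, List[float]]) -> str:
    """Recognition: 1-nearest-neighbour matching with squared Euclidean distance."""
    return min(templates, key=lambda label: sum(
        (f - t) ** 2 for f, t in zip(features, templates[label])))

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def post_process(word: str, lexicon: Sequence[str], max_dist: int = 1) -> str:
    """Post-processing: snap to the closest lexicon word if it is close enough."""
    best = min(lexicon, key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_dist else word

def build_templates(prototypes: Dict[str, Sequence[str]]) -> Dict[str, List[float]]:
    """Run one clean prototype per class through the same front end."""
    return {label: extract_features(preprocess(acquire(rows)))
            for label, rows in prototypes.items()}

def recognize(rows: Sequence[str], templates: Dict[str, List[float]],
              lexicon: Sequence[str]) -> str:
    """Chain all six phases for a single line of text."""
    glyphs = segment(preprocess(acquire(rows)))
    raw = "".join(classify(extract_features(g), templates) for g in glyphs)
    return post_process(raw, lexicon)

if __name__ == "__main__":
    templates = build_templates({
        "I": ["#", "#", "#", "#"],
        "L": ["#..", "#..", "#..", "###"],
        "T": ["###", ".#.", ".#.", ".#."],
    })
    page = ["#...#.###",
            "#...#..#.",
            "#...#..##",  # stray ink pixel on the 'T'
            "###.#..#."]
    print(recognize(page, templates, ["LIT", "TIL"]))  # LIT
```

Each stage is a plain function so that any one of them (for example, the classifier) can be swapped for a stronger technique without touching the others. Real systems differ at every step; in particular, projection-based segmentation breaks down on touching or fused characters, a problem many of the surveyed scripts exhibit.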
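The abstract lists finding optimal combination strategies for deep and shallow architectures as an open problem. Two common decision-level strategies are majority voting over each classifier's top label and the weighted sum rule over per-class scores. The sketch assumes every classifier already outputs a probability per class; the classifier names, scores and weights are invented for illustration and are not results from this survey.

```python
from collections import Counter
from typing import Dict, List

Scores = Dict[str, float]  # class label -> probability from one classifier

def majority_vote(outputs: List[Scores]) -> str:
    """Each classifier votes for its top label; ties go to the smallest label."""
    votes = Counter(max(out, key=out.get) for out in outputs)
    return max(sorted(votes), key=votes.get)

def weighted_sum_fusion(outputs: List[Scores], weights: List[float]) -> str:
    """Sum rule: argmax over labels of the weighted sum of class probabilities."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per classifier is required")
    labels = set().union(*outputs)
    fused = {label: sum(w * out.get(label, 0.0) for out, w in zip(outputs, weights))
             for label in labels}
    return max(sorted(fused), key=fused.get)

if __name__ == "__main__":
    # Hypothetical scores for a glyph that could be Latin 'O', digit '0' or 'Q'.
    shallow_svm = {"O": 0.45, "0": 0.50, "Q": 0.05}
    shallow_knn = {"O": 0.40, "0": 0.55, "Q": 0.05}
    deep_cnn = {"O": 0.80, "0": 0.15, "Q": 0.05}
    outputs = [shallow_svm, shallow_knn, deep_cnn]
    print(majority_vote(outputs))                          # 0
    print(weighted_sum_fusion(outputs, [0.25, 0.25, 0.5]))  # O
```

On the same inputs the two rules disagree: the two shallow classifiers outvote the deep one, while the weighted sum lets the confident deep classifier dominate. Which rule and which weights work best for a given script is exactly the kind of question the abstract leaves open; the weights here are arbitrary.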

References

  • Abdelaziz I, Abdou S, Al-Barhamtoshy H (2016) A large vocabulary system for arabic online handwriting recognition. Pattern Anal Appl 19(4):1129–1141

    Article  MathSciNet  Google Scholar 

  • Agrawal M, Bali K, Madhvanath S, Vuurpijl L (2005) Upx: A new xml representation for annotated datasets of online handwriting data. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 1161–1165

  • Ahmed SB, Naz S, Razzak MI, Rashid SF, Afzal MZ, Breuel TM (2016) Evaluation of cursive and non-cursive scripts using recurrent neural networks. Neural Comput Appl 27(3):603–613

    Article  Google Scholar 

  • Ahmed SB, Naz S, Swati S, Razzak MI (2017) Handwritten Urdu character recognition using one-dimensional BLSTM classifier. Neural Comput Appl, pp 1–9

  • Ait-Mohand K, Paquet T, Ragot N (2014) Combining structure and parameter adaptation of hmms for printed text recognition. IEEE Trans Pattern Anal Mach Intell 36(9):1716–1732

    Article  Google Scholar 

  • Al-Boeridi ON, Ahmad SS, Koh S (2015) A scalable hybrid decision system (HDS) for roman word recognition using ann SVM: study case on malay word recognition. Neural Comput Appl 26(6):1505–1513

    Article  Google Scholar 

  • Al Maadeed S, Ayouby W, Hassaïne A, Aljaam JM (2012) Quwi: an arabic and english handwriting dataset for offline writer identification. In: Frontiers in handwriting recognition (ICFHR), 2012 international conference on, IEEE, pp 746–751

  • Alginahi YM, Mudassar M, Kabir MN (2015) An arabic script recognition system. KSII Trans Internet Inf Syst 9(9):3701–3720

    Google Scholar 

  • Almaksour A, Anquetil E (2009) Fast incremental learning strategy driven by confusion reject for online handwriting recognition. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 81–85

  • Amara NEB, Mazhoud O, Bouzrara N, Ellouze N (2005) Arabase: a relational database for arabic OCR systems. Int Arab J Inf Technol 2(4):259–266

    Google Scholar 

  • Arica N, Yarman-Vural FT (2001) An overview of character recognition focused on off-line handwriting. IEEE Trans Syst Man Cybern Part C 31(2):216–233

    Article  Google Scholar 

  • Arora S, Sharma D, Arora S (2014) Recognition of gurmukhi text from sign board images captured from mobile camera. Int J Inf Comput Technol 4(17):1839–1845

    Google Scholar 

  • Arvind K, Kumar J, Ramakrishnan A (2007) Line removal and restoration of handwritten strokes. In: Conference on computational intelligence and multimedia applications, 2007. International conference on, IEEE, vol 3, pp 208–214

  • Azeem SA, Ahmed H (2013) Effective technique for the recognition of offline arabic handwritten words using hidden markov models. Int J Doc Anal Recognit 16(4):399–412

    Article  Google Scholar 

  • Bag S, Harit G, Bhowmick P (2014) Recognition of bangla compound characters using structural decomposition. Pattern Recognit 47(3):1187–1201

    Article  Google Scholar 

  • Bai ZL, Huo Q (2004) Underline detection and removal in a document image using multiple strategies. In: Pattern recognition, 2004. ICPR 2004. Proceedings of the 17th international conference on, IEEE, vol 2, pp 578–581

  • Bansal V, Sinha R (2002) Segmentation of touching and fused Devanagari characters. Pattern Recognit 35(4):875–893

    Article  MATH  Google Scholar 

  • Baral S, Bhattacharya S, Chakraborty A, Bhattacharya U, Parui SK (2014) A machine learning approach to detection of core region of online handwritten bangla word samples. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 458–463

  • Basu S, Das N, Sarkar R, Kundu M, Nasipuri M, Basu DK (2009) A hierarchical approach to recognition of handwritten bangla characters. Pattern Recognit 42(7):1467–1484

    Article  MATH  Google Scholar 

  • Benjelil M, Kanoun S, Mullot R, Alimi AM (2009) Arabic and latin script identification in printed and handwritten types based on steerable pyramid features. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 591–595

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305

    MathSciNet  MATH  Google Scholar 

  • Bharath A, Madhvanath S (2012) Hmm-based lexicon-driven and lexicon-free word recognition for online handwritten indic scripts. IEEE Trans Pattern Anal Mach Intell 34(4):670–682

    Article  Google Scholar 

  • Bharath A, Madhvanath S (2014) Allograph modeling for online handwritten characters in devanagari using constrained stroke clustering. ACM Trans Asian Lang Inf Process 13(3):12

    Google Scholar 

  • Bhaskarabhatla AS, Madhvanath S (2004) Experiences in collection of handwriting data for online handwriting recognition in indic scripts. In: LREC, Citeseer

  • Bhattacharya S, Maitra DS, Bhattacharya U, Parui SK (2016) An end-to-end system for bangla online handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2016 15th International conference on, IEEE, pp 373–378

  • Bhattacharya U, Shridhar M, Parui SK, Sen P, Chaudhuri B (2012) Offline recognition of handwritten bangla characters: an efficient two-stage approach. Pattern Anal Appl 15(4):445–458

    Article  MathSciNet  Google Scholar 

  • Bhowmik TK, Parui SK, Roy U, Schomaker L (2016) Bangla handwritten character segmentation using structural features: a supervised and bootstrapping approach. ACM Trans Asian Low-Resour Lang Inf Process 15(4):29

    Article  Google Scholar 

  • Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-LSTM network. Pattern Recognit 85:172–184

    Article  Google Scholar 

  • Biadsy F, Saabni R, El-Sana J (2011) Segmentation-free online arabic handwriting recognition. Int J Pattern Recognit Artif Intell 25(07):1009–1033

    Article  Google Scholar 

  • Bianne-Bernard AL, Menasri F, Mohamad RAH, Mokbel C, Kermorvant C, Likforman-Sulem L (2011) Dynamic and contextual information in hmm modeling for handwritten word recognition. IEEE Trans Pattern Anal Mach Intell 33(10):2066–2080

    Article  Google Scholar 

  • Blanchard J, Artieres T (2004) On-line handwritten documents segmentation. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on, IEEE, pp 148–153

  • Blumenstein M, Cheng CK, Liu XY (2002) New preprocessing techniques for handwritten word recognition. In: Proceedings of the second IASTED international conference on visualization, imaging and image processing (VIIP 2002), ACTA Press, Calgary, pp 480–484

  • Bozinovic RM, Srihari SN (1989) Off-line cursive script word recognition. IEEE Trans Pattern Anal Mach Intell 11(1):68–83

    Article  Google Scholar 

  • Carbonnel S, Anquetil E (2004) Lexicon organization and string edit distance learning for lexical post-processing in handwriting recognition. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on, IEEE, pp 462–467

  • Casey RG, Lecolinet E (1996) A survey of methods and strategies in character segmentation. IEEE Trans Pattern Anal Mach Intell 18(7):690–706

    Article  Google Scholar 

  • Cavalin PR, Sabourin R, Suen CY, Britto AS Jr (2009) Evaluation of incremental learning algorithms for hmm in the recognition of alphanumeric characters. Pattern Recognit 42(12):3241–3253

    Article  MATH  Google Scholar 

  • Chakraborty D, Pal U (2016) Baseline detection of multi-lingual unconstrained handwritten text lines. Pattern Recognit Lett 74:74–81

    Article  Google Scholar 

  • Chherawala Y, Roy PP, Cheriet M (2016) Feature set evaluation for offline handwriting recognition systems: application to the recurrent neural network model. IEEE Trans Cybern 46(12):2825–2836

    Article  Google Scholar 

  • Chherawala Y, Roy PP, Cheriet M (2017) Combination of context-dependent bidirectional long short-term memory classifiers for robust offline handwriting recognition. Pattern Recognit Lett 90:58–64

    Article  Google Scholar 

  • Connell SD, Jain AK (2001) Template-based online character recognition. Pattern Recognit 34(1):1–14

    Article  MATH  Google Scholar 

  • Connell SD, Jain AK (2002) Writer adaptation for online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 24(3):329–346

    Article  Google Scholar 

  • Dalal S, Malik L (2008) A survey of methods and strategies for feature extraction in handwritten script identification. In: Emerging trends in engineering and technology, 2008. ICETET’08. First international conference on, IEEE, pp 1164–1169

  • Das N, Reddy JM, Sarkar R, Basu S, Kundu M, Nasipuri M, Basu DK (2012) A statistical-topological feature combination for recognition of handwritten numerals. Appl Soft Comput 12(8):2486–2495

    Article  Google Scholar 

  • Dash KS, Puhan NB, Panda G (2016) BESAC: binary external symmetry axis constellation for unconstrained handwritten character recognition. Pattern Recognit Lett 83:413–422

    Article  Google Scholar 

  • De Oliveira J, de Carvalho JM, de A Freitas C, Sabourin R (2002) Feature sets evaluation for handwritten word recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 446–450

  • De Stefano C, Marcelli A (2004) An efficient method for online cursive handwriting strokes reordering. Int J Pattern Recognit Artif Intell 18(07):1157–1171

    Article  Google Scholar 

  • Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3

  • Dhaka VP, Sharma MK (2015) An efficient segmentation technique for devanagari offline handwritten scripts using the feedforward neural network. Neural Comput Appl 26(8):1881–1893

    Article  Google Scholar 

  • Dutta D, Chowdhury AR, Bhattacharya U, Parui SK (2014) Stroke level user-adaptation for stroke order free online handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 250–255

  • Elanwar RI, Rashwan MA, Mashali SA (2007) Simultaneous segmentation and recognition of arabic characters in an unconstrained on-line cursive handwritten document. In: Proceedings of world academy of science, engineering and technology vol 23, pp 288–291

  • Elgammal AM, Ismail MA (2001) Techniques for language identification for hybrid Arabic-English document images. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 1100–1104

  • Elnagar A, Alhajj R (2003) Segmentation of connected handwritten numeral strings. Pattern Recognit 36(3):625–634

    Article  Google Scholar 

  • Eskenazi S, Gomez-Krämer P, Ogier JM (2017) A comprehensive survey of mostly textual document segmentation algorithms since 2008. Pattern Recognit 64:1–14

    Article  Google Scholar 

  • Farooq F, Bhardwaj A, Govindaraju V (2009) Using topic models for ocr correction. Int J Doc Anal Recognit 12(3):153–164

    Article  Google Scholar 

  • Farulla GA, Murru N, Rossini R (2017) A fuzzy approach to segment touching characters. Expert Syst Appl 88:1–13

    Article  Google Scholar 

  • Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H (2018) Gan-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. arXiv preprint arXiv:1803.01229

  • Frishkopf L, Harmon L (1961) Machine reading of cursive script. Inf Theory, pp 300–316

  • Gader PD, Khabou MA (1996) Automatic feature generation for handwritten digit recognition. IEEE Trans Pattern Anal Mach Intell 18(12):1256–1261

    Article  Google Scholar 

  • Ghods V, Kabir E, Razzazi F (2013) Effect of delayed strokes on the recognition of online farsi handwriting. Pattern Recognit Lett 34(5):486–491

    Article  Google Scholar 

  • Ghods V, Kabir E, Razzazi F (2014) Fusion of hmm classifiers, based on x, y and (x, y) signals, for the recognition of online farsi handwriting: a large lexicon approach. Arab J Sci Eng 39(3):1713–1723

    Article  Google Scholar 

  • Ghosh D, Dube T, Shivaprasad A (2010) Script recognition: a review. IEEE Trans Pattern Anal Mach Intell 32(12):2142–2161

    Article  Google Scholar 

  • Giménez A, Khoury I, Andrés-Ferrer J, Juan A (2014) Handwriting word recognition using windowed bernoulli HMMs. Pattern Recognit Lett 35:149–156

    Article  Google Scholar 

  • Guerfali W, Plamondon R (1993) Normalizing and restoring on-line handwriting. Pattern Recognit 26(3):419–431

    Article  Google Scholar 

  • Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

    Article  MathSciNet  MATH  Google Scholar 

  • Hládek D, Staš J, Ondáš S, Juhár J, Kovács L (2017) Learning string distance with smoothing for OCR spelling correction. Multimedia Tools and Appl 76(22):24549–24567

    Article  Google Scholar 

  • Hochberg J, Kelly P, Thomas T, Kerns L (1997) Automatic script identification from document images using cluster-based templates. IEEE Trans Pattern Anal Mach Intell 19(2):176–181

    Article  Google Scholar 

  • Hochberg J, Bowers K, Cannon M, Kelly P (1999) Script and language identification for handwritten document images. Int J Doc Anal Recognit 2(2–3):45–52

    Article  Google Scholar 

  • Holzinger A, Stocker C, Peischl B, Simonic KM (2012) On using entropy for enhancing handwriting preprocessing. Entropy 14(11):2324–2350

    Article  Google Scholar 

  • Hu J, Rosenthal AS, Brown MK (1997) Combining high-level features with sequential local features for on-line handwriting recognition. In: International conference on image analysis and processing. Springer, Berlin, pp 647–654

  • Huang BQ, Zhang Y, Kechadi MT (2007) Preprocessing techniques for online handwriting recognition. In: Intelligent systems design and applications, 2007. ISDA 2007. Seventh international conference on, IEEE, pp 793–800

  • Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554

    Article  Google Scholar 

  • Humied IA (2016) Segmentation accuracy for offline arabic handwritten recognition based on bounding box algorithm. Int J Comput Sci Netw Secur 16(9):98

    Google Scholar 

  • Hussain R, Raza A, Siddiqi I, Khurshid K, Djeddi C (2015) A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation. EURASIP J Image Video Process 2015(1):46

    Article  Google Scholar 

  • Iwana BK, Frinken V, Riesen K, Uchida S (2017) Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes. Pattern Recognit 64:268–276

    Article  MATH  Google Scholar 

  • Jaeger S, Nakagawa M (2001) Two on-line Japanese character databases in unipen format. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 566–570

  • Jaeger S, Ma H, Doermann D (2005) Identifying script on word-level with informational confidence. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 416–420

  • Jawahar C, Kumar MP, Kiran SR (2003) A bilingual ocr for hindi-telugu documents and its applications. In: Document analysis and recognition, 2003. Proceedings. Seventh international conference on, IEEE, pp 408–412

  • Jayadevan R, Kolhe SR, Patil PM, Pal U (2011) Offline recognition of devanagari script: a survey. IEEE Trans Syst Man Cybern Part C 41(6):782–796

    Article  Google Scholar 

  • Jayech K, Mahjoub MA, Amara NEB (2016) Synchronous multi-stream hidden markov model for offline arabic handwriting recognition without explicit segmentation. Neurocomputing 214:958–971

    Article  Google Scholar 

  • Jothi JAA, Rajam VMA (2017) A survey on automated cancer diagnosis from histopathology images. Artif Intell Rev 48(1):31–81

    Article  Google Scholar 

  • Kacem A, Saïdani A (2017) A texture-based approach for word script and nature identification. Pattern Anal Appl 20(4):1157–1167

    Article  MathSciNet  Google Scholar 

  • Kavallieratou E, Fakotakis N, Kokkinakis G (1999) New algorithms for skewing correction and slant removal on word-level [ocr]. In: Electronics, circuits and systems, 1999. Proceedings of ICECS’99. The 6th IEEE international conference on, IEEE, vol 2, pp 1159–1162

  • Kavitha S, Shivakumara P, Kumar GH, Tan C (2015) A robust script identification system for historical indian document images. Malays J Comput Sci 28(4):283–300

    Article  Google Scholar 

  • Keysers D, Deselaers T, Rowley HA, Wang LL, Carbune V (2017) Multi-language online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 39(6):1180–1194

    Article  Google Scholar 

  • Kherallah M, Elbaati A, Abed H, Alimi A (2008) The on/off (LMCA) dual arabic handwriting database. In: 11th International conference on frontiers in handwriting recognition (ICFHR)

  • Kherallah M, Tagougui N, Alimi AM, El Abed H, Margner V (2011) Online arabic handwriting recognition competition. In: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, pp 1454–1458

  • Kim IJ, Xie X (2015) Handwritten hangul recognition using deep convolutional neural networks. Int J Doc Anal Recognit 18(1):1–13

    Article  Google Scholar 

  • Kukich K (1992) Techniques for automatically correcting words in text. ACM Comput Surv 24(4):377–439

    Article  Google Scholar 

  • Kumar M, Jindal M, Sharma R, Jindal SR (2018) Character and numeral recognition for non-indic and indic scripts: a survey. Artif Intell Rev, pp 1–27

  • Kumar R, Sharma RK (2013) An efficient post processing algorithm for online handwriting Gurmukhi character recognition using set theory. Int J Pattern Recognit Artif Intell 27(04):1353002

    Article  Google Scholar 

  • Lacerda EB, Mello CA (2013) Segmentation of connected handwritten digits using self-organizing maps. Expert Syst Appl 40(15):5867–5877

    Article  Google Scholar 

  • Lai S, Jin L, Yang W (2017) Toward high-performance online HCCR: A CNN approach with dropdistortion, path signature and spatial stochastic max-pooling. Pattern Recognit Lett 89:60–66

    Article  Google Scholar 

  • Lam L, Suen CY (1995) Optimal combinations of pattern classifiers. Pattern Recognit Lett 16(9):945–954

    Article  Google Scholar 

  • Lee JJ, Kim JH (1996) A unified network-based approach for online recognition of multi-lingual cursive handwritings. In: Proceedings of fifth international workshop frontiers in handwriting recognition, pp 393–397

  • Lee MH, Kim SH, Lee GS, Kim SH, Yang HJ (2012) Correction for misrecognition of korean texts in signboard images using improved levenshtein metric. KSII Trans Internet Inf Syst 6(2):722–733

    Google Scholar 

  • Lehal G, Singh C (2001) A technique for segmentation of Gurmukhi text. In: International conference on computer analysis of images and patterns, Springer, Berlin, pp 191–200

  • Li F, Shen Q, Li Y, Mac Parthaláin N (2016) Handwritten chinese character recognition using fuzzy image alignment. Soft Comput 20(8):2939–2949

    Article  Google Scholar 

  • Li Y, Jin L, Zhu X, Long T (2008) SCUT-COUCH2008: a comprehensive online unconstrained chinese handwriting dataset. ICFHR 2008:165–170

    Google Scholar 

  • Li YX, Tan CL, Ding X (2005) A hybrid post-processing system for offline handwritten chinese script recognition. Pattern Anal Appl 8(3):272–286

    Article  MathSciNet  Google Scholar 

  • Liu CL, Jaeger S, Nakagawa M (2004) Online recognition of chinese characters: the state-of-the-art. IEEE Trans Pattern Anal Mach Intell 26(2):198–213

    Article  Google Scholar 

  • Liu CL, Yin F, Wang DH, Wang QF (2011) Casia online and offline chinese handwriting databases. In: Document analysis and recognition (ICDAR), 2011 International conference on, IEEE, pp 37–41

  • Liu X, Fu H, Jia Y (2008) Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images. Pattern Recognit 41(2):484–493

    Article  MATH  Google Scholar 

  • Liu YH, Lin CC, Chang F (2005) Language identification of character images using machine learning techniques. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 630–634

  • Liwicki M, Bunke H (2005) Iam-ondb-an on-line english sentence database acquired from handwritten text on a whiteboard. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 956–961

  • Liwicki M, Bunke H (2009) Feature selection for HMM and BLSTM based handwriting recognition of whiteboard notes. Int J Pattern Recognit Artif Intell 23(05):907–923

    Article  Google Scholar 

  • Llorens D, Prat F, Marzal A, Vilar JM, Castro MJ, Amengual JC, Barrachina S, Castellanos A, Boquera SE, Gómez J, et al (2008) The UJIpenchars database: a pen-based database of isolated handwritten characters. In: LREC

  • Lorigo LM, Govindaraju V (2006) Offline arabic handwriting recognition: a survey. IEEE Trans Pattern Anal Mach Intell 28(5):712–724

    Article  Google Scholar 

  • Ma L, Liu H, Wu J (2011) MRG-OHTC database for online handwritten tibetan character recognition. In: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, pp 207–211

  • Mahalat MH, Mollah AF, Basu S, Nasipuri M (2017) Design of novel post-processing algorithms for handwritten arabic numerals classification. Int J Appl Pattern Recognit 4(4):342–357

    Article  Google Scholar 

  • Mandler E (1987) Advanced preprocessing technique for on-line script recognition of nonconnected symbols. In: Proceedings of 3rd international symposium on handwriting and computer applications, pp 64–66

  • Marti UV, Bunke H (2001) Using a statistical language model to improve the performance of an hmm-based cursive handwriting recognition system. In: Hidden Markov models: applications in computer vision. World Scientific, Singapore, pp 65–90

  • Mitrpanont JL, Limkonglap U (2007) Using contour analysis to improve feature extraction in thai handwritten character recognition systems. In: Computer and information technology, 2007. CIT 2007. 7th IEEE international conference on, IEEE, pp 668–673

  • Mohamad RAH, Likforman-Sulem L, Mokbel C (2009) Combining slanted-frame classifiers for improved HMM-based arabic handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(7):1165–1177

    Article  Google Scholar 

  • Mohamed Ar, Dahl GE, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22

    Article  Google Scholar 

  • Mori S, Suen CY, Yamamoto K (1992) Historical review of ocr research and development. Proc IEEE 80(7):1029–1058

    Article  Google Scholar 

  • Nakagawa M, Matsumoto K (2004) Collection of on-line handwritten japanese character pattern databases and their analyses. Doc Anal Recognit 7(1):69–81

    Google Scholar 

  • Namboodiri AM, Jain AK (2004) Online handwritten script recognition. IEEE Trans Pattern Anal Mach Intell 26(1):124–130

    Article  Google Scholar 

  • Naz S, Umar AI, Ahmad R, Ahmed SB, Shirazi SH, Siddiqi I, Razzak MI (2016) Offline cursive Urdu-Nastaliq script recognition using multidimensional recurrent neural networks. Neurocomputing 177:228–241

    Article  Google Scholar 

  • Naz S, Umar AI, Ahmad R, Ahmed SB, Shirazi SH, Razzak MI (2017) Urdu Nasta’liq text recognition system based on multi-dimensional recurrent neural network and statistical features. Neural Comput Appl 28(2):219–231

    Article  Google Scholar 

  • Nethravathi B, Archana C, Shashikiran K, Ramakrishnan AG, Kumar V (2010) Creation of a huge annotated database for tamil and kannada ohr. In: Frontiers in handwriting recognition (ICFHR), 2010 international conference on, IEEE, pp 415–420

  • Nguyen CT, Zhu B, Nakagawa M (2014) A semi-incremental recognition method for on-line handwritten english text. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 234–239

  • Niu XX, Suen CY (2012) A novel hybrid CNN–SVM classifier for recognizing handwritten digits. Pattern Recognit 45(4):1318–1325

    Article  Google Scholar 

  • Obaidullah SM, Halder C, Santosh K, Das N, Roy K (2018) Phdindic_11: page-level handwritten document image dataset of 11 official indic scripts for script identification. Multimedia Tools Appl 77(2):1643–1678

    Article  Google Scholar 

  • Oprean C, Likforman-Sulem L, Popescu A, Mokbel C (2015) Handwritten word recognition using web resources and recurrent neural networks. Int J Doc Anal Recognit 18(4):287–301

    Article  Google Scholar 

  • Pal U, Belaıd A, Choisy C (2003) Touching numeral segmentation using water reservoir concept. Pattern Recognit Lett 24(1–3):261–272

    Article  Google Scholar 

  • Pal U, Jayadevan R, Sharma N (2012) Handwriting recognition in indian regional scripts: a survey of offline techniques. ACM Trans Asian Lang Inf Process 11(1):1

    Article  Google Scholar 

  • Pan W, Suen CY, Bui TD (2005) Script identification using steerable gabor filters. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 883–887

  • Park N, Mohammadi M, Gorde K, Jajodia S, Park H, Kim Y (2018) Data synthesis based on generative adversarial networks. Proc VLDB Endow 11(10):1071–1083

    Article  Google Scholar 

  • Pati PB, Ramakrishnan A (2008) Word level multi-script identification. Pattern Recognit Lett 29(9):1218–1229

    Article  Google Scholar 

  • Pitrelli JF, Perrone MP (2002) Confidence modeling for verification post-processing for handwriting recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 30–35

  • Plamondon R, Srihari SN (2000) Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans Pattern Anal Mach Intell 22(1):63–84

    Article  Google Scholar 

  • Plötz T, Fink GA (2009) Markov models for offline handwriting recognition: a survey. Int J Doc Anal Recognit 12(4):269

    Article  Google Scholar 

  • Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125

    Article  Google Scholar 

  • Rabi M, Amrouch M, Mahani Z (2018) Recognition of cursive arabic handwritten text using embedded training based on hidden markov models. Int J Pattern Recognit Artif Intell 32(01):1860007

    Article  Google Scholar 

  • Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

    Article  Google Scholar 

  • Razzak MI, Husain SA, Mirza AA, Belaid A (2012) Fuzzy based preprocessing using fusion of online and offline trait for online urdu script based languages character recognition. Int J Innov Comput Inf Control 8(5):21

    Google Scholar 

  • Rehman A, Saba T (2014) Neural networks for document image preprocessing: state of the art. Artif Intell Rev 42(2):253–273

    Article  Google Scholar 

  • Rehman A, Mohammad D, Sulong G, Saba T (2009) Simple and effective techniques for core-region detection and slant correction in offline script recognition. In: Signal and image processing applications (ICSIPA), 2009 IEEE international conference on, IEEE, pp 15–20

  • Rehman A, Kurniawan F, Saba T (2011) An automatic approach for line detection and removal without smash-up characters. Imaging Sci J 59(3):177–182

    Article  Google Scholar 

  • Ribas FC, Oliveira L, Britto A, Sabourin R (2013) Handwritten digit segmentation: a comparative study. Int J Doc Anal Recognit 16(2):127–137

    Article  Google Scholar 

  • Roy PP, Pal U, Lladós J (2008) Recognition of multi-oriented touching characters in graphical documents. In: Computer vision, graphics and image processing, 2008. ICVGIP’08. Sixth Indian conference on, IEEE, pp 297–304

  • Roy PP, Pal U, Lladós J, Delalandre M (2012) Multi-oriented touching text character segmentation in graphical documents using dynamic programming. Pattern Recognit 45(5):1972–1983

    Article  Google Scholar 

  • Roy PP, Bhunia AK, Das A, Dey P, Pal U (2016) HMM-based indic handwritten word recognition using zone segmentation. Pattern Recognit 60:1057–1075

    Article  Google Scholar 

  • Roy PP, Zhong G, Cheriet M (2017) Tandem hidden markov models using deep belief networks for offline handwriting recognition. Front Inf Technol Electron Eng 18(7):978–988

    Article  Google Scholar 

  • Roy S, Das N, Kundu M, Nasipuri M (2017) Handwritten isolated bangla compound character recognition: a new benchmark using a novel deep learning approach. Pattern Recogniti Lett 90:15–21

    Article  Google Scholar 

  • Ryu J, Koo HI, Cho NI (2015) Word segmentation method for handwritten documents based on structured learning. IEEE Signal Process Lett 22(8):1161–1165

    Article  Google Scholar 

  • Saabni RM, El-Sana JA (2013) Comprehensive synthetic arabic database for on/off-line script recognition research. Int J Doc Anal Recognit 16(3):285–294

  • Saba T, Sulong G, Rehman A (2011) Retracted article: Document image analysis: issues, comparison of methods and remaining problems. Artif Intell Rev 35(2):101–118

  • Saba T, Rehman A, Altameem A, Uddin M (2014) Annotated comparisons of proposed preprocessing techniques for script recognition. Neural Comput Appl 25(6):1337–1347

  • Saini R, Roy PP, Dogra DP (2018) A segmental HMM based trajectory classification using genetic algorithm. Expert Syst Appl 93:169–181

  • Sajedi H, Bahador M (2016) Persian handwritten number recognition using adapted framing feature and support vector machines. Int J Comput Intell Appl 15(01):1650004

  • Samanta O, Bhattacharya U, Parui SK (2014) Smoothing of HMM parameters for efficient recognition of online handwriting. Pattern Recognit 47(11):3614–3629

  • Samanta O, Roy A, Bhattacharya U, Parui SK (2015) Script independent online handwriting recognition. In: Document analysis and recognition (ICDAR), 2015 13th international conference on, IEEE, pp 1251–1255

  • Sampath A, Gomathi N (2017) Fuzzy-based multi-kernel spherical support vector machine for effective handwritten character recognition. Sādhanā 42(9):1513–1525

  • Sarkhel R, Das N, Das A, Kundu M, Nasipuri M (2017) A multi-scale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular Indic scripts. Pattern Recognit 71:78–93

  • Schenk J, Lenz J, Rigoll G (2009) Novel script line identification method for script normalization and feature extraction in on-line handwritten whiteboard note recognition. Pattern Recognit 42(12):3383–3393

  • Sen S, Sarkar R, Roy K, Hori N (2017) Recognize online handwritten Bangla characters using Hausdorff distance-based feature. In: Proceedings of the 5th international conference on frontiers in intelligent computing: theory and applications, Springer, Berlin, pp 541–549

  • Sen S, Bhattacharyya A, Singh PK, Sarkar R, Roy K, Doermann D (2018) Application of structural and topological features to recognize online handwritten Bangla characters. ACM Trans Asian Low-Resour Lang Inf Process 17(3):20

  • Sen S, Chowdhury S, Mitra M, Schwenker F, Sarkar R, Roy K (2018) A novel segmentation technique for online handwritten Bangla words. Pattern Recognit Lett

  • Sen S, Mitra M, Bhattacharyya A, Sarkar R, Schwenker F, Roy K (2019) Feature selection for recognition of online handwritten Bangla characters. Neural Process Lett, pp 1–24

  • Shanthi N, Duraiswamy K (2010) A novel SVM-based handwritten Tamil character recognition system. Pattern Anal Appl 13(2):173–180

  • Sharma MK, Dhaka VP (2016) Pixel plot and trace based segmentation method for bilingual handwritten scripts using feedforward neural network. Neural Comput Appl 27(7):1817–1829

  • Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458

  • Shijian L, Tan CL (2008) Script and language identification in noisy and degraded document images. IEEE Trans Pattern Anal Mach Intell 30(1):14–24

  • Shin J (2004) On-line cursive Hangul recognition that uses DP matching to detect key segmentation points. Pattern Recognit 37(11):2101–2112

  • Shivakumara P, Yuan Z, Zhao D, Lu T, Tan CL (2015) New gradient-spatial-structural features for video script identification. Comput Vis Image Underst 130:35–53

  • Shivram A, Ramaiah C, Setlur S, Govindaraju V (2013) IBM_UB_1: a dual mode unconstrained English handwriting dataset. In: Document analysis and recognition (ICDAR), 2013 12th international conference on, IEEE, pp 13–17

  • Shridhar M, Kimura F (1995) Handwritten address interpretation using word recognition with and without lexicon. In: Systems, man and cybernetics, 1995. Intelligent systems for the 21st century, IEEE international conference on, IEEE, vol 3, pp 2341–2346

  • Simistira F, Katsouros V, Carayannis G (2015) Recognition of online handwritten mathematical formulas using probabilistic SVMs and stochastic context free grammars. Pattern Recognit Lett 53:85–92

  • Singh S, Sharma A, Chhabra I (2017) A dominant points-based feature extraction approach to recognize online handwritten strokes. Int J Doc Anal Recognit 20(1):37–58

  • Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp 2951–2959

  • Soora NR, Deshpande PS (2017) Novel geometrical shape feature extraction techniques for multilingual character recognition. IETE Tech Rev 34(6):612–621

  • Srimany A, Chowdhuri SD, Bhattacharya U, Parui SK (2014) Holistic recognition of online handwritten words based on an ensemble of SVM classifiers. In: Document analysis systems (DAS), 2014 11th IAPR international workshop on, IEEE, pp 86–90

  • Sternby J, Morwing J, Andersson J, Friberg C (2009) On-line Arabic handwriting recognition with templates. Pattern Recognit 42(12):3278–3286

  • Su B, Lu S (2017) Accurate recognition of words in scenes without character segmentation using recurrent neural network. Pattern Recognit 63:397–405

  • Su Z, Cao Z, Wang Y (2009) Stroke extraction based on ambiguous zone detection: a preprocessing step to recover dynamic information from handwritten Chinese characters. Int J Doc Anal Recognit 12(2):109–121

  • Sundaram S, Ramakrishnan A (2015) Bigram language models and reevaluation strategy for improved recognition of online handwritten Tamil words. ACM Trans Asian Low-Resour Lang Inf Process 14(2):8

  • Tagougui N, Kherallah M, Alimi AM (2013) Online Arabic handwriting recognition: a survey. Int J Doc Anal Recognit 16(3):209–226

  • Tan GX, Viard-Gaudin C, Kot AC (2009) Information retrieval model for online handwritten script identification. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 336–340

  • Tappert C (1984) Dehooking procedure for handwriting on a tablet. IBM Tech Disclosure Bull 27(5):2995–2998

  • Tappert CC, Suen CY, Wakahara T (1990) The state of the art in online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 12(8):787–808

  • Tian S, Bhattacharya U, Lu S, Su B, Wang Q, Wei X, Lu Y, Tan CL (2016) Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit 51:125–134

  • Ubul K, Tursun G, Aysa A, Impedovo D, Pirlo G, Yibulayin T (2017) Script identification of multi-script documents: a survey. IEEE Access 5:6546–6559

  • Uchida S, Taira E, Sakoe H (2001) Nonuniform slant correction using dynamic programming. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 434–438

  • Ul-Hasan A, Afzal MZ, Shafait F, Liwicki M, Breuel TM (2015) A sequence learning approach for multiple script identification. In: Document analysis and recognition (ICDAR), 2015 13th international conference on, IEEE, pp 1046–1050

  • Vajda S, Roy K, Pal U, Chaudhuri BB, Belaid A (2009) Automation of Indian postal documents written in Bangla and English. Int J Pattern Recognit Artif Intell 23(08):1599–1632

  • Van Erp M, Vuurpijl L, Schomaker L (2002) An overview and comparison of voting methods for pattern recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 195–200

  • Verma B, Blumenstein M, Ghosh M (2004) A novel approach for structural feature extraction: contour vs. direction. Pattern Recognit Lett 25(9):975–988

  • Verma K, Sharma RK (2017) Comparison of HMM- and SVM-based stroke classifiers for Gurmukhi script. Neural Comput Appl 28(1):51–63

  • Viard-Gaudin C, Lallican PM, Knerr S, Binter P (1999) The IRESTE On/Off (IRONOFF) dual handwriting database. In: Document analysis and recognition, 1999. ICDAR’99. Proceedings of the fifth international conference on, IEEE, pp 455–458

  • Vinciarelli A, Luettin J (2001) A new normalization technique for cursive handwritten words. Pattern Recognit Lett 22(9):1043–1050

  • Vučković V, Arizanović B (2017) Efficient character segmentation approach for machine-typed documents. Expert Syst Appl 80:210–231

  • Wang F, Guo Q, Lei J, Zhang J (2017) Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition. IET Comput Vis 11(6):497–504

  • Wang QF, Yin F, Liu CL (2012) Handwritten Chinese text recognition by integrating multiple contexts. IEEE Trans Pattern Anal Mach Intell 34(8):1469–1481

  • Wei X, Ma S, Jin Y (2005) Segmentation of connected Chinese characters based on genetic algorithm. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 645–649

  • Wu YC, Yin F, Liu CL (2017) Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models. Pattern Recognit 65:251–264

  • Xiao X, Jin L, Yang Y, Yang W, Sun J, Chang T (2017) Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition. Pattern Recognit 72:72–81

  • Xu L, Krzyzak A, Suen CY (1992) Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern 22(3):418–435

  • Xu R, Yeung D, Shu W, Liu J (2002) A hybrid post-processing system for handwritten Chinese character recognition. Int J Pattern Recognit Artif Intell 16(06):657–679

  • Xu R, Yeung DS, Shi D (2005) A hybrid post-processing system for offline handwritten Chinese character recognition based on a statistical language model. Int J Pattern Recognit Artif Intell 19(03):415–428

  • Yamaguchi T, Tsuruoka S, Yoshikawa T, Shinogi T, Makimoto E, Ogata H, Shridhar M (2002) A segmentation system for touching handwritten Japanese characters. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 407–412

  • Yang W, Jin L, Tao D, Xie Z, Feng Z (2016) DropSample: a new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition. Pattern Recognit 58:190–203

  • Youn E, Koenig L, Jeong MK, Baek SH (2010) Support vector-based feature selection using Fisher’s linear discriminant and support vector machine. Expert Syst Appl 37(9):6148–6156

  • Zamora-Martinez F, Frinken V, España-Boquera S, Castro-Bleda MJ, Fischer A, Bunke H (2014) Neural network language models for off-line handwriting recognition. Pattern Recognit 47(4):1642–1652

  • Zhang Q, Yang LT, Chen Z, Li P (2018) A survey on deep learning for big data. Inf Fusion 42:146–157

  • Zhang S, Jin L, Lin L (2016) Discovering similar Chinese characters in online handwriting with deep convolutional neural networks. Int J Doc Anal Recognit 19(3):237–252

  • Zouari R, Boubaker H, Kherallah M (2016) A time delay neural network for online Arabic handwriting recognition. In: International conference on intelligent systems design and applications, Springer, Berlin, pp 1005–1014

Acknowledgements

This research was supported by the Council of Scientific and Industrial Research (CSIR), funded by the Ministry of Science and Technology (09/677(0031)/2018/EMR-I), as well as by the Government of India.

Author information

Corresponding author

Correspondence to Ravinder Kumar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kaur, S., Bawa, S. & Kumar, R. A survey of mono- and multi-lingual character recognition using deep and shallow architectures: indic and non-indic scripts. Artif Intell Rev 53, 1813–1872 (2020). https://doi.org/10.1007/s10462-019-09720-9

  • DOI: https://doi.org/10.1007/s10462-019-09720-9
