Abstract
The cultural and regional diversity across the world, and in India in particular, has given rise to a large number of writing systems and scripts with widely varying character sets. For scripts with large character sets, a simple keyboard with a limited set of keys is not the optimal way to provide input to a computer. Variations in individual handwriting caused by mood, writing medium, writing style, etc. pose a challenge to the character recognition (CR) research community. Similar-looking symbols across scripts and languages are a major barrier to multilingual CR. The lack of benchmark results and corpora hinders research in multilingual CR. Only a limited number of articles address the optimal combination of features and classifiers for processing multilingual data. Indic scripts are the least explored in multilingual CR. This paper presents a detailed review and analysis of the work done in multilingual online and offline CR for Indic and non-Indic scripts. The paper contributes mainly in two ways: first, it provides a clear perspective on the various phases of monolingual and multilingual CR; second, it identifies the major deficiencies in monolingual and multilingual CR for printed and handwritten text. It gives an in-depth view of the work done at each phase of CR, including data acquisition, pre-processing, segmentation, feature extraction, recognition and post-processing. The issues to be resolved at each phase are also elaborated. Recent work using deep and shallow architectures is analysed, and the tools used for these architectures are compared to highlight their pros and cons. The present work also suggests how further research can be conducted in monolingual and multilingual CR.
Problems such as CR in hybrid documents, the identification of more reliable features, the disambiguation of similar characters, and the identification of optimal combination strategies for deep and shallow architectures, among others, need to be tackled in future research.
References
Abdelaziz I, Abdou S, Al-Barhamtoshy H (2016) A large vocabulary system for arabic online handwriting recognition. Pattern Anal Appl 19(4):1129–1141
Agrawal M, Bali K, Madhvanath S, Vuurpijl L (2005) Upx: A new xml representation for annotated datasets of online handwriting data. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 1161–1165
Ahmed SB, Naz S, Razzak MI, Rashid SF, Afzal MZ, Breuel TM (2016) Evaluation of cursive and non-cursive scripts using recurrent neural networks. Neural Comput Appl 27(3):603–613
Ahmed SB, Naz S, Swati S, Razzak MI (2017) Handwritten Urdu character recognition using one-dimensional BLSTM classifier. Neural Comput Appl, pp 1–9
Ait-Mohand K, Paquet T, Ragot N (2014) Combining structure and parameter adaptation of hmms for printed text recognition. IEEE Trans Pattern Anal Mach Intell 36(9):1716–1732
Al-Boeridi ON, Ahmad SS, Koh S (2015) A scalable hybrid decision system (HDS) for roman word recognition using ann SVM: study case on malay word recognition. Neural Comput Appl 26(6):1505–1513
Al Maadeed S, Ayouby W, Hassaïne A, Aljaam JM (2012) Quwi: an arabic and english handwriting dataset for offline writer identification. In: Frontiers in handwriting recognition (ICFHR), 2012 international conference on, IEEE, pp 746–751
Alginahi YM, Mudassar M, Kabir MN (2015) An arabic script recognition system. KSII Trans Internet Inf Syst 9(9):3701–3720
Almaksour A, Anquetil E (2009) Fast incremental learning strategy driven by confusion reject for online handwriting recognition. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 81–85
Amara NEB, Mazhoud O, Bouzrara N, Ellouze N (2005) Arabase: a relational database for arabic OCR systems. Int Arab J Inf Technol 2(4):259–266
Arica N, Yarman-Vural FT (2001) An overview of character recognition focused on off-line handwriting. IEEE Trans Syst Man Cybern Part C 31(2):216–233
Arora S, Sharma D, Arora S (2014) Recognition of gurmukhi text from sign board images captured from mobile camera. Int J Inf Comput Technol 4(17):1839–1845
Arvind K, Kumar J, Ramakrishnan A (2007) Line removal and restoration of handwritten strokes. In: Conference on computational intelligence and multimedia applications, 2007. International conference on, IEEE, vol 3, pp 208–214
Azeem SA, Ahmed H (2013) Effective technique for the recognition of offline arabic handwritten words using hidden markov models. Int J Doc Anal Recognit 16(4):399–412
Bag S, Harit G, Bhowmick P (2014) Recognition of bangla compound characters using structural decomposition. Pattern Recognit 47(3):1187–1201
Bai ZL, Huo Q (2004) Underline detection and removal in a document image using multiple strategies. In: Pattern recognition, 2004. ICPR 2004. Proceedings of the 17th international conference on, IEEE, vol 2, pp 578–581
Bansal V, Sinha R (2002) Segmentation of touching and fused Devanagari characters. Pattern Recognit 35(4):875–893
Baral S, Bhattacharya S, Chakraborty A, Bhattacharya U, Parui SK (2014) A machine learning approach to detection of core region of online handwritten bangla word samples. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 458–463
Basu S, Das N, Sarkar R, Kundu M, Nasipuri M, Basu DK (2009) A hierarchical approach to recognition of handwritten bangla characters. Pattern Recognit 42(7):1467–1484
Benjelil M, Kanoun S, Mullot R, Alimi AM (2009) Arabic and latin script identification in printed and handwritten types based on steerable pyramid features. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 591–595
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305
Bharath A, Madhvanath S (2012) Hmm-based lexicon-driven and lexicon-free word recognition for online handwritten indic scripts. IEEE Trans Pattern Anal Mach Intell 34(4):670–682
Bharath A, Madhvanath S (2014) Allograph modeling for online handwritten characters in devanagari using constrained stroke clustering. ACM Trans Asian Lang Inf Process 13(3):12
Bhaskarabhatla AS, Madhvanath S (2004) Experiences in collection of handwriting data for online handwriting recognition in indic scripts. In: LREC, Citeseer
Bhattacharya S, Maitra DS, Bhattacharya U, Parui SK (2016) An end-to-end system for bangla online handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2016 15th International conference on, IEEE, pp 373–378
Bhattacharya U, Shridhar M, Parui SK, Sen P, Chaudhuri B (2012) Offline recognition of handwritten bangla characters: an efficient two-stage approach. Pattern Anal Appl 15(4):445–458
Bhowmik TK, Parui SK, Roy U, Schomaker L (2016) Bangla handwritten character segmentation using structural features: a supervised and bootstrapping approach. ACM Trans Asian Low-Resour Lang Inf Process 15(4):29
Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-LSTM network. Pattern Recognit 85:172–184
Biadsy F, Saabni R, El-Sana J (2011) Segmentation-free online arabic handwriting recognition. Int J Pattern Recognit Artif Intell 25(07):1009–1033
Bianne-Bernard AL, Menasri F, Mohamad RAH, Mokbel C, Kermorvant C, Likforman-Sulem L (2011) Dynamic and contextual information in hmm modeling for handwritten word recognition. IEEE Trans Pattern Anal Mach Intell 33(10):2066–2080
Blanchard J, Artieres T (2004) On-line handwritten documents segmentation. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on, IEEE, pp 148–153
Blumenstein M, Cheng CK, Liu XY (2002) New preprocessing techniques for handwritten word recognition. In: Proceedings of the second IASTED international conference on visualization, imaging and image processing (VIIP 2002), ACTA Press, Calgary, pp 480–484
Bozinovic RM, Srihari SN (1989) Off-line cursive script word recognition. IEEE Trans Pattern Anal Mach Intell 11(1):68–83
Carbonnel S, Anquetil E (2004) Lexicon organization and string edit distance learning for lexical post-processing in handwriting recognition. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on, IEEE, pp 462–467
Casey RG, Lecolinet E (1996) A survey of methods and strategies in character segmentation. IEEE Trans Pattern Anal Mach Intell 18(7):690–706
Cavalin PR, Sabourin R, Suen CY, Britto AS Jr (2009) Evaluation of incremental learning algorithms for hmm in the recognition of alphanumeric characters. Pattern Recognit 42(12):3241–3253
Chakraborty D, Pal U (2016) Baseline detection of multi-lingual unconstrained handwritten text lines. Pattern Recognit Lett 74:74–81
Chherawala Y, Roy PP, Cheriet M (2016) Feature set evaluation for offline handwriting recognition systems: application to the recurrent neural network model. IEEE Trans Cybern 46(12):2825–2836
Chherawala Y, Roy PP, Cheriet M (2017) Combination of context-dependent bidirectional long short-term memory classifiers for robust offline handwriting recognition. Pattern Recognit Lett 90:58–64
Connell SD, Jain AK (2001) Template-based online character recognition. Pattern Recognit 34(1):1–14
Connell SD, Jain AK (2002) Writer adaptation for online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 24(3):329–346
Dalal S, Malik L (2008) A survey of methods and strategies for feature extraction in handwritten script identification. In: Emerging trends in engineering and technology, 2008. ICETET’08. First international conference on, IEEE, pp 1164–1169
Das N, Reddy JM, Sarkar R, Basu S, Kundu M, Nasipuri M, Basu DK (2012) A statistical-topological feature combination for recognition of handwritten numerals. Appl Soft Comput 12(8):2486–2495
Dash KS, Puhan NB, Panda G (2016) BESAC: binary external symmetry axis constellation for unconstrained handwritten character recognition. Pattern Recognit Lett 83:413–422
De Oliveira J, de Carvalho JM, de A Freitas C, Sabourin R (2002) Feature sets evaluation for handwritten word recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 446–450
De Stefano C, Marcelli A (2004) An efficient method for online cursive handwriting strokes reordering. Int J Pattern Recognit Artif Intell 18(07):1157–1171
Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3
Dhaka VP, Sharma MK (2015) An efficient segmentation technique for devanagari offline handwritten scripts using the feedforward neural network. Neural Comput Appl 26(8):1881–1893
Dutta D, Chowdhury AR, Bhattacharya U, Parui SK (2014) Stroke level user-adaptation for stroke order free online handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 250–255
Elanwar RI, Rashwan MA, Mashali SA (2007) Simultaneous segmentation and recognition of arabic characters in an unconstrained on-line cursive handwritten document. In: Proceedings of world academy of science, engineering and technology vol 23, pp 288–291
Elgammal AM, Ismail MA (2001) Techniques for language identification for hybrid Arabic-English document images. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 1100–1104
Elnagar A, Alhajj R (2003) Segmentation of connected handwritten numeral strings. Pattern Recognit 36(3):625–634
Eskenazi S, Gomez-Krämer P, Ogier JM (2017) A comprehensive survey of mostly textual document segmentation algorithms since 2008. Pattern Recognit 64:1–14
Farooq F, Bhardwaj A, Govindaraju V (2009) Using topic models for ocr correction. Int J Doc Anal Recognit 12(3):153–164
Farulla GA, Murru N, Rossini R (2017) A fuzzy approach to segment touching characters. Expert Syst Appl 88:1–13
Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H (2018) Gan-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. arXiv preprint arXiv:1803.01229
Frishkopf L, Harmon L (1961) Machine reading of cursive script. Inf Theory, pp 300–316
Gader PD, Khabou MA (1996) Automatic feature generation for handwritten digit recognition. IEEE Trans Pattern Anal Mach Intell 18(12):1256–1261
Ghods V, Kabir E, Razzazi F (2013) Effect of delayed strokes on the recognition of online farsi handwriting. Pattern Recognit Lett 34(5):486–491
Ghods V, Kabir E, Razzazi F (2014) Fusion of hmm classifiers, based on x, y and (x, y) signals, for the recognition of online farsi handwriting: a large lexicon approach. Arab J Sci Eng 39(3):1713–1723
Ghosh D, Dube T, Shivaprasad A (2010) Script recognition: a review. IEEE Trans Pattern Anal Mach Intell 32(12):2142–2161
Giménez A, Khoury I, Andrés-Ferrer J, Juan A (2014) Handwriting word recognition using windowed bernoulli HMMs. Pattern Recognit Lett 35:149–156
Guerfali W, Plamondon R (1993) Normalizing and restoring on-line handwriting. Pattern Recognit 26(3):419–431
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hládek D, Staš J, Ondáš S, Juhár J, Kovács L (2017) Learning string distance with smoothing for OCR spelling correction. Multimedia Tools and Appl 76(22):24549–24567
Hochberg J, Kelly P, Thomas T, Kerns L (1997) Automatic script identification from document images using cluster-based templates. IEEE Trans Pattern Anal Mach Intell 19(2):176–181
Hochberg J, Bowers K, Cannon M, Kelly P (1999) Script and language identification for handwritten document images. Int J Doc Anal Recognit 2(2–3):45–52
Holzinger A, Stocker C, Peischl B, Simonic KM (2012) On using entropy for enhancing handwriting preprocessing. Entropy 14(11):2324–2350
Hu J, Rosenthal AS, Brown MK (1997) Combining high-level features with sequential local features for on-line handwriting recognition. In: International conference on image analysis and processing. Springer, Berlin, pp 647–654
Huang BQ, Zhang Y, Kechadi MT (2007) Preprocessing techniques for online handwriting recognition. In: Intelligent systems design and applications, 2007. ISDA 2007. Seventh international conference on, IEEE, pp 793–800
Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554
Humied IA (2016) Segmentation accuracy for offline arabic handwritten recognition based on bounding box algorithm. Int J Comput Sci Netw Secur 16(9):98
Hussain R, Raza A, Siddiqi I, Khurshid K, Djeddi C (2015) A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation. EURASIP J Image Video Process 2015(1):46
Iwana BK, Frinken V, Riesen K, Uchida S (2017) Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes. Pattern Recognit 64:268–276
Jaeger S, Nakagawa M (2001) Two on-line Japanese character databases in unipen format. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 566–570
Jaeger S, Ma H, Doermann D (2005) Identifying script on word-level with informational confidence. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 416–420
Jawahar C, Kumar MP, Kiran SR (2003) A bilingual ocr for hindi-telugu documents and its applications. In: Document analysis and recognition, 2003. Proceedings. Seventh international conference on, IEEE, pp 408–412
Jayadevan R, Kolhe SR, Patil PM, Pal U (2011) Offline recognition of devanagari script: a survey. IEEE Trans Syst Man Cybern Part C 41(6):782–796
Jayech K, Mahjoub MA, Amara NEB (2016) Synchronous multi-stream hidden markov model for offline arabic handwriting recognition without explicit segmentation. Neurocomputing 214:958–971
Jothi JAA, Rajam VMA (2017) A survey on automated cancer diagnosis from histopathology images. Artif Intell Rev 48(1):31–81
Kacem A, Saïdani A (2017) A texture-based approach for word script and nature identification. Pattern Anal Appl 20(4):1157–1167
Kavallieratou E, Fakotakis N, Kokkinakis G (1999) New algorithms for skewing correction and slant removal on word-level [ocr]. In: Electronics, circuits and systems, 1999. Proceedings of ICECS’99. The 6th IEEE international conference on, IEEE, vol 2, pp 1159–1162
Kavitha S, Shivakumara P, Kumar GH, Tan C (2015) A robust script identification system for historical indian document images. Malays J Comput Sci 28(4):283–300
Keysers D, Deselaers T, Rowley HA, Wang LL, Carbune V (2017) Multi-language online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 39(6):1180–1194
Kherallah M, Elbaati A, Abed H, Alimi A (2008) The on/off (LMCA) dual arabic handwriting database. In: 11th International conference on frontiers in handwriting recognition (ICFHR)
Kherallah M, Tagougui N, Alimi AM, El Abed H, Margner V (2011) Online arabic handwriting recognition competition. In: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, pp 1454–1458
Kim IJ, Xie X (2015) Handwritten hangul recognition using deep convolutional neural networks. Int J Doc Anal Recognit 18(1):1–13
Kukich K (1992) Techniques for automatically correcting words in text. ACM Comput Surv 24(4):377–439
Kumar M, Jindal M, Sharma R, Jindal SR (2018) Character and numeral recognition for non-indic and indic scripts: a survey. Artif Intell Rev, pp 1–27
Kumar R, Sharma RK (2013) An efficient post processing algorithm for online handwriting Gurmukhi character recognition using set theory. Int J Pattern Recognit Artif Intell 27(04):1353002
Lacerda EB, Mello CA (2013) Segmentation of connected handwritten digits using self-organizing maps. Expert Syst Appl 40(15):5867–5877
Lai S, Jin L, Yang W (2017) Toward high-performance online HCCR: A CNN approach with dropdistortion, path signature and spatial stochastic max-pooling. Pattern Recognit Lett 89:60–66
Lam L, Suen CY (1995) Optimal combinations of pattern classifiers. Pattern Recognit Lett 16(9):945–954
Lee JJ, Kim JH (1996) A unified network-based approach for online recognition of multi-lingual cursive handwritings. In: Proceedings of fifth international workshop frontiers in handwriting recognition, pp 393–397
Lee MH, Kim SH, Lee GS, Kim SH, Yang HJ (2012) Correction for misrecognition of korean texts in signboard images using improved levenshtein metric. KSII Trans Internet Inf Syst 6(2):722–733
Lehal G, Singh C (2001) A technique for segmentation of Gurmukhi text. In: International conference on computer analysis of images and patterns, Springer, Berlin, pp 191–200
Li F, Shen Q, Li Y, Mac Parthaláin N (2016) Handwritten chinese character recognition using fuzzy image alignment. Soft Comput 20(8):2939–2949
Li Y, Jin L, Zhu X, Long T (2008) SCUT-COUCH2008: a comprehensive online unconstrained chinese handwriting dataset. ICFHR 2008:165–170
Li YX, Tan CL, Ding X (2005) A hybrid post-processing system for offline handwritten chinese script recognition. Pattern Anal Appl 8(3):272–286
Liu CL, Jaeger S, Nakagawa M (2004) Online recognition of chinese characters: the state-of-the-art. IEEE Trans Pattern Anal Mach Intell 26(2):198–213
Liu CL, Yin F, Wang DH, Wang QF (2011) Casia online and offline chinese handwriting databases. In: Document analysis and recognition (ICDAR), 2011 International conference on, IEEE, pp 37–41
Liu X, Fu H, Jia Y (2008) Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images. Pattern Recognit 41(2):484–493
Liu YH, Lin CC, Chang F (2005) Language identification of character images using machine learning techniques. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 630–634
Liwicki M, Bunke H (2005) Iam-ondb-an on-line english sentence database acquired from handwritten text on a whiteboard. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 956–961
Liwicki M, Bunke H (2009) Feature selection for HMM and BLSTM based handwriting recognition of whiteboard notes. Int J Pattern Recognit Artif Intell 23(05):907–923
Llorens D, Prat F, Marzal A, Vilar JM, Castro MJ, Amengual JC, Barrachina S, Castellanos A, Boquera SE, Gómez J, et al (2008) The UJIpenchars database: a pen-based database of isolated handwritten characters. In: LREC
Lorigo LM, Govindaraju V (2006) Offline arabic handwriting recognition: a survey. IEEE Trans Pattern Anal Mach Intell 28(5):712–724
Ma L, Liu H, Wu J (2011) MRG-OHTC database for online handwritten tibetan character recognition. In: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, pp 207–211
Mahalat MH, Mollah AF, Basu S, Nasipuri M (2017) Design of novel post-processing algorithms for handwritten arabic numerals classification. Int J Appl Pattern Recognit 4(4):342–357
Mandler E (1987) Advanced preprocessing technique for on-line script recognition of nonconnected symbols. In: Proceedings of 3rd international symposium on handwriting and computer applications, pp 64–66
Marti UV, Bunke H (2001) Using a statistical language model to improve the performance of an hmm-based cursive handwriting recognition system. In: Hidden Markov models: applications in computer vision. World Scientific, Singapore, pp 65–90
Mitrpanont JL, Limkonglap U (2007) Using contour analysis to improve feature extraction in thai handwritten character recognition systems. In: Computer and information technology, 2007. CIT 2007. 7th IEEE international conference on, IEEE, pp 668–673
Mohamad RAH, Likforman-Sulem L, Mokbel C (2009) Combining slanted-frame classifiers for improved HMM-based arabic handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(7):1165–1177
Mohamed Ar, Dahl GE, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
Mori S, Suen CY, Yamamoto K (1992) Historical review of ocr research and development. Proc IEEE 80(7):1029–1058
Nakagawa M, Matsumoto K (2004) Collection of on-line handwritten japanese character pattern databases and their analyses. Doc Anal Recognit 7(1):69–81
Namboodiri AM, Jain AK (2004) Online handwritten script recognition. IEEE Trans Pattern Anal Mach Intell 26(1):124–130
Naz S, Umar AI, Ahmad R, Ahmed SB, Shirazi SH, Siddiqi I, Razzak MI (2016) Offline cursive Urdu-Nastaliq script recognition using multidimensional recurrent neural networks. Neurocomputing 177:228–241
Naz S, Umar AI, Ahmad R, Ahmed SB, Shirazi SH, Razzak MI (2017) Urdu Nasta’liq text recognition system based on multi-dimensional recurrent neural network and statistical features. Neural Comput Appl 28(2):219–231
Nethravathi B, Archana C, Shashikiran K, Ramakrishnan AG, Kumar V (2010) Creation of a huge annotated database for tamil and kannada ohr. In: Frontiers in handwriting recognition (ICFHR), 2010 international conference on, IEEE, pp 415–420
Nguyen CT, Zhu B, Nakagawa M (2014) A semi-incremental recognition method for on-line handwritten english text. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 234–239
Niu XX, Suen CY (2012) A novel hybrid CNN–SVM classifier for recognizing handwritten digits. Pattern Recognit 45(4):1318–1325
Obaidullah SM, Halder C, Santosh K, Das N, Roy K (2018) Phdindic_11: page-level handwritten document image dataset of 11 official indic scripts for script identification. Multimedia Tools Appl 77(2):1643–1678
Oprean C, Likforman-Sulem L, Popescu A, Mokbel C (2015) Handwritten word recognition using web resources and recurrent neural networks. Int J Doc Anal Recognit 18(4):287–301
Pal U, Belaıd A, Choisy C (2003) Touching numeral segmentation using water reservoir concept. Pattern Recognit Lett 24(1–3):261–272
Pal U, Jayadevan R, Sharma N (2012) Handwriting recognition in indian regional scripts: a survey of offline techniques. ACM Trans Asian Lang Inf Process 11(1):1
Pan W, Suen CY, Bui TD (2005) Script identification using steerable gabor filters. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 883–887
Park N, Mohammadi M, Gorde K, Jajodia S, Park H, Kim Y (2018) Data synthesis based on generative adversarial networks. Proc VLDB Endow 11(10):1071–1083
Pati PB, Ramakrishnan A (2008) Word level multi-script identification. Pattern Recognit Lett 29(9):1218–1229
Pitrelli JF, Perrone MP (2002) Confidence modeling for verification post-processing for handwriting recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 30–35
Plamondon R, Srihari SN (2000) Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans Pattern Anal Mach Intell 22(1):63–84
Plötz T, Fink GA (2009) Markov models for offline handwriting recognition: a survey. Int J Doc Anal Recognit 12(4):269
Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125
Rabi M, Amrouch M, Mahani Z (2018) Recognition of cursive arabic handwritten text using embedded training based on hidden markov models. Int J Pattern Recognit Artif Intell 32(01):1860007
Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Razzak MI, Husain SA, Mirza AA, Belaid A (2012) Fuzzy based preprocessing using fusion of online and offline trait for online urdu script based languages character recognition. Int J Innov Comput Inf Control 8(5):21
Rehman A, Saba T (2014) Neural networks for document image preprocessing: state of the art. Artif Intell Rev 42(2):253–273
Rehman A, Mohammad D, Sulong G, Saba T (2009) Simple and effective techniques for core-region detection and slant correction in offline script recognition. In: Signal and image processing applications (ICSIPA), 2009 IEEE international conference on, IEEE, pp 15–20
Rehman A, Kurniawan F, Saba T (2011) An automatic approach for line detection and removal without smash-up characters. Imaging Sci J 59(3):177–182
Ribas FC, Oliveira L, Britto A, Sabourin R (2013) Handwritten digit segmentation: a comparative study. Int J Doc Anal Recognit 16(2):127–137
Roy PP, Pal U, Lladós J (2008) Recognition of multi-oriented touching characters in graphical documents. In: Computer vision, graphics and image processing, 2008. ICVGIP’08. Sixth Indian conference on, IEEE, pp 297–304
Roy PP, Pal U, Lladós J, Delalandre M (2012) Multi-oriented touching text character segmentation in graphical documents using dynamic programming. Pattern Recognit 45(5):1972–1983
Roy PP, Bhunia AK, Das A, Dey P, Pal U (2016) HMM-based indic handwritten word recognition using zone segmentation. Pattern Recognit 60:1057–1075
Roy PP, Zhong G, Cheriet M (2017) Tandem hidden markov models using deep belief networks for offline handwriting recognition. Front Inf Technol Electron Eng 18(7):978–988
Roy S, Das N, Kundu M, Nasipuri M (2017) Handwritten isolated bangla compound character recognition: a new benchmark using a novel deep learning approach. Pattern Recogniti Lett 90:15–21
Ryu J, Koo HI, Cho NI (2015) Word segmentation method for handwritten documents based on structured learning. IEEE Signal Process Lett 22(8):1161–1165
Saabni RM, El-Sana JA (2013) Comprehensive synthetic arabic database for on/off-line script recognition research. Int J Doc Anal Recognit 16(3):285–294
Saba T, Sulong G, Rehman A (2011) Retracted article: Document image analysis: issues, comparison of methods and remaining problems. Artif Intell Rev 35(2):101–118
Saba T, Rehman A, Altameem A, Uddin M (2014) Annotated comparisons of proposed preprocessing techniques for script recognition. Neural Comput Appl 25(6):1337–1347
Saini R, Roy PP, Dogra DP (2018) A segmental HMM based trajectory classification using genetic algorithm. Expert Syst Appl 93:169–181
Sajedi H, Bahador M (2016) Persian handwritten number recognition using adapted framing feature and support vector machines. Int J Comput Intell Appl 15(01):1650004
Samanta O, Bhattacharya U, Parui SK (2014) Smoothing of HMM parameters for efficient recognition of online handwriting. Pattern Recognit 47(11):3614–3629
Samanta O, Roy A, Bhattacharya U, Parui SK (2015) Script independent online handwriting recognition. In: Document analysis and recognition (ICDAR), 2015 13th international conference on, IEEE, pp 1251–1255
Sampath A, Gomathi N (2017) Fuzzy-based multi-kernel spherical support vector machine for effective handwritten character recognition. Sādhanā 42(9):1513–1525
Sarkhel R, Das N, Das A, Kundu M, Nasipuri M (2017) A multi-scale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts. Pattern Recognit 71:78–93
Schenk J, Lenz J, Rigoll G (2009) Novel script line identification method for script normalization and feature extraction in on-line handwritten whiteboard note recognition. Pattern Recognit 42(12):3383–3393
Sen S, Sarkar R, Roy K, Hori N (2017) Recognize online handwritten bangla characters using hausdorff distance-based feature. In: Proceedings of the 5th international conference on frontiers in intelligent computing: theory and applications, Springer, Berlin, pp 541–549
Sen S, Bhattacharyya A, Singh PK, Sarkar R, Roy K, Doermann D (2018) Application of structural and topological features to recognize online handwritten bangla characters. ACM Trans Asian Low-Resour Lang Inf Process 17(3):20
Sen S, Chowdhury S, Mitra M, Schwenker F, Sarkar R, Roy K (2018) A novel segmentation technique for online handwritten bangla words. Pattern Recognit Lett
Sen S, Mitra M, Bhattacharyya A, Sarkar R, Schwenker F, Roy K (2019) Feature selection for recognition of online handwritten bangla characters. Neural Process Lett, pp 1–24
Shanthi N, Duraiswamy K (2010) A novel SVM-based handwritten tamil character recognition system. Pattern Anal Appl 13(2):173–180
Sharma MK, Dhaka VP (2016) Pixel plot and trace based segmentation method for bilingual handwritten scripts using feedforward neural network. Neural Comput Appl 27(7):1817–1829
Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458
Shijian L, Tan CL (2008) Script and language identification in noisy and degraded document images. IEEE Trans Pattern Anal Mach Intell 30(1):14–24
Shin J (2004) On-line cursive hangul recognition that uses DP matching to detect key segmentation points. Pattern Recognit 37(11):2101–2112
Shivakumara P, Yuan Z, Zhao D, Lu T, Tan CL (2015) New gradient-spatial-structural features for video script identification. Comput Vis Image Underst 130:35–53
Shivram A, Ramaiah C, Setlur S, Govindaraju V (2013) Ibm_ub_1: a dual mode unconstrained english handwriting dataset. In: Document analysis and recognition (ICDAR), 2013 12th international conference on, IEEE, pp 13–17
Shridhar M, Kimura F (1995) Handwritten address interpretation using word recognition with and without lexicon. In: Systems, man and cybernetics, 1995. Intelligent systems for the 21st century., IEEE international conference on, IEEE, vol 3, pp 2341–2346
Simistira F, Katsouros V, Carayannis G (2015) Recognition of online handwritten mathematical formulas using probabilistic svms and stochastic context free grammars. Pattern Recognit Lett 53:85–92
Singh S, Sharma A, Chhabra I (2017) A dominant points-based feature extraction approach to recognize online handwritten strokes. Int J Doc Anal Recognit 20(1):37–58
Snoek J, Larochelle H, Adams RP (2012) Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp 2951–2959
Soora NR, Deshpande PS (2017) Novel geometrical shape feature extraction techniques for multilingual character recognition. IETE Tech Rev 34(6):612–621
Srimany A, Chowdhuri SD, Bhattacharya U, Parui SK (2014) Holistic recognition of online handwritten words based on an ensemble of SVM classifiers. In: Document analysis systems (DAS), 2014 11th IAPR international workshop on, IEEE, pp 86–90
Sternby J, Morwing J, Andersson J, Friberg C (2009) On-line Arabic handwriting recognition with templates. Pattern Recognit 42(12):3278–3286
Su B, Lu S (2017) Accurate recognition of words in scenes without character segmentation using recurrent neural network. Pattern Recognit 63:397–405
Su Z, Cao Z, Wang Y (2009) Stroke extraction based on ambiguous zone detection: a preprocessing step to recover dynamic information from handwritten Chinese characters. Int J Doc Anal Recognit 12(2):109–121
Sundaram S, Ramakrishnan A (2015) Bigram language models and reevaluation strategy for improved recognition of online handwritten Tamil words. ACM Trans Asian Low-Resour Lang Inf Process 14(2):8
Tagougui N, Kherallah M, Alimi AM (2013) Online Arabic handwriting recognition: a survey. Int J Doc Anal Recognit 16(3):209–226
Tan GX, Viard-Gaudin C, Kot AC (2009) Information retrieval model for online handwritten script identification. In: Document analysis and recognition, 2009. ICDAR’09. 10th international conference on, IEEE, pp 336–340
Tappert C (1984) Dehooking procedure for handwriting on a tablet. IBM Tech Disclosure Bull 27(5):2995–2998
Tappert CC, Suen CY, Wakahara T (1990) The state of the art in online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 12(8):787–808
Tian S, Bhattacharya U, Lu S, Su B, Wang Q, Wei X, Lu Y, Tan CL (2016) Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit 51:125–134
Ubul K, Tursun G, Aysa A, Impedovo D, Pirlo G, Yibulayin T (2017) Script identification of multi-script documents: a survey. IEEE Access 5:6546–6559
Uchida S, Taira E, Sakoe H (2001) Nonuniform slant correction using dynamic programming. In: Document analysis and recognition, 2001. Proceedings. Sixth international conference on, IEEE, pp 434–438
Ul-Hasan A, Afzal MZ, Shafait F, Liwicki M, Breuel TM (2015) A sequence learning approach for multiple script identification. In: Document analysis and recognition (ICDAR), 2015 13th International conference on, IEEE, pp 1046–1050
Vajda S, Roy K, Pal U, Chaudhuri BB, Belaid A (2009) Automation of Indian postal documents written in Bangla and English. Int J Pattern Recognit Artif Intell 23(08):1599–1632
Van Erp M, Vuurpijl L, Schomaker L (2002) An overview and comparison of voting methods for pattern recognition. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 195–200
Verma B, Blumenstein M, Ghosh M (2004) A novel approach for structural feature extraction: contour vs. direction. Pattern Recognit Lett 25(9):975–988
Verma K, Sharma RK (2017) Comparison of HMM-and SVM-based stroke classifiers for Gurmukhi script. Neural Comput Appl 28(1):51–63
Viard-Gaudin C, Lallican PM, Knerr S, Binter P (1999) The IRESTE on/off (IRONOFF) dual handwriting database. In: Document analysis and recognition, 1999. ICDAR’99. Proceedings of the fifth international conference on, IEEE, pp 455–458
Vinciarelli A, Luettin J (2001) A new normalization technique for cursive handwritten words. Pattern Recognit Lett 22(9):1043–1050
Vučković V, Arizanović B (2017) Efficient character segmentation approach for machine-typed documents. Expert Syst Appl 80:210–231
Wang F, Guo Q, Lei J, Zhang J (2017) Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition. IET Comput Vis 11(6):497–504
Wang QF, Yin F, Liu CL (2012) Handwritten Chinese text recognition by integrating multiple contexts. IEEE Trans Pattern Anal Mach Intell 34(8):1469–1481
Wei X, Ma S, Jin Y (2005) Segmentation of connected Chinese characters based on genetic algorithm. In: Document analysis and recognition, 2005. Proceedings. Eighth international conference on, IEEE, pp 645–649
Wu YC, Yin F, Liu CL (2017) Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models. Pattern Recognit 65:251–264
Xiao X, Jin L, Yang Y, Yang W, Sun J, Chang T (2017) Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition. Pattern Recognit 72:72–81
Xu L, Krzyzak A, Suen CY (1992) Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern 22(3):418–435
Xu R, Yeung D, Shu W, Liu J (2002) A hybrid post-processing system for handwritten Chinese character recognition. Int J Pattern Recognit Artif Intell 16(06):657–679
Xu R, Yeung DS, Shi D (2005) A hybrid post-processing system for offline handwritten Chinese character recognition based on a statistical language model. Int J Pattern Recognit Artif Intell 19(03):415–428
Yamaguchi T, Tsuruoka S, Yoshikawa T, Shinogi T, Makimoto E, Ogata H, Shridhar M (2002) A segmentation system for touching handwritten Japanese characters. In: Frontiers in handwriting recognition, 2002. Proceedings. Eighth international workshop on, IEEE, pp 407–412
Yang W, Jin L, Tao D, Xie Z, Feng Z (2016) Dropsample: a new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition. Pattern Recognit 58:190–203
Youn E, Koenig L, Jeong MK, Baek SH (2010) Support vector-based feature selection using Fisher’s linear discriminant and support vector machine. Expert Syst Appl 37(9):6148–6156
Zamora-Martinez F, Frinken V, España-Boquera S, Castro-Bleda MJ, Fischer A, Bunke H (2014) Neural network language models for off-line handwriting recognition. Pattern Recognit 47(4):1642–1652
Zhang Q, Yang LT, Chen Z, Li P (2018) A survey on deep learning for big data. Inf Fusion 42:146–157
Zhang S, Jin L, Lin L (2016) Discovering similar Chinese characters in online handwriting with deep convolutional neural networks. Int J Doc Anal Recognit 19(3):237–252
Zouari R, Boubaker H, Kherallah M (2016) A time delay neural network for online Arabic handwriting recognition. In: International conference on intelligent systems design and applications, Springer, Berlin, pp 1005–1014
Acknowledgements
This research was supported by the Council of Scientific and Industrial Research (CSIR), funded by the Ministry of Science and Technology (09/677(0031)/2018/EMR-I), as well as the Government of India.
Cite this article
Kaur, S., Bawa, S. & Kumar, R. A survey of mono- and multi-lingual character recognition using deep and shallow architectures: indic and non-indic scripts. Artif Intell Rev 53, 1813–1872 (2020). https://doi.org/10.1007/s10462-019-09720-9
DOI: https://doi.org/10.1007/s10462-019-09720-9