A comprehensive survey on word recognition for non-Indic and Indic scripts

  • Survey
  • Pattern Analysis and Applications

Abstract

The term handwriting recognition describes the capability of a computer system to transform human handwriting into machine-processable text. Handwriting recognition has applications in many fields, such as bank-cheque processing, postal-address interpretation, document archiving, mail sorting, and form processing in administrative and insurance offices. A variety of scripts is employed in writing languages throughout the world. Over the past few years, many researchers have worked on handwriting recognition for various non-Indic and Indic scripts, but only a limited number of systems are available for word recognition in these scripts. This paper presents an extensive systematic survey of word recognition techniques, classified broadly by the script in which a word is written. An experimental evaluation of word recognition tools and techniques is presented. Different databases have been surveyed to evaluate the performance of the techniques used to recognize words, and the recognition accuracies achieved are reported. The efforts in two directions (non-Indic and Indic scripts) are reflected in this paper. We raise awareness of the potential benefits of word recognition techniques and identify the need to develop an efficient word recognition technique. Recommendations are also provided for future research. It is also observed that research in this area is quite thin and that more work remains to be done, particularly on word recognition of printed/handwritten documents in Indic scripts.

Author information

Corresponding author

Correspondence to Munish Kumar.

Appendices

Appendix 1: Quality assessment forms

1.1 Screening question

Section-1

Does the research paper refer to word recognition?

Yes

Consider:

The paper includes a study of word recognition. All types of studies, i.e., case studies, experimental studies and research papers, are included.

Section-1 is evaluated first. If the reply is positive, then proceed to Section-2.

1.2 Screening question

Section-2

Key sub-area categorization

Is the research paper focusing on word recognition?

Yes

Consider:

– Is the study’s main focus on word recognition?

– Does the study fit into any one of the categorized sub-areas? (Apparently, the study motivated different categories.)

If the study’s primary focus is on word detection, proceed to Section-3; otherwise, proceed to Section-4.

1.3 Detailed questions

Section-3

Findings

Is there a clear statement of the findings?

Yes

Consider:

Did the study mention the approach/word detection?

Has the word detection technique been reported?

What are the corresponding transformation technique and findings, i.e., the source representation?

Comparison

Was the data reported sufficient for comparative analysis?

Yes

Consider:

Are the necessary parameters for comparison discussed?

Is the study referring to handwritten word recognition explicitly?

1.4 Detailed questions

Section-4

Findings

Did the study mention the type of word recognition?

Yes

Consider:

How well is the word recognition categorized?

Did the study explicitly mention the type of word recognition, or is it to be inferred from the study?
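
The routing between the four sections of the assessment form can be sketched in a few lines of Python. This is only an illustration of the flow described above, not the authors' tooling: the names `Paper` and `assessment_path` are hypothetical, and the sketch assumes that a negative reply in Section-1 ends the assessment, which the form implies but does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    # Section-1: does the paper refer to word recognition?
    refers_to_word_recognition: bool
    # Section-2 routing question: is the primary focus on word detection?
    primary_focus_is_word_detection: bool


def assessment_path(paper: Paper) -> List[str]:
    """Return the form sections a paper passes through, in order."""
    path = ["Section-1"]
    if not paper.refers_to_word_recognition:
        # Assumption: a negative Section-1 reply stops the assessment here.
        return path
    path.append("Section-2")
    if paper.primary_focus_is_word_detection:
        path.append("Section-3")  # findings and comparison questions
    else:
        path.append("Section-4")  # type-of-word-recognition questions
    return path


print(assessment_path(Paper(True, True)))    # ['Section-1', 'Section-2', 'Section-3']
print(assessment_path(Paper(True, False)))   # ['Section-1', 'Section-2', 'Section-4']
print(assessment_path(Paper(False, False)))  # ['Section-1']
```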

Appendix 2: Data items extracted from all papers

Data item | Description
Study identifier | Unique ID for the study
Bibliographic data | Author, year, title, source
Type of article | Journal article, conference article, workshop paper
Study aims/context/application domain | The aims of the study, i.e., the search focus and the research areas the paper focuses on
Study design | Classification of the study: feature extraction, classification, word recognition, comparative analysis, etc.
What is the word recognition technique? | The techniques used for extracting the features of a word, the segmentation techniques (if any) and the classification techniques used to recognize a word
How was comparison carried out? | Values of important parameters for word recognition, i.e., recall, precision, application area, scalability, portability
Subject system | How the data was collected; it refers to the subject system and its size
Data analysis | The corresponding source representation and match detection techniques that are extracted
Developer of the tool and usage | The word detection tool, its developer and the usage of the tool
Study findings | Major findings or conclusions from the primary study, such as the word recognition accuracy (percentage)
Other | Whether the study explicitly refers to handwritten or printed word recognition; any other important point
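
One way to picture the extraction form is as a single record per surveyed study. The sketch below is a hypothetical Python rendering of the data items listed above; the class and field names (`StudyRecord`, `ArticleType`, `missing_items`) are assumptions made for illustration and do not come from the paper. Free-text fields default to an empty string so that items still to be extracted can be listed.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import List


class ArticleType(Enum):
    JOURNAL = "journal article"
    CONFERENCE = "conference article"
    WORKSHOP = "workshop paper"


@dataclass
class StudyRecord:
    # Study identifier and bibliographic data
    study_id: str
    authors: str
    year: int
    title: str
    source: str
    article_type: ArticleType
    # Remaining data items, filled in during extraction
    aims: str = ""
    study_design: str = ""
    technique: str = ""       # feature extraction, segmentation, classification
    comparison: str = ""      # recall, precision, application area, ...
    subject_system: str = ""
    data_analysis: str = ""
    tool: str = ""
    findings: str = ""
    other: str = ""           # handwritten vs. printed, other notes

    def missing_items(self) -> List[str]:
        """Names of the data items that have not been filled in yet."""
        return [name for name, value in asdict(self).items() if value == ""]


# Usage: the identifier "S01" is a placeholder; the bibliographic values
# are those of this survey itself.
record = StudyRecord(
    study_id="S01",
    authors="Kaur H, Kumar M",
    year=2018,
    title="A comprehensive survey on word recognition for non-Indic and Indic scripts",
    source="Pattern Anal Applic",
    article_type=ArticleType.JOURNAL,
)
print(record.missing_items())
```

Keeping the items as named fields rather than free-form notes makes it straightforward to check that every study has been described along the same dimensions before results are compared.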

About this article

Cite this article

Kaur, H., Kumar, M. A comprehensive survey on word recognition for non-Indic and Indic scripts. Pattern Anal Applic 21, 897–929 (2018). https://doi.org/10.1007/s10044-018-0731-2

  • DOI: https://doi.org/10.1007/s10044-018-0731-2
