Offline script recognition from handwritten and printed multilingual documents: a survey

Abstract

Script recognition has many real-life applications, such as optical character recognition, document archiving, writer identification, and searching within documents. Automatic script recognition from multilingual documents is a challenging task, because the system must identify and recognize several types of scripts that may appear on a single page. In offline script recognition, printed or handwritten documents are first scanned and then passed to the script recognition process, whereas in online script recognition the documents are already in soft-copy form. Most of the script recognition techniques presented by researchers so far are based on traditional image processing frameworks. More recently, however, Deep Learning-based techniques have been observed to perform script recognition more efficiently and accurately. This paper provides a comprehensive survey of the techniques proposed over the last few decades for identifying and recognizing multilingual scripts, with a main focus on Indic scripts. Some notable non-Indic script identification works are also included for ease of understanding. We hope that this survey can act as a compendium and provide future directions to researchers developing generic OCRs.



References

  1. 1.

    Center for microprocessor application for training education and research (cmater. https://code.google.com/archive/p/cmaterdb/

  2. 2.

    Morphological image processing. https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing-html/topic4.htm

  3. 3.

    Ablavsky, V., Stevens, M.R.: Automatic feature selection with applications to script identification of degraded documents. In: ICDAR, pp. 750–754. Citeseer (2003)

  4. 4.

    Acharya, D.U., Gopakumar, R., Aithal, P.K.: Multi-script line identification system for indian languages. J. Comput. 2(11), 107–111 (2010)

    Google Scholar 

  5. 5.

    Aithal, P.K., Rajesh, G., Acharya, D.U., Subbareddy, N.K.M.: Text line script identification for a tri-lingual document. In: 2010 Second International conference on Computing, Communication and Networking Technologies, pp. 1–3. IEEE (2010)

  6. 6.

    Angadi, S.A., Kodabagi, M.: A fuzzy approach for word level script identification of text in low resolution display board images using wavelet features. In: 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1804–1811. IEEE (2013)

  7. 7.

    Ansari, G.J., Shah, J.H., Yasmin, M., Sharif, M., Fernandes, S.L.: A novel machine learning approach for scene text extraction. Future Gener. Comput. Syst. 87, 328–340 (2018)

    Google Scholar 

  8. 8.

    Bashir, R., Quadri, S.: Identification of kashmiri script in a bilingual document image. In: 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013), pp. 575–579. IEEE (2013)

  9. 9.

    Bashir, R., Quadri, S., Giri, K.J.: Script identification: a review. Int. J. Inf. Technol. pp. 1–15 (2018)

  10. 10.

    Benjelil, M., Kanoun, S., Mullot, R., Alimi, A.M.: Arabic and latin script identification in printed and handwritten types based on steerable pyramid features. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 591–595. IEEE (2009)

  11. 11.

    Benjelil, M., Mullot, R., Alimi, A.M.: Language and script identification based on steerable pyramid features. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 716–721. IEEE (2012)

  12. 12.

    Bhattacharya, U.: Indian scripts character database (isical). https://www.isical.ac.in/~ujjwal/download/database.html

  13. 13.

    Bhunia, A.K., Konwer, A., Bhunia, A.K., Bhowmick, A., Roy, P.P., Pal, U.: Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recogn. 85, 172–184 (2019)

    Google Scholar 

  14. 14.

    Bhunia, A.K., Mukherjee, S., Sain, A., Bhunia, A.K., Roy, P.P., Pal, U.: Indic handwritten script identification using offline-online multi-modal deep network. Inf. Fusion 57, 1–14 (2020)

    Google Scholar 

  15. 15.

    Busch, A., Boles, W.W., Sridharan, S.: Texture for script identification. IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1720–1732 (2005). https://doi.org/10.1109/TPAMI.2005.227

    Article  Google Scholar 

  16. 16.

    Carbune, V., Gonnet, P., Deselaers, T., Rowley, H.A., Daryin, A., Calvo, M., Wang, L.L., Keysers, D., Feuz, S., Gervais, P.: Fast multi-language lstm-based online handwriting recognition. International Journal on Document Analysis and Recognition (IJDAR) pp. 1–14 (2020)

  17. 17.

    Chanda, S., Franke, K., Pal, U.: Identification of indic scripts on torn-documents. In: 2011 International Conference on Document Analysis and Recognition, pp. 713–717. IEEE (2011)

  18. 18.

    Chanda, S., Pal, S., Franke, K., Pal, U.: Two-stage approach for word-wise script identification. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 926–930. IEEE (2009)

  19. 19.

    Chanda, S., Pal, S., Pal, U.: Word-wise sinhala tamil and english script identification using gaussian kernel svm. In: 2008 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)

  20. 20.

    Chanda, S., Pal, U.: English, devanagari and urdu text identification. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 538–545. Citeseer (2005)

  21. 21.

    Chanda, S., Pal, U., Franke, K., Kimura, F.: Script identification–a han and roman script perspective. In: 2010 20th International Conference on Pattern Recognition, pp. 2708–2711. IEEE (2010)

  22. 22.

    Chanda, S., Pal, U., Kimura, F.: Identification of japanese and english script from a single document page. In: 7th IEEE International Conference on Computer and Information Technology (CIT 2007), pp. 656–661. IEEE (2007)

  23. 23.

    Chanda, S., Terrades, O.R., Pal, U.: Svm based scheme for thai and english script identification. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 551–555. IEEE (2007)

  24. 24.

    Chaudhari, S.A., Gulati, R.M.: An ocr for separation and identification of mixed english–gujarati digits using knn classifier. In: 2013 International Conference on Intelligent Systems and Signal Processing (ISSP), pp. 190–193. IEEE (2013)

  25. 25.

    Chaudhuri, B., Pal, U.: A complete printed bangla ocr system. Pattern Recogn. 31(5), 531–549 (1998)

    Google Scholar 

  26. 26.

    Chaudhury, S., Sheth, R.: Trainable script identification strategies for indian languages. In: Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR’99 (Cat. No. PR00318), pp. 657–660. IEEE (1999)

  27. 27.

    Choudhary, A., Ahlawat, S., Rishi, R., Dhaka, V.S.: Performance analysis of feed forward mlp with various activation functions for handwritten numerals recognition. In: 2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE), vol. 5, pp. 852–856. IEEE (2010)

  28. 28.

    Dalal, S., Malik, L.: A survey for feature extraction methods in handwritten script identification. Int. J. Simul. Syst. Sci. Technol. 10, 1–7 (2009)

    Google Scholar 

  29. 29.

    Das, M.S., Rani, D.S., Reddy, C.: Heuristic based script identification from multilingual text documents. In: 2012 1st International Conference on Recent Advances in Information Technology (RAIT), pp. 487–492. IEEE (2012)

  30. 30.

    Das, N., Acharya, K., Sarkar, R., Basu, S., Kundu, M., Nasipuri, M.: A benchmark image database of isolated bangla handwritten compound characters. IJDAR 17(4), 413–431 (2014)

    Google Scholar 

  31. 31.

    Dhaka, V., et al.: Offline language-free writer identification based on speeded-up robust features. Int. J. Eng. 28(7), 984–994 (2015)

    Google Scholar 

  32. 32.

    Dhandra, B., Hangarge, M.: Global and local features based handwritten text words and numerals script identification. In: International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), vol. 2, pp. 471–475. IEEE (2007)

  33. 33.

    Dhandra, B., Mallikarjun, H., Hegadi, R., Malemath, V.: Word-wise script identification based on morphological reconstruction in printed bilingual documents (2006)

  34. 34.

    Dhandra, B., Nagabhushan, P., Hangarge, M., Hegadi, R., Malemath, V.: Script identification based on morphological reconstruction in document images. In: 18th International Conference on Pattern Recognition (ICPR’06), vol. 2, pp. 950–953. IEEE (2006)

  35. 35.

    Dhanya, D., Ramakrishnan, A.: Script identification in printed bilingual documents. In: International Workshop on Document Analysis Systems, pp. 13–24. Springer (2002)

  36. 36.

    Dongre, V.J., Mankar, V.H.: Development of comprehensive devnagari numeral and character database for offline handwritten character recognition. Appl. Comput. Intell. Soft Comput. 2012, (2012)

  37. 37.

    Dutta, K., Krishnan, P., Mathew, M., Jawahar, C.: Offline handwriting recognition on devanagari using a new benchmark dataset. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 25–30. IEEE (2018)

  38. 38.

    Dutta, K., Krishnan, P., Mathew, M., Jawahar, C.: Towards spotting and recognition of handwritten words in indic scripts. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 32–37. IEEE (2018)

  39. 39.

    Ferrer, M.A., Morales, A., Pal, U.: Lbp based line-wise script identification. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 369–373. IEEE (2013)

  40. 40.

    Ghosh, D., Dube, T., Shivaprasad, A.: Script recognition-a review. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2142–2161 (2010)

    Google Scholar 

  41. 41.

    Ghosh, R., Vamshi, C., Kumar, P.: Rnn based online handwritten word recognition in devanagari and bengali scripts using horizontal zoning. Pattern Recogn. 92, 203–218 (2019)

    Google Scholar 

  42. 42.

    Ghosh, S., Chaudhuri, B.B.: Composite script identification and orientation detection for indian text images. In: 2011 International Conference on Document Analysis and Recognition, pp. 294–298. IEEE (2011)

  43. 43.

    Gllavata, J., Freisleben, B.: Script recognition in images with complex backgrounds. In: Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005., pp. 589–594. IEEE (2005)

  44. 44.

    Gonzalez, R.C., Woods, R.E.: Digital image processing (2002)

  45. 45.

    Gopakumar, R., Subbareddy, N., Makkithaya, K., Acharya, D.U.: Script identification from multilingual indian documents using structural features. J. Comput. 2(7), 106–111 (2010)

    Google Scholar 

  46. 46.

    Guru, D., Ravikumar, M., Harish, B.: A review on offline handwritten script identification. Int. J. Comput. Appl. 975, 8878 (2012)

    Google Scholar 

  47. 47.

    Halder, C., Obaidullah, S.M., Roy, K.: Offline writer identification from isolated characters using textural features. In: Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015, pp. 221–231. Springer (2016)

  48. 48.

    Hangarge, M., Dhandra, B.: Offline handwritten script identification in document images. Int. J. Comput. Appl. 4(6), 6–10 (2010)

    Google Scholar 

  49. 49.

    Hangarge, M., Santosh, K., Pardeshi, R.: Directional discrete cosine transform for handwritten script identification. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 344–348. IEEE (2013)

  50. 50.

    Hiremath, P., Pujari, J.D., Shivashankar, S., Mouneswara, V.: Script identification in a handwritten document image using texture features. In: 2010 IEEE 2nd International Advance Computing Conference (IACC), pp. 110–114. IEEE (2010)

  51. 51.

    Hiremath, P.S., Shivashankar, S.: Wavelet based co-occurrence histogram features for texture classification with an application to script identification in a document image. Pattern Recogn. Lett. 29(9), 1182–1189 (2008)

    Google Scholar 

  52. 52.

    Hochberg, J., Bowers, K., Cannon, M., Kelly, P.: Script and language identification for handwritten document images. Int. J. Doc. Anal. Recogn. 2(2–3), 45–52 (1999)

    Google Scholar 

  53. 53.

    Hochberg, J., Kelly, P., Thomas, T., Kerns, L.: Automatic script identification from document images using cluster-based templates. IEEE Trans. Pattern Anal. Mach. Intell. 19(2), 176–181 (1997)

    Google Scholar 

  54. 54.

    Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vision 116(1), 1–20 (2016)

    MathSciNet  Google Scholar 

  55. 55.

    Jaeger, S., Ma, H., Doermann, D.: Identifying script on word-level with informational confidence. In: Eighth International Conference on Document Analysis and Recognition (ICDAR’05), pp. 416–420. IEEE (2005)

  56. 56.

    Jindal, M., Hemrajani, N.: Script identification for printed document images at text-line level using dct and pca. IOSR J. Comput. Eng. 12(5), 97–102 (2013)

    Google Scholar 

  57. 57.

    John, G.H., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Machine Learning Proceedings 1994, pp. 121–129. Elsevier (1994)

  58. 58.

    Joshi, G.D., Garg, S., Sivaswamy, J.: Script identification from indian documents. In: International Workshop on Document Analysis Systems, pp. 255–267. Springer (2006)

  59. 59.

    Juan Cheng, Xijian Ping, Guanwei Zhou, Yang Yang: Script identification of document image analysis. In: First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC’06), vol. 3, pp. 178–181 (2006). https://doi.org/10.1109/ICICIC.2006.518

  60. 60.

    Jundale, T.A., Hegadi, R.S.: Skew detection and correction of devanagari script using hough transform. Proc. Comput. Sci. 45, 305–311 (2015)

    Google Scholar 

  61. 61.

    Jundale, T.A., Hegadi, R.S.: Skew detection of devanagari script using pixels of axes-parallel rectangle and linear regression. In: 2015 International Conference on Energy Systems and Applications, pp. 480–484. IEEE (2015)

  62. 62.

    Jundale, T.A., Hegadi, R.S.: Skew detection and correction of devanagari script using interval halving method. In: International Conference on Recent Trends in Image Processing and Pattern Recognition, pp. 28–38. Springer (2016)

  63. 63.

    Kanoun, S., Ennaji, A., LeCourtier, Y., Alimi, A.M.: Script and nature differentiation for arabic and latin text images. In: Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition, pp. 309–313. IEEE (2002)

  64. 64.

    Keserwani, P., De, K., Roy, P.P., Pal, U.: Zero shot learning based script identification in the wild. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 987–992. IEEE (2019)

  65. 65.

    Khoddami, M., Behrad, A.: Farsi and latin script identification using curvature scale space features. In: 10th Symposium on Neural Network Applications in Electrical Engineering, pp. 213–217. IEEE (2010)

  66. 66.

    Krishnan, P., Jawahar, C.: Hwnet v2: an efficient word image representation for handwritten documents. IJDAR 22(4), 387–405 (2019)

    Google Scholar 

  67. 67.

    Kumar, B., Bera, A., Patnaik, T.: Line based robust script identification for indianlanguages. Int. J. Inf. Electron. Eng. 2(2), 189 (2012)

    Google Scholar 

  68. 68.

    Lee, D.S., Nohl, C.R., Baird, H.S.: Language identification in complex, unoriented, and degraded document images. In: Document Analysis Systems II, pp. 17–39. World Scientific (1998)

  69. 69.

    Li, L., Tan, C.L.: Script identification of camera-based images. In: 2008 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)

  70. 70.

    Lin, X.R., Guo, C.Y., Chang, F.: Classifying textual components of bilingual documents with decision-tree support vector machines. In: 2011 International Conference on Document Analysis and Recognition, pp. 498–502. IEEE (2011)

  71. 71.

    Lu, S., Tan, C.L.: Automatic detection of document script and orientation. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 237–241. IEEE (2007)

  72. 72.

    Luqman, H., Mahmoud, S.A., Awaida, S.: Kafd arabic font database. Pattern Recogn. 47(6), 2231–2240 (2014)

    Google Scholar 

  73. 73.

    Ma, H., Doermann, D.S.: Gabor filter based multi-class classifier for scanned document images. In: ICDAR, vol. 3, p. 968. Citeseer (2003)

  74. 74.

    Mahmoud, S.A., Ahmad, I., Alshayeb, M., Al-Khatib, W.G., Parvez, M.T., Fink, G.A., Märgner, V., El Abed, H.: Khatt: Arabic offline handwritten text database. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 449–454. IEEE (2012)

  75. 75.

    Mane, D., Kulkarni, U.: Visualizing and understanding customized convolutional neural network for recognition of handwritten marathi numerals. Proc. Comput. Sci. 132, 1123–1137 (2018)

    Google Scholar 

  76. 76.

    Manjula, S., Hegadi, R.S.: A review on multilingual document analysis in indian context. In: 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), pp. 519–522. IEEE (2016)

  77. 77.

    Manjula, S., Hegadi, R.S.: Identification and classification of multilingual document using maximized mutual information. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 1679–1682. IEEE (2017)

  78. 78.

    Manjula, S., Hegadi, R.S.: Recognition of oriya and english languages based on lbp features. In: 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–3. IEEE (2017)

  79. 79.

    Marti, U.V., Bunke, H.: The iam-database: an english sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002)

    MATH  Google Scholar 

  80. 80.

    Mohanty, S., Bebartta, H.D.: A novel approach for bilingual (english-oriya) script identification and recognition in a printed document. IJIP 4(2), 175 (2010)

    Google Scholar 

  81. 81.

    Morera, Á., Sánchez, Á., Vélez, J.F., Moreno, A.B.: Gender and handedness prediction from offline handwriting using convolutional neural networks. Complexity 2018, (2018)

  82. 82.

    Mori, S., Suen, C.Y., Yamamoto, K.: Historical review of ocr research and development. Proc. IEEE 80(7), 1029–1058 (1992)

    Google Scholar 

  83. 83.

    Moussa, S.B., Zahour, A., Benabdelhafid, A., Alimi, A.M.: Fractal-based system for arabic/latin, printed/handwritten script identification. In: 2008 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)

  84. 84.

    Namboodiri, A.M., Jain, A.K.: Online script recognition. In: Object recognition supported by user interaction for service robots, vol. 3, pp. 736–739. IEEE (2002)

  85. 85.

    Namboodiri, A.M., Jain, A.K.: Online handwritten script recognition. IEEE Trans. Pattern Anal. Mach. Intell. 26(1), 124–130 (2004). https://doi.org/10.1109/TPAMI.2004.1261096

    Article  Google Scholar 

  86. 86.

    Nethravathi, B., Archana, C., Shashikiran, K., Ramakrishnan, A.G., Kumar, V.: Creation of a huge annotated database for tamil and kannada ohr. In: 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 415–420. IEEE (2010)

  87. 87.

    Obaidullah, S.M., Das, N., Halder, C., Roy, K.: Indic script identification from handwritten document images–an unconstrained block-level approach. In: 2015 IEEE 2nd international conference on recent trends in information systems (ReTIS), pp. 213–218. IEEE (2015)

  88. 88.

    Obaidullah, S.M., Das, S.K., Roy, K.: A system for handwritten script identification from indian document. J. Pattern Recogn. Res. 8(1), 1–12 (2013)

    Google Scholar 

  89. 89.

    Obaidullah, S.M., Goswami, C., Santosh, K., Das, N., Halder, C., Roy, K.: Separating indic scripts with matra for effective handwritten script identification in multi-script documents. Int. J. Pattern Recognit Artif Intell. 31(05), 1753003 (2017)

    Google Scholar 

  90. 90.

    Obaidullah, S.M., Goswami, C., Santosh, K., Halder, C., Das, N., Roy, K.: Separating indic scripts with ‘matra’–a precursor to script identification in multi-script documents. In: Proceedings of International Conference on Computer Vision and Image Processing, pp. 205–214. Springer (2017)

  91. 91.

    Obaidullah, S.M., Halder, C., Das, N., Roy, K.: Numeral script identification from handwritten document images. Proc. Comput. Sci. 54, 585–594 (2015)

    Google Scholar 

  92. 92.

    Obaidullah, S.M., Halder, C., Das, N., Roy, K.: A corpus of word-level offline handwritten numeral images from official indic scripts. In: Proceedings of the Second International Conference on Computer and Communication Technologies, pp. 703–711. Springer (2016)

  93. 93.

    Obaidullah, S.M., Halder, C., Das, N., Roy, K.: A new dataset of word-level offline handwritten numeral images from four official indic scripts and its benchmarking using image transform fusion. Int. J. Intell. Eng. Inf. 4(1), 1–20 (2016)

    Google Scholar 

  94. 94.

    Obaidullah, S.M., Halder, C., Das, N., Roy, K.: Pwdb\_13: A corpus of word-level printed document images from thirteen official indic scripts. In: Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015, pp. 233–242. Springer (2016)

  95. 95.

    Obaidullah, S.M., Halder, C., Das, N., Roy, K.: Visual analytic-based technique for handwritten indic script identification–a greedy heuristic feature fusion framework. In: Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015, pp. 211–219. Springer (2016)

  96. 96.

    Obaidullah, S.M., Halder, C., Santosh, K., Das, N., Roy, K.: Automatic line-level script identification from handwritten document images-a region-wise classification framework for indian subcontinent. Malays. J. Comput. Sci. 31(1), 63–84 (2018)

    Google Scholar 

  97. 97.

    Obaidullah, S.M., Halder, C., Santosh, K., Das, N., Roy, K.: Phdindic\_11: page-level handwritten document image dataset of 11 official indic scripts for script identification. Multimedia Tools Appl. 77(2), 1643–1678 (2018)

    Google Scholar 

  98. 98.

    Obaidullah, S.M., Karim, R., Shaikh, S., Halder, C., Das, N., Roy, K.: Transform based approach for indic script identification from handwritten document images. In: 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), pp. 1–7. IEEE (2015)

  99. 99.

    Obaidullah, S.M., Roy, K., Das, N.: Comparison of different classifiers for script identification from handwritten document. In: 2013 IEEE International Conference on Signal Processing, Computing and Control (ISPCC), pp. 1–6. IEEE (2013)

  100. 100.

    Obaidullah, S.M., Santosh, K., Das, N., Halder, C., Roy, K.: Handwritten indic script identification in multi-script document images: a survey. Int. J. Pattern Recognit Artif Intell. 32(10), 1856012 (2018)

    Google Scholar 

  101. 101.

    Obaidullah, S.M., Santosh, K., Halder, C., Das, N., Roy, K.: Automatic indic script identification from handwritten documents: page, block, line and word-level approach. Int. J. Mach. Learn. Cybernet. 10(1), 87–106 (2019)

    Google Scholar 

  102. 102.

    Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

    MathSciNet  Google Scholar 

  103. 103.

    Padma, M., Vijaya, P.: Identification of telugu devanagari and english scripts using discriminating. J. Comput. Sci. 1, 64–78 (2009)

    Google Scholar 

  104. 104.

    Padma, M., Vijaya, P.: Monothetic separation of telugu, hindi and english text lines from a multi script document. In: 2009 IEEE International Conference on Systems, Man and Cybernetics, pp. 4870–4875. IEEE (2009)

  105. 105.

    Padma, M., Vijaya, P.: Entropy based texture features useful for automatic script identification. Int. J. Comput. Sci. Eng. 2(02), 115–120 (2010)

    Google Scholar 

  106. 106.

    Padma, M., Vijaya, P.: Global approach for script identification using wavelet packet based features. Int. J. Signal Process. Image Process. Pattern Recogn. 3(3), 29–40 (2010)

  107. 107.

    Padma, M., Vijaya, P.: Script identification from trilingual documents using profile based features. IJCSA 7(4), 16–33 (2010)

    Google Scholar 

  108. 108.

    Padma, M., Vijaya, P.: Wavelet packet based texture features for automatic script identification. Int. J. Image Process 4(1), 53–65 (2010)

    Google Scholar 

  109. 109.

    Pal, U., Belaıd, A., Choisy, C.: Touching numeral segmentation using water reservoir concept. Pattern Recogn. Lett. 24(1–3), 261–272 (2003)

    Google Scholar 

  110. 110.

    Pal, U., Chaudhuri, B.: Automatic separation of words in multi-lingual multi-script indian documents. In: Proceedings of the fourth international conference on document analysis and recognition, vol. 2, pp. 576–579. IEEE (1997)

  111. 111.

    Pal, U., Chaudhuri, B.: Automatic identification of english, chinese, arabic, devnagari and bangla script line. In: Proceedings of Sixth International Conference on Document Analysis and Recognition, pp. 790–794. IEEE (2001)

  112. 112.

    Pal, U., Chaudhuri, B.: Identification of different script lines from multi-script documents. Image Vis. Comput. 20(13–14), 945–954 (2002)

    Google Scholar 

  113. 113.

    Pal, U., Chaudhuri, B.: Script line separation from indian multi-script documents. IETE J. Res. 49(1), 3–11 (2003)

    Google Scholar 

  114. 114.

    Pal, U., Chaudhuri, B.: Indian script character recognition: a survey. pattern Recognition 37(9), 1887–1899 (2004)

  115. 115.

    Pal, U., Roy, R.K., Roy, K., Kimura, F.: Indian multi-script full pin-code string recognition for postal automation. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 456–460 (2009). https://doi.org/10.1109/ICDAR.2009.171

  116. 116.

    Pal, U., Sarkar, A.: Recognition of printed urdu script. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., pp. 1183–1187. Citeseer (2003)

  117. 117.

    Pal, U., Sharma, N., Wakabayashi, T., Kimura, F.: Handwritten numeral recognition of six popular indian scripts. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 749–753. IEEE (2007)

  118. 118.

    Pal, U., Sinha, S., Chaudhuri, B.: Multi-script line identification from indian documents. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., pp. 880–884. IEEE (2003)

  119. 119.

    Pan, J., Tang, Y.: A rotation-robust script identification based on bemd and lbp. In: 2011 International Conference on Wavelet Analysis and Pattern Recognition, pp. 165–170. IEEE (2011)

  120. 120.

    Pan, W., Suen, C.Y., Bui, T.D.: Script identification using steerable gabor filters. In: Eighth International Conference on Document Analysis and Recognition (ICDAR’05), pp. 883–887. IEEE (2005)

  121. 121.

    Pati, P.B., Raju, S.S., Pati, N., Ramakrishnan, A.: Gabor filters for document analysis in indian bilingual documents. In: International Conference on Intelligent Sensing and Information Processing, 2004. Proceedings of, pp. 123–126. IEEE (2004)

  122. 122.

    Pati, P.B., Ramakrishnan, A.: Hvs inspired system for script identification in indian multi-script documents. In: International Workshop on Document Analysis Systems, pp. 380–389. Springer (2006)

  123. 123.

    Pati, P.B., Ramakrishnan, A.: Word level multi-script identification. Pattern Recogn. Lett. 29(9), 1218–1229 (2008)

    Google Scholar 

  124. 124.

    Patil, S.B., Subbareddy, N.: Neural network based system for script identification in indian documents. Sadhana 27(1), 83–97 (2002)

    Google Scholar 

  125. 125.

    Peake, G., Tan, T.: Script and language identification from document images. In: Proceedings Workshop on Document Image Analysis (DIA’97), pp. 10–17. IEEE (1997)

  126. 126.

    Peng, L., Liu, C., Ding, X., Wang, H.: Multilingual document recognition research and its application in china. In: Second International Conference on Document Image Analysis for Libraries (DIAL’06), pp. 7–pp. IEEE (2006)

  127. 127.

    Phan, T.Q., Shivakumara, P., Ding, Z., Lu, S., Tan, C.L.: Video script identification based on text lines. In: 2011 International Conference on Document Analysis and Recognition, pp. 1240–1244. IEEE (2011)

  128. 128.

    Philip, B., Samuel, R.S.: A novel bilingual ocr for printed malayalam-english text based on gabor features and dominant singular values. In: 2009 International Conference on Digital Image Processing, pp. 361–365. IEEE (2009)

  129. 129.

    Plamondon, R., Lorette, G.: Automatic signature verification and writer identification-the state of the art. Pattern Recogn. 22(2), 107–131 (1989)

    Google Scholar 

  130. 130.

    Rabby, A.S.A., Haque, S., Islam, S., Abujar, S., Hossain, S.A.: Bornonet: Bangla handwritten characters recognition using convolutional neural network. Proc. Comput. Sci. 143, 528–535 (2018)

    Google Scholar 

  131. 131.

    Raghunandan, K., Shivakumara, P., Roy, S., Kumar, G.H., Pal, U., Lu, T.: Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans. Circuits Syst. Video Technol. 29(4), 1145–1162 (2018)

    Google Scholar 

  132. 132.

    Rai, H., Yadav, A.: Iris recognition using combined support vector machine and hamming distance approach. Expert Syst. Appl. 41(2), 588–593 (2014)

    Google Scholar 

  133. 133.

    Rajput, G., Anita, H.: Handwritten script recognition at line level-a multiple feature based approach. Int. J. Eng. Innov. Technol. 3(4), 90–95 (2013)

    Google Scholar 

  134. 134.

    Ramteke, A.S., Rane, M.E.: A survey on offline recognition of handwritten devanagari script. Int. J. Sci. Eng. Res. 3(5), (2012)

  135. 135.

    Rani, R., Dhir, R., Lehal, G.S.: Performance analysis of feature extractors and classifiers for script recognition of english and gurmukhi words. In: Proceeding of the workshop on Document Analysis and Recognition, pp. 30–36 (2012)

  136. 136.

    Rani, R., Dhir, R., Lehal, G.S.: Script identification of pre-segmented multi-font characters and digits. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1150–1154. IEEE (2013)

  137. 137.

    Rao, G.S., Imanuddin, M., Harikumar, B.: Script identification of telugu, english and hindi document image. Int. J. Adv. Eng. Global Technol 2(2), 443–452 (2014)

    Google Scholar 

  138. 138.

    Razzak, M.I., Hussain, S., Sher, M.: Numeral recognition for urdu script in unconstrained environment. In: 2009 International Conference on Emerging Technologies, pp. 44–47. IEEE (2009)

  139. 139.

    Rezaee, H., Geravanchizadeh, M., Razzazi, F.: Automatic language identification of bilingual english and farsi scripts. In: 2009 International Conference on Application of Information and Communication Technologies, pp. 1–4. IEEE (2009)

  140. 140.

    Roy, K., Alaei, A., Pal, U.: Word-wise handwritten persian and roman script identification. In: 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 628–633. IEEE (2010)

  141. 141.

    Roy, K., Banerjee, A., Pal, U.: A system for word-wise handwritten script identification for indian postal automation. In: Proceedings of the IEEE INDICON 2004. First India Annual Conference, 2004, pp. 266–271 (2004). https://doi.org/10.1109/INDICO.2004.1497753

  142.

    Roy, K., Das, S.K., Obaidullah, S.M.: Script identification from handwritten document. In: 2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, pp. 66–69. IEEE (2011)

  143.

    Roy, K., Majumder, K.: Trilingual script separation of handwritten postal document. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 693–700. IEEE (2008)

  144.

    Roy, K., Pal, U., Chaudhuri, B.: Neural network based word-wise handwritten script identification system for indian postal automation. In: Proceedings of 2005 International Conference on Intelligent Sensing and Information Processing, 2005, pp. 240–245. IEEE (2005)

  145.

    Roy, P.P.: Center for visual information technology (cvit) - international institute of information technology, gachibowli, hyderabad. https://cvit.iiit.ac.in/research/resources

  146.

    Roy, P.P.: Pattern recognition, image processing and machine learning (parimal) iit roorkee. http://parimal.iitr.ac.in/dataset

  147.

    Saïdani, A., Echi, A.K., Belaid, A.: Identification of machine-printed and handwritten words in arabic and latin scripts. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 798–802. IEEE (2013)

  148.

    Saidani, A., Kacem, A., Belaid, A.: Co-occurrence matrix of oriented gradients for word script and nature identification. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 16–20. IEEE (2015)

  149.

    Samanta, O., Roy, A., Parui, S.K., Bhattacharya, U.: An hmm framework based on spherical-linear features for online cursive handwriting recognition. Inf. Sci. 441, 133–151 (2018)

  150.

    Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., Basu, D.K.: Word level script identification from bangla and devanagri handwritten texts mixed with roman script. arXiv preprint arXiv:1002.4007 (2010)

  151.

    Sharma, M.K., Dhaka, V.P.: Offline scripting-free author identification based on speeded-up robust features. International Journal on Document Analysis and Recognition (IJDAR) 18(4), 303–316 (2015)

  152.

    Sharma, M.K., Dhaka, V.P.: Pixel plot and trace based segmentation method for bilingual handwritten scripts using feedforward neural network. Neural Comput. Appl. 27(7), 1817–1829 (2016)

  153.

    Sharma, M.K., Dhaka, V.P.: Segmentation of english offline handwritten cursive scripts using a feedforward neural network. Neural Comput. Appl. 27(5), 1369–1379 (2016)

  154.

    Sharma, N., Chanda, S., Pal, U., Blumenstein, M.: Word-wise script identification from video frames. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 867–871 (2013). https://doi.org/10.1109/ICDAR.2013.177

  155.

    Sharma, N., Mandal, R., Sharma, R., Pal, U., Blumenstein, M.: Bag-of-visual words for word-wise video script identification: A study. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–7. IEEE (2015)

  156.

    Sharma, N., Pal, U., Blumenstein, M.: A study on word-level multi-script identification from video frames. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 1827–1833. IEEE (2014)

  157.

    Sharma, N., Shivakumara, P., Pal, U., Blumenstein, M., Tan, C.L.: A new method for word segmentation from arbitrarily-oriented video text lines. In: 2012 International Conference on Digital Image Computing Techniques and Applications (DICTA), pp. 1–8. IEEE (2012)

  158.

    Shi, B., Bai, X., Yao, C.: Script identification in the wild via discriminative convolutional neural network. Pattern Recogn. 52, 448–458 (2016)

  159.

    Shi, B., Yao, C., Zhang, C., Guo, X., Huang, F., Bai, X.: Automatic script identification in the wild. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 531–535. IEEE (2015)

  160.

    Shivakumara, P., Sharma, N., Pal, U., Blumenstein, M., Tan, C.L.: Gradient-angular-features for word-wise video script identification. In: 2014 22nd International Conference on Pattern Recognition, pp. 3098–3103. IEEE (2014)

  161.

    Shivakumara, P., Yuan, Z., Zhao, D., Lu, T., Tan, C.L.: New gradient-spatial-structural features for video script identification. Comput. Vis. Image Underst. 130, 35–53 (2015)

  162.

    Singh, M.P., Dhaka, V.: Handwritten character recognition using modified gradient descent technique of neural networks and representation of conjugate descent for training patterns. International Journal of Engineering, pp. 145–158 (2009)

  163.

    Singh, P.K., Chatterjee, I., Sarkar, R.: Page-level handwritten script identification using modified log-gabor filter based features. In: 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), pp. 225–230. IEEE (2015)

  164.

    Singh, P.K., Sarkar, R., Nasipuri, M.: Offline script identification from multilingual indic-script documents: a state-of-the-art. Computer Science Review 15, 1–28 (2015)

  165.

    Singhal, V., Navin, N., Ghosh, D.: Script-based classification of hand-written text documents in a multilingual environment. In: Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation, pp. 47–54. IEEE (2003)

  166.

    Sinha, S., Pal, U., Chaudhuri, B.: Word-wise script identification from indian documents. In: International Workshop on Document Analysis Systems, pp. 310–321. Springer (2004)

  167.

    Sudholt, S., Fink, G.A.: Phocnet: A deep convolutional neural network for word spotting in handwritten documents. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282. IEEE (2016)

  168.

    Thadchanamoorthy, S., Kodikara, N., Premaretne, H., Pal, U., Kimura, F.: Tamil handwritten city name database development and recognition for postal automation. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 793–797. IEEE (2013)

  169.

    Tsai, M.J., Tao, Y.H., Yuadi, I.: Deep learning for printed document source identification. Sig. Process. Image Commun. 70, 184–198 (2019)

  170.

    Ubul, K., Tursun, G., Aysa, A., Impedovo, D., Pirlo, G., Yibulayin, T.: Script identification of multi-script documents: a survey. IEEE Access 5, 6546–6559 (2017)

  171.

    Ukil, S., Ghosh, S., Obaidullah, S.M., Santosh, K., Roy, K., Das, N.: Deep learning for word-level handwritten indic script identification. arXiv preprint arXiv:1801.01627 (2018)

  172.

    Wang, X.Y., Wang, Q.Y., Yang, H.Y., Bu, J.: Color image segmentation using automatic pixel classification with support vector machine. Neurocomputing 74(18), 3898–3911 (2011)

  173.

    Xing, L., Qiao, Y.: Deepwriter: A multi-stream deep cnn for text-independent writer identification. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 584–589. IEEE (2016)

  174.

    Zheng, Y., Iwana, B.K., Uchida, S.: Mining the displacement of max-pooling for text recognition. Pattern Recogn. 93, 558–569 (2019)

Author information

Corresponding author

Correspondence to Deepak Sinwar.

About this article

Cite this article

Sinwar, D., Dhaka, V.S., Pradhan, N. et al. Offline script recognition from handwritten and printed multilingual documents: a survey. IJDAR 24, 97–121 (2021). https://doi.org/10.1007/s10032-021-00365-5

Keywords

  • Indic script identification
  • Script recognition
  • Support vector machine
  • Artificial neural network
  • Multi-layer perceptron
  • Nearest neighbor
  • Multilingual
  • Handwritten
  • k-NN