Automatic Indic script identification from handwritten documents: page, block, line and word-level approach

  • Sk Md Obaidullah
  • K. C. Santosh
  • Chayan Halder
  • Nibaran Das
  • Kaushik Roy
Original Article

Abstract

Script identification has been a well-studied problem in the literature over the last decade, and several methods for automatic script identification have been reported. Each of these methods processes a document at one of four levels (page, block, line or word), but no experimental or empirical justification has been provided for choosing a particular level. To address this, we have carried out a multi-level script identification experiment, i.e., the same document is considered at four different levels, namely page, block, line and word, for script identification. Two different types of features, script dependent and script independent, are computed at each level to categorize the scripts. The experiment is conducted on a newly created handwritten multi-script and multi-level dataset in which, on average, 5 blocks, 7.5 lines and 15 words are generated from a single page (440 pages, 2200 blocks, 3300 lines and 6600 words in total). Finally, we address two major issues: (1) finding an optimal level of work, i.e., page, block, line or word level, and (2) providing a qualitative measure of the feature set at the particular level of work considered.
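
As a minimal sketch of the experimental design described above (not the authors' code), the snippet below checks the dataset totals implied by the per-page averages and shows how a single classifier could be scored at every level so that an "optimal" level can be picked. The helper names (`total_units`, `accuracy_per_level`, `optimal_level`) and the `classify` callable are hypothetical. Choosing the level with the highest accuracy is also an assumption: the abstract does not state the selection criterion, the exact features, or the classifiers used.

```python
from typing import Callable, Dict, List, Tuple

# Average number of units generated from one page, as stated in the abstract.
AVG_UNITS_PER_PAGE: Dict[str, float] = {"page": 1, "block": 5, "line": 7.5, "word": 15}

# One sample = (feature vector, script label); the feature layout is hypothetical.
Sample = Tuple[List[float], str]


def total_units(num_pages: int, avg_per_page: Dict[str, float]) -> Dict[str, int]:
    """Expected dataset size at each level: pages x average units per page."""
    return {level: round(num_pages * avg) for level, avg in avg_per_page.items()}


def accuracy_per_level(
    data: Dict[str, List[Sample]],
    classify: Callable[[List[float]], str],
) -> Dict[str, float]:
    """Fraction of correctly labelled samples at each level for one classifier."""
    scores: Dict[str, float] = {}
    for level, samples in data.items():
        correct = sum(classify(x) == y for x, y in samples)
        scores[level] = correct / len(samples) if samples else 0.0
    return scores


def optimal_level(scores: Dict[str, float]) -> str:
    """Assumed criterion: highest accuracy wins; ties go to the first level listed."""
    return max(scores, key=scores.get)
```

For example, `total_units(440, AVG_UNITS_PER_PAGE)` returns `{'page': 440, 'block': 2200, 'line': 3300, 'word': 6600}`, matching the totals reported in the abstract. In a real experiment `classify` would wrap a trained model, and each feature vector would hold the script dependent or script independent features computed for one page, block, line or word.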

Keywords

Handwritten script identification · Multi-level framework · Script dependent feature · Script independent feature

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Aliah University, Kolkata, India
  2. Department of Computer Science, University of South Dakota, Vermillion, USA
  3. Department of Computer Science, West Bengal State University, Kolkata, India
  4. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
