Language-Based Classification of Document Images Using Hybrid Texture Features

Advances in Biometrics

Abstract

Document image understanding has held the attention of the research community for two and a half decades. Multilingual document databases require automatic classification for browsing and sorting. This chapter introduces language-based classification of documents and reviews the methods used for language detection. It also proposes a segmentation-free technique for classifying document images by the language used. A hybrid texture feature-extraction scheme using the stationary wavelet transform (SWT) and the histogram of oriented gradients (HOG) is presented, and a multi-class support vector machine (SVM) is employed for classification. The method is evaluated on a database of 1006 document images in Kannada, Telugu, Marathi, Hindi, and English. It yields better results than the existing techniques it is compared with, achieving an average detection rate of 87.02%.
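As a rough illustration of what a hybrid SWT and HOG texture descriptor can look like, the sketch below computes a one-level undecimated Haar decomposition and then a simplified, single-cell histogram of gradient orientations for each subband. It is not the chapter's implementation: the Haar filter, the single decomposition level, the 9 orientation bins, the whole-image (cell-free) histogram, and the function names `swt_haar_level1`, `orientation_histogram`, and `hybrid_features` are all assumptions made for this example. The resulting vectors would then be passed to a multi-class SVM, which is not shown here.

```python
import numpy as np

def swt_haar_level1(img):
    """One-level undecimated (stationary) Haar transform using circular shifts.

    Assumption: plain averaging/differencing (factor 1/2), not the
    orthonormal 1/sqrt(2) scaling. Returns the approximation subband and
    three detail subbands, each the same size as the input.
    """
    img = np.asarray(img, dtype=float)

    def split(x, axis):
        # Low-pass (average) and high-pass (difference) without downsampling.
        shifted = np.roll(x, -1, axis=axis)
        return (x + shifted) / 2.0, (x - shifted) / 2.0

    lo, hi = split(img, axis=1)        # filter along rows
    approx, det_a = split(lo, axis=0)  # filter along columns
    det_b, det_c = split(hi, axis=0)
    return approx, det_a, det_b, det_c

def orientation_histogram(band, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplification: one histogram over the whole subband, with no cells or
    overlapping blocks as in the full HOG descriptor.
    """
    gy, gx = np.gradient(band)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def hybrid_features(img, bins=9):
    """Concatenate the orientation histograms of all four subbands."""
    return np.concatenate(
        [orientation_histogram(band, bins) for band in swt_haar_level1(img)]
    )
```

For a 2-D grayscale array `img`, `hybrid_features(img)` returns a vector of `4 * bins` values. Because the transform is undecimated, every subband keeps the full image resolution, which is the property that distinguishes the SWT from the decimated discrete wavelet transform.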

© 2019 Springer Nature Switzerland AG

Cite this chapter

Dixit, U.D., Shirdhonkar, M.S. (2019). Language-Based Classification of Document Images Using Hybrid Texture Features. In: Sinha, G. (eds) Advances in Biometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-30436-2_9
