Multi-font Script Identification Using Texture-Based Features

  • Conference paper
Image Analysis and Recognition (ICIAR 2006)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4142)

Abstract

The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
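As an illustration of what a texture feature for a text image can look like, the sketch below computes three statistics from a gray-level co-occurrence matrix (GLCM). The choice of GLCM, the quantisation to 8 levels, the horizontal one-pixel offset, and the function names `glcm` and `texture_features` are all assumptions made for this example; the abstract does not say which features the paper evaluates or how they are parameterised.

```python
import numpy as np

def glcm(image, levels=8, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for a non-negative offset (dy, dx)."""
    img = np.asarray(image, dtype=np.float64)
    # Quantise intensities assumed to lie in [0, 255] into `levels` bins.
    q = np.clip((img / 256.0 * levels).astype(np.int64), 0, levels - 1)
    h, w = q.shape
    # Pair each pixel with its neighbour at the given offset.
    a = q[:h - dy, :w - dx]
    b = q[dy:, dx:]
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (pixel, neighbour) pair.
    np.add.at(counts, (a.ravel(), b.ravel()), 1.0)
    return counts / counts.sum()

def texture_features(image, levels=8):
    """Return [contrast, energy, homogeneity] of the horizontal GLCM."""
    p = glcm(image, levels=levels)
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    energy = float(np.sum(p ** 2))
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    return np.array([contrast, energy, homogeneity])

if __name__ == "__main__":
    # Synthetic stand-in for a block of text: alternating dark and light columns.
    stripes = np.tile(np.array([0, 255, 0, 255]), (4, 1))
    print(texture_features(stripes))
```

A feature vector of this kind, computed per text block, could then be passed to any standard classifier to predict the script. That pipeline is a generic one and is not taken from the paper.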

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Busch, A. (2006). Multi-font Script Identification Using Texture-Based Features. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2006. Lecture Notes in Computer Science, vol 4142. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11867661_76

  • DOI: https://doi.org/10.1007/11867661_76

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44894-5

  • Online ISBN: 978-3-540-44896-9

  • eBook Packages: Computer Science, Computer Science (R0)
