Multi-font Script Identification Using Texture-Based Features
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
Keywords: Gaussian Mixture Model, Document Image, Optical Character Recognition, Linear Discriminate Function, Training Observation