Spontaneous Handwriting Text Recognition and Classification Using Finite-State Models
Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with a large vocabulary. Short handwritten phrases, which involve a wide variety of writing styles and contain many non-textual artifacts, are to be classified into a small number of predefined classes. To this end, two different statistical frameworks for phrase recognition and classification are considered, both based on finite-state models. Hidden Markov models (HMMs) are used for the text recognition process. Depending on the architecture considered, N-grams are used either for performing text recognition and then text classification (serial approach) or for performing both simultaneously (integrated approach). The multinomial text classifier is also employed in the classification phase of the serial approach. Experimental results are reported which, given the extreme difficulty of the task, are encouraging.
Keywords: Blank Space, Word Error Rate, Handwriting Recognition, Recognition Phase, Serial Approach
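To make the classification phase of the serial approach more concrete, the following is a minimal sketch of a multinomial text classifier applied to an already recognized word sequence. It is not the authors' implementation: the add-alpha smoothing, the function names `train_multinomial` and `classify`, and the toy phrases and class labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_multinomial(samples, alpha=1.0):
    """Estimate a multinomial classifier from (word_list, label) pairs.

    Add-alpha smoothing is an assumption of this sketch; the abstract
    does not say which smoothing scheme the system uses.
    """
    vocab = {w for words, _ in samples for w in words}
    class_counts = Counter(label for _, label in samples)
    word_counts = {c: Counter() for c in class_counts}
    for words, label in samples:
        word_counts[label].update(words)

    model = {"log_prior": {}, "log_like": {}, "log_unk": {}}
    n_samples = len(samples)
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        # The extra +1 reserves probability mass for out-of-vocabulary words.
        denom = total + alpha * (len(vocab) + 1)
        model["log_prior"][c] = math.log(n_c / n_samples)
        model["log_like"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
        model["log_unk"][c] = math.log(alpha / denom)
    return model


def classify(model, words):
    """Return the class with the highest log posterior for a word sequence."""
    def score(c):
        log_like = model["log_like"][c]
        unk = model["log_unk"][c]
        return model["log_prior"][c] + sum(log_like.get(w, unk) for w in words)

    return max(model["log_prior"], key=score)


# Hypothetical toy data standing in for recognized phrases and their classes.
samples = [
    (["change", "address"], "ADDRESS"),
    (["new", "address", "street"], "ADDRESS"),
    (["cancel", "contract"], "CANCEL"),
    (["cancel", "my", "line"], "CANCEL"),
]
model = train_multinomial(samples)
predicted = classify(model, ["cancel", "line", "please"])
```

In a serial architecture of the kind described above, the input to `classify` would be the recognizer's best word hypothesis, so recognition errors propagate directly into the word counts the classifier sees.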