Abstract
Handwritten text recognition is an area of pattern recognition that has recently received considerable attention. Traditionally, handwritten text recognition systems have been modelled with hidden Markov models (HMMs) and n-gram language models. A limitation of n-grams is that they cannot capture long-term constraints within sentences. Stochastic context-free grammars (SCFGs) can overcome this limitation by rescoring an n-best list generated by the HMM-based recognizer. However, SCFGs are known to be difficult to estimate for complex real tasks. In this work we propose combining n-grams with a category-based SCFG together with a distribution of words into categories. The category-based approach is intended to simplify the SCFG inference process while preserving the descriptive power of the model. Results on the IAM-Database show that this combined scheme outperforms the classical scheme.
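The n-best rescoring idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sentence-level linear interpolation with weight `alpha`, the `lm_weight` scaling factor, and the toy probabilities are all assumptions made for illustration, since the abstract does not specify how the n-gram and SCFG scores are combined.

```python
import math


def interpolated_logprob(ngram_prob, scfg_prob, alpha=0.5):
    """Build a scorer that linearly interpolates two sentence probabilities.

    ngram_prob and scfg_prob are hypothetical callables mapping a tuple of
    words to a probability; alpha is an assumed interpolation weight.
    """
    def score(words):
        p = alpha * ngram_prob(words) + (1.0 - alpha) * scfg_prob(words)
        return math.log(p) if p > 0.0 else float("-inf")
    return score


def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Reorder an n-best list by recognizer score plus weighted LM score.

    nbest is a list of (words, hmm_logprob) pairs; the result is the list
    of word tuples sorted from best to worst combined score.
    """
    scored = [
        (hmm_logprob + lm_weight * lm_logprob(words), words)
        for words, hmm_logprob in nbest
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]


# Toy data (invented): the recognizer slightly prefers the second hypothesis.
nbest = [
    (("the", "cat", "sat"), -10.0),
    (("the", "cat", "sit"), -9.5),
]
toy_ngram = {("the", "cat", "sat"): 0.02, ("the", "cat", "sit"): 0.01}
toy_scfg = {("the", "cat", "sat"): 0.03, ("the", "cat", "sit"): 0.001}

lm = interpolated_logprob(
    lambda w: toy_ngram.get(w, 0.0),
    lambda w: toy_scfg.get(w, 0.0),
    alpha=0.5,
)
best = rescore_nbest(nbest, lm)[0]
```

In this toy case the grammar-aware score moves the first hypothesis ahead of the one the recognizer alone preferred, which is the kind of correction that rescoring is meant to provide.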
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Romero, V., Alabau, V., Benedí, J.M. (2007). Combination of N-Grams and Stochastic Context-Free Grammars in an Offline Handwritten Recognition System. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72847-4_60
DOI: https://doi.org/10.1007/978-3-540-72847-4_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72846-7
Online ISBN: 978-3-540-72847-4
eBook Packages: Computer Science (R0)