A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents

Meza-Lovón, Graciela Lecireth

doi:10.1007/978-3-319-12027-0_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

Language Models (LMs) capture the contextual dependencies of a language and assign higher probabilities to well-formed sequences of words. For that reason, LMs are commonly used in generic handwriting recognition, where they improve recognition results. In this paper, we present the integration of a Language Model, together with a dictionary, into a graph-based recognizer for transcribing handwritten historical documents. Applied to our corpora, this integration yields a significant improvement in word accuracy.
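To make the first claim of the abstract concrete, here is a minimal sketch of a word-level bigram language model with add-one smoothing. It is not the model used in the paper, whose details are not given on this page; the toy corpus, the function names `train_bigram` and `log_prob`, and the choice of smoothing are all assumptions made for illustration. It only shows that such a model scores a well-formed word sequence above a scrambled one.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Sentence boundary markers <s> and </s> are added (an illustrative choice).
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) with add-one (Laplace) smoothing
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical toy corpus, not taken from the paper's data
corpus = [
    "the king signed the letter",
    "the letter was signed",
    "the king was here",
]
unigrams, bigrams = train_bigram(corpus)

well_formed = log_prob("the king signed the letter", unigrams, bigrams)
scrambled = log_prob("letter the signed king the", unigrams, bigrams)
print(well_formed > scrambled)
```

In a recognizer, scores of this kind are typically combined with the recognizer's own scores for competing word hypotheses, so that candidates forming a plausible sequence are preferred.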

Author information

Authors and Affiliations

Universidad La Salle, Arequipa, Perú
Graciela Lecireth Meza-Lovón

Corresponding author

Correspondence to Graciela Lecireth Meza-Lovón.

Editor information

Editors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana L.C. Bazzan
Pontificia Universidad Católica (PUC), Santiago de Chile, Chile
Karim Pichara

About this paper

Cite this paper

Meza-Lovón, G.L. (2014). A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_19

DOI: https://doi.org/10.1007/978-3-319-12027-0_19
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science; Computer Science (R0)
