A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents

  • Conference paper
Advances in Artificial Intelligence -- IBERAMIA 2014 (IBERAMIA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Abstract

Language Models (LMs) capture the contextual dependencies of a language and assign higher probabilities to well-formed sequences of words. For that reason, LMs are commonly used in generic handwriting recognition, where they improve recognition results. In this paper, we present the integration of a Language Model, together with a dictionary, into a graph-based recognizer that aims at transcribing handwritten historical documents. The results of this integration show a significant improvement in word accuracy when applied to our corpora.
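
To illustrate what assigning higher probabilities to well-formed word sequences means, here is a minimal sketch of a word bigram model with add-one smoothing. It is a generic illustration, not the model used in the paper, whose details are not given on this page. The toy corpus and the names `train_bigram_lm` and `log_prob` are assumptions made for the example.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count contexts and bigrams over whitespace-tokenised sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    # Tokens that can be predicted: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(sentence, model):
    """Log-probability of a sentence under add-one (Laplace) smoothing."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

# Hypothetical toy corpus, chosen only for illustration.
corpus = ["the old letter", "the old book", "an old letter"]
model = train_bigram_lm(corpus)

well_formed = log_prob("the old letter", model)
scrambled = log_prob("letter old the", model)
print(well_formed > scrambled)
```

In a recognizer, scores of this kind would typically be combined with the recognizer's own scores to rank candidate transcriptions. How the paper performs that combination is not described here.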

Author information

Corresponding author

Correspondence to Graciela Lecireth Meza-Lovón.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Meza-Lovón, G.L. (2014). A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_19

  • DOI: https://doi.org/10.1007/978-3-319-12027-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12026-3

  • Online ISBN: 978-3-319-12027-0

  • eBook Packages: Computer Science, Computer Science (R0)
