Advertisement

Neural Computing and Applications

, Volume 28, Issue 9, pp 2477–2487 | Cite as

Word graphs size impact on the performance of handwriting document applications

  • Alejandro H. Toselli
  • Verónica Romero
  • Enrique Vidal
IBPRIA 2015
  • 200 Downloads

Abstract

Two document processing applications are considered: computer-assisted transcription of text images (CATTI) and Keyword Spotting (KWS), for transcribing and indexing handwritten documents, respectively. Instead of working directly on the handwriting images, both of them employ meta-data structures called word graphs (WG), which are obtained using segmentation-free handwritten text recognition technology based on N-gram language models and hidden Markov models. A WG contains most of the relevant information of the original text (line) image required by CATTI and KWS but, if it is too large, the computational cost of generating and using it can become unafordable. Conversely, if it is too small, relevant information may be lost, leading to a reduction of CATTI or KWS performance. We study the trade-off between WG size and performance in terms of effectiveness and efficiency of CATTI and KWS. Results show that small, computationally cheap WGs can be used without loosing the excellent CATTI and KWS performance achieved with huge WGs.

Keywords

Computer-assisted transcription of text images Keyword spotting for handwritten text Historical handwritten manuscripts Word graphs Evaluation performance 

Notes

Acknowledgments

Work partially supported by the Generalitat Valenciana under the Prometeo/2009/014 Project Grant ALMAMATER, by the Spanish MECD as part of the Valorization and I+D+I Resources program of VLC/CAMPUS in the International Excellence Campus program, and through the EU projects: HIMANIS (JPICH programme, Spanish Grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, Grant Ref. 674943).

References

  1. 1.
    Amengual JC, Vidal E (1998) Efficient error-correcting Viterbi parsing. IEEE Trans Pattern Anal Mach Intell 20(10):1109–1116CrossRefGoogle Scholar
  2. 2.
    Bazzi I, Schwartz R, Makhoul J (1999) An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans Pattern Anal Mach Intell 21(6):495–504CrossRefGoogle Scholar
  3. 3.
    Erman L, Lesser V (1990) The HEARSAY-II speech understanding system: a tutorial. Readings in Speech Reasoning, pp 235–245Google Scholar
  4. 4.
    Evermann G (1999) Minimum word error rate decoding. Ph.D. thesis, Churchill College, University of CambridgeGoogle Scholar
  5. 5.
    Fischer A, Wuthrich M, Liwicki M, Frinken V, Bunke H, Viehhauser G, Stolz M (2009) Automatic transcription of handwritten medieval documents. In: 15th international conference on virtual systems and multimedia, 2009. VSMM ’09, pp 137–142Google Scholar
  6. 6.
    Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34(2):211–224CrossRefGoogle Scholar
  7. 7.
    Furcy D, Koenig S (2005) Limited discrepancy beam search. In: Proceedings of the 19th international joint conference on artificial intelligence, IJCAI’05, pp 125–131Google Scholar
  8. 8.
    Granell E, Martínez-Hinarejos CD (2015) Multimodal output combination for transcribing historical handwritten documents. In: 16th international conference on computer analysis of images and patterns, CAIP 2015, chap, pp 246–260. Springer International PublishingGoogle Scholar
  9. 9.
    Hakkani-Tr D, Bchet F, Riccardi G, Tur G (2006) Beyond ASR 1-best: using word confusion networks in spoken language understanding. Comput Speech Lang 20(4):495–514CrossRefGoogle Scholar
  10. 10.
    Jelinek F (1998) Statistical methods for speech recognition. MIT Press, CambridgeGoogle Scholar
  11. 11.
    Jurafsky D, Martin JH (2009) Speech and language processing: an introduction to natural language processing, speech recognition, and computational linguistics, 2nd edn. Prentice-Hall, Englewood CliffsGoogle Scholar
  12. 12.
    Kneser R, Ney H (1995) Improved backing-off for N-gram language modeling. In: International conference on acoustics, speech and signal processing (ICASSP ’95), vol 1, pp 181–184. IEEE Computer SocietyGoogle Scholar
  13. 13.
    Liu P, Soong FK (2006) Word graph based speech recognition error correction by handwriting input. In: Proceedings of the 8th international conference on multimodal interfaces, ICMI ’06, pp 339–346. ACMGoogle Scholar
  14. 14.
    Lowerre BT (1976) The harpy speech recognition system. Ph.D. thesis, Pittsburgh, PAGoogle Scholar
  15. 15.
    Luján-Mares M, Tamarit V, Alabau V, Martínez-Hinarejos CD, Pastor M, Sanchis A, Toselli A (2008) iATROS: a speech and handwritting recognition system. In: V Jornadas en Tecnologías del Habla (VJTH’2008), pp 75–78Google Scholar
  16. 16.
    Mangu L, Brill E, Stolcke A (2000) Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Comput Speech Lang 14(4):373–400CrossRefGoogle Scholar
  17. 17.
    Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval. Cambridge University Press, New YorkCrossRefMATHGoogle Scholar
  18. 18.
    Mohri M, Pereira F, Riley M (2002) Weighted finite-state transducers in speech recognition. Comput Speech Lang 16(1):69–88CrossRefGoogle Scholar
  19. 19.
    Odell JJ, Valtchev V, Woodland PC, Young SJ (1994) A one pass decoder design for large vocabulary recognition. In: Proceedings of the workshop on human language technology, HLT ’94, pp 405–410. Association for Computational LinguisticsGoogle Scholar
  20. 20.
    Oerder M, Ney H (1993) Word graphs: an efficient interface between continuous-speech recognition and language understanding. IEEE Int Conf Acoust Speech Signal Process 2:119–122Google Scholar
  21. 21.
    Olivie J, Christianson C, McCarry J (eds) (2011) Handbook of natural language processing and machine translation. Springer, BerlinGoogle Scholar
  22. 22.
    Ortmanns S, Ney H, Aubert X (1997) A word graph algorithm for large vocabulary continuous speech recognition. Comput Speech Lang 11(1):43–72CrossRefGoogle Scholar
  23. 23.
    Padmanabhan M, Saon G, Zweig G (2000) Lattice-based unsupervised MLLR for speaker adaptation. In: ASR2000-automatic speech recognition: challenges for the New Millenium ISCA Tutorial and Research Workshop (ITRW)Google Scholar
  24. 24.
    Pesch H, Hamdani M, Forster J, Ney H (2012) Analysis of preprocessing techniques for latin handwriting recognition. In: International conference on frontiers in handwriting recognition, ICFHR’12, pp 280–284Google Scholar
  25. 25.
    Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y, Schwarz P, Silovsky J, Stemmer G, Vesely K (2011) The Kaldi speech recognition toolkit. In: IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing SocietyGoogle Scholar
  26. 26.
    Povey D, Hannemann M, Boulianne G, Burget L, Ghoshal A, Janda M, Karafiat M, Kombrink S, Motlcek P, Qian Y, Riedhammer K, Vesely K, Vu NT (2012) Generating Exact Lattices in the WFST Framework. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP)Google Scholar
  27. 27.
    Rabiner L (1989) A tutorial of hidden Markov models and selected application in speech recognition. Proc IEEE 77:257–286CrossRefGoogle Scholar
  28. 28.
    Robertson S (2008) A new interpretation of average precision. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval (SIGIR ’08), pp 689–690. ACMGoogle Scholar
  29. 29.
    Romero V, Toselli AH, Rodríguez L, Vidal E (2007) Computer assisted transcription for ancient text images. Proc Int Conf Image Anal Recogn LNCS 4633:1182–1193CrossRefGoogle Scholar
  30. 30.
    Romero V, Toselli AH, Vidal E (2012) Multimodal interactive handwritten text transcription. Series in machine perception and artificial intelligence (MPAI). World Scientific Publishing, SingaporeCrossRefGoogle Scholar
  31. 31.
    Rybach D, Gollan C, Heigold G, Hoffmeister B, Lööf J, Schlüter R, Ney H (2009) The RWTH aachen university open source speech recognition system. In: Interspeech, pp 2111–2114Google Scholar
  32. 32.
    Sánchez J, Mühlberger G, Gatos B, Schofield P, Depuydt K, Davis R, Vidal E, de Does J (2013) tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp 227–228Google Scholar
  33. 33.
    Saon G, Povey D, Zweig G (2005) Anatomy of an extremely fast LVCSR decoder. In: INTERSPEECH, pp 549–552Google Scholar
  34. 34.
    Strom N (1995) Generation and minimization of word graphs in continuous speech recognition. In: Proceedings of IEEE workshop on ASR’95, pp 125–126. Snowbird, UtahGoogle Scholar
  35. 35.
    Tanha J, de Does J, Depuydt K (2015) Combining higher-order N-grams and intelligent sample selection to improve language modeling for Handwritten Text Recognition. In: ESANN 2015 proceedings, European symposium on artificial neural networks, computational intelligence and machine learning, pp 361–366Google Scholar
  36. 36.
    Toselli A, Romero V, i Gadea MP, Vidal E (2010) Multimodal interactive transcription of text images. Pattern Recogn 43(5):1814–1825CrossRefMATHGoogle Scholar
  37. 37.
    Toselli A, Romero V, Vidal E (2015) Word-graph based applications for handwriting documents: impact of word-graph size on their performances. In: Paredes R, Cardoso JS, Pardo XM (eds) Pattern recognition and image analysis. Lecture Notes in Computer Science, vol 9117, pp 253–261. Springer International PublishingGoogle Scholar
  38. 38.
    Toselli AH, Juan A, Keysers D, Gonzlez J, Salvador I, Ney H, Vidal E, Casacuberta F (2004) Integrated handwriting recognition and interpretation using finite-state models. Int J Pattern Recogn Artif Intell 18(4):519–539CrossRefGoogle Scholar
  39. 39.
    Toselli AH, Vidal E (2013) Fast HMM-Filler approach for key word spotting in handwritten documents. In: Proceedings of the 12th international conference on document analysis and recognition (ICDAR’13). IEEE Computer SocietyGoogle Scholar
  40. 40.
    Toselli AH, Vidal E, Romero V, Frinken V (2013) Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de ValènciaGoogle Scholar
  41. 41.
    Ueffing N, Ney H (2007) Word-level confidence estimation for machine translation. Comput Linguist 33(1):9–40. doi: 10.1162/coli.2007.33.1.9 CrossRefMATHGoogle Scholar
  42. 42.
    Vinciarelli A, Bengio S, Bunke H (2004) Off-line recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans Pattern Anal Mach Intell 26(6):709–720CrossRefGoogle Scholar
  43. 43.
    Weng F, Stolcke A, Sankar A (1998) Efficient lattice representation and generation. In: Proceedings of ICSLP, pp 2531–2534Google Scholar
  44. 44.
    Wessel F, Schluter R, Macherey K, Ney H (2001) Confidence measures for large vocabulary continuous speech recognition. IEEE Trans Speech Audio Process 9(3):288–298CrossRefGoogle Scholar
  45. 45.
    Wolf J, Woods W (1977) The HWIM speech understanding system. In: IEEE international conference on acoustics, speech, and signal processing, ICASSP ’77, vol 2, pp 784–787Google Scholar
  46. 46.
    Woodland P, Leggetter C, Odell J, Valtchev V, Young S (1995) The 1994 HTK large vocabulary speech recognition system. In: International conference on acoustics, speech, and signal processing (ICASSP ’95), vol 1, pp 73 –76Google Scholar
  47. 47.
    Young S, Odell J, Ollason D, Valtchev V, Woodland P (1997) The HTK book: hidden Markov models toolkit V2.1. Cambridge Research Laboratory Ltd, CambridgeGoogle Scholar
  48. 48.
    Young S, Russell N, Thornton J (1989) Token passing: a simple conceptual model for connected speech recognition systems. Technical reportGoogle Scholar
  49. 49.
    Zhu M (2004) Recall, precision and average precision. Working Paper 2004–09 Department of Statistics and Actuarial Science, University of WaterlooGoogle Scholar
  50. 50.
    Zimmermann M, Bunke H (2004) Optimizing the integration of a statistical language model in hmm based offline handwritten text recognition. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 2, pp 541–544Google Scholar

Copyright information

© The Natural Computing Applications Forum 2016

Authors and Affiliations

  • Alejandro H. Toselli
    • 1
  • Verónica Romero
    • 1
  • Enrique Vidal
    • 1
  1. 1.PRHLT Research CentreUniversitat Politècnica de ValènciaValenciaSpain

Personalised recommendations