Aligning Transcripts to Automatically Segmented Handwritten Manuscripts

  • Jamie Rothfeder
  • R. Manmatha
  • Toni M. Rath
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


Training and evaluating techniques for handwriting recognition and retrieval is challenging because large ground-truthed datasets are difficult to create. This is especially true for historical handwritten datasets. In many instances the ground truth has to be created by manually transcribing each word, which is a very labor-intensive process. Sometimes transcriptions are available for some manuscripts, but they were created for other purposes, so correspondence at the word, line, or sentence level may not be available. To be useful for training and evaluation, a word-level correspondence must be available between the segmented handwritten word images and the ASCII transcriptions. Creating this correspondence, or alignment, is challenging because the segmentation is often error-prone and the ASCII transcription may also contain errors. Very little work has been done on the alignment of handwritten data to transcripts. Here, a novel Hidden Markov Model based automatic alignment algorithm is described and tested. The algorithm produces an average alignment accuracy of about 72.8% when aligning whole pages at a time on a set of 70 pages of the George Washington collection. This outperforms, by about 12%, a dynamic time warping alignment algorithm previously reported in the literature and tested on the same collection.
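
The abstract does not specify the model's states, features, or probabilities, so the following is only a minimal illustrative sketch of how an HMM-style alignment between word images and transcript words could be set up, not the authors' algorithm. It assumes, hypothetically, that each hidden state is a transcript word position, that the only observation feature is the pixel width of a segmented word image, that the expected width is proportional to the word's character count, and that the transition probabilities allow staying on a word (a spurious or split segment), advancing by one (the normal case), or skipping one word (a missed or merged segment). The function name `viterbi_align` and all parameter values are assumptions made for illustration.

```python
import math

def viterbi_align(image_widths, words, char_width=10.0, sigma=15.0, log_trans=None):
    """Assign each segmented word image to a transcript word index (Viterbi decoding).

    Illustrative assumptions, not taken from the paper:
      - hidden state s = position of a word in the transcript
      - observation = pixel width of a word image
      - emission = Gaussian around char_width * len(word), up to a constant
      - transitions = stay (0), advance (1), or skip one word (2)
    """
    if log_trans is None:
        # Hypothetical transition probabilities keyed by jump size.
        log_trans = {0: math.log(0.1), 1: math.log(0.8), 2: math.log(0.1)}
    n_obs, n_states = len(image_widths), len(words)
    if n_obs == 0 or n_states == 0:
        return []

    def log_emit(t, s):
        # Gaussian log-likelihood without the normalising constant,
        # which is identical for every state because sigma is fixed.
        expected = char_width * len(words[s])
        z = (image_widths[t] - expected) / sigma
        return -0.5 * z * z

    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_obs)]
    back = [[-1] * n_states for _ in range(n_obs)]

    # The first image may correspond to the first or the second transcript word.
    for s in range(min(2, n_states)):
        score[0][s] = log_emit(0, s)

    for t in range(1, n_obs):
        for s in range(n_states):
            for jump, log_p in log_trans.items():
                prev = s - jump
                if prev < 0 or score[t - 1][prev] == neg_inf:
                    continue
                candidate = score[t - 1][prev] + log_p
                if candidate > score[t][s]:
                    score[t][s] = candidate
                    back[t][s] = prev
            if score[t][s] > neg_inf:
                score[t][s] += log_emit(t, s)

    # Backtrace from the best-scoring final state.
    s = max(range(n_states), key=lambda k: score[-1][k])
    path = [s]
    for t in range(n_obs - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return path

# Usage: four word images, three transcript words; the second word
# appears to have produced two segments of roughly the same width.
alignment = viterbi_align([20, 95, 100, 30], ["to", "Washington", "sir"])
```

In this sketch `alignment[i]` is the index of the transcript word assigned to the `i`-th word image. A single width feature discriminates poorly between short words, so a practical observation model would need richer image features.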


Hidden Markov Model, Machine Translation, Dynamic Time Warping, Alignment Algorithm, Observation Model



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jamie Rothfeder (1)
  • R. Manmatha (1)
  • Toni M. Rath (1)
  1. Department of Computer Science, University of Massachusetts Amherst, Amherst, USA
