Word Spotting in Archive Documents Using Shape Contexts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4478)

Abstract

The analysis of historical document images is of interest not only for the preservation of historical heritage but also for the extraction of semantic knowledge. In this paper we present a word spotting approach to find keyword images in digital archives. Detected words make it possible to construct metadata on document contents for indexing and retrieval purposes. Instead of using OCR-based approaches, which would require accurate segmentation and high image quality, we propose a shape recognition method based on the well-known shape context descriptor. Our method proves robust on highly distorted and noisy document images, a common difficulty in the analysis of old documents. It has been used in a real application scenario, the Collection of Border Records of the Girona Archive. In particular, spotted keywords are used to extract knowledge on personal data of people referred to in the documents.

This work has been partially supported by the Spanish project TIN2006-15694-C02-02 and the Subdirecció General d’Arxius de la Generalitat de Catalunya.
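
For readers unfamiliar with the shape context descriptor named in the abstract, the following is a minimal Python sketch of the general idea: for each sampled contour point, a log-polar histogram of the relative positions of the other points. It is an illustration under stated assumptions, not the authors' implementation. The function names `shape_context` and `chi2_cost`, the bin counts, and the radius limits are all hypothetical choices made for this example, and points falling outside the radial range are simply ignored here.

```python
import numpy as np

def shape_context(points, n_radial=5, n_angular=12, r_inner=0.125, r_outer=2.0):
    """Return one flattened log-polar histogram per input point.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # diff[i, j] is the vector from point i to point j.
    diff = pts[None, :, :] - pts[:, None, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    off_diag = ~np.eye(n, dtype=bool)
    # Divide by the mean pairwise distance for scale invariance.
    dist = dist / dist[off_diag].mean()
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Log-spaced radial bin edges; -1 means too close, n_radial means too far.
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_radial + 1)
    r_bin = np.digitize(dist, r_edges) - 1
    a_bin = np.minimum((angle / (2 * np.pi) * n_angular).astype(int),
                       n_angular - 1)

    # Count every other point that falls inside the radial range.
    valid = off_diag & (r_bin >= 0) & (r_bin < n_radial)
    hist = np.zeros((n, n_radial, n_angular))
    rows = np.nonzero(valid)[0]
    np.add.at(hist, (rows, r_bin[valid], a_bin[valid]), 1)
    return hist.reshape(n, -1)

def chi2_cost(h1, h2):
    """Chi-square distance between two histograms after normalisation."""
    p = h1 / max(h1.sum(), 1)
    q = h2 / max(h2.sum(), 1)
    denom = p + q
    mask = denom > 0
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])
```

In a word spotting setting, descriptors computed on points sampled from a query keyword image could be compared against descriptors from candidate regions of a page using a cost such as `chi2_cost`; how the paper samples points and matches them is not stated in the abstract, so that step is left out of the sketch.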

Editor information

Joan Martí José Miguel Benedí Ana Maria Mendonça Joan Serrat

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Lladós, J., Pratim-Roy, P., Rodríguez, J.A., Sánchez, G. (2007). Word Spotting in Archive Documents Using Shape Contexts. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72849-8_37

  • DOI: https://doi.org/10.1007/978-3-540-72849-8_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72848-1

  • Online ISBN: 978-3-540-72849-8

  • eBook Packages: Computer Science, Computer Science (R0)
