A Complete Approach to the Conversion of Typewritten Historical Documents for Digital Archives

  • Apostolos Antonacopoulos
  • Dimosthenis Karatzas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)

Abstract

This paper presents a complete system that historians/archivists can use to digitize whole collections of documents relating to personal information. The system integrates tools and processes that facilitate scanning, image indexing, document (physical and logical) structure definition, document image analysis, recognition, proofreading/correction, and semantic tagging. The system is described in the context of different types of typewritten documents relating to prisoners in World War II concentration camps and is the result of a multinational collaboration under the MEMORIAL project funded (€1.5M) by the European Union (www.memorial-project.info). Results on a representative selection of documents show a significant improvement not only in terms of OCR accuracy but also in terms of the overall time/cost involved in converting these documents for digital archives.

Keywords

Document Image; Text Line; Text Region; Digital Archive; Complete Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Apostolos Antonacopoulos (1)
  • Dimosthenis Karatzas (1)
  1. Pattern Recognition and Image Analysis (PRImA) group, Department of Computer Science, University of Liverpool, Liverpool, UK