Automatic Feature Extraction and Recognition for Digital Access of Books of the Renaissance

  • F. Muge
  • I. Granado
  • M. Mengucci
  • P. Pina
  • V. Ramos
  • N. Sirakov
  • J. R. Caldas Pinto
  • A. Marcolino
  • Mário Ramalho
  • P. Vieira
  • A. Maia do Amaral
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1923)

Abstract

Antique printed books constitute a heritage that should be preserved and used. With novel digitising techniques it is now possible to store these books in digital format and make them accessible to a wider public. However, the problem of how to use them remains. DEBORA (Digital accEss to BOoks of the RenAissance) is a European project that aims to develop a system for interacting with these books through world-wide networks. The main issue is to build a database accessible from client computers. This requires building accompanying metadata that characterises the different components of the books, such as illuminated letters, banners, figures and key words, in order to simplify and speed up remote access. To solve these problems, digital image analysis algorithms were developed for filtering, segmentation, separation of text from non-text, line and word segmentation, and word recognition. Some novel ideas are presented and illustrated through examples.
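
As an illustration of the line segmentation step named in the abstract, the following is a minimal sketch of one common generic technique, the horizontal projection profile. It is not taken from the paper, whose own methods are not described in this abstract; the function name `segment_lines`, the `min_ink` parameter and the toy page are assumptions made purely for the example. The input is assumed to be an already binarised page (1 for ink, 0 for background).

```python
def segment_lines(page, min_ink=1):
    """Return (top, bottom) row ranges of text lines in a binary page.

    page: list of rows, each a list of 0 (background) / 1 (ink) pixels.
    min_ink: hypothetical threshold, the minimum number of ink pixels
             for a row to count as part of a text line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in page]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                     # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))  # leaving a text line
            start = None
    if start is not None:                 # line touching the bottom edge
        lines.append((start, len(page) - 1))
    return lines


# Toy page (6 rows by 5 columns) with two "lines" separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 2), (4, 4)]
```

Applying the same idea to column sums within each detected line would give a rough word segmentation. Real scans of old books are typically skewed and noisy, so a profile-based approach like this would need prior filtering and deskewing to be usable.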

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • F. Muge (1)
  • I. Granado (1)
  • M. Mengucci (1)
  • P. Pina (1)
  • V. Ramos (1)
  • N. Sirakov (1)
  • J. R. Caldas Pinto (2)
  • A. Marcolino (2)
  • Mário Ramalho (2)
  • P. Vieira (2)
  • A. Maia do Amaral (3)

  1. CVRM / Centro de Geo-Sistemas, Instituto Superior Técnico, Av. Rovisco Pais, Lisboa
  2. IDMEC, Instituto Superior Técnico, Av. Rovisco Pais, Lisboa
  3. Biblioteca Geral da Universidade de Coimbra, Largo da Porta, Coimbra
