DEBORA: Digital AccEss to BOoks of the RenAissance

  • Original Paper
  • International Journal of Document Analysis and Recognition (IJDAR)

Abstract

DEBORA (Digital AccEss to BOoks of the RenAissance) is a multidisciplinary European project that aims to digitize rare sixteenth-century books and thus make them more accessible. End-users, librarians, historians, researchers in book history, and computer scientists participated in developing remote and collaborative access to digitized Renaissance books, which is needed because digital libraries in image mode are poorly accessible through the Internet. Three factors currently limit remote access to digital libraries: the size of the image files to be stored, the lack of a standard exchange file format suitable for progressive transmission, and limited querying possibilities. To improve accessibility, historical documents must be digitized and retro-converted to extract a detailed description of the image contents suited to users’ needs. Specialists of the Renaissance have described the metadata generally required by end-users and the ideal functionalities of the digital library. The retro-conversion of historical documents is a complex process that includes image capture, metadata extraction, image storage and indexing, automatic conversion into a reusable electronic form, publication on the Internet, and data compression for faster remote access. The steps of this process cannot be developed independently. DEBORA proposes a global approach to retro-conversion, from digitization to the final functionalities of the digital library, centered on users’ needs. The retro-conversion process is mainly based on a document image analysis system that simultaneously extracts the metadata and compresses the images. We also propose a file format that describes compressed books as heterogeneous data (images, text, links, annotations, physical layout, and logical structure) and is suitable for progressive transmission, editing, and annotation. DEBORA is an exploratory project that aims to demonstrate the feasibility of these concepts by developing prototypes tested by end-users.

Author information

Corresponding author

Correspondence to F. Le Bourgeois.

About this article

Cite this article

Le Bourgeois, F., Emptoz, H. DEBORA: Digital AccEss to BOoks of the RenAissance. IJDAR 9, 193–221 (2007). https://doi.org/10.1007/s10032-006-0030-0
