Electronic Document Publishing Using DjVu

  • Artem Mikheev
  • Luc Vincent
  • Mike Hawrylycz
  • Léon Bottou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

Online access to complex compound documents with client-side search and browsing capability is one of the key requirements of effective content management. “DjVu” (Déjà Vu) combines a highly efficient document image compression methodology, a file format, and a delivery platform that, taken together, have been shown to address these requirements effectively [1]. Originally developed for scanned color documents, the DjVu technology was recently extended to electronic documents. The small file sizes and very efficient document browsing make DjVu a compelling alternative to document interchange formats such as PostScript or PDF. In addition, DjVu offers a uniform viewing experience for electronic and scanned original documents, on any platform and at any connection speed, which is ideal for digital libraries and electronic publishing. This paper describes the basics of DjVu encoding, with emphasis on the particular challenges posed by electronic sources. The DjVu virtual printer driver that we implemented, the “Universal DjVu Converter”, is then introduced. Basic performance statistics are given, and enterprise workflow applications of this technology are highlighted.

Keywords

Digital Library, Document Image, Minimum Description Length, Electronic Document, Color Document
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bottou, L., Haffner, P., Howard, P., Simard, P., Bengio, Y., LeCun, Y.: High quality document image compression with DjVu. Journal of Electronic Imaging 7 (1998) 410–428
  2. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York (1994)
  3. LeCun, Y., Bottou, L., Haffner, P., Howard, P.: DjVu: a compression method for distributing scanned documents in color over the internet. In: Proceedings of Color 6, IST. (1998)
  4. Bottou, L., Haffner, P., Howard, P., Simard, P., Bengio, Y., LeCun, Y.: Browsing through high quality document images with DjVu. In: Proceedings of IEEE Conference on Advances in Digital Libraries. (1998)
  5. Haffner, P., LeCun, Y., Bottou, L., Howard, P., Vincent, P.: Color documents on the web with DjVu. In: Proceedings of IEEE International Conference on Image Processing, Kobe, Japan (1999) 239–243
  6. LeCun, Y., Bottou, L., Haffner, P., Triggs, J., Riemers, B., Vincent, L.: Overview of the DjVu document compression technology. In: SDIUT’01, Symposium on Document Image Understanding Technologies, Columbia, MA, University of Maryland (2001) 119–122
  7. Ascher, R.N., Nagy, G.: Means for achieving a high degree of compaction on scan-digitized printed text. IEEE Trans. Comput. C-23 (1974) 1174–1179
  8. Howard, P.G.: Text image compression using soft pattern matching. Computer Journal 40(2/3) (1997) 146–156
  9. Haffner, P., Bottou, L., LeCun, Y., Vincent, L.: A general segmentation scheme for DjVu document compression. In Talbot, H., Berman, M., eds.: ISMM’02, International Symposium on Mathematical Morphology, Sydney, Australia, CSIRO Publications (2002)
  10. Bottou, L., Haffner, P., LeCun, Y.: Conversion of digital documents to multilayer raster formats. In: ICDAR’2001, International Conference on Document Analysis and Recognition, Seattle, WA (2001)
  11. Rissanen, J.: Stochastic complexity and modeling. Annals of Statistics 14 (1986) 1080–1100

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Artem Mikheev
    • 1
  • Luc Vincent
    • 1
  • Mike Hawrylycz
    • 1
  • Léon Bottou
    • 2
  1. NEC Research Institute, Princeton, USA
  2. LizardTech Software, Seattle, USA