A video-based framework for the analysis of presentations/posters


Detection and recognition of textual information in images and video sequences are important for many applications. The increased resolution and capabilities of digital cameras, together with faster mobile processors, now make it practical to build camera-based systems that read such information. We present an application that captures the information shown during a slide-show presentation or at a poster session, and we describe the development of a system that processes the textual and graphical content of these presentations. The application integrates video and image processing, document layout understanding, optical character recognition (OCR), and pattern recognition: a digital imaging device captures images of the slides or posters, and a computing module preprocesses and annotates their content. We address problems related to metric rectification, key-frame extraction, text detection, enhancement, and system integration. The results are promising for applications such as a mobile text reader for the visually impaired. With powerful text-processing algorithms, the framework can be extended to other applications, e.g., document and conference archiving, camera-based semantics extraction, and ontology creation.
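To illustrate the metric-rectification step named in the abstract, here is a minimal Python sketch. It is not the authors' method. It assumes that the four corners of the slide have already been located in the frame and ordered top-left, top-right, bottom-right, bottom-left. It also assumes the slide's aspect ratio is known (4:3 by default). Under these assumptions, it estimates the plane-to-plane homography with a plain four-point direct linear transform (DLT). The function names `rectify_slide`, `homography_from_4_points`, `apply_homography`, and `solve_linear` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]

def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n  # back substitution
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Estimate H (normalised so h33 = 1) with dst ~ H * src from four correspondences."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v * (h31 x + h32 y + 1) = h21 x + h22 y + h23
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_homography(h: Matrix, p: Point) -> Point:
    """Map an image point through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def rectify_slide(corners: Sequence[Point], aspect: float = 4 / 3,
                  width: float = 800.0) -> Matrix:
    """Map imaged slide corners (TL, TR, BR, BL) onto an upright rectangle.

    The aspect ratio is an assumption supplied by the caller, not estimated here.
    """
    height = width / aspect
    target = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return homography_from_4_points(corners, target)

if __name__ == "__main__":
    # Hypothetical corner positions of a perspectively skewed slide in a video frame.
    corners = [(102.0, 80.0), (530.0, 110.0), (515.0, 400.0), (90.0, 420.0)]
    H = rectify_slide(corners)
    print(apply_homography(H, corners[2]))  # approximately (800.0, 600.0)
```

In a full pipeline, the estimated H would be used to resample the frame into a fronto-parallel image before text detection and OCR. This sketch stops at estimating H and mapping points. If the slide's aspect ratio is not known in advance, it must be recovered from additional constraints rather than assumed. A library routine such as OpenCV's `cv2.getPerspectiveTransform` computes the same four-point homography.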


Copyright information

© Springer-Verlag Berlin/Heidelberg 2005

Authors and Affiliations

  1. Perceptual Interfaces and Reality Lab (PIRL), University of Maryland, USA