Video Mining, pp. 155-183

Video OCR: A Survey And Practitioner’s Guide

  • Rainer Lienhart
Part of The Springer International Series in Video Computing book series (VICO, volume 6)


This survey strives to present the core concepts underlying the different texture-based approaches to the automatic detection, segmentation, and recognition of visual text occurrences in complex images and videos. It emphasizes the different approaches taken to attack the many issues in this space. For each kind of approach, only a few representative references are given. The survey does not attempt an exhaustive listing of all relevant work; rather, it aims to help practitioners and engineers new to the field get a thorough overview of the state-of-the-art principles, methods, and systems in Video OCR. To this end, the approaches of the various researchers are broken up into their constituents and presented as design choices in a hypothetical image and video OCR system.


Video OCR; text detection; text segmentation; text recognition; text tracking; texture; survey; guide; scene text; overlay text; font attributes; pixel classification; non-Roman languages; edge detection; wavelets; scale integration





Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Rainer Lienhart (1)
  1. Intel Labs, Intel Corporation, Santa Clara, USA