Text Extraction, Enhancement and OCR in Digital Video

  • Huiping Li
  • David Doermann
  • Omid Kia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1655)


In this paper, we address the problem of text extraction, enhancement and recognition in digital video. Compared with optical character recognition (OCR) from document images, text extraction and recognition in digital video presents several new challenges. First, text in video is often embedded in complex backgrounds, which makes it difficult to extract and separate. Second, video frames are often digitized and/or subsampled at a much lower resolution than is typical for document images. As a result, most commercial OCR software cannot recognize text extracted from video. We have implemented a hybrid wavelet/neural network segmenter to extract text regions, and we apply a two-stage enhancement scheme before recognition: first, Shannon interpolation raises the resolution of each text block; second, the block is postprocessed with normal/inverse text classification and adaptive thresholding. Experimental results show that our text extraction scheme robustly extracts both scene text and graphical text, and that reasonable OCR results are achieved after enhancement.
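As a rough illustration of the two-stage enhancement step, the Python sketch below upsamples a grayscale text block with Whittaker-Shannon (sinc) interpolation, decides whether the block holds inverse (light-on-dark) text, and then binarizes it with a local adaptive threshold. Several details are assumptions rather than the authors' method: the block is a list of rows of gray levels, the ideal infinite sinc sum is truncated to the samples inside the block, the polarity test `is_inverse_text` (median compared against the mid-range) is a hypothetical heuristic, and Niblack's rule (local mean plus k times the local standard deviation, with the commonly quoted k = -0.2 and a 15 x 15 window) stands in for the unspecified adaptive threshold. All function names are illustrative.

```python
import math
import statistics


def sinc(x):
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def shannon_upsample_1d(samples, factor):
    """Whittaker-Shannon (sinc) interpolation by an integer factor.

    Output sample k sits at position k / factor on the input grid, giving
    (n - 1) * factor + 1 samples over the same extent as the input. The
    ideal infinite sum is truncated to the samples that exist (assumption).
    """
    n = len(samples)
    return [
        sum(s * sinc(k / factor - i) for i, s in enumerate(samples))
        for k in range((n - 1) * factor + 1)
    ]


def shannon_upsample_2d(img, factor):
    """Separable 2-D interpolation: interpolate every row, then every column."""
    rows = [shannon_upsample_1d(row, factor) for row in img]
    cols = [shannon_upsample_1d(list(col), factor) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows


def is_inverse_text(img):
    """Hypothetical polarity test: background pixels dominate a text block,
    so a median below the mid-range means light text on a dark background."""
    flat = [p for row in img for p in row]
    return statistics.median(flat) < (min(flat) + max(flat)) / 2


def niblack_binarize(img, window=15, k=-0.2):
    """Niblack thresholding: T = local mean + k * local std over a
    window x window neighbourhood clipped at the borders. Expects dark text
    on a light background; returns 1 for text pixels, 0 for background."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            patch = [img[j][i]
                     for j in range(max(0, y - r), min(h, y + r + 1))
                     for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(patch) / len(patch)
            std = math.sqrt(sum((p - mean) ** 2 for p in patch) / len(patch))
            out_row.append(1 if img[y][x] < mean + k * std else 0)
        out.append(out_row)
    return out


def enhance_text_block(img, factor=4, window=15, k=-0.2):
    """Two-stage sketch: (1) sinc upsampling, (2) polarity normalization
    followed by local adaptive thresholding."""
    big = shannon_upsample_2d(img, factor)
    if is_inverse_text(big):
        # Flip light-on-dark text so that Niblack sees dark text on light.
        peak = max(max(row) for row in big)
        big = [[peak - p for p in row] for row in big]
    return niblack_binarize(big, window, k)
```

For example, `enhance_text_block([[0, 0, 0], [0, 255, 0], [0, 0, 0]], factor=2)` classifies the bright centre pixel as inverse text, flips the polarity and returns a 5 x 5 binary map whose central 3 x 3 region is marked as text. Normalizing polarity before thresholding lets a single dark-text rule handle both caption styles, which is the point of the normal/inverse classification step.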


Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Huiping Li (1)
  • David Doermann (1)
  • Omid Kia (2)
  1. Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park
  2. Advanced Network Technologies Division, National Institute of Standards and Technology, Gaithersburg