Abstract
The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. One way to include more semantic knowledge in the indexing process is to use the text contained in the images and video sequences. This text is rich in information yet easy to use, e.g. through keyword-based queries. In this paper we present an algorithm to localise artificial text in images and videos using a measure of accumulated gradients and morphological processing. The quality of the localised text is improved by robust multiple frame integration. A new technique for the binarisation of the text boxes, based on a criterion maximising local contrast, is proposed. Finally, detection and OCR results for a commercial OCR system are presented, justifying the choice of the binarisation technique.
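To give a rough feel for what a "measure of accumulated gradients" can look like, the following is a minimal sketch and not the authors' implementation. It assumes that text regions show dense horizontal intensity changes, sums absolute horizontal gradients over a sliding window, and thresholds the result. The function names, the window size and the threshold rule are hypothetical choices made for illustration; the morphological processing, multiple frame integration and binarisation steps mentioned in the abstract are not covered.

```python
import numpy as np

def accumulated_gradients(gray, window=9):
    """Sum absolute horizontal gradients over a horizontal sliding window.

    `window` is a hypothetical parameter, not a value taken from the paper.
    """
    gray = np.asarray(gray, dtype=float)
    grad = np.zeros_like(gray)
    # Forward difference along each row (column 0 has no left neighbour).
    grad[:, 1:] = np.abs(np.diff(gray, axis=1))
    kernel = np.ones(window)
    # Accumulate the gradient magnitudes row by row.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, grad
    )

def text_mask(gray, window=9, threshold=None):
    """Return a boolean mask of pixels with a high accumulated gradient."""
    acc = accumulated_gradients(gray, window)
    if threshold is None:
        # Assumed rule for this sketch: half of the maximum response.
        threshold = 0.5 * acc.max()
    return acc >= threshold
```

On a synthetic image containing a band of alternating dark and bright columns, `text_mask` marks the interior of the band and leaves flat regions unmarked. A real detector would still need a further stage to turn such a mask into clean rectangular text boxes.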
Additional information
An erratum to this article can be found at http://dx.doi.org/10.1007/s10044-004-0216-3
About this article
Cite this article
Wolf, C., Jolion, JM. Extraction and recognition of artificial text in multimedia documents. Formal Pattern Analysis & Applications 6, 309–326 (2004). https://doi.org/10.1007/s10044-003-0197-7
DOI: https://doi.org/10.1007/s10044-003-0197-7