For the vast majority of media published on the Web today, the only available descriptions are global, high-level metadata. To go beyond systems that treat the described media payload as an opaque data file, a system must employ automated content processing. While specific processing methods have been optimized for particular media types, common principles apply to some degree across all of them. The field of digital signal processing includes several areas of focus, including speech, audio, image, and video processing. If we stretch the notion of signal processing beyond digitizing an analog waveform to include streams of symbols, we can treat text streams corresponding to the media dialog as signals as well [Rab99]. Common media processing operations include noise reduction, resampling, compression, segmentation, feature extraction, modeling, statistical methods, summarization, and the construction of compact representations for indexing and browsing.
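As a minimal sketch of the frame-based feature extraction mentioned above, the snippet below computes two classic low-level audio features, short-time energy and zero-crossing rate, over a synthetic signal. The function names, frame sizes, and the NumPy-based representation are illustrative assumptions, not an implementation prescribed by this chapter.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(x, frame_len=256, hop=128):
    """Per-frame energy: mean of the squared samples in each frame."""
    frames = frame_signal(x, frame_len, hop)
    return (frames ** 2).mean(axis=1)

def zero_crossing_rate(x, frame_len=256, hop=128):
    """Per-frame fraction of adjacent-sample sign changes."""
    frames = frame_signal(x, frame_len, hop)
    return (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)

# Synthetic test signal at an assumed 8 kHz sample rate:
# half a second of silence followed by half a second of a 440 Hz tone.
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 440 * t)])

energy = short_time_energy(signal)
zcr = zero_crossing_rate(signal)
```

Features like these, computed per frame, are the raw material for the segmentation, modeling, and indexing steps discussed later; richer features follow the same frame-and-compute pattern.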
In the previous chapters we discussed the practical issues of compression systems in use today, as well as container file formats for media streams, and we covered media-related text streams and formats, including closed captions, subtitles, and transcripts. Here we present, at an introductory level, the common elements of media processing as it relates to content-based video search engine systems. In later chapters, we will explore in greater detail some of the most common methods applied to audio, video, and text streams, and we will present multimodal processing, in which these media streams are processed in a coordinated manner to achieve greater accuracy than is possible by processing each component individually.
Keywords: Feature Extraction · Media Type · Media Processing · Speech Recognition System · Video Summarization