All search engine systems share a common high-level architecture, but they vary widely depending on the application and design choices. Viewed from a content flow perspective, there are three main architectural components: content acquisition, processing (indexing), and retrieval. In practice, these typically run as decoupled, independent processes to ease scaling. We will also consider the system from a user activity perspective, in terms of user behaviors and system states.
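
To make the content flow concrete, the following is a minimal sketch of the three components, assuming plain text documents and an in-memory inverted index. The function names (`acquire`, `index`, `retrieve`) and the queue-based hand-off are illustrative assumptions, not a description of any particular system; a production engine would run each stage as a separate service.

```python
from collections import defaultdict
from queue import Queue


def acquire(sources, out_queue):
    """Content acquisition: push (doc_id, text) pairs onto a queue."""
    for doc_id, text in sources:
        out_queue.put((doc_id, text))
    out_queue.put(None)  # sentinel: no more content


def index(in_queue):
    """Processing (indexing): build an inverted index from queued items."""
    inverted = defaultdict(set)
    while True:
        item = in_queue.get()
        if item is None:
            break
        doc_id, text = item
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted


def retrieve(inverted, query):
    """Retrieval: return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result


# Hypothetical sample content; the queue is the only coupling between stages.
sources = [("v1", "evening news weather"), ("v2", "sports news")]
queue = Queue()
acquire(sources, queue)
inverted_index = index(queue)
matches = retrieve(inverted_index, "news weather")
```

Because the stages communicate only through the queue and the index, each one can be replaced or scaled without changing the others, which is the motivation for decoupling them.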

Content (or media) processing is the next logical stage in the content flow. It involves transcoding, metadata manipulation, and metadata extraction and augmentation through media analysis methods. The goal is to capture the media structure and metadata in data structures that enable rapid retrieval and content adaptation.
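
As one hedged illustration of such a data structure, the sketch below stores a media item's metadata alongside its time-aligned segments and keeps a per-item term index so that matching time ranges can be looked up directly. The class and field names (`MediaRecord`, `Segment`, `term_index`) are hypothetical, and the segment text is assumed to come from some upstream analysis step such as captions or a transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # assumed output of media analysis, e.g. caption text


@dataclass
class MediaRecord:
    media_id: str
    metadata: dict = field(default_factory=dict)
    segments: list = field(default_factory=list)
    term_index: dict = field(default_factory=dict)  # term -> segment positions

    def add_segment(self, start, end, text):
        """Store a segment and index each distinct term it contains."""
        position = len(self.segments)
        self.segments.append(Segment(start, end, text))
        for term in set(text.lower().split()):
            self.term_index.setdefault(term, []).append(position)

    def find(self, term):
        """Return (start, end) time ranges of segments containing the term."""
        positions = self.term_index.get(term.lower(), [])
        return [(self.segments[i].start, self.segments[i].end) for i in positions]


# Hypothetical sample data for illustration only.
record = MediaRecord("clip-1", {"title": "Evening News"})
record.add_segment(0.0, 12.5, "Top stories tonight")
record.add_segment(12.5, 40.0, "Weather tonight and tomorrow")
ranges = record.find("tonight")
```

Returning time ranges rather than whole items is what supports content adaptation in this sketch: a client could jump to, or extract, only the matching portion of the media.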


