Object Level Grouping for Video Shots
- Josef Sivic, Department of Engineering Science, University of Oxford
- Frederik Schaffalitzky, Department of Engineering Science, University of Oxford
- Andrew Zisserman, Department of Engineering Science, University of Oxford
We describe a method for automatically obtaining, from generic video shots, object representations suitable for retrieval. The object representation consists of an association of frame regions, which provide exemplars of the object’s possible visual appearances.
Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions.
In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy.
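The third contribution builds on affine factorization. For readers unfamiliar with it, the classical, non-robust version can be sketched as follows, assuming NumPy and assuming every track is visible in every frame (no missing data). It does not reproduce the paper's handling of gaps or motion degeneracy, and the function names `affine_factorize` and `rank3_residual` are illustrative only. The property it relies on is that, under an affine camera, the centred 2F x P matrix of tracked points has rank at most 3, so a large rank-3 residual suggests the tracks do not share a single affine motion.

```python
import numpy as np


def affine_factorize(W):
    """Factor a 2F x P matrix of tracked 2D points into motion and structure.

    Rows 2f and 2f+1 hold the x and y coordinates in frame f; each column is
    one track. Assumes every track is observed in every frame (no gaps).
    Returns M (2F x 3), S (3 x P) and t (2F,) with W ~= M @ S + t[:, None].
    """
    W = np.asarray(W, dtype=float)
    # Per-row centroids absorb the image translation in each frame.
    t = W.mean(axis=1)
    W0 = W - t[:, None]
    # Under an affine camera the centred matrix has rank at most 3,
    # so keep only the three largest singular values.
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root           # motion, up to a 3x3 affine ambiguity
    S = root[:, None] * Vt[:3]    # structure, up to the same ambiguity
    return M, S, t


def rank3_residual(W):
    """RMS error of the best rank-3 affine model (small => common motion)."""
    W = np.asarray(W, dtype=float)
    M, S, t = affine_factorize(W)
    return float(np.sqrt(np.mean((W - (M @ S + t[:, None])) ** 2)))


# Synthetic check: one rigid object seen by 5 random affine cameras.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))                        # 20 rigid 3D points
cams = [rng.normal(size=(2, 3)) for _ in range(5)]  # 2x3 affine cameras
W_rigid = np.vstack([A @ X + rng.normal(size=(2, 1)) for A in cams])

# A second object with its own, independent affine motion.
Y = rng.normal(size=(3, 20))
W_other = np.vstack([rng.normal(size=(2, 3)) @ Y + rng.normal(size=(2, 1))
                     for _ in range(5)])
W_mixed = np.hstack([W_rigid, W_other])

print(rank3_residual(W_rigid))   # tiny: consistent with one motion
print(rank3_residual(W_mixed))   # clearly larger: two independent motions
```

The synthetic data is noise-free and complete, so the rigid case fits to numerical precision; real tracks are noisy, broken and sometimes degenerate (for example, pure translation), which is why a robust variant is needed in practice.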
We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks that share a common motion into object-level groups. As a result, separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn, this enables object-level matching and recognition throughout a video.
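Associating parts that are never visible together works through transitivity: if track A shares a frame and a common motion with B, and B does with C, then A and C end up in the same group. The sketch below illustrates only that chaining step with a union-find structure. It assumes the pairwise common-motion decision is already available as a predicate; the `Track` type, the `overlap` helper and the `same_motion` oracle are hypothetical stand-ins, not the paper's data structures or its factorization-based test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical track record: an id and its first and last frame."""
    name: str
    start: int
    end: int


def overlap(a: Track, b: Track) -> bool:
    """True if the two tracks are visible in at least one common frame."""
    return a.start <= b.end and b.start <= a.end


class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_tracks(tracks, same_motion):
    """Merge every temporally overlapping pair accepted by same_motion.

    Grouping is transitive, so tracks that never co-occur can still be
    joined through a chain of overlapping tracks.
    """
    uf = UnionFind(tracks)
    for i, a in enumerate(tracks):
        for b in tracks[i + 1:]:
            if overlap(a, b) and same_motion(a, b):
                uf.union(a, b)
    groups = {}
    for tr in tracks:
        groups.setdefault(uf.find(tr), []).append(tr.name)
    return sorted(groups.values())


# Toy shot: three parts of a car seen at different times, plus a pedestrian.
tracks = [Track("front", 0, 40), Track("side", 30, 90),
          Track("back", 80, 120), Track("person", 0, 120)]
OBJECT_OF = {"front": "car", "side": "car", "back": "car", "person": "walker"}


def same_motion(a, b):
    # Stand-in oracle; a real system would test common motion on the data.
    return OBJECT_OF[a.name] == OBJECT_OF[b.name]


print(group_tracks(tracks, same_motion))
# [['front', 'side', 'back'], ['person']]
```

Here `front` and `back` never share a frame, yet both land in the car group because each overlaps `side`. In a real pipeline the oracle would be replaced by a data-driven test, for instance a residual check on the jointly factorized tracks.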
We illustrate the method on the feature film “Groundhog Day.” Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).
Keywords: 3D object retrieval in videos; tracking affine covariant regions; independent motion segmentation; robust affine factorization
International Journal of Computer Vision, Volume 67, Issue 2, pp. 189-210
Publisher: Kluwer Academic Publishers