Volume 65, Issue 1, pp 63-79
Date: 09 Oct 2010

A Robust Particle Filter-Based Method for Tracking Single Visual Object Through Complex Scenes Using Dynamical Object Shape and Appearance Similarity


Abstract

This paper addresses the problem of tracking a single visual object through crowded scenes, where the target may be intersected or partially occluded by other objects for long durations, undergo severe deformation and pose changes, and move at varying speeds against a cluttered background. A robust visual object tracking scheme is proposed that exploits the dynamics of object shape and appearance similarity. The method uses a particle filter in which a multi-mode anisotropic mean shift is embedded to improve the initial particles. Compared with conventional particle filter and mean shift-based tracking (Shan et al. 2004), our method offers the following novelties: (i) we employ a fully tunable rectangular bounding box described by five parameters (2D central location, width, height, and orientation), with full functionality in the joint tracking scheme; (ii) we derive the equations for a multi-mode version of the anisotropic mean shift, in which the rectangular bounding box is partitioned into concentric areas, allowing better tracking of objects with multiple modes. The bounding box parameters are then computed using eigen-decomposition of the mean shift estimates and weighted averaging. This enables a more efficient redistribution of the initial particles towards locations associated with large weights, and hence efficient particle filter tracking with a very small number of particles (N = 15 is used). Experiments were conducted on videos containing a range of complex scenarios; the tracking results were evaluated using two objective criteria and compared with two existing tracking methods. The results show that the proposed method is robust in terms of tracking drift and the tightness and accuracy of tracked bounding boxes, especially in scenarios where the target object undergoes long-term partial occlusions, intersections, severe deformation, or pose changes, or where the cluttered background has similar color distributions.
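
The abstract does not give the paper's equations, so the following is only a minimal sketch of the two generic operations it names: reading box width, height, and orientation off the eigen-decomposition of a symmetric 2x2 shape matrix, and combining weighted particles into one five-parameter box. The function names, the `scale` factor (which assumes mass spread uniformly over the box, so that a side length is sqrt(12) times the standard deviation), and the doubled-angle averaging of orientations are illustrative assumptions, not the authors' formulation.

```python
import math


def box_from_covariance(sxx, sxy, syy, scale=math.sqrt(12.0)):
    """Return (width, height, theta) for the symmetric matrix [[sxx, sxy], [sxy, syy]].

    Hypothetical helper: uses the closed-form eigen-decomposition of a 2x2
    symmetric matrix. `scale` converts a standard deviation into a side length
    (sqrt(12) is exact for a uniform distribution over a rectangle).
    """
    mean = 0.5 * (sxx + syy)
    spread = math.hypot(0.5 * (sxx - syy), sxy)
    lam_major = mean + spread            # larger eigenvalue
    lam_minor = max(mean - spread, 0.0)  # smaller eigenvalue, clamped against round-off
    # Angle of the major eigenvector, measured from the x-axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return scale * math.sqrt(lam_major), scale * math.sqrt(lam_minor), theta


def weighted_box(particles, weights):
    """Weighted average of particles given as (cx, cy, width, height, theta).

    Assumption: orientation is only defined modulo pi, so angles are averaged
    on the doubled-angle circle rather than arithmetically.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    cx = sum(w * p[0] for w, p in zip(norm, particles))
    cy = sum(w * p[1] for w, p in zip(norm, particles))
    width = sum(w * p[2] for w, p in zip(norm, particles))
    height = sum(w * p[3] for w, p in zip(norm, particles))
    s = sum(w * math.sin(2.0 * p[4]) for w, p in zip(norm, particles))
    c = sum(w * math.cos(2.0 * p[4]) for w, p in zip(norm, particles))
    theta = 0.5 * math.atan2(s, c)
    return cx, cy, width, height, theta
```

In a tracker of the kind described, something like `box_from_covariance` would be applied to a per-particle shape estimate, and `weighted_box` would fuse the N particles into the reported bounding box; how the paper actually combines these steps is not stated in the abstract.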