Abstract
We introduce an online learning approach for multi-target tracking. Detection responses are gradually associated into tracklets at multiple levels to produce final tracks. Unlike most previous approaches, which focus only on producing discriminative motion and appearance models for all targets, we further consider discriminative features for distinguishing difficult pairs of targets. The tracking problem is formulated using an online learned CRF model and is transformed into an energy minimization problem. The energy functions include a set of unary functions that are based on motion and appearance models for discriminating all targets, as well as a set of pairwise functions that are based on models for differentiating corresponding pairs of tracklets. The online CRF approach is more powerful at distinguishing spatially close targets with similar appearances, as well as at tracking targets in the presence of camera motion. An efficient algorithm is introduced for finding an association with low energy cost. We present results on four public data sets and show significant improvements compared with several state-of-the-art methods.
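The energy formulation described in the abstract can be illustrated with a minimal sketch. It assumes each CRF node is a candidate link between two tracklets with a binary label (1 = associate, 0 = do not associate), and it uses a simple iterated-conditional-modes style descent as a stand-in for the efficient algorithm, which the abstract does not specify. The names `energy` and `icm`, the cost tables, and the data layout are hypothetical and are not taken from the paper.

```python
def energy(labels, unary, pairwise):
    """Total energy of a binary labeling (illustrative layout, not the paper's).

    labels:   sequence of 0/1, one per candidate tracklet link (CRF node)
    unary:    unary[i][l] is the cost of giving node i the label l
    pairwise: {(i, j): table} where table[l_i][l_j] is the cost on edge (i, j)
    """
    total = sum(unary[i][l] for i, l in enumerate(labels))
    total += sum(t[labels[i]][labels[j]] for (i, j), t in pairwise.items())
    return total


def icm(unary, pairwise, max_sweeps=10):
    """Greedy descent: flip one label at a time and keep flips that lower energy.

    This finds a low-energy labeling, not necessarily the global minimum.
    """
    labels = [0] * len(unary)
    best = energy(labels, unary, pairwise)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(labels)):
            labels[i] ^= 1                      # tentatively flip node i
            e = energy(labels, unary, pairwise)
            if e < best:
                best, improved = e, True        # keep the flip
            else:
                labels[i] ^= 1                  # revert the flip
        if not improved:
            break
    return tuple(labels), best
```

In this sketch, negative unary costs favor making a link, and a positive pairwise cost on the (1, 1) entry discourages activating two links together, which is one way a pairwise term could separate a difficult pair.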
Notes
Definitions are given in Sect. 5.
Detection time costs are not included in either measurement.
Acknowledgments
Research was sponsored, in part, by the Office of Naval Research under Grant number N00014-10-1-0517 and by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-10-2-0063. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Cite this article
Yang, B., Nevatia, R. Multi-Target Tracking by Online Learning a CRF Model of Appearance and Motion Patterns. Int J Comput Vis 107, 203–217 (2014). https://doi.org/10.1007/s11263-013-0666-4
DOI: https://doi.org/10.1007/s11263-013-0666-4