
Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions


We describe an end-to-end framework for learning the parameters of a min-cost flow multi-target tracking model with quadratic trajectory interactions, including suppression of overlapping tracks and contextual cues about the co-occurrence of different objects. Our approach uses structured prediction with a tracking-specific loss function to learn the complete set of model parameters. Within this learning framework, we evaluate two approaches to finding an optimal set of tracks under a quadratic model objective: one based on a linear program (LP) relaxation, and the other based on novel greedy variants of dynamic programming that handle pairwise interactions. We find that the greedy algorithms achieve almost the same accuracy as the LP relaxation while being up to 10× faster than a commercial LP solver. We evaluate the trained models on three challenging benchmarks. Surprisingly, we find that with proper parameter learning, our simple data association model, which performs no explicit appearance or motion reasoning, achieves accuracy comparable to or better than many state-of-the-art methods that use far more complex motion features or appearance affinity metric learning.
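The abstract mentions greedy variants of dynamic programming that handle pairwise interactions. As a rough illustration only, the Python sketch below assumes a simplified model: each detection has a unary cost, forward links between detections have transition costs, every track pays a birth and a death cost, and a pairwise cost is charged when two detections end up on different tracks (for example, a large penalty between heavily overlapping boxes). It repeatedly extracts the cheapest track by dynamic programming over the frame-ordered graph, then folds that track's pairwise costs into the unary costs of the remaining detections. The names `Detection` and `greedy_tracks`, the dictionary layout and the default costs are hypothetical; this is not the authors' algorithm, and it omits both the LP relaxation and parameter learning.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    frame: int    # frame index of the detection
    unary: float  # cost of including it in some track (negative = confident)


def greedy_tracks(dets, transition, pairwise, birth=1.0, death=1.0, max_tracks=1000):
    """Hypothetical greedy tracker: extract one cheapest track at a time.

    transition: {(i, j): cost} for linking detection i to a later detection j.
    pairwise:   {(i, j): cost} paid when i and j lie on *different* tracks.
    Returns (tracks, objective); the objective sums unary, transition,
    birth/death and pairwise costs of the selected tracks.
    """
    n = len(dets)
    unary = [d.unary for d in dets]
    used = [False] * n
    order = sorted(range(n), key=lambda k: dets[k].frame)  # topological order
    preds = {j: [] for j in range(n)}
    for (i, j), c in transition.items():
        if dets[i].frame >= dets[j].frame:
            raise ValueError("links must go forward in time")
        preds[j].append((i, c))

    tracks, objective = [], 0.0
    for _ in range(max_tracks):
        # best[j]: cheapest cost of a partial track ending at detection j
        best, back = [math.inf] * n, [None] * n
        for j in order:
            if used[j]:
                continue
            cost, arg = birth, None  # option 1: the track is born at j
            for i, c in preds[j]:
                if not used[i] and best[i] + c < cost:  # option 2: extend from i
                    cost, arg = best[i] + c, i
            best[j], back[j] = cost + unary[j], arg
        # close the cheapest partial track with the death cost
        end, end_cost = None, 0.0
        for j in range(n):
            if not used[j] and best[j] + death < end_cost:
                end, end_cost = j, best[j] + death
        if end is None:  # no remaining track lowers the objective
            break
        path = []
        while end is not None:
            path.append(end)
            end = back[end]
        path.reverse()
        in_path = set(path)
        for k in path:
            used[k] = True
        tracks.append(path)
        objective += end_cost
        # charge interactions with the new track to the still-unused detections
        for (a, b), q in pairwise.items():
            if a in in_path and not used[b]:
                unary[b] += q
            elif b in in_path and not used[a]:
                unary[a] += q
    return tracks, objective


if __name__ == "__main__":
    dets = [Detection(0, -2.0), Detection(0, -2.0),
            Detection(1, -2.0), Detection(1, -2.0), Detection(1, -2.5)]
    links = {(0, 2): 0.1, (1, 3): 0.1}
    print(greedy_tracks(dets, links, {(2, 4): 10.0}))  # detection 4 duplicates 2
```

In this toy example the pairwise penalty keeps the duplicate detection 4 out of the solution, giving tracks `[[0, 2], [1, 3]]`; with `pairwise={}` it is returned as a third track `[4]`. Because earlier tracks are never revised, such a greedy scheme can be suboptimal, which is why comparing it against an LP relaxation is informative.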


Figs. 1–9 (images not reproduced)


  4. In a recent update of the benchmark server, the organizers changed their evaluation script to count detections in “don’t care” regions as false positives, which we believe is inconsistent with the general consensus on what “don’t care” regions mean. We therefore report the results as of 24 May 2016, which were evaluated using the old evaluation script.
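To make the two counting conventions in this note concrete, here is a hypothetical Python sketch. It assumes boxes given as `(x1, y1, x2, y2)` and treats an unmatched detection as lying in a "don't care" region when at least half of its area falls inside one; the function names and the 0.5 threshold are illustrative assumptions, not the benchmark's actual evaluation script.

```python
def overlap_fraction(det, region):
    """Fraction of the detection box (x1, y1, x2, y2) that lies inside region."""
    iw = max(0.0, min(det[2], region[2]) - max(det[0], region[0]))
    ih = max(0.0, min(det[3], region[3]) - max(det[1], region[1]))
    area = max(0.0, det[2] - det[0]) * max(0.0, det[3] - det[1])
    return iw * ih / area if area > 0 else 0.0


def count_false_positives(unmatched, dont_care, ignore_dont_care=True, min_overlap=0.5):
    """Count unmatched detections as false positives.

    ignore_dont_care=True: detections mostly inside a "don't care" region are
    discarded rather than penalized. ignore_dont_care=False: every unmatched
    detection is counted, as under the updated script described in the note.
    """
    if not ignore_dont_care:
        return len(unmatched)
    return sum(
        1 for d in unmatched
        if not any(overlap_fraction(d, r) >= min_overlap for r in dont_care)
    )
```

For a tracker that outputs detections inside "don't care" regions, the second convention can only increase the false-positive count, which is the discrepancy the note describes.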





This work was supported by the US National Science Foundation through Awards IIS-1253538 and DBI-1053036.

Author information

Correspondence to Shaofei Wang.

Additional information

Communicated by Xianghua Xie, Mark Jones and Gary Tam.


About this article


Cite this article

Wang, S., Fowlkes, C.C. Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. Int J Comput Vis 122, 484–501 (2017). https://doi.org/10.1007/s11263-016-0960-z



  • Multi-target tracking
  • Data association
  • Network-flow
  • Structured prediction