Two-Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions

  • Katerina Fragkiadaki
  • Weiyu Zhang
  • Geng Zhang
  • Jianbo Shi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


We propose a tracking framework that mediates grouping cues from two tracking granularities, detection tracklets and point trajectories, for segmenting objects in crowded scenes. Detection tracklets capture objects when they are mostly visible; they may be sparse in time, may miss partially occluded or deformed objects, and may contain false positives. Point trajectories are dense in space and time, and their affinities integrate long-range motion and 3D disparity information useful for segmentation. However, because trajectories lack model knowledge, their affinities may leak across similarly moving objects. We establish one trajectory graph and one detection tracklet graph, encoding grouping affinities within each space and associations across them. Two-granularity tracking is cast as simultaneous detection tracklet classification and clustering (cl2) in the joint space of tracklets and trajectories. We solve cl2 by explicitly mediating contradictory affinities in the two graphs: detection tracklet classification modifies trajectory affinities to reflect object-specific dis-associations, while non-accidental grouping alignment between detection tracklets and trajectory clusters boosts or rejects the corresponding detection tracklets, changing their classification accordingly. We show that our model can track objects through sparse, inaccurate detections and persistent partial occlusions. By effectively switching between the two granularities according to object occlusions, deformations and background clutter, it adapts to the changing visibility masks of the targets, in contrast to detection-based bounding box trackers.
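The full cl2 formulation operates in the joint tracklet–trajectory space; as a loose illustration of one ingredient described above, mediation of trajectory affinities by tracklet classification, the following hypothetical NumPy sketch suppresses affinities that cross a confident tracklet's support mask and then performs a normalized-cut style two-way spectral split of the trajectory graph. Function names, thresholds and the penalty factor are invented for illustration and are not the authors' implementation.

```python
import numpy as np

def mediated_clustering(W_traj, assoc, tracklet_scores, threshold=0.5, penalty=0.9):
    """Two-way spectral split of a trajectory affinity graph after tracklet
    mediation. Hypothetical sketch, not the paper's actual cl2 solver.

    W_traj:          (n, n) symmetric trajectory affinity matrix.
    assoc:           (m, n) tracklet-to-trajectory association; row t is 1
                     where a trajectory falls inside tracklet t's mask.
    tracklet_scores: (m,) classifier confidence per detection tracklet.
    """
    W = W_traj.astype(float).copy()
    # Tracklet classification -> object-specific dis-association: suppress
    # affinity between trajectories inside a confident tracklet's mask and
    # trajectories outside it (this is where affinities "leak" otherwise).
    for t in np.where(tracklet_scores > threshold)[0]:
        inside = assoc[t].astype(bool)
        W[np.ix_(inside, ~inside)] *= (1.0 - penalty)
        W[np.ix_(~inside, inside)] *= (1.0 - penalty)
    # Normalized-cut style embedding: eigenvectors of D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Sign of the second-largest eigenvector gives a 2-way partition.
    return (vecs[:, -2] > 0).astype(int)
```

On a toy graph of six trajectories whose affinities leak across two similarly moving objects, a single confident tracklet covering the first object is enough to separate the two clusters.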



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Katerina Fragkiadaki (1)
  • Weiyu Zhang (1)
  • Geng Zhang (2)
  • Jianbo Shi (1)
  1. Department of Computer and Information Science, University of Pennsylvania, USA
  2. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China