Efficient Image and Video Co-localization with Frank-Wolfe Algorithm

  • Armand Joulin
  • Kevin Tang
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent state-of-the-art methods, we show how to naturally incorporate temporal terms and constraints for video co-localization into a quadratic programming framework. Furthermore, by leveraging the Frank-Wolfe algorithm (also known as the conditional gradient method), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTube-Objects dataset for videos, as well as on a joint combination of the two.
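The abstract says Frank-Wolfe turns the relaxed quadratic program into a succession of simple integer programs. The following is a minimal sketch of that idea for the image case only. It makes several assumptions. The objective is a generic convex quadratic z^T A z + b^T z, with a symmetric positive semidefinite A standing in for the paper's actual matrices, which are not given here. Each image selects exactly one of its candidate boxes, and this constraint is relaxed to a product of simplices. The names `lmo_images`, `frank_wolfe_qp`, and `round_solution` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def lmo_images(grad, groups):
    """Linear minimization oracle over a product of simplices (hypothetical helper).

    Each entry of `groups` lists the indices of one image's candidate boxes.
    Because exactly one box is selected per image, minimizing a linear function
    decomposes into an independent argmin per image: a trivial integer program.
    """
    s = np.zeros_like(grad)
    for idx in groups:
        idx = np.asarray(idx)
        s[idx[np.argmin(grad[idx])]] = 1.0
    return s


def frank_wolfe_qp(A, b, groups, iters=500, tol=1e-8):
    """Minimize f(z) = z^T A z + b^T z over the relaxed box-selection polytope.

    A is assumed symmetric positive semidefinite. Returns the relaxed solution
    z and the last Frank-Wolfe duality gap.
    """
    z = np.zeros(b.shape[0])
    for idx in groups:  # start from the uniform point of each image's simplex
        z[np.asarray(idx)] = 1.0 / len(idx)
    gap = np.inf
    for _ in range(iters):
        g = 2.0 * A @ z + b          # gradient of f for symmetric A
        s = lmo_images(g, groups)    # integer-valued vertex of the polytope
        d = s - z
        gap = -(g @ d)               # duality gap, an upper bound on f(z) - f*
        if gap <= tol:
            break
        curv = d @ A @ d
        # Exact line search on the quadratic along d, clipped to [0, 1].
        step = 1.0 if curv <= 0 else min(1.0, gap / (2.0 * curv))
        z = z + step * d             # convex combination keeps z feasible
    return z, gap


def round_solution(z, groups):
    """Pick, in each image, the box carrying the largest relaxed weight."""
    return [int(np.asarray(idx)[np.argmax(z[np.asarray(idx)])]) for idx in groups]


if __name__ == "__main__":
    # Two hypothetical images with three candidate boxes each.
    groups = [[0, 1, 2], [3, 4, 5]]
    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 6))
    A = M @ M.T / 6.0                # random PSD matrix, a stand-in objective
    b = rng.standard_normal(6)
    z, gap = frank_wolfe_qp(A, b, groups)
    print(round_solution(z, groups), gap)
```

The feasible set is a product of simplices, so the oracle needs no general-purpose solver. It reduces to one argmin per image. This is the source of the cheap iterations in this sketch. The video formulation adds temporal terms and constraints, so its oracle would be a different integer program, and this sketch does not attempt it.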





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Armand Joulin (1)
  • Kevin Tang (1)
  • Li Fei-Fei (1)
  1. Computer Science Department, Stanford University, USA
