CATS: Co-saliency Activated Tracklet Selection for Video Co-localization

  • Koteswar Rao Jerripothula
  • Jianfei Cai
  • Junsong Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9911)


Video co-localization is the task of jointly localizing common objects across videos. Because appearance varies both across videos and within each video, identifying and tracking these objects without any supervision is challenging. In contrast to previous joint frameworks that use bounding box proposals to attack the problem, we propose to leverage co-saliency activated tracklets. To identify the common visual object, we first exploit inter-video commonness, intra-video commonness, and motion saliency to generate co-saliency maps. Object proposals with high objectness and co-saliency scores are then tracked across short video intervals to build tracklets. The best tube for a video is obtained by selecting one tracklet from each interval, based on tracklet confidence and on smoothness between adjacent tracklets, using dynamic programming. Experimental results on the benchmark YouTube Object dataset show that the proposed method outperforms state-of-the-art methods.
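
The tracklet selection step described above can be illustrated with a small dynamic programming sketch. This is not the authors' implementation: the abstract does not give the objective, so the code assumes a hypothetical score that sums each selected tracklet's confidence with a weighted smoothness term, where smoothness is taken to be the intersection-over-union (IoU) between the last box of one tracklet and the first box of the next. The names `select_tracklets`, `iou`, and `weight` are illustrative only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_tracklets(
    confidence: Sequence[Sequence[float]],
    start_boxes: Sequence[Sequence[Box]],
    end_boxes: Sequence[Sequence[Box]],
    weight: float = 1.0,
) -> List[int]:
    """Pick one candidate tracklet per interval.

    Assumed objective (not taken from the paper): maximize the sum of
    confidences plus `weight` times the IoU between the last box of each
    selected tracklet and the first box of the next selected tracklet.
    confidence[t][k] is the score of candidate k in interval t;
    start_boxes[t][k] and end_boxes[t][k] are its first and last boxes.
    """
    n = len(confidence)
    if n == 0:
        return []
    # best[t][k]: best total score of any path ending at candidate k of interval t
    best = [list(confidence[0])]
    back: List[List[int]] = []  # back[t-1][k]: best predecessor of candidate k
    for t in range(1, n):
        scores, pointers = [], []
        for k, conf in enumerate(confidence[t]):
            cands = [
                best[t - 1][j] + weight * iou(end_boxes[t - 1][j], start_boxes[t][k])
                for j in range(len(confidence[t - 1]))
            ]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            scores.append(cands[j_best] + conf)
            pointers.append(j_best)
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final candidate to recover the whole path.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

With `weight=0` the selection reduces to picking the most confident tracklet in each interval independently; a larger `weight` favors tubes whose consecutive tracklets overlap spatially, at the cost of per-interval confidence.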


Tracklet, Co-localization, Co-saliency, Co-detection, Video, Cats



This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office. This work is supported in part by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2015-T2-2-114.




Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Koteswar Rao Jerripothula (1, 2)
  • Jianfei Cai (2)
  • Junsong Yuan (3)
  1. Interdisciplinary Graduate School, Nanyang Technological University, Singapore, Singapore
  2. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  3. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
