Sequential Clique Optimization for Video Object Segmentation

  • Yeong Jun Koh
  • Young-Yoon Lee
  • Chang-Su Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)


This work proposes a novel algorithm to segment objects in a video sequence. First, we extract object instances in each frame. Then, we select a visually important object instance in each frame to construct the salient object track through the sequence. This selection can be formulated as finding the maximal weight clique in a complete k-partite graph, which is NP-hard. Therefore, we develop the sequential clique optimization (SCO) technique to efficiently determine the cliques corresponding to salient object tracks. We then convert these tracks into video object segmentation results. Experimental results show that the proposed algorithm significantly outperforms state-of-the-art video object segmentation and video salient object detection algorithms on recent benchmark datasets.
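
The clique formulation above can be illustrated with a small sketch. It is not the authors' SCO procedure, whose details are not given in this abstract. It is a plain coordinate-ascent heuristic that picks one instance per frame and re-selects each frame's instance in turn while the others stay fixed. The function names, the `(saliency, feature)` representation, and the similarity measure `1 - |f_a - f_b|` are all assumptions made for illustration.

```python
import itertools


def similarity(f_a, f_b):
    # Hypothetical edge weight: instances with close features are similar.
    return 1.0 - abs(f_a - f_b)


def track_weight(frames, selection):
    """Weight of the clique that picks frames[t][selection[t]] for every t."""
    picked = [frames[t][i] for t, i in enumerate(selection)]
    node_sum = sum(sal for sal, _ in picked)
    edge_sum = sum(similarity(a[1], b[1])
                   for a, b in itertools.combinations(picked, 2))
    return node_sum + edge_sum


def select_track(frames, max_sweeps=20):
    """Coordinate ascent: update one frame's choice at a time (a heuristic)."""
    # Start from the most salient instance in every frame.
    sel = [max(range(len(f)), key=lambda i, f=f: f[i][0]) for f in frames]
    for _ in range(max_sweeps):
        changed = False
        for t, frame in enumerate(frames):
            def gain(i):
                # Node weight plus edges to the current picks in other frames.
                sal, feat = frame[i]
                return sal + sum(similarity(feat, frames[s][sel[s]][1])
                                 for s in range(len(frames)) if s != t)
            best = max(range(len(frame)), key=gain)
            if gain(best) > gain(sel[t]):
                sel[t] = best
                changed = True
        if not changed:
            break  # no single-frame change improves the clique weight
    return sel


# Toy data (made up): each frame lists (saliency, feature) per instance.
frames = [
    [(0.9, 0.10), (0.5, 0.80)],
    [(0.4, 0.15), (0.6, 0.90)],
    [(0.7, 0.12), (0.3, 0.85)],
]
print(select_track(frames))  # → [0, 0, 0]
```

In this toy input, the most salient instance in the second frame does not match the other frames' instances, so the sweep replaces it with the consistent one. A heuristic of this kind can stop at a local optimum, which is why the result is checked against exhaustive search only on tiny inputs.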


Video object segmentation; Primary object segmentation; Salient object detection; Sequential clique optimization



This work was supported partly by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2018-2016-0-00464) supervised by the Institute for Information & communications Technology Promotion, and partly by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2015R1A2A1A10055037 and No. NRF-2018R1A2B3003896).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Electrical Engineering, Korea University, Seoul, Korea
  2. Samsung Electronics Co., Ltd., Seoul, Korea
