International Journal of Computer Vision, Volume 93, Issue 3, pp 273–292

Interactively Co-segmentating Topically Related Images with Intelligent Scribble Guidance

  • Dhruv Batra
  • Adarsh Kowdle
  • Devi Parikh
  • Jiebo Luo
  • Tsuhan Chen


We present an algorithm for interactive co-segmentation of a foreground object from a group of related images. While previous work in co-segmentation has focussed on the unsupervised setting, we build on successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what the foreground is, and then guide the output of the co-segmentation algorithm towards it via scribbles. Interestingly, keeping a user in the loop leads to simpler and highly parallelizable energy functions, allowing us to work with significantly more images per group. However, unlike in the interactive single-image setting, a user cannot be expected to exhaustively examine all cutouts (from tens of images) returned by the system to make corrections. Hence, we propose iCoseg, an automatic recommendation system that intelligently recommends where the user should scribble next. We introduce and make publicly available the largest co-segmentation dataset yet, the CMU-Cornell iCoseg dataset, with 38 groups, 643 images, and pixelwise hand-annotated groundtruth. Through machine experiments and real user studies with our developed interface, we show that iCoseg can intelligently recommend regions to scribble on, and that users following these recommendations can achieve good-quality cutouts with significantly less time and effort than exhaustively examining all cutouts.
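The loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes diagonal-Gaussian colour models fitted to the scribbled pixels, drops any pairwise smoothness term (so no graph cut is needed), and uses a simple label-ambiguity score as a stand-in for the recommendation criterion. All function names (`fit_color_model`, `unary_cost`, `segment_group`, `recommend_image`) are hypothetical.

```python
import numpy as np

def fit_color_model(pixels):
    """Fit a diagonal Gaussian (mean, variance) to an (N, 3) array of scribbled colours."""
    pixels = np.asarray(pixels, dtype=float)
    # Small constant keeps the variance strictly positive (assumption, not from the paper).
    return pixels.mean(axis=0), pixels.var(axis=0) + 1e-3

def unary_cost(image, model):
    """Per-pixel negative log-likelihood (up to a constant) under the colour model."""
    mean, var = model
    diff = image.astype(float) - mean
    return 0.5 * np.sum(diff ** 2 / var + np.log(var), axis=-1)

def segment_group(images, fg_pixels, bg_pixels):
    """Label every image with models shared across the group; also score ambiguity."""
    fg_model = fit_color_model(fg_pixels)
    bg_model = fit_color_model(bg_pixels)
    masks, scores = [], []
    for image in images:  # images are independent given the shared models
        cost_fg = unary_cost(image, fg_model)
        cost_bg = unary_cost(image, bg_model)
        masks.append(cost_fg < cost_bg)  # True where foreground is cheaper
        margin = np.abs(cost_fg - cost_bg)
        # Ambiguity is close to 1 where both labels cost about the same.
        scores.append(float(np.mean(np.exp(-margin))))
    return masks, scores

def recommend_image(scores):
    """Index of the image whose labelling is most ambiguous (illustrative criterion)."""
    return int(np.argmax(scores))

if __name__ == "__main__":
    fg = np.array([[200, 0, 0]] * 4)   # red foreground scribbles
    bg = np.array([[0, 0, 200]] * 4)   # blue background scribbles
    clear = np.zeros((2, 4, 3))
    clear[:, :2] = [200, 0, 0]
    clear[:, 2:] = [0, 0, 200]
    ambiguous = np.full((2, 4, 3), [100, 0, 100], dtype=float)
    masks, scores = segment_group([clear, ambiguous], fg, bg)
    print(masks[0])
    print("scribble next on image", recommend_image(scores))
```

Because the models are shared across the group and each image is then labelled independently, the loop over images could be run in parallel, which is the property the abstract highlights. A fuller version would add a smoothness term and a proper energy minimizer.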


Keywords: Interactive segmentation, Co-segmentation, Scribbles, Energy minimization





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Dhruv Batra (1, corresponding author)
  • Adarsh Kowdle (2)
  • Devi Parikh (3)
  • Jiebo Luo (4)
  • Tsuhan Chen (2)

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Cornell University, Ithaca, USA
  3. Toyota Technological Institute, Chicago, USA
  4. Eastman Kodak Company, Rochester, USA
