User-Centric Learning and Evaluation of Interactive Segmentation Systems


Many successful applications of computer vision to image or video manipulation are interactive by nature. However, parameters of such systems are often trained neglecting the user. Traditionally, interactive systems have been treated in the same manner as their fully automatic counterparts. Their performance is evaluated by computing the accuracy of their solutions under some fixed set of user interactions. In this paper, we study the problem of evaluating and learning interactive segmentation systems which are extensively used in the real world. The key questions in this context are how to measure (1) the effort associated with a user interaction, and (2) the quality of the segmentation result as perceived by the user. We conduct a user study to analyze user behavior and answer these questions. Using the insights obtained from these experiments, we propose a framework to evaluate and learn interactive segmentation systems which brings the user in the loop. The framework is based on the use of an active robot user—a simulated model of a human user. We show how this approach can be used to evaluate and learn parameters of state-of-the-art interactive segmentation systems. We also show how simulated user models can be integrated into the popular max-margin method for parameter learning and propose an algorithm to solve the resulting optimisation problem.
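The evaluation idea described above, a simulated user interacting with a segmentation system in a loop, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's method: `robot_stroke` is a hypothetical robot user that clicks the misclassified pixel nearest to the centroid of all errors, `toy_segmenter` is a trivial stand-in for a real system (pixels inside a brush disc take the user's label), and the error count per stroke is only one possible way to relate effort to quality.

```python
import numpy as np

def robot_stroke(truth, pred):
    """Hypothetical robot user: pick a misclassified pixel, or None if there is none."""
    errors = np.argwhere(truth != pred)
    if len(errors) == 0:
        return None
    centroid = errors.mean(axis=0)
    # Choose the wrong pixel nearest to the centroid of all wrong pixels.
    nearest = np.argmin(((errors - centroid) ** 2).sum(axis=1))
    return tuple(errors[nearest])

def toy_segmenter(init, truth, strokes, radius=2):
    """Stand-in system (assumption): pixels within `radius` of a stroke take the user's label."""
    pred = init.copy()
    rows, cols = np.indices(init.shape)
    for r, c in strokes:
        disc = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        pred[disc] = truth[disc]  # the robot user labels truthfully
    return pred

def evaluate(init, truth, max_strokes=20, radius=2):
    """Run the interaction loop; return the error count after 0, 1, 2, ... strokes."""
    strokes, pred = [], init.copy()
    curve = [int((pred != truth).sum())]
    for _ in range(max_strokes):
        stroke = robot_stroke(truth, pred)
        if stroke is None:
            break  # segmentation is already perfect
        strokes.append(stroke)
        pred = toy_segmenter(init, truth, strokes, radius)
        curve.append(int((pred != truth).sum()))
    return curve

# Toy data: a square object on an empty background, initial guess all background.
truth = np.zeros((16, 16), dtype=int)
truth[4:12, 4:12] = 1
init = np.zeros_like(truth)
curve = evaluate(init, truth)
```

The resulting `curve` records error against the number of brush strokes. Swapping in a real segmentation system or a different user model only requires replacing the two hypothetical functions.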


  1. E.g. ICCV 2007, NIPS 2009 and CVPR 2010.

  2. We will refer to each user interaction in this scenario as a brush stroke.

  3. projects/visionimagevideoediting/segmentation/grabcut.htm.

  4. The quality of the segmentation results is not affected by this down-scaling.

  5. This input is used for both comparison and parameter learning, e.g. in (Blake et al. 2004; Singaraju et al. 2009).

  6. We started the learning from no initial brushes and let it run for 60 brush strokes. The learned parameters were similar to those obtained when starting from 20 brushes.

  7. Note that one could do even better by looking ahead two or more brushes and then selecting the optimal one. However, the cost of finding the solution grows exponentially with the number of look-ahead steps.

  8. This behaviour is also observed in our experiments. Note that after each user interaction we obtain the global optimum of our current energy, and that the energy changes with each user interaction.

  9.

  10. This is number-of-data-point-fold cross validation, i.e. leave-one-out cross validation.

  11. However, compared to an exhaustive search over all possible joint settings of the parameters, we are not guaranteed to find the global optimum of the objective function.

  12. Note that the high uncertainty of the “tight trimap” learning indicates that this value cannot be trusted very much.

  13. We write images of size \((n_{x}\times n_{y}\times n_{c})\) as vectors \(\in\mathbb{R}^{n},\:n=n_{x} n_{y} n_{c}\) for simplicity. All involved operations respect the 2d grid structure absent in general n-vectors.

  14. We use the Hamming loss \(\Delta_{H}(y^{*},y^{k})=\mathbf{1}^{\top}|y^{k}-y^{*}|\).

  15. It is in fact the most informative feature with corresponding predictor given by the identity.

  16. To our knowledge, there is no simple graph-cut-like algorithm to do the minimisation in U all at once.

  17. The cost is K runs of dynamic graph cuts of size n, though.

  18. In the end, we can only safely flip a single pixel \(u_{i}^{k}\) at a time to guarantee descent.

  19. We did not fix \(w_{u}\) to 1, as before, to give the system the freedom to set it to 0.
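As a small illustration of footnotes 13 and 14, the sketch below flattens label images into vectors and computes the Hamming loss as the number of pixels on which two binary labelings disagree. The function names `vectorise` and `hamming_loss` and the toy arrays are assumptions introduced for this example. They are not taken from the paper.

```python
import numpy as np

def vectorise(image):
    """Flatten an (n_x, n_y, n_c) array into a vector of length n_x * n_y * n_c."""
    return np.asarray(image).reshape(-1)

def hamming_loss(y_true, y_pred):
    """Count the entries at which two binary labelings disagree."""
    y_true = vectorise(y_true).astype(int)
    y_pred = vectorise(y_pred).astype(int)
    if y_true.shape != y_pred.shape:
        raise ValueError("labelings must have the same number of pixels")
    # For 0/1 labels, the sum of absolute differences equals the mismatch count.
    return int(np.abs(y_pred - y_true).sum())

# Toy 2 x 3 labelings (hypothetical data): they differ at two pixels.
y_true = np.array([[0, 1, 1],
                   [0, 1, 0]])
y_pred = np.array([[0, 1, 0],
                   [1, 1, 0]])
loss = hamming_loss(y_true, y_pred)
```

Because the loss is a plain sum over pixels, it decomposes per pixel, which is one reason it is a common choice in max-margin learning of segmentation models.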


  1. (2010). Amazon Mechanical Turk.

  2. Bai, X., & Sapiro, G. (2007). A geodesic framework for fast interactive image and video segmentation and matting. In ICCV.

  3. Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: interactive co-segmentation with intelligent scribble guidance. In CVPR.

  4. Blake, A., Rother, C., Brown, M., Perez, P., & Torr, P. (2004). Interactive image segmentation using an adaptive GMMRF model. In ECCV.

  5. Blake, A., Kohli, P., & Rother, C. (2011). Markov random fields for vision and image processing. Cambridge: MIT Press.

  6. Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV.

  7. Duchenne, O., Audibert, J. Y., Keriven, R., Ponce, J., & Ségonne, F. (2008). Segmentation by transduction. In CVPR.

  8. Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009).

  9. Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In ICML.

  10. Grady, L. (2006). Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1–17.

  11. Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In CVPR.

  12. Kohli, P., Ladicky, L., & Torr, P. (2008). Robust higher order potentials for enforcing label consistency. In CVPR.

  13. Kohli, P., & Torr, P. (2005). Efficiently solving dynamic MRFs using graph cuts. In ICCV.

  14. Li, Y., Sun, J., Tang, C. K., & Shum, H. Y. (2004). Lazy snapping. In SIGGRAPH (Vol. 23).

  15. Liu, J., Sun, J., & Shum, H. Y. (2009). Paint selection. In SIGGRAPH.

  16. McGuinness, K., & O’Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition, 43(2), 434–444.

  17. McGuinness, K., & O’Connor, N. E. (2011). Toward automated evaluation of interactive segmentation. In CVIU.

  18. Mortensen, E. N., & Barrett, W. A. (1998). Interactive segmentation with intelligent scissors. In Graphical models and image processing.

  19. Nickisch, H., Kohli, P., & Rother, C. (2009). Learning an interactive segmentation system (Tech. rep.).

  20. Nickisch, H., Rother, C., Kohli, P., & Rhemann, C. (2010). Learning and evaluating interactive segmentation systems. In ICVGIP.

  21. Nowozin, S., & Lampert, C. H. (2009). Global connectivity potentials for random field models. In CVPR.

  22. Rother, C., Bordeaux, L., Hamadi, Y., & Blake, A. (2006). Autocollage. ACM Transactions on Graphics, 25(3), 847–852.

  23. Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”—interactive foreground extraction using iterated graph cuts. In SIGGRAPH.

  24. Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  25. Singaraju, D., Grady, L., & Vidal, R. (2009). P-brush: Continuous valued MRFs with normed pairwise distributions for image segmentation. In CVPR.

  26. Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Internet vision workshop at CVPR.

  27. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., & Rother, C. (2006). A comparative study of energy minimization methods for Markov random fields. In ECCV.

  28. Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning CRFs using graph cuts. In ECCV.

  29. Taskar, B., Chatalbashev, V., & Koller, D. (2004). Learning associative Markov networks. In ICML.

  30. Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector learning for interdependent and structured output spaces. In ICML.

  31. Vicente, S., Kolmogorov, V., & Rother, C. (2008). Graph cut based image segmentation with connectivity priors. In CVPR.

  32. Vijayanarasimhan, S., & Grauman, K. (2009). What’s it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In CVPR (pp. 2262–2269).

  33. Vijayanarasimhan, S., & Grauman, K. (2011a). Cost-sensitive active visual category learning. International Journal of Computer Vision, 91(1), 24–44.

  34. Vijayanarasimhan, S., & Grauman, K. (2011b). Large-scale live active learning: Training object detectors with crawled data and crowds. In CVPR.

  35. von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In SIGCHI (pp. 319–326).

  36. Wasserman, L. (2004). All of statistics. Berlin: Springer.



Christoph Rhemann was supported by the Vienna Science and Technology Fund (WWTF) under project ICT08-019.

Author information



Corresponding author

Correspondence to Pushmeet Kohli.

About this article

Cite this article

Kohli, P., Nickisch, H., Rother, C. et al. User-Centric Learning and Evaluation of Interactive Segmentation Systems. Int J Comput Vis 100, 261–274 (2012).


  • Interactive systems
  • Image segmentation
  • Learning