User-Centric Learning and Evaluation of Interactive Segmentation Systems

International Journal of Computer Vision

Abstract

Many successful applications of computer vision to image or video manipulation are interactive by nature. However, the parameters of such systems are often trained while neglecting the user. Traditionally, interactive systems have been treated in the same manner as their fully automatic counterparts: their performance is evaluated by computing the accuracy of their solutions under some fixed set of user interactions. In this paper, we study the problem of evaluating and learning interactive segmentation systems, which are extensively used in the real world. The key questions in this context are how to measure (1) the effort associated with a user interaction, and (2) the quality of the segmentation result as perceived by the user. We conduct a user study to analyze user behavior and answer these questions. Using the insights obtained from these experiments, we propose a framework to evaluate and learn interactive segmentation systems which brings the user into the loop. The framework is based on the use of an active robot user: a simulated model of a human user. We show how this approach can be used to evaluate and learn parameters of state-of-the-art interactive segmentation systems. We also show how simulated user models can be integrated into the popular max-margin method for parameter learning and propose an algorithm to solve the resulting optimisation problem.
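
The evaluation idea in the abstract can be made concrete as a simple loop. Below is a minimal sketch in Python, under assumptions of ours rather than the authors' exact implementation: `segment` stands in for the interactive system under test (e.g. a graph-cut based segmenter), and the brushing policy (a circular stroke at the most interior point of the largest mislabelled region) is one plausible robot-user model.

```python
import numpy as np
from scipy import ndimage

def robot_user_evaluation(segment, image, ground_truth,
                          max_strokes=20, brush_radius=4):
    """Hypothetical robot-user loop: repeatedly re-segment, then brush
    the largest error region, recording accuracy per interaction.
    `segment(image, fg_seeds, bg_seeds) -> bool mask` is assumed."""
    fg_seeds = np.zeros(ground_truth.shape, dtype=bool)
    bg_seeds = np.zeros(ground_truth.shape, dtype=bool)
    accuracies = []
    for _ in range(max_strokes):
        seg = segment(image, fg_seeds, bg_seeds)
        accuracies.append(float((seg == ground_truth).mean()))
        errors = seg != ground_truth
        if not errors.any():
            break  # perfect segmentation reached
        # Find the largest connected region of wrongly labelled pixels.
        labels, n = ndimage.label(errors)
        sizes = np.bincount(labels.ravel())[1:]
        region = labels == (int(np.argmax(sizes)) + 1)
        # Brush at the region's most interior point (one user model).
        dist = ndimage.distance_transform_edt(region)
        cy, cx = np.unravel_index(int(np.argmax(dist)), dist.shape)
        yy, xx = np.ogrid[:region.shape[0], :region.shape[1]]
        brush = ((yy - cy) ** 2 + (xx - cx) ** 2 <= brush_radius ** 2) & region
        if ground_truth[cy, cx]:
            fg_seeds |= brush
        else:
            bg_seeds |= brush
    return accuracies
```

Plotting `accuracies` against stroke count yields effort-versus-quality curves of the kind the paper uses to compare systems.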


Notes

  1. E.g. ICCV 2007, NIPS 2009 and CVPR 2010.

  2. We will refer to each user interaction in this scenario as a brush stroke.

  3. http://research.microsoft.com/en-us/um/cambridge/projects/visionimagevideoediting/segmentation/grabcut.htm.

  4. The quality of the segmentation results is not affected by this down-scaling.

  5. This input is used for both comparison and parameter learning, e.g. in (Blake et al. 2004; Singaraju et al. 2009).

  6. We started the learning from no initial brushes and let it run for 60 brush strokes. The learned parameters were similar to those obtained when starting from 20 brushes.

  7. Note that one could do even better by considering sequences of two or more brushes and then selecting the optimal one. However, the number of candidate solutions grows exponentially with the number of look-ahead steps; a sketch of the single-step variant appears after these notes.

  8. This behaviour is also observed in our experiments. Note that after each user interaction we obtain the global optimum of the current energy, and that the energy itself changes with each user interaction.

  9. http://www.robots.ox.ac.uk/~vgg/research/iseg/.

  10. This is leave-one-out cross-validation, i.e. k-fold cross-validation with k equal to the number of data points.

  11. Unlike an exhaustive search over all possible joint settings of the parameters, however, we are not guaranteed to find the global optimum of the objective function.

  12. Note that the high uncertainty of the “tight trimap” learning indicates that this value cannot be trusted very much.

  13. For simplicity, we write images of size \(n_{x}\times n_{y}\times n_{c}\) as vectors \(\in\mathbb{R}^{n}\), \(n=n_{x}n_{y}n_{c}\). All involved operations respect the 2d grid structure that is absent in general n-vectors.

  14. We use the Hamming loss \(\Delta_{H}(y,y^{k})=\mathbf{1}^{\top}|y^{k}-y|\).

  15. It is in fact the most informative feature with corresponding predictor given by the identity.

  16. To our knowledge, there is no simple graph-cut-like algorithm to do the minimisation in U all at once.

  17. The cost, though, is K runs of dynamic graph cuts of size n.

  18. In the end, we can only safely flip a single pixel \(u_{i}^{k}\) at a time to guarantee descent.

  19. We did not fix \(w_{u}\) to 1, as before, in order to give the system the freedom to set it to 0.
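
Notes 7, 14 and 17 together describe how a robot user can pick its next stroke. The sketch below, under the same assumptions as the earlier sketch (a hypothetical `segment` callable and stroke representation, not the authors' API), shows the single-step look-ahead of note 7 scored by the Hamming loss of note 14.

```python
import numpy as np

def hamming_loss(y, y_k):
    # Note 14: Delta_H(y, y^k) = 1^T |y^k - y|, i.e. the number of
    # pixels on which the two binary labellings disagree.
    return int(np.abs(y_k.astype(int) - y.astype(int)).sum())

def greedy_brush(segment, image, seeds, candidate_strokes, ground_truth):
    # One-step look-ahead (note 7): simulate each candidate stroke,
    # re-segment, and keep the stroke with the smallest Hamming loss.
    # Looking k strokes ahead would cost |candidates|**k segmentations,
    # which is why a single step is used. Per note 17, dynamic graph
    # cuts make the repeated re-segmentations comparatively cheap.
    best_stroke, best_loss = None, float('inf')
    for stroke in candidate_strokes:
        seg = segment(image, seeds + [stroke])
        loss = hamming_loss(ground_truth, seg)
        if loss < best_loss:
            best_stroke, best_loss = stroke, loss
    return best_stroke
```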

References

  • amazon.com (2010). Amazon Mechanical Turk. https://www.mturk.com

  • Bai, X., & Sapiro, G. (2007). A geodesic framework for fast interactive image and video segmentation and matting. In ICCV.

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: interactive co-segmentation with intelligent scribble guidance. In CVPR.

  • Blake, A., Rother, C., Brown, M., Perez, P., & Torr, P. (2004). Interactive image segmentation using an adaptive GMMRF model. In ECCV.

  • Blake, A., Kohli, P., & Rother, C. (2011). Markov random fields for vision and image processing. Cambridge: MIT Press.

  • Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV.

  • Duchenne, O., Audibert, J. Y., Keriven, R., Ponce, J., & Ségonne, F. (2008). Segmentation by transduction. In CVPR.

  • Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). http://www.pascal-network.org/challenges/VOC

  • Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In ICML.

  • Grady, L. (2006). Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1–17.

  • Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In CVPR.

  • Kohli, P., Ladicky, L., & Torr, P. (2008). Robust higher order potentials for enforcing label consistency. In CVPR.

  • Kohli, P., & Torr, P. (2005). Efficiently solving dynamic MRFs using graph cuts. In ICCV.

  • Li, Y., Sun, J., Tang, C. K., & Shum, H. Y. (2004). Lazy snapping. In SIGGRAPH (Vol. 23).

  • Liu, J., Sun, J., & Shum, H. Y. (2009). Paint selection. In SIGGRAPH.

  • McGuinness, K., & O’Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition, 43(2), 434–444.

  • McGuinness, K., & O’Connor, N. E. (2011). Toward automated evaluation of interactive segmentation. In CVIU.

  • Mortensen, E. N., & Barrett, W. A. (1998). Interactive segmentation with intelligent scissors. In Graphical models and image processing.

  • Nickisch, H., Kohli, P., & Rother, C. (2009). Learning an interactive segmentation system (Tech. rep.). http://arxiv.org/abs/0912.2492

  • Nickisch, H., Rother, C., Kohli, P., & Rhemann, C. (2010). Learning and evaluating interactive segmentation systems. In ICVGIP.

  • Nowozin, S., & Lampert, C. H. (2009). Global connectivity potentials for random field models. In CVPR.

  • Rother, C., Bordeaux, L., Hamadi, Y., & Blake, A. (2006). AutoCollage. ACM Transactions on Graphics, 25(3), 847–852.

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”—interactive foreground extraction using iterated graph cuts. In SIGGRAPH.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  • Singaraju, D., Grady, L., & Vidal, R. (2009). P-brush: Continuous valued MRFs with normed pairwise distributions for image segmentation. In CVPR.

  • Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Internet vision workshop at CVPR.

  • Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., & Rother, C. (2006). A comparative study of energy minimization methods for Markov random fields. In ECCV.

  • Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning CRFs using graph cuts. In ECCV.

  • Taskar, B., Chatalbashev, V., & Koller, D. (2004). Learning associative Markov networks. In ICML.

  • Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector learning for interdependent and structured output spaces. In ICML.

  • Vicente, S., Kolmogorov, V., & Rother, C. (2008). Graph cut based image segmentation with connectivity priors. In CVPR.

  • Vijayanarasimhan, S., & Grauman, K. (2009). What’s it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In CVPR (pp. 2262–2269).

  • Vijayanarasimhan, S., & Grauman, K. (2011a). Cost-sensitive active visual category learning. International Journal of Computer Vision, 91(1), 24–44.

  • Vijayanarasimhan, S., & Grauman, K. (2011b). Large-scale live active learning: Training object detectors with crawled data and crowds. In CVPR.

  • von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In SIGCHI (pp. 319–326).

  • Wasserman, L. (2004). All of statistics. Berlin: Springer.


Acknowledgement

Christoph Rhemann was supported by the Vienna Science and Technology Fund (WWTF) under project ICT08-019.

Author information

Correspondence to Pushmeet Kohli.


Cite this article

Kohli, P., Nickisch, H., Rother, C. et al. User-Centric Learning and Evaluation of Interactive Segmentation Systems. Int J Comput Vis 100, 261–274 (2012). https://doi.org/10.1007/s11263-012-0537-4
