Abstract
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods. The framework is based on a prediction module that estimates the quality of given algorithm-drawn segmentations. We demonstrate the value of the framework for two novel tasks related to predicting how to distribute annotation efforts between algorithms and humans. Specifically, we develop two systems that automatically decide, for a batch of images, when to recruit humans versus computers to create (1) coarse segmentations required to initialize segmentation tools and (2) final, fine-grained segmentations. Experiments demonstrate the advantage of relying on a mix of human and computer efforts over relying on either resource alone for segmenting objects in images coming from three diverse modalities (visible, phase contrast microscopy, and fluorescence microscopy).
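The allocation idea in the abstract can be sketched in a few lines. The function name and scoring interface below are illustrative assumptions, not the paper's implementation: given the prediction module's quality estimate for each algorithm-drawn segmentation, the fixed budget of human annotations is spent on the images predicted to be segmented worst, and the algorithm's output is kept for the rest.

```python
def allocate_annotation_budget(predicted_quality, budget):
    """Decide which images to route to human annotators.

    predicted_quality: one quality score per image, estimating how good
        the algorithm-drawn segmentation is (higher = better).
    budget: number of human annotations available for the batch.

    Returns the set of image indices assigned to humans; all other
    images keep their algorithm-drawn segmentations.
    """
    # Rank images from lowest to highest predicted segmentation quality.
    ranked = sorted(range(len(predicted_quality)),
                    key=lambda i: predicted_quality[i])
    # Spend the human budget on the predicted-worst segmentations.
    return set(ranked[:budget])


# Example: a batch of five images and a budget of two human annotations.
scores = [0.91, 0.42, 0.78, 0.30, 0.85]
human_set = allocate_annotation_budget(scores, budget=2)
# Images 1 and 3 (the lowest-scoring) are routed to humans.
```

The same ranking step applies to both tasks in the paper: deciding which images need human-drawn coarse initializations, and deciding which need human-drawn final segmentations.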
Notes
To ensure each dataset contributes a similar number of examples, we randomly sample 2000 segmentations from the MSRA10K dataset.
For the one dataset large enough to train a deep model, MSRA10K, we find that fine-tuning off-the-shelf CNNs (namely, AlexNet, VGG, and ResNet) performs similarly to or worse than the other models tested in our experiments, including those using frozen CNN features without fine-tuning. This suggests that the proposed features are well matched to the target task.
References
Alpert, S., Galun, M., Basri, R., & Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916.
Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 328–335).
Ballard, D. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2), 111–122.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3169–3176).
Bernard, O., Friboulet, D., Thevenaz, P., & Unser, M. (2009). Variational B-spline level-set: A linear filtering approach for fast, deformable model evolution. IEEE Transactions on Image Processing, 18(6), 1179–1191.
Biswas, A., & Parikh, D. (2013). Simultaneous active learning of classifiers & attributes via relative feedback. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 644–651).
Branson, S., Grant, V. H., Wah, C., Perona, P., & Belongie, S. (2014). The ignorant led by the blind: A hybrid human-machine vision system for fine-grained categorization. International Journal of Computer Vision, 108, 3–29.
Carlier, A., Charvillat, V., Salvador, A., Giró-i-Nieto, X., & Marques, O. (2014). Click’n’Cut: Crowdsourced interactive segmentation with object candidates. In International ACM workshop on crowdsourcing for multimedia (pp. 53–56).
Carreira, J., & Sminchisescu, C. (2010). Constrained parametric min-cuts for automatic object segmentation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3241–3248).
Caselles, V., Kimmel, R., & Sapiro, G. (1997). Geodesic active contours. International Journal of Computer Vision, 22(1), 61–79.
Chan, T., & Vese, L. (2001). Active contours without edges. IEEE Transactions on Image Processing, 10(2), 266–277.
Cheng, M., Mitra, N. J., Huang, X., Torr, P. H. S., & Hu, S. (2014). Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 569–582.
Chittajallu, D. R., Florian, S., Kohler, R. H., Iwamoto, Y., Orth, J. D., Weissleder, R., et al. (2015). In vivo cell-cycle profiling in xenograft tumors by quantitative intravital microscopy. Nature Methods, 12(6), 577–585.
Cui, J., Yang, Q., Wen, F., Wu, Q., Zhang, C., Gool, L. V., & Tang, X. (2008). Transductive object cutout. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Endres, I., & Hoiem, D. (2010). Category independent object proposals. In European conference on computer vision (ECCV) (pp. 575–588).
Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Glenn, D. R., Lee, K., Park, H., Weissleder, R., Yacoby, A., Lukin, M. D., et al. (2015). Single-cell magnetic imaging using a quantum diamond microscope. Nature Methods, 12, 736–738.
Grady, L., Jolly, M. P., & Seitz, A. (2011). Segmentation from a box. In IEEE international conference on computer vision (ICCV) (pp. 367–374).
Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3129–3136).
Gurari, D., He, K., Xiong, B., Zhang, J., Sameki, M., Jain, S. D., et al. (2018). Predicting foreground object ambiguity and efficiently crowdsourcing the segmentation(s). International Journal of Computer Vision (IJCV), 126, 714–730.
Gurari, D., Jain, S. D., Betke, M., & Grauman, K. (2016). Pull the plug? predicting if computers or humans should segment images. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 382–391).
Gurari, D., Theriault, D., Sameki, M., & Betke, M. (2014). How to use level set methods to accurately find boundaries of cells in biomedical images? Evaluation of six methods paired with automated and crowdsourced initial contours. In Conference on medical image computing and computer assisted intervention (MICCAI): Interactive medical image computation (IMIC) workshop (pp. 9).
Gurari, D., Theriault, D., Sameki, M., Isenberg, B., Pham, T. A., Purwada, A., Solski, P., Walker, M., Zhang, C., Wong, J. Y., & Betke, M. (2015). How to collect segmentations for biomedical images? A benchmark evaluating the performance of experts, crowdsourced non-experts, and algorithms. In IEEE winter conference on applications in computer vision (WACV) (pp. 8).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (pp. 770–778).
Jain, S. D., & Grauman, K. (2013). Predicting sufficient annotation strength for interactive foreground segmentation. In IEEE international conference on computer vision (ICCV) (pp. 1313–1320).
Jain, S. D., Xiong, B., & Grauman, K. (2017). FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 1).
Kohlberger, T., Singh, V., Alvino, C., Bahlmann, C., & Grady, L. (2012). Evaluating segmentation error without ground truth. In Medical image computing and computer assisted intervention (MICCAI) (pp. 528–536).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS) (pp. 1097–1105).
Lankton, S., & Tannenbaum, A. (2008). Localizing region-based active contours. IEEE Transactions on Image Processing, 17(11), 2029–2039.
Lempitsky, V., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior. In IEEE international conference on computer vision (ICCV) (pp. 277–284).
Li, C., Kao, C. Y., Gore, J. C., & Ding, Z. (2008). Minimization of region-scalable fitting energy for image segmentation. IEEE Transactions on Image Processing, 17(10), 1940–1949.
Li, H., Meng, F., Luo, B., & Zhu, S. (2014). Repairing bad co-segmentation using its quality evaluation and segment propagation. IEEE Transactions on Image Processing, 23(8), 3545–3559.
Liu, D., Xiong, Y., Pulli, K., & Shapiro, L. (2011). Estimating image segmentation difficulty. In Machine learning and data mining in pattern recognition (pp. 484–495).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.
Maitra, M., Gupta, R. K., & Mukherjee, M. (2012). Detection and counting of red blood cells in blood cell images using Hough transform. International Journal of Computer Applications, 53(16), 18–22.
Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International conference on computer vision (ICCV) (Vol. 2, pp. 416–423).
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66.
Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.
Settles, B. (2010). Active learning literature survey. Technical report, University of Wisconsin, Madison.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Vijayanarasimhan, S., & Grauman, K. (2011). Cost-sensitive active visual category learning. International Journal of Computer Vision, 91, 24–44.
Wah, C., Maji, S., & Belongie, S. (2015). Learning localized perceptual similarity metrics for interactive categorization. In IEEE Winter conference on applications in computer vision (WACV) (pp. 502–509).
Wu, J., Zhao, Y., Zhu, J., Luo, S., & Tu, Z. (2014). MILCut: A sweeping line multiple instance learning paradigm for interactive image segmentation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 256–263).
Acknowledgements
The authors thank the anonymous crowd workers for participating in our experiments. This work is supported in part by National Science Foundation funding to DG (IIS-1755593), a gift from Adobe to DG, National Science Foundation funding to MB (IIS-1421943), a Google Faculty Award to MB, AWS Machine Learning Research Award to KG, IBM Faculty Award to KG, IBM Open Collaborative Research Award to KG, and a gift from Qualcomm to KG.
Additional information
Communicated by Gang Hua.
Cite this article
Gurari, D., Zhao, Y., Jain, S.D. et al. Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch. Int J Comput Vis 127, 1198–1216 (2019). https://doi.org/10.1007/s11263-019-01172-6