
Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

International Journal of Computer Vision

Abstract

Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far such techniques have been tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively” obtained are in fact already known, and/or the crowd-sourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashing-based solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show it successfully improves the state of the art for the most challenging objects in the PASCAL VOC benchmark. In addition, we show our detector competes well with popular nonlinear classifiers that are much more expensive to train.
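
The live-learning loop the abstract describes (crawl unlabeled images, select the detector's most uncertain candidates, request crowd-sourced annotations, retrain) can be summarized in rough code. The sketch below is purely illustrative: the callables passed in are hypothetical placeholders, not the authors' actual interfaces.

```python
from typing import Any, Callable, Iterable, List

def live_learning_round(
    detector: Any,
    crawl: Callable[[], Iterable[Any]],                               # unlabeled Web crawl for a keyword
    select_uncertain: Callable[[Any, Iterable[Any], int], List[Any]], # hashing-based uncertainty selection
    annotate: Callable[[List[Any]], List[Any]],                       # crowd-sourced labeling
    retrain: Callable[[Any, List[Any]], Any],                         # update the linear, part-based detector
    budget: int = 100,
) -> Any:
    """One round of live learning (hypothetical interfaces, for illustration only)."""
    pool = crawl()                                      # 1. gather an unlabeled candidate pool
    queries = select_uncertain(detector, pool, budget)  # 2. pick the windows the model is least sure about
    labels = annotate(queries)                          # 3. obtain annotations from the crowd
    return retrain(detector, labels)                    # 4. refine the detector; the caller repeats the loop
```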

Notes

  1. We use Locality-constrained Linear Coding (LLC) by Wang et al. (2010) to obtain the sparse coding, though other algorithms could also be used for this step. A minimal sketch of this coding step follows these notes.

  2. Hyperplane hashes can be used with existing approximate near-neighbor search algorithms; we use the formulation by Charikar (2002), which provides a guarantee on the probability with which the nearest neighbor is returned. A sketch of the hashing construction also follows these notes.
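
For Note 1, the snippet below sketches the approximated LLC coding of a single local descriptor as described by Wang et al. (2010): restrict the code to the k nearest codebook bases and solve a small constrained least-squares problem. The codebook is assumed to be given, and the parameter values are illustrative, not the ones used in the paper.

```python
import numpy as np

def llc_code(x, codebook, knn=5, beta=1e-4):
    """Approximate LLC code for one descriptor x (d,) given a codebook (M, d)."""
    # Locality: keep only the knn nearest codebook bases.
    idx = np.argsort(np.linalg.norm(codebook - x, axis=1))[:knn]
    B = codebook[idx]                        # (knn, d) local bases

    # Solve min ||x - c^T B||^2 subject to sum(c) = 1 (analytic solution).
    z = B - x                                # shift bases to the descriptor
    C = z @ z.T                              # local covariance (knn, knn)
    C += beta * np.trace(C) * np.eye(knn)    # regularize for numerical stability
    c = np.linalg.solve(C, np.ones(knn))
    c /= c.sum()                             # enforce the sum-to-one constraint

    code = np.zeros(len(codebook))           # sparse code over the full codebook
    code[idx] = c
    return code
```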

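For Note 2, the snippet below sketches how the random-hyperplane (sign) hash of Charikar (2002) can be adapted to hyperplane-to-point search in the style of Jain et al. (2010): a sign flip on half of the query's hash bits makes database points that lie near the classifier's decision boundary (the most uncertain ones) the most likely to collide with it. The single-table lookup and all sizes are illustrative; a practical index would use several shorter hash tables.

```python
import numpy as np

def sign_bits(x, R):
    """Charikar (2002) random-hyperplane hash: one bit per random direction r."""
    return (R @ x >= 0).astype(np.uint8)

def hash_point(z, U, V):
    """Hash of a database point z using paired random directions (U, V)."""
    return np.concatenate([sign_bits(z, U), sign_bits(z, V)])

def hash_query(w, U, V):
    """Hash of a query hyperplane with normal w; flipping the sign on the second
    half maximizes the collision probability for points nearly perpendicular to w,
    i.e., points close to the decision boundary."""
    return np.concatenate([sign_bits(w, U), sign_bits(-w, V)])

# Toy usage: index unit-norm points, then probe with the current classifier normal.
rng = np.random.default_rng(0)
d, bits, n = 64, 4, 5000
U, V = rng.standard_normal((bits, d)), rng.standard_normal((bits, d))
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

buckets = {}
for i, z in enumerate(X):
    buckets.setdefault(hash_point(z, U, V).tobytes(), []).append(i)

w = rng.standard_normal(d)                     # linear classifier's hyperplane normal
w /= np.linalg.norm(w)
candidates = buckets.get(hash_query(w, U, V).tobytes(), [])   # near-boundary candidates
```
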
References

  • Boureau, Y.-L., Bach, F., LeCun, Y., Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In Symposium on Theory of Computing.

  • Chum, O., Zisserman, A. (2007). An exemplar model for learning object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Dalal, N., Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., Yagnik, J. (2013). Fast, accurate detection of 100,000 object classes on a single machine. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A. (2005). Learning object categories from Google’s image search. In Proceedings of the International Conference on Computer Vision (ICCV).

  • Jain, P., Vijayanarasimhan, S., Grauman, K. (2010). Hashing hyperplane queries to near points with applications to large-scale active learning. In Advances in Neural Information Processing Systems (NIPS).

  • Joachims, T. (2006). Training linear SVMs in linear time. In International Conference on Knowledge Discovery and Data Mining (KDD).

  • Joshi, A., Porikli, F., Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Kapoor, A., Grauman, K., Urtasun, R., Darrell, T. (2007). Active learning with Gaussian processes for object categorization. In International Conference on Computer Vision (ICCV).

  • Lampert, C., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Lee, Y. J., Grauman, K. (2010). Object-graphs for context-aware category discovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Li, L.-J., Wang, G., & Fei-Fei, L. (2007). OPTIMOL: Automatic online picture collection via incremental model learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Pirsiavash, H., Ramanan, D. (2012). Steerable part models. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Qi, G., Hua, X., Rui, Y., Tang, J., Zhang, H. (2008). Two-dimensional active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2007). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  • Siddiquie, B., & Gupta, A. (2010). Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Song, H., Zickler, S., Althoff, T., Girshick, R., Fritz, M., Geyer, C., Felzenszwalb, P., Darrell, T. (2012). Sparselet models for efficient multiclass object detection. In Proceedings of the European Conference on Computer Vision.

  • Sorokin, A., Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Workshop on Internet Vision.

  • Tong, S., Koller, D. (2000). Support vector machine active learning with applications to text classification. In Proceedings of the International Conference on Machine Learning (ICML).

  • Torralba, A., Murphy, K., & Freeman, W. (2007). Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 854–869.

  • Uijlings, J., Smeulders, A., Scha, R. (2009). What is the spatial extent of an object? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A. (2009). Multiple kernels for object detection. In International Conference on Computer Vision (ICCV).

  • Vijayanarasimhan, S., & Grauman, K. (2008). Keywords to visual categories: Multiple-instance learning for weakly supervised object categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., & Grauman, K. (2011). Large-scale live active learning: Training object detectors with crawled data and crowds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., Grauman, K. (2008). Multi-level active prediction of useful image annotations for recognition. In Advances in Neural Information Processing Systems (NIPS).

  • Vijayanarasimhan, S., Kapoor, A. (2010). Visual recognition and detection under bounded computational resources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., Jain, P., & Grauman, K. (2014). Hashing hyperplane queries to near points with applications to large-scale active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 276–288.

  • Viola, P., Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • von Ahn, L., Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).

  • Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Welinder, P., & Perona, P. (2010). Online crowdsourcing: Rating annotators and obtaining cost-effective labels. In Workshop on Advancing Computer Vision with Humans in the Loop (ACVHL).

  • Yang, J., Yu, K., Gong, Y., Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Acknowledgments

The authors thank the anonymous reviewers for their helpful comments. This research is supported in part by NSF CAREER IIS-0747356 and DARPA Mind’s Eye.

Author information

Corresponding author

Correspondence to Kristen Grauman.

Additional information

Communicated by Martial Hebert.

Cite this article

Vijayanarasimhan, S., Grauman, K. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. Int J Comput Vis 108, 97–114 (2014). https://doi.org/10.1007/s11263-014-0721-9
