
Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

International Journal of Computer Vision

Abstract

Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far such techniques have been tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively” obtained are in fact already known, and/or the crowd-sourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashing-based solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show it successfully improves the state of the art for the most challenging objects in the PASCAL VOC benchmark. In addition, we show our detector competes well with popular nonlinear classifiers that are much more expensive to train.
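
The live-learning loop the abstract describes (crawl unlabeled images, select the detector's most uncertain candidates, request crowd-sourced annotations, retrain) can be summarized in rough code. The sketch below is purely illustrative: the callables passed in are hypothetical placeholders, not the authors' actual interfaces.

```python
from typing import Any, Callable, Iterable, List

def live_learning_round(
    detector: Any,
    crawl: Callable[[], Iterable[Any]],                               # unlabeled Web crawl for a keyword
    select_uncertain: Callable[[Any, Iterable[Any], int], List[Any]], # hashing-based uncertainty selection
    annotate: Callable[[List[Any]], List[Any]],                       # crowd-sourced labeling
    retrain: Callable[[Any, List[Any]], Any],                         # update the linear, part-based detector
    budget: int = 100,
) -> Any:
    """One round of live learning (hypothetical interfaces, for illustration only)."""
    pool = crawl()                                      # 1. gather an unlabeled candidate pool
    queries = select_uncertain(detector, pool, budget)  # 2. pick the windows the model is least sure about
    labels = annotate(queries)                          # 3. obtain annotations from the crowd
    return retrain(detector, labels)                    # 4. refine the detector; the caller repeats the loop
```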

Notes

  1. We use Locality-constrained Linear Coding (LLC) by Wang et al. (2010) to obtain the sparse coding, though other algorithms could also be used for this step. A minimal sketch of this coding step follows these notes.

  2. Hyperplane hashes can be used with existing approximate near-neighbor search algorithms; we use the formulation by Charikar (2002), which provides a guarantee on the probability with which the nearest neighbor is returned. A sketch of the hashing construction also follows these notes.
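
For Note 1, the snippet below sketches the approximated LLC coding of a single local descriptor as described by Wang et al. (2010): restrict the code to the k nearest codebook bases and solve a small constrained least-squares problem. The codebook is assumed to be given, and the parameter values are illustrative, not the ones used in the paper.

```python
import numpy as np

def llc_code(x, codebook, knn=5, beta=1e-4):
    """Approximate LLC code for one descriptor x (d,) given a codebook (M, d)."""
    # Locality: keep only the knn nearest codebook bases.
    idx = np.argsort(np.linalg.norm(codebook - x, axis=1))[:knn]
    B = codebook[idx]                        # (knn, d) local bases

    # Solve min ||x - c^T B||^2 subject to sum(c) = 1 (analytic solution).
    z = B - x                                # shift bases to the descriptor
    C = z @ z.T                              # local covariance (knn, knn)
    C += beta * np.trace(C) * np.eye(knn)    # regularize for numerical stability
    c = np.linalg.solve(C, np.ones(knn))
    c /= c.sum()                             # enforce the sum-to-one constraint

    code = np.zeros(len(codebook))           # sparse code over the full codebook
    code[idx] = c
    return code
```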

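For Note 2, the snippet below sketches how the random-hyperplane (sign) hash of Charikar (2002) can be adapted to hyperplane-to-point search in the style of Jain et al. (2010): a sign flip on half of the query's hash bits makes database points that lie near the classifier's decision boundary (the most uncertain ones) the most likely to collide with it. The single-table lookup and all sizes are illustrative; a practical index would use several shorter hash tables.

```python
import numpy as np

def sign_bits(x, R):
    """Charikar (2002) random-hyperplane hash: one bit per random direction r."""
    return (R @ x >= 0).astype(np.uint8)

def hash_point(z, U, V):
    """Hash of a database point z using paired random directions (U, V)."""
    return np.concatenate([sign_bits(z, U), sign_bits(z, V)])

def hash_query(w, U, V):
    """Hash of a query hyperplane with normal w; flipping the sign on the second
    half maximizes the collision probability for points nearly perpendicular to w,
    i.e., points close to the decision boundary."""
    return np.concatenate([sign_bits(w, U), sign_bits(-w, V)])

# Toy usage: index unit-norm points, then probe with the current classifier normal.
rng = np.random.default_rng(0)
d, bits, n = 64, 4, 5000
U, V = rng.standard_normal((bits, d)), rng.standard_normal((bits, d))
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

buckets = {}
for i, z in enumerate(X):
    buckets.setdefault(hash_point(z, U, V).tobytes(), []).append(i)

w = rng.standard_normal(d)                     # linear classifier's hyperplane normal
w /= np.linalg.norm(w)
candidates = buckets.get(hash_query(w, U, V).tobytes(), [])   # near-boundary candidates
```
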
References

  • Boureau, Y.-L., Bach, F., LeCun, Y., Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In Symposium on Theory of Computing.

  • Chum, O., Zisserman, A. (2007). An exemplar model for learning object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Dalal, N., Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., Yagnik, J. (2013). Fast, accurate detection of 100,000 object classes on a single machine. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A. (2005). Learning object categories from Google’s image search. In Proceedings of the International Conference on Computer Vision (ICCV).

  • Jain, P., Vijayanarasimhan, S., Grauman, K. (2010). Hashing hyperplane queries to near points with applications to large-scale active learning. In Advances in Neural Information Processing Systems (NIPS).

  • Joachims, T. (2006). Training linear SVMs in linear time. In International Conference on Knowledge Discovery and Data Mining (KDD).

  • Joshi, A., Porikli, F., Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Kapoor, A., Grauman, K., Urtasun, R., Darrell, T. (2007). Active learning with Gaussian processes for object categorization. In International Conference on Computer Vision (ICCV).

  • Lampert, C., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Lee, Y. J., Grauman, K. (2010). Object-graphs for context-aware category discovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Li, L.-J., Wang, G., & Fei-Fei, L. (2007). OPTIMOL: Automatic online picture collection via incremental model learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Pirsiavash, H., Ramanan, D. (2012). Steerable part models. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Qi, G., Hua, X., Rui, Y., Tang, J., Zhang, H. (2008). Two-dimensional active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2007). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  • Siddiquie, B., & Gupta, A. (2010). Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Song, H., Zickler, S., Althoff, T., Girshick, R., Fritz, M., Geyer, C., Felzenszwalb, P., Darrell, T. (2012). Sparselet models for efficient multiclass object detection. In Proceedings of the European Conference on Computer Vision.

  • Sorokin, A., Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Workshop on Internet Vision.

  • Tong, S., Koller, D. (2000). Support vector machine active learning with applications to text classification. In Proceedings of the International Conference on Machine Learning (ICML).

  • Torralba, A., Murphy, K., & Freeman, W. (2007). Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 854–869.

  • Uijlings, J., Smeulders, A., Scha, R. (2009). What is the spatial extent of an object? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A. (2009). Multiple kernels for object detection. In International Conference on Computer Vision (ICCV).

  • Vijayanarasimhan, S., & Grauman, K. (2008). Keywords to visual categories: Multiple-instance learning for weakly supervised object categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., & Grauman, K. (2011). Large-scale live active learning: Training object detectors with crawled data and crowds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., Grauman, K. (2008). Multi-level active prediction of useful image annotations for recognition. In Advances in Neural Information Processing Systems (NIPS).

  • Vijayanarasimhan, S., Kapoor, A. (2010). Visual recognition and detection under bounded computational resources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Vijayanarasimhan, S., Jain, P., & Grauman, K. (2014). Hashing hyperplane queries to near points with applications to large-scale active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 276–288.

  • Viola, P., Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • von Ahn, L., Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).

  • Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Welinder, P., & Perona, P. (2010). Online crowdsourcing: Rating annotators and obtaining cost-effective labels. In Workshop on Advancing Computer Vision with Humans in the Loop (ACVHL).

  • Yang, J., Yu, K., Gong, Y., Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Acknowledgments

The authors thank the anonymous reviewers for their helpful comments. This research is supported in part by NSF CAREER IIS-0747356 and DARPA Mind’s Eye.

Author information

Corresponding author

Correspondence to Kristen Grauman.

Additional information

Communicated by Martial Hebert.

Cite this article

Vijayanarasimhan, S., Grauman, K. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. Int J Comput Vis 108, 97–114 (2014). https://doi.org/10.1007/s11263-014-0721-9
