Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

  • Sudheendra Vijayanarasimhan
  • Kristen Grauman

Abstract

Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far techniques have been tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively” obtained are in fact already known, and/or the crowd-sourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowd-sourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashing-based solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show that it successfully improves the state of the art for the most challenging objects in the PASCAL VOC benchmark. In addition, we show that our detector competes well with popular nonlinear classifiers that are much more expensive to train.
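
To make the selection step concrete, the following is a minimal sketch of the exhaustive baseline that a sub-linear method would approximate. It assumes that "most uncertain" means closest to the decision hyperplane of a linear classifier, which the abstract does not state explicitly. The names `margin` and `most_uncertain` and the toy data are hypothetical, and this linear scan is not the hashing-based solution itself.

```python
import heapq


def margin(w, b, x):
    """Absolute decision value |w.x + b|; smaller means more uncertain (assumed criterion)."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)


def most_uncertain(w, b, pool, k):
    """Return indices of the k pool points closest to the hyperplane.

    This visits every point in the pool, so the cost grows linearly with
    the pool size; a hashing-based index would aim to avoid this scan.
    """
    return heapq.nsmallest(k, range(len(pool)), key=lambda i: margin(w, b, pool[i]))


# Hypothetical linear classifier and unlabeled pool (toy values).
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.5], [0.0, 2.0], [2.0, 2.25]]

picked = most_uncertain(w, b, pool, 2)
print(picked)  # indices of the two points nearest the hyperplane
```

On a pool of crawled image windows, the cost of this scan is what motivates replacing it with an approximate lookup.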

Keywords

Object detection, Active learning, Large-scale learning, Hashing, Crowdsourcing, Image annotation

Acknowledgments

The authors thank the anonymous reviewers for their helpful comments. This research is supported in part by NSF CAREER IIS-0747356 and DARPA Mind’s Eye.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Texas at Austin, Austin, USA