Do We Need More Training Data?

Zhu, Xiangxin; Vondrick, Carl; Fowlkes, Charless C.; Ramanan, Deva

doi:10.1007/s11263-015-0812-2

Published: 12 March 2015

Volume 119, pages 76–92, (2016)

International Journal of Computer Vision

Abstract

Datasets for training object recognition systems are steadily increasing in size. This paper investigates whether existing detectors will continue to improve as data grows, or whether their performance will saturate due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grow. Surprisingly, even with proper treatment of regularization and “outliers”, the performance of classic mixture models appears to saturate quickly (\(\sim 10\) templates and \(\sim 100\) positive training examples per template). This is not a limitation of the feature space, as compositional mixtures that share template parameters via parts, and that can synthesize new templates not encountered during training, yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.

Notes

The dataset can be downloaded from http://vision.ics.uci.edu/datasets/.

Acknowledgments

Funding for this research was provided by NSF IIS-0954083, NSF DBI-1053036, ONR-MURI N00014-10-1-0933, a Google Research award to CF, and a Microsoft Research gift to DR.

Author information

Authors and Affiliations

Department of Computer Science, UC Irvine, Irvine, USA
Xiangxin Zhu, Charless C. Fowlkes & Deva Ramanan
CSAIL, MIT, Cambridge, USA
Carl Vondrick

Corresponding author

Correspondence to Xiangxin Zhu.

Additional information

Communicated by Antonio Torralba and Alexei Efros.

About this article

Cite this article

Zhu, X., Vondrick, C., Fowlkes, C.C. et al. Do We Need More Training Data? Int J Comput Vis 119, 76–92 (2016). https://doi.org/10.1007/s11263-015-0812-2

Received: 01 June 2013
Accepted: 02 March 2015
Published: 12 March 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s11263-015-0812-2
