
Do We Need More Training Data?

International Journal of Computer Vision


Datasets for training object recognition systems are steadily increasing in size. This paper investigates whether existing detectors will continue to improve as data grows, or whether their performance will saturate because of limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grow. Surprisingly, even with proper treatment of regularization and “outliers”, the performance of classic mixture models appears to saturate quickly (~10 templates and ~100 positive training examples per template). This is not a limitation of the feature space, as compositional mixtures that share template parameters via parts, and that can synthesize new templates not encountered during training, yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.
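To make the contrast between classic and compositional mixtures concrete, here is a toy Python sketch under simplifying assumptions of our own (not the authors' implementation): features are already extracted as short flat vectors standing in for HOG cells, and the deformation costs, part displacements and root filters of real part-based models are omitted. All function names are hypothetical. A classic mixture scores a window with its best rigid template, so it can only represent the K templates it has stored. A compositional model lets each part choose independently among its local filters, so P parts with M filters each implicitly define M^P whole-object templates, most of which never appeared as a single training cluster.

```python
import math
from itertools import product
from typing import List, Sequence

# Hypothetical feature vectors standing in for HOG cells (pre-extracted, flattened).
Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Linear template response w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mixture_score(x: Vector, templates: List[Vector], biases: List[float]) -> float:
    """Classic mixture of rigid templates: score with the best-matching component."""
    return max(dot(w, x) + b for w, b in zip(templates, biases))


def compositional_score(x_parts: List[Vector], part_filters: List[List[Vector]]) -> float:
    """Toy compositional mixture: each part independently picks its best local filter.

    Assumption: deformation costs, part displacements and root filters are omitted.
    """
    return sum(max(dot(f, xp) for f in filters) for xp, filters in zip(x_parts, part_filters))


def brute_force_score(x_parts: List[Vector], part_filters: List[List[Vector]]) -> float:
    """Same score, computed by explicitly enumerating every synthesized template."""
    return max(
        sum(dot(f, xp) for f, xp in zip(combo, x_parts))
        for combo in product(*part_filters)
    )


def num_effective_templates(part_filters: List[List[Vector]]) -> int:
    """Number of distinct whole-object templates the shared parts can synthesize."""
    n = 1
    for filters in part_filters:
        n *= len(filters)
    return n


if __name__ == "__main__":
    # Two hypothetical parts, each with 3 local filters over 2-D features.
    part_filters = [
        [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]],
    ]
    x_parts = [[0.2, 0.9], [0.3, -0.4]]
    print(num_effective_templates(part_filters))  # 9 templates from 6 stored filters
    print(math.isclose(compositional_score(x_parts, part_filters),
                       brute_force_score(x_parts, part_filters)))
```

Because each part is chosen independently, the sum of per-part maxima equals the maximum over all enumerated part combinations, which `brute_force_score` checks explicitly. The compositional model therefore gets the larger template vocabulary without paying for the enumeration.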


Figs. 1–17 (images not reproduced)



  1. The dataset can be downloaded from




Funding for this research was provided by NSF IIS-0954083, NSF DBI-1053036, ONR-MURI N00014-10-1-0933, a Google Research award to CF, and a Microsoft Research gift to DR.

Author information

Corresponding author

Correspondence to Xiangxin Zhu.

Additional information

Communicated by Antonio Torralba and Alexei Efros.


About this article


Cite this article

Zhu, X., Vondrick, C., Fowlkes, C.C., et al. Do We Need More Training Data? Int J Comput Vis 119, 76–92 (2016).
