
The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

  • Jonathan Krause
  • Benjamin Sapp
  • Andrew Howard
  • Howard Zhou
  • Alexander Toshev
  • Tom Duerig
  • James Philbin
  • Li Fei-Fei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9907)

Abstract

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model using this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach that leverages free, noisy data from the web and simple, generic methods of recognition. This approach has benefits in both performance and scalability. We demonstrate its efficacy on four fine-grained datasets, greatly exceeding the existing state of the art without manually collecting even a single label, and furthermore show first results on scaling to more than 10,000 fine-grained categories. Quantitatively, we achieve top-1 accuracies of 92.3% on CUB-200-2011, 85.4% on Birdsnap, 93.4% on FGVC-Aircraft, and 80.8% on Stanford Dogs without using their annotated training sets. We compare our approach to an active learning approach for expanding fine-grained datasets.
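The results above are reported as top-1 accuracy: the fraction of test images for which the model's single highest-scoring category is the correct one. The following is a minimal sketch of that metric, assuming the model produces one score per category for each image. The function name `top1_accuracy` and the toy scores are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose highest-scoring class equals the true label.

    scores[i][c] is the (hypothetical) model score for class c on example i;
    labels[i] is the index of the ground-truth class for example i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # argmax over class scores; max() keeps the first maximum, so ties
        # resolve to the lowest class index
        predicted = max(range(len(row)), key=lambda c: row[c])
        correct += int(predicted == label)
    return correct / len(labels)


# Toy example: 4 images, 3 hypothetical classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1 (correct)
    [0.6, 0.3, 0.1],  # predicts class 0 (correct)
    [0.2, 0.2, 0.6],  # predicts class 2 (correct)
    [0.5, 0.4, 0.1],  # predicts class 0 (wrong, true class is 1)
]
labels = [1, 0, 2, 1]
print(f"top-1 accuracy: {top1_accuracy(scores, labels):.1%}")
```

Only the argmax prediction counts, so a model that ranks the correct category second scores nothing for that image. This is what separates top-1 from the more lenient top-k variants.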

Acknowledgments

We thank Gal Chechik, Chuck Rosenberg, Zhen Li, Timnit Gebru, Vignesh Ramanathan, Oliver Groth, and the anonymous reviewers for valuable feedback.

Supplementary material

Supplementary material 1 (pdf 10138 KB)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Jonathan Krause (1)
  • Benjamin Sapp (2)
  • Andrew Howard (2)
  • Howard Zhou (2)
  • Alexander Toshev (2)
  • Tom Duerig (2)
  • James Philbin (2)
  • Li Fei-Fei (1)

  1. Stanford University, Stanford, USA
  2. Google, Mountain View, USA
