Learning to Learn: Model Regression Networks for Easy Small Sample Learning

  • Yu-Xiong Wang
  • Martial Hebert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)

Abstract

We develop a conceptually simple but powerful approach that can learn novel categories from few annotated examples. In this approach, experience with already learned categories is used to facilitate the learning of novel classes. Our insight is two-fold: (1) there exists a generic, category-agnostic transformation from models learned from few samples to models learned from sufficiently large sample sets, and (2) such a transformation can be effectively learned by high-capacity regressors. In particular, we automatically learn the transformation with a deep model regression network on a large collection of model pairs. Experiments demonstrate that encoding this transformation as prior knowledge greatly facilitates recognition in the small sample size regime on a broad range of tasks, including domain adaptation, fine-grained recognition, action recognition, and scene classification.
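
The idea of regressing from small-sample models to large-sample models can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the model pairs are synthetic, the "category-agnostic" distortion is a fixed random matrix, and a linear ridge regressor stands in for the deep model regression network. The names `fit_model_regressor` and `transform` are hypothetical.

```python
import numpy as np


def fit_model_regressor(w_small, w_large, ridge=1e-3):
    """Fit a linear map T so that w_small @ T approximates w_large.

    Ridge regression is used here only as a simple stand-in for the
    high-capacity regressor described in the abstract (an assumption).
    """
    d = w_small.shape[1]
    gram = w_small.T @ w_small + ridge * np.eye(d)
    return np.linalg.solve(gram, w_small.T @ w_large)


def transform(w_small, T):
    """Map small-sample model weights toward large-sample model weights."""
    return w_small @ T


rng = np.random.default_rng(0)
dim, n_pairs, n_train = 8, 500, 400

# Synthetic "large-sample" classifier weights, one row per category.
w_large = rng.normal(size=(n_pairs, dim))

# Synthetic "small-sample" weights: the same fixed distortion is applied
# to every category (category-agnostic by construction), plus noise.
distortion = np.eye(dim) + 0.2 * rng.normal(size=(dim, dim))
w_small = w_large @ distortion + 0.05 * rng.normal(size=(n_pairs, dim))

# Learn the transformation on some categories, evaluate on held-out ones.
T = fit_model_regressor(w_small[:n_train], w_large[:n_train])
pred = transform(w_small[n_train:], T)

err_before = np.mean((w_small[n_train:] - w_large[n_train:]) ** 2)
err_after = np.mean((pred - w_large[n_train:]) ** 2)
print(f"MSE to large-sample weights before: {err_before:.4f}, after: {err_after:.4f}")
```

In this toy setting the learned map brings held-out small-sample weights closer to their large-sample counterparts, which mirrors the role the abstract assigns to the transformation as prior knowledge for novel categories.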

Keywords

Small sample learning, Transfer learning, Object recognition, Model transformation, Deep regression networks

Notes

Acknowledgments

We thank Liangyan Gui, David Fouhey, and Deva Ramanan for valuable and insightful discussions. This work was supported in part by ONR MURI N000141612007 and the U.S. Army Research Laboratory (ARL) under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016. We also thank NVIDIA for donating GPUs and the AWS Cloud Credits for Research program.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
