Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)

Abstract

Transfer learning can boost performance on a target task by leveraging knowledge from the source domain. Recent work in neural architecture search (NAS), especially one-shot NAS, can aid transfer learning by establishing a sufficiently large network search space. However, existing NAS methods tend to approximate huge search spaces by explicitly building giant super-networks with multiple sub-paths, and they discard the super-network weights once a child structure is found. Both characteristics cause repetitive network training on source tasks in transfer learning. To remedy these issues, we reduce the super-network size by randomly dropping connections between network blocks while embedding a larger search space. Moreover, we reuse super-network weights to avoid redundant training by proposing a novel framework consisting of two modules: a neural architecture search module for architecture transfer and a neural weight search module for weight transfer. Both modules conduct their search on the target task over a reduced super-network, so we only need to train once on the source task. We evaluate our framework on MS-COCO and CUB-200 for object detection and fine-grained image classification, respectively, and show promising improvements with only \(O(C^{N})\) super-network complexity.
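The core idea of reducing super-network size by randomly dropping connections between blocks can be illustrated with a minimal sketch. The paper does not publish reference code here; the `Block` class, `forward_dropped` function, and `keep_prob` parameter below are hypothetical names, and real blocks would be neural operations rather than the tagging placeholders used for illustration. The point is that skipped blocks act as identity shortcuts, so one compact super-network implicitly covers every sub-path (candidate architecture) without instantiating them all.

```python
import random


class Block:
    """Placeholder for a candidate operation; it just tags its input so the
    sampled sub-path is visible in the output."""

    def __init__(self, name):
        self.name = name

    def __call__(self, x):
        return x + [self.name]


def forward_dropped(blocks, x, keep_prob=0.5, rng=random):
    """One forward pass through a chain of candidate blocks, randomly dropping
    the connection to each block (stochastic-depth-style training).

    Each call samples one sub-architecture from the super-network; a dropped
    block is bypassed via an identity shortcut.
    """
    for block in blocks:
        if rng.random() < keep_prob:
            x = block(x)  # connection kept: block participates in this path
        # else: connection dropped, input passes through unchanged
    return x


if __name__ == "__main__":
    blocks = [Block(f"b{i}") for i in range(4)]
    random.seed(0)
    print(forward_dropped(blocks, []))  # prints one randomly sampled sub-path
```

With `keep_prob=1.0` the full chain is executed and with `keep_prob=0.0` every block is skipped; intermediate values sample sub-paths whose expected depth is `keep_prob * len(blocks)`.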

Keywords

Neural architecture search · Transfer learning · Weight sharing

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. SenseTime Group Limited, Beijing, China
