Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight

  • Conference paper
  • Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12358)

Abstract

Transfer learning can boost performance on a target task by leveraging knowledge from the source domain. Recent work in neural architecture search (NAS), especially one-shot NAS, can aid transfer learning by establishing a sufficiently large network search space. However, existing NAS methods tend to approximate huge search spaces by explicitly building giant super-networks with multiple sub-paths, and they discard the super-network weights once a child structure is found. Both characteristics cause repetitive network training on source tasks in transfer learning. To remedy these issues, we reduce the super-network size by randomly dropping connections between network blocks while still embedding a larger search space. Moreover, we reuse the super-network weights to avoid redundant training by proposing a novel framework consisting of two modules: a neural architecture search module for architecture transfer and a neural weight search module for weight transfer. Both modules conduct their search on the target task over the reduced super-network, so we only need to train once on the source task. We evaluate our framework on MS-COCO and CUB-200 for object detection and fine-grained image classification, respectively, and show promising improvements with only \(O(C^{N})\) super-network complexity.
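The connection-dropping idea from the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name, the sequential block/skip-edge encoding of an architecture, and the uniform per-edge dropping probability are all assumptions made purely for illustration of how a sparse sub-path might be sampled from a super-network.

```python
import random

def sample_subnetwork(num_blocks, num_ops, drop_prob=0.5, rng=None):
    """Sample one child architecture from a sequential super-network.

    Each of the `num_blocks` blocks picks one of `num_ops` candidate
    operations, and every optional longer-range connection between
    blocks is independently dropped with probability `drop_prob`, so
    only a sparse sub-path (rather than the full multi-path
    super-network) is active at a time.
    """
    rng = rng or random.Random()
    # One candidate operation per block.
    ops = [rng.randrange(num_ops) for _ in range(num_blocks)]
    # The chain edges i -> i+1 are always kept; skip edges i -> j
    # (j > i + 1) survive with probability 1 - drop_prob.
    skips = [(i, j)
             for i in range(num_blocks)
             for j in range(i + 2, num_blocks)
             if rng.random() > drop_prob]
    return {"ops": ops, "skips": skips}

# Example: sample one child network from a 5-block, 4-op search space.
arch = sample_subnetwork(num_blocks=5, num_ops=4, rng=random.Random(0))
```

Sampling a different sparse child per training step is what lets a single super-network training run on the source task be reused for both architecture and weight search on the target task.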



Author information

Correspondence to Ming Sun.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, M., Dou, H., Yan, J. (2020). Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_28


  • DOI: https://doi.org/10.1007/978-3-030-58601-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58600-3

  • Online ISBN: 978-3-030-58601-0

  • eBook Packages: Computer Science, Computer Science (R0)
