Abstract
Transfer learning can boost performance on a target task by leveraging knowledge from a source domain. Recent work in neural architecture search (NAS), especially one-shot NAS, can aid transfer learning by providing a sufficiently rich network search space. However, existing NAS methods tend to approximate huge search spaces by explicitly building giant super-networks with multiple sub-paths, and they discard the super-network weights once a child structure is found. Both characteristics cause repetitive network training on source tasks in transfer learning. To remedy these issues, we reduce the super-network size by randomly dropping connections between network blocks while still embedding a larger search space. Moreover, we reuse super-network weights to avoid redundant training by proposing a novel framework consisting of two modules: a neural architecture search module for architecture transfer and a neural weight search module for weight transfer. These two modules conduct the search on the target task based on a reduced super-network, so we only need to train once on the source task. We evaluate our framework on both MS-COCO and CUB-200 for object detection and fine-grained image classification, and show promising improvements with only \(O(C^{N})\) super-network complexity.
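The reduced super-network described above hinges on randomly dropping connections between network blocks while keeping at least one path alive, so that a large search space is covered without instantiating every sub-path. A minimal sketch of such a connection sampler is shown below; all names are hypothetical illustrations, not the authors' code, and the guard against an empty path is an assumption about how connectivity would be preserved.

```python
import random

def sample_subnetwork(num_connections, drop_prob, rng):
    """Sample which inter-block connections of a super-network to keep.

    Returns a list of booleans, one per candidate connection. Each
    connection is independently dropped with probability `drop_prob`,
    and at least one connection is always kept so the sampled
    sub-network remains a valid path from input to output.
    """
    keep = [rng.random() >= drop_prob for _ in range(num_connections)]
    if not any(keep):
        # Re-activate one connection at random to keep the graph connected.
        keep[rng.randrange(num_connections)] = True
    return keep
```

In a one-shot setting, a sampler like this would be called once per training step, so each mini-batch updates the weights of a different (smaller) sub-network while all sub-networks share the same underlying parameters.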
Sun, M., Dou, H., Yan, J. (2020). Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_28
Print ISBN: 978-3-030-58600-3
Online ISBN: 978-3-030-58601-0