Transfer channel pruning for compressing deep domain adaptation models

  • Chaohui Yu
  • Jindong Wang
  • Yiqiang ChenEmail author
  • Xin Qin
Original Article


Deep unsupervised domain adaptation has recently received increasing attention from researchers. However, existing methods are computationally intensive due to the computational cost of convolutional neural networks (CNN) adopted by most work. There is no effective network compression method for such problem. In this paper, we propose a unified transfer channel pruning (TCP) method for accelerating deep unsupervised domain adaptation (UDA) models. TCP method is capable of compressing the deep UDA model by pruning less important channels while simultaneously learning transferable features by reducing the cross-domain distribution divergence. Therefore, it reduces the impact of negative transfer and maintains competitive performance on the target task. To the best of our knowledge, TCP method is the first approach that aims at accelerating deep unsupervised domain adaptation models. TCP method is validated on two main kinds of UDA methods: the discrepancy-based methods and the adversarial-based methods. In addition, it is validated on two benchmark datasets: Office-31 and ImageCLEF-DA with two common backbone networks - VGG16 and ResNet50. Experimental results demonstrate that our TCP method achieves comparable or better classification accuracy than other comparison methods while significantly reducing the computational cost. To be more specific, in VGG16, we get even higher accuracy after pruning 26% floating point operations (FLOPs); in ResNet50, we also get higher accuracy on half of the tasks after pruning 12% FLOPs for both discrepancy-based methods and adversarial-based methods.


Unsupervised domain adaptation Transfer channel pruning Accelerating 



This work is supported in part by National Key Research & Development Plan of China (No. 2017YFB1002802), NSFC (No. 61572471), and Beijing Municipal Science & Technology Commission (No. Z171100000117017).


  1. 1.
    Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828CrossRefGoogle Scholar
  2. 2.
    Borgwardt KM, Gretton A, Rasch MJ, Kriegel HP, Schölkopf B, Smola AJ (2006) Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22(14):e49–e57CrossRefGoogle Scholar
  3. 3.
    Chollet F (2017) Xception: deep learning with depthwise separable convolutions. arXiv preprint pp. 1610–02357Google Scholar
  4. 4.
    Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277Google Scholar
  5. 5.
    Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) Decaf: a deep convolutional activation feature for generic visual recognition. In: ICML. pp 647–655Google Scholar
  6. 6.
    Durugkar I, Gemp I, Mahadevan S (2016) Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673
  7. 7.
    Fernando B, Habrard A, Sebban M, Tuytelaars T (2013) Unsupervised visual domain adaptation using subspace alignment. In: Proceedings of the IEEE international conference on computer vision. Sydney, Australia, pp 2960–2967.
  8. 8.
    Ganin Y, Lempitsky V (2014) Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495
  9. 9.
    Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2096-30MathSciNetzbMATHGoogle Scholar
  10. 10.
    Gong B, Grauman K, Sha F (2013) Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation. In: ICML. pp 222–230Google Scholar
  11. 11.
    Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y: Generative adversarial nets. In: Advances in neural information processing systems. pp 2672–2680 (2014)Google Scholar
  12. 12.
    Han S, Mao H, Dally WJ (2015) Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149
  13. 13.
    Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems. pp 1135–1143Google Scholar
  14. 14.
    He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Las Vegas, USA, pp 770–778.
  15. 15.
    He Y, Zhang X, Sun J (2017) Channel pruning for accelerating very deep neural networks. In: International conference on computer vision (ICCV), vol. 2. Venice, Italy.
  16. 16.
    Hoffman J, Tzeng E, Park T, Zhu JY, Isola P, Saenko K, Efros AA, Darrell T (2018) Cycada: cycle-consistent adversarial domain adaptation. In: ICMLGoogle Scholar
  17. 17.
    Hou CA, Tsai YHH, Yeh YR, Wang YCF (2016) Unsupervised domain adaptation with label and structural consistency. IEEE Trans Image Process 25(12):5552–5562MathSciNetCrossRefzbMATHGoogle Scholar
  18. 18.
    Hu Y, Sun S, Li J, Wang X, Gu Q (2018) A novel channel pruning method for deep neural network compression. arXiv preprint arXiv:1805.11394
  19. 19.
    Huang J, Gretton A, Borgwardt K, Schölkopf B, Smola AJ (2007) Correcting sample selection bias by unlabeled data. In: Advances in neural information processing systems. pp 601–608Google Scholar
  20. 20.
    Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICMLGoogle Scholar
  21. 21.
    Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp 1097–1105Google Scholar
  22. 22.
    Lin J, Rao Y, Lu J, Zhou J (2017) Runtime neural pruning. In: Advances in neural information processing systems. pp 2181–2191Google Scholar
  23. 23.
    Long M, Cao Y, Wang J, Jordan MI (2015) Learning transferable features with deep adaptation networks. In: ICMLGoogle Scholar
  24. 24.
    Long M, Wang J, Ding G, Sun J, Yu PS (2013) Transfer feature learning with joint distribution adaptation. In: Proceedings of the IEEE international conference on computer vision. Sydney, Australia, pp 2200–2207.
  25. 25.
    Long M, Zhu H, Wang J, Jordan MI (2017) Deep transfer learning with joint adaptation networks. In: ICMLGoogle Scholar
  26. 26.
    Luo JH, Wu J (2018) Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference. arXiv preprint arXiv:1805.08941
  27. 27.
    Luo JH, Wu J, Lin W (2017) Thinet: A filter level pruning method for deep neural network compression. arXiv preprint arXiv:1707.06342
  28. 28.
    Molchanov P, Tyree S, Karras T, Aila T, Kautz J (2017) Pruning convolutional neural networks for resource efficient inference. In: ICLRGoogle Scholar
  29. 29.
    Motiian S, Jones Q, Iranmanesh S, Doretto G (2017) Few-shot adversarial domain adaptation. In: Advances in neural information processing systems. pp 6670–6680Google Scholar
  30. 30.
    Pan SJ, Kwok JT, Yang Q (2008) Transfer learning via dimensionality reduction. AAAI 8:677–682Google Scholar
  31. 31.
    Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210CrossRefGoogle Scholar
  32. 32.
    Pan SJ, Yang Q et al (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359CrossRefGoogle Scholar
  33. 33.
    Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorchGoogle Scholar
  34. 34.
    Patel VM, Gopalan R, Li R, Chellappa R (2015) Visual domain adaptation: a survey of recent advances. IEEE Signal Process Mag 32(3):53–69CrossRefGoogle Scholar
  35. 35.
    Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Thirty-second AAAI conference on artificial intelligence. New Orleans, Louisiana, USA.
  36. 36.
    Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnor-net: imagenet classification using binary convolutional neural networks. In: ECCV. Springer, pp 525–542Google Scholar
  37. 37.
    Saenko K, Kulis B, Fritz M, Darrell T (2010) Adapting visual category models to new domains. In: European conference on computer vision. Springer, Hersonissos, Heraklion, Crete, Greece, pp 213–226.
  38. 38.
    Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLRGoogle Scholar
  39. 39.
    Sun B, Feng J, Saenko K (2016) Return of frustratingly easy domain adaptation. AAAI 6:8Google Scholar
  40. 40.
    Sun B, Saenko K (2015) Subspace distribution alignment for unsupervised domain adaptation. In: BMVC. pp 24–1Google Scholar
  41. 41.
    Sun B, Saenko K (2016) Deep coral: Correlation alignment for deep domain adaptation. In: European conference on computer vision. Springer, Amsterdam, The Netherlands, pp 443–450.
  42. 42.
    Tahmoresnezhad J, Hashemi S (2017) Visual domain adaptation via transfer feature learning. Knowl Inf Syst 50(2):585–605CrossRefGoogle Scholar
  43. 43.
    Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, Hawaii, USA, pp 7167–7176.
  44. 44.
    Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474
  45. 45.
    Wang J, Chen Y, Hao S, Feng W, Shen Z (2017) Balanced distribution adaptation for transfer learning. In: Data mining (ICDM), 2017 IEEE International Conference on IEEE. New Orleans, USA, pp 1129–1134.
  46. 46.
    Wang J, Feng W, Chen Y, Yu H, Huang M, Yu PS (2018) Visual domain adaptation with manifold embedded distribution alignment. In: 2018 ACM Multimedia Conference on Multimedia Conference. Seoul, Korea, pp 402–410.
  47. 47.
    Wang J, et al.: Everything about transfer learning and domain adapation.
  48. 48.
    Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems. pp 3320–3328Google Scholar
  49. 49.
    Zhou A, Yao A, Guo Y, Xu L, Chen Y (2017) Incremental network quantization: Towards lossless cnns with low-precision weights. arXiv preprint arXiv:1702.03044
  50. 50.
    Zhuang F, Cheng X, Luo P, Pan SJ, He Q (2015) Supervised representation learning: transfer learning with deep autoencoders. In: IJCAI. pp 4119–4125Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Chaohui Yu
    • 1
    • 2
  • Jindong Wang
    • 3
  • Yiqiang Chen
    • 1
    • 2
    Email author
  • Xin Qin
    • 1
    • 2
  1. 1.Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing TechnologyChinese Academy of SciencesBeijingChina
  2. 2.University of Chinese Academy of SciencesBeijingChina
  3. 3.Microsoft Research AsiaBeijingChina

Personalised recommendations