Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

A vast number of well-trained deep networks have been released online by developers for plug-and-play use. These networks specialize in different tasks, and in many cases the data and annotations used to train them are not publicly available. In this paper, we study how to reuse such heterogeneous pre-trained models as teachers and build a versatile and compact student model without access to human annotations. To this end, we propose a self-coordinated knowledge amalgamation network (SOKA-Net) for learning the multi-talent student model. This is achieved via a dual-step adaptive competitive-cooperation training approach: in the first step, the knowledge of the heterogeneous teachers is amalgamated to guide the learning of the student network's shared parameters; in the second step, a gradient-based competition-balancing strategy learns the multi-head prediction network as well as the loss weightings of the distinct tasks. The two steps, which we term the collaboration step and the competition step respectively, are performed alternately until the competition reaches a balance that yields the ultimate collaboration. Experimental results demonstrate that the learned student not only comes with a smaller size but also achieves performance on par with, or even superior to, that of the teachers.
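The alternating schedule described above can be illustrated with a toy sketch. Everything here is hypothetical: the linear "teachers", the shared backbone, the per-task heads, and the inverse-gradient-norm weight rebalancing are simplifications standing in for the paper's actual SOKA-Net architecture and competition-balancing strategy, which this code does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two frozen single-task "teachers" (here just fixed linear maps).
teachers = [rng.normal(size=(4, 3)) for _ in range(2)]

# Student: a shared backbone, one head per task, and per-task loss weights.
shared = rng.normal(size=(4, 4)) * 0.1
heads = [rng.normal(size=(4, 3)) * 0.1 for _ in range(2)]
loss_w = np.ones(2)

x = rng.normal(size=(16, 4))   # unlabeled inputs: no human annotations used
lr = 0.05

for it in range(200):
    # --- Collaboration step: amalgamated teacher outputs act as soft
    #     targets that supervise the shared parameters.
    feats = x @ shared
    grads_shared = np.zeros_like(shared)
    task_grad_norms = []
    for t, (T, H) in enumerate(zip(teachers, heads)):
        err = feats @ H - x @ T                      # residual vs. teacher t
        g_shared = x.T @ (err @ H.T) / len(x)        # dL_t / d(shared)
        grads_shared += loss_w[t] * g_shared
        task_grad_norms.append(np.linalg.norm(g_shared))
    shared -= lr * grads_shared

    # --- Competition step: update the task heads, then rebalance the loss
    #     weights so tasks with larger gradients are down-weighted
    #     (a crude stand-in for the gradient-based balancing strategy).
    feats = x @ shared
    for t, (T, H) in enumerate(zip(teachers, heads)):
        err = feats @ H - x @ T
        heads[t] = H - lr * loss_w[t] * (feats.T @ err) / len(x)
    inv = 1.0 / (np.array(task_grad_norms) + 1e-8)
    loss_w = len(inv) * inv / inv.sum()              # weights sum to #tasks

final_losses = [float(np.mean((x @ shared @ H - x @ T) ** 2))
                for T, H in zip(teachers, heads)]
print([round(l, 3) for l in final_losses], loss_w.round(2).tolist())
```

The sketch alternates the two steps until the loss weights settle, mirroring the idea that training proceeds until the competition is balanced; the real method operates on deep networks and feature-level amalgamation rather than linear maps.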

Keywords

Knowledge amalgamation · Competitive collaboration

Notes

Acknowledgments

This work is supported by National Key Research and Development Program (2018AAA0101503), National Natural Science Foundation of China (61976186), Key Research and Development Program of Zhejiang Province (2018C01004), and the Major Scientific Research Project of Zhejiang Lab (No. 2019KD0AC01).

Supplementary material

504443_1_En_38_MOESM1_ESM.pdf — Supplementary material 1 (PDF, 511 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Zhejiang University, Hangzhou, China
  2. Stevens Institute of Technology, Hoboken, USA
  3. Alibaba Group, Hangzhou, China
