
COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization

Published in Neural Processing Letters (2022)

Abstract

Softening the labels of a training dataset with respect to the learned data representations is frequently used to improve the training of deep neural networks. While this practice has been studied as a way to leverage “privileged information” about the distribution of the data, a well-trained learner with soft classification outputs must first be obtained as a prior in order to generate such privileged information. To resolve this “chicken-and-egg” problem, we propose the COLAM framework, which Co-Learns DNNs and soft labels through Alternating Minimization of two objectives, (a) the training loss subject to the soft labels and (b) the objective to learn improved soft labels, in one end-to-end training procedure. We performed extensive experiments comparing the proposed method with a series of baselines. The results show that COLAM achieves improved performance on many tasks, with better test classification accuracy. We also provide qualitative and quantitative analyses that explain why COLAM works well.
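The abstract describes an alternating scheme: fix the soft labels and take a descent step on the network, then fix the network and update the soft labels from its outputs. The paper's exact objectives are not reproduced on this page, so the following is only a minimal PyTorch-style sketch of that idea; the `colam_epoch` function, the temperature, and the exponential-moving-average update for the soft labels are illustrative assumptions, not the authors' specification.

```python
import torch
import torch.nn.functional as F

def colam_epoch(model, optimizer, loader, soft_labels,
                alpha=0.9, temperature=2.0):
    """One epoch of alternating minimization (illustrative sketch).

    soft_labels: (num_samples, num_classes) tensor, initialized to the
    one-hot hard labels; `loader` must yield (inputs, sample_indices).
    """
    for x, idx in loader:
        targets = soft_labels[idx]

        # (a) Minimize the training loss w.r.t. the network weights,
        # using the current soft labels as targets (cross-entropy
        # against a full distribution rather than a hard class index).
        logits = model(x)
        loss = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # (b) Improve the soft labels from the network's tempered
        # predictions. An EMA toward the softened outputs is one
        # plausible update rule (our assumption); the paper's rule
        # may differ.
        with torch.no_grad():
            preds = F.softmax(logits / temperature, dim=1)
            soft_labels[idx] = alpha * targets + (1 - alpha) * preds
    return soft_labels
```

Initializing `soft_labels = F.one_hot(hard_labels, num_classes).float()` and alternating per mini-batch keeps the two minimizations interleaved within a single end-to-end run, matching the abstract's description of one training procedure rather than a pre-trained teacher followed by a student.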



Author information

Corresponding author: Haoyi Xiong

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, X., Xiong, H., An, H. et al. COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization. Neural Process Lett 54, 4735–4749 (2022). https://doi.org/10.1007/s11063-022-10830-9

