Abstract
Softening the labels of a training dataset with respect to the data representations has frequently been used to improve the training of deep neural networks. While this practice has been studied as a way to leverage "privileged information" about the data distribution, a well-trained learner with soft classification outputs must first be obtained as a prior in order to generate such privileged information. To resolve this "chicken-and-egg" problem, we propose COLAM, a framework that Co-Learns DNNs and soft labels through Alternating Minimization of two objectives, (a) the training loss subject to the soft labels and (b) the objective of learning improved soft labels, in a single end-to-end training procedure. We performed extensive experiments comparing the proposed method with a series of baselines. The results show that COLAM improves performance on many tasks, achieving better test classification accuracy. We also provide qualitative and quantitative analyses that explain why COLAM works well.
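The alternating scheme described above can be illustrated with a minimal sketch. This is not the paper's actual objective functions: the label-update rule below (a fixed convex combination of the one-hot ground truth and the current model predictions, with an assumed mixing weight `beta`) and the linear softmax model are our own simplifications, used only to show the two alternating steps.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
n, d, k = 100, 5, 3
X = rng.normal(size=(n, d))          # toy features
y = rng.integers(0, k, size=n)       # toy hard labels
onehot = np.eye(k)[y]

W = np.zeros((d, k))                 # linear softmax classifier (stand-in for a DNN)
soft = onehot.copy()                 # soft labels initialised to the one-hot targets
lr, beta = 0.2, 0.9                  # beta: assumed weight anchoring labels to ground truth

losses = []
for step in range(300):
    # Step (a): minimise the training loss w.r.t. the model, soft labels held fixed.
    p = softmax(X @ W)
    W -= lr * X.T @ (p - soft) / n   # gradient of cross-entropy w.r.t. W
    # Step (b): improve the soft labels, model held fixed.
    p = softmax(X @ W)
    soft = beta * onehot + (1 - beta) * p
    losses.append(cross_entropy(p, soft))
```

Each iteration alternates one gradient step on the model with one closed-form refresh of the soft labels, so neither a pre-trained teacher nor fixed privileged information is needed, which is the point of the co-learning formulation.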
Li, X., Xiong, H., An, H. et al. COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization. Neural Process Lett 54, 4735–4749 (2022). https://doi.org/10.1007/s11063-022-10830-9