Abstract
Deep supervised learning has achieved great success in tackling complex computer vision tasks. However, it typically requires a large amount of labeled data, which is expensive to obtain in practical applications. Semi-supervised learning, which leverages the hidden structure of unlabeled data, has therefore attracted much attention. In this work, a semi-supervised classification model named Multi-Match is proposed. It contains two augmentation branches and encourages the output of the complex augmentation branch to stay close to the predictions of the simple augmentation branch. A mutual information (MI) loss is introduced to maximize MI not only between the input and the output representation, but also between the class assignments inside the simple augmentation branch. A novel information dropping method named CutEdge is proposed, which removes multiple regions near the input edges to further improve robustness. Experimental results on CIFAR-10, CIFAR-100 and SVHN with different numbers of labels demonstrate that the proposed model outperforms the compared semi-supervised learning methods. The gains come from the MI loss, the combination of affine transformation and CutEdge, and the use of multiple branches.
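As a rough illustration of the CutEdge idea described above, the sketch below zeros out several rectangular strips along randomly chosen image borders. The function name, the number of regions, and the maximum strip depth are illustrative assumptions, not the authors' exact implementation or settings.

```python
import numpy as np

def cut_edge(image, num_regions=4, max_frac=0.25, rng=None):
    """Hypothetical CutEdge-style augmentation: drop strips near the borders.

    image: H x W (x C) array; a modified copy is returned.
    num_regions: how many border strips to remove (assumed value).
    max_frac: maximum strip depth as a fraction of the image side (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = out.shape[:2]
    for _ in range(num_regions):
        side = rng.integers(4)  # 0=top, 1=bottom, 2=left, 3=right
        limit = int(max_frac * (h if side < 2 else w))
        depth = int(rng.integers(1, limit + 1))
        if side == 0:
            out[:depth, :] = 0       # strip along the top edge
        elif side == 1:
            out[h - depth:, :] = 0   # bottom edge
        elif side == 2:
            out[:, :depth] = 0       # left edge
        else:
            out[:, w - depth:] = 0   # right edge
    return out
```

Unlike CutOut, which erases a region at a random interior location, this variant only removes information near the image boundary, so the central content (where the object usually lies) is preserved.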
Acknowledgements
This research was funded in part by the National Key Research and Development Program of China (2021YFB2800300), the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) under Grant 2019JZZY010119, the National Natural Science Foundation of China under Grant No. 62001267, and the Future Plan for Young Scholars of Shandong University.
Appendix A: Abbreviations and descriptions
| Abbreviation | Description |
|---|---|
| Multi-Match | Proposed semi-supervised classification model based on multiple branches |
| MI | Mutual information |
| CutEdge | Proposed information dropping method based on edge-region removal |
| GAN | Generative adversarial network |
| CutOut | Information dropping method based on random region deletion |
| π Model | Method that maximizes prediction consistency |
| VAT | Virtual adversarial training; applies the most sensitive perturbation to the model input |
| MixMatch | Method combining entropy minimization and consistency regularization |
| MixUp | Fusion method based on interpolating samples and labels |
| KL | Kullback-Leibler divergence; measures the difference between two probability distributions |
| DIM | Deep InfoMax; maximizes global and local mutual information of images |
| AMDIM | Augmented multi-scale Deep InfoMax; maximizes mutual information between the input and output of images |
| CPC | Contrastive predictive coding; maximizes a lower bound on mutual information |
| IIC | Invariant information clustering; maximizes mutual information between image outputs |
| NMS | Non-maximum suppression |
| EMA | Exponential moving average of previous model parameters |
| CAM | Class activation mapping; generates a class activation heat map for the input |
Cite this article
Wu, Y., Chen, L., Zhao, D. et al. Multi-match: mutual information maximization and CutEdge for semi-supervised learning. Multimed Tools Appl 82, 479–496 (2023). https://doi.org/10.1007/s11042-022-13126-1