Mutual Improvement Between Temporal Ensembling and Virtual Adversarial Training

  • Wei Zhou
  • Cheng Lian (corresponding author)
  • Zhigang Zeng
  • Yixin Su


Research on semi-supervised learning (SSL) is of great significance because collecting large quantities of labeled data is very expensive in some fields. Two recent deep-learning-based SSL algorithms, temporal ensembling and virtual adversarial training (VAT), have achieved state-of-the-art accuracy on several classical SSL tasks, but both have shortcomings. Temporal ensembling does not realize its full potential because it merely adds random noise to the training data. VAT, in turn, incurs considerable time costs because it requires two inference passes per epoch for the unlabeled samples. In this paper, we propose replacing the random noise in temporal ensembling with virtual adversarial perturbations (VAP) to improve performance. We also find that reusing VAP can accelerate the training of VAT without an obvious loss of accuracy. Both methods are validated on MNIST, FashionMNIST and SVHN.
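The ingredient shared by both proposals is the virtual adversarial perturbation: the small input perturbation that most changes the model's current predictions, approximated by power iteration on the KL divergence, as in VAT. The sketch below illustrates the idea in plain NumPy on a toy linear-softmax model; the finite-difference gradient, the toy model, and the hyperparameter values (`xi`, `epsilon`, `h`, `n_power`) are illustrative stand-ins for the back-propagation-based computation used in practice.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for each sample in a batch of class distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def virtual_adversarial_perturbation(f, x, xi=1e-2, epsilon=1.0,
                                     n_power=1, h=1e-3, seed=0):
    """Power-iteration approximation of the VAP direction.

    f maps a batch of inputs to class probabilities. Gradients of the
    KL divergence are estimated here with central finite differences,
    so this is a didactic sketch, not the back-propagation computation
    used in real implementations.
    """
    rng = np.random.default_rng(seed)
    p = f(x)                                  # "virtual" labels: current predictions
    d = rng.normal(size=x.shape)              # random initial direction
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    for _ in range(n_power):
        grad = np.zeros_like(d)
        for j in range(x.shape[-1]):
            e = np.zeros(x.shape[-1]); e[j] = h
            grad[..., j] = (kl_div(p, f(x + xi * (d + e)))
                            - kl_div(p, f(x + xi * (d - e)))) / (2 * h)
        # Normalized gradient becomes the next power-iteration direction.
        d = grad / (np.linalg.norm(grad, axis=-1, keepdims=True) + 1e-12)
    return epsilon * d                        # r_vadv has norm epsilon per sample

# Toy linear-softmax model with a fixed random weight matrix.
rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))
model = lambda x: softmax(x @ W)
x = rng.normal(size=(4, 5))
r = virtual_adversarial_perturbation(model, x, epsilon=0.5)
print(r.shape)  # (4, 5)
```

In temporal ensembling, `x + r` would replace the randomly perturbed input in the consistency term; in accelerated VAT, `r` computed in one epoch would be cached and reused in the next, saving one of the two inference passes.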


Keywords: Temporal ensembling · Virtual adversarial perturbations · Accelerated training · Semi-supervised learning



This work was supported by the National Key R&D Program of China under Grant 2017YFC1501301, the Natural Science Foundation of China under Grants 61876219, 61503144, 61673188 and 61761130081, and the Natural Science Foundation of Hubei Province of China under Grant 2017CFB519.

Compliance with Ethical Standards

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Automation, Wuhan University of Technology, Wuhan, China
  2. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
