Regularization with Latent Space Virtual Adversarial Training

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)

Abstract

Virtual Adversarial Training (VAT) has shown impressive results among the recently developed family of regularization methods known as consistency regularization. VAT trains on adversarial samples, generated by injecting perturbations into the input space, and thereby enhances the generalization ability of a classifier. However, such adversarial samples can be generated only within a very small region around the input data point, which limits how adversarial they can be. To address this problem, we propose LVAT (Latent space VAT), which injects perturbations into the latent space instead of the input space. LVAT can generate adversarial samples more flexibly, producing a stronger adverse effect and thus more effective regularization. The latent space is built by a generative model, and in this paper we examine two different types of models: the variational auto-encoder and a normalizing flow, specifically Glow.

We evaluated the performance of our method in both supervised and semi-supervised learning scenarios on an image classification task using the SVHN and CIFAR-10 datasets. In our evaluation, our method outperforms VAT and other state-of-the-art methods.
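To make the mechanism concrete, below is a minimal sketch of a latent-space consistency loss in the spirit of the abstract: the adversarial direction is found by VAT-style power iteration, but in the latent space of a pretrained generative model, and the perturbed latent code is decoded before being fed to the classifier. All names (lvat_loss, encoder, decoder, classifier, xi, eps, n_power) are hypothetical placeholders, and the code is an illustrative sketch rather than the authors' implementation.

import torch
import torch.nn.functional as F

def lvat_loss(classifier, encoder, decoder, x, xi=1e-6, eps=1.0, n_power=1):
    """Consistency loss with an adversarial perturbation found in latent space.

    classifier: maps images to logits.
    encoder/decoder: a pretrained generative model's encode/decode maps
                     (e.g. a VAE encoder mean, or a flow and its inverse).
    xi, eps, n_power: VAT-style hyperparameters (hypothetical defaults).
    """
    with torch.no_grad():
        z = encoder(x)                                   # latent code of the clean input
        p_clean = F.softmax(classifier(x), dim=1)        # reference prediction

    # Power iteration: approximate the latent direction that most changes the prediction.
    d = torch.randn_like(z)
    for _ in range(n_power):
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(d)
        d.requires_grad_(True)
        x_pert = decoder(z + d)                          # decode the perturbed latent code
        logp_pert = F.log_softmax(classifier(x_pert), dim=1)
        dist = F.kl_div(logp_pert, p_clean, reduction="batchmean")
        d = torch.autograd.grad(dist, d)[0].detach()

    # Final adversarial perturbation of norm eps in latent space.
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(d)
    x_adv = decoder(z + r_adv)
    logp_adv = F.log_softmax(classifier(x_adv), dim=1)
    return F.kl_div(logp_adv, p_clean, reduction="batchmean")

In a semi-supervised setting, a term of this form would typically be added to the supervised cross-entropy loss and evaluated on unlabeled batches as well, since it requires no labels.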

Keywords

Consistency regularization · Adversarial training · Image classification · Semi-supervised learning · Unsupervised learning

Notes

Acknowledgement

This work was supported in part by JSPS KAKENHI Grant Number 20K11807.

References

  1. Athiwaratkun, B., Finzi, M., Izmailov, P., Wilson, A.G.: There are many consistent explanations of unlabeled data: why you should average. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=rkgKBhA5Y7
  2. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: MixMatch: a holistic approach to semi-supervised learning. In: NeurIPS (2019)
  3. Cao, X., Gong, N.Z.: Mitigating evasion attacks to deep neural networks via region-based classification. In: Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 278–287. ACM (2017)
  4. Chen, X., et al.: Variational lossy autoencoder. In: International Conference on Learning Representations (2017)
  5. Dai, Z., Yang, Z., Yang, F., Cohen, W.W., Salakhutdinov, R.R.: Good semi-supervised learning that requires a bad GAN. In: Advances in Neural Information Processing Systems 30, pp. 6510–6520. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7229-good-semi-supervised-learning-that-requires-a-bad-gan.pdf
  6. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. In: International Conference on Learning Representations (2015)
  7. Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. In: International Conference on Learning Representations (2017)
  8. Dumoulin, V., et al.: Adversarially learned inference. In: International Conference on Learning Representations (2017)
  9. Fawzi, A., Fawzi, H., Fawzi, O.: Adversarial vulnerability for any classifier. In: Advances in Neural Information Processing Systems 31, pp. 1178–1187. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7394-adversarial-vulnerability-for-any-classifier.pdf
  10. Haeusser, P., Mordvintsev, A., Cremers, D.: Learning by association - a versatile semi-supervised training method for neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
  11. Huang, C.W., Krueger, D., Lacoste, A., Courville, A.: Neural autoregressive flows. In: Proceedings of the 35th International Conference on Machine Learning, PMLR 80, pp. 2078–2087 (2018). http://proceedings.mlr.press/v80/huang18d.html
  12. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 448–456 (2015). http://proceedings.mlr.press/v37/ioffe15.html
  13. Jackson, J., Schulman, J.: Semi-supervised learning by label gradient alignment. arXiv preprint arXiv:1902.02336 (2019)
  14. Kamnitsas, K., et al.: Semi-supervised learning via compact latent space clustering. In: Proceedings of the 35th International Conference on Machine Learning, PMLR 80, pp. 2459–2468 (2018). http://proceedings.mlr.press/v80/kamnitsas18a.html
  15. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
  16. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)
  17. Kingma, D.P., Dhariwal, P.: Glow: generative flow with invertible 1 × 1 convolutions. In: Advances in Neural Information Processing Systems 31, pp. 10215–10224. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/8224-glow-generative-flow-with-invertible-1x1-convolutions.pdf
  18. Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improved variational inference with inverse autoregressive flow. In: Advances in Neural Information Processing Systems 29, pp. 4743–4751. Curran Associates, Inc. (2016). http://papers.nips.cc/paper/6581-improved-variational-inference-with-inverse-autoregressive-flow.pdf
  19. Kolasinski, K.: An implementation of the GLOW paper and simple normalizing flows lib (2018). https://github.com/kmkolasinski/deep-learning-notes/tree/master/seminars/2018-10-Normalizing-Flows-NICE-RealNVP-GLOW
  20. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representations (2017)
  21. Li, C., Xu, T., Zhu, J., Zhang, B.: Triple generative adversarial nets. In: Advances in Neural Information Processing Systems 30, pp. 4088–4098. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6997-triple-generative-adversarial-nets.pdf
  22. Li, Y., Liu, S., Yang, J., Yang, M.H.: Generative face completion. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
  23. Luo, Y., Zhu, J., Li, M., Ren, Y., Zhang, B.: Smooth neighbors on teacher graphs for semi-supervised learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
  24. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: ICML Workshop on Deep Learning for Audio, Speech and Language Processing (2013)
  25. Miyato, T., Maeda, S.-i., Koyama, M., Ishii, S.: Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1979–1993 (2018)
  26. Miyato, T., Maeda, S.-i., Koyama, M., Nakae, K., Ishii, S.: Distributional smoothing with virtual adversarial training. In: International Conference on Learning Representations (2016)
  27. Papamakarios, G., Pavlakou, T., Murray, I.: Masked autoregressive flow for density estimation. In: Advances in Neural Information Processing Systems 30, pp. 2338–2347. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6828-masked-autoregressive-flow-for-density-estimation.pdf
  28. Park, S., Park, J., Shin, S.J., Moon, I.C.: Adversarial dropout for supervised and semi-supervised learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
  29. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (2016)
  30. Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 1530–1538 (2015). http://proceedings.mlr.press/v37/rezende15.html
  31. Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In: Advances in Neural Information Processing Systems 29, pp. 1163–1171. Curran Associates, Inc. (2016). http://papers.nips.cc/paper/6333-regularization-with-stochastic-transformations-and-perturbations-for-deep-semi-supervised-learning.pdf
  32. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)
  33. Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: International Conference on Learning Representations (2016)
  34. Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems 30, pp. 1195–1204. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6719-mean-teachers-are-better-role-models-weight-averaged-consistency-targets-improve-semi-supervised-deep-learning-results.pdf
  35. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
  36. Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training (2019)
  37. Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, pp. 321–328. MIT Press (2004). http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Philips Co-Creation Center, Tokyo, Japan
  2. University of Tsukuba, Tsukuba, Japan
  3. The Tokyo Foundation for Policy Research, Tokyo, Japan
  4. Philips India Limited, Bangalore, India
  5. I Dragon Corporation, Tokyo, Japan