
AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12355)

Abstract

This paper proposes new ways of mixing samples for data augmentation by treating the mixing process as the generation of a barycenter in a metric space. First, we present an optimal-transport-based mixup technique that generates the Wasserstein barycenter of two samples; it works well on images with clean backgrounds and is empirically shown to be complementary to existing mixup methods. We then generalize mixup to an AutoMix technique, which uses a learnable network to fit the barycenter in a cooperative fashion between the classifier (a.k.a. discriminator) and generator networks. Experimental results on both multi-class and multi-label prediction tasks show the efficacy of our approach, which is further verified in the presence of unseen categories (open set) and noise.
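The barycenter view of mixing can be made concrete with a small sketch. The snippet below is a minimal illustration, not the paper's implementation: it contrasts classic mixup, which linearly interpolates pixels and labels, with an optimal-transport mix that computes the entropy-regularized Wasserstein barycenter of two images using the POT library's convolutional solver (in the spirit of Solomon et al.'s convolutional Wasserstein distances). The POT function used is real, but the parameter values (alpha, lam, reg) are illustrative assumptions.

    # Minimal sketch: classic mixup vs. Wasserstein-barycenter mixing.
    # Requires the POT library (pip install pot); the alpha/lam/reg
    # values below are illustrative choices, not the paper's settings.
    import numpy as np
    import ot  # Python Optimal Transport

    def classic_mixup(x1, x2, y1, y2, alpha=0.2):
        # Pixel-space mixup (Zhang et al., 2018): convex combination of
        # inputs and one-hot labels with lam ~ Beta(alpha, alpha).
        lam = np.random.beta(alpha, alpha)
        return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

    def barycenter_mixup(x1, x2, lam=0.5, reg=0.004):
        # Entropy-regularized Wasserstein barycenter of two grayscale
        # images viewed as probability distributions over the pixel
        # grid; reg controls the entropic (Sinkhorn) smoothing.
        a = x1 / x1.sum()                      # normalize to a distribution
        b = x2 / x2.sum()
        A = np.stack([a, b])                   # shape (2, H, W)
        weights = np.array([lam, 1.0 - lam])   # interpolation weights
        bary = ot.bregman.convolutional_barycenter2d(A, reg, weights=weights)
        return bary / bary.max()               # rescale for display

    # Example on two 28x28 grayscale images (e.g. MNIST digits):
    x1, x2 = np.random.rand(28, 28), np.random.rand(28, 28)
    mixed = barycenter_mixup(x1, x2, lam=0.5)

Unlike the pixel-wise average, the barycenter transports mass between the two inputs, which is consistent with the abstract's observation that the technique behaves well on images with clean backgrounds (most mass lies on the object); the labels would still be mixed linearly as in classic mixup.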

Keywords

Image mixing · Generative model · Image classification


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Software Engineering, East China Normal University, Shanghai, China
  2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  3. Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China
