
RandoMix: a mixed sample data augmentation method with multiple mixed modes

Published in: Multimedia Tools and Applications

Abstract

Data augmentation plays a crucial role in enhancing the robustness and performance of machine learning models across various domains. In this study, we introduce a novel mixed-sample data augmentation method called RandoMix. RandoMix is specifically designed to address robustness and diversity challenges simultaneously. It leverages a combination of linear and mask mixed modes, introducing flexibility in candidate selection and weight adjustments. We evaluate the effectiveness of RandoMix on diverse datasets, including CIFAR-10/100, Tiny-ImageNet, ImageNet, and Google Speech Commands. Our results demonstrate its superior performance compared to existing techniques such as Mixup, CutMix, Fmix, and ResizeMix. Notably, RandoMix excels in enhancing model robustness against adversarial noise, natural noise, and sample occlusion. The comprehensive experimental results and insights into parameter tuning underscore the potential of RandoMix as a versatile and effective data augmentation method. Moreover, it integrates seamlessly into the training pipeline.
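The abstract describes RandoMix as combining linear (Mixup-style) and mask (CutMix-style) mixed modes with flexible candidate selection and mixing weights. As a rough illustration of that general idea only — not the authors' implementation — the sketch below draws a mixing mode at random for each batch; the Beta(α, α) weight, uniform mode choice, and rectangular mask are illustrative assumptions.

```python
import numpy as np

def randomix_batch(x, y, modes=("linear", "mask"), alpha=1.0, rng=None):
    """Illustrative mixed-sample augmentation with two mixed modes.

    x: (N, H, W, C) image batch, y: (N, K) one-hot labels.
    Returns a mixed batch and correspondingly mixed soft labels.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, h, w, _ = x.shape
    perm = rng.permutation(n)          # candidate partner for each sample
    lam = rng.beta(alpha, alpha)       # mixing weight drawn from Beta(alpha, alpha)
    if rng.choice(modes) == "linear":
        # Linear mode (Mixup-style): convex combination of whole images.
        x_mix = lam * x + (1.0 - lam) * x[perm]
    else:
        # Mask mode (CutMix-style): paste a rectangle from the partner image.
        cut_h = int(h * np.sqrt(1.0 - lam))
        cut_w = int(w * np.sqrt(1.0 - lam))
        cy, cx = rng.integers(h), rng.integers(w)
        y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
        x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
        x_mix = x.copy()
        x_mix[:, y1:y2, x1:x2, :] = x[perm][:, y1:y2, x1:x2, :]
        # Label weight follows the area actually kept from the original image.
        lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```

Choosing the mode per batch (rather than per sample) keeps the label-mixing weight scalar; a per-sample variant would draw `lam` and the mode independently for each element of the batch.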


Data Availability Statement

The data supporting the findings of this study include publicly available datasets: CIFAR-10/100, Tiny-ImageNet, ImageNet, and Google Speech Commands, which can be accessed through their respective online repositories. Additional data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer vision: a brief review. Computational intelligence and neuroscience

  2. Kamath U, Liu J, Whitaker J (2019) Deep learning for NLP and speech recognition, vol 84. Springer

  3. Vapnik V (1968) On the uniform convergence of relative frequencies of events to their probabilities. Dokl Akad Nauk USSR 181:781–787


  4. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) Mixup: beyond empirical risk minimization. International conference on learning representations (ICLR)

  5. Yun S, Han D, Oh SJ, Chun S, Choe J, Yoo Y (2019) Cutmix: regularization strategy to train strong classifiers with localizable features. IEEE international conference on computer vision (ICCV)

  6. Qin J, Fang J, Zhang Q, Liu W, Wang X, Wang X (2020) Resizemix: mixing data with preserved object information and true labels. arXiv:2012.11101

  7. Harris E, Marcu A, Painter M, Niranjan M, Prügel-Bennett A, Hare J (2021) Fmix: enhancing mixed sample data augmentation. International conference on learning representations (ICLR)

  8. Kim J-H, Choo W, Song HO (2020) Puzzle mix: exploiting saliency and local statistics for optimal mixup. International conference on machine learning (ICML)

  9. Kim J, Choo W, Jeong H, Song HO (2021) Co-mixup: saliency guided joint mixup with supermodular diversity. In: international conference on learning representations (ICLR)

  10. Uddin AFMS, Monira MS, Shin W, Chung T, Bae S-H (2021) Saliencymix: a saliency guided data augmentation strategy for better regularization. International conference on learning representations (ICLR)

  11. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images

  12. Chrabaszcz P, Loshchilov I, Hutter F (2017) A downsampled variant of imagenet as an alternative to the cifar datasets. arXiv:1707.08819

  13. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al (2015) Imagenet large scale visual recognition challenge. International journal of computer vision (IJCV)

  14. Warden P (2017) Speech commands: a public dataset for single-word speech recognition. Dataset available from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

  15. Verma V, Lamb A, Beckham C, Najafi A, Mitliagkas I, Courville A, Lopez-Paz D, Bengio Y (2019) Manifold mixup: better representations by interpolating hidden states. International conference on machine learning (ICML)

  16. Faramarzi M, Amini M, Badrinaaraayanan A, Verma V, Chandar S (2022) Patchup: a feature-space block-level regularization technique for convolutional neural networks. Proc AAAI Conf Artif Intell 36:589–597


  17. DeVries T, Taylor GW (2017) Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552

  18. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision (ECCV)

  19. Zagoruyko S, Komodakis N (2016) Wide residual networks. In: Proceedings of the British machine vision conference 2016. British machine vision association

  20. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR)

  21. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 10012–10022

  22. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International conference on learning representations (ICLR)

  23. Wang W, Liang J, Liu D (2022) Learning equivariant segmentation with instance-unique querying. Neural Inf Process Syst (NeurIPS) 35:12826–12840


  24. Wang W, Han C, Zhou T, Liu D (2023) Visual recognition with deep nearest centroids. In: International conference on learning representations (ICLR)

  25. Liu D, Cui Y, Tan W, Chen Y (2021) Sg-net: spatial granularity network for one-stage video instance segmentation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 9816–9825

  26. Liang J, Zhou T, Liu D, Wang W (2023) Clustseg: clustering for universal segmentation. In: international conference on machine learning (ICML)

  27. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: IEEE international conference on computer vision (ICCV), pp 618–626

  28. Brain G (2017) Tensorflow speech recognition challenge. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

  29. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. International conference on learning representations (ICLR)

  30. Hendrycks D, Dietterich T (2019) Benchmarking neural network robustness to common corruptions and perturbations. International conference on learning representations (ICLR)


Acknowledgements

This work was supported in part by the STI 2030-Major Projects of China under Grant 2021ZD0201300, and by the National Science Foundation of China under Grant 62276127.

Ethical concerns: The research methodology and the nature of RandoMix do not directly engage with the ethical dilemmas typically encountered in studies involving human or animal subjects, sensitive data, or environmental impacts.

Compliance with ethical standards: We affirm our adherence to general ethical standards in research, including honesty in reporting results, transparency in methodology, and respect for intellectual property.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Furao Shen.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, X., Shen, F., Zhao, J. et al. RandoMix: a mixed sample data augmentation method with multiple mixed modes. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18868-8
