Abstract
Data augmentation plays a crucial role in enhancing the robustness and performance of machine learning models across various domains. In this study, we introduce a novel mixed-sample data augmentation method called RandoMix. RandoMix is specifically designed to simultaneously address robustness and diversity challenges. It leverages a combination of linear and mask mixed modes, introducing flexibility in candidate selection and weight adjustments. We evaluate the effectiveness of RandoMix on diverse datasets, including CIFAR-10/100, Tiny-ImageNet, ImageNet, and Google Speech Commands. Our results demonstrate its superior performance compared to existing techniques such as Mixup, CutMix, Fmix, and ResizeMix. Notably, RandoMix excels in enhancing model robustness against adversarial noise, natural noise, and sample occlusion. The comprehensive experimental results and insights into parameter tuning underscore the potential of RandoMix as a versatile and effective data augmentation method. Moreover, it seamlessly integrates into the training pipeline.
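The abstract describes RandoMix as randomly drawing, per batch, one of several mixed modes (a linear, Mixup-style interpolation or a mask, CutMix-style patch replacement) from a weighted candidate set. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function names `randomix` and `rand_bbox`, the two-mode candidate set, and the `weights` parameter are assumptions for illustration only.

```python
import numpy as np

def rand_bbox(h, w, lam):
    """Sample a rectangle covering roughly a (1 - lam) fraction of the image (CutMix-style)."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def randomix(x, y, modes=("linear", "mask"), weights=(0.5, 0.5), alpha=1.0):
    """Mix a batch x (N, H, W, C) with a randomly chosen mode.

    Returns the mixed batch, both label sets, and the mixing weight lam
    for computing the usual convex combination of the two losses.
    """
    lam = np.random.beta(alpha, alpha)            # mixing ratio
    idx = np.random.permutation(len(x))           # partner sample for each input
    mode = np.random.choice(modes, p=weights)     # weighted candidate selection
    if mode == "linear":                          # Mixup-style interpolation
        x_mix = lam * x + (1.0 - lam) * x[idx]
    else:                                         # CutMix-style rectangular mask
        h, w = x.shape[1], x.shape[2]
        y1, y2, x1, x2 = rand_bbox(h, w, lam)
        x_mix = x.copy()
        x_mix[:, y1:y2, x1:x2] = x[idx, y1:y2, x1:x2]
        lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # adjust label weight to actual area
    return x_mix, y, y[idx], lam
```

In a training loop the returned quadruple would feed a mixed loss of the form `lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b)`, which is the standard label-mixing recipe shared by Mixup and CutMix.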
Data Availability Statement
The data supporting the findings of this study include publicly available datasets. These datasets are CIFAR-10/100, Tiny-ImageNet, ImageNet and Google Speech Commands, which can be accessed through their respective online repositories. Additional data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer vision: a brief review. Comput Intell Neurosci
Kamath U, Liu J, Whitaker J (2019) Deep learning for NLP and speech recognition, vol 84. Springer
Vapnik V (1968) On the uniform convergence of relative frequencies of events to their probabilities. Dokl Akad Nauk USSR 181:781–787
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) Mixup: beyond empirical risk minimization. In: International conference on learning representations (ICLR)
Yun S, Han D, Oh SJ, Chun S, Choe J, Yoo Y (2019) Cutmix: regularization strategy to train strong classifiers with localizable features. In: IEEE international conference on computer vision (ICCV)
Qin J, Fang J, Zhang Q, Liu W, Wang X, Wang X (2020) Resizemix: mixing data with preserved object information and true labels. arXiv:2012.11101
Harris E, Marcu A, Painter M, Niranjan M, Prügel-Bennett A, Hare J (2021) Fmix: enhancing mixed sample data augmentation. In: International conference on learning representations (ICLR)
Kim J-H, Choo W, Song HO (2020) Puzzle mix: exploiting saliency and local statistics for optimal mixup. In: International conference on machine learning (ICML)
Kim J, Choo W, Jeong H, Song HO (2021) Co-mixup: saliency guided joint mixup with supermodular diversity. In: International conference on learning representations (ICLR)
Uddin AFMS, Monira MS, Shin W, Chung T, Bae S-H (2021) Saliencymix: a saliency guided data augmentation strategy for better regularization. In: International conference on learning representations (ICLR)
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
Chrabaszcz P, Loshchilov I, Hutter F (2017) A downsampled variant of imagenet as an alternative to the cifar datasets. arXiv:1707.08819
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al (2015) Imagenet large scale visual recognition challenge. International journal of computer vision (IJCV)
Warden P (2017) Speech commands: a public dataset for single-word speech recognition. Dataset available from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
Verma V, Lamb A, Beckham C, Najafi A, Mitliagkas I, Courville A, Lopez-Paz D, Bengio Y (2019) Manifold mixup: better representations by interpolating hidden states. In: International conference on machine learning (ICML)
Faramarzi M, Amini M, Badrinaaraayanan A, Verma V, Chandar S (2022) Patchup: a feature-space block-level regularization technique for convolutional neural networks. Proc AAAI Conf Artif Intell 36:589–597
DeVries T, Taylor GW (2017) Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision (ECCV)
Zagoruyko S, Komodakis N (2016) Wide residual networks. In: Proceedings of the British machine vision conference 2016. British machine vision association
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR)
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: IEEE international conference on computer vision (ICCV), pp 10012–10022
Loshchilov I, Hutter F (2017) Sgdr: stochastic gradient descent with warm restarts. In: International conference on learning representations (ICLR)
Wang W, Liang J, Liu D (2022) Learning equivariant segmentation with instance-unique querying. Adv Neural Inf Process Syst (NeurIPS) 35:12826–12840
Wang W, Han C, Zhou T, Liu D (2023) Visual recognition with deep nearest centroids. In: International conference on learning representations (ICLR)
Liu D, Cui Y, Tan W, Chen Y (2021) Sg-net: spatial granularity network for one-stage video instance segmentation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 9816–9825
Liang J, Zhou T, Liu D, Wang W (2023) Clustseg: clustering for universal segmentation. In: international conference on machine learning (ICML)
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: IEEE international conference on computer vision (ICCV), pp 618–626
Google Brain (2017) Tensorflow speech recognition challenge. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: International conference on learning representations (ICLR)
Hendrycks D, Dietterich T (2019) Benchmarking neural network robustness to common corruptions and perturbations. In: International conference on learning representations (ICLR)
Acknowledgements
This work was supported in part by the STI 2030-Major Projects of China under Grant 2021ZD0201300, and by the National Science Foundation of China under Grant 62276127.
Ethical concerns: Our research methodology and the nature of RandoMix do not directly engage with ethical dilemmas typically encountered in studies involving human or animal subjects, sensitive data, or environmental impacts.
Compliance with ethical standards: The authors have adhered to general ethical standards in research, including honesty in reporting results, transparency in methodology, and respect for intellectual property.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, X., Shen, F., Zhao, J. et al. RandoMix: a mixed sample data augmentation method with multiple mixed modes. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18868-8