Abstract
Data augmentation (DA) is an effective way to improve the performance of deep networks. Unfortunately, current methods are mostly developed for high-level vision tasks (e.g., image classification), and few have been studied for low-level vision tasks (e.g., image restoration). In this paper, we provide a comprehensive analysis of existing DA methods in the frequency domain. We find that methods that heavily manipulate spatial information can hinder the image restoration process and hurt performance. Based on our analysis, we propose CutBlur and mixture-of-augmentation (MoA). CutBlur cuts a low-quality patch and pastes it into the corresponding high-quality image region, or vice versa. The key intuition is to provide a sufficient DA effect while keeping the pixel distribution intact. This property of CutBlur enables a model to learn not only “how” but also “where” to reconstruct an image. Eventually, the model understands “how much” to restore given pixels, which allows it to generalize better to unseen data distributions. We further improve restoration performance with MoA, which draws from a curated list of DA methods. We demonstrate the effectiveness of our methods through extensive experiments on several low-level vision tasks, covering both single and mixed distortions. Our results show that CutBlur and MoA consistently and significantly improve performance, especially when the model is large and the data are collected in real-world environments. Our code is available at https://github.com/clovaai/cutblur.
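The cut-and-paste operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official implementation (see the repository linked above for that): the function name `cutblur` and the fixed `alpha` cut-ratio parameter are our simplifications, and it assumes the low-quality input has already been upsampled to the high-quality resolution so the two images share a spatial shape.

```python
import numpy as np

def cutblur(lr_up, hr, alpha=0.5, rng=np.random.default_rng()):
    """Swap a random rectangular region between the (upsampled)
    low-quality image `lr_up` and its high-quality counterpart `hr`.
    Because pixels only move between the two aligned versions of the
    same scene, the pixel distribution stays intact."""
    if lr_up.shape != hr.shape:
        raise ValueError("lr_up and hr must have the same shape")
    h, w = hr.shape[:2]
    ch, cw = int(h * alpha), int(w * alpha)     # size of the cut region
    cy = rng.integers(0, h - ch + 1)            # top-left corner
    cx = rng.integers(0, w - cw + 1)
    if rng.random() < 0.5:
        # paste a high-quality patch into the low-quality image ...
        aug = lr_up.copy()
        aug[cy:cy + ch, cx:cx + cw] = hr[cy:cy + ch, cx:cx + cw]
    else:
        # ... or vice versa: a low-quality patch inside the HQ content
        aug = hr.copy()
        aug[cy:cy + ch, cx:cx + cw] = lr_up[cy:cy + ch, cx:cx + cw]
    return aug, hr  # the training target remains the clean HQ image
```

The augmented input now contains regions of two different quality levels, so the model must infer "where" and "how much" restoration is needed rather than applying a uniform correction everywhere.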
Data Availability
The datasets generated during and/or analyzed during the current study are available in the following repository: https://github.com/clovaai/cutblur.
Notes
For every experiment, we used only the geometric DA methods flip and rotation, which are the default setting of EDSR. Here, to isolate the effect of the DA methods, we did not use the \(\times 2\) pre-trained model.
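The default geometric augmentation mentioned above amounts to applying the same random flip and 90-degree rotation to the LR and HR images of a training pair. A minimal sketch (the function name and interface are ours, not EDSR's API):

```python
import numpy as np

def paired_geometric_da(lr, hr, rng=np.random.default_rng()):
    """Apply an identical random flip and 90-degree rotation to an
    LR-HR training pair, so the two images stay spatially aligned."""
    if rng.random() < 0.5:                 # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = rng.integers(0, 4)                 # rotate by 0/90/180/270 degrees
    return np.rot90(lr, k), np.rot90(hr, k)
```

Because these transforms only permute pixel locations, they preserve the pixel distribution, which is why they are safe defaults for restoration tasks.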
Acknowledgements
This work was supported by the Korea Research Institute for Defence Technology Planning and Advancement (KRIT) grant funded by the Korea government (DAPA) in 2022 (KRIT-CT-22-037, SAR Image Super-Resolution for Improving of Target Identification Performance, 50%), National Research Foundation of Korea Grants funded by the Korea government (MSIT) (No. NRF-2019R1A2C1006608, 5%, No. 2.220574.01, 15%), Institute of Information & communications Technology Planning & Evaluation (IITP) Grants funded by the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01431, 5%) and MSIT No. 2020-0-01336, 5%, Artificial Intelligence Graduate School Program (UNIST), 5%, No. 2021-0-02068, Artificial Intelligence Innovation Hub, 5%, No. 2022-0-00959, (Part 2) Few-Shot Learning of Causal Inference in Vision and Language for Decision Making, 5%, No. 2022-0-00264, Comprehensive Video Understanding and Generation with Knowledge-based Deep Logic Neural Network, 5%).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Communicated by Oliver Zendel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ahn, N., Yoo, J. & Sohn, KA. Data Augmentation for Low-Level Vision: CutBlur and Mixture-of-Augmentation. Int J Comput Vis 132, 2041–2059 (2024). https://doi.org/10.1007/s11263-023-01970-z