Abstract
Data augmentation is crucial for improving the accuracy of densely occluded object recognition in scenes where the quantity and diversity of training images are insufficient. However, current methods based on regional dropping and mixing strategies can lose foreground objects and retain redundant background features, which degrades the recognition of densely occluded objects in classification and detection tasks. Herein, a saliency information and mosaic based data augmentation method for densely occluded object recognition is proposed. It uses saliency information as prior knowledge to supervise the mosaic process for training images containing densely occluded objects. The method also applies fogging processing and class label mixing to construct new augmented images, increasing the quantity and diversity of training images in order to improve the accuracy of image classification and object recognition. Extensive experiments on different classification datasets with various CNN architectures demonstrate the effectiveness of the method.
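The pipeline summarized in the abstract (saliency-guided mosaic, fogging, and class label mixing) can be illustrated with a minimal sketch. This is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: saliency is approximated by each pixel's colour distance from the image mean, the mosaic is a fixed 2x2 grid, fog follows a homogeneous scattering model with constant transmission, and label weights are proportional to the saliency mass of each crop. All function and parameter names are hypothetical.

```python
import numpy as np

def saliency_map(img):
    """Crude saliency proxy (assumption): colour distance from the image mean."""
    return np.linalg.norm(img - img.mean(axis=(0, 1)), axis=2)

def most_salient_crop(img, ch, cw, stride=4):
    """Return the ch x cw window with the largest summed saliency, and that sum."""
    sal = saliency_map(img)
    # Integral image: each window sum becomes a constant-time lookup.
    ii = np.pad(sal, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best, best_yx = -1.0, (0, 0)
    for y in range(0, img.shape[0] - ch + 1, stride):
        for x in range(0, img.shape[1] - cw + 1, stride):
            s = ii[y + ch, x + cw] - ii[y, x + cw] - ii[y + ch, x] + ii[y, x]
            if s > best:
                best, best_yx = s, (y, x)
    y, x = best_yx
    return img[y:y + ch, x:x + cw], best

def add_fog(img, transmission=0.7, airlight=1.0):
    """Homogeneous fog (assumption): I = J * t + A * (1 - t), images in [0, 1]."""
    return img * transmission + airlight * (1.0 - transmission)

def saliency_mosaic(images, labels, num_classes, transmission=0.7):
    """Tile the most salient crop of four same-sized images into a 2x2 mosaic.

    Returns the fogged mosaic and a soft label whose weights are
    proportional to the saliency mass each source image contributes.
    """
    h, w, _ = images[0].shape
    ch, cw = h // 2, w // 2
    out = np.zeros((2 * ch, 2 * cw, 3), dtype=np.float64)
    target = np.zeros(num_classes)
    for k, (img, lab) in enumerate(zip(images, labels)):
        crop, mass = most_salient_crop(img, ch, cw)
        r, c = divmod(k, 2)  # grid position: row, column
        out[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw] = crop
        target[lab] += mass + 1e-8  # epsilon guards against all-zero saliency
    target /= target.sum()
    return add_fog(out, transmission), target
```

Calling `saliency_mosaic(images, labels, num_classes)` with four float images in [0, 1] yields one augmented image and a soft label that sums to 1. A stronger saliency detector or a different label-weighting rule could be swapped in without changing the overall structure.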
Data availability
This work uses publicly available standard datasets, including CIFAR-10, CIFAR-100, ImageNet, and VOC.
Acknowledgements
The research reported in this article was supported by the National Natural Science Foundation of China under Grant No. 61991415 and by the Development Project of Ship Situational Intelligent Awareness System under Grant MC-201920-X01.
About this article
Cite this article
Tong, Y., Luo, X., Ma, L. et al. Saliency information and mosaic based data augmentation method for densely occluded object recognition. Pattern Anal Applic 27, 34 (2024). https://doi.org/10.1007/s10044-024-01258-z
DOI: https://doi.org/10.1007/s10044-024-01258-z