Saliency information and mosaic based data augmentation method for densely occluded object recognition

Tong, Ying; Luo, Xiangfeng; Ma, Liyan; Xie, Shaorong; Yang, Wenbin; Guo, Yinsai

doi:10.1007/s10044-024-01258-z

Original Paper
Published: 29 March 2024

Pattern Analysis and Applications, volume 27, article number 34 (2024)

Abstract

Data augmentation methods are crucial for improving the accuracy of densely occluded object recognition in scenes where the quantity and diversity of training images are insufficient. However, current methods that rely on regional dropping and mixing strategies tend to lose foreground objects and retain redundant background features, which degrades the recognition of densely occluded objects in classification and detection tasks. Herein, a saliency information and mosaic based data augmentation method for densely occluded object recognition is proposed. It uses saliency information as prior knowledge to supervise the mosaic process of training images containing densely occluded objects. The method also applies fogging processing and class label mixing to construct new augmented images, increasing the quantity and diversity of training images in order to improve the accuracy of image classification and object recognition tasks. Extensive experiments on different classification datasets with various CNN architectures demonstrate the effectiveness of the method.
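
The abstract names three ingredients (saliency-guided mosaicking, fogging, and class label mixing) without giving their details. The following Python sketch illustrates one way they could fit together; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the colour-distance saliency proxy, the 2x2 layout, the uniform-transmission fog based on the atmospheric scattering model, the saliency-share label weights, and all function and parameter names.

```python
import numpy as np


def saliency_map(img):
    """Crude saliency proxy (assumption): colour distance from the image mean."""
    mean_colour = img.reshape(-1, img.shape[-1]).mean(axis=0)
    sal = np.sqrt(((img - mean_colour) ** 2).sum(axis=-1))
    return sal / (sal.max() + 1e-12)


def most_salient_crop(img, size):
    """Return the size x size window with the largest total saliency and its score."""
    sal = saliency_map(img)
    # Integral image: every window sum becomes four lookups.
    ii = np.pad(sal, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    scores = (ii[size:, size:] - ii[:-size, size:]
              - ii[size:, :-size] + ii[:-size, :-size])
    top, left = np.unravel_index(np.argmax(scores), scores.shape)
    return img[top:top + size, left:left + size], scores[top, left]


def add_fog(img, transmission=0.7, airlight=1.0):
    """Atmospheric scattering model I = J*t + A*(1 - t) with uniform t (assumption)."""
    return img * transmission + airlight * (1.0 - transmission)


def saliency_mosaic(images, labels, num_classes, out_size=32, fog_t=0.7):
    """Tile the most salient crop of four images into a 2x2 mosaic.

    Images are float arrays in [0, 1] with shape (H, W, C), H and W >= out_size // 2.
    The soft label weights each source class by its crop's saliency share (assumption).
    """
    assert len(images) == 4 and len(labels) == 4
    half = out_size // 2
    canvas = np.zeros((2 * half, 2 * half, images[0].shape[-1]))
    weights = np.zeros(4)
    for k, img in enumerate(images):
        crop, score = most_salient_crop(img, half)
        row, col = divmod(k, 2)
        canvas[row * half:(row + 1) * half, col * half:(col + 1) * half] = crop
        weights[k] = score
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(4, 0.25)
    target = np.zeros(num_classes)
    for w, y in zip(weights, labels):
        target[y] += w  # classes that appear twice accumulate weight
    return add_fog(canvas, fog_t), target


rng = np.random.default_rng(0)
imgs = [rng.random((32, 32, 3)) for _ in range(4)]
mosaic, soft_label = saliency_mosaic(imgs, [0, 1, 2, 1], num_classes=3)
```

Here `mosaic` is a 32x32x3 augmented image and `soft_label` is a mixed class distribution that sums to 1. Choosing crops by saliency rather than at random is what keeps foreground content in each tile, which is the failure mode the abstract attributes to regional dropping and mixing.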

Data availability

This work uses publicly available standard datasets, including CIFAR-10, CIFAR-100, ImageNet, and VOC.

Acknowledgements

The research reported in this article was supported by the National Natural Science Foundation of China under Grant No. 61991415 and by the Development Project of Ship Situational Intelligent Awareness System under Grant MC-201920-X01.

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Ying Tong, Xiangfeng Luo, Liyan Ma, Shaorong Xie, Wenbin Yang & Yinsai Guo

Corresponding author

Correspondence to Shaorong Xie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tong, Y., Luo, X., Ma, L. et al. Saliency information and mosaic based data augmentation method for densely occluded object recognition. Pattern Anal Applic 27, 34 (2024). https://doi.org/10.1007/s10044-024-01258-z

Received: 18 September 2023
Accepted: 20 February 2024
Published: 29 March 2024
DOI: https://doi.org/10.1007/s10044-024-01258-z
