Saliency information and mosaic based data augmentation method for densely occluded object recognition

  • Original Paper
  • Published in Pattern Analysis and Applications

Abstract

Data augmentation is crucial for improving the accuracy of densely occluded object recognition when the quantity and diversity of training images are insufficient. However, current methods based on regional dropping and mixing strategies suffer from missing foreground objects and redundant background features, which degrade recognition of densely occluded objects in classification and detection tasks. Herein, a saliency-information- and mosaic-based data augmentation method for densely occluded object recognition is proposed. It uses saliency information as prior knowledge to supervise the mosaicking of training images containing densely occluded objects, and it applies fogging and class-label mixing to construct new augmented images, improving the accuracy of image classification and object recognition by increasing the quantity and diversity of training images. Extensive experiments on different classification datasets with various CNN architectures demonstrate the effectiveness of the method.
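As a concrete illustration of the pipeline the abstract describes, the sketch below combines saliency-guided cropping, a 2 × 2 mosaic, fogging, and class-label mixing. It is a minimal sketch, not the authors' implementation: the spectral-residual saliency prior, the peak-centred crop, the constant-transmission fog model (parameters `beta` and `airlight`), and the equal-area label weights are all assumptions, and the saliency module requires the `opencv-contrib-python` package.

```python
import numpy as np
import cv2  # the cv2.saliency module ships with opencv-contrib-python


def saliency_map(img):
    """Spectral-residual saliency (one possible prior); float map in [0, 1]."""
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, smap = sal.computeSaliency(img)
    return smap if ok else np.ones(img.shape[:2], np.float32)


def salient_crop(img, crop_hw):
    """Crop a window centred on the saliency peak so the tile keeps the
    foreground object rather than redundant background. Assumes the image
    is at least as large as the requested crop."""
    h, w = crop_hw
    smap = saliency_map(img)
    cy, cx = np.unravel_index(np.argmax(smap), smap.shape)
    y0 = int(np.clip(cy - h // 2, 0, img.shape[0] - h))
    x0 = int(np.clip(cx - w // 2, 0, img.shape[1] - w))
    return img[y0:y0 + h, x0:x0 + w]


def fog(img, beta=0.08, airlight=220.0):
    """Homogeneous atmospheric-scattering fog, I = J*t + A*(1 - t), with a
    constant transmission t = exp(-beta); scene depth is fixed by assumption."""
    t = np.exp(-beta)
    foggy = img.astype(np.float32) * t + airlight * (1.0 - t)
    return np.clip(foggy, 0, 255).astype(np.uint8)


def saliency_mosaic(images, labels, num_classes, out_hw=(224, 224)):
    """Tile saliency-guided crops of four images into a 2x2 mosaic, apply
    fogging, and mix the one-hot labels in proportion to tile area."""
    H, W = out_hw
    tiles = [salient_crop(im, (H // 2, W // 2)) for im in images[:4]]
    mosaic = np.vstack([np.hstack(tiles[:2]), np.hstack(tiles[2:])])
    mixed = np.zeros(num_classes, np.float32)
    for y in labels[:4]:
        mixed[y] += 0.25  # equal tile areas -> equal label weights
    return fog(mosaic), mixed
```

In this sketch each tile contributes equally to the mixed label; a variant that weights each class by the saliency mass retained in its tile would follow the same structure.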


Data availability

Publicly available standard datasets (CIFAR-10, CIFAR-100, ImageNet, and VOC) were used in this research.


Acknowledgements

The research reported in this article was supported by the National Natural Science Foundation of China under Grant No. 61991415 and by the Development Project of Ship Situational Intelligent Awareness System under Grant MC-201920-X01.

Author information


Corresponding author

Correspondence to Shaorong Xie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tong, Y., Luo, X., Ma, L. et al. Saliency information and mosaic based data augmentation method for densely occluded object recognition. Pattern Anal Applic 27, 34 (2024). https://doi.org/10.1007/s10044-024-01258-z
