Abstract
Foreground segmentation algorithms aim to segment moving objects from the background robustly under various challenging scenarios. Encoder–decoder-type deep neural networks used in this domain have recently achieved impressive segmentation results. In this work, we propose a variation of our formerly proposed method (Lim and Keles 2018) that can be trained end-to-end using only a few training examples. The proposed method extends the feature pooling module of FgSegNet by fusing features inside this module; the extended module extracts multi-scale features within images, yielding a feature pooling that is robust to camera motion and alleviates the need for multi-scale inputs to the network. Sample visualizations highlight the image regions on which the model focuses; these regions are also the most semantically relevant. Our method outperforms all existing state-of-the-art methods on the CDnet2014 dataset with an average overall F-measure of 0.9847. We also evaluate the effectiveness of our method on the SBI2015 and UCSD Background Subtraction datasets. The source code of the proposed method is available at https://github.com/lim-anggun/FgSegNet_v2.
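The multi-scale feature pooling described above is in the spirit of dilated (atrous) convolutions (Yu and Koltun 2015; Chen et al. 2018): convolving the same feature map at several dilation rates enlarges the receptive field without downsampling, and the per-rate responses are then fused. The sketch below is an illustrative NumPy implementation of this general idea, not the authors' module; the function names and the choice of stacking as the fusion step are assumptions made for the example.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 2D convolution with dilation `rate` and 'same' zero padding.

    For an odd kernel size k, the effective kernel spans k + (k-1)*(rate-1)
    pixels, so padding by half that span keeps the output the same size as x.
    """
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)   # effective (dilated) kernel span
    pad = eff // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input at strides of `rate` within the span.
            patch = xp[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def multiscale_pool(x, kernels, rates):
    """Compute one feature map per dilation rate and fuse them by stacking."""
    feats = [dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)]
    return np.stack(feats, axis=0)   # shape: (num_rates, H, W)
```

In a real network each branch would be a learned Conv2D layer (e.g. Keras `Conv2D` with its `dilation_rate` argument) and the fusion could be concatenation followed by a 1x1 convolution; the loop-based convolution here only makes the sampling pattern explicit.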
References
Lim LA, Keles HY (2018) Foreground segmentation using convolutional neural networks for multiscale feature encoding. Pattern Recognit Lett 112:256–262
Babaee M, Dinh DT, Rigoll G (2017) A deep convolutional neural network for background subtraction. arXiv preprint arXiv:1702.01731
Badrinarayanan V, Kendall A, Cipolla R (2015) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561
Barnich O, Van Droogenbroeck M (2011) ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans Image Process 20(6):1709–1724
Basharat A, Gritai A, Shah M (2008) Learning object motion patterns for anomaly detection and improved object detection. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Bianco S, Ciocca G, Schettini R (2017) How far can you get by combining change detection algorithms? In: International conference on image analysis and processing. Springer, Berlin, pp 96–107
Braham M, Van Droogenbroeck M (2016) Deep background subtraction with scene-specific convolutional neural networks. In: 2016 International conference on systems, signals and image processing (IWSSIP). IEEE, pp 1–4
Brutzer S, Höferlin B, Heidemann G (2011) Evaluation of background subtraction techniques for video surveillance. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1937–1944
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611
Cheung SCS, Kamath C (2004) Robust techniques for background subtraction in urban traffic video. Proc SPIE 5308:881–892
Chollet F, et al (2015) Keras. https://keras.io. Accessed 29 Aug 2019
Cinelli LP, Thomaz LA, da Silva AF, da Silva EA, Netto SL (2017) Foreground segmentation for anomaly detection in surveillance videos using deep residual networks. In: Proceedings XXXV Brazilian communication signal processing symposium, pp 914–918
Hofmann M, Tiefenbacher P, Rigoll G (2012) Background segmentation with feedback: the pixel-based adaptive segmenter. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 38–43
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
KaewTraKulPong P, Bowden R (2002) An improved adaptive background mixture model for real-time tracking with shadow detection. Video-based Surveill Syst 1:135–144
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, NIPS 2012, 3–8 Dec 2012. Nevada, USA, pp 1097–1105
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Lim K, Jang WD, Kim CS (2017) Background subtraction using encoder-decoder structured convolutional neural network. In: 2017 14th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Maddalena L, Petrosino A (2015) Towards benchmarking scene background initialization. In: International conference on image analysis and processing. Springer, Berlin, pp 469–476
Mahadevan V, Vasconcelos N (2010) Spatiotemporal saliency in dynamic scenes. IEEE Trans Pattern Anal Mach Intell 32(1):171–177. https://doi.org/10.1109/TPAMI.2009.112
Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28(6):976–990
Porikli F, Tuzel O (2003) Human body tracking by adaptive background models and mean-shift analysis. In: IEEE international workshop on performance evaluation of tracking and surveillance, pp 1–9
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 234–241
Sakkos D, Liu H, Han J, Shao L (2017) End-to-end video background subtraction with 3d convolutional neural networks. Multimed Tools Appl 77(17):23023–23041
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE international conference on computer vision (ICCV). IEEE, pp 618–626
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Stauffer C, Grimson WEL (1999) Adaptive background mixture models for real-time tracking. In: IEEE computer society conference on computer vision and pattern recognition, 1999, vol 2. IEEE, pp 246–252
Tompson J, Goroshin R, Jain A, LeCun Y, Bregler C (2015) Efficient object localization using convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 648–656
Ulyanov D, Vedaldi A, Lempitsky VS (2016) Instance normalization: the missing ingredient for fast stylization. CoRR abs/1607.08022
Van Droogenbroeck M, Paquot O (2012) Background subtraction: experiments and improvements for ViBe. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 32–37
Wang Y, Jodoin PM, Porikli F, Konrad J, Benezeth Y, Ishwar P (2014) CDnet 2014: an expanded change detection benchmark dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 387–394
Wang Y, Luo Z, Jodoin PM (2017) Interactive deep learning method for segmenting moving objects. Pattern Recognit Lett 96(Supplement C):66–75. https://doi.org/10.1016/j.patrec.2016.09.014
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Berlin, pp 818–833
Zhu S, Xia L (2015) Human action recognition based on fusion features extraction of adaptive background subtraction and optical flow model. Math Probl Eng 2015:387–464
Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 2. IEEE, pp 28–31
Acknowledgements
We would like to thank the anonymous reviewers for their valuable suggestions and comments.
Cite this article
Lim, L.A., Keles, H.Y. Learning multi-scale features for foreground segmentation. Pattern Anal Applic 23, 1369–1380 (2020). https://doi.org/10.1007/s10044-019-00845-9