Abstract
Foreground segmentation algorithms aim to segment moving objects from the background robustly under various challenging scenarios. Encoder–decoder-type deep neural networks used in this domain have recently achieved impressive segmentation results. In this work, we propose a variation of our formerly proposed method (Lim and Keles 2018) that can be trained end-to-end using only a few training examples. The proposed method extends the feature pooling module of FgSegNet by fusing features inside this module; the extended module extracts multi-scale features within images, yielding a feature pooling that is robust to camera motion and alleviates the need for multi-scale inputs to the network. Sample visualizations highlight the image regions on which the model focuses; these regions are also the most semantically relevant. Our method outperforms all existing state-of-the-art methods on the CDnet2014 dataset with an average overall F-measure of 0.9847. We also evaluate the effectiveness of our method on the SBI2015 and UCSD Background Subtraction datasets. The source code of the proposed method is available at https://github.com/lim-anggun/FgSegNet_v2.
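The multi-scale feature pooling described above is in the spirit of dilated (atrous) convolutions (Yu and Koltun 2015; Chen et al. 2018): convolving the same feature map at several dilation rates enlarges the receptive field without downsampling, and the per-rate responses are then fused. The sketch below is an illustrative NumPy implementation of this general idea, not the authors' module; the function names and the choice of stacking as the fusion step are assumptions made for the example.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 2D convolution with dilation `rate` and 'same' zero padding.

    For an odd kernel size k, the effective kernel spans k + (k-1)*(rate-1)
    pixels, so padding by half that span keeps the output the same size as x.
    """
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)   # effective (dilated) kernel span
    pad = eff // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input at strides of `rate` within the span.
            patch = xp[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def multiscale_pool(x, kernels, rates):
    """Compute one feature map per dilation rate and fuse them by stacking."""
    feats = [dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)]
    return np.stack(feats, axis=0)   # shape: (num_rates, H, W)
```

In a real network each branch would be a learned Conv2D layer (e.g. Keras `Conv2D` with its `dilation_rate` argument) and the fusion could be concatenation followed by a 1x1 convolution; the loop-based convolution here only makes the sampling pattern explicit.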
References
Lim LA, Keles HY (2018) Foreground segmentation using convolutional neural networks for multiscale feature encoding. Pattern Recognit Lett 112:256–262
Babaee M, Dinh DT, Rigoll G (2017) A deep convolutional neural network for background subtraction. arXiv preprint arXiv:1702.01731
Badrinarayanan V, Kendall A, Cipolla R (2015) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561
Barnich O, Van Droogenbroeck M (2011) ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans Image Process 20(6):1709–1724
Basharat A, Gritai A, Shah M (2008) Learning object motion patterns for anomaly detection and improved object detection. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8
Bianco S, Ciocca G, Schettini R (2017) How far can you get by combining change detection algorithms? In: International conference on image analysis and processing. Springer, Berlin, pp 96–107
Braham M, Van Droogenbroeck M (2016) Deep background subtraction with scene-specific convolutional neural networks. In: 2016 International conference on systems, signals and image processing (IWSSIP). IEEE, pp 1–4
Brutzer S, Höferlin B, Heidemann G (2011) Evaluation of background subtraction techniques for video surveillance. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1937–1944
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611
Cheung SCS, Kamath C (2004) Robust techniques for background subtraction in urban traffic video. Proc SPIE 5308:881–892
Chollet F, et al (2015) Keras. https://keras.io. Accessed 29 Aug 2019
Cinelli LP, Thomaz LA, da Silva AF, da Silva EA, Netto SL (2017) Foreground segmentation for anomaly detection in surveillance videos using deep residual networks. In: Proceedings XXXV Brazilian communication signal processing symposium, pp 914–918
Hofmann M, Tiefenbacher P, Rigoll G (2012) Background segmentation with feedback: the pixel-based adaptive segmenter. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 38–43
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
KaewTraKulPong P, Bowden R (2002) An improved adaptive background mixture model for real-time tracking with shadow detection. Video-based Surveill Syst 1:135–144
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, NIPS 2012, 3–8 Dec 2012. Nevada, USA, pp 1097–1105
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Lim K, Jang WD, Kim CS (2017) Background subtraction using encoder-decoder structured convolutional neural network. In: 2017 14th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Maddalena L, Petrosino A (2015) Towards benchmarking scene background initialization. In: International conference on image analysis and processing. Springer, Berlin, pp 469–476
Mahadevan V, Vasconcelos N (2010) Spatiotemporal saliency in dynamic scenes. IEEE Trans Pattern Anal Mach Intell 32(1):171–177. https://doi.org/10.1109/TPAMI.2009.112
Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28(6):976–990
Porikli F, Tuzel O (2003) Human body tracking by adaptive background models and mean-shift analysis. In: IEEE international workshop on performance evaluation of tracking and surveillance, pp 1–9
Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 234–241
Sakkos D, Liu H, Han J, Shao L (2017) End-to-end video background subtraction with 3d convolutional neural networks. Multimed Tools Appl 77(17):23023–23041
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE international conference on computer vision (ICCV). IEEE, pp 618–626
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Stauffer C, Grimson WEL (1999) Adaptive background mixture models for real-time tracking. In: IEEE computer society conference on computer vision and pattern recognition, 1999, vol 2. IEEE, pp 246–252
Tompson J, Goroshin R, Jain A, LeCun Y, Bregler C (2015) Efficient object localization using convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 648–656
Ulyanov D, Vedaldi A, Lempitsky VS (2016) Instance normalization: the missing ingredient for fast stylization. CoRR abs/1607.08022
Van Droogenbroeck M, Paquot O (2012) Background subtraction: experiments and improvements for ViBe. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 32–37
Wang Y, Jodoin PM, Porikli F, Konrad J, Benezeth Y, Ishwar P (2014) CDnet 2014: an expanded change detection benchmark dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 387–394
Wang Y, Luo Z, Jodoin PM (2017) Interactive deep learning method for segmenting moving objects. Pattern Recognit Lett 96(Supplement C):66–75. https://doi.org/10.1016/j.patrec.2016.09.014
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Berlin, pp 818–833
Zhu S, Xia L (2015) Human action recognition based on fusion features extraction of adaptive background subtraction and optical flow model. Math Probl Eng 2015:387–464
Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 2. IEEE, pp 28–31
Acknowledgements
We would like to thank the anonymous reviewers for their valuable suggestions and comments.
Cite this article
Lim, L.A., Keles, H.Y. Learning multi-scale features for foreground segmentation. Pattern Anal Applic 23, 1369–1380 (2020). https://doi.org/10.1007/s10044-019-00845-9