Online Mutual Foreground Segmentation for Multispectral Stereo Videos

  • Pierre-Luc St-Charles
  • Guillaume-Alexandre Bilodeau
  • Robert Bergevin

Abstract

The segmentation of video sequences into foreground and background regions is a low-level process commonly used in video content analysis and smart surveillance applications. Using a multispectral camera setup can improve this process by providing more diverse data to help identify objects despite adverse imaging conditions. The registration of several data sources is, however, not trivial if the appearance of objects produced by each sensor differs substantially. This problem is further complicated when close-range stereo pairs are used and parallax effects cannot be ignored. In this work, we present a new method to simultaneously tackle multispectral segmentation and stereo registration. Using an iterative procedure, we estimate the labeling result for one problem using the provisional result of the other. Our approach is based on the alternating minimization of two energy functions that are linked through the use of dynamic priors. We rely on the integration of shape and appearance cues to find proper multispectral correspondences, and to properly segment objects in low-contrast regions. We also formulate our model as a frame processing pipeline using higher-order terms to improve the temporal coherence of our results. Our method is evaluated under different configurations on multiple multispectral datasets, and our implementation is available online.
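
To illustrate the idea of alternating between two labeling problems linked by priors, the following is a minimal toy sketch. It is not the authors' model or implementation. It assumes a single 1D scanline per view, a single global integer shift in place of a dense disparity map, per-pixel foreground scores in [0, 1], and an independent per-pixel energy with no smoothness or higher-order terms. All names (`segment`, `register`, `alternate`, `weight`, `max_shift`) are hypothetical.

```python
import numpy as np

def segment(score, prior, weight=0.5):
    """Label each pixel by comparing two per-pixel energies.

    score: foreground likelihood of this view in [0, 1] (assumed given).
    prior: mask warped from the other view (0.5 means "no information").
    """
    e_fg = (1.0 - score) + weight * (1.0 - prior)  # cost of labeling foreground
    e_bg = score + weight * prior                  # cost of labeling background
    return (e_fg < e_bg).astype(int)

def register(mask_a, mask_b, max_shift):
    """Pick the global shift that best aligns mask_b onto mask_a."""
    shifts = list(range(-max_shift, max_shift + 1))
    # Registration energy: number of pixels where the shifted masks disagree.
    costs = [int(np.sum(mask_a != np.roll(mask_b, d))) for d in shifts]
    return shifts[int(np.argmin(costs))]

def alternate(score_a, score_b, max_shift=5, iters=5):
    """Alternate registration and segmentation, each using the other's result."""
    neutral = np.full_like(score_a, 0.5)
    mask_a = segment(score_a, neutral)
    mask_b = segment(score_b, neutral)
    shift = 0
    for _ in range(iters):
        # Registration step: uses the provisional segmentation masks.
        shift = register(mask_a, mask_b, max_shift)
        # Segmentation step: uses the provisional shift to build the priors.
        prior_a = np.roll(mask_b, shift).astype(float)
        prior_b = np.roll(mask_a, -shift).astype(float)
        mask_a = segment(score_a, prior_a)
        mask_b = segment(score_b, prior_b)
    return mask_a, mask_b, shift

# Toy data: the same object seen in both views, offset by three pixels.
# View A has one low-contrast pixel (index 8) inside the object.
score_a = np.full(20, 0.1)
score_a[6:11] = 0.9
score_a[8] = 0.4
score_b = np.full(20, 0.1)
score_b[3:8] = 0.9

mask_a, mask_b, shift = alternate(score_a, score_b)
print(shift, mask_a.tolist(), mask_b.tolist())
```

In this toy setting, the low-contrast pixel in view A is missed by the independent segmentation but can be recovered once the mask from view B is warped onto it and used as a prior. This mirrors, in a very reduced form, the role of the other view's provisional result described above.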

Keywords

  • Video object segmentation
  • Cosegmentation
  • Multispectral imagery
  • Energy minimization
  • Video signal processing

Notes

Acknowledgements

This work was supported by NSERC, by FRQ-NT team Grant No. 2014-PR-172083, and by REPARTI (Regroupement pour l’étude des environnements partagés intelligents répartis) FRQ-NT strategic cluster. We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan X GPU used for this research. We also thank Chris Holmberg Bahnsen who provided us with the full calibration data needed to rectify the stereo pairs of the VAP trimodal segmentation dataset.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Polytechnique Montréal, Montreal, Canada
  2. Université Laval, Quebec, Canada
