Temporal Semantic Motion Segmentation Using Spatio Temporal Optimization

Haque, Nazrul; Reddy, N. Dinesh; Krishna, Madhava

doi:10.1007/978-3-319-78199-0_7

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10746)

Included in the following conference series:

International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition

Abstract

Segmenting moving objects in a video sequence is a challenging problem and is critical to outdoor robotic navigation. While recent literature has focused on regularizing object labels over a sequence of frames, few works exploit spatio-temporal features for motion segmentation. Particularly in real-world dynamic scenes, existing approaches fail to exploit temporal consistency when segmenting moving objects under large camera motion.

In this paper, we present an approach for exploiting semantic information and temporal constraints in a joint framework for motion segmentation in a video. We propose a formulation for inferring per-frame joint semantic and motion labels, using semantic potentials from a dilated CNN framework and motion potentials from depth and geometric constraints. We integrate these potentials into a 3D (space-time) fully connected CRF framework with overlapping/connected blocks. We solve for a feature space embedding in the spatio-temporal space by enforcing temporal constraints, derived from optical flow and long-term tracks, as a least-squares problem. We evaluate our approach on two outdoor driving benchmarks, the KITTI and Cityscapes datasets.
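
The least-squares embedding step can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not give. It assumes a simple quadratic objective in which each point stays close to its initial feature while points linked by a temporal correspondence (for example, from optical flow or a long-term track) are pulled together. The function name embed_features, the weight parameter, and the toy data are all hypothetical.

```python
import numpy as np

def embed_features(initial, pairs, weight=10.0):
    """Hypothetical sketch: minimise
        ||F - initial||^2 + weight * sum_{(i, j) in pairs} ||F[i] - F[j]||^2
    over the embedded features F (one row per point)."""
    n = initial.shape[0]
    # Graph Laplacian of the correspondence graph (assumed unweighted).
    laplacian = np.zeros((n, n))
    for i, j in pairs:
        laplacian[i, i] += 1.0
        laplacian[j, j] += 1.0
        laplacian[i, j] -= 1.0
        laplacian[j, i] -= 1.0
    # Setting the gradient of the objective to zero gives the linear system
    # (I + weight * L) F = initial, solved here with a dense solver.
    system = np.eye(n) + weight * laplacian
    return np.linalg.solve(system, initial)

# Toy data: points 0 and 1 are the same scene point seen in consecutive
# frames (linked by one correspondence); point 2 has no correspondence.
initial = np.array([[10.0, 5.0], [14.0, 5.0], [40.0, 20.0]])
pairs = [(0, 1)]
embedded = embed_features(initial, pairs)
```

In this toy setup the two linked points move toward each other in feature space while the unlinked point keeps its initial feature. A realistic implementation would use sparse matrices and an iterative solver, since a video contains far too many points for a dense system.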

Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, India
Nazrul Haque & Madhava Krishna
Robotic Institute, Carnegie Mellon University, Pittsburgh, USA
N. Dinesh Reddy

Corresponding author

Correspondence to Nazrul Haque.

Editor information

Editors and Affiliations

Ca’ Foscari University of Venice, Venice, Italy
Marcello Pelillo
University of York, York, United Kingdom
Edwin Hancock

About this paper

Cite this paper

Haque, N., Reddy, N.D., Krishna, M. (2018). Temporal Semantic Motion Segmentation Using Spatio Temporal Optimization. In: Pelillo, M., Hancock, E. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2017. Lecture Notes in Computer Science, vol 10746. Springer, Cham. https://doi.org/10.1007/978-3-319-78199-0_7

DOI: https://doi.org/10.1007/978-3-319-78199-0_7
Published: 22 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78198-3
Online ISBN: 978-3-319-78199-0
eBook Packages: Computer Science, Computer Science (R0)
