Integrating Geometrical Context for Semantic Labeling of Indoor Scenes using RGBD Images

Khan, Salman H.; Bennamoun, Mohammed; Sohel, Ferdous; Togneri, Roberto; Naseem, Imran

doi:10.1007/s11263-015-0843-8

Published: 03 July 2015

Volume 117, pages 1–20, (2016)

International Journal of Computer Vision

Salman H. Khan¹,
Mohammed Bennamoun¹,
Ferdous Sohel¹,
Roberto Togneri² &
…
Imran Naseem³

Abstract

Inexpensive structured light sensors can capture rich information from indoor scenes, and scene labeling problems provide a compelling opportunity to make use of this information. In this paper we present a novel conditional random field (CRF) model to effectively utilize depth information for semantic labeling of indoor scenes. At the core of the model, we propose a novel and efficient plane detection algorithm which is robust to erroneous depth maps. Our CRF formulation defines local, pairwise and higher order interactions between image pixels. At the local level, we propose a novel scheme to combine energies derived from appearance, depth and geometry-based cues. The proposed local energy also encodes the location of each object class by considering the approximate geometry of a scene. For the pairwise interactions, we learn a boundary measure which defines the spatial discontinuity of object classes across an image. To model higher-order interactions, the proposed energy treats smooth surfaces as cliques and encourages all the pixels on a surface to take the same label. We show that the proposed higher-order energies can be decomposed into pairwise sub-modular energies and efficient inference can be made using the graph-cuts algorithm. We follow a systematic approach which uses structured learning to fine-tune the model parameters. We rigorously test our approach on SUN3D and both versions of the NYU-Depth database. Experimental results show that our work achieves superior performance to state-of-the-art scene labeling techniques.

Notes

1. In this work we set $r=3$ and ${{\varvec{\kappa }}}$ is set to [0.25, 0.75], [0.5, 0.5] and [0.75, 0.25] respectively in each case. This choice is based on the validation set (see Sect. 6.2).
2. Plane detection code is available at the author’s webpage: http://www.csse.uwa.edu.au/~salman.
3. The development of this section is similar to Kohli et al. (2009). We also used the same notation, wherever possible, to allow the reader to easily sort out differences and commonalities.

References

Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. TPAMI, 33(5), 898–916.
Blake, A., Kohli, P., & Rother, C. (2011). Markov random fields for vision and image processing. Cambridge: The MIT Press.
Boykov, Y., & Funka-Lea, G. (2006). Graph cuts and efficient N-D image segmentation. IJCV, 70(2), 109–131.
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. TPAMI, 23(11), 1222–1239.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Cadena, C., & Košecká, J. (2014). Semantic segmentation with heterogeneous sensor coverages.
Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. TPAMI, 34(7), 1312–1328.
Couprie, C., Farabet, C., Najman, L., & LeCun, Y. (2013). Indoor semantic segmentation using depth information. ICLR.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, vol 1 (pp 886–893).
Edwards, W., Miles, R. F, Jr, & Von Winterfeldt, D. (2007). Advances in decision analysis: from foundations to applications. Cambridge: Cambridge University Press.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. TPAMI, 35(8), 1915–1929. doi:10.1109/TPAMI.2012.231.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. IJCV, 59(2), 167–181.
Fukunaga, K., & Hostetler, L. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. TIT, 21(1), 32–40.
Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In IEEE ICCV (pp 1–8).
Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In IEEE CVPR (pp 3129–3136).
Gupta, S., Arbelaez, P., & Malik, J. (2013). Perceptual organization and recognition of indoor scenes from rgb-d images. In IEEE CVPR (pp. 564–571).
Gupta, S., Girshick, R., Arbeláez, P., & Malik, J. (2014). Learning rich features from rgb-d images for object detection and segmentation. In Computer Vision–ECCV 2014 (pp. 345–360). Springer.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The weka data mining software: An update. ACM SIGKDD, 11(1), 10–18.
Hayat, M., Bennamoun, M., & An, S. (2015). Deep reconstruction models for image set classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4), 713–727. doi:10.1109/TPAMI.2014.2353635.
He, X., Zemel, R. S., & Carreira-Perpinán, M. A. (2004). Multiscale conditional random fields for image labeling. In IEEE CVPR, vol 2 (pp II–695).
Huang, Q., Han, M., Wu, B., & Ioffe, S. (2011). A hierarchical conditional random field model for labeling and segmenting images of street scenes. In IEEE CVPR (pp. 1953–1960).
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., et al (2011). Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In ACM Proceedings of the 24th annual ACM symposium on User interface software and technology (pp. 559–568).
Jiang, Y., Lim, M., Zheng, C., & Saxena, A. (2012). Learning to place new objects in a scene. IJRR, 31(9), 1021–1043.
Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural svms. JML, 77(1), 27–59.
Johnson, A. E., & Hebert, M. (1999). Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), 433–449.
Khan, S., Bennamoun, M., Sohel, F., & Togneri, R. (2014a). Automatic feature learning for robust shadow detection. In IEEE CVPR.
Khan, S., He, X., Bennamoun, M., Sohel, F., & Togneri, R. (2015). Separating objects and clutter in indoor scenes. In IEEE CVPR.
Khan, S. H., Bennamoun, M., Sohel, F., & Togneri, R. (2014b). Geometry driven semantic labeling of indoor scenes. In Computer Vision–ECCV 2014 (pp. 679–694). Springer.
Kohli, P., Kumar, M. P., & Torr, P. H. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE CVPR (pp. 1–8).
Kohli, P., Torr, P. H., et al. (2009). Robust higher order potentials for enforcing label consistency. IJCV, 82(3), 302–324.
Koppula, H. S., Anand, A., Joachims, T., & Saxena, A. (2011). Semantic labeling of 3d point clouds for indoor scenes. In NIPS (pp. 244–252).
Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected crfs with gaussian edge potentials. In NIPS (pp. 109–117).
Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. (2009). Associative hierarchical crfs for object class image segmentation. In IEEE ICCV (pp. 739–746).
Ladickỳ, L., Russell, C., Kohli, P., & Torr, P. H. (2013). Inference methods for crfs with co-occurrence statistics. In IJCV (pp. 1–13).
Lai, K., Bo, L., Ren, X., & Fox, D. (2011). A large-scale hierarchical multi-view rgb-d object dataset. In IEEE ICRA (pp. 1817–1824).
Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). Pylon model for semantic segmentation. In NIPS (pp. 1485–1493).
Li, Y., Tarlow, D., & Zemel, R. (2013). Exploring compositional high order pattern potentials for structured output learning. In IEEE CVPR (pp. 49–56).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In CVPR (pp. 413–420). doi:10.1109/CVPR.2009.5206537.
Quigley, M., Batra, S., Gould, S., Klingbeil, E., Le, Q., Wellman, A., & Ng, A. Y. (2009). High-accuracy 3d sensing for mobile manipulation: Improving object detection and door opening. In IEEE ICRA (pp. 2816–2822).
Rabbani, T., van Den Heuvel, F., & Vosselmann, G. (2006). Segmentation of point clouds using smoothness constraint. IAPR SSIS, 36(5), 248–253.
Rao, D., Le, Q. V., Phoka, T., Quigley, M., Sudsang, A., & Ng, A. Y. (2010). Grasping novel objects with depth segmentation. In IEEE IROS (pp. 2578–2585).
Ren, X., Bo, L., & Fox, D. (2012). Rgb-(d) scene labeling: Features and algorithms. In IEEE CVPR (pp. 2759–2766).
Rother, C., Kolmogorov, V., & Blake, A. (2004). Grabcut: Interactive foreground extraction using iterated graph cuts. TOG, ACM, 23, 309–314.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1), 2–23.
Silberman, N., & Fergus, R. (2011). Indoor scene segmentation using a structured light sensor. In IEEE ICCV Workshops (pp. 601–608).
Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012). Indoor segmentation and support inference from rgbd images. In ECCV (pp. 746–760). Springer.
Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning crfs using graph cuts. In ECCV (pp 582–595). Springer.
Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In ACM ICML (p 104).
Van De Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In ECCV (pp 334–348). Springer.
Von Gioi, R. G., Jakubowicz, J., Morel, J. M., & Randall, G. (2010). Lsd: A fast line segment detector with a false detection control. TPAMI, 32(4), 722–732.
Woodford, O. J., Rother, C., & Kolmogorov, V. (2009). A global perspective on map inference for low-level vision. In IEEE ICCV (pp. 2319–2326).
Xiao, J., Owens, A., & Torralba, A. (2013). Sun3d: A database of big spaces reconstructed using sfm and object labels. In IEEE ICCV.
Xiong, X., & Huber, D. (2010). Using context to create semantic 3d models of indoor environments. In BMVC (pp. 45–1).

Acknowledgments

This research was supported by the IPRS scholarship from The University of Western Australia and the Australian Research Council (ARC) Grants DP110102166, DP150104251 and DE120102960. The authors would especially like to thank the anonymous reviewers and the Associate Editor for their valuable comments and suggestions to improve the quality of the manuscript.

Author information

Authors and Affiliations

School of CSSE, The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
Salman H. Khan, Mohammed Bennamoun & Ferdous Sohel
School of EECE, The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
Roberto Togneri
Department of Engineering, Karachi Institute of Economics and Technology, Karachi, 75190, Pakistan
Imran Naseem

Corresponding author

Correspondence to Salman H. Khan.

Additional information

Communicated by Derek Hoiem.

Appendix: Disintegration of Higher-Order Energies

In this appendix, we show how the higher-order energies can be minimized using graph cuts. Since graph cuts can efficiently minimize submodular functions, we transform our higher-order energy function (Eq. 9) into a submodular second-order energy function. We explain this transformation, and the computation of the optimal moves, for both the $\alpha \beta $-swap and the $\alpha $-expansion move-making algorithms (see footnote 3 under Notes). All previously defined notation is used in the same context, and all newly introduced symbols are defined in this section. The function that counts the weighted number of nodes in a clique that take the label $\ell $ is defined below; the number of nodes that disagree with $\ell $ is the total weight of the clique minus this count:

$$\begin{aligned} n_{\ell }({y}_{\mathbf {c}}) = \sum \limits _{i \in {\mathbf {c}}} w_i^{\ell } \mathbf {1}_{y_i = \ell } \end{aligned}$$

The function $\mathbf {1}_{y_i = \ell }$ is a zero-one indicator function that returns a unit value when $y_i = \ell $. We suppose here that the weights are symmetric for all labels $\ell \in {\mathcal {L}}$, i.e., $w_i^{\ell } = w_i$. Further, for our implementation we set $w_i=1 \;\; \forall i \in {\mathbf {c}}$. This setting satisfies the required constraints for these parameters, i.e.,

$$\begin{aligned} w_i^{\ell } \ge 0 \quad \text {and}\quad \sum \limits _{i \in {\mathbf {c}}} w_i^{\ell } = \# {\mathbf {c}} \;\; \forall \ell \in {\mathcal {L}}. \end{aligned}$$

We define a summation function that adds the weights for a subset $\mathbf {s}$ of ${\mathbf {c}}$,

$$\begin{aligned} W(\mathbf {s}) = \sum \limits _{i\in \mathbf {s}} w_i^{\ell } = \# \mathbf {s} \quad \forall \ell \in {\mathcal {L}}. \end{aligned}$$
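The counting function and the weight sum defined above can be sketched in Python as follows. The sketch also evaluates a clique energy of the exponential form that appears in the move energies of this appendix, $\lambda _{max} - (\lambda _{max} - \lambda _{\ell })\exp (-(W({\mathbf {c}}) - n_{\ell })/Q_{\ell })$ minimized over $\ell $. The function names and all parameter values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from typing import Dict, Hashable, Optional, Sequence


def label_count(labels: Sequence[Hashable], label: Hashable,
                weights: Optional[Sequence[float]] = None) -> float:
    """n_l(y_c): total weight of the clique nodes whose label equals `label`."""
    if weights is None:
        weights = [1.0] * len(labels)  # w_i = 1 for every node, as in the text
    return float(sum(w for y, w in zip(labels, weights) if y == label))


def subset_weight(subset: Sequence[int],
                  weights: Optional[Sequence[float]] = None) -> float:
    """W(s): summed weight of the node indices in `subset` (#s when w_i = 1)."""
    if weights is None:
        return float(len(subset))
    return float(sum(weights[i] for i in subset))


def clique_energy(labels: Sequence[Hashable], lam: Dict[Hashable, float],
                  lam_max: float, Q: Dict[Hashable, float]) -> float:
    """Minimum over labels of the exponential clique cost (unit weights assumed)."""
    w_c = subset_weight(range(len(labels)))
    return min(
        lam_max - (lam_max - lam[l]) * math.exp(-(w_c - label_count(labels, l)) / Q[l])
        for l in lam
    )


# Hypothetical parameters for a two-label problem.
params = dict(lam={0: 0.0, 1: 0.0}, lam_max=1.0, Q={0: 1.0, 1: 1.0})
pure = clique_energy([0, 0, 0, 0], **params)   # every node agrees on label 0
mixed = clique_energy([0, 0, 0, 1], **params)  # one node disagrees
```

With these values a fully consistent clique has the lowest cost, and the cost rises smoothly towards $\lambda _{max}$ as more nodes disagree with the best-supported label.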

1.1 Disintegration of Higher-Order Energies to Second-Order Sub-modular Energies for Swap Moves

Suppose that, in a clique ${\mathbf {c}}$, the locations of the active nodes are represented by a set of indices ${\mathbf {c}}_{a}$. The nodes that remain inactive during the move-making process are termed the passive nodes, and their locations are denoted by $\bar{{\mathbf {c}}}_{a} = {\mathbf {c}}\setminus {\mathbf {c}}_{a}$. The set of moves available to the swap move-making algorithm is encoded as a vector $\mathbf {t}_{c_a}$. For the sake of a simple demonstration, let us focus on the two-class labeling problem, i.e., $\ell \in \{0,1\}$. Let ${y}^{\circ }_c$ denote the old labeling and ${y}^n_c$ the new labeling induced by the move $\mathbf {t}_{c_a}$. The induced labeling combines the old labeling of the passive nodes with the new labeling of the active nodes, i.e., ${y}^n_c = {y}^{\circ }_{\bar{c}_{a}} \cup T_{\alpha \beta }({y}^{\circ }_{{c}_{a}}, \mathbf {t}_{c_a})$. The energy of an $\alpha \beta $-swap move can then be defined as:

$$\begin{aligned} \psi ^m_{{\mathbf {c}}}(\mathbf {t}_{c_a})&= \psi _{{\mathbf {c}}}({y}^n_c) = \psi _{{\mathbf {c}}}\left( {y}^{\circ }_{\bar{c}_{a}} \cup T_{\alpha \beta }({y}^{\circ }_{c_{a}}, \mathbf {t}_{c_a})\right) \\&= \min _{\ell \in {\mathcal {L}}} \left\{ \lambda _{max} - (\lambda _{max} - \lambda _{\ell }) \exp \left( - \frac{W({\mathbf {c}}) - n_{\ell }\left( {y}^{\circ }_{\bar{c}_{a}} \cup T_{\alpha \beta }({y}^{\circ }_{c_{a}}, \mathbf {t}_{c_a})\right) }{Q_{\ell }}\right) \right\} \\&= \min \left\{ \lambda _{max} - (\lambda _{max} - \lambda _{\alpha }) \exp \left( - \frac{W({\mathbf {c}}) - n_{0}^m(\mathbf {t}_{c_a})}{Q_{\alpha }}\right) ,\; \lambda _{max} - (\lambda _{max} - \lambda _{\beta }) \exp \left( - \frac{W({\mathbf {c}} - {\mathbf {c}}_a) + n_{0}^m(\mathbf {t}_{c_a})}{Q_{\beta }}\right) \right\} , \end{aligned}$$

where $W({\mathbf {c}}_a) = n_0^m(\mathbf {t}_{c_a}) + n_1^m(\mathbf {t}_{c_a})$. The minimization operation in the above equation can be replaced by a piecewise function:

$$\begin{aligned} \psi _{{\mathbf {c}}}^m(\mathbf {t}_{c_a}) = \left\{ \begin{array}{l} \lambda _{max} - (\lambda _{max} - \lambda _{\alpha })\text {exp} { \left( - \frac{W({\mathbf {c}}) - n_{0}^m(\mathbf {t}_{c_a})}{Q_{\alpha }}\right) } \\ \qquad \text {if}\quad n_{0}^m(\mathbf {t}_{c_a}) > \varrho _{\alpha \beta }\left( \frac{W({\mathbf {c}})}{Q_{\alpha }} - \frac{W({\mathbf {c}} - {\mathbf {c}}_a)}{Q_{\beta }} \right. \\ \qquad \qquad \qquad \qquad \left. - \log \left( \frac{\lambda _{max} - \lambda _{\alpha }}{\lambda _{max} - \lambda _{\beta }}\right) \right) ,\\ \lambda _{max} - (\lambda _{max} - \lambda _{\beta })\text {exp} {\left( - \frac{W({\mathbf {c}} - {\mathbf {c}}_a) + n_{0}^m(\mathbf {t}_{c_a})}{Q_{\beta }}\right) }\\ \qquad \text {if}\quad n_{0}^m(\mathbf {t}_{c_a}) < \varrho _{\alpha \beta }\left( \frac{W({\mathbf {c}})}{Q_{\alpha }} - \frac{W({\mathbf {c}} - {\mathbf {c}}_a)}{Q_{\beta }} \right. \\ \qquad \qquad \qquad \qquad \left. - \log \left( \frac{\lambda _{max} - \lambda _{\alpha }}{\lambda _{max} - \lambda _{\beta }}\right) \right) , \end{array}\right. \end{aligned}$$

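where $\varrho _{\alpha \beta } = \frac{Q_{\alpha }Q_{\beta }}{Q_{\alpha }+Q_{\beta }}$. The function $n^m_{\ell }(\mathbf {t}_{c_a})$, in which $\delta _{\ell }(\mathbf {t}_i)$ equals one when $\mathbf {t}_i = \ell $ and zero otherwise, is defined as: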

$$\begin{aligned} n^m_{\ell }(\mathbf {t}_{c_a}) = \sum \limits _{i \in {\mathbf {c}}_{a}}w_i \delta _{\ell }(\mathbf {t}_i). \end{aligned}$$
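The threshold in the piecewise form can be recovered by comparing the two arguments of the minimum in the swap-move energy. The short derivation below is a sketch that assumes $\lambda _{\alpha }, \lambda _{\beta } < \lambda _{max}$ and $Q_{\alpha }, Q_{\beta } > 0$, so that the logarithm and the division are well defined; the shorthands $A$, $B$ and $n$ are introduced here only for brevity.

$$\begin{aligned}&\text {Let}\quad A = \lambda _{max} - \lambda _{\alpha },\quad B = \lambda _{max} - \lambda _{\beta },\quad n = n_{0}^m(\mathbf {t}_{c_a}).\\&\lambda _{max} - A \exp \left( - \frac{W({\mathbf {c}}) - n}{Q_{\alpha }}\right) \le \lambda _{max} - B \exp \left( - \frac{W({\mathbf {c}} - {\mathbf {c}}_a) + n}{Q_{\beta }}\right) \\&\quad \Longleftrightarrow \quad \log A - \frac{W({\mathbf {c}}) - n}{Q_{\alpha }} \ge \log B - \frac{W({\mathbf {c}} - {\mathbf {c}}_a) + n}{Q_{\beta }}\\&\quad \Longleftrightarrow \quad n \left( \frac{1}{Q_{\alpha }} + \frac{1}{Q_{\beta }}\right) \ge \frac{W({\mathbf {c}})}{Q_{\alpha }} - \frac{W({\mathbf {c}} - {\mathbf {c}}_a)}{Q_{\beta }} - \log \left( \frac{A}{B}\right) \\&\quad \Longleftrightarrow \quad n \ge \varrho _{\alpha \beta } \left( \frac{W({\mathbf {c}})}{Q_{\alpha }} - \frac{W({\mathbf {c}} - {\mathbf {c}}_a)}{Q_{\beta }} - \log \left( \frac{\lambda _{max} - \lambda _{\alpha }}{\lambda _{max} - \lambda _{\beta }}\right) \right) . \end{aligned}$$

The last step uses $\left( \frac{1}{Q_{\alpha }} + \frac{1}{Q_{\beta }}\right) ^{-1} = \varrho _{\alpha \beta }$. At equality both branches take the same value, so either branch may be used there.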

From Theorem 1 in Kohli et al. (2009), the energy defined above can be transformed into a submodular quadratic pseudo-boolean function with two binary meta-variables. In this form, the $\alpha \beta $-swap algorithm can be used to minimize the energy function.
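As a numerical sanity check, the sketch below evaluates the swap-move energy both as a minimum of two terms and in its piecewise form, and compares the two for every possible value of $n_0^m$. It assumes unit weights and reads $n_0^m$ as the number of active nodes with move variable 0, following the definition of $n^m_{\ell }$ above; the function names and parameter values are hypothetical.

```python
import math


def swap_energy_min(n0, w_c, w_passive, lam_a, lam_b, lam_max, q_a, q_b):
    """Move energy written as the minimum of the alpha and beta terms."""
    e_alpha = lam_max - (lam_max - lam_a) * math.exp(-(w_c - n0) / q_a)
    e_beta = lam_max - (lam_max - lam_b) * math.exp(-(w_passive + n0) / q_b)
    return min(e_alpha, e_beta)


def swap_energy_piecewise(n0, w_c, w_passive, lam_a, lam_b, lam_max, q_a, q_b):
    """Same energy, selecting the branch by comparing n0 with the threshold."""
    rho = q_a * q_b / (q_a + q_b)
    threshold = rho * (w_c / q_a - w_passive / q_b
                       - math.log((lam_max - lam_a) / (lam_max - lam_b)))
    if n0 > threshold:
        # alpha branch: enough active nodes have move variable 0
        return lam_max - (lam_max - lam_a) * math.exp(-(w_c - n0) / q_a)
    # beta branch (the two branches coincide when n0 equals the threshold)
    return lam_max - (lam_max - lam_b) * math.exp(-(w_passive + n0) / q_b)


# Hypothetical clique: 10 nodes in total, 4 passive, hence 6 active.
cfg = dict(w_c=10.0, w_passive=4.0, lam_a=0.1, lam_b=0.2,
           lam_max=1.0, q_a=3.0, q_b=2.0)
gaps = [abs(swap_energy_min(n0, **cfg) - swap_energy_piecewise(n0, **cfg))
        for n0 in range(0, 7)]
```

All entries of `gaps` should be zero up to floating-point error, which is what the replacement of the minimum by a piecewise function requires.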

1.2 Disintegration of Higher-Order Energies to Second-Order Sub-modular Energies for Expansion Moves

Suppose that, in a clique ${\mathbf {c}}$, the locations of the nodes with label $\ell $ are represented by a set of indices ${\mathbf {c}}_{\ell }$. The current labeling solution is denoted by ${y}_{{\mathbf {c}}}^{\circ }$.

Suppose that there is a dominant label $d \in {\mathcal {L}}$ in the current labeling ${y}_{{\mathbf {c}}}^{\circ }$, i.e., a label such that

$$\begin{aligned} W({\mathbf {c}}_d) > W({\mathbf {c}}) - Q_d \quad \text {where} \;d \ne \alpha . \end{aligned}$$

There can be at most one dominant label, provided that the parameters satisfy

$$\begin{aligned} Q_a + Q_b < W({\mathbf {c}}) \qquad \forall a \ne b \in {\mathcal {L}}. \end{aligned}$$

The energy of an $\alpha $-expansion move is then:

$$\begin{aligned} \psi _{{\mathbf {c}}}^{m}(t_c)&= \psi _{{\mathbf {c}}} (T_{\alpha }({y}_c^{\circ }, t_c)) \\&= \min \left\{ \lambda _{max} - (\lambda _{max} - \lambda _{\alpha }) \exp \left( - \frac{\sum \limits _{i\in c} w_i t_i}{Q_{\alpha }}\right) ,\; \lambda _{max} - (\lambda _{max} - \lambda _{d}) \exp \left( - \frac{W({\mathbf {c}}) - \sum \limits _{i\in c} w_i t_i}{Q_{d}}\right) \right\} . \end{aligned}$$

The minimization operator in the above function can be replaced by a piecewise function:

$$\begin{aligned} \psi _{{\mathbf {c}}}^m(\mathbf {t}_{c}, \mathbf {t}_{c_d}) = \left\{ \begin{array}{l} \lambda _{max} - (\lambda _{max} - \lambda _{\alpha })\text {exp} {\left( - \frac{n_{0}^m(\mathbf {t}_{c})}{Q_{\alpha }}\right) } \\ \qquad \text {if}\quad n_{0}^m(\mathbf {t}_{c}) > \varrho _{\alpha d}\left( \frac{W({\mathbf {c}})}{Q_{\alpha }} \right. \\ \qquad \qquad \qquad \qquad \left. - \log \left( \frac{\lambda _{max} - \lambda _{\alpha }}{\lambda _{max} - \lambda _{d}}\right) \right) , \\ \lambda _{max} - (\lambda _{max} - \lambda _{d})\text {exp} {\left( - \frac{W({\mathbf {c}}) - n_{0}^m(\mathbf {t}_{c_d})}{Q_{d}}\right) }\\ \qquad \text {if}\quad n_{0}^m(\mathbf {t}_{c}) < \varrho _{\alpha d}\left( \frac{W({\mathbf {c}})}{Q_{\alpha }} \right. \\ \qquad \qquad \qquad \qquad \left. - \log \left( \frac{\lambda _{max} - \lambda _{\alpha }}{\lambda _{max} - \lambda _{d}}\right) \right) , \end{array}\right. \end{aligned}$$

where $\varrho _{\alpha d} = \frac{Q_{\alpha }Q_{d}}{Q_{\alpha }+Q_{d}}$ and the function $n^m_{\ell }(\mathbf {t}_c)$ is defined as:

$$\begin{aligned} n^m_{\ell }(\mathbf {t}_c) = \sum \limits _{i\in {\mathbf {c}}}w_i \delta _{\ell }(\mathbf {t}_i). \end{aligned}$$

From Theorem 2 in Kohli et al. (2009), the energy defined above can be transformed into a submodular quadratic pseudo-boolean function with two binary meta-variables. In this form, the $\alpha $-expansion algorithm can be used to minimize the energy function.
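The dominant-label condition used in the expansion-move derivation can be illustrated with a small sketch. If two different labels $a$ and $b$ were both dominant, adding $W({\mathbf {c}}_a) > W({\mathbf {c}}) - Q_a$ and $W({\mathbf {c}}_b) > W({\mathbf {c}}) - Q_b$ would give $W({\mathbf {c}}_a) + W({\mathbf {c}}_b) > 2W({\mathbf {c}}) - (Q_a + Q_b) > W({\mathbf {c}})$, which is impossible because the two node sets are disjoint subsets of the clique. The code below assumes unit weights; the function names, label names and values of $Q_{\ell }$ are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def truncation_ok(Q: Dict[Hashable, float], w_c: float) -> bool:
    """Check Q_a + Q_b < W(c) for every pair of distinct labels."""
    labels = list(Q)
    return all(Q[a] + Q[b] < w_c
               for i, a in enumerate(labels) for b in labels[i + 1:])


def dominant_label(labels: Sequence[Hashable], Q: Dict[Hashable, float],
                   alpha: Hashable) -> Optional[Hashable]:
    """Return the label d != alpha with W(c_d) > W(c) - Q_d, or None."""
    w_c = float(len(labels))       # unit weights: W(c) = #c
    counts = Counter(labels)       # W(c_l) for every label present in the clique
    found = [l for l, n in counts.items()
             if l != alpha and n > w_c - Q[l]]
    # When truncation_ok holds, `found` has at most one element.
    return found[0] if found else None


# Hypothetical clique of 10 nodes on a smooth surface.
clique = ["wall"] * 7 + ["floor"] * 2 + ["bed"]
Q = {"wall": 4.0, "floor": 4.0, "bed": 4.0}
d = dominant_label(clique, Q, alpha="bed")
```

Here `wall` covers 7 of the 10 nodes and $7 > 10 - 4$, so it is the dominant label for an expansion on `bed`; when the expansion label is `wall` itself, no other label qualifies.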

About this article

Cite this article

Khan, S.H., Bennamoun, M., Sohel, F. et al. Integrating Geometrical Context for Semantic Labeling of Indoor Scenes using RGBD Images. Int J Comput Vis 117, 1–20 (2016). https://doi.org/10.1007/s11263-015-0843-8

Received: 23 November 2014
Accepted: 24 June 2015
Published: 03 July 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11263-015-0843-8
