HSNet: hierarchical semantics network for scene parsing

Tan, Xin; Xu, Jiachen; Cao, Ying; Xu, Ke; Ma, Lizhuang; Lau, Rynson W. H.

doi:10.1007/s00371-022-02477-3

Original article
Published: 03 May 2022

Volume 39, pages 2543–2554 (2023)
The Visual Computer

Xin Tan^1,2 (ORCID: orcid.org/0000-0001-9346-1196),
Jiachen Xu^1,
Ying Cao^2,
Ke Xu^2,
Lizhuang Ma^1 &
Rynson W. H. Lau^2

Abstract

Scene parsing is one of the fundamental tasks in computer vision. Humans tend to perceive a scene hierarchically: they first identify the coarse category (e.g., vehicle) of a group of objects and then the fine category (e.g., bicycle, truck or car) of each object. Despite tremendous recent progress in scene parsing, such a hierarchical semantics prior (HSP) has not been explicitly exploited. In this paper, we introduce the HSP into scene parsing by proposing a hierarchical semantics network (HSNet). Our key contribution is a bidirectional cross-level feature matching framework, which learns multi-level, hierarchy-aware features via forward feature transfer and backward feature regularization. In the forward stage, we train a coarse-to-fine module to learn fine-category features that explicitly encode hierarchical semantics information. In the backward stage, we introduce a fine-to-coarse module that collapses fine-category features into coarse-category features, which are then used to regularize the feature learning of our network. Experimental results on Cityscapes and Pascal Context show that our method achieves state-of-the-art performance. Our visualization also shows that the learned features capture the semantic hierarchy well.
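
The abstract does not give the internals of either module, so the following is only a minimal, hypothetical sketch of how a hierarchy prior can couple fine and coarse predictions. It works on per-pixel class scores rather than on the learned features HSNet actually matches, and it assumes: Cityscapes' standard grouping of its 19 training classes into 7 categories as the hierarchy; a fine-to-coarse collapse that sums child probabilities; a coarse-to-fine composition P(fine) = P(coarse) · P(fine | coarse); and a weight `lam` for the coarse regularization term. All function names and `lam` are illustrative, not taken from the paper.

```python
import math

# Assumed hierarchy prior: Cityscapes' 19 training classes grouped into its
# 7 coarse categories. Whether HSNet uses exactly this grouping is an assumption.
HIERARCHY = {
    "flat": ["road", "sidewalk"],
    "construction": ["building", "wall", "fence"],
    "object": ["pole", "traffic light", "traffic sign"],
    "nature": ["vegetation", "terrain"],
    "sky": ["sky"],
    "human": ["person", "rider"],
    "vehicle": ["car", "truck", "bus", "train", "motorcycle", "bicycle"],
}
COARSE = list(HIERARCHY)
# Fine classes ordered coarse-category by coarse-category.
FINE = [f for c in COARSE for f in HIERARCHY[c]]
FINE_TO_COARSE = {f: COARSE.index(c) for c in COARSE for f in HIERARCHY[c]}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def collapse_to_coarse(fine_probs):
    """Hypothetical fine-to-coarse step: sum each coarse category's children."""
    coarse = [0.0] * len(COARSE)
    for name, p in zip(FINE, fine_probs):
        coarse[FINE_TO_COARSE[name]] += p
    return coarse


def coarse_to_fine(coarse_logits, within_logits):
    """Hypothetical coarse-to-fine step: P(fine) = P(coarse) * P(fine | coarse).

    within_logits maps each coarse name to logits over its children.
    The output follows the order of FINE.
    """
    coarse_probs = softmax(coarse_logits)
    fine_probs = []
    for ci, c in enumerate(COARSE):
        cond = softmax(within_logits[c])
        fine_probs.extend(coarse_probs[ci] * q for q in cond)
    return fine_probs


def coarse_regularization_loss(fine_logits, coarse_label):
    """Cross-entropy of the collapsed coarse probabilities against a coarse label."""
    coarse_probs = collapse_to_coarse(softmax(fine_logits))
    return -math.log(max(coarse_probs[coarse_label], 1e-12))


def total_loss(fine_logits, fine_label, lam=0.5):
    """Fine cross-entropy plus a weighted coarse term (lam is an assumed weight)."""
    fine_ce = -math.log(max(softmax(fine_logits)[fine_label], 1e-12))
    coarse_label = FINE_TO_COARSE[FINE[fine_label]]
    return fine_ce + lam * coarse_regularization_loss(fine_logits, coarse_label)


if __name__ == "__main__":
    logits = [0.0] * len(FINE)
    logits[FINE.index("car")] = 3.0  # the model is confident the pixel is a car
    label = FINE.index("truck")      # but the ground truth is truck
    print(total_loss(logits, label))
```

In this sketch the coarse term stays small when a mistake remains inside the correct coarse category (car vs. truck are both "vehicle"), so it mainly penalizes cross-category confusions. Collapsing the output of `coarse_to_fine` also recovers the coarse softmax exactly, which is one simple way to make the two directions consistent with each other.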

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2019YFC1521104), the National Natural Science Foundation of China (72192821, 61972157), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Shanghai Science and Technology Commission (21511101200, 22YF1420300), a General Research Fund from the Research Grants Council (RGC) of Hong Kong (RGC Ref.: 11205620) and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674). Xin Tan is also supported by the Postgraduate Studentship (by Mainland Schemes) from City University of Hong Kong.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Xin Tan, Jiachen Xu & Lizhuang Ma
Department of Computer Science, City University of Hong Kong, HKSAR, China
Xin Tan, Ying Cao, Ke Xu & Rynson W. H. Lau

Corresponding authors

Correspondence to Ying Cao or Rynson W. H. Lau.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tan, X., Xu, J., Cao, Y. et al. HSNet: hierarchical semantics network for scene parsing. Vis Comput 39, 2543–2554 (2023). https://doi.org/10.1007/s00371-022-02477-3

Accepted: 18 March 2022
Published: 03 May 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00371-022-02477-3
