HSNet: hierarchical semantics network for scene parsing

  • Original article
The Visual Computer

Abstract

Scene parsing is one of the fundamental tasks in computer vision. Humans tend to perceive a scene hierarchically, i.e., they first identify the coarse category (e.g., vehicle) of a group of objects and then the fine category (e.g., bicycle, truck or car) of each object. Despite tremendous recent progress on scene parsing, such a hierarchical semantics prior (HSP) has not been explicitly exploited. In this paper, we introduce the HSP into scene parsing by proposing a hierarchical semantics network (HSNet). Our key contribution is a bidirectional cross-level feature matching framework, which enables us to learn multi-level, hierarchy-aware features via forward feature transfer and backward feature regularization. In the forward stage, we train a coarse-to-fine module to learn fine-category features that explicitly encode hierarchical semantics information. In the backward stage, we introduce a fine-to-coarse module that collapses fine-category features into coarse-category features, which are used to regularize the feature learning of our network. Experimental results on Cityscapes and Pascal Context show that our method achieves state-of-the-art performance. Our visualization also shows that the learned features capture the semantic hierarchy well.
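
As a rough illustration of the coarse/fine hierarchy described in the abstract (not the HSNet architecture, which operates on learned features rather than class scores), the following Python sketch collapses per-pixel fine-category scores into coarse-category scores and then picks a fine category inside the winning coarse group. Only the vehicle group and its three fine categories come from the abstract; the `human` group, the function names and the probabilities are assumptions made for this example.

```python
# Hypothetical two-level hierarchy. Only "vehicle" and its three fine
# categories appear in the abstract; the "human" group is an assumption.
HIERARCHY = {
    "vehicle": ["bicycle", "truck", "car"],
    "human": ["person", "rider"],
}

# Invert the hierarchy into a fine -> coarse lookup table.
FINE_TO_COARSE = {
    fine: coarse for coarse, fines in HIERARCHY.items() for fine in fines
}


def collapse_to_coarse(fine_probs):
    """Sum fine-category probabilities into coarse-category probabilities."""
    coarse_probs = {coarse: 0.0 for coarse in HIERARCHY}
    for fine, p in fine_probs.items():
        coarse_probs[FINE_TO_COARSE[fine]] += p
    return coarse_probs


def coarse_then_fine(fine_probs):
    """Choose the coarse category first, then the best fine category in it."""
    coarse_probs = collapse_to_coarse(fine_probs)
    coarse = max(coarse_probs, key=coarse_probs.get)
    fine = max(HIERARCHY[coarse], key=lambda name: fine_probs.get(name, 0.0))
    return coarse, fine


# Made-up class probabilities for a single pixel.
pixel = {"bicycle": 0.10, "truck": 0.25, "car": 0.30, "person": 0.35, "rider": 0.0}
print(coarse_then_fine(pixel))  # ('vehicle', 'car')
```

In this toy pixel a flat argmax over fine categories would pick person (0.35), whereas deciding the coarse category first yields vehicle (0.65 in total) and then car.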

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2019YFC1521104), the National Natural Science Foundation of China (72192821, 61972157), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Shanghai Science and Technology Commission (21511101200, 22YF1420300), a General Research Fund from RGC of Hong Kong (RGC Ref.: 11205620) and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674). Xin Tan is also supported by the Postgraduate Studentship (by Mainland Schemes) from City University of Hong Kong.

Author information

Corresponding authors

Correspondence to Ying Cao or Rynson W. H. Lau.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tan, X., Xu, J., Cao, Y. et al. HSNet: hierarchical semantics network for scene parsing. Vis Comput 39, 2543–2554 (2023). https://doi.org/10.1007/s00371-022-02477-3

  • DOI: https://doi.org/10.1007/s00371-022-02477-3
