Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Abstract

Deep learning models have recently proved promising and efficient on image parsing tasks. However, they are not fully capable of incorporating visual and contextual information simultaneously. We propose a new three-layer context-based deep architecture that integrates context explicitly with visual information. The novel idea is to use a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and an integration layer that learns from both via genetic algorithm-based optimal fusion to produce a final decision. Evaluated on benchmark datasets, our approach outperforms existing baseline approaches. Further analysis shows that optimized network weights can improve performance and yield stable predictions.
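
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the visual and contextual layers each emit per-class scores, and it replaces the integration layer with a single scalar fusion weight `w` tuned by a small elitist genetic algorithm. The function names (`fuse`, `accuracy`, `ga_optimize`), the hyperparameters, and the toy scores are all hypothetical and chosen only for illustration.

```python
import random


def fuse(visual, context, w):
    """Blend one sample's visual and contextual class scores with weight w in [0, 1]."""
    return [w * v + (1.0 - w) * c for v, c in zip(visual, context)]


def accuracy(w, visual_probs, context_probs, labels):
    """Fraction of samples whose highest fused score falls on the true class."""
    correct = 0
    for v, c, y in zip(visual_probs, context_probs, labels):
        fused = fuse(v, c, w)
        predicted = max(range(len(fused)), key=fused.__getitem__)
        correct += predicted == y
    return correct / len(labels)


def ga_optimize(visual_probs, context_probs, labels,
                pop_size=20, generations=30, mutation_scale=0.1, seed=0):
    """Search for a fusion weight with a small elitist genetic algorithm (assumed setup)."""
    rng = random.Random(seed)

    def fitness(w):
        return accuracy(w, visual_probs, context_probs, labels)

    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b)                       # arithmetic crossover
            child += rng.gauss(0.0, mutation_scale)     # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))  # clamp to [0, 1]
        population = parents + children
    return max(population, key=fitness)


# Hypothetical toy scores: 4 samples, 3 classes (not data from the paper).
visual_probs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]
context_probs = [[0.4, 0.5, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.5, 0.2]]
labels = [0, 1, 2, 1]

best_w = ga_optimize(visual_probs, context_probs, labels)
print("visual only:", accuracy(1.0, visual_probs, context_probs, labels))
print("context only:", accuracy(0.0, visual_probs, context_probs, labels))
print("fused:", accuracy(best_w, visual_probs, context_probs, labels))
```

In this toy setup each source alone misclassifies one of the four samples, while any weight strictly between 1/6 and 0.875 classifies all of them correctly, which is the kind of gain a learned fusion is meant to capture. A fuller version would presumably evolve a weight vector or the weights of a small network rather than one scalar.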

Acknowledgements

This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP200102252).

Author information

Corresponding author

Correspondence to Ranju Mandal.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mandal, R., Azam, B., Verma, B. (2021). Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_25

  • DOI: https://doi.org/10.1007/978-3-030-92270-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92269-6

  • Online ISBN: 978-3-030-92270-2

  • eBook Packages: Computer Science, Computer Science (R0)