Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing

Mandal, Ranju; Azam, Basim; Verma, Brijesh

doi:10.1007/978-3-030-92270-2_25


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Deep learning models have recently proved promising and efficient on image parsing tasks. However, they are not fully capable of incorporating visual and contextual information simultaneously. We propose a new three-layer context-based deep architecture that integrates context explicitly with visual information. The novel idea is to have a visual layer that learns visual characteristics from binary class-based learners, a contextual layer that learns context, and an integration layer that learns from both through genetic algorithm-based optimal fusion to produce the final decision. Evaluated on benchmark datasets, our approach outperforms existing baseline approaches. Further analysis shows that optimized network weights can improve performance and yield stable predictions.
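The abstract does not specify how the genetic algorithm encodes or evaluates the fusion, so the following is only a minimal sketch under stated assumptions: each candidate is a vector of per-class weights for the visual and contextual scores, fitness is labeling accuracy on a small held-out set, and the operators are single-point crossover with Gaussian mutation. The names `fuse`, `accuracy`, `evolve`, and the toy `samples` are hypothetical and are not taken from the paper.

```python
import random

def fuse(visual, context, weights):
    """Weighted sum of per-class visual and contextual scores; returns the argmax class.

    weights holds n visual weights followed by n contextual weights (an assumed encoding).
    """
    n = len(visual)
    scores = [weights[c] * visual[c] + weights[n + c] * context[c] for c in range(n)]
    return max(range(n), key=scores.__getitem__)

def accuracy(weights, samples):
    """Fraction of (visual, context, label) samples that the fused decision labels correctly."""
    hits = sum(fuse(v, c, weights) == y for v, c, y in samples)
    return hits / len(samples)

def evolve(samples, n_classes, pop_size=20, generations=30, mut_rate=0.2, seed=0):
    """Search for fusion weights with a simple elitist genetic algorithm (illustrative only)."""
    rng = random.Random(seed)
    dim = 2 * n_classes
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by fitness and keep the better half unchanged (elitism).
        pop.sort(key=lambda w: accuracy(w, samples), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, dim)          # single-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped so every weight stays in [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < mut_rate else g
                for g in child
            ]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: accuracy(w, samples))

# Toy data (made up for illustration): two classes; each sample is
# (visual scores, contextual scores, true label).
samples = [
    ([0.6, 0.4], [0.1, 0.9], 1),
    ([0.7, 0.3], [0.8, 0.2], 0),
    ([0.4, 0.6], [0.3, 0.7], 1),
    ([0.5, 0.5], [0.9, 0.1], 0),
]
best = evolve(samples, n_classes=2)
print("fusion weights:", best, "accuracy:", accuracy(best, samples))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases between generations. A real system would score fitness on a validation split of the benchmark data instead of the toy samples used here.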

Acknowledgements

This research was supported under the Australian Research Council’s Discovery Projects funding scheme (project number DP200102252).

Author information

Authors and Affiliations

Centre for Intelligent Systems, School of Engineering and Technology, Central Queensland University, Brisbane, Australia
Ranju Mandal, Basim Azam & Brijesh Verma

Corresponding author

Correspondence to Ranju Mandal.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Mandal, R., Azam, B., Verma, B. (2021). Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_25

DOI: https://doi.org/10.1007/978-3-030-92270-2_25
Published: 07 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)
