Abstract
Deep learning models have recently proven promising and efficient on image parsing tasks. However, they are not fully capable of incorporating visual and contextual information simultaneously. We propose a new three-layer context-based deep architecture that integrates context explicitly with visual information. The novel idea is to combine a visual layer that learns visual characteristics from binary class-based learners, a contextual layer that learns context, and an integration layer that learns from both via genetic algorithm-based optimal fusion to produce a final decision. Experiments on benchmark datasets show that our approach outperforms existing baseline approaches. Further analysis shows that optimized network weights can improve performance and yield stable predictions.
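To make the idea of an integration layer tuned by a genetic algorithm more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that the visual and contextual layers each already emit one score per class, and that fusion is a per-class weighted sum whose weights are evolved to maximise accuracy. The function names (`fuse`, `accuracy`, `evolve_weights`), the hyperparameters, and the six hand-made samples are all hypothetical and chosen only for illustration.

```python
import random

def fuse(visual, context, weights):
    """Weighted sum of visual and contextual class scores; returns the argmax class."""
    n = len(visual)
    scores = [weights[c] * visual[c] + weights[n + c] * context[c] for c in range(n)]
    return max(range(n), key=scores.__getitem__)

def accuracy(weights, samples):
    """Fraction of samples whose fused prediction matches the label."""
    hits = sum(fuse(v, c, weights) == y for v, c, y in samples)
    return hits / len(samples)

def evolve_weights(samples, n_classes, pop_size=30, generations=40, seed=0):
    """Toy genetic algorithm: elitist selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)
    dim = 2 * n_classes  # one visual and one contextual weight per class
    population = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    population[0] = [1.0] * dim  # include plain (unweighted) fusion as a candidate
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, samples), reverse=True)
        elite = population[: pop_size // 3]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutate roughly 20% of genes, clamping weights at zero.
            child = [max(0.0, g + rng.gauss(0, 0.1)) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: accuracy(w, samples))

# Hypothetical data: (visual scores, contextual scores, true class) for 3 classes.
samples = [
    ([0.7, 0.2, 0.1], [0.3, 0.4, 0.3], 0),
    ([0.4, 0.5, 0.1], [0.8, 0.1, 0.1], 0),
    ([0.1, 0.6, 0.3], [0.2, 0.5, 0.3], 1),
    ([0.2, 0.3, 0.5], [0.1, 0.2, 0.7], 2),
    ([0.3, 0.4, 0.3], [0.1, 0.1, 0.8], 2),
    ([0.5, 0.4, 0.1], [0.2, 0.7, 0.1], 1),
]

best = evolve_weights(samples, n_classes=3)
print("unweighted fusion accuracy:", accuracy([1.0] * 6, samples))
print("evolved fusion accuracy:   ", accuracy(best, samples))
```

Because the elite individuals survive each generation unchanged and the unweighted fusion is seeded into the initial population, the evolved weights in this sketch can never score below plain unweighted fusion on the data used for the search. A real system would measure fitness on held-out data and would likely evolve the weights of a full integration network rather than a single weighted sum.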
Acknowledgements
This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP200102252).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mandal, R., Azam, B., Verma, B. (2021). Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_25
DOI: https://doi.org/10.1007/978-3-030-92270-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)