Abstract
Semantic segmentation and image parsing have rapidly become a prominent research area in computer vision and machine learning. Many applications, such as self-driving, augmented reality, and object recognition, require a robust segmentation mechanism. Motivated by this broad applicability, this paper introduces a two-step framework that parses an image into predefined labels by using a novel CNN architecture and then improving the likelihood of the labels. In step 1, a nine-layer CNN architecture is introduced that trains on minimal training samples and produces pixel-wise softmax probabilities. These probabilities are soft estimates derived from a hard classifier, i.e., an MLP. The data for step 1 are prepared in the form of a patch-label set. In step 2, we introduce a Jacobian optimization-based label relaxation method that fuses the local extrema as an edge prior. The proposed framework is denoted CNN-EFF in this work. The CNN-EFF scheme has been evaluated on two publicly available benchmark datasets, each arranged as images with their pixel-label ground truth. The experimental results have been compared with previously proposed state-of-the-art methods. CNN-EFF significantly improves semantic labeling accuracy over past techniques. It reports 84.42%, 85.91%, 94.66%, 97.14%, and 98.27% accuracy for the Highway, House, Sheep, Horse-rider, and Horse-keeper images, respectively. In conclusion, the proposed framework outperforms the previously proposed state-of-the-art methods.
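To make the two-step idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows how a patch-label set might be built, how raw class scores become softmax probabilities, and how those probabilities could be smoothed while respecting an edge map. The function names, the patch size, and the iteration count are illustrative assumptions. The relaxation step is a simple edge-weighted neighbour averaging used as a stand-in; it is not the Jacobian optimization-based method described in the abstract.

```python
import numpy as np


def extract_patch_label_set(image, labels, patch_size=5):
    """Pair the patch centred on each pixel with that pixel's label (hypothetical helper)."""
    half = patch_size // 2
    # Reflect-pad the two spatial axes so border pixels also get full patches.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    h, w = labels.shape
    patches = np.empty((h * w, patch_size, patch_size, image.shape[2]), dtype=image.dtype)
    for r in range(h):
        for c in range(w):
            patches[r * w + c] = padded[r:r + patch_size, c:c + patch_size]
    return patches, labels.reshape(-1)


def softmax(scores):
    """Numerically stable softmax over the last (class) axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def relax_probabilities(probs, edge_map, iterations=10):
    """Edge-weighted 4-neighbour averaging of class probabilities (assumed stand-in).

    probs:    H x W x K array whose last axis sums to 1.
    edge_map: H x W array in [0, 1]; 1 marks a strong edge.
    """
    # Pixels on strong edges get weight near 0, so they contribute little to neighbours.
    weight = (1.0 - edge_map)[..., None]
    p = probs.astype(float)
    for _ in range(iterations):
        wp = np.pad(weight * p, ((1, 1), (1, 1), (0, 0)), mode="constant")
        ww = np.pad(weight, ((1, 1), (1, 1), (0, 0)), mode="constant")
        # Sum the weighted up, down, left, and right neighbours of every pixel.
        num = p + wp[:-2, 1:-1] + wp[2:, 1:-1] + wp[1:-1, :-2] + wp[1:-1, 2:]
        den = 1.0 + ww[:-2, 1:-1] + ww[2:, 1:-1] + ww[1:-1, :-2] + ww[1:-1, 2:]
        # Dividing by the total weight keeps each pixel's probabilities summing to 1.
        p = num / den
    return p
```

In such a sketch, the patches would be fed to a classifier to obtain per-pixel class scores, `softmax` would turn those scores into probabilities, and `relax_probabilities(probs, edge_map).argmax(axis=-1)` would give the final label map.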
Acknowledgements
We obtained the PASCAL VOC dataset from "http://host.robots.ox.ac.uk/pascal/VOC/" and the SIFT Flow dataset from "https://www.kaggle.com/quanbk/sift-flow-dataset". We performed the CNN training with TensorFlow in Python. We are thankful to Cao et al., Department of Mathematics and Statistics, Xi’an Jiaotong University, China, for providing the Python helper functions used to design the CNN model.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Srivastava, V., Biswas, B. CNN-EFF: CNN Based Edge Feature Fusion in Semantic Image Labelling and Parsing. Neural Process Lett 54, 1753–1781 (2022). https://doi.org/10.1007/s11063-021-10704-6
DOI: https://doi.org/10.1007/s11063-021-10704-6