Abstract
Segmenting vehicles in road-traffic images with congested and unstructured traffic patterns is a challenging task. To address it, this paper presents a modified-UNet that segments the input image through sequential encoding and decoding steps. The modified-UNet uses a number of convolutions, inception modules, and batch normalization to encode the image into a feature map. The extracted map is then decoded into a segmented image by applying transposed convolutions in different layers. To validate the performance, two publicly available datasets are considered, namely autorickshaw and IDD-Lite. The proposed model is compared against state-of-the-art segmentation models in terms of seven parameters, namely intersection over union (IoU), accuracy, error, specificity, sensitivity, F1-score, and correlation coefficient. Ablation experiments have also been conducted. Experimental results show that the proposed model outperforms the other models, reporting the best IoU scores of 0.82 and 0.61 on the autorickshaw and IDD-Lite datasets, respectively.
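As an illustration of the seven evaluation parameters named above, the following minimal sketch computes them for a binary vehicle/background mask from pixel-level confusion counts. It is not the authors' evaluation code. The function names are hypothetical, error is assumed to be 1 − accuracy, and the correlation coefficient is assumed to be the Matthews correlation coefficient; the abstract does not define either.

```python
from math import sqrt


def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for two equal-length sequences of 0/1 pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    """Return the seven metrics as a dict (hypothetical helper, binary case only)."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # Assumption: "correlation coefficient" means the Matthews correlation coefficient.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "accuracy": accuracy,
        "error": 1.0 - accuracy,  # assumption: error is the complement of accuracy
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "correlation": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy flattened masks (1 = vehicle pixel, 0 = background); values are made up.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(predicted, reference)
```

For a multi-class dataset, one common choice is to compute these per class and average them; whether the paper does so is not stated in the abstract.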
Ethics declarations
Competing interests
The authors have stated that this paper has no potential conflict of interest.
About this article
Cite this article
Tiwari, T., Saraswat, M. A new modified-unet deep learning model for semantic segmentation. Multimed Tools Appl 82, 3605–3625 (2023). https://doi.org/10.1007/s11042-022-13230-2
DOI: https://doi.org/10.1007/s11042-022-13230-2