A new modified-unet deep learning model for semantic segmentation

Multimedia Tools and Applications

Abstract

Segmenting vehicles in images of road traffic with congested and unstructured traffic patterns is a challenging task. To address it, this paper presents a modified-UNet that segments the input image through sequential encoding and decoding steps. The modified-UNet uses convolutions, inception modules, and batch normalization to encode the image into a feature map. The extracted feature map is then decoded into a segmented image by transposed convolutions applied in successive layers. To validate performance, two publicly available datasets are considered, namely autorickshaw and IDD-Lite. The proposed model is compared against state-of-the-art segmentation models on seven measures: intersection over union (IoU), accuracy, error, specificity, sensitivity, F1-score, and correlation coefficient. Ablation experiments have also been conducted. Experimental results show that the proposed model outperforms the other models, achieving the best IoU scores of 0.82 and 0.61 on the autorickshaw and IDD-Lite datasets, respectively.
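As an illustration of the seven evaluation measures named above, the following minimal sketch computes them for a single binary (vehicle versus background) mask from pixel-wise confusion counts. It is not the authors' evaluation code. The function names, the nested-list mask format, and the zero-denominator handling are assumptions made for this example, and the correlation coefficient is assumed here to be the Matthews correlation coefficient, which the abstract does not confirm.

```python
from math import sqrt


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # Assumption for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Return the seven measures for one predicted mask and its ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    # Assumed to be the Matthews correlation coefficient.
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": sensitivity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "correlation": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy 2x3 masks, invented for illustration only.
predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
metrics = segmentation_metrics(predicted, ground_truth)
```

For the toy masks, two pixels are true positives, one is a false positive, one is a false negative, and two are true negatives, so the IoU is 2/4 = 0.5. Multi-class datasets such as IDD-Lite would typically need these counts per class followed by averaging, which this sketch does not cover.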

Author information

Corresponding author

Correspondence to Twinkle Tiwari.

Ethics declarations

Competing interests

The authors have stated that this paper has no potential conflict of interest.

Cite this article

Tiwari, T., Saraswat, M. A new modified-unet deep learning model for semantic segmentation. Multimed Tools Appl 82, 3605–3625 (2023). https://doi.org/10.1007/s11042-022-13230-2

  • DOI: https://doi.org/10.1007/s11042-022-13230-2
