Multimedia Tools and Applications, Volume 78, Issue 22, pp 32379–32392

Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation

  • Amirhossein Malekijoo
  • Mohammad Javad Fadaeieslam


Recognizing the content of an image is an important challenge in machine vision, and semantic segmentation is one of the most important ways to address it. Semantic segmentation is used in applications such as autonomous driving, indoor navigation, virtual or augmented reality systems, and recognition tasks. This paper introduces P-DecovNet, a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation. The proposed architecture combines the Convolution-Deconvolution Neural Network architecture with the Pyramid Pooling Module. High-level features are extracted from the image by the convolutional neural network, and the Pyramid Pooling Module is added to the architecture to reinforce local information. The CamVid road scene dataset was used to evaluate the performance of P-DecovNet. On several criteria, including accuracy and mean intersection over union (mIoU), the experimental results show that P-DecovNet performs well among convolution-deconvolution networks. It achieves this performance with fewer training images and fewer iterations than existing methods.
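The pyramid pooling idea named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the bin sizes 1, 2, 3, and 6 used in the pyramid scene parsing network design, replaces bilinear upsampling with nearest-neighbor upsampling, and omits the learned 1×1 convolutions, so it shows only how pooled context at several scales is concatenated with the original feature map. All function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map down to (C, bins, bins)."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=float)
    for i in range(bins):
        # Region bounds: floor for the start, ceil for the end, so no region is empty.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(feat, h, w):
    """Nearest-neighbor upsampling of (C, h0, w0) to (C, h, w).

    Assumption: a simple stand-in for the bilinear upsampling usually used.
    """
    _, bh, bw = feat.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return feat[:, rows[:, None], cols[None, :]]


def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled copies of itself.

    bin_sizes is an assumed default, not a value stated in the abstract.
    """
    _, h, w = feat.shape
    branches = [feat.astype(float)]
    for b in bin_sizes:
        pooled = adaptive_avg_pool(feat, b)
        branches.append(upsample_nearest(pooled, h, w))
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    features = np.random.rand(4, 12, 12)  # toy (C, H, W) feature map
    fused = pyramid_pooling(features)
    print(fused.shape)  # channels grow from C to C * (1 + number of bin sizes)
```

In this sketch the coarsest branch (one bin) carries the global average of each channel, while the finer branches keep progressively more spatial detail; a decoder would then consume the concatenated tensor.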


Keywords: Convolution neural network; Machine vision; Semantic pixel-wise segmentation; Convolution-deconvolution network; Road scene dataset


Compliance with ethical standards

Conflict of interest

The authors declared no conflict of interest.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
