Semantic segmentation using reinforced fully convolutional densenet with multiscale kernel

  • Sourour Brahimi
  • Najib Ben Aoun (Email author)
  • Alexandre Benoit
  • Patrick Lambert
  • Chokri Ben Amar


In recent years, semantic segmentation has become one of the most active tasks in computer vision. Its goal is to group image pixels into semantically meaningful regions. Deep learning methods, in particular those that use convolutional neural networks (CNNs), have been highly successful at this task. In this paper, we introduce a semantic segmentation system that uses a reinforced fully convolutional DenseNet with a multiscale kernel prediction method. Our main contribution is an encoder-decoder architecture in which we increase the width of the dense blocks in the encoder part by adding recurrent connections inside each dense block. The resulting structure is called a wider dense block, in which each dense block takes not only the output of the previous layer but also the initial input of the dense block. This recurrent structure emulates the human brain system and helps to strengthen the extraction of the target features. As a result, our network becomes deeper and wider without additional parameters, thanks to weight sharing. Moreover, a multiscale convolutional layer is added after the last dense block of the decoder part to perform model averaging over different spatial scales and to provide a more flexible method. The proposed method has been evaluated on two semantic segmentation benchmarks: CamVid and Cityscapes. It outperforms many recent state-of-the-art works.
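The two mechanisms named in the abstract, recurrent connections with shared weights and averaging over kernels of several sizes, can be illustrated with a toy sketch. This is not the authors' implementation: it is a single-channel, pure-Python simplification, and every name in it (`conv2d_same`, `recurrent_layer`, `multiscale_average`, `count_params`) is hypothetical. In particular, combining the previous state with the block input by addition is an assumption made for brevity; the abstract does not say how the two are merged.

```python
# Toy, single-channel sketch. All names, sizes and weights are illustrative
# assumptions, not the architecture described in the paper.

def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding, output has the input's size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def relu_map(fmap):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in fmap]


def add_maps(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def recurrent_layer(block_input, kernel, steps):
    """Unroll one convolution `steps` times, reusing the SAME kernel.

    Every step after the first sees the previous state plus the initial
    input of the block (merged by addition here, which is an assumption).
    """
    state = relu_map(conv2d_same(block_input, kernel))
    for _ in range(steps - 1):
        state = relu_map(conv2d_same(add_maps(state, block_input), kernel))
    return state


def count_params(kernel):
    """Number of weights in the kernel; it does not depend on `steps`."""
    return sum(len(row) for row in kernel)


def multiscale_average(fmap, kernels):
    """Average the responses of kernels of different spatial sizes."""
    responses = [conv2d_same(fmap, k) for k in kernels]
    n = len(responses)
    return [
        [sum(r[y][x] for r in responses) / n for x in range(len(fmap[0]))]
        for y in range(len(fmap))
    ]


# Usage: three unrolled steps still use the single weight of the 1x1 kernel.
ones = [[1.0] * 3 for _ in range(3)]
shared_kernel = [[0.5]]
deeper = recurrent_layer(ones, shared_kernel, steps=3)
n_weights = count_params(shared_kernel)
```

Unrolling the layer for more steps lengthens the computation path while `count_params` stays constant, which is the sense in which weight sharing adds depth without adding parameters. In `multiscale_average`, passing kernels of sizes such as 1, 3 and 5 (sizes chosen here only for illustration) yields one response per scale, and the mean of those responses plays the role of model averaging over spatial scales.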


Semantic segmentation · Fully convolutional DenseNet · Wider dense block · Multiscale kernel prediction



The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under the grant agreement number LR11ES48. LISTIC experiments have been made possible thanks to the MUST computing center of the University of Savoie Mont Blanc.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Sourour Brahimi (1)
  • Najib Ben Aoun (1, 2), Email author
  • Alexandre Benoit (3)
  • Patrick Lambert (3)
  • Chokri Ben Amar (1, 4)

  1. REGIM-Lab.: REsearch Groups in Intelligent Machines, University of Sfax, National Engineering School of Sfax (ENIS), Sfax, Tunisia
  2. Department of Computer Science, College of Computer Science and Information Technology, Al-Baha University, Al Baha, Saudi Arabia
  3. LISTIC-Lab: Univ. Savoie Mont Blanc, LISTIC, Polytech Annecy Chambéry, Annecy, France
  4. Department of Computer Engineering, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
