Multimedia Tools and Applications, Volume 77, Issue 17, pp 22199–22211

Semantic segmentation based on fusion of features and classifiers

  • Yanbing Xue (corresponding author)
  • Huiqiang Geng
  • Hua Zhang
  • Zhenshan Xue
  • Guangping Xu


This paper proposes a feed-forward architecture for semantic segmentation that fuses both features and classifiers. The algorithm consists of three phases. First, features from a hierarchical convolutional neural network (CNN) and region-based features are extracted and fused at the super-pixel level. Second, an ensemble of Softmax, XGBoost and Random Forest classifiers computes the per-pixel class probabilities. Finally, a fully connected conditional random field (CRF) is applied to improve the final result. The hierarchical features carry more global evidence, while the region features carry more local evidence, so fusing the two is expected to strengthen the feature representation. In the classification phase, integrating multiple classifiers aims to improve the generalization ability of the classification. Experiments on the SIFT Flow dataset show that the proposed method achieves competitive labeling accuracy.
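The first two phases described above can be illustrated with a minimal sketch. The abstract does not state how the features are fused or how the classifier outputs are combined, so the sketch assumes mean pooling of per-pixel CNN features over each super-pixel, concatenation as the fusion rule, and a weighted average of class probabilities as the ensemble rule. All function names are hypothetical, the classifier outputs are taken as given probability vectors, and the CRF phase is omitted.

```python
from collections import defaultdict


def pool_by_superpixel(pixel_features, superpixel_ids):
    """Average per-pixel feature vectors over each super-pixel.

    pixel_features: list of feature vectors, one per pixel.
    superpixel_ids: list of super-pixel labels, one per pixel.
    Returns {superpixel_id: mean feature vector}.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, sp in zip(pixel_features, superpixel_ids):
        if sp not in sums:
            sums[sp] = [0.0] * len(feat)
        for i, value in enumerate(feat):
            sums[sp][i] += value
        counts[sp] += 1
    return {sp: [v / counts[sp] for v in total] for sp, total in sums.items()}


def fuse_features(cnn_feats, region_feats):
    """Concatenate both descriptors of every super-pixel (assumed fusion rule)."""
    return {sp: cnn_feats[sp] + region_feats[sp] for sp in cnn_feats}


def ensemble_probabilities(per_classifier_probs, weights=None):
    """Weighted average of class-probability vectors (assumed ensemble rule)."""
    weights = weights or [1.0] * len(per_classifier_probs)
    mixed = [0.0] * len(per_classifier_probs[0])
    for w, probs in zip(weights, per_classifier_probs):
        for c, p in enumerate(probs):
            mixed[c] += w * p
    total = sum(mixed)  # renormalize so the result sums to 1
    return [p / total for p in mixed]


def label_pixels(superpixel_probs, superpixel_ids):
    """Copy each super-pixel's class probabilities to the pixels it contains."""
    return [superpixel_probs[sp] for sp in superpixel_ids]
```

In this sketch, each classifier (for example Softmax, XGBoost and Random Forest) would be run on the fused vector of a super-pixel, and the resulting probability vectors passed to `ensemble_probabilities`. The per-pixel probabilities returned by `label_pixels` are the kind of input a fully connected CRF would then take as unary potentials.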


Fusion features, Super-pixels, Multiple classifiers, CRF



This research has been supported by the National Natural Science Foundation of China (U1509207, 61472278, 61403281 and 61572357).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin, China
  2. Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
  3. Tianjin Sino-German University of Applied Sciences, Tianjin, China
