A robust multilevel segment description for multi-class object recognition


We present an approach to improving the performance of multi-class image segmentation systems based on a multilevel description of segments. The multi-class image segmentation system used in this paper delineates the segments in an image, describes each segment via a multilevel feature vector, and passes the vectors to a multi-class object classifier. The focus of this paper is on the segment description stage. We first propose a robust, scale-invariant texture feature set, named directional differences (DDs). This feature set is designed by investigating the flaws of conventional texture features. The advantages of DDs are justified both analytically and experimentally. We have conducted several experiments on the performance of our multi-class image segmentation system to compare DDs with some well-known texture features. Experimental results show that DDs provide about 8 % higher classification accuracy. Feature reduction experiments also show that, in a combined feature space, DDs remain among the most effective features even for small feature vector sizes. To describe a segment fully, we introduce a multilevel strategy called different levels of feature extraction (DLFE) that enables the system to include semantic relations and contextual information in the features. This information is especially effective for highly occluded objects. DLFE concatenates the features related to different views of every segment. Experimental results show that more than 4 % improvement in multi-class image segmentation accuracy is achieved. Using the semantic information in the classifier stage adds another 2 % improvement to the accuracy of the system.
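The abstract does not give the exact formulation of DDs or of the DLFE concatenation, so the following is only a rough illustrative sketch of the two ideas: mean absolute intensity differences along a few directions as a stand-in for directional-difference texture features, and concatenation of descriptors computed on several views of a segment as a stand-in for the multilevel description. All function names, the four-direction choice, and the range normalization are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def directional_differences(patch, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Mean absolute intensity difference along each offset direction.

    Illustrative stand-in for the paper's DD features, not the authors'
    exact definition. Here the values are normalized by the patch's
    intensity range as one simple way to reduce scale sensitivity.
    """
    p = patch.astype(float)
    rng = float(p.max() - p.min()) or 1.0  # avoid division by zero on flat patches
    feats = []
    for dy, dx in offsets:
        # Compare each pixel with its neighbor at offset (dy, dx), with wrap-around.
        shifted = np.roll(np.roll(p, dy, axis=0), dx, axis=1)
        feats.append(np.abs(p - shifted).mean() / rng)
    return np.array(feats)

def multilevel_descriptor(image, steps=(1, 2, 4)):
    """Concatenate DD-style features computed on progressively downsampled
    views of a segment, loosely mirroring the DLFE idea of describing the
    same segment at several levels."""
    parts = [directional_differences(image[::s, ::s]) for s in steps]
    return np.concatenate(parts)
```

With four directions and three levels, each segment yields a 12-dimensional descriptor; in the paper this multilevel vector is what gets passed to the multi-class object classifier.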


Figs. 1–10 (figures appear in the full text; no captions survive in this preview)



Author information



Corresponding author

Correspondence to Mohammadreza Mostajabi.


About this article


Cite this article

Mostajabi, M., Gholampour, I. A robust multilevel segment description for multi-class object recognition. Machine Vision and Applications 26, 15–30 (2015). https://doi.org/10.1007/s00138-014-0642-1



  • Contextual information
  • Feature extraction
  • Image segmentation
  • Multilevel description
  • Object classification