Stacked Hierarchical Labeling

  • Daniel Munoz
  • J. Andrew Bagnell
  • Martial Hebert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6316)


In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.
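The stacking idea mentioned in the abstract can be illustrated with a minimal two-stage sketch. This is not the authors' hierarchical procedure: it assumes a toy nearest-centroid learner, flat feature vectors instead of image regions, and a simple interleaved fold split. All function names (`fit_centroids`, `held_out_scores`, `train_stacked`, `predict_stacked`) are hypothetical. The point it shows is that the second stage is trained on first-stage predictions made by models that never saw the corresponding sample, so it learns from realistic (imperfect) upstream outputs rather than overfit ones.

```python
def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_scores(model, x):
    """Negative squared distance to each class centroid, in sorted label order."""
    return [-sum((a - b) ** 2 for a, b in zip(x, model[c])) for c in sorted(model)]


def held_out_scores(X, y, k=3):
    """Stacking step: score each sample with a model trained without that sample.

    Assumes every training split still contains every class.
    """
    n = len(X)
    out = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        test_idx = [i for i in range(n) if i % k == fold]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            out[i] = predict_scores(model, X[i])
    return out


def train_stacked(X, y, k=3):
    """Train stage 2 on the original features plus held-out stage-1 scores."""
    stage1_scores = held_out_scores(X, y, k)
    X2 = [list(x) + s for x, s in zip(X, stage1_scores)]
    stage1 = fit_centroids(X, y)   # stage-1 model used at test time sees all data
    stage2 = fit_centroids(X2, y)
    return stage1, stage2


def predict_stacked(stage1, stage2, x):
    """Run the two stages in sequence and return the best-scoring label."""
    x2 = list(x) + predict_scores(stage1, x)
    scores = predict_scores(stage2, x2)
    labels = sorted(stage2)
    return labels[scores.index(max(scores))]


# Toy data: two well-separated classes in 2-D.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
stage1, stage2 = train_stacked(X, y, k=3)
```

In a hierarchical setting one would presumably repeat the same held-out step at every level, passing each level's predictions down as contextual features for the next; the sketch stops at two stages to keep the mechanism visible.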


Keywords: Graphical Model; Training Image; Inference Procedure; Parent Region; Semantic Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: From contours to regions: An empirical evaluation. In: CVPR (2009)
  2. Barbu, A.: Training an active random field for real-time image denoising. IEEE Trans. on Image Processing 18(11) (2009)
  3. Bouman, C.A., Shapiro, M.: A multiscale random field model for bayesian image segmentation. IEEE Trans. on Image Processing 3(2) (1994)
  4. Cohen, W.W., Carvalho, V.R.: Stacked sequential learning. In: IJCAI (2005)
  5. Daume III, H., Langford, J., Marcu, D.: Search-based structured prediction. Machine Learning Journal 75(3) (2009)
  6. Feng, X., Williams, C.K.I., Felderhof, S.N.: Combining belief networks and neural networks for scene segmentation. IEEE T-PAMI 24(4) (2002)
  7. Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D.: Multi-class segmentation with relative location prior. IJCV 80(3) (2008)
  8. Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: ICCV (2009)
  9. Gould, S., Russakovsky, O., Goodfellow, I., Baumstarck, P., Ng, A.Y., Koller, D.: The stair vision library, v2.3 (2009),
  10. Heitz, G., Gould, S., Saxena, A., Koller, D.: Cascaded classification models: Combining models for holistic scene understanding. In: NIPS (2008)
  11. Kakade, S., Teh, Y.W., Roweis, S.: An alternate objective function for markovian fields. In: ICML (2002)
  12. Kohli, P., Ladicky, L., Torr, P.H.: Robust higher order potentials for enforcing label consistency. IJCV 82(3) (2009)
  13. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE T-PAMI 26(2) (2004)
  14. Komodakis, N., Paragios, N., Tziritas, G.: Mrf energy minimization and beyond via dual decomposition. IEEE T-PAMI (in press)
  15. Kou, Z., Cohen, W.W.: Stacked graphical models for efficient inference in markov random fields. In: SDM (2007)
  16. Kulesza, A., Pereira, F.: Structured learning with approximate inference. In: NIPS (2007)
  17. Kumar, S., August, J., Hebert, M.: Exploiting inference for approximate parameter learning in discriminative fields: An empirical study. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 153–168. Springer, Heidelberg (2005)
  18. Kumar, S., Hebert, M.: A hierarchical field framework for unified context-based classification. In: ICCV (2005)
  19. Kumar, S., Hebert, M.: Discriminative random fields. IJCV 68(2) (2006)
  20. Ladicky, L., Russell, C., Kohli, P., Torr, P.: Associative hierarchical crfs for object class image segmentation. In: ICCV (2009)
  21. Lim, J.J., Arbelaez, P., Gu, C., Malik, J.: Context by region ancestry. In: ICCV (2009)
  22. Maire, M., Arbelaez, P., Fowlkes, C., Malik, J.: Using contours to detect and localize junctions in natural images. In: CVPR (2008)
  23. Ohta, Y., Kanade, T., Sakai, T.: An analysis system for scenes containing objects with substructures. In: Int’l. Joint Conference on Pattern Recognitions (1978)
  24. Ratliff, N., Silver, D., Bagnell, J.A.: Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots 27(1) (2009)
  25. Ross, S., Bagnell, J.A.: Efficient reductions for imitation learning. In: AIStats (2010)
  26. Shotton, J., Winn, J., Rother, C., Criminisi, A.: Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV 81(1) (2009)
  27. Tu, Z., Bai, X.: Auto-context and its application to high-level vision tasks and 3d brain image segmentation. T-PAMI 18(11) (2009)
  28. Viola, P., Jones, M.J.: Robust real-time face detection. IJCV 57(2) (2004)
  29. Wainwright, M.J.: Estimating the “wrong” graphical model: Benefits in the computation-limited setting. JMLR 7(11) (2006)
  30. Wolpert, D.H.: Stacked generalization. Neural Networks 5(2) (1992)
  31. Zhang, L., Ji, Q.: Image segmentation with a unified graphical model. T-PAMI 32(8) (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Daniel Munoz (1)
  • J. Andrew Bagnell (1)
  • Martial Hebert (1)
  1. The Robotics Institute, Carnegie Mellon University
