Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

  • Saining Xie
  • Xun Huang
  • Zhuowen Tu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9908)


Current practice in convolutional neural networks (CNNs) remains largely bottom-up, and the role of top-down processes in CNNs for pattern analysis and visual inference is not well understood. In this paper, we propose a new method for structured labeling by developing a convolutional pseudoprior (ConvPP) on the ground-truth labels. Our method has several interesting properties: (1) compared with classic machine learning algorithms like CRFs and Structural SVM, ConvPP automatically learns rich convolutional kernels to capture both short- and long-range contexts; (2) compared with cascade classifiers like Auto-Context, ConvPP avoids the iterative steps of learning a series of discriminative classifiers and automatically learns contextual configurations; (3) compared with recent efforts combining CNN models with CRFs and RNNs, ConvPP learns convolution in the labeling space with improved modeling capability and less manual specification; (4) compared with Bayesian models like MRFs, ConvPP capitalizes on the rich representation power of convolution by automatically learning priors built on convolutional filters. We accomplish our task using a pseudo-likelihood approximation to the prior under a novel fixed-point network structure that facilitates an end-to-end learning process. We show state-of-the-art results on sequential labeling and image labeling benchmarks.
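To make the pseudo-likelihood idea concrete, the following is a minimal sketch for a 1D label sequence, not the network described in the paper. It assumes a single linear convolutional filter over one-hot labels, with the center position masked so that each label is predicted only from its neighbors. The names `pseudo_log_likelihood`, `weights`, and `bias` are hypothetical and chosen for illustration.

```python
import numpy as np

def pseudo_log_likelihood(labels, weights, bias):
    """Return sum_i log p(y_i | labels in a window around i, excluding i).

    labels  : (T,) int array with values in [0, K)
    weights : (2R+1, K, K) array; weights[d, a, b] scores label b at position i
              given label a at offset d - R (the center slice d == R is unused)
    bias    : (K,) array
    """
    T = labels.shape[0]
    width, K, _ = weights.shape
    R = width // 2
    onehot = np.eye(K)[labels]                    # (T, K) one-hot label map
    padded = np.pad(onehot, ((R, R), (0, 0)))     # zero padding at the borders
    total = 0.0
    for i in range(T):
        window = padded[i:i + width].copy()       # (2R+1, K) neighborhood of i
        window[R] = 0.0                           # mask the label being predicted
        scores = bias + np.einsum("dk,dkj->j", window, weights)
        scores = scores - scores.max()            # stabilize the softmax
        log_probs = scores - np.log(np.exp(scores).sum())
        total += log_probs[labels[i]]
    return total
```

Under this sketch, a filter that rewards agreement with the left neighbor assigns a higher pseudo-log-likelihood to a constant sequence than to an alternating one, which is the kind of label-space context the abstract refers to. A learned, multi-layer filter bank would take the place of the single linear filter assumed here.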


Structured prediction · Deep learning · Semantic segmentation · Top-down processing



This work is supported by NSF IIS-1618477, NSF IIS-1360566, NSF IIS-1360568, and a Northrop Grumman Contextual Robotics grant. We thank Zachary C. Lipton, Jameson Merkow, and Long Jin for helping improve this manuscript. We are grateful for the generous donation of the GPUs by NVIDIA.

Supplementary material

419976_1_En_19_MOESM1_ESM.pdf (401 kb)
Supplementary material 1 (pdf 400 KB)



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of CogSci and Department of CSE, UC San Diego, La Jolla, USA
  2. Department of Computer Science, Cornell University, Ithaca, USA
