Sparse Dictionaries for Semantic Segmentation

  • Lingling Tao
  • Fatih Porikli
  • René Vidal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)

Abstract

A popular trend in semantic segmentation is to use top-down object information to improve bottom-up segmentation. For instance, the classification scores of the Bag of Features (BoF) model for image classification have been used to build a top-down categorization cost in a Conditional Random Field (CRF) model for semantic segmentation. Recent work shows that discriminative sparse dictionary learning (DSDL) can improve upon the unsupervised K-means dictionary learning method used in the BoF model, because DSDL can capture discriminative features from different classes. However, to the best of our knowledge, DSDL has not been used to build a top-down categorization cost for semantic segmentation. In this paper, we propose a CRF model that incorporates a DSDL-based top-down cost for semantic segmentation. We show that the new CRF energy can be minimized using existing efficient discrete optimization techniques. Moreover, we propose a new method for jointly learning the CRF parameters, the object classifiers and the visual dictionary. Our experiments demonstrate that jointly learning these parameters makes the feature representation more discriminative and improves segmentation performance over state-of-the-art methods that use unsupervised K-means dictionary learning.
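The abstract does not state the energy function, so the following is only an illustrative sketch of the general idea: features are sparse-coded against a dictionary, the codes are pooled per class, and a linear classifier on the pooled codes supplies a top-down term that is added to the unary and pairwise terms of a CRF energy. Everything here is an assumption made for illustration, not the authors' formulation: the l1 sparse coding objective solved with ISTA, the Potts pairwise term, and the names `sparse_code`, `top_down_cost`, `crf_energy`, `W`, `lam` and `pairwise_weight`.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def top_down_cost(codes, labels, W):
    """Hypothetical top-down term: pool |codes| over the features assigned
    to each class and score the pooled vector with that class's linear
    classifier W[c]. A higher score means a lower cost."""
    cost = 0.0
    for c in range(W.shape[0]):
        pooled = np.abs(codes[labels == c]).sum(axis=0)
        cost -= W[c] @ pooled
    return cost

def crf_energy(unary, edges, labels, codes, W, pairwise_weight=1.0):
    """Assumed energy: unary costs + Potts pairwise costs + top-down cost."""
    e = unary[np.arange(len(labels)), labels].sum()
    e += pairwise_weight * sum(labels[i] != labels[j] for i, j in edges)
    e += top_down_cost(codes, labels, W)
    return e

# Toy usage: two features, two classes, one edge between the features.
codes = np.array([[1.0, 0.0], [0.0, 2.0]])
W = np.eye(2)
unary = np.array([[0.0, 1.0], [1.0, 0.0]])
labels = np.array([0, 1])
energy = crf_energy(unary, [(0, 1)], labels, codes, W)
```

In this sketch the top-down term depends on the labeling only through the per-class pooled codes. In a joint learning setting such as the one the abstract describes, the dictionary `D` and the classifiers `W` would be updated together with the CRF weights rather than fixed in advance.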

Keywords

discriminative sparse dictionary learning, conditional random fields, semantic segmentation



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lingling Tao (1)
  • Fatih Porikli (2)
  • René Vidal (1)
  1. Center for Imaging Science, Johns Hopkins University, USA
  2. Australian National University & NICTA ICT, Australia
