Nonparametric Scene Parsing via Label Transfer

Dense Image Correspondences for Computer Vision

Abstract

Much recent work on object recognition and image understanding has focused on carefully constructing mathematical models of images, scenes, and objects. In this chapter, we propose a nonparametric approach to object recognition and scene parsing based on a technique we call label transfer. For an input image, our system first retrieves its nearest neighbors from a large database of fully annotated images. The system then establishes dense correspondences between the input image and each nearest neighbor using the SIFT flow algorithm (Liu et al., IEEE Trans. Pattern Anal. Mach. Intell. 33(5):978–994, 2011; Chap. 2), which aligns two images based on local image structures. Finally, using the dense scene correspondences obtained from SIFT flow, our system warps the existing annotations onto the input image and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Our nonparametric scene parsing system achieves promising experimental results on challenging databases. Compared with existing object recognition approaches that require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
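The three stages described in the abstract (retrieval, dense alignment, annotation warping) can be illustrated with a minimal sketch. This is not the chapter's implementation: the names `retrieve_neighbors`, `warp_labels`, and `transfer_labels` are hypothetical, the flow field comes from a caller-supplied `estimate_flow` function standing in for SIFT flow, and a per-pixel majority vote replaces the Markov random field integration step.

```python
from collections import Counter

def retrieve_neighbors(query_feature, database, k):
    """Return the k entries whose global feature is closest to the query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(database, key=lambda e: sq_dist(query_feature, e["feature"]))[:k]

def warp_labels(labels, flow):
    """Warp a neighbor's annotation map onto the query pixel grid.

    flow[r][c] = (dr, dc) means query pixel (r, c) corresponds to
    neighbor pixel (r + dr, c + dc); targets are clamped to the image.
    """
    h, w = len(labels), len(labels[0])
    warped = []
    for r in range(h):
        row = []
        for c in range(w):
            dr, dc = flow[r][c]
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            row.append(labels[rr][cc])
        warped.append(row)
    return warped

def transfer_labels(query_feature, database, estimate_flow, k=3):
    """Label each query pixel by majority vote over warped annotations.

    estimate_flow(entry) is a hypothetical stand-in for a dense
    correspondence algorithm; the vote is a simplification of the
    MRF-based cue integration described in the abstract.
    """
    neighbors = retrieve_neighbors(query_feature, database, k)
    warped = [warp_labels(n["labels"], estimate_flow(n)) for n in neighbors]
    h, w = len(warped[0]), len(warped[0][0])
    return [[Counter(m[r][c] for m in warped).most_common(1)[0][0]
             for c in range(w)] for r in range(h)]
```

The sketch makes the nonparametric character of the approach visible: no per-category classifier is trained, and all category knowledge comes from the annotations of the retrieved images.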

Notes

  1. Other scene parsing and image understanding systems also require such a database. We do not require more than others.

  2. SIFT descriptors are computed at each pixel using a 16 × 16 window. The window is divided into 4 × 4 cells, and image gradients within each cell are quantized into an 8-bin histogram. Therefore, the pixel-wise SIFT feature is a 4 × 4 × 8 = 128-D vector.

  3. This extrapolation is different from moving to a larger database in Sect. 5.2, where indoor scenes are included. This number is anticipated only when images similar to the LMO database are added.
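The descriptor layout in note 2 (a 16 × 16 window split into 4 × 4 cells with 8 orientation bins each) can be sketched as follows. This is a simplified illustration under stated assumptions, not the chapter's code: the function name `pixel_sift_like` is hypothetical, gradients are plain central differences, and the Gaussian weighting, bin interpolation, and clipping used by full SIFT implementations are omitted.

```python
import math

def pixel_sift_like(image, row, col, window=16, cells=4, bins=8):
    """Return a cells*cells*bins descriptor for the window around (row, col).

    `image` is a list of rows of grayscale floats; pixels outside the
    image are clamped to the nearest border pixel.
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    cell_size = window // cells              # 16 / 4 = 4 pixels per cell side
    top, left = row - window // 2, col - window // 2
    descriptor = [0.0] * (cells * cells * bins)

    for dr in range(window):
        for dc in range(window):
            r, c = top + dr, left + dc
            # Central-difference gradient at this pixel.
            gx = pixel(r, c + 1) - pixel(r, c - 1)
            gy = pixel(r + 1, c) - pixel(r - 1, c)
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Quantize the gradient orientation into one of `bins` bins.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            cell = (dr // cell_size) * cells + (dc // cell_size)
            descriptor[cell * bins + b] += magnitude

    # L2-normalize so the descriptor is invariant to contrast scaling.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

With the default arguments the result has 4 × 4 × 8 = 128 entries, matching the dimensionality stated in the note.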

References

  1. Adelson, E.H.: On seeing stuff: the perception of materials by humans and machines. In: SPIE, Human Vision and Electronic Imaging VI, pp. 1–12 (2001)

  2. Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape matching and object recognition. In: Advances in Neural Information Processing Systems (NIPS) (2000)

  3. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondence. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

  4. Borg, I., Groenen, P.: Modern Multidimensional Scaling: Theory and Applications, 2nd edn. Springer, New York (2005)

  5. Branson, S., Wah, C., Babenko, B., Schroff, F., Welinder, P., Perona, P., Belongie, S.: Visual recognition with humans in the loop. In: European Conference on Computer Vision (ECCV) (2010)

  6. Choi, M.J., Lim, J.J., Torralba, A., Willsky, A.: Exploiting hierarchical context on a large database of object categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  7. Crandall, D., Felzenszwalb, P., Huttenlocher, D.: Spatial priors for part-based recognition using statistical models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

  8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

  9. Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for multi-class object layout. In: IEEE International Conference on Computer Vision (ICCV) (2009)

  10. Divvala, S.K., Hoiem, D., Hays, J.H., Efros, A.A., Hebert, M.: An empirical study of context in object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  11. Edwards, G., Cootes, T., Taylor, C.: Face recognition using active appearance models. In: European Conference on Computer Vision (ECCV) (1998)

  12. Efros, A.A., Leung, T.: Texture synthesis by non-parametric sampling. In: IEEE International Conference on Computer Vision (ICCV) (1999)

  13. Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)

  14. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)

  15. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2003)

  16. Frome, A., Singer, Y., Malik, J.: Image retrieval and classification using local distance functions. In: Advances in Neural Information Processing Systems (NIPS) (2006)

  17. Galleguillos, C., McFee, B., Belongie, S., Lanckriet, G.R.G.: Multi-class object localization by combining local contextual interactions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  18. Grauman, K., Darrell, T.: Pyramid match kernels: Discriminative classification with sets of image features. In: IEEE International Conference on Computer Vision (ICCV) (2005)

  19. Gupta, A., Davis, L.S.: Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In: European Conference on Computer Vision (ECCV) (2008)

  20. Hays, J., Efros, A.A.: Scene completion using millions of photographs. ACM SIGGRAPH 26(3) (2007)

  21. Heitz, G., Koller, D.: Learning spatial context: using stuff to find things. In: European Conference on Computer Vision (ECCV) (2008)

  22. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)

  23. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. II, pp. 2169–2178 (2006)

  24. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  25. Liang, L., Liu, C., Xu, Y.Q., Guo, B.N., Shum, H.Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. (TOG) 20(3), 127–150 (2001)

  26. Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: dense correspondence across different scenes. In: European Conference on Computer Vision (ECCV) (2008)

  27. Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  28. Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across different scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)

  29. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  30. Murphy, K.P., Torralba, A., Freeman, W.T.: Using the forest to see the trees: a graphical model relating features, objects, and scenes. In: Advances in Neural Information Processing Systems (NIPS) (2003)

  31. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)

  32. Obdrzalek, S., Matas, J.: Sub-linear indexing for large scale object recognition. In: British Machine Vision Conference (2005)

  33. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  34. Park, D., Ramanan, D., Fowlkes, C.: Multiresolution models for object detection. In: European Conference on Computer Vision (ECCV) (2010)

  35. Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., Belongie, S.: Objects in context. In: IEEE International Conference on Computer Vision (ICCV) (2007)

  36. Russell, B.C., Torralba, A., Liu, C., Fergus, R., Freeman, W.T.: Object recognition by scene alignment. In: Advances in Neural Information Processing Systems (NIPS) (2007)

  37. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)

  38. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Segmenting scenes by matching image composites. In: Advances in Neural Information Processing Systems (NIPS) (2009)

  39. Savarese, S., Winn, J., Criminisi, A.: Discriminative object class models of appearance and shape by correlatons. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)

  40. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter sensitive hashing. In: IEEE International Conference on Computer Vision (ICCV) (2003)

  41. Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007)

  42. Shotton, J., Winn, J., Rother, C., Criminisi, A.: TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81(1), 2–23 (2009)

  43. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: IEEE International Conference on Computer Vision (ICCV) (2003)

  44. Sudderth, E., Torralba, A., Freeman, W.T., Willsky, A.: Describing visual scenes using transformed Dirichlet processes. In: Advances in Neural Information Processing Systems (NIPS) (2005)

  45. Tighe, J., Lazebnik, S.: Superparsing: Scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision (ECCV) (2010)

  46. Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2008)

  47. Turk, M., Pentland, A.: Face recognition using eigenfaces. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1991)

  48. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2001)

  49. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: European Conference on Computer Vision (ECCV) (2000)

  50. Winn, J., Criminisi, A., Minka, T.: Object categorization by learned universal visual dictionary. In: IEEE International Conference on Computer Vision (ICCV) (2005)

  51. Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  52. Yang, Y., Hallman, S., Ramanan, D., Fowlkes, C.: Layered object detection for multi-class segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

Author information

Corresponding author

Correspondence to Ce Liu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Liu, C., Yuen, J., Torralba, A. (2016). Nonparametric Scene Parsing via Label Transfer. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_10

  • DOI: https://doi.org/10.1007/978-3-319-23048-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23047-4

  • Online ISBN: 978-3-319-23048-1

  • eBook Packages: Engineering, Engineering (R0)
