What are Textons?

Zhu, Song-Chun; Guo, Cheng-en; Wang, Yizhou; Xu, Zijian

doi:10.1023/B:VISI.0000046592.70770.61

Abstract

Textons refer to fundamental micro-structures in natural images (and videos) and are considered the atoms of pre-attentive human visual perception (Julesz, 1981). Unfortunately, the word “texton” remains a vague concept in the literature for lack of a good mathematical model. In this article, we first present a three-level generative image model for learning textons from texture images. In this model, an image is a superposition of a number of image bases selected from an over-complete dictionary including various Gabor and Laplacian of Gaussian functions at various locations, scales, and orientations. These image bases are, in turn, generated by a smaller number of texton elements, selected from a dictionary of textons. By analogy to the waveform-phoneme-word hierarchy in speech, the pixel-base-texton hierarchy presents an increasingly abstract visual description and leads to dimension reduction and variable decoupling. By fitting the generative model to observed images, we can learn the texton dictionary as parameters of the generative model. The paper then studies the geometric, dynamic, and photometric structures of the texton representation by further extending the generative model to account for motion and illumination variations. (1) For the geometric structures, a texton consists of a number of image bases with deformable spatial configurations. The geometric structures are learned from static texture images. (2) For the dynamic structures, the motion of a texton is characterized by a Markov chain model in time, which can sometimes switch geometric configurations during the movement. We call moving textons “motons”. The dynamic models are learned using the trajectories of the textons inferred from video sequences. (3) For the photometric structures, a texton represents the set of images of a 3D surface element under varying illuminations and is called a “lighton” in this paper. We adopt an illumination-cone representation where a lighton is a texton triplet. For a given light source, a lighton image is generated as a linear sum of the three texton bases. We present a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos, and we also present some comparison studies with K-means clustering, sparse coding, independent component analysis, and transformed component analysis. We shall discuss how general textons can be learned from generic natural images.
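The pixel-base-texton hierarchy and the lighton triplet described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it only synthesizes an image from a hand-written texton dictionary and does not attempt the learning step. The function names, the patch size, the two example textons ("cross" and "dot"), and the lighting weights are all assumptions made for illustration, and textons are assumed to be placed far enough from the image border that every base fits.

```python
import numpy as np


def _grid(size):
    """Pixel coordinates centred on the middle of an odd-sized square patch."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return y, x


def _normalize(patch):
    """Make a patch zero-mean and unit-norm."""
    patch = patch - patch.mean()
    return patch / np.linalg.norm(patch)


def gabor_base(size, sigma, theta, wavelength):
    """Cosine Gabor function with orientation theta (radians)."""
    y, x = _grid(size)
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return _normalize(envelope * np.cos(2.0 * np.pi * along / wavelength))


def log_base(size, sigma):
    """Laplacian-of-Gaussian function (up to a constant factor)."""
    y, x = _grid(size)
    r2 = x ** 2 + y ** 2
    return _normalize((r2 - 2.0 * sigma ** 2) * np.exp(-r2 / (2.0 * sigma ** 2)))


def render_textons(shape, placements, texton_dict, base_dict):
    """Texton level -> base level -> pixel level.

    Each placed texton expands into its image bases (with offsets and
    coefficients), and the image is the superposition of those bases.
    Assumes every base lies fully inside the image.
    """
    image = np.zeros(shape)
    for texton_name, (ty, tx) in placements:
        for base_name, (dy, dx), coeff in texton_dict[texton_name]:
            patch = base_dict[base_name]
            half = patch.shape[0] // 2
            cy, cx = ty + dy, tx + dx
            image[cy - half:cy + half + 1, cx - half:cx + half + 1] += coeff * patch
    return image


def lighton_image(triplet, light_weights):
    """Linear sum of three texton bases, weighted by the light source."""
    weights = np.asarray(light_weights, dtype=float)
    return np.tensordot(weights, triplet, axes=1)


# Hypothetical base dictionary (a real one would span many scales and orientations).
base_dict = {
    "gabor_0": gabor_base(9, 2.0, 0.0, 4.0),
    "gabor_90": gabor_base(9, 2.0, np.pi / 2, 4.0),
    "log": log_base(9, 1.5),
}

# Hypothetical texton dictionary: each texton is a small configuration of bases,
# written as (base name, (row offset, column offset), coefficient).
texton_dict = {
    "cross": [("gabor_0", (0, 0), 1.0), ("gabor_90", (0, 0), 1.0)],
    "dot": [("log", (0, 0), -2.0)],
}

# Three texton instances, each given as (texton name, (row, column) centre).
placements = [("cross", (10, 10)), ("dot", (20, 22)), ("cross", (22, 8))]
image = render_textons((32, 32), placements, texton_dict, base_dict)

# A lighton as a triplet of bases combined with assumed lighting weights.
triplet = np.stack([base_dict["gabor_0"], base_dict["gabor_90"], base_dict["log"]])
lit = lighton_image(triplet, [0.5, 0.2, 0.3])
```

In this sketch three texton instances stand in for five image bases and 1024 pixels, which is the dimension reduction the hierarchy is meant to provide. Fitting the dictionaries to observed images, as the paper does, is outside the scope of the example.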


References

Adelson, E.H. and Pentland, A.R. 1996. The perception of shading and reflectance. In Perception as Bayesian Inference, D. Knill and W. Richards (Eds.), Cambridge Univ. Press: New York, pp. 409–423.
Atick, J.J. and Redlich, A.N. 1992. What does the retina know about natural scenes? Neural Computation, 4:196–210.
Barlow, H.B. 1961. Possible principles underlying the transformation of sensory messages. In Sensory Communication, W.A. Rosenblith (Ed.), MIT Press: Cambridge, MA, pp. 217–234.
Bell, A.J. and Sejnowski, T.J. 1995. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159.
Belhumeur, P.N. and Kriegman, D. 1998. What is the set of images of an object under all possible illumination conditions? Int’l J. Computer Vision, 28(3).
Belhumeur, P.N., Kriegman, D., and Yuille, A.L. 1999. The bas-relief ambiguity. IJCV, 35(1).
Bergeaud, F. and Mallat, S. 1996. Matching pursuit: Adaptive representation of images and sounds. Comp. Appl. Math., 15:97–109.
Coifman, R.R. and Wickerhauser, M.V. 1992. Entropy based algorithms for best basis selection. IEEE Trans. on Information Theory, 38:713–718.
Dana, K. and Nayar, S. 1999. 3D textured surface modelling. In Proc. of Workshop on Integration of Appearance and Geometric Methods in Object Recognition, pp. 46–56.
Daugman, J. 1985. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of Optical Society of America, 2(7):1160–1169.
Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39:1–38.
Dong, J. and Chantler, M.J. 2002. Capture and synthesis of 3D surface texture. In Proceedings of 2nd Texture Workshop.
Donoho, D.L., Vetterli, M., DeVore, R.A., and Daubechies, I. 1998. Data compression and harmonic analysis. IEEE Trans. Information Theory, 6:2435–2476.
Frey, B. and Jojic, N. 1999. Transformed component analysis: Joint estimation of spatial transforms and image components. In Proc. of Int’l Conf on Comp. Vis., Corfu, Greece.
Gu, M.G. and Kong, F.H. 1998. A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems. In Proceedings of the National Academy of Sciences, U.S.A. 95, pp. 7270–7274.
Guo, C.E., Zhu, S.C., and Wu, Y.N. 2001a. Visual learning by integrating descriptive and generative methods. In Proc. of Int’l Conf on Computer Vision, Vancouver, CA, July, 2001 (to appear in JCV).
Guo, C.E., Zhu, S.C., and Wu, Y.N. 2001b. A mathematical theory for primal sketch and sketchability. In Proc. of Int’l Conf on Computer Vision, Nice, France, 2003.
Haddon, J. and Forsyth, D.A. 1998. Shading primitives: Finding folds and shadow grooves. In Proc. 6th Int’l Conf on Computer Vision, Bombay, India.
Hubel, D. and Wiesel, T.N. 1962. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiology, 160:106–154.
Jacobs, D. 1997. Linear fitting with missing data: Applications to structure from motion and characterizing intensity images. In Proc. of CVPR.
Julesz, B. 1981. Textons, the elements of texture perception and their interactions. Nature, 290:91–97.
Karni, A. and Sagi, D. 1991. Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity. Proc. Nat. Acad. Sci. USA, 88:4966–4970.
Koenderink, J.J., van Doorn, A.J., Dana, K.J., and Nayar, S. 1999. Bidirectional reflection distribution function of thoroughly pitted surfaces. IJCV, 31(2/3).
Lee, A.B., Huang, J.G., and Mumford, D.B. 2000. Random collage model for natural images. Int’l J. of Computer Vision, Oct. 2000.
Leung, T. and Malik, J. 1999. Recognizing surfaces using three-dimensional textons. In Proc. of 7th ICCV, Corfu, Greece.
Li, Y., Wang, T.S., and Shum, H.Y. 2002. Motion textures: A two-level statistical model for character motion synthesis. In Proceedings of Siggraph.
Liu, Y.X. and Collins, R.T. 2000. A computational model for repeated pattern perception using Frieze and wallpaper groups. In Proc. of Comp. Vis. and Patt. Recog., Hilton Head, SC. June, 2000.
Liu, X.G., Yu, Y.Z., and Shum, H.Y. 2001. Synthesizing bidirectional texture functions for real world surfaces. In Proceedings of Siggraph.
Mallat, S.G. 1989. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. on PAMI, 11(7):674–693.
Olshausen, B.A. and Field, D.J. 1997. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325.
Shashua, A. 1992. Geometry and photometry in 3D visual recognition. Ph.D. Thesis, MIT.
Simoncelli, E.P., Freeman, W.T., Adelson, E.H., and Heeger, D.J. 1992. Shiftable multiscale transforms. IEEE Trans. on Info. Theory, 38(2):587–607.
Tu, Z.W. and Zhu, S.C. 2002. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans. on PAMI, 24(5):657–673.
Wang, Y.Z. and Zhu, S.C. 2002a. A generative model for textured motion: Analysis and synthesis. In Proc. of European Conf. on Computer Vision, Copenhagen, Denmark.
Wang, Y.Z. and Zhu, S.C. 2002b. A generative model for textured motion: Analysis and synthesis. In Proc. of European Conf. on Computer Vision, Copenhagen, Denmark.
Zhu, S.C. 2002. Statistical modeling and conceptualization of visual patterns. Preprint of UCLA Statistics Department.
Zhu, S.C., Guo, C.E., Wu, Y.N., and Wang, Y.Z. 2002. What are textons. In Proc. of European Conf on Computer Vision, Copenhagen, Denmark.


Author information

Authors and Affiliations

Departments of Statistics and Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Song-Chun Zhu, Cheng-en Guo, Yizhou Wang & Zijian Xu

About this article

Cite this article

Zhu, SC., Guo, Ce., Wang, Y. et al. What are Textons? Int J Comput Vision 62, 121–143 (2005). https://doi.org/10.1023/B:VISI.0000046592.70770.61


Issue Date: April 2005
DOI: https://doi.org/10.1023/B:VISI.0000046592.70770.61
