Abstract
A natural object of study in texture representation and material classification is the probability density function, in pixel-value space, underlying the set of small patches from a given image. Inspired by the fact that small \(n\times n\) high-contrast patches from natural gray-scale images accumulate with high density around a surface \(\fancyscript{K}\subset {\mathbb {R}}^{n^2}\) with the topology of a Klein bottle (Carlsson et al., International Journal of Computer Vision 76(1):1–12, 2008), we present in this paper a novel framework for estimating and representing distributions around \(\fancyscript{K}\) of patches from texture images. More specifically, we show that most \(n\times n\) patches from a given image can be projected onto \(\fancyscript{K}\), yielding a finite sample \(S\subset \fancyscript{K}\) whose underlying probability density function can be represented in terms of Fourier-like coefficients, which in turn can be estimated from \(S\). We show that image rotation acts as a linear transformation at the level of the estimated coefficients, and use this fact to define a multi-scale rotation-invariant descriptor. We test it by classifying the materials in three popular data sets: the CUReT, UIUCTex and KTH-TIPS texture databases.
Notes
Available at http://www.kyb.tuebingen.mpg.de/?id=227.
Provided one has a “continuous projection” such as the one described in Sect. 3.1.
Preprint available at http://arxiv.org/abs/1112.1993.
Convergence is with respect to the weak-* topology. This result is a consequence of the \(N\)-representation theorem (V.14, p. 143, Reed and Simon (1972)).
References
Aherne, F. J., Thacker, N. A., & Rockett, P. I. (1998). The Bhattacharyya metric as an absolute similarity measure for frequency coded data. Kybernetika, 34(4), 363–368.
Bell, A. J., & Sejnowski, T. J. (1997). The “independent components” of natural scenes are edge filters. Vision Research, 37(23), 3327.
Beyer, K., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “nearest neighbor” meaningful? In Database theory: ICDT’99 (pp. 217–235).
Broadhurst, R. E. (2005). Statistical estimation of histogram variation for texture classification. In Proc. Intl. Workshop on texture analysis and synthesis. (pp. 25–30).
Brodatz, P. (1966). Textures: A photographic album for artists and designers (Vol. 66). New York: Dover.
Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255.
Carlsson, G., Ishkhanov, T., De Silva, V., & Zomorodian, A. (2008). On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1), 1–12.
Crosier, M., & Griffin, L. D. (2010). Using basic image features for texture classification. International Journal of Computer Vision, 88(3), 447–460.
Dana, K. J., Van Ginneken, B., Nayar, S. K., & Koenderink, J. J. (1999). Reflectance and texture of real-world surfaces. ACM Transactions on Graphics (TOG), 18(1), 1–34.
De Silva, V., Morozov, D., & Vejdemo-Johansson, M. (2011). Persistent cohomology and circular coordinates. Discrete and Computational Geometry, 45(4), 737–759.
De Wit, T. D., & Floriani, E. (1998). Estimating probability densities from short samples: A parametric maximum likelihood approach. Physical Review E, 58(4), 5115.
Edelman, A., & Murakami, H. (1995). Polynomial roots from companion matrix eigenvalues. Mathematics of Computation, 64(210), 763–776.
Franzoni, G. (2012). The Klein bottle: Variations on a theme. Notices of the AMS, 59(8), 1094–1099.
Lewis, D. (2005). Feature classes for 1D, 2nd order image structure arise from natural image maximum likelihood statistics. Network, 16(2–3), 301–320.
Griffin, L. D. (2007). The second order local-image-structure solid. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8), 1355–1366.
Harris, C., & Stephens, M. (1988). A combined corner and edge detector. In Proceedings of the Fourth Alvey Vision Conference (pp. 147–151).
Hatcher, A. (2002). Algebraic topology. Cambridge: Cambridge University Press.
Hayman, E., Caputo, B., Fritz, M., & Eklundh, J. O. (2004). On the significance of real-world conditions for material classification. Computer Vision ECCV, 3024, 253–266.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3), 574–591.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1), 215–243.
Jurie, F., & Triggs, B. (2005). Creating efficient codebooks for visual recognition. In Tenth IEEE International Conference on Computer Vision (ICCV 2005) (Vol. 1, pp. 604–610). IEEE.
Koenderink, J. J. (1984). The structure of images. Biological Cybernetics, 50(5), 363–370.
Lazebnik, S., Schmid, C., & Ponce, J. (2005). A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1265–1278.
Lee, A. B., Pedersen, K. S., & Mumford, D. (2003). The nonlinear statistics of high-contrast patches in natural images. International Journal of Computer Vision, 54(1), 83–103.
Leung, T., & Malik, J. (2001). Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1), 29–44.
Moler, C. (1991). Cleve’s corner: Roots-of polynomials, that is. The MathWorks Newsletter, 5(1), 8–9.
Pedersen, K. S., & Lee, A. B. (2002). Toward a full probability model of edges in natural images. In Computer vision: ECCV 2002 (pp. 328–342). Springer.
Reed, M., & Simon, B. (1972). Methods of modern mathematical physics: Functional analysis (Vol. 1). New York: Academic Press.
Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121.
Silverman, B. W. (1986). Density estimation for statistics and data analysis (Vol. 26). London: Chapman & Hall.
van Hateren, J. H., & van der Schaaf, A. (1998). Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society of London. Series B: Biological Sciences, 265(1394), 359–366.
Varma, M., & Ray, D. (2007). Learning the discriminative power-invariance trade-off. In IEEE 11th International Conference on Computer Vision (ICCV 2007) (pp. 1–8). IEEE.
Varma, M., & Zisserman, A. (2004). Unifying statistical texture classification frameworks. Image and Vision Computing, 22(14), 1175–1183.
Varma, M., & Zisserman, A. (2005). A statistical approach to texture classification from single images. International Journal of Computer Vision, 62(1), 61–81.
Varma, M., & Zisserman, A. (2009). A statistical approach to material classification using image patch exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2032–2047.
Watson, G. S. (1969). Density estimation by orthogonal series. The Annals of Mathematical Statistics, 40(4), 1496–1498.
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10, 207–244.
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–238.
Acknowledgments
Jose Perea was partially supported by the National Science Foundation (NSF) through grant DMS 0905823. Gunnar Carlsson was supported by the NSF through grants DMS 0905823 and DMS 096422, by the Air Force Office of Scientific Research through grants FA9550-09-1-0643 and FA9550-09-1-0531, and by the National Institutes of Health through grant I-U54-ca149145-01.
Appendix: Main Results and Proofs
Proof
(Proposition 1)
1.
The Cauchy-Schwarz inequality implies that for every \(\mathbf {v}\in S^1\) and every almost-everywhere differentiable \(I_P\) (not necessarily purely directional) one has
$$\begin{aligned}Q_P(\mathbf {v}) \le \int \int \limits _{[-1 ,1]^2} \Vert \nabla I_P \Vert ^2 dxdy.\end{aligned}$$Since the equality holds when \(I_P(x,y) = g(ax + by)\) and \(\mathbf {v} = \begin{bmatrix}a\\ b\end{bmatrix}\), the result follows.
2.
Let \(\lambda _{max} \ge \lambda _{min} \ge 0\) be the eigenvalues of \(A_P\), and let \(B_P\) be an orthogonal matrix such that
$$\begin{aligned}B_P^{T}A_PB_P = \begin{bmatrix}\lambda _{max}&0 \\ 0&\lambda _{min}\end{bmatrix}.\end{aligned}$$If \(\mathbf {v}\in S^1\) and \( \mathbf {v} = B_P\begin{bmatrix}v_x \\ v_y\end{bmatrix}\), then \(1 = \Vert \mathbf {v}\Vert ^2 = v_x^2 +v_y^2\), \(Q_P(\mathbf {v}) = \lambda _{max} v_x^2 + \lambda _{min}v_y^2\), and therefore
$$\begin{aligned}\max _{\Vert \mathbf {v}\Vert =1}Q_P(\mathbf {v})= \lambda _{max}.\end{aligned}$$Finally, since \(Q_P(\mathbf {v}) = \lambda _{max}\) for \(\mathbf {v}\in S^1\) if and only if \(\mathbf {v} \in E_{max}(A_P)\), the result follows.
3.
If the eigenvalues of \(A_P\) are distinct, then \(E_{max}(A_P)\) is one dimensional and thus intersects \(S^1\) at exactly two antipodal points.\(\square \)
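The argument above can be checked numerically: for a purely directional patch, the eigenvector of the largest eigenvalue of the gradient-outer-product matrix recovers the direction. The following is a minimal sketch, assuming a synthetic patch and illustrative names (`patch`, `A`, `v_max`); it is not the paper's implementation.

```python
import numpy as np

# Synthetic "purely directional" patch I(x, y) = g(a*x + b*y) on [-1, 1]^2.
a, b = 0.6, 0.8                       # unit direction (a, b)
xs, ys = np.meshgrid(np.linspace(-1, 1, 32), np.linspace(-1, 1, 32))
patch = np.sin(3 * (a * xs + b * ys))

# A_P-like matrix: sum over pixels of the outer product of the gradient.
gy, gx = np.gradient(patch)           # numerical partials (axis 0, axis 1)
A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
              [np.sum(gx * gy), np.sum(gy * gy)]])

vals, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
v_max = vecs[:, -1]                   # eigenvector of the largest eigenvalue

# Q_P attains its maximum lambda_max on E_max(A_P) ∩ S^1 ...
q = v_max @ A @ v_max
assert np.isclose(q, vals[-1])
# ... and v_max is parallel, up to sign, to the true direction (a, b).
assert abs(v_max @ np.array([a, b])) > 0.99
```

Since the patch depends only on \(ax + by\), every pixel gradient is proportional to \((a,b)\), so \(A\) is (numerically) rank one and its top eigenvector reproduces the direction up to sign, matching the two antipodal intersection points in part 3.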
Proof
(Proposition 2) Since \(u\) and \(\sqrt{3}u^2\) are orthonormal with respect to \(\langle \cdot , \cdot \rangle _D\), it follows that
which can be found via the usual condition of parallelism, provided \(\varphi (I_P,\alpha ) \ne 0\). \(\square \)
Proof
(Proposition 3) Let \(g\in L^2(T)\) and consider
It follows that \(g^\perp \) is square-integrable on \(T\), and that:
1.
\(g^\perp \left( -z,-\overline{w}\right) = g^\perp (z,w)\) for every \((z,w)\in T\). Hence \(g^\perp \in L^2(K)\) for every \(g\in L^2(T)\).
2.
If \(g_1,g_2\in L^2(T)\) and \(a_1,a_2\in {\mathbb {C}}\) then
$$\begin{aligned} (a_1g_1 + a_2g_2)^\perp = a_1g_1^\perp + a_2 g_2^\perp \end{aligned}$$
3.
\(g^\perp = g\) for every \(g\in L^2(K)\).
We claim that \(g^\perp \) is the orthogonal projection of \(g\in L^2(T)\) onto \(L^2(K)\); it suffices to check that \(g -g^\perp \) is perpendicular to every \(h\in L^2(K)\). To this end, let us introduce the notation
By writing the inner product of \(L^2(T)\) in polar coordinates, using the substitution \((\alpha ,\theta )\mapsto (\alpha + \pi ,\pi - \theta )\), and the fact that \(h\) satisfies Eq. 3.3, one obtains that
Therefore \( 2\langle g - g^\perp , h\rangle _T = \langle g, h\rangle _T - \langle g^*, h\rangle _T = 0 \) and we get the result. \(\square \)
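A finite-dimensional analogue makes the three properties above concrete: averaging a function with its pullback under an isometric involution is an orthogonal projection onto the fixed subspace. This is a minimal sketch under illustrative assumptions (an index-reversal involution on a small grid standing in for the paper's involution on \(T\)), not the paper's construction.

```python
import numpy as np

# Discretized symmetrization: P = (I + S)/2 for a permutation matrix S
# with S @ S = I (here, index reversal as a stand-in involution).
n = 8
S = np.eye(n)[::-1]                 # involution: S @ S = I, S symmetric
P = 0.5 * (np.eye(n) + S)           # the averaging operator g -> g^perp

# P is idempotent and self-adjoint, hence an orthogonal projection:
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)

# So g - P g is orthogonal to any h in the fixed subspace (h = P h):
rng = np.random.default_rng(1)
g = rng.standard_normal(n)
h = P @ rng.standard_normal(n)
assert np.isclose((g - P @ g) @ h, 0)
```

The last assertion mirrors the proof's conclusion \(\langle g - g^\perp , h\rangle _T = 0\) for every \(h\) satisfying the symmetry condition.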
Proof
(Theorem 1) By continuity, \(\varPi \) takes spanning sets to spanning sets, which proves the first part of the theorem. Now, if we consider the decomposition
where \(L^2(K)^\perp \) denotes the orthogonal complement of \(L^2(K)\) in \(L^2(T)\), and \(\fancyscript{B}\) is an orthonormal basis for \(L^2(K)\), then for any orthonormal basis \(\fancyscript{B}^\prime \) of \(L^2(K)^\perp \) the union \(\fancyscript{B}\cup \fancyscript{B}^\prime \) is an orthonormal basis for \(L^2(T)\). The key observation is that since \(\ker (\varPi ) = L^2(K)^\perp \) and \(\varPi \) restricted to \(L^2(K)\) is the identity, we have \(\varPi (\fancyscript{B}^\prime ) = \{\mathbf {0}\}\) and therefore \(\varPi (\fancyscript{B}\cup \fancyscript{B}^\prime ) = \fancyscript{B}\cup \{\mathbf {0}\}\). It follows that the only subset of \(\fancyscript{B}\cup \{\mathbf {0}\}\) which can be a basis for \(L^2(K)\) is \(\fancyscript{B}\), and that \(\fancyscript{B}\) is invariant under Gram-Schmidt orthonormalization. \(\square \)
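The bookkeeping in this proof can be sketched in finite dimensions: an orthogonal projection fixes a basis of its range and annihilates a basis of its kernel. The 4-dimensional ambient space and 2-dimensional subspace below are illustrative stand-ins for \(L^2(T)\) and \(L^2(K)\).

```python
import numpy as np

# Build orthonormal bases B of a subspace V and B_perp of its complement.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B, B_perp = Q[:, :2], Q[:, 2:]

Pi = B @ B.T                         # orthogonal projection onto V

# Pi fixes B and sends B_perp to zero, so Pi(B ∪ B_perp) = B ∪ {0}:
assert np.allclose(Pi @ B, B)
assert np.allclose(Pi @ B_perp, 0)
```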
Proof
(Theorem 3) If \(f,g\in L^2(K,{\mathbb {R}})\), then a change of coordinates shows that \(\langle f^\tau , g \rangle _K = \langle f, g^{-\tau }\rangle _K\), and therefore
Similarly
\(\square \)
Corollary 4
Let \(T_\tau :\ell ^2({\mathbb {R}}) \longrightarrow \ell ^2({\mathbb {R}})\) be as in Theorem 3. Then
Proof
(Theorem 4) From the description of \(T_\tau \) as a block diagonal matrix whose blocks are rotation matrices (Theorem 3), it follows that
where \(x_n\) and \(y_n\) depend solely on \(K\fancyscript{F}_w(f)\) and \(K\fancyscript{F}_w(g)\). By taking the derivative with respect to \(\tau \), we get that if \(\tau ^*\) is a minimizer for \(\varPsi (\tau )\) then we must have
which is equivalent to having
Let \(q(z)= \sum \limits _{n=1}^w n(y_n + ix_n)z^n,\; \bar{q}(z)= \sum \limits _{n=1}^w n(y_n - ix_n)z^n\) and
It follows that \(p(z)\) is a complex polynomial of degree at most \(2w\), so that
\(\square \)
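The computational step behind this proof can be sketched numerically: the critical points of a real trigonometric polynomial \(\varPsi (\tau ) = \sum _n \left( x_n\cos (n\tau ) + y_n\sin (n\tau )\right) \) are found by substituting \(z = e^{i\tau }\) into \(\varPsi ^\prime \), clearing negative powers, and taking the roots of the resulting degree-\(\le 2w\) polynomial via companion-matrix eigenvalues (the method of Edelman and Murakami 1995, as implemented by `numpy.roots`). The coefficients below are illustrative, not estimated from data.

```python
import numpy as np

rng = np.random.default_rng(3)
w = 3
x, y = rng.standard_normal(w), rng.standard_normal(w)

def psi(t):
    n = np.arange(1, w + 1)
    return np.sum(x * np.cos(n * t) + y * np.sin(n * t))

# Psi'(t) = sum_n n(-x_n sin(nt) + y_n cos(nt)); with z = e^{it} and after
# multiplying by z^w, the critical-point equation becomes
#   p(z) = sum_n (n/2) [(y_n + i x_n) z^{w+n} + (y_n - i x_n) z^{w-n}] = 0.
coeffs = np.zeros(2 * w + 1, dtype=complex)       # coeffs[k] multiplies z^k
for n in range(1, w + 1):
    coeffs[w + n] += 0.5 * n * (y[n - 1] + 1j * x[n - 1])
    coeffs[w - n] += 0.5 * n * (y[n - 1] - 1j * x[n - 1])

roots = np.roots(coeffs[::-1])                    # highest degree first
crit = np.angle(roots[np.isclose(np.abs(roots), 1.0)])  # unit-circle roots

# The global minimizer of Psi lies among the critical points:
t_star = min(crit, key=psi)
assert all(psi(t_star) <= psi(t) + 1e-6 for t in np.linspace(0, 2 * np.pi, 1000))
```

Only roots on the unit circle correspond to real angles \(\tau \); off-circle roots come in conjugate-reciprocal pairs and are discarded.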
Corollary 5
The vector \(T_\tau \left( \widehat{K\fancyscript{F}}(f)\right) \) is a componentwise unbiased estimator for \(K\fancyscript{F}(f^\tau )\), which converges almost surely as the sample size tends to infinity.
Proof
(Proposition 4) Let \(\widehat{\mathbf {v}}^\tau \) be the vector with entries \(\widehat{d}_{1,1}^\tau \) and \(\widehat{e}_{1,1}^\tau \) for \(\widehat{K\fancyscript{F}}(f^\tau ,S^\tau )\). It follows from Theorem 3 and Corollaries 4 and 5 that
from which we get \(\sigma \left( f^\tau \right) \equiv \sigma (f) - \tau \;\; (\mathrm {mod}\;\;2\pi )\), and
as claimed. \(\square \)
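The mechanism behind this invariance can be sketched in two dimensions: rotation by \(\tau \) acts on a pair of degree-one coefficients as a planar rotation, so the angle \(\sigma = \mathrm {atan2}(e,d)\) shifts by exactly \(-\tau \), and quantities aligned by \(\sigma \) become rotation invariant. The coefficient values below are illustrative stand-ins for \((\widehat{d}_{1,1}, \widehat{e}_{1,1})\).

```python
import numpy as np

def rotate(v, tau):
    """Planar rotation by angle tau."""
    c, s = np.cos(tau), np.sin(tau)
    return np.array([[c, -s], [s, c]]) @ v

v = np.array([0.7, -0.4])            # stand-in coefficient pair (d, e)
tau = 1.234

v_tau = rotate(v, -tau)              # action of rotating the image by tau
sigma = np.arctan2(v[1], v[0])
sigma_tau = np.arctan2(v_tau[1], v_tau[0])

# sigma(f^tau) = sigma(f) - tau (mod 2*pi):
assert abs(np.exp(1j * (sigma_tau - sigma + tau)) - 1) < 1e-12
```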
Perea, J.A., Carlsson, G. A Klein-Bottle-Based Dictionary for Texture Representation. Int J Comput Vis 107, 75–97 (2014). https://doi.org/10.1007/s11263-013-0676-2