Journal of Mathematical Imaging and Vision, Volume 47, Issue 1, pp 79–92

Adaptive Matrices and Filters for Color Texture Classification

Authors

  • I. Giotis
    • Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen
  • Kerstin Bunte
    • CITEC Center of Excellence—Cognitive Interaction Technology, Bielefeld University
  • Nicolai Petkov
    • Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen
  • Michael Biehl
    • Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen
Open Access Article

DOI: 10.1007/s10851-012-0356-9

Cite this article as:
Giotis, I., Bunte, K., Petkov, N. et al. J Math Imaging Vis (2013) 47: 79. doi:10.1007/s10851-012-0356-9

Abstract

In this paper we introduce an integrative approach towards color texture classification and recognition using a supervised learning framework. Our approach is based on Generalized Learning Vector Quantization (GLVQ), extended by an adaptive distance measure, which is defined in the Fourier domain, and adaptive filter kernels based on Gabor filters. We evaluate the proposed technique on two sets of color texture images and compare results with those other methods achieve. The features and filter kernels learned by GLVQ improve classification accuracy and they are able to generalize much better for data previously unknown to the system.

Keywords

Adaptive metric Adaptive filters Classification Color texture analysis Gabor filters Learning Vector Quantization

1 Introduction

Texture analysis and classification are topics of interest due to their numerous possible applications, such as medical imaging, industrial quality control and remote sensing. A wide variety of methods for texture analysis has been developed, such as co-occurrence matrices [11], Markov random fields [34], autocorrelation methods [24, 29], Gabor filtering [6, 9, 16, 18, 21, 32] and wavelet decomposition [33]. These methods mostly concern intensity images and, since color information is a vector quantity, an adaptation to the color domain is not always straightforward. Regarding color texture, the possible approaches can be divided into three categories [25], called parallel, sequential and integrative. In the parallel approach [22, 27] textural features are extracted solely from the luminance plane of an image and are used together with color features. The sequential approach [12] involves a quantization of the color space and subsequently the extraction of statistical features from the indexed images.

The integrative approach [5, 14, 15, 20, 25] is the most popular one and it describes color texture by combining color information with the spatial relationships of image regions within each color channel and between different color channels. The simplest integrative approach would only consist of a gray scale transformation of the input image but in many cases this has been proven insufficient. A very common advance of the integrative approach is based on the opponent-process theory of human color vision that has its roots in neuroscience. Ewald Hering [13] first noted that there are some color combinations that humans are not able to see, such as reddish-green or yellowish-blue, since these colors contrast each other strongly. Hence, he proposed that such color combinations can be the components of one vision mechanism that oppose each other through a process of excitatory and inhibitory responses. A popular application of this theory in computer vision is the Gaussian color model [8].

In this contribution we introduce a novel integrative approach towards color texture classification and recognition based on adaptive filters through supervised learning. The kernels we use are initialized as two-dimensional Gabor filters. A 2D Gabor filter acts as a local band-pass filter and can achieve optimal joint localization both in the spatial and frequency domains [4]. Given a set of labeled color images (RGB) for training and a bank of 2D Gabor filter kernels the goal here is to learn a transformation of a color image to a single channel (intensity) image and an optimal adaptation of the kernels such that the responses of the transformed images when filtered with the optimized kernels will yield the best possible classification.

Many signal processing techniques are based on insights or empirical observations from neurophysiology or optical physics. The proposed novel approach incorporates data-driven adaptation of the system, i.e. example-based learning. Furthermore, the “family” of filters used in our approach can be substituted, depending on the data domain and the task at hand. As an example we explore in this paper the use of rotation and scale invariant descriptors based on Gabor filter responses [10]. We demonstrate that our approach yields very good generalization ability.

The paper is structured as follows: In Sects. 2 and 3 we present overviews of the existing approaches for color texture analysis and the Learning Vector Quantization algorithm respectively. In Sect. 4 the Color Image Analysis LVQ is explained in detail and Sect. 5 presents experimental results. Finally, in Sect. 6 we draw conclusions.

2 Overview of Existing Approaches

In texture analysis, Gabor filter responses and Local Binary Patterns are two very popular types of descriptor that have been extended to color texture via integrative approaches that use the opponent color model.

Jain et al. [15] proposed an approach that extends the use of features extracted from Gabor filter responses to color texture classification, motivated by mechanisms of human vision. For this purpose they compute features from each color channel independently (unichrome features), as well as features that capture the spatial correlation between spectral bands (opponent features). Let \(h_{imn}\) be the response of the i-th color channel of a given image when filtered with a Gabor kernel with scale m and orientation n. The unichrome features are defined as the square root of the energy of the Gabor responses:
$$ \mu_{imn} = \sqrt{ \biggl( \sum_{x,y} h^2_{imn}(x,y) \biggr)}. $$
The opponent features are based on the difference of normalized energies between different color channels and scales in the same orientation. The difference of normalized energies is:
$$ d_{ijm'mn} = \biggl( \frac{h_{imn}}{\mu_{imn}} - \frac {h_{jm'n}}{\mu_{jm'n}} \biggr) $$
thus defining the opponent feature:
$$ \psi_{ijm'mn} = \sqrt{ \biggl( \sum_{x,y} d^2_{ijm'mn}(x,y) \biggr)} . $$
All unichrome and opponent features are concatenated into a single feature vector that is used as a descriptor for the given image. In the following we refer to this technique as Opponent Color Features (OCF).
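The unichrome and opponent features defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [15]: the function name, the array layout of the responses and the small constant guarding the division are assumptions, and all pairs of distinct channels and all scale pairs are enumerated here, whereas the selection used in [15] may be narrower.

```python
import numpy as np

def opponent_color_features(h):
    """h: real array of shape (channels, scales, orientations, H, W) holding
    Gabor responses h_{imn}(x, y). Returns (unichrome, opponent) features."""
    n_ch, n_sc, n_or = h.shape[:3]
    # Unichrome features: mu_{imn} = sqrt(sum_{x,y} h_{imn}^2)
    mu = np.sqrt((h ** 2).sum(axis=(-2, -1)))
    # Normalized responses h_{imn} / mu_{imn} (guarded against division by zero)
    h_norm = h / np.maximum(mu, 1e-12)[..., None, None]
    opponent = {}
    for i in range(n_ch):
        for j in range(n_ch):
            if i == j:
                continue
            for m in range(n_sc):
                for m2 in range(n_sc):
                    for n in range(n_or):
                        # d_{ijm'mn} and psi_{ijm'mn}, same orientation n
                        d = h_norm[i, m, n] - h_norm[j, m2, n]
                        opponent[(i, j, m2, m, n)] = np.sqrt((d ** 2).sum())
    return mu, opponent
```

Concatenating `mu.ravel()` with the values of `opponent` would give one feature vector per image, in the spirit of the OCF descriptor.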
Local Binary Patterns (LBP) are based on the idea that texture can be described by local spatial patterns and gray scale contrast. The original LBP operator [23] creates labels for the image pixels by thresholding their 3×3 neighborhood with the center value. The pixels with lower intensities than the center pixel are labeled with 0, whereas those with equal or higher intensity values are labeled with 1. The labels are read clockwise as a binary number. This process is repeated for every pixel and the histogram of the 256 possible binary numbers is then used as a texture descriptor. The LBP operator was further extended to neighborhoods of different sizes [24], using circular neighborhoods and bilinearly interpolated values at non-integer pixel coordinates. In the following, the notation (P,R) is used to denote pixel neighborhoods formed by P sampling points on a circle of radius R. Another extension to the original operator is the definition of the so-called uniform patterns, which can be used to reduce the length of the feature vector and implement a simple rotation-invariant descriptor. This extension was inspired by the fact that some binary patterns occur more commonly in texture images than others. A local binary pattern is called uniform if it contains at most two transitions from 0 to 1 or vice versa when it is traversed circularly. Ojala et al. [24] noticed in their experiments that uniform patterns account for a little less than 90 % of all patterns when using a (8,1) neighborhood and for around 70 % with a (16,2) neighborhood. After the LBP labeled image \(f_l(x,y)\) has been obtained, the descriptor is defined as:
$$ H_i=\sum_{x,y} I \bigl \{f_l(x,y)=i \bigr\}, \quad i=0,\ldots,n-1 . $$
The Color LBP extension [28] is based on the ability to take the local threshold (neighborhood center) from n different color channels. The neighborhood to be thresholded can also be taken from these channels, which makes up a total of \(n^2\) different combinations. The \(n^2\) histogram descriptors are then concatenated into a single feature vector.
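As an illustration of the original 3×3 operator described above, the following sketch computes the 256-bin LBP histogram of a gray scale image. It is a simplified example only: the function name is hypothetical, border pixels are skipped, and the starting neighbor and bit order of the clockwise reading are conventions assumed here. Uniform patterns, circular (P,R) neighborhoods and the color extension are not covered.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 3x3 LBP: threshold the 8 neighbours of every interior pixel
    against the centre value and histogram the resulting 8-bit codes."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx) in clockwise order, starting at the top-left
    # pixel; the starting point and bit order are an assumed convention.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy: img.shape[0] - 1 + dy,
                        1 + dx: img.shape[1] - 1 + dx]
        # Label 1 if the neighbour is >= the centre, 0 otherwise
        codes |= (neighbour >= center).astype(np.int64) << bit
    # H_i = number of pixels whose label equals i, i = 0..255
    return np.bincount(codes.ravel(), minlength=256)
```

For a Color LBP descriptor one would, under the same assumptions, repeat this with the center taken from one channel and the neighbors from another, for all \(n^2\) channel pairs.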

3 Review of the (Generalized Matrix) Learning Vector Quantization

Learning Vector Quantization (LVQ) is a supervised prototype-based classification method [17]. The training is based on data points \(\mathbf{x}^i\in\mathbb{R}^D\) and their corresponding label information \(y^i\in\{1,\ldots,C\}\), where D denotes the dimension of the feature vectors and C the number of classes. A set of prototypes is characterized by their location in the feature space \(\mathbf{w}^i\in\mathbb{R}^D\) and the respective class label \(c(\mathbf{w}^i)\in\{1,\ldots,C\}\). Classification is implemented as a winner-takes-all scheme. For this purpose, a possibly parameterized dissimilarity measure \(d^\varOmega\) is defined, where \(\varOmega\) specifies the metric parameters which can be adapted during training. Given \(d^\varOmega(\mathbf{x},\mathbf{w})\), any data point \(\mathbf{x}\) is assigned to the class label \(c(\mathbf{w}^i)\) of the closest prototype \(\mathbf{w}^i\) with \(d^\varOmega(\mathbf{x},\mathbf{w}^i)\le d^\varOmega(\mathbf{x},\mathbf{w}^j)\) for all \(j\ne i\). The position of the closest (“winner”) prototype in the feature space is then adapted according to a learning rule, i.e. \(\mathbf{w}^i\) is moved closer to \(\mathbf{x}\) if the data point is correctly classified and moved away from \(\mathbf{x}\) otherwise. The number of prototypes used to represent a class can be chosen by the user according to the nature of the data and the task at hand. The typical number of prototypes assigned to each class varies from 1 to 5.
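The winner-takes-all classification and the basic attract/repel update described above can be sketched as follows. This is a minimal illustration under stated assumptions: the squared Euclidean distance stands in for the generic dissimilarity, and the function names and the learning rate `lr` are hypothetical.

```python
import numpy as np

def classify(x, prototypes, proto_labels):
    """Winner-takes-all: return (index, label) of the closest prototype."""
    d = ((prototypes - x) ** 2).sum(axis=1)   # squared Euclidean distance
    winner = int(np.argmin(d))
    return winner, proto_labels[winner]

def lvq1_step(x, y, prototypes, proto_labels, lr=0.1):
    """Move the winner towards x if its label matches y, away otherwise.
    The prototype array is updated in place."""
    winner, label = classify(x, prototypes, proto_labels)
    sign = 1.0 if label == y else -1.0
    prototypes[winner] += sign * lr * (x - prototypes[winner])
    return winner
```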

A training scheme called Generalized LVQ (GLVQ) [30] is derived as a minimization of the cost function:
$$ f_c\bigl(d^\varOmega,J,K\bigr) = \sum _i \varPhi \biggl(\frac{d^\varOmega(\mathbf {x}^i,\mathbf {w}^J)-d^\varOmega(\mathbf {x}^i,\mathbf {w}^K)}{d^\varOmega(\mathbf {x}^i,\mathbf {w}^J) + d^\varOmega (\mathbf {x}^i,\mathbf {w}^K)} \biggr) $$
(1)
where the quantities
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ2_HTML.gif
(2)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ3_HTML.gif
(3)
correspond to the distances of the feature vector \(\mathbf{x}^i\) from the respective closest correct prototype \(\mathbf{w}^J\) and the closest wrong prototype \(\mathbf{w}^K\). \(\varPhi\) must be a monotonic function and throughout the following the identity \(\varPhi(x)=x\) is used.
Generalized Matrix Learning Vector Quantization (GMLVQ) is an extension of the original algorithm with adaptive dissimilarity measure based on the quadratic form:
$$ d^\varOmega(\mathbf {x},\mathbf {w}) = (\mathbf {x}-\mathbf {w})^\top \varOmega^\top\varOmega(\mathbf {x}-\mathbf {w}) $$
(4)
The matrix \(\varLambda=\varOmega^\top\varOmega\) is assumed to be positive (semi-)definite. Hence the measure corresponds to a (squared) Euclidean distance in an appropriately transformed space
$$ d^\varOmega(\mathbf {x},\mathbf {w}) = \bigl[\varOmega(\mathbf {x}-\mathbf {w}) \bigr]^2 $$
(5)
Specific restrictions may be imposed on the transformation \(\varOmega\in\mathbb{R}^{M\times D}\) with \(M\le D\) without loss of generality. For \(M<D\), \(\varOmega\) transforms the D-dimensional data into a lower M-dimensional space. This variant is referred to as Limited Rank Matrix LVQ (LiRaM LVQ) and explained in [1, 2]. The original algorithm follows a stochastic gradient descent for the optimization of the cost function (Eq. (1)). The gradients are evaluated with respect to the contribution of single instances \(\mathbf{x}^i\), which are presented sequentially and in random order during training. The algorithm has been introduced and discussed in [31] and will be modified in the subsequent sections.
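To make Eqs. (1), (4) and (5) concrete, the following sketch evaluates the adaptive distance and the contribution of a single sample to the GLVQ cost function with the identity for \(\varPhi\). The function names are hypothetical and the code only evaluates the cost; it does not perform the gradient updates.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """d^Omega(x, w) = (x - w)^T Omega^T Omega (x - w) = ||Omega (x - w)||^2."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def glvq_cost_term(x, y, prototypes, proto_labels, omega):
    """Relative distance difference (d_J - d_K) / (d_J + d_K) for one sample,
    with Phi taken as the identity."""
    d = np.array([gmlvq_distance(x, w, omega) for w in prototypes])
    correct = proto_labels == y
    d_J = d[correct].min()    # closest prototype with the correct label
    d_K = d[~correct].min()   # closest prototype with a wrong label
    return (d_J - d_K) / (d_J + d_K)
```

The term lies in [-1, 1] and is negative exactly when the sample is classified correctly. Passing a rectangular `omega` with fewer rows than columns corresponds to the limited rank variant.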

4 Color Image Analysis Learning Vector Quantization

In this contribution we present an extension of the GMLVQ concept that is especially designed for color texture analysis. We use the same cost function, Eq. (1), as in the original GMLVQ algorithm and follow a stochastic gradient descent procedure where the samples \(\mathbf{x}^i\) of the training set are sequentially presented and the parameters accordingly updated. We will refer to this algorithm as Color Image Analysis LVQ (CIA-LVQ) and to one sweep through the training set as one epoch E.

Let D be a data set of color images of known size (p×p) that belong to C different classes, and let G be a bank of filter kernels, initialized as a sum of Gabor filters with different scales and orientations. The goal is to learn one or more matrices \(\varOmega_k\) that transform the color images into single-channel “intensity” images, a set of optimized kernels \(\widehat{G}_{k}\) and a set of prototypes \(\mathbf{w}^k\) such that the filter responses of the transformed images will yield the best possible classification. In addition, we use an adaptation of the learning rates that allows the system to be less dependent on their initial values.

We use for both the filter kernels and the images their representation in the Fourier domain. The image data are vectorized, thus resulting in a data set of complex vectors \(\mathbf{x}^i\in\mathbb{C}^N\), where \(N=p\cdot p\cdot 3\), with p denoting the image patch size. These vectors are transformed by \(\varOmega_k\in\mathbb{C}^{M\times N}\), where \(M=p\cdot p\). The transformation \(\varOmega_k\) can be considered as the equivalent of a color to gray scale image transformation, with k referring to the index of a prototype \(\mathbf{w}^k\) or the index of its class label for class-wise transformations. Subsequently, the transformed image data are filtered with every kernel \(G_l\in G\) and the responses are summed over l. The filter kernels are also represented as complex vectors \(G_l\in\mathbb{C}^M\). The general form of the descriptor of an individual image is denoted as:
$$ \mathbf {r}_k^i = \mathbf {x}^i\varOmega_k^\top *\sum _l G_l $$
(6)
where ∗ denotes the convolution. Each such descriptor is associated with a label \(y^i\in\{1,2,\ldots,C\}\).
Note that Eq. (6) describes only one convolution with a sum of kernels \(\widehat{G} = \sum_{l} G_{l}\). This is a computational simplification rather than a conceptual change: since convolution is distributive over addition, Eq. (6) yields a result identical to filtering with every kernel separately and summing the responses, and it can be written as:
$$ \mathbf {r}_k^i = \mathbf {x}^i\varOmega_k^\top *\widehat{G} $$
(7)
This obviously offers a gain in processing time, especially for larger filter banks and also simplifies the optimization process. In the following we optimize the sum of kernels \(\widehat{G}\).
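The equivalence of filtering with every kernel and filtering once with the summed kernel can be checked numerically. The sketch below is an illustration under assumptions: random arrays stand in for a transformed patch and for the filter bank, and the filtering is realized as a circular convolution via the convolution theorem, which may differ from the boundary handling used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 15                                    # patch size used in the experiments
image = rng.normal(size=(p, p))           # stand-in for a single-channel patch
kernels = rng.normal(size=(16, p, p))     # stand-in for a bank of 16 kernels

def circular_convolve(a, b):
    """Circular 2D convolution computed as a product in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

# Filter with every kernel, then sum the responses.
sum_of_responses = sum(circular_convolve(image, g) for g in kernels)
# Sum the kernels first, then filter once (one convolution instead of 16).
response_to_sum = circular_convolve(image, kernels.sum(axis=0))

# The two results agree up to floating point round-off.
max_deviation = np.abs(sum_of_responses - response_to_sum).max()
```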
We define the dissimilarity measure as:
$$ d^{\varOmega_k}_{\widehat{G}_k}\bigl(\mathbf {x}^i, \mathbf {w}^k\bigr) = \bigl\lVert\bigl|\mathbf {r}_k^i\bigr|^2 - \bigl|\mathbf {w}^k\bigr|^2\bigr\rVert^2 $$
(8)
which corresponds to the difference of magnitudes between a prototype and an image descriptor. In this fashion we ensure that two images containing the same texture pattern are considered similar, independent of the position within the image where this pattern occurs.
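The position independence of Eq. (8) can be illustrated with a small numerical experiment. The sketch makes several assumptions that are not taken from the text: the filtering in the Fourier representation is written as an element-wise product, \(\varOmega\) is a fixed channel-averaging matrix, the kernel is random, and the shift is circular, for which the invariance of the Fourier magnitudes holds exactly. All names are hypothetical.

```python
import numpy as np

def descriptor(x, omega, g_hat):
    """r = (Omega x) filtered with g_hat. All quantities are Fourier-domain
    vectors, so the filtering is assumed to be an element-wise product."""
    return (omega @ x) * g_hat

def dissimilarity(r, w):
    """Eq. (8): squared norm of the difference of squared magnitudes."""
    return float(np.sum((np.abs(r) ** 2 - np.abs(w) ** 2) ** 2))

def to_fourier_vector(rgb):
    """FFT of each colour channel, stacked into one complex vector (3*p*p)."""
    return np.concatenate([np.fft.fft2(rgb[..., c]).ravel() for c in range(3)])

p = 8
rng = np.random.default_rng(1)
patch = rng.random((p, p, 3))                          # hypothetical RGB patch
shifted = np.roll(patch, shift=(2, 3), axis=(0, 1))    # same texture, shifted

omega = np.hstack([np.eye(p * p)] * 3) / 3.0           # assumed colour averaging
g_hat = np.fft.fft2(rng.normal(size=(p, p))).ravel()   # stand-in summed kernel

r = descriptor(to_fourier_vector(patch), omega, g_hat)
r_shifted = descriptor(to_fourier_vector(shifted), omega, g_hat)
d_shift = dissimilarity(r_shifted, r)                  # numerically zero
```

A shift only changes the phases of the Fourier coefficients, so the magnitudes entering the dissimilarity are unaffected.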

4.1 Explicit Form of the Learning Rules

The learning rules of CIA-LVQ can be derived from the dissimilarity measure as presented in Eq. (8) by taking the derivatives with respect to the parameters \(\mathbf{w}^k\), \(\varOmega_k\) and \({\widehat{G}_{k}}\). The parameter updates read as follows:
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ9_HTML.gif
(9)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ10_HTML.gif
(10)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ11_HTML.gif
(11)
where
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ12_HTML.gif
(12)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ13_HTML.gif
(13)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ14_HTML.gif
(14)
In Eqs. (9)–(14), \(L\in\{J,K\}\), and α, ϵ and η are the learning rates for the prototypes, the transformation matrix and the kernel used for filtering, respectively.
The derivatives with respect to the closest correct prototype \(\mathbf{w}^J\) and the closest wrong prototype \(\mathbf{w}^K\), together with the corresponding matrices \(\varOmega_J\), \(\varOmega_K\) and the filter kernels \({\widehat{G}}_{J}\), \({\widehat{G}}_{K}\) for the given training data point \(\mathbf{x}^i\), read:
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ15_HTML.gif
(15)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ16_HTML.gif
(16)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ17_HTML.gif
(17)
with denoting the complex conjugate and
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ18_HTML.gif
(18)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ19_HTML.gif
(19)
Note that, since we are working with complex values, we have to take all derivatives with respect to the real and imaginary parts separately.

4.2 Adaptation of the Learning Rates

Steepest descent methods rely upon the choice of a suitable magnitude for the update step (learning rate). Very small steps usually only slow down convergence, whereas very large steps might result in oscillatory or divergent behavior. In the case of CIA-LVQ the update steps are denoted as α, ϵ and η and the issue of choosing their values is addressed by considering way-point averages over a number of latest iteration steps together with an efficient step size adaptation. This technique is discussed in [26] for normalized gradients, but in CIA-LVQ we use its basic principles without the normalization.

The general form of the update of a parameter \(\mathbf{x}\) is an iterative process with an initial learning rate value \(\psi_0\) and an initial parameter value \(\mathbf{x}_0\). At every iteration step the cost function \(f_c(\mathbf{x}_j)\) is computed. At first we perform k>1 unaltered gradient steps as follows:
$$ \mathbf {x}_{j+1} = \mathbf {x}_j - \psi_j\Delta \mathbf {x}_j $$
(20)
for j=0,1,…,k−1 with \(\psi_j=\psi_0\). Subsequently, apart from the current gradient step \(\tilde{\mathbf {x}}_{t+1}\) we also compute the way-point average of the previous k steps:
$$ \hat{\mathbf {x}}_{t+1} = \frac{1}{k}\sum _{i=0}^{k-1} \mathbf {x}_{t-i} $$
(21)
We determine the new position of the parameter \(\mathbf{x}_{t+1}\) and the new step size \(\psi_{t+1}\) as:
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ22_HTML.gif
(22)
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Equ23_HTML.gif
(23)
As long as a simple gradient descent step yields a position for the parameter x that results in lower costs than the average of the k latest positions of x, the iterative process remains unaltered. On the other hand, \(f_{c}(\tilde{\mathbf {x}}_{t+1}) > f_{c}(\hat{\mathbf {x}}_{t+1})\) indicates that the step size is too large and should be reduced by a factor β.
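The step size adaptation can be sketched for a generic differentiable cost as follows. This is an assumption-laden illustration, not a transcription of Eqs. (22) and (23): in particular, falling back to the way-point average when the gradient step is rejected, the condition 0<β<1, and all names and default values are assumptions made here.

```python
import numpy as np

def waypoint_descent(cost, grad, x0, psi0=0.1, k=3, beta=0.5, steps=50):
    """Gradient descent with way-point averaging (unnormalized gradients)."""
    history = [np.asarray(x0, dtype=float)]
    psi = psi0
    # k plain gradient steps with the initial step size, as in Eq. (20)
    for _ in range(k):
        history.append(history[-1] - psi * grad(history[-1]))
    for _ in range(steps):
        x_t = history[-1]
        x_tilde = x_t - psi * grad(x_t)          # candidate gradient step
        x_hat = np.mean(history[-k:], axis=0)    # way-point average, Eq. (21)
        if cost(x_tilde) <= cost(x_hat):
            x_next = x_tilde                     # accept, step size unchanged
        else:
            x_next = x_hat                       # assumed: use the average
            psi *= beta                          # and reduce the step size
        history.append(x_next)
    return history[-1], psi
```

For the one-dimensional cost x² with an initial step size of 1.1, plain gradient descent oscillates with growing amplitude, whereas this scheme reduces the step size once and then converges.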

In the next section we experiment with the algorithm and show its use in practice.

5 Experiments

In order to evaluate the usefulness of the proposed algorithm, we perform classification on patches of pictures taken from the VisTex [3] and the KTH-TIPS [7] databases. From the VisTex database we use 29 color images with size 128×128 pixels from the groups Bark, Brick, Fabric and Food. The KTH-TIPS set is used in its original form and consists of 810 color images with size 200×200 pixels from 10 different classes: Sandpaper, Aluminium Foil, Sponge, Styrofoam, Corduroy, Linen, Brown Bread, Cotton, Orange Peel and Cracker. Although in texture classification literature every image is often considered as a different class, here we distinguish into four and ten different classes respectively, which are equivalent to the conceptual groups that the images belong to. Despite its increased difficulty, this classification task allows us to better demonstrate the ability of CIA-LVQ to describe general characteristics of real-world texture patterns.

We split both data sets in two subsets. One subset is used for training whereas the other is never seen during training and we use it for evaluation. Figures 1 and 2 depict the training and evaluation images from the VisTex database. Figures 3 and 4 depict examples of training and evaluation images respectively from the KTH-TIPS database.
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig1_HTML.gif
Fig. 1

Images, used to provide patches for training and test (VisTex)

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig2_HTML.gif
Fig. 2

Images, used to provide patches for evaluation (VisTex)

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig3_HTML.gif
Fig. 3

Images, used to provide patches for training and test (KTH-TIPS)

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig4_HTML.gif
Fig. 4

Images, used to provide patches for evaluation (KTH-TIPS)

For our experiments we draw 15×15 patches randomly from each image. The training subsets of images are further divided into training and test sets of patches. The VisTex training subset consists of 200 patches per image. We use 150 patches per image (2400 data points) for training and test the performance of CIA-LVQ on the remaining 50 patches per image (800 data points). With respect to the KTH-TIPS training subset we draw 9 patches per image and use 6 for training (3240 data points) and the remaining 3 (1620 data points) for testing. The test sets may contain patches which partially overlap with those used for training. Therefore we use the images in Figs. 2 and 4 in order to create evaluation sets that have never been seen in the training process and thus better demonstrate the generalization ability of the proposed approach. The evaluation sets consist of 50 and 6 randomly drawn patches per image for VisTex and KTH-TIPS respectively.

A note is due here to the nature of the filters used for initialization. A 2D Gabor filter is defined as a Gaussian kernel function modulated by a sinusoidal plane wave. All filter kernels can be generated from one basic wavelet by dilation and rotation. In these experiments we initialize the adaptive filter banks as follows: Every bank consists of 16 Gabor filters of bandwidth equal to 1 at eight orientations θ=0, 22.5, 45, 67.5, 90, 112.5, 135 and 157.5 degrees and two scales (wavelengths) varying by one octave \(\lambda= \{ 5,5\sqrt{2}\}\). These scales ensure that the Gabor function yields an adequate number of visible parallel excitatory and inhibitory stripe zones. Dependent on the patch size and the nature of the data at hand different scales might be more suitable. We set the phase offset ϕ=0 and the aspect ratio γ=1 for all filters. In this way we create center-on symmetric filters with circular support.
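A filter bank with the stated parameters could be generated as sketched below. The text does not give the explicit Gabor formula, so the parametrization used here (a cosine carrier under a Gaussian envelope, with the envelope width derived from the bandwidth in octaves) is a commonly used one and an assumption, as are the function name and the kernel support of 15×15 pixels chosen to match the patch size.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta_deg, phase=0.0, gamma=1.0, bandwidth=1.0):
    """Real 2D Gabor function: Gaussian envelope times a cosine plane wave."""
    # sigma / wavelength ratio implied by the bandwidth (in octaves); assumed
    ratio = np.sqrt(np.log(2) / 2) / np.pi * (2 ** bandwidth + 1) / (2 ** bandwidth - 1)
    sigma = ratio * wavelength
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    # Rotate the coordinate system by the orientation theta
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_r / wavelength + phase)

orientations = np.arange(8) * 22.5              # 0, 22.5, ..., 157.5 degrees
wavelengths = [5.0, 5.0 * np.sqrt(2)]           # the two scales from the text
bank = np.stack([gabor_kernel(15, lam, th)
                 for lam in wavelengths for th in orientations])
summed_kernel = bank.sum(axis=0)                # initial sum of the 16 kernels
```

With phase offset 0 and aspect ratio 1 every kernel is symmetric about its center, where it takes its maximum value of 1.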

We run the localized version of CIA-LVQ with matrices \(\varOmega_k\) initialized with the identity matrix and 4 prototypes per class for E=300 epochs. The prototypes are initialized as the mean of the corresponding class. Regarding VisTex the training error is 5.75 % and the error on the test set 15 %. For the KTH-TIPS data set CIA-LVQ reaches training and test errors of 15.4 % and 22.8 % respectively.

We use the same data sets and the same filter banks to compare with the Gabor-based Opponent Color Features (OCF) [15], the Color Local Binary Patterns (Color LBP) [28] and the common approach of deriving textural information only from the luminance plane of images [5]. The luminance approach is considered to often outperform combined color and texture features [19]. We implement this approach with an RGB to gray (RGB2G) transformation, which builds intensity values by a weighted sum of the color components of every pixel:
$$ I_{(x,y)} = 0.2989 \cdot R_{(x,y)} + 0.587 \cdot G_{(x,y)} + 0.114 \cdot B_{(x,y)} $$
(24)
We again vectorize all patches s and in this case the image patch descriptor is given by
$$ \mathbf {r}_2(\mathbf {s})=\mathbf {s}*\sum_l \mathbf{G}^l . $$
(25)
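The weighted sum of Eq. (24) amounts to a single matrix-vector product per pixel. A minimal sketch, assuming the patch is stored as an array with the color channels in the last axis (the names are hypothetical):

```python
import numpy as np

RGB_WEIGHTS = np.array([0.2989, 0.587, 0.114])   # coefficients of Eq. (24)

def rgb_to_gray(rgb):
    """rgb: array of shape (H, W, 3); returns the (H, W) intensity image."""
    return np.asarray(rgb, dtype=float) @ RGB_WEIGHTS
```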
For OCF we use a k-nearest neighbors (k-NN) classification scheme with precisely the set of features and the dissimilarity measure suggested by the authors of [15], whereas for the Color LBP we use rotation-invariant uniform LBP histograms in (8,1) neighborhoods and the Euclidean distance in a k-NN scheme. We choose the size of the neighborhood in relation to the patch size and the dimensions of the feature vectors created. With respect to the RGB2G approach we use the k-NN scheme with a dissimilarity measure similar to Eq. (8):
$$ d_\mathbf{G}\bigl(\mathbf {x}^i,\mathbf {x}^j\bigr) = \bigl\lVert\bigl|\mathbf {r}_2\bigl(\mathbf {x}^i\bigr)\bigr|^2 - \bigl|\mathbf {r}_2\bigl(\mathbf {x}^j\bigr)\bigr|^2\bigr\rVert^2 . $$
(26)
Regarding all k-NN schemes we cross-validate the number of nearest neighbors using the values k=1,3,…,15 on the testing image patches from the training subsets. The optimal k obtained is then used for experimenting on the previously unseen evaluation image patches. Ties are solved by defaulting to the 1-NN classifier.
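The selection of k and the tie-breaking rule can be sketched as follows, given precomputed dissimilarities between test and training patches. The function names and the data layout are assumptions made for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(dist_to_train, train_labels, k):
    """Classify one query from its distances to all training points.
    A tie in the majority vote falls back to the 1-NN decision."""
    order = np.argsort(dist_to_train, kind="stable")
    nearest = [train_labels[i] for i in order[:k]]
    counts = Counter(nearest).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return nearest[0]          # tie: default to the nearest neighbour
    return counts[0][0]

def select_k(dist_matrix, train_labels, test_labels, candidates=range(1, 16, 2)):
    """Pick the k in {1, 3, ..., 15} with the lowest error on the test patches.
    dist_matrix has one row of distances per test patch."""
    errors = {}
    for k in candidates:
        preds = [knn_predict(row, train_labels, k) for row in dist_matrix]
        errors[k] = float(np.mean(np.array(preds) != np.array(test_labels)))
    return min(errors, key=errors.get), errors
```

The k returned by `select_k` would then be reused unchanged on the evaluation patches.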

5.1 Comparisons on the VisTex Data Set

The k-NN scheme shows a test error of 9.1 % based on the OCF (k=3), 2 % based on the Color LBP (k=1) and 25.8 % based on the RGB2G transformation (k=1), but the most interesting comparison relies on the evaluation set which displays the generalization ability of each method. Here the k-NN scheme produces much higher error rates of 35.2 %, 25.2 % and 50 % for OCF, Color LBP and RGB2G respectively, while the CIA-LVQ has an error of 13.1 %, in the same order of magnitude as for the test patches. Table 1 presents in detail the confusion matrices and classwise accuracies of all methods for the evaluation set.
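The class-wise accuracies and the overall evaluation error follow directly from a confusion matrix. As a worked example, the sketch below recomputes them for the CIA-LVQ confusion matrix of the VisTex evaluation set as given in Table 1; the variable names are hypothetical, and the columns are taken to correspond to the true classes because they sum to the number of evaluation patches per class.

```python
import numpy as np

# CIA-LVQ confusion matrix for the VisTex evaluation set (Table 1);
# each column sums to the number of evaluation patches of that class.
classes = ["Bark", "Brick", "Fabric", "Food"]
confusion = np.array([
    [179,  2,  23,   4],
    [  5, 85,   1,   2],
    [  2, 13, 176,  19],
    [ 14,  0,   0, 125],
])

# Class-wise accuracy in %: diagonal entry divided by the column total
per_class = 100.0 * np.diag(confusion) / confusion.sum(axis=0)
# Overall error in %: share of patches off the diagonal
overall_error = 100.0 * (1.0 - np.trace(confusion) / confusion.sum())
```

This reproduces the class-wise accuracies of 89.50, 85.00, 88.00 and 83.33 % and the overall error of about 13.1 %.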
Table 1

Confusion matrices for the VisTex evaluation set

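CIA-LVQ:

| | Bark | Brick | Fabric | Food | Total |
|---|---|---|---|---|---|
| Bark | 179 | 2 | 23 | 4 | 208 |
| Brick | 5 | 85 | 1 | 2 | 93 |
| Fabric | 2 | 13 | 176 | 19 | 210 |
| Food | 14 | 0 | 0 | 125 | 139 |
| Total | 200 | 100 | 200 | 150 | 650 |
| Class-wise accuracy of estimation in % | 89.50 | 85.00 | 88.00 | 83.33 | |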

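OCF:

| | Bark | Brick | Fabric | Food | Total |
|---|---|---|---|---|---|
| Bark | 111 | 10 | 35 | 36 | 192 |
| Brick | 70 | 78 | 10 | 26 | 184 |
| Fabric | 16 | 12 | 155 | 11 | 194 |
| Food | 3 | 0 | 0 | 77 | 80 |
| Total | 200 | 100 | 200 | 150 | 650 |
| Class-wise accuracy of estimation in % | 55.50 | 78.00 | 77.50 | 51.33 | |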

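Color LBP:

| | Bark | Brick | Fabric | Food | Total |
|---|---|---|---|---|---|
| Bark | 152 | 24 | 2 | 6 | 174 |
| Brick | 21 | 56 | 12 | 1 | 178 |
| Fabric | 2 | 12 | 138 | 3 | 127 |
| Food | 25 | 8 | 4 | 140 | 181 |
| Total | 200 | 100 | 200 | 150 | 650 |
| Class-wise accuracy of estimation in % | 76.00 | 56.00 | 69.00 | 93.33 | |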

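RGB2G:

| | Bark | Brick | Fabric | Food | Total |
|---|---|---|---|---|---|
| Bark | 79 | 12 | 38 | 38 | 167 |
| Brick | 64 | 62 | 34 | 28 | 188 |
| Fabric | 16 | 15 | 113 | 13 | 157 |
| Food | 41 | 11 | 15 | 71 | 138 |
| Total | 200 | 100 | 200 | 150 | 650 |
| Class-wise accuracy of estimation in % | 39.50 | 62.00 | 56.50 | 47.33 | |
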
Overall, CIA-LVQ outperforms all other methods and displays a remarkable ability to generalize to previously unknown data. The magnitudes of the prototypes that classify the evaluation set are shown in Fig. 5. Additionally, we show some example patches from the evaluation set that are classified correctly, together with their descriptors, in Fig. 6 and some examples of wrongly classified patches in Fig. 7. Finally, Fig. 8 depicts in the spatial domain the optimized sums of kernels that are used together with the corresponding prototypes in order to classify the evaluation set. The accuracy rates of the proposed approach do not vary much among the different classes. However, Brick and Food are the most difficult to classify using a small patch size due to the large size of the texture patterns and the possibly low contrast respectively. Color LBP, being invariant to monotonic contrast changes, therefore outperforms CIA-LVQ for the class Food.
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig5_HTML.gif
Fig. 5

Plots of the optimized prototypes \(|\mathbf{w}^L|\) actively used to classify the data in the evaluation set of the VisTex database. The names consist of the corresponding class name and the index number (1–4) of the prototype

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig6_HTML.gif
Fig. 6

Plots of the descriptors \(|\mathbf{r}_k|\) of some correctly classified image patches from the evaluation set of the VisTex database

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig7_HTML.gif
Fig. 7

Plots of the descriptors \(|\mathbf{r}_k|\) of some wrongly classified image patches from the evaluation set of the VisTex database

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig8_HTML.gif
Fig. 8

Plots of the final form of filter kernels actively used to classify the evaluation set of the VisTex database. The filter kernels have been locally adapted during training. The names consist of the corresponding class name and the index number (1–4) of the kernel

5.2 Comparisons on the KTH-TIPS Database

The k-NN scheme shows a test error of 41.7 % based on the OCF (k=13), 26.4 % based on the Color LBP (k=11) and of 52.7 % based on the RGB2G transformation (k=11), which are all higher than what CIA-LVQ can achieve. On the evaluation set the superior performance of the proposed technique is further clarified. The k-NN scheme reaches error rates of 46.4 %, 35.6 % and 58.4 % for OCF, Color LBP and RGB2G respectively, while the CIA-LVQ has an error of 20.3 %, again in the same order of magnitude as for the test patches. Table 2 presents in detail the confusion matrices and classwise accuracies of all methods for the evaluation set of the KTH-TIPS database.
Table 2

Confusion matrices for the KTH-TIPS evaluation set

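CIA-LVQ:

| | S/paper | Al. Foil | Sponge | Styrofoam | Corduroy | Linen | Br. Bread | Cotton | Or. Peel | Cracker | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S/paper | 129 | 0 | 2 | 0 | 0 | 4 | 0 | 9 | 0 | 0 | 144 |
| Al. Foil | 0 | 123 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 123 |
| Sponge | 0 | 0 | 107 | 0 | 41 | 0 | 30 | 0 | 0 | 0 | 178 |
| Styrofoam | 0 | 4 | 0 | 133 | 0 | 0 | 0 | 0 | 2 | 0 | 139 |
| Corduroy | 0 | 0 | 22 | 0 | 91 | 0 | 32 | 0 | 0 | 0 | 145 |
| Linen | 0 | 0 | 0 | 2 | 0 | 108 | 0 | 54 | 0 | 0 | 164 |
| Br. Bread | 0 | 0 | 3 | 0 | 3 | 0 | 51 | 0 | 0 | 1 | 58 |
| Cotton | 1 | 7 | 0 | 0 | 0 | 23 | 0 | 70 | 2 | 0 | 103 |
| Or. Peel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131 | 1 | 132 |
| Cracker | 5 | 1 | 1 | 0 | 0 | 0 | 22 | 2 | 0 | 133 | 164 |
| Total | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 1350 |
| Class-wise accuracy of estimation in % | 95.56 | 91.11 | 79.26 | 98.52 | 67.41 | 80.00 | 37.78 | 51.85 | 97.04 | 98.52 | |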

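OCF:

| | S/paper | Al. Foil | Sponge | Styrofoam | Corduroy | Linen | Br. Bread | Cotton | Or. Peel | Cracker | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S/paper | 34 | 0 | 21 | 26 | 0 | 2 | 7 | 10 | 2 | 7 | 109 |
| Al. Foil | 0 | 112 | 1 | 1 | 2 | 1 | 7 | 4 | 0 | 6 | 134 |
| Sponge | 30 | 0 | 43 | 10 | 8 | 3 | 17 | 4 | 13 | 12 | 140 |
| Styrofoam | 28 | 0 | 15 | 61 | 3 | 17 | 7 | 13 | 0 | 7 | 151 |
| Corduroy | 7 | 4 | 7 | 6 | 97 | 2 | 3 | 22 | 0 | 5 | 153 |
| Linen | 2 | 0 | 0 | 2 | 1 | 91 | 4 | 10 | 2 | 3 | 115 |
| Br. Bread | 16 | 6 | 39 | 13 | 8 | 4 | 58 | 5 | 7 | 33 | 189 |
| Cotton | 3 | 0 | 1 | 5 | 1 | 3 | 1 | 61 | 6 | 0 | 81 |
| Or. Peel | 7 | 0 | 1 | 0 | 1 | 1 | 1 | 4 | 105 | 1 | 121 |
| Cracker | 8 | 13 | 7 | 11 | 14 | 11 | 30 | 2 | 0 | 61 | 157 |
| Total | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 1350 |
| Class-wise accuracy of estimation in % | 25.19 | 82.96 | 31.85 | 45.19 | 71.85 | 67.41 | 42.96 | 45.19 | 77.78 | 45.19 | |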

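Color LBP:

| | S/paper | Al. Foil | Sponge | Styrofoam | Corduroy | Linen | Br. Bread | Cotton | Or. Peel | Cracker | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S/paper | 66 | 0 | 27 | 1 | 11 | 4 | 8 | 6 | 0 | 25 | 148 |
| Al. Foil | 0 | 134 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 134 |
| Sponge | 28 | 0 | 65 | 1 | 5 | 0 | 14 | 1 | 28 | 26 | 168 |
| Styrofoam | 0 | 0 | 0 | 126 | 0 | 30 | 0 | 6 | 0 | 2 | 164 |
| Corduroy | 2 | 0 | 8 | 0 | 109 | 1 | 15 | 5 | 0 | 5 | 145 |
| Linen | 0 | 1 | 0 | 1 | 0 | 69 | 0 | 36 | 0 | 0 | 107 |
| Br. Bread | 13 | 0 | 20 | 1 | 6 | 7 | 86 | 1 | 6 | 27 | 167 |
| Cotton | 0 | 0 | 0 | 2 | 1 | 20 | 0 | 72 | 5 | 3 | 103 |
| Or. Peel | 5 | 0 | 15 | 0 | 2 | 0 | 1 | 0 | 95 | 0 | 118 |
| Cracker | 21 | 0 | 0 | 3 | 1 | 4 | 11 | 8 | 1 | 47 | 96 |
| Total | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 1350 |
| Class-wise accuracy of estimation in % | 48.89 | 99.26 | 48.15 | 93.33 | 80.74 | 51.11 | 63.70 | 53.33 | 70.37 | 34.81 | |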

RGB2G:

 

S/paper

Al. Foil

Sponge

Styrofoam

Corduroy

Linen

Br. Bread

Cotton

Or. Peel

Cracker

S/paper

26

1

17

38

2

11

9

7

24

5

140

Al. Foil

1

81

0

0

0

1

3

0

0

8

94

Sponge

22

2

33

13

5

11

16

5

26

14

147

Styrofoam

37

1

12

45

0

11

1

8

8

13

136

Corduroy

0

15

8

1

108

6

9

17

3

7

174

Linen

1

5

1

4

3

56

10

2

1

6

89

Br. Bread

10

6

22

7

3

9

50

2

10

37

156

Cotton

11

2

4

3

5

12

3

77

12

4

133

Or. Peel

24

1

32

19

3

10

7

16

51

6

169

Cracker

3

21

6

5

6

8

27

1

0

35

112

135

135

135

135

135

135

135

135

135

135

1350

Class-wise accuracy of estimation in %

 

19.26

60.00

24.44

33.33

80.00

41.48

37.04

57.04

37.78

25.93

 
Corduroy is the only class for which CIA-LVQ is outperformed by all three methods that we compare with, while Color LBP also achieves better results for Aluminium Foil, Brown Bread and Cotton. The prototypes which classify the evaluation set of KTH-TIPS are shown in Fig. 9, together with examples of correctly (Fig. 10) and wrongly (Fig. 11) classified patches and their corresponding descriptors. The optimized sums of kernels that are used are shown in Fig. 12. The classes Corduroy, Brown Bread and Cotton are all characterized by nuances of brown color and very diverse patterns, as is the class Sponge. Therefore, the former two are often mistaken by CIA-LVQ for one another or for Sponge. The same occurs between Cotton and Linen, which are dominated by very similar colors and often low contrast. Finally, the performance of CIA-LVQ with regard to the class Aluminium Foil is mostly due to the combination of large textures and the small patch size.
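The class-wise accuracies and the overall error quoted for CIA-LVQ can be recomputed from its confusion matrix in Table 2. The following is a minimal sketch, assuming NumPy; the variable names are illustrative and not taken from the paper. Since every column sums to 135, the number of evaluation patches per class, each accuracy is the diagonal entry divided by its column sum.

```python
import numpy as np

# CIA-LVQ confusion matrix from Table 2, without the row and column sums.
# Class order: S/paper, Al. Foil, Sponge, Styrofoam, Corduroy, Linen,
# Br. Bread, Cotton, Or. Peel, Cracker.
cm = np.array([
    [129,   0,   2,   0,   0,   4,   0,   9,   0,   0],
    [  0, 123,   0,   0,   0,   0,   0,   0,   0,   0],
    [  0,   0, 107,   0,  41,   0,  30,   0,   0,   0],
    [  0,   4,   0, 133,   0,   0,   0,   0,   2,   0],
    [  0,   0,  22,   0,  91,   0,  32,   0,   0,   0],
    [  0,   0,   0,   2,   0, 108,   0,  54,   0,   0],
    [  0,   0,   3,   0,   3,   0,  51,   0,   0,   1],
    [  1,   7,   0,   0,   0,  23,   0,  70,   2,   0],
    [  0,   0,   0,   0,   0,   0,   0,   0, 131,   1],
    [  5,   1,   1,   0,   0,   0,  22,   2,   0, 133],
])

# Every column sums to the number of evaluation patches per class.
column_sums = cm.sum(axis=0)

# Class-wise accuracy: correctly classified patches over patches of that class.
classwise_accuracy = 100.0 * np.diag(cm) / column_sums

# Overall error: share of patches that lie off the diagonal.
overall_error = 100.0 * (1.0 - np.trace(cm) / cm.sum())

print(np.round(classwise_accuracy, 2))
print(round(overall_error, 1))
```

The same computation applies unchanged to the OCF, Color LBP and RGB2G matrices.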
https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig9_HTML.gif
Fig. 9

Plots of the optimized prototypes \(|(\mathbf{w}^{L})|\) actively used to classify the data in the evaluation set of the KTH-TIPS database. The names consist of the corresponding class name and the index number (1–4) of the prototype

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig10_HTML.gif
Fig. 10

Plots of the descriptors |r k | of some correctly classified image patches from the evaluation set of KTH-TIPS database

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig11_HTML.gif
Fig. 11

Plots of the descriptors |r k | of some wrongly classified image patches from the evaluation set of KTH-TIPS database

https://static-content.springer.com/image/art%3A10.1007%2Fs10851-012-0356-9/MediaObjects/10851_2012_356_Fig12_HTML.gif
Fig. 12

Plots of the final form of filter kernels actively used to classify the evaluation set of the VisTex database. The filter kernels have been locally adapted during training. The names consist of the corresponding class name and the index number (1–4) of the kernel

6 Conclusion and Outlook

In this contribution we propose a prototype-based framework for color texture classification. As an example, we initialize the system with Gabor filters and classify color texture patterns in 15×15 patches randomly drawn from images of two public data sets. The results show that CIA-LVQ can learn typical texture patterns with very good generalization, even from relatively small patches and filter banks, and that it consistently outperforms state-of-the-art techniques used for color texture analysis. It is also of conceptual value that this LVQ adaptation is suitable for learning in the complex number domain.
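As an illustration of such an initialization, the sketch below builds a small bank of 15×15 Gabor kernels with NumPy. It is not the authors' implementation: the function names, the parameter values (wavelength, sigma, gamma, number of orientations) and the choice to copy the same kernel into each of the three color channels are all assumptions made for this example.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5, phase=0.0):
    """Real-valued 2D Gabor kernel on a size x size grid (size should be odd).

    All default parameter values are illustrative assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope with aspect ratio gamma times a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + phase)
    return envelope * carrier

def init_color_filter_bank(n_orientations=4, n_channels=3, size=15):
    """Hypothetical initial bank: one kernel per orientation, copied per channel."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    bank = np.stack([gabor_kernel(size=size, theta=t) for t in thetas])
    # Resulting shape: (n_orientations, n_channels, size, size).
    return np.repeat(bank[:, None, :, :], n_channels, axis=1)

bank = init_color_filter_bank()
print(bank.shape)
```

In an adaptive scheme these values would serve only as a starting point; the entries of each kernel would then be treated as free parameters and updated during training.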

The resulting filter kernels may not strictly conform to the notion of Gabor filters; however, they preserve the important property of symmetric and periodic excitatory and inhibitory regions, the shape and size of which are data driven. In principle, every adaptive metric method could be extended following our suggestion, but we deliberately chose LVQ because of its easily interpretable results and its lower computational costs in comparison to other approaches. As with Gabor filters, any other family of 2D filters commonly used to describe gray scale image information could be adapted and applied to color image analysis with this algorithm. Initializing with a filter bank of differences of Gaussians for color edge detection is one possible example. Furthermore, depending on the task at hand, it might be desirable that two patches in which the same texture occurs at different positions are not interpreted as similar. In this case another similarity measure should be used: \(\lVert|\mathbf {r}(\mathbf {x}^{i}) - \mathbf {r}(\mathbf {w}^{L}) |\rVert^{2}\), which is not based on the difference of magnitudes. This might be of advantage, for example, in the recognition of objects such as traffic signs, where a corner or an edge might have different interpretations depending on its position in the image. Combinations of CIA-LVQ with keypoint detectors, which avoid drawing patches from random positions within an image, can also be easily implemented and can be beneficial especially for tasks related to object recognition. A variant of CIA-LVQ that is completely unbiased with regard to the nature of the filters, in which the adaptive kernels are randomly initialized, is also of particular interest, mostly in cases where there is no prior knowledge about the nature of the data (e.g. medical imaging).
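The contrast between a measure based on the difference of magnitudes and one based on the complex difference can be illustrated numerically. The sketch below is a simplification under stated assumptions: it uses a plain 2D discrete Fourier transform of a random gray scale patch as the descriptor, rather than the learned filter responses of CIA-LVQ, and it models "the same texture at a different position" as a circular shift. The names `d_magnitude` and `d_complex` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 15x15 patch and a circularly shifted copy of it.
patch = rng.random((15, 15))
shifted = np.roll(patch, shift=(3, 5), axis=(0, 1))

# Stand-in descriptors: plain 2D DFTs (an assumption made for this sketch).
r_x = np.fft.fft2(patch)
r_w = np.fft.fft2(shifted)

# Measure based on the difference of magnitudes: a circular shift only
# changes the phase of each Fourier coefficient, so this is close to zero.
d_magnitude = np.sum((np.abs(r_x) - np.abs(r_w)) ** 2)

# Measure based on the complex difference: the phase change is retained,
# so the shifted patch is no longer judged similar.
d_complex = np.sum(np.abs(r_x - r_w) ** 2)

print(d_magnitude, d_complex)
```

Under these assumptions the first value vanishes up to floating point error while the second is large, which is the position sensitivity described above for tasks such as traffic sign recognition.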

CIA-LVQ formulates a novel general principle: based on a differentiable convolution and an adaptive filter bank, the algorithm optimizes the classification. Contrary to standard approaches, which are based either on a single channel representation of the images through a fixed transformation or on empirical observations for combining color and textural information, the proposed technique offers the alternative of data driven learning of suitable, parameterized image descriptors. The ability to automatically weight different color channels and different filters in localized neighborhoods, according to their importance for the classification task, is the most significant factor which distinguishes our approach.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Copyright information

© The Author(s) 2012