1 Introduction

Texture is one of the fundamental visual properties of objects, materials and scenes. Understanding texture is therefore essential in a wide range of applications such as surface inspection and grading, content-based image retrieval, object recognition, material classification, remote sensing and medical image analysis. As a consequence, research on texture has been attracting significant attention for at least forty years, and a very large number of visual descriptors are now available in the literature – for an overview see for instance Refs. [9, 33].

During the last two decades the ‘bag of features’ (BoF) paradigm has emerged as one of the most effective approaches to texture analysis. This scheme is best explained by resorting to a parallel with the ‘bag of words’ (BoW) model, whereby a text is represented by the statistical, orderless distribution of its words over a predefined dictionary. Likewise, the BoF represents images by the distribution of local patterns regardless of their spatial arrangement [2]. One possible implementation of the bag of features model is represented by a class of methods known as Histogram of Equivalent Patterns (HEP) [10]. Descriptors of this class sample the input images densely and assign each local image patch to one visual word among those in the dictionary. The image representation is the probability distribution (histogram) of the visual words over the dictionary. In the HEP the mapping image patch \(\rightarrow \) visual word is typically a function (usually referred to as the kernel function) of the grey-levels of the pixels in the patch. In this approach the dictionary is defined ‘a priori’ and coincides with the codomain of the kernel function. Local Binary Patterns and related methods are all instances of this general scheme [2, 10].
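To make the HEP scheme concrete, the following is a minimal Python/NumPy sketch of our own (the function name and signature are illustrative assumptions, not part of the original formulation): a window is slid densely over the image and the histogram of the codes returned by a generic kernel function f is accumulated.

```python
import numpy as np

def hep_histogram(img, f, n_codes, radius=1):
    """Histogram of Equivalent Patterns: map every (2*radius+1)-sided
    patch of the image to a code via the kernel function f, then
    return the normalised histogram of the codes."""
    h, w = img.shape[:2]
    hist = np.zeros(n_codes)
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            patch = img[r - radius:r + radius + 1, c - radius:c + radius + 1]
            hist[f(patch)] += 1
    return hist / hist.sum()
```

Any of the descriptors discussed below can take the place of `f`; the dictionary is fixed a priori as the set of `n_codes` possible outputs of the kernel function.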

Extensions of this approach to the colour domain involve comparing the colour (or multi-spectral) pixel values instead of the grey-levels. This area, however, has received significantly less attention than the grey-scale counterpart. One of the first extensions of LBP to colour images was Opponent Colour Local Binary Patterns (OCLBP) [25] in which, as we detail in Sect. 3, the LBP operator is applied to each colour channel separately as well as to pairs of colour channels. Herein we propose a conceptually simple yet effective improvement on this method. We denote our descriptor as Improved Opponent-colour LBP (IOCLBP) and show, experimentally, that it can significantly outperform OCLBP in colour texture classification.

In the remainder of the paper we first provide some background in Sect. 2 then introduce IOCLBP in Sect. 3; we discuss the experimental activity in Sect. 4 and summarise the results in Sect. 5. Some final considerations and directions for future studies conclude the paper (Sect. 6).

2 Background

Few would object that LBP is one of the most prominent and widely investigated texture descriptors ever. Suffice it to say that Ojala and Pietikäinen’s seminal work [27] has been cited no fewer than 5500 times since it was first published in 2002. Keys to the success of this method are its ease of implementation, low computational demand and high discrimination accuracy. Many LBP variants also exist: so many, indeed, that in a recent review Liu et al. [21] noted that it is becoming more and more difficult – even for experts in the field – to grasp them all.

In comparison, colour variants have received significantly less attention in the literature. A common strategy for extending LBP to the colour domain consists of applying the LBP operator to each colour channel separately (as for instance in [23]), and/or to pairs of channels jointly, as suggested by Mäenpää and Pietikäinen [24, 25]. Alternatively, one can treat colour data as vectors in three-dimensional colour space and compare them on the basis of their norm, relative orientation (e.g. Local Angular Patterns [8]) or both (e.g. Local Color Vector Binary Patterns [20]). Yet another approach consists of defining a suitable total ordering in the colour space and using it as a replacement for the natural grey-level ordering in the LBP definition; this strategy has recently been investigated extensively by Ledoux et al. [19].

As we detail in Sect. 3, IOCLBP considers intra- and inter-channel features – just as OCLBP does – but with a different local thresholding scheme. Whereas in OCLBP the peripheral pixels are thresholded at the value of the central pixel, in IOCLBP thresholding is done against the average value of the neighbourhood. In the grey-scale domain the same approach has been used to define Improved Local Binary Patterns (ILBP) [16], which generally work better than LBP [11], but as far as we know this idea has not been extended to the colour domain yet. The method proposed here can therefore be considered an extension of ILBP to colour textures.

3 Improved Opponent Colour Local Binary Patterns

Let us consider a neighbourhood \(\mathcal {N} = \left\{ \mathbf {p}_0,\mathbf {p}_1,\dots ,\mathbf {p}_n\right\} \) composed of a central pixel \(\mathbf {p}_0\) and n peripheral pixels \(\mathbf {p}_i\), \(i \in \{1,\dots ,n\}\). For the sake of simplicity we shall assume that the peripheral pixels be arranged circularly around the central one (Fig. 1), though this restriction is not essential (see Nanni et al. [26] on this point).

In Local Binary Patterns an instance \(\mathcal {P}\) of \(\mathcal {N}\) (i.e. a local image patch) is assigned a unique decimal code (or, equivalently, a visual word) in the following way:

$$\begin{aligned} f_{\text {LBP}} \left( \mathcal {P} \right) = \sum _{i = 1}^n 2^{i-1} \phi \left[ g\left( \mathbf {p}_0\right) , g\left( \mathbf {p}_i\right) \right] \end{aligned}$$
(1)

where

$$\begin{aligned} \phi \left( x, y\right) = {\left\{ \begin{array}{ll} 0 &{} \text {if} \ x \le y \\ 1 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(2)

In Eq. 1, \(g\) stands for a generic function which converts colour into grey-scale; this is normally the standard NTSC/PAL grey-scale conversion [29, Sect. 4.3.1]. The resulting feature vector is the dense, orderless statistical distribution of the image patches over the set of possible codes.
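As an illustrative sketch of our own (the neighbour ordering is an arbitrary convention, not prescribed here), the kernel function of Eqs. 1–2 for the 8-pixel neighbourhood at resolution 1 can be written as:

```python
# Offsets of the n = 8 peripheral pixels at resolution 1 (3x3 window);
# the ordering p_1 ... p_8 is a convention and only needs to be consistent.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(patch):
    """Eqs. 1-2: bit i is phi(g(p_0), g(p_i)), i.e. 1 iff the central
    grey level is strictly greater than that of the i-th peripheral pixel."""
    g0 = int(patch[1, 1])
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if g0 > patch[1 + dr, 1 + dc]:   # phi(x, y) = 1 iff x > y
            code |= 1 << i               # weight 2^(i-1) for p_i, i = 1..n
    return code
```

With the earlier sketch, `hep_histogram(grey_img, lbp_code, 256)` would then return the LBP feature vector.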

Mäenpää and Pietikäinen [25] proposed an extension of this scheme to the colour domain by considering intra- and inter-channel features. For a pair of channels (u, v) the resulting Opponent Colour Local Binary Patterns (OCLBP) can be defined as follows

$$\begin{aligned} f_{\text {OCLBP}_{u,v}} \left( \mathcal {P} \right) = \sum _{i = 1}^n 2^{i-1} \phi \left( p_{0,u}, p_{i,v}\right) \end{aligned}$$
(3)

where \(p_{i,v}\) indicates the intensity of the i-th pixel in the v-th channel. In the RGB space the image representation is the concatenation of the feature vectors generated by OCLBP\(_{R,R}\), OCLBP\(_{G,G}\), OCLBP\(_{B,B}\), OCLBP\(_{R,G}\), OCLBP\(_{R,B}\) and OCLBP\(_{G,B}\). The resulting vector is therefore six times larger than LBP’s.
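Continuing the sketch above (again our own illustration, with channels indexed 0–2 for R, G, B), Eq. 3 and the six-pair concatenation could look as follows:

```python
def oclbp_pair_code(patch, u, v):
    """Eq. 3: point-to-point thresholding -- the peripheral pixels of
    channel v are thresholded at the central value of channel u."""
    p0u = int(patch[1, 1, u])
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):   # OFFSETS as defined earlier
        if p0u > patch[1 + dr, 1 + dc, v]:
            code |= 1 << i
    return code

# The six (u, v) channel pairs used in the RGB space; concatenating the
# corresponding six 2^n-bin histograms gives the OCLBP descriptor.
RGB_PAIRS = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]
```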

Improved Opponent Color Local Binary Patterns differ from OCLBP in that the thresholding is no longer point-to-point but point-to-average. In formulas we have:

$$\begin{aligned} f_{\text {IOCLBP}_{u,v}} \left( \mathcal {P} \right) = \sum _{i = 0}^n 2^i \phi \left( \bar{p}_u, p_{i,v}\right) \end{aligned}$$
(4)

where:

$$\begin{aligned} \bar{p}_u = \frac{1}{n+1} \sum _{i=0}^n p_{i,u} \end{aligned}$$
(5)

Table 1. Summary table of the datasets used in the experiments.
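A sketch of the corresponding kernel function (ours, following Eqs. 4–5 with the same illustrative conventions as above; note the extra bit contributed by the central pixel):

```python
def ioclbp_pair_code(patch, u, v):
    """Eqs. 4-5: point-to-average thresholding -- all n + 1 pixels of
    channel v (central pixel included) are thresholded at the mean of
    channel u over the whole neighbourhood."""
    pix_u = [int(patch[1, 1, u])] + [int(patch[1 + dr, 1 + dc, u]) for dr, dc in OFFSETS]
    pix_v = [int(patch[1, 1, v])] + [int(patch[1 + dr, 1 + dc, v]) for dr, dc in OFFSETS]
    mean_u = sum(pix_u) / len(pix_u)   # Eq. 5: average over n + 1 pixels
    code = 0
    for i, pv in enumerate(pix_v):     # i = 0 .. n, weight 2^i
        if mean_u > pv:                # phi(mean_u, p_iv) = 1 iff mean_u > p_iv
            code |= 1 << i
    return code
```

For n = 8 the codes range over \([0, 2^9)\), so per channel pair one would call, e.g., `hep_histogram(rgb_img, lambda p: ioclbp_pair_code(p, u, v), 512)`.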

It is easy to see that the number of directional features generated by LBP, OCLBP and IOCLBP respectively is \(2^{n}\), \(6 \times 2^{n}\) and \(6 \times 2^{n+1}\). Variations of LBP such as the rotation-invariant (LBP\(^{ri}\)) and the uniform, rotation-invariant version (LBP\(^{riu2}\)) apply seamlessly to OCLBP and IOCLBP. The number of features in this case (see Table 2) can be computed through invariant theory as for instance shown by González et al. [12].
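For illustration, a rotation-invariant label can be obtained as the minimum over all circular rotations of the peripheral bits – a minimal sketch of ours (for IOCLBP the central-pixel bit would be held fixed while only the n peripheral bits are rotated):

```python
def ri_label(code, n=8):
    """Map an n-bit pattern to its rotation-invariant canonical form,
    i.e. the minimum over all circular bit rotations (as in LBP^ri)."""
    best = code
    for _ in range(n - 1):
        code = (code >> 1) | ((code & 1) << (n - 1))  # rotate right by one bit
        best = min(best, code)
    return best

# For n = 8 there are 36 distinct rotation-invariant labels:
# len({ri_label(c) for c in range(256)}) == 36
```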

4 Experiments

We assessed the effectiveness of IOCLBP in a set of colour texture classification tasks. Datasets, classifier, accuracy evaluation procedure and methods used for comparison are detailed in the following subsections.

4.1 Datasets

We considered eight datasets of colour texture images as detailed below. The main features of each dataset along with sample images are also reported in Table 1.

KTH-TIPS includes images from the following 10 classes of materials: aluminum foil, bread, corduroy, cotton, cracker, linen, orange peel, sandpaper, sponge and styrofoam [14, 18]. For each class one sample of the corresponding material was acquired at nine different scales, three poses and under three illumination conditions, resulting in 81 image samples per class. The dimension of the images is 200px \(\times \) 200px.

KTH-TIPS2b extends KTH-TIPS by adding one class (i.e.: 11 instead of 10), three samples for each class (i.e.: 4 instead of 1), and one illumination condition (i.e.: 4 instead of 3). The overall dataset is therefore composed of 432 samples for each class [3, 18]. The image dimension is the same as in KTH-TIPS.

Outex-00013 comprises the 68 texture classes of Outex’s test suite TC-00013. This is a collection of heterogeneous materials such as cardboard, fabric, natural stone, paper, sandpaper, wool, etc. [28]. The dataset contains 20 image samples per class, each of dimension 128px \(\times \) 128px, with no variation in scale, rotation angle or illumination conditions.

Outex-00014 features the same classes as Outex-00013 – but in this case each sample was acquired under three different light sources, respectively a 2300 K horizon sunlight, a 2856 K incandescent CIE A and a 4000 K fluorescent TL84 lamp. There are therefore 60 samples for each class instead of 20, whereas the image dimension is the same as in Outex-00013.

It is important to point out that, in order to maintain a uniform evaluation protocol (see Sect. 4.2) across all the datasets considered here, we used different subdivisions into training and test sets from those provided with the TC-00013 and TC-00014 test suites.

Plant Leaves is composed of images of leaves from 20 different species of plants. There are 60 samples for each class, each of dimension 128px \(\times \) 128px. The images were acquired under controlled imaging conditions through a planar scanner [4].

RawFooT is a dataset specifically designed for investigating the robustness of image descriptors against changes in the illumination conditions [9, 30]. It includes 68 classes of different types of raw food such as grain, fish, fruit, meat, pasta and vegetables. There are 46 image samples for each class, one for each of 46 different lighting conditions, which differ in the direction of the light, its colour, its intensity, or a combination of these. Other viewing conditions such as scale and rotation angle are invariable. In our experiments we subdivided each sample into four non-overlapping images of dimension 400px \(\times \) 400px, thereby obtaining 46 \(\times \) 4 = 184 samples for each class.

USPTex features 191 classes of colour textures with 12 samples per class [1, 31]. The images are rather varied, representing materials such as seeds, rice and fabric, but also road scenes, vegetation, walls, clouds and soil. The images have been acquired ‘in the wild’ and have a dimension of 128px \(\times \) 128px.

V \(\times \) C TSG is based on 42 classes of ceramic tiles acquired under controlled and steady imaging conditions in the V \(\times \) C laboratory at the Polytechnic University of Valencia, Spain [22, 32]. The dataset is composed of 14 base classes with three grades for each class. Notably, the three grades are very similar and difficult to differentiate even to the trained eye. The original images come in different resolutions, are either rectangular or square, and the number of samples varies from class to class. In our experiments we cropped a maximal central square window from each image and retained 12 samples for each class.

4.2 Classification and Accuracy Evaluation

Classification was based on the nearest-neighbour rule with \(L_1\) (‘cityblock’) distance. Accuracy estimation was performed through split-sample validation with stratified sampling, i.e. half of the samples of each class were used to train the classifier and the remaining half to test it. The estimated accuracy was the fraction of samples of the test set that were classified correctly. For a stable estimate the results (Table 3) were averaged over 100 random splits into training and test sets.
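The protocol can be summarised by the following sketch (ours; `X` is the matrix of feature vectors and `y` the vector of class labels, both hypothetical names):

```python
import numpy as np

def one_nn_l1_accuracy(X, y, n_splits=100, seed=0):
    """1-NN classification with L1 ('cityblock') distance, averaged over
    stratified half/half splits into training and test sets."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_splits):
        train, test = [], []
        for c in np.unique(y):                  # stratified sampling
            idx = rng.permutation(np.flatnonzero(y == c))
            train.extend(idx[:len(idx) // 2])
            test.extend(idx[len(idx) // 2:])
        train, test = np.asarray(train), np.asarray(test)
        # pairwise L1 distances between test and training samples
        d = np.abs(X[test, None, :] - X[None, train, :]).sum(axis=2)
        pred = y[train][d.argmin(axis=1)]
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))
```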

4.3 Comparison with Other Methods

We compared the performance of IOCLBP with that of the following closely related methods:

  • Local Binary Patterns [27];

  • Combination of Local Binary Patterns and Local Colour Contrast [7];

  • Improved Local Binary Patterns [16];

  • Completed Local Binary Patterns [13];

  • Opponent-colour Local Binary Patterns [25];

  • Local Colour Vector Binary Patterns [20];

  • Texture Spectrum [15].

For each of the above descriptors a rotation-invariant, multi-resolution feature vector was obtained by concatenating the rotation-invariant feature vectors (e.g. LBP\(^{ri}\)) computed at resolutions 1, 2 and 3 (Fig. 1). The number of features of each method is shown in Table 2.
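A minimal sketch of this concatenation, reusing the earlier `hep_histogram` helper (ours; it assumes a kernel function able to handle the patch size implied by each resolution, with the neighbourhood geometries of Fig. 1):

```python
import numpy as np

def multires_descriptor(img, code_fn, n_codes, radii=(1, 2, 3)):
    """Concatenate the histograms computed at resolutions 1, 2 and 3
    into a single multi-resolution feature vector."""
    return np.concatenate([hep_histogram(img, code_fn, n_codes, radius=r)
                           for r in radii])
```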

Fig. 1. Pixel neighbourhoods corresponding to resolutions 1, 2 and 3

For calibration purposes we also included off-the-shelf features from pre-trained CNNs – specifically Caffe AlexNet and VGG-M, as described in Refs. [5, 17] respectively. We considered three different sets of features obtained through the following encoding schemes [6]:

  • The output of the last fully-connected layer (FC);

  • The output of the last convolutional layer pooled through a bag-of-words encoder (BoVW);

  • The output of the last convolutional layer pooled through a vector of locally aggregated descriptors encoder (VLAD).

Further post-processing involved \(L_2\) normalisation of the FC and BoVW features, and individual \(L_2\) normalisation of the VLAD subvectors. For a fair comparison we chose the number of clusters for the BoVW and VLAD encoders so as to guarantee that the three feature vectors (namely FC, BoVW and VLAD) had approximately the same length (see Table 2).
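For clarity, a sketch of this normalisation step (ours; `n_clusters` is the number of VLAD centres, an assumed parameter name):

```python
import numpy as np

def l2_normalise(x, eps=1e-12):
    """Scale a feature vector to unit L2 norm."""
    return x / max(np.linalg.norm(x), eps)

def normalise_vlad(vlad, n_clusters):
    """Individually L2-normalise each of the n_clusters subvectors
    of a VLAD descriptor."""
    sub = vlad.reshape(n_clusters, -1)
    norms = np.maximum(np.linalg.norm(sub, axis=1, keepdims=True), 1e-12)
    return (sub / norms).ravel()
```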

Table 2. Summary list of the methods included in the experiments

5 Results and Discussion

Table 3 summarises the average classification accuracy by image descriptor and dataset. As can be seen, the performance of IOCLBP was superior to that of the methods of the same family (particularly OCLBP and LCVBP) in six datasets out of eight. Comparison with CNN-based features shows an evenly split scenario, with IOCLBP performing better in four datasets out of eight and CNN-based features in the remaining four. Particularly interesting was the result obtained with dataset V \(\times \) C TSG: in this case IOCLBP outperformed all the other methods, and CNN-based features in particular by a large margin (\(\approx \)15 percentage points). In fact, all the hand-designed methods performed better than CNN-based features on this dataset.

Table 3. Overall accuracy by descriptor and dataset. Boldface figures indicate, for each dataset, the best among the hand-designed descriptors; boxed values the best among all descriptors.

6 Conclusions

In this work we have introduced a variant of OCLBP which we call Improved Opponent Colour Local Binary Patterns. Experimentally, we have shown the superiority of IOCLBP over related methods in colour texture classification tasks. In our experiments IOCLBP’s accuracy was comparable to that of CNN-based features, with the advantage that IOCLBP is conceptually much simpler, training-free and less computationally demanding. Remarkably, IOCLBP proved clearly superior in the classification of texture images that are very similar to one another (e.g. dataset V \(\times \) C TSG).