1 Introduction

In this paper we introduce a new approach to classifying rocks from cross-section images, using novel Riesz transform features in combination with machine learning. Several techniques which combine concepts from signal theory with machine learning already exist in the literature. In most cases these concepts target higher accuracy, whereas our concept focuses on the runtimes of the algorithms.

A recent paper describes the possibility of using the results of the Riesz decomposition into phase, amplitude and phase orientation as features of local binary patterns for histological image classification [27]. Furthermore, 3D CT images of lung tissues [5] have been classified by covariance descriptors based on the higher order Riesz transform. Several other applications of the Riesz transform proposed in the literature are based on steerable filter pyramids [25]. The corresponding carrier signals, usually based on wavelet frames, are essential in this concept for extracting the local features of the decomposition. Such filtering techniques are not needed in the new classification approach introduced in this paper. The algorithmic possibilities of such pyramids in image processing seem almost endless, ranging up to facial recognition [7] and motion amplification [26], but they focus on filtering and processing rather than on classification.

Another technique related to our approach are convolutional neural networks (CNNs) assisted with Gabor frames, so-called Gabor CNNs, as in [21] or [16]. In this setup Gabor filters are used as fixed-weight kernels instead of regular trainable kernels. This approach yields improvements in training time as well as in memory and storage requirements. In contrast to these Gabor CNNs, our new approach has no convolutional layers and is based on a feature vector.

Usually, petrographic analysis is performed with techniques from microscopy, spectroscopy and tomography as well as chemical analysis, but in recent years image processing based methods have been introduced. In [15] a basic approach is followed, with an image feature vector of variance and mean values of the intensities of Gabor filtered images and a nearest neighbour classifier. To improve this process, [14] adds structural information through the Sobel operator to the feature vector and uses a neural network to reach high accuracy rates. Similarities exist in [13], where texture and colour features were processed in different phases. Our approach can be seen as an improvement of this general technique, focused on fast calculations, and as a competing concept to modern convolutional networks.

These CNNs are widely used today and rely on powerful, specialized hardware for training. An early project [4] shows high accuracy results for the classification of thin sandstone rock images into different classes of granularity. In [2] modern Inception V3 networks were tested and reached an accuracy of 81% for petrographic thin sections. A simplified custom CNN for classification was presented in [18], focusing on field surface images instead of thin sections to enable UAV applications.

2 Mathematical Preliminaries

Felsberg and Sommer [8] constructed a higher dimensional analogue of the analytic signal via an extension of complex analysis known as Clifford analysis, based on a transform which is known as the Riesz transform [22, ch. 3] [23, sec. II]. Especially in image processing, this transform has a wide range of applications in pattern analysis [1], orientation estimation [19, 20], medical image processing [17] and optical imaging [11].

Definition 1

(Riesz transform) The Riesz transform \(\mathcal {R}\mathop {:} L^2(\mathbb {R}^n)\rightarrow (L^2(\mathbb {R}^n))^n\) is defined as

$$\begin{aligned} (\mathcal {R}f)(\textbf{x}) = \begin{pmatrix} (\mathcal {R}_1f)(\textbf{x})\\ (\mathcal {R}_2f)(\textbf{x})\\ \vdots \\ (\mathcal {R}_nf)(\textbf{x})\\ \end{pmatrix} \end{aligned}$$
(1)

through its single components

$$\begin{aligned} (\mathcal {R}_jf)(\textbf{x}) = \frac{\Gamma \left( \frac{n+1}{2}\right) }{\pi ^{\frac{n+1}{2}}}\int _{\mathbb {R}^n}\frac{y_j}{\Vert \textbf{y}\Vert ^{n+1}}f(\textbf{x}-\textbf{y}){\text {d}}\!\textbf{y}. \end{aligned}$$
(2)

In Fourier domain, the single component Riesz transform is defined as

$$\begin{aligned} \left( \widehat{\mathcal {R}_jf}\right) (\mathbf {\xi }) = -\textrm{i}\frac{\xi _j}{\Vert \mathbf {\xi }\Vert }\hat{f}(\mathbf {\xi }). \end{aligned}$$
(3)

This transform and the connected monogenic signal have many properties which are useful in image processing and wavelet construction [22, p. 56] [24, prop. 2] [25, sec. 2]. Furthermore, fast computations are possible via the Fast Fourier Transform (FFT), with a complexity of \(\mathcal {O}((n+1)N\log {}N)\) [12] for images with N pixels and a Riesz transform with n single components.
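As an illustration of this Fourier-domain computation, the following minimal NumPy sketch (function and variable names are our own, not from the paper) computes the two components of the first-order Riesz transform of a 2D image according to (3):

```python
import numpy as np

def riesz_transform_2d(image):
    """First-order Riesz transform of a 2D signal via the FFT,
    following the Fourier-domain definition (3)."""
    rows, cols = image.shape
    # Frequency grids xi_1, xi_2; only the direction xi/||xi||
    # enters the multiplier, so the scaling is irrelevant.
    xi1 = np.fft.fftfreq(rows).reshape(-1, 1)
    xi2 = np.fft.fftfreq(cols).reshape(1, -1)
    norm = np.sqrt(xi1**2 + xi2**2)
    norm[0, 0] = 1.0            # avoid division by zero at the DC bin
    f_hat = np.fft.fft2(image)
    f_hat[0, 0] = 0.0           # remove DC component (pure structure)
    r1 = np.fft.ifft2(-1j * xi1 / norm * f_hat).real
    r2 = np.fft.ifft2(-1j * xi2 / norm * f_hat).real
    return r1, r2
```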

It is possible to define a higher order Riesz transformFootnote 1 by repeatedly applying the single component Riesz transforms. Before an explicit definition is stated, the normalization factor can be derived from a decomposition of the identity [19, sec. 2.4].

Theorem 1

An iterative application of K single component Riesz transforms in an n-dimensional setting yields the following decomposition of the identity operator:

$$\begin{aligned} \sum _{|\varvec{k}|=K}\frac{K!}{\varvec{k}!}\left( \mathcal {R}_1^{k_1}\cdots \mathcal {R}_n^{k_n}\right) ^*\left( \mathcal {R}_1^{k_1}\cdots \mathcal {R}_n^{k_n}\right) = {\text {Id}} \end{aligned}$$
(4)

using the multi index vector \(\varvec{k}=(k_1,\ldots ,k_n)\) and the property of the adjoint operator \(\mathcal {R}^*=\mathcal {R}^{-1}\).

While there are \(n^K\) possible ways to form a Kth order Riesz transform, due to the commutativity and factorization properties of the convolution only \(\left( {\begin{array}{c}K+n-1\\ n-1\end{array}}\right) \) components of the transform are distinct. For images (\(n=2\)) and \(K=2\), for instance, only the three components \(\mathcal {R}_1^2\), \(\mathcal {R}_1\mathcal {R}_2\) and \(\mathcal {R}_2^2\) remain.
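Theorem 1 can also be verified numerically: in Fourier space the multiplier of \(\left( \mathcal {R}_1^{k_1}\cdots \mathcal {R}_n^{k_n}\right) ^*\left( \mathcal {R}_1^{k_1}\cdots \mathcal {R}_n^{k_n}\right) \) is \(\prod _j (\xi _j/\Vert \mathbf {\xi }\Vert )^{2k_j}\), so (4) reduces to the multinomial theorem applied to the unit vector \(\mathbf {\xi }/\Vert \mathbf {\xi }\Vert \). A small NumPy sketch of this check (illustrative, not part of the original method):

```python
import numpy as np
from itertools import product
from math import factorial

def check_identity(K, n, trials=100):
    """Verify sum_{|k|=K} K!/k! * prod u_j^(2 k_j) = 1 for random unit
    vectors u = xi/||xi||, i.e. Theorem 1 expressed in Fourier space."""
    multi_indices = [k for k in product(range(K + 1), repeat=n) if sum(k) == K]
    rng = np.random.default_rng(0)
    for _ in range(trials):
        u = rng.normal(size=n)
        u /= np.linalg.norm(u)                 # random direction on the sphere
        total = sum(
            factorial(K) / np.prod([factorial(ki) for ki in k])
            * np.prod(u ** (2 * np.array(k)))
            for k in multi_indices
        )
        assert np.isclose(total, 1.0)

check_identity(K=4, n=2)
```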

Definition 2

(The Kth order Riesz transform) The Kth order Riesz transform \(\mathcal {R}^{(K)}\mathop {:} L^2(\mathbb {R}^n)\rightarrow (L^2(\mathbb {R}^n))^{p(K,n)}\) with \(p(K,n) = \left( {\begin{array}{c}K+n-1\\ n-1\end{array}}\right) \) is defined as a vector with single components

$$\begin{aligned} \mathcal {R}^{\varvec{k}}f = \sqrt{\frac{K!}{k_1!\cdots k_n!}}\mathcal {R}_1^{k_1}\cdots \mathcal {R}_n^{k_n}f \end{aligned}$$
(5)

Considering Theorem 1, the higher order Riesz transform preserves the inner product and norm. More general results exist for arbitrary \(L^p\) spaces, showing boundedness of the higher order Riesz transforms (with the notable exceptions of \(p=1,+\infty \)); these are not in the focus of image or higher dimensional data processing.

Corollary 1

For all \(f,g\in L^2(\mathbb {R}^n)\) it holds:

$$\begin{aligned}{} & {} \left\langle \mathcal {R}^{(K)}f,\mathcal {R}^{(K)}g\right\rangle _{(L^2(\mathbb {R}^n))^{p(K,n)}} \end{aligned}$$
(6)
$$\begin{aligned}{} & {} \quad = \sum _{|\varvec{k}|=K} \left\langle \mathcal {R}^{\varvec{k}}f,\mathcal {R}^{\varvec{k}}g\right\rangle _{L^2(\mathbb {R}^n)} \end{aligned}$$
(7)
$$\begin{aligned}{} & {} \quad = \left\langle f,g\right\rangle _{L^2(\mathbb {R}^n)} \end{aligned}$$
(8)

which, by setting \(f=g\), yields

$$\begin{aligned} \left\| \mathcal {R}^{(K)}f\right\| _{(L^2(\mathbb {R}^n))^{p(K,n)}} = \left\| f\right\| _{L^2(\mathbb {R}^n)} \end{aligned}$$
(9)
Fig. 1

Workflow of feature generation: first, an image of a rock is cropped and sampled into a dataset which consists of many patches. Each sample is then split into its different colour channels. Riesz kernels of higher order are applied on each channel of the sample which generates a set of images for each channel. These contain information about structure. The feature vector is finally calculated through either singular values or the Frobenius norm. Original Rock Image: Mica Schist, Senckenberg Naturhistorische Sammlungen Dresden, Inv.-Nr. MMG: PET BD 404165

3 Riesz Features in Machine Learning

This section introduces the workflow of feature generation, beginning with the input data. The images taken into account are coloured scans of polished rock thin sections in high resolution (around 4200 \(\times \) 2800 pixels). The whole classification pipeline including preprocessing is shown graphically in Fig. 1. The following subsections describe the process in more detail.

3.1 Preprocessing

In order to generate an appropriate dataset, the images are first rotated and cropped. In particular, the cut edges of the material might disturb the calculation of the real image features in the subsequent process, because of artefacts which arise from the cutting of the rocks. Heavily damaged areas in the image also need to be removed from the dataset. This results in a set of RGB (24 bit) images with a resolution of approximately 3500 \(\times \) 2200 pixels. This image data is then sampled into smaller patches. Afterwards the channels are split in order to process them independently of each other. This ensures the detection of specific properties which occur in only one channel. These colour specific features are merged again in the final feature vector.

The exact parameters for the patches depend on several considerations such as the quality of the data, computational power and storage, as well as the homogeneity of the underlying image data. Because the further computations involve a significant number of Fourier transforms, square patches with side lengths that are powers of two are recommended.

After preprocessing the input data, one data point consists of a set of three square patches with one channel each. To increase the amount of input data for the upcoming training process, the patches have overlapping areas. This can be realized with a classic stride approach, where the stride is smaller than the patch size.
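A minimal sketch of this sampling step, assuming NumPy and illustrative parameter values (the experiments in Sect. 4 use a stride of 10 pixels):

```python
import numpy as np

def sample_patches(channel, patch_size=256, stride=128):
    """Extract square patches from a single-channel image. A stride
    below patch_size yields overlapping patches and thereby
    synthetically enlarges the dataset."""
    h, w = channel.shape
    patches = [
        channel[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches)
```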

In the case of the given dataset, which was created with photographs in a dedicated digitization setup with optimized lighting conditions, no further pre-filtering of the images was done. For mixed datasets, an augmentation step to unify the image characteristics is recommended.

3.2 Application of Higher Order Riesz transform

A filter bank consisting of different higher order Riesz transforms is the central aspect of the new approach. Since all components of the transform are defined in Fourier space (see (3)), it is best practice to do all computations in the frequency domain using the convolution theorem. This turns the convolution of the Riesz filter kernel with the patch into a Hadamard product (element-wise product) in Fourier space. The filter bank with a fixed number of Riesz transforms can be precomputed in advance. Before the Riesz transform is applied, the DC component of the input signal has to be set to 0 to obtain pure structural information.
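A sketch of how such a precomputed filter bank might look for 2D patches, assuming NumPy and the normalization of Definition 2 (all names are illustrative); for \(n=2\) there are \(p(K,2)=K+1\) distinct multipliers of order K:

```python
import numpy as np
from math import factorial

def riesz_multipliers_2d(size, K):
    """Precompute the p(K,2) = K+1 Fourier-domain multipliers of the
    Kth order Riesz transform for square patches of side length size."""
    xi1 = np.fft.fftfreq(size).reshape(-1, 1)
    xi2 = np.fft.fftfreq(size).reshape(1, -1)
    norm = np.sqrt(xi1**2 + xi2**2)
    norm[0, 0] = 1.0                      # DC bin is zeroed out later anyway
    u1, u2 = xi1 / norm, xi2 / norm
    bank = []
    for k1 in range(K + 1):               # multi-indices (k1, k2) with k1 + k2 = K
        k2 = K - k1
        weight = np.sqrt(factorial(K) / (factorial(k1) * factorial(k2)))
        bank.append(weight * (-1j * u1)**k1 * (-1j * u2)**k2)
    return np.stack(bank)
```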

From a computational point of view, the application of a Riesz filter bank involves several 2D Fourier transforms (one forward transform for the patch; one inverse transform for each kernel in the filter bank) and Hadamard products (one for each kernel in the filter bank). This approach improves the theoretical complexity from \(\mathcal {O}(nN^2)\) (n convolutions of two matrices with N entries each) to \(\mathcal {O}((n+1)N\log {}N)\). Furthermore, Hadamard operations are generally handled efficiently by modern computer architectures.
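Continuing the sketch above, applying the bank to one patch then costs exactly one forward FFT plus one Hadamard product and one inverse FFT per kernel, matching the stated complexity:

```python
import numpy as np

def apply_filter_bank(patch, bank):
    """Apply precomputed Riesz multipliers to a single-channel patch:
    one forward FFT, then per kernel a Hadamard product and an
    inverse FFT."""
    f_hat = np.fft.fft2(patch)
    f_hat[0, 0] = 0.0                     # remove the DC component
    return np.array([np.fft.ifft2(m * f_hat).real for m in bank])
```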

From an image processing point of view, the Riesz transform is an edge detector combined with a fractional Laplace operator [19, Sec. 2.6]. Consecutive applications therefore result in images which resemble higher derivatives. Such operators can be interpreted as curvature or corner detectors [11].

3.3 Dimensional reduction

The output of the filter bank application is a set of transformed images for each channel of a sample. These contain structural information gained from the application of the Riesz transform. Because such images are not a suitable input for machine learning tools, the data needs to be reduced in size and dimension. Applying matrix norms like the Frobenius norm is a common method in the area of machine learning. These are applied to each Riesz kernel transformed image separately, so a whole image is reduced to a single number.

Another approach, which involves an additional degree of freedom, is the use of singular values or eigenvalues. Instead of norms based on singular values, like the Schatten, Ky Fan or spectral norm, it is possible to use the n largest singular values as features of one transformed image. Using more singular values may directly improve the accuracy of the classification. A drawback of using too many values is the growth of the input vector of the machine learning algorithm and thus a higher runtime of the training process.
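Both reductions are straightforward to implement; a minimal sketch, assuming NumPy and an illustrative parameter n_sv for the number of retained singular values:

```python
import numpy as np

def reduce_image(transformed, n_sv=4):
    """Reduce one Riesz-transformed image either to its Frobenius norm
    or to its n_sv largest singular values."""
    frobenius = np.linalg.norm(transformed)   # a single number per image
    # np.linalg.svd returns the singular values in descending order.
    singular_values = np.linalg.svd(transformed, compute_uv=False)[:n_sv]
    return frobenius, singular_values
```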

After computing the dimensional reduction values of the single transformed images, these values are combined into a vector describing one channel of the sample. Combining these vectors with the results of the other colour channels leads to a final feature vector which describes the whole colour patch from Sect. 3.1.

3.4 Possible machine learning tools

Using the above mentioned technique, each patch of the original data provides a feature vector of fixed length. Additionally, to add more general information about the colour, the standard deviation and mean of each channel are added to the vector. This set of values describes the structure of the small patch in enough detail to use machine learning tools. The feature vector approach has the advantage that a variety of different tools can be used, from classic support vector machine (SVM) techniques to fully connected neural networks.
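Combining the previous sketches, the assembly of the final feature vector for one RGB patch could look as follows (apply_filter_bank refers to the sketch in Sect. 3.2; all names are illustrative):

```python
import numpy as np

def feature_vector(patch_rgb, bank, n_sv=4):
    """Assemble the feature vector of one RGB patch: per channel, the
    n_sv largest singular values of every Riesz-filtered image, plus
    the channel mean and standard deviation for colour information."""
    features = []
    for c in range(3):                                # channels processed separately
        channel = patch_rgb[..., c].astype(float)
        responses = apply_filter_bank(channel, bank)  # sketch from Sect. 3.2
        for r in responses:
            features.extend(np.linalg.svd(r, compute_uv=False)[:n_sv])
        features.extend([channel.mean(), channel.std()])
    return np.asarray(features)
```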

The basic idea of an SVM is to find hyperplanes in the data space that separate the classes [3]. For non-linearly distributed data, kernel functions can be applied to project the data into a feature space before separation. For the data in the present context, a linear kernel is sufficient. An SVM can only distinguish between two classes, which it separates by a hyperplane; for multi-class problems, several pairwise comparisons of two classes each must therefore be performed.
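A minimal sketch of this setup with scikit-learn, whose SVC internally performs such pairwise (one-vs-one) comparisons; the placeholder data and the 70/30 split are illustrative and mirror the experimental protocol of Sect. 4:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data standing in for the feature vectors of Sect. 3.3:
# 500 samples, 54-dimensional features, 18 rock classes (illustrative sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 54))
y = rng.integers(0, 18, size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
clf = SVC(kernel="linear")        # a linear kernel suffices for this data
clf.fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```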

Fully connected neural networks [9] are an alternative approach to solving classification problems with feature vectors. In this case a network with two hidden layers is used. From the input to the first hidden layer, the number of neurons drops to 50%. In the next step, an additional drop to one third of the first hidden layer is realized. Finally, there is the fully connected output layer with a softmax activation function.
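A sketch of such a network in PyTorch; the layer sizes follow the 50% and one-third reductions described above, while the hidden-layer activation (ReLU here) is our assumption, since only the softmax output is specified:

```python
import torch.nn as nn

def build_classifier(in_dim, n_classes):
    """Fully connected network with two hidden layers: the first halves
    the input dimension, the second reduces it to one third of that."""
    h1 = in_dim // 2
    h2 = h1 // 3
    return nn.Sequential(
        nn.Linear(in_dim, h1), nn.ReLU(),   # drop to 50% of the input size
        nn.Linear(h1, h2), nn.ReLU(),       # drop to one third of the first hidden layer
        nn.Linear(h2, n_classes),
        nn.Softmax(dim=1),                  # softmax output activation
    )

# Illustrative sizes: a 54-dimensional feature vector and 18 rock classes.
model = build_classifier(in_dim=54, n_classes=18)
```

Note that when training with nn.CrossEntropyLoss, the explicit softmax layer would typically be omitted, since that loss applies it implicitly.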

4 Experimental Results

The provided data is a set of 36 high resolution RGB images from 18 different classes of rocks. Such a small dataset is a challenging problem in machine learning. Therefore, it is important to synthetically increase the amount of data in the preprocessing step with a small stride of 10 pixels. For all training processes the dataset was divided into 70% training data and 30% validation data. This partition improves the chances for a better generalization of the networks. For all tests and results, a setup with an Intel i9-9900KF and a single NVIDIA RTX 2070 was used, which allowed the use of GPU enabled training algorithms.

Throughout all experiments, the runtime for creating the filter bank can be omitted, because this is done only once per dataset.

Table 1 Results of the feature experiments. The tables show the different accuracy results when a single parameter is changed while the other parameters are fixed. All accuracy values represent the median of 5 distinct training runs to avoid inaccuracies caused by the specific partition into training and validation data. The comparison is made between fully connected neural networks (NN) and support vector machines (SVM). The number of features corresponds to the size of the feature vector

4.1 Experimental Baseline

As a comparison to the new approach, the convolutional neural network (CNN) ResNet50 is chosen, as it is a widespread network structure used for image classification [10]. In contrast to classic feed forward networks, which are depth-limited in their practical trainability due to the vanishing gradient problem, the ResNet architecture allows training of more than 1000 layers. This is made possible by the introduction of residual connections which pass the identity across multiple layers, preventing both vanishing gradients and the loss of important information across many layers. Figure 2 gives an impression of the learning process. Validation and training curves follow the ordinarily expected behaviour.

Fig. 2

Training progress of ResNet50 using the rock cross-section data. After 1300 iterations the training stopped. Validation and training curves of loss and accuracy show ordinary behaviour

Table 2 Results comparing different techniques for the classification of the entire dataset. For the fully connected neural networks and the support vector machine, the feature vector based on 4 singular values and a Riesz transform of order 4 was used. The patch size was chosen to be comparable to ResNet50, which requires an exact input size of 224 \(\times \) 224. All values represent the median of 5 consecutive experimental runs with different seeds of the random number generator

4.2 Optimal Feature Setup

The approach introduced in Sect. 3 allows a range of parameters to be set. Several experiments were conducted to find the optimal setup for the given dataset. The results are summarized in Table 1. To reduce the influence of errors based on the random selection of data for training and validation, the results each represent the median of five independent training passes with different partitions. In addition to the Riesz features, the mean and standard deviation of all channels of each patch were included in the feature vector to provide pure colour information about the rock cross-section image.

The first experiment studies the influence of the patch size. Table 1A clearly shows that the accuracy increases with larger patch sizes. This is a natural phenomenon considering heterogeneous image data: bigger patches contain more structural areas of the cross-section images. A drawback of too large patches is the higher computational cost of feature generation. With each step, the algorithm is fed four times as much data per patch, which influences the runtime of the Fourier transforms and Hadamard products.

The experiment shown in Table 1B examines the influence of different orders of the Riesz transform. Higher orders give a more detailed response on structural information and thus better results. The drawback is the significantly higher computational cost: every additional order adds a complete set of filters which needs to be applied per data channel and inverse Fourier transformed. The experiments show that already a small number of Riesz filters gives enough structural information to reliably classify images.

The last experiment, shown in Table 1C, studies the number of singular values used. While a single singular value would give the spectral norm, more values give a more detailed response of the single Riesz filters. On a CPU, the computational cost of calculating singular values is manageable, and a higher number of values does not significantly increase the runtime. On a GPU, on the other hand, these calculations are not optimized and decrease efficiency. The experiments demonstrate that SVMs benefit from more values, while NNs are less influenced by their number.

4.3 Runtime Comparison

Besides feature generation and accuracy, the computational costs of the training process need to be taken into account when comparing the approaches. Table 2 gives an impression of the direct comparison between a feature vector classified by machine learning algorithms and a state-of-the-art convolutional neural network. Here, the cost of the feature calculations is listed separately. The image size in this experiment was changed to a non-ideal (not a power of two) size of 224 \(\times \) 224 to compare the runtimes against an unmodified ResNet50.

5 Conclusion

In this paper, a new approach using feature vectors based on the higher order Riesz transform in combination with singular values was introduced. Next to the general proof of principle, a direct comparison shows competitive accuracy results. We see several possible applications in the field of classifying structural image data such as textures. Independence from high performance GPUs could be useful in the field of embedded systems. Furthermore, the approach is not restricted to any dimensionality of the data, and thus higher dimensional data could be taken into account as well.

In the case of higher dimensional data, matrices are replaced by tensor approaches. This requires further discussion of the reduction step, because the classical SVD does not directly generalize to tensors. Solutions could be alternative reduction approaches or the higher order SVD (HOSVD) [6].

Further research needs to be done with bigger datasets containing hundreds or even thousands of images. Especially in this case we expect lower runtimes in comparison to CNNs while maintaining comparable accuracies. Such an investigation could also allow better statements about the generalization property of the trained machine learning tools, i.e. the model's ability to adapt properly to new, previously unseen but similar data.

Additionally, the new concept needs to be tested with broader datasets which are not restricted to images of rock thin sections recorded under ideal conditions. We expect high accuracy even for noise-corrupted images, because of the filtering component of the Riesz transform.