Introduction

Image classification using various machine learning methods is attracting growing interest due to its crucial impact on sectors such as medicine, surveillance and agriculture [1,2,3]. With the exponential growth of data and images, classification helps to extract essential information [4]. Automating classification relieves the burden of once-laborious tasks, while access to advanced technologies makes it easier to implement and explore different methods. The need for accurate decisions constantly drives researchers to explore cutting-edge methods, from the careful selection of databases, through the pre-processing of data, to the use of artificial intelligence, the enhancement of neural network models [5, 6] and their implementation for various applications [7]. In this paper, we present a new classification approach based on a database of stereoscopic images [8,9,10,11]. Pairs of stereo images are often used in virtual reality, 3D movies and other applications requiring depth perception, while monoscopic images are commonly used for conventional visual presentations [12, 13]. Stereoscopic images surpass monoscopic images in providing richer, more accurate information for image classification. This superiority stems from their ability to capture the depth and three-dimensional structure of scenes [14, 15]. Several factors explain the advantage of stereo images in classification: the deduction of distances between objects, essential for discerning similar elements in a scene; the extraction of 3D features to differentiate objects that appear identical in 2D; the easier separation of superimposed objects thanks to depth information; and finally, the enhanced realism simulating human perception, which improves classification performance by mimicking the human cognitive process in object recognition [16, 17].

Before presenting several works based on stereo images for various applications, we would like to emphasize that pre-processing stereo images is a crucial step in ensuring the quality of the input data used in classification tasks. Several methods are commonly employed in stereo image pre-processing: geometric rectification, filtering and smoothing, resolution reduction, artifact removal, and database balancing. In [18], the authors address the prediction of stereoscopic visual saliency, presenting an innovative model that takes into account the pop-out effect and the comfort zone of stereo vision. In [19], the authors address the complex challenge of detecting and classifying objects in stereoscopic images, proposing an iterative method that improves both the detection of salient objects and their classification. The approach is divided into two distinct stages: the first presents a 3D saliency detection method; in the second, clustering is performed on a set of key features from the foreground and background planes, using location information. The authors in [20] propose a novel method for three-dimensional object detection in the context of autonomous driving; their approach takes advantage of both sparse and dense information.

The quest for accurate decisions prompted us to examine an innovative approach. In this paper, we introduce for the first time a new classification method exploiting stereo images. This is based on two state-of-the-art approaches: octonion transforms and neural networks. Overall, our methodology is divided into two key stages, each playing an essential role in the approach. The first stage focuses on data pre-processing, with two major objectives. First, we set about building a balanced and condensed database. To this end, we carry out manual operations such as data cleaning, scaling and sample augmentation. These rigorous actions enable us to structure the final database, made up of 600 color stereoscopic images (512 × 512 × 3), distributed in three main classes. The second facet of this step consists in extracting the Krawtchouk octonion moments for each image in the database. This approach enables us to acquire global features, while capturing the salient properties of all stereoscopic images. The result is a database of image moments, with dimensions of 128 × 128 × 1. The second step involves training the CNN model on all the moment images. Before describing the structure of this paper in detail, it is worth highlighting its salient points:

  1.

    Introducing stereo images into classification applications: This innovative approach aims to exploit the advantages of stereo images to improve classification performance in a variety of applications.

  2.

    Introducing stable Krawtchouk octonion moments for holistic stereo image preprocessing: We introduce adaptable Krawtchouk moments as a preprocessing method for stereo images. This holistic approach enables us to extract global information while preserving the essential features of stereo images, thus enhancing data quality upstream of classification.

  3.

    Suggesting an innovative classification approach that merges octonion moment transformations with neural networks.

Motivation

Our motivation lies in combining two state-of-the-art approaches, namely Krawtchouk octonion moment transformations and neural networks, to develop a new classification method based on stereoscopic images. This innovative methodology exploits the potential of stereoscopic images and aims to improve classification accuracy. In summary, our work seeks to introduce a new image classification paradigm by leveraging the advantages of stereoscopic images and exploiting Krawtchouk octonion moments as a crucial pre-processing step to improve data quality prior to classification.

Organizations

After a general introduction and a review of related works, the third section looks at discrete and stable orthogonal Krawtchouk polynomials adapted to octonions. The fourth section details our proposed classification method. In the fifth section, we highlight our simulation results. Finally, the sixth section presents our conclusions and avenues for future research.

Related works

In summary, each row in Table 1 describes a vision-based classification method. Each method is characterized by the type of vision, the database used for training and/or testing, the metrics, the polynomial basis used to extract features, and finally, the specific class of moments employed in the classification process. It should be emphasized that the theory of moments is exceptionally effective in a number of areas, including data security, localization, and reconstruction [26,27,28].

Table 1 State of the art in moment-based classification

Stable octonion Krawtchouk moments

In this part of the paper, we focus on the two essential elements underlying the proposed stable octonion Krawtchouk moments (SOKM). The first aspect highlights the octonion algebra and its characteristic properties. The second is devoted to a detailed presentation of the SOKM, providing a comprehensive perspective on the concepts addressed in this study.

Octonion preliminary

Octonion is a fascinating mathematical concept that represents a special type of hypercomplex number. It is considered an extension of complex numbers and quaternions.

Complex numbers, formed by a real part and an imaginary part, are represented by the expression a + bi, where a and b are real numbers and i is the imaginary unit. Quaternions, on the other hand, are an extension of complex numbers and include three imaginary parts, denoted i, j and k. They are represented by the expression \(a \, + \, bi \, + \, cj \, + \, dk\) where \(a, \, b, \, c{\text{ and }}d\) are real numbers and i, j and k are distinct imaginary units that satisfy specific relations [29, 30].

The idea behind octonions is to take this extension of complex numbers even further. An octonion is formed by one real part and seven imaginary parts with units \({\varvec{i}}\), \({\varvec{j}}\), \({\varvec{k}}\), \(\user2{\ell }\), \(\user2{i\ell }\), \(\user2{j\ell }\), and \(\user2{k\ell }\), with real coefficients \(\lambda_{i} \,\left\{ {i = 0,1,...,7} \right\}\). The octonion is defined by Eq. (1), where the imaginary units obey the rules presented in Table 2 [31].

$$ \lambda = \lambda_{0} + \lambda_{1} {\varvec{i}} + \lambda_{2} {\varvec{j}} + \lambda_{3} {\varvec{k}} + \lambda_{4} \user2{\ell } + \lambda_{5} \user2{i\ell } + \lambda_{6} \user2{j\ell } + \lambda_{7} \user2{k\ell } $$
(1)
Table 2 Octonion multiplication table

An octonion \(\lambda\) can be expanded into a real part and an imaginary part: \(\lambda = {\text{Re}} (\lambda ) + {\text{Im}} (\lambda )\) where

$$ {\text{Re}} (\lambda ) = \lambda_{0} $$
(2)
$$ {\text{Im}} (\lambda ) = \lambda_{1} {\varvec{i}} + \lambda_{2} {\varvec{j}} + \lambda_{3} {\varvec{k}} + \lambda_{4} \user2{\ell } + \lambda_{5} \user2{i\ell } + \lambda_{6} \user2{j\ell } + \lambda_{7} \user2{k\ell } $$
(3)

The conjugate \(\overline{\lambda }\) of octonion \(\lambda\) is defined as

$$ \begin{aligned} \overline{\lambda } &= {\text{Re}} (\lambda ) - {\text{Im}} (\lambda ) \hfill \\ &= \lambda_{0} - \lambda_{1} {\varvec{i}} - \lambda_{2} {\varvec{j}} - \lambda_{3} {\varvec{k}} - \lambda_{4} \user2{\ell } - \lambda_{5} \user2{i\ell } - \lambda_{6} \user2{j\ell } - \lambda_{7} \user2{k\ell } \end{aligned} $$
(4)

and its modulus \(|\lambda |\) is defined as

$$\begin{aligned} |\lambda | &= \sqrt {{\text{Re}} (\lambda )^{2} + |{\text{Im}} (\lambda )|^{2} }\\& = \sqrt {\lambda_{0}^{2} + \lambda_{1}^{2} + \lambda_{2}^{2} + \lambda_{3}^{2} + \lambda_{4}^{2} + \lambda_{5}^{2} + \lambda_{6}^{2} + \lambda_{7}^{2} }\end{aligned} $$
(5)

An octonion with \(|\lambda | = 1\) is a unit octonion, and an octonion with \(\lambda_{0} = 0\) is a pure octonion. Two octonions \(\lambda\) and \(\lambda^{\prime}\) satisfy the following properties:

$$ \left\{ {\begin{array}{*{20}c} {\lambda .(\lambda .\lambda^{\prime}) = (\lambda .\lambda ).\lambda^{\prime}} \\ {\lambda .(\lambda^{\prime}.\lambda ) = (\lambda .\lambda^{\prime}).\lambda } \\ {\lambda^{\prime}.(\lambda .\lambda ) = (\lambda^{\prime}.\lambda ).\lambda } \\ \end{array} } \right. $$
(6)
$$ \lambda .\lambda^{\prime} \ne \lambda^{\prime}.\lambda $$
(7)
$$ \overline{{\lambda .\lambda^{\prime}}} = \overline{{\lambda^{\prime}}} .\overline{\lambda } $$
(8)

Equations (6), (7), and (8) describe fundamental properties of octonion multiplication and illustrate some of its distinctive features. Equation (6) expresses the alternativity of octonion multiplication: associativity holds whenever two of the three factors coincide, but octonion multiplication is not associative in general. Equation (7) states that octonion multiplication does not satisfy the commutativity law, i.e., the order of the factors matters. Equation (8) shows that conjugation reverses the order of a product: the conjugate of a product equals the product of the conjugates in reverse order. The conjugate of an octonion is obtained by changing the sign of the imaginary parts, i.e., the real part remains unchanged and each imaginary part is negated.
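These algebraic properties can be checked numerically. The sketch below is an illustrative implementation (not the paper's code) of octonion conjugation and multiplication via the Cayley–Dickson construction, which builds octonions from pairs of quaternions; the sign convention chosen here is one of several equivalent ones.

```python
def cd_conj(x):
    # Octonion conjugate (Eq. 4): keep the real part, negate the imaginary parts.
    return [x[0]] + [-v for v in x[1:]]

def cd_mul(x, y):
    # Cayley-Dickson product: (a, b)(c, d) = (ac - d*b, da + bc*),
    # applied recursively (reals -> complex -> quaternions -> octonions).
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    left = [p - q for p, q in zip(cd_mul(a, c), cd_mul(cd_conj(d), b))]
    right = [p + q for p, q in zip(cd_mul(d, a), cd_mul(b, cd_conj(c)))]
    return left + right

# Basis octonions: e(0) = 1, e(1) = i, e(2) = j, ..., e(7) = kl
e = lambda k: [1.0 if t == k else 0.0 for t in range(8)]

print(cd_mul(e(1), e(2)) == cd_mul(e(2), e(1)))    # False: non-commutative (Eq. 7)
lhs = cd_mul(cd_mul(e(1), e(2)), e(4))
rhs = cd_mul(e(1), cd_mul(e(2), e(4)))
print(lhs == rhs)                                  # False: not associative in general
x = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(cd_mul(x, cd_mul(x, e(4))) == cd_mul(cd_mul(x, x), e(4)))   # True: alternativity (Eq. 6)
```

One can also verify Eq. (8) with this sketch: `cd_conj(cd_mul(x, y))` equals `cd_mul(cd_conj(y), cd_conj(x))`.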

Stable octonion Krawtchouk Moments

This section is dedicated to presenting the Krawtchouk polynomial, its corresponding moments, and the proposed Krawtchouk octonion stable moments.

Krawtchouk polynomials and moments

The Krawtchouk polynomial \(K_{n}^{p} \left( x \right)\) of order n, with parameter p, is expressed in terms of the hypergeometric series as defined in Eq. (9). Here, \(p \in \left( {0,1} \right)\) is the parameter of the Krawtchouk polynomials, and \({}_{2}F_{1}\) denotes the hypergeometric series, defined as follows:

$$ K_{n}^{p} \left( x \right) = {}_{2}F_{1} \left( { - n, - x; - N + 1;\frac{1}{p}} \right)\, $$
(9)
$$_{2} F_{1} \left( {a,b;c;z} \right) = \sum\limits_{k = 0}^{\infty } {\frac{{(a)_{k} (b)_{k} }}{{(c)_{k} }}} \,\,\frac{{z^{k} }}{k!} $$
(10)

By utilizing Eqs. (9) and (10), it is possible to rephrase the Krawtchouk polynomials as follows:

$$ K_{n}^{p} \left( x \right) = \sum\limits_{k = 0}^{n} {\frac{{( - n)_{k} \,( - x)_{k} }}{{( - N + 1)_{k} \,k!}}} \,\left( \frac{1}{p} \right)^{k} $$
(11)

In this context, \(\left( a \right)_{k}\) represents the Pochhammer symbol. To maintain the orthogonality property of the Krawtchouk polynomials, the following formula is employed:

$$ \sum\limits_{x = 0}^{N - 1} {K_{n}^{p} \left( x \right)} K_{m}^{p} \left( x \right)\omega_{k} \left( x \right) = \rho_{k} \left( n \right)\delta_{nm} $$
(12)

Here, \(\rho_{k} \left( n \right)\) corresponds to the squared norm of the Krawtchouk polynomials, and \(\omega_{k} \left( x \right)\) represents the weight function associated with the Krawtchouk polynomials.

$$ \rho_{k} \left( n \right) = \left( { - 1} \right)^{n} \left( {\frac{1 - p}{p}} \right)^{n} \frac{n!}{{\left( { - N + 1} \right)_{n} }} $$
(13)
$$ \omega_{k} \left( x \right) = \left( {\begin{array}{*{20}c} {N - 1} \\ x \\ \end{array} } \right)p^{x} \left( {1 - p} \right)^{N - x - 1} $$
(14)

To address the numerical fluctuations in computing Krawtchouk polynomial coefficients using Eq. (11), a weighted and normalized representation of the polynomials is utilized. The orthonormalized Krawtchouk polynomials of order n are defined as follows:

$$ \tilde{K}_{n}^{p} \left( x \right) = K_{n}^{p} \left( x \right) \times \,\,\,\sqrt {\frac{{\omega_{k} (x)}}{{\rho_{k} (n)}}} $$
(15)
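As a quick check of Eqs. (11)–(15), the orthonormalized polynomials can be evaluated directly from the finite sum and verified to satisfy the normalized form of Eq. (12), \(\sum_{x} \tilde{K}_{n}^{p}(x)\tilde{K}_{m}^{p}(x) = \delta_{nm}\). The following is an illustrative sketch using direct summation; for large N, a stable recursive scheme (such as Algorithm 1 below) is preferable.

```python
import math

def poch(a, k):
    # Pochhammer (rising factorial) (a)_k
    return math.prod(a + i for i in range(k))

def krawtchouk(n, x, p, N):
    # K_n^p(x) on the grid {0, ..., N-1} via the finite sum (Eq. 11)
    M = N - 1
    return sum(poch(-n, k) * poch(-x, k) / (poch(-M, k) * math.factorial(k))
               * (1.0 / p) ** k for k in range(n + 1))

def k_tilde(n, x, p, N):
    # Orthonormalized polynomial (Eq. 15): weight (Eq. 14) over squared norm
    # (Eq. 13, written here in its simplified positive form).
    M = N - 1
    w = math.comb(M, x) * p ** x * (1 - p) ** (M - x)
    rho = ((1 - p) / p) ** n * math.factorial(n) * math.factorial(M - n) / math.factorial(M)
    return krawtchouk(n, x, p, N) * math.sqrt(w / rho)

# Orthonormality check: sum_x K~_n(x) K~_m(x) = delta_nm
N, p = 8, 0.5
g = lambda n, m: sum(k_tilde(n, x, p, N) * k_tilde(m, x, p, N) for x in range(N))
print(round(g(2, 2), 6), round(abs(g(2, 3)), 6))   # 1.0 0.0
```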

Discrete orthogonal moments are known for their ability to facilitate signal and image reconstruction for 1D signals, 2D images, and 3D volumes. In this context, to compute the 2D Krawtchouk moments \(KM_{nm}\) of order \(\left( {n + m} \right)^{th}\) for a 2D image \(F(x,y)\) of size \(N \times N\), the following function is applied:

$$ KM_{nm} = \sum\limits_{x = 0}^{N - 1} {\sum\limits_{y = 0}^{N - 1} {\tilde{K}_{n}^{p} (x,N)\tilde{K}_{m}^{p} (y,N)F(x,y)} } ;\,\,\,\,\,\,\,\,\,\,n,m = 0,1,2,...,N - 1 $$
(16)

The reconstruction of the 2D signal can be defined as follows:

$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F} (x,y)\, = \sum\limits_{n = 0}^{N - 1} {\sum\limits_{m = 0}^{N - 1} {\tilde{K}_{n}^{p} (x,N)\tilde{K}_{m}^{p} (y,N)} } KM_{nm} $$
(17)

A careful analysis of Eqs. (16) and (17) reveals that moment calculation and reconstruction are closely linked to the control parameter p of the Krawtchouk polynomial. Figure 1 clearly highlights the influence of the choice of p on the polynomial design.

Fig. 1

Plotting a 2D graph depicting the orthonormalized Krawtchouk polynomials of order 1000 with N = 1000

Krawtchouk polynomials can be global or local in nature. Therefore, a judicious adaptation of the parameter p is required to ensure equitable importance to all pixels in the image.

Based on the conclusions drawn from the figure, we opted for p = 0.5 to obtain convincing results when extracting global image features. This selection proved crucial in obtaining one of the best renderings for stereo image classification, demonstrating the relevance of this strategy for improving overall process performance.
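To illustrate Eqs. (16) and (17), the moment transform can be written as a pair of matrix products: with \(A[n,x] = \tilde{K}_{n}^{p}(x)\), the moments are \(KM = A\,F\,A^{T}\) and the reconstruction is \(\hat{F} = A^{T}\,KM\,A\). Because A is orthonormal, full-order reconstruction is exact up to floating-point error. A small NumPy sketch (direct evaluation, suitable only for modest N; p = 0.5 as chosen above):

```python
import math
import numpy as np

def ktilde_matrix(N, p=0.5):
    """A[n, x] = K~_n^p(x): orthonormalized Krawtchouk basis (Eqs. 11, 13-15)."""
    M = N - 1
    poch = lambda a, k: math.prod(a + i for i in range(k))
    A = np.empty((N, N))
    for n in range(N):
        rho = ((1 - p) / p) ** n * math.factorial(n) * math.factorial(M - n) / math.factorial(M)
        for x in range(N):
            K = sum(poch(-n, k) * poch(-x, k) / (poch(-M, k) * math.factorial(k))
                    * (1.0 / p) ** k for k in range(n + 1))
            w = math.comb(M, x) * p ** x * (1 - p) ** (M - x)
            A[n, x] = K * math.sqrt(w / rho)
    return A

N = 16
A = ktilde_matrix(N)
F = np.random.default_rng(0).random((N, N))   # toy "image"
KM = A @ F @ A.T                              # Eq. (16): 2D Krawtchouk moments
F_rec = A.T @ KM @ A                          # Eq. (17): reconstruction
print(np.allclose(F, F_rec))                  # True: full-order reconstruction is exact
```

In the paper's setting, keeping only a truncated block of low-order moments of a 512 × 512 image is what yields the compact 128 × 128 × 1 feature maps fed to the CNN.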

Stable octonion Krawtchouk moments

The SOKM in the Cartesian system is defined as follows:

Step 1 (octonion representation of stereo images): In this step, based on the definition of the octonion presented earlier, we adopt an approach that enables us to represent the stereo image in an integrated and concise way. To do this, we exploit the imaginary parts of the octonion representation in the Cartesian coordinate system to encode the six channels of the stereo images.

$$ \begin{gathered} F_{{{\text{stereo}}}} (x,y) = 0 + F_{1} (x,y){\varvec{i}} + F_{2} (x,y){\varvec{j}} + F_{3} (x,y){\varvec{k}} \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,+ F_{4} (x,y)\user2{\ell } + F_{5} (x,y)\user2{i\ell } + F_{6} (x,y)\user2{j\ell } + 0\user2{k\ell } \hfill \\ \end{gathered} $$
(18)

Pure imaginary units \({\varvec{i}}, \, {\varvec{j}}, \, {\varvec{k}}, \, \user2{\ell }, \, \user2{i\ell }{\text{ and }}\user2{j\ell }\) are used to represent specific channels. Specifically, \(F_{1} \left( {x, \, y} \right), \, F_{2} \left( {x, \, y} \right){\text{ and }}F_{3} \left( {x, \, y} \right)\) correspond to the red, green and blue channels of the right stereo image, while \(F_{4} \left( {x, \, y} \right), \, F_{5} \left( {x, \, y} \right){\text{ and }}F_{6} \left( {x, \, y} \right)\) correspond to the red, green and blue channels of the left stereo image.
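In practice, Eq. (18) amounts to stacking the six color channels of the stereo pair into the imaginary components of an octonion-valued array, with the real and \(\user2{k\ell }\) components set to zero. A minimal sketch (the function name and the (H, W, 8) array layout are our own illustration):

```python
import numpy as np

def stereo_to_octonion(right_rgb, left_rgb):
    """Octonion image of Eq. (18), stored as (H, W, 8):
    component 0 (real part) and component 7 (k*l) are zero;
    components 1-3 (i, j, k) hold the right R, G, B channels;
    components 4-6 (l, i*l, j*l) hold the left R, G, B channels."""
    H, W, _ = right_rgb.shape
    F = np.zeros((H, W, 8), dtype=float)
    F[..., 1:4] = right_rgb
    F[..., 4:7] = left_rgb
    return F

rng = np.random.default_rng(0)
right = rng.random((4, 4, 3))
left = rng.random((4, 4, 3))
F = stereo_to_octonion(right, left)
print(F.shape)   # (4, 4, 8)
```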

Step 2 (calculation of stable Krawtchouk polynomials): In this step, we address an essential element: the calculation of the orthonormalized Krawtchouk polynomials \(\tilde{K}_{n}^{p} \left( x \right)\), which play a fundamental role in the definition of SOKMs in the Cartesian system. We define p as the specific control parameter of the Krawtchouk polynomial (KP). Algorithm 1 highlights the fast and stable computation of \(\tilde{K}_{n}^{p} \left( x \right)\) based on a triangular distribution of the polynomial array of KP, as shown in Fig. 2 [32]:

Fig. 2

Triangular distribution of the KP polynomial array [32]

Algorithm 1

Stable calculation of KPs

Step 3 (calculating octonion moments): Since the commutativity law does not hold for octonion multiplication, the following formulations are used to define the OKM, where \({}^{R}OKM_{nm}\) and \({}^{L}OKM_{nm}\) are the right-side and left-side OKMs, respectively.

$$ \left\{ {\begin{array}{*{20}c} {{}^{R}OKM_{nm} = \sum\limits_{x = 0}^{N - 1} {\sum\limits_{y = 0}^{N - 1} {\tilde{K}_{n}^{p} (x)} } \tilde{K}_{m}^{p} (y)F_{stereo} (x,y)\,\mu \,;\,\,\,n,m = 0,1,2,...,N - 1} \\ {{}^{L}OKM_{nm} = \sum\limits_{x = 0}^{N - 1} {\sum\limits_{y = 0}^{N - 1} {\mu \,\tilde{K}_{n}^{p} (x)} } \tilde{K}_{m}^{p} (y)F_{stereo} (x,y);\,\,\,n,m = 0,1,2,...,N - 1} \\ \end{array} } \right. $$
(25)

In this paper, we choose the pure unit octonion \(\mu = \left( {{\varvec{i}} + {\varvec{j}} + {\varvec{k}} + \user2{\ell } + \user2{i\ell } + \user2{j\ell } + \user2{k\ell }} \right)/\sqrt 7\). Pure unit octonions are key elements in our approach.
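Because the \(\tilde{K}\) values are real, the double sum in Eq. (25) can be computed once per octonion component, with the pure unit octonion \(\mu\) applied afterwards: on the right for \({}^{R}OKM\) and on the left for \({}^{L}OKM\). The sketch below illustrates this observation; `cd_mul` is a generic Cayley–Dickson octonion product under one common sign convention, and the identity matrix stands in for a real orthonormal Krawtchouk basis purely for brevity.

```python
import math
import numpy as np

def cd_mul(x, y):
    # Generic Cayley-Dickson product (reals -> complex -> quaternions -> octonions)
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    conj = lambda v: [v[0]] + [-t for t in v[1:]]
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    return ([p - q for p, q in zip(cd_mul(a, c), cd_mul(conj(d), b))] +
            [p + q for p, q in zip(cd_mul(d, a), cd_mul(b, conj(c)))])

mu = [0.0] + [1.0 / math.sqrt(7)] * 7      # the pure unit octonion chosen in the text

def okm(F_oct, A):
    """Right- and left-side OKMs of Eq. (25).
    F_oct: (N, N, 8) octonion image; A: real orthonormal basis, A[n, x] = K~_n^p(x).
    Since A is real, the transform acts component-wise; mu is applied per coefficient."""
    KM = np.einsum('nx,xyc,my->nmc', A, F_oct, A)
    R = np.empty_like(KM)
    L = np.empty_like(KM)
    for n in range(KM.shape[0]):
        for m in range(KM.shape[1]):
            R[n, m] = cd_mul(list(KM[n, m]), mu)   # ^R OKM_nm
            L[n, m] = cd_mul(mu, list(KM[n, m]))   # ^L OKM_nm
    return R, L

rng = np.random.default_rng(1)
F = rng.random((4, 4, 8))
R, L = okm(F, np.eye(4))                   # identity as a placeholder orthonormal basis
print(R.shape, np.allclose(R, L))          # (4, 4, 8) False -- the two sides differ (Eq. 7)
```

Since octonions form a composition algebra and \(|\mu| = 1\), multiplying each coefficient by \(\mu\) preserves its modulus.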

Step 4 (stereo image reconstruction): The octonion image \(F_{stereo}\), defined in the Cartesian system, can be reconstructed using either its right-side OKMs or its left-side OKMs.

$$\left\{ {\begin{array}{*{20}c} {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{stereo} (x,y) = \sum\limits_{n = 0}^{N - 1} {\sum\limits_{m = 0}^{N - 1} {\tilde{K}_{n}^{p} (x)} } \tilde{K}_{m}^{p} (y)\,{}^{R}OKM_{nm} \,\mu \,;\,\,\,x,y = 0,1,2,...,N - 1} \\ {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{F}_{stereo} (x,y) = \sum\limits_{n = 0}^{N - 1} {\sum\limits_{m = 0}^{N - 1} {\mu \,\tilde{K}_{n}^{p} (x)} } \tilde{K}_{m}^{p} (y)\,{}^{L}OKM_{nm} ;\,\,\,x,y = 0,1,2,...,N - 1} \\ \end{array} } \right..$$
(26)

Proposed classification method

The classification of stereoscopic images using CNNs generally follows a similar approach to that of conventional 2D image classification. However, using the additional depth information present in stereoscopic images can improve classification accuracy. The main steps involved in classifying stereoscopic images using a CNN are as follows [33]:

  1.

    Data preparation: Pairs of stereoscopic images are used as input for the CNN. Each image pair comprises a left and a right view of the scene.

  2.

CNN architecture design: This includes deciding on the type of convolutional layers, pooling layers, and fully connected layers to use, as well as the number of filters and neurons in each layer.

  3.

    CNN training: The CNN is trained on a labeled dataset, comprising pairs of stereoscopic images and their corresponding class labels.

  4.

Validation and hyperparameter tuning: Hyperparameters such as the learning rate or the number of layers can be adjusted to improve classification performance.

  5.

    Classification of stereoscopic images: Once the model has been trained and validated, it can be used to classify new stereoscopic images.
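To make the data flow of steps 1–5 concrete, the sketch below runs a single untrained forward pass of a minimal CNN (one convolution, ReLU, 2 × 2 max pooling, one dense softmax layer) on a 128 × 128 × 1 moment image with three output classes. It is a NumPy illustration of the tensor shapes only; in practice a deep-learning framework would be used for training, and all weights here are random placeholders.

```python
import numpy as np

def conv2d(img, kernels):
    # Valid cross-correlation: img (H, W), kernels (K, k, k) -> (K, H-k+1, W-k+1)
    K, k, _ = kernels.shape
    H, W = img.shape
    out = np.zeros((K, H - k + 1, W - k + 1))
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * img[i:i + H - k + 1, j:j + W - k + 1]
    return out

def maxpool2(x):
    # 2 x 2 max pooling on (K, H, W)
    K, H, W = x.shape
    return x[:, :H // 2 * 2, :W // 2 * 2].reshape(K, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def forward(img, kernels, W_dense, b_dense):
    h = np.maximum(conv2d(img, kernels), 0.0)   # convolution + ReLU
    h = maxpool2(h).ravel()                     # pooling + flatten
    z = W_dense @ h + b_dense                   # fully connected layer
    e = np.exp(z - z.max())
    return e / e.sum()                          # softmax over the 3 classes

rng = np.random.default_rng(0)
img = rng.random((128, 128))                    # one 128 x 128 x 1 moment image
kernels = rng.normal(0, 0.1, (8, 5, 5))         # 8 random 5 x 5 filters (placeholders)
flat = 8 * 62 * 62                              # conv -> 124 x 124, pooled -> 62 x 62
W_dense = rng.normal(0, 0.01, (3, flat))
b_dense = np.zeros(3)
probs = forward(img, kernels, W_dense, b_dense)
print(probs.shape, round(float(probs.sum()), 6))   # (3,) 1.0
```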

In the following, we present the difference between the classical method of stereo image classification and the proposed method.

Classic classification method

Over time, the performance of CNNs in the field of stereo vision has proved remarkable. A particularly noteworthy application of CNNs in stereo matching was developed by Zbontar and LeCun. In their study [34], the authors describe in detail how they set up a CNN (Fig. 3) to process stereo images for stereo matching. Here is a summary of the architecture.

  1.

Input: Stereo images are fed to the network in the form of 9 × 9 grayscale image patches.

  2.

    Convolution layer: The first layer is a convolution layer with 32 convolution filters of size 5 × 5. This layer is responsible for feature extraction from the image patches.

  3.

    Fully connected layers: The outputs of the convolutional layers are then sent to the following fully connected layers. Layers 2 and 3 are fully connected layers with 200 neurons each. The output vectors of the convolutional layers are concatenated from the two channels (left and right) to form a single 400-dimensional vector.

  4.

    Layer 4 to Layer 7: These layers are also fully connected layers, each with 300 neurons. They continue to process the information extracted from the previous layers.

  5.

    Output layer: The final layer 8 is responsible for producing a distribution of good and bad match classes. It can be used to predict the disparity or depth of matching pixels in a stereo image pair.

Fig. 3

Network for stereo image processing [34]

Processing pairs of stereo images can be complex and time-consuming, especially when using deep neural networks. Stereo matching requires processing each image separately and extracting information from each image to estimate the corresponding disparity or depth. CNNs are often used for this task due to their ability to learn higher-level representations from input data. However, the complexity of deep neural networks can lead to significant computation times, particularly when training the network on large amounts of data. To reduce the processing time and optimize the process, several techniques can be used as shown in Fig. 4:

Fig. 4

Optimum technique for stereo image processing

To address the challenges posed by the complexity and computational time associated with traditional stereo-image pair processing, we propose an alternative approach that leverages octonion moments in conjunction with CNNs.

Suggested classification method and database

Database

Before beginning to present the proposed method, it is important to highlight the database used in this study. We used a database downloaded from the public Kaggle website, which contains more than 1,600 stereoscopic images divided into several classes. To construct a diversified database, we selected three classes, as illustrated in Fig. 5, and then carried out manual processing, which included image resizing and data augmentation to prepare a balanced database.

Fig. 5

Database components

Proposed classification method

Our method for classifying stereoscopic images follows the process shown in Fig. 6. First, we prepare a balanced database comprising three classes of stereoscopic images, thus ensuring adequate representativeness of our dataset. Second, we calculate the stable Krawtchouk moments using step 2, as presented in Section "Stable octonion Krawtchouk Moments". Third, we calculate the octonion moment matrices using step 3 of the same section; this step captures essential information about the visual characteristics of the images. Finally, in the fourth step, we feed these moment matrices into a convolutional neural network (CNN) model specifically designed for stereoscopic image classification. This approach exploits the wealth of information contained in the SOKMs to achieve an accurate, in-depth classification of stereoscopic images, offering a new perspective on this complex task.

Fig. 6

General diagram of the proposed classification method

To present our methodology in a comprehensive way, we constructed a detailed two-stage diagram (see Fig. 7). The first stage is dedicated to pre-processing, encompassing two major objectives. Firstly, we carry out manual operations such as data cleaning, scaling, and augmentation, with the aim of preparing a reduced and balanced database. This rigorous process results in a final database of 600 stereoscopic color images, each in 512 × 512 × 3 format, divided into three main categories. The second part of this step involves calculating the stable SOKM for each image in the database. This allows us to extract the global properties and describe the essential features of all stereoscopic images, creating a database containing 600 matrix moments, each with dimensions of 128 × 128 × 1. The second stage of our methodology is devoted to training the CNN model on all the moments extracted from the images. This training phase enables our CNN model to develop a thorough understanding of the stereoscopic features captured by the SOKM moments and to perform accurate classification of the stereoscopic images. Table 3 gives an overview of the structure of the model adopted, including the number of parameters and the shape of the output at each layer: (1) The layer (type) column lists the name and type of each layer in the model. The type can be "Input Layer", "Dense", or "Conv2D." (2) The output shape column describes the shape of the output produced by each layer. It includes the batch size as the first dimension, followed by the shape of the output tensor. For example, (None, 128) means that the output has a batch size of None (indicating that it can be variable) and a shape of (128). (3) The Param # column displays the number of parameters (weights and biases) associated with each layer. This number represents the layer's trainable parameters. If a layer has no parameters, it will display 0 in this column.

Fig. 7

Detailed diagram of the proposed classification method

Table 3 Model summary

By combining these two steps, our approach aims to open up new perspectives in the field of stereoscopic image classification by taking advantage of matrix moments and deep learning.

Results and discussion

The results section is articulated in two distinct parts, each aimed at deepening our understanding of the effectiveness of SOKM-based image representation in different aspects. The first part focuses on demonstrating this effectiveness using metrics such as mean square error (MSE) and structural similarity index (SSIM). These metrics are carefully chosen to highlight the quality of the image representation obtained with the SOKM approach. In addition, we highlight the efficiency of SOKM moments by examining the computation time required to obtain these stereo image moments, underlining the practical performance of the method. The second part of this section is dedicated specifically to illustrating the effectiveness of our approach for stereo image classification. To this end, we use crucial measures such as:

  i.

    Visual representation of accuracy and loss across epochs, offering a dynamic perspective on model performance throughout learning.

  ii.

    Use of confusion matrices to examine in detail the model's ability to discriminate between different classes.

  iii.

    In-depth evaluation of classifier performance metrics, including F1 score, recall, precision and accuracy. This multi-dimensional approach enables a more in-depth analysis of the model's performance in relation to different evaluation dimensions.

  iv.

    In-depth analysis of ROC curves.

By combining these two parts, our aim is to provide a holistic and in-depth understanding of the effectiveness of our approach, both from the point of view of image representation and that of stereoscopic classification.

Stereo image reconstruction: representation capacity and computation time

As part of our research into the ability to represent color stereo images, we undertake a comparative study using different types of moments: classic Krawtchouk moments, quaternion Krawtchouk moments and octonion Krawtchouk moments. The main objective is to evaluate the effectiveness and efficiency of each type of moment in representing color stereo images.

Before turning to the presentation of the results and their interpretation, we first highlight the differences between the methods of calculating moment matrices [35,36,37] using the three categories of moments mentioned (Fig. 8).

Fig. 8

Calculation of moment matrices using the three classic moment types [38], quaternion [39] and octonion

To carry out this evaluation, we performed an initial test in which we observed the results with the naked eye. This approach highlights the capabilities of the SOKMs \(\left( { {}^{R}{\text{OKM}}_{{{\text{nm}}}} \,{\text{and}}\,{}^{L}{\text{OKM}}_{{{\text{nm}}}} \,} \right)\) in terms of color stereo image representation. We use different reconstruction orders, ranging from n = 0 to n = 500, and a stereo image of size 1000 × 1000, to cover a wide range of detail levels. Naked-eye inspection at this stage allows us to assess the visual quality of the images reconstructed from the different types of moments. We aim to determine whether the proposed method is capable of capturing the details and features of color stereo images while maintaining high visual fidelity. Through this visual comparison (Fig. 9), we hope to gain valuable insights into the relative performance of the different types of moments studied. These results enable us to better understand the advantages and limitations of each type of moment, contributing to the advancement of color stereo image representation techniques and their application in various fields such as computer vision, medical imaging and image analysis. From the observations in Fig. 9, it is clear that the image representation capability of the proposed SOKMs exceeds that of quaternion moments and conventional moments, even at very low reconstruction orders.

Fig. 9

The ability to represent the image via the three types of moments

To complete our assessment of stereo image representativeness, we also perform statistical tests using two commonly used image quality measures: mean square error (MSE) [40] and structural similarity index (SSIM) [26], adopting different reconstruction orders ranging from n = 0 to n = 800 (Fig. 10). MSE measures the mean squared difference between the original stereo image and the reconstructed stereo image; the lower the MSE, the more accurate and faithful to the original the reconstruction is considered to be. We calculate the MSE for each type of moment studied and for each reconstruction order n, which allows us to numerically quantify the quality of each method and compare their performance. The SSIM is a more complex measure that evaluates the structural similarity between the original and the reconstructed stereo image, taking into account luminance, contrast and spatial structure. An SSIM score close to 1 indicates a high-quality reconstruction, while a score close to 0 indicates a low-quality one.
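The two metrics can be stated compactly in code. The sketch below implements MSE and a simplified single-window SSIM computed from global statistics; library implementations (such as scikit-image's) use a sliding window instead, and the constants C1, C2 follow the usual choice for an 8-bit dynamic range L = 255.

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))

def ssim_global(a, b, L=255.0):
    # Simplified SSIM using global (single-window) statistics
    a, b = np.asarray(a, float), np.asarray(b, float)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float((2 * mu_a * mu_b + C1) * (2 * cov + C2)
                 / ((mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
noisy = img + rng.normal(0, 10, img.shape)
print(mse(img, img), round(ssim_global(img, img), 6))   # 0.0 1.0
print(mse(img, noisy) > 0)                              # True
```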

Fig. 10

MSE and SSIM of two images of size 1000 × 1000 for n = 800

Subsequently, we emphasize the efficiency of the proposed SOKM moments by examining the computation time necessary for calculating stereo image moments (Table 4). Specifically, we compare the computation time of octonion moments with that of classical moments and quaternion moments. Calculating the moments of a stereo image can be a complex and resource-intensive task. Octonion moments, classical moments and quaternion moments are different approaches for representing stereo image features.

Table 4 Calculation time (in seconds) required to calculate stereo image moments

Having carried out the three tests described above, namely visual observation, quantitative tests using MSE and SSIM, and the computation time test, we can confirm that the method based on octonion moments is highly effective. Visual observation showed that the octonion moment method captures the details and features of the color images with high visual fidelity, a positive indicator of its effectiveness. The MSE and SSIM tests provided quantitative confirmation: the proposed stable Krawtchouk octonion moments obtained a low MSE and an SSIM close to 1, confirming their accurate representation of stereo images. Finally, the computation time test revealed that the octonion moment-based method requires less time to compute stereo image moments than the classical and quaternion methods, indicating that it is more practical and economical in terms of computational resources.

Considering the results of all three tests, it is clear that the octonion moment method is highly efficient at visualizing stereo images. This conclusion reinforces the relevance of using octonion moments in this context and underlines their potential for future applications in stereo image analysis.

Stereo image classification

In this section, we present the results obtained by our classification method based on the SOKM. First, we present the curves representing the evolution of "loss" and "accuracy" generated by our CNN architecture. This architecture is used to classify the stereoscopic images in our database into three distinct categories: class 1 designates "towers", class 2 designates "streets" and class 3 designates "statues". Our database comprises a total of 600 image moments divided into three sets: the training set, the validation set, and the test set, distributed according to a ratio of 70% for training, 15% for validation, and 15% for testing. Thus, we use 420 images to train our model and 90 images for its validation. In order to assess the performance of our classifier, we also reserve 90 images for evaluation purposes.
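The 70/15/15 split described above can be sketched as follows; the sample count matches the paper, but the shuffling `seed` is illustrative.

```python
import numpy as np

def split_dataset(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)   # 420 for 600 samples
    n_val = int(val_frac * n_samples)       # 90
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(600)
print(len(train_idx), len(val_idx), len(test_idx))  # 420 90 90
```

Shuffling before splitting ensures that each subset draws samples from all three classes.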

The performance of our CNN model during training is represented visually by the loss and accuracy curves. Figure 11 illustrates the learning traces observed on the training dataset over 35 epochs, using the Adam optimizer with a learning rate of 0.0005. This plot gives a visual representation of the model's learning progression and provides valuable information on its performance. The graphs in Fig. 11 show that the loss curve decreases as the accuracy increases, and that the model begins to achieve good results around the twentieth epoch. Overall, we can deduce that our model works well with a small database, learns sufficiently and predicts accurately. These results are attributable to the intrinsic ability of the proposed SOKMs to offer a global representation of images: even when the order of moments does not exceed half the image dimension, the approach captures the essential features, which can lead to advantages in processing speed. In other words, SOKMs capture key information in condensed form, which can translate into gains in processing efficiency.
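The training setup (Adam optimizer, learning rate 0.0005, 35 epochs) can be illustrated with a framework-free sketch. The toy softmax classifier below stands in for the CNN, which is not reproduced here, and the features and labels are synthetic; only the optimizer update and hyper-parameters reflect the text.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 8))               # synthetic stand-in features
y = rng.integers(0, 3, size=90)            # three classes, as in the paper
W = np.zeros((8, 3))

# Adam state and hyper-parameters (learning rate as stated in the text).
lr, b1, b2, eps = 0.0005, 0.9, 0.999, 1e-8
m = np.zeros_like(W)
v = np.zeros_like(W)

def cross_entropy(W):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).mean(), p

losses = []
for t in range(1, 36):                     # 35 epochs
    loss, p = cross_entropy(W)
    losses.append(loss)
    grad = X.T @ (p - np.eye(3)[y]) / len(y)
    m = b1 * m + (1 - b1) * grad                       # first moment
    v = b2 * v + (1 - b2) * grad ** 2                  # second moment
    W -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)

print(losses[0], losses[-1])               # loss decreases over training
```

With W initialized to zero, the initial loss equals ln 3 (uniform predictions over three classes), and the small Adam steps steadily reduce it, mirroring the behaviour of the curves in Fig. 11.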

Fig. 11

The plot of accuracy and loss versus epoch

We then demonstrate the classifier's effectiveness by evaluating its performance on the test samples, using the confusion matrix as an analysis tool (Fig. 12) [43, 44]. This evaluation is carried out on a set of 90 images divided into three classes: 30 instances of class 1 "Tower", 30 instances of class 2 "Street" and 30 instances of class 3 "Statue". This varied dataset enables us to test the classifier's ability to identify and distinguish between different categories of objects. Looking closely at this matrix, we see that of the 30 samples in each of the "Tower" and "Statue" classes, 29 were accurately identified by our classifier, while for the "Street" class all 30 samples were predicted correctly. This performance highlights the classifier's ability to discriminate effectively between image features within each category. These results reinforce confidence in our classification model and suggest that it generalizes successfully to the test data, producing relevant and consistent predictions.
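For reference, a confusion matrix with these counts can be rebuilt from prediction lists as follows. The placement of the two misclassified samples as Tower↔Statue confusions is an assumption, chosen because it is the only placement consistent with the perfect Street scores reported in Table 5.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# 30 samples per class: Tower = 0, Street = 1, Statue = 2.
y_true = [0] * 30 + [1] * 30 + [2] * 30
y_pred = [0] * 29 + [2] + [1] * 30 + [2] * 29 + [0]  # two assumed confusions

cm = confusion_matrix(y_true, y_pred)
print(cm)  # diagonal [29, 30, 29], as reported
```

The diagonal holds the correctly classified samples per class; the two off-diagonal entries are the only errors among the 90 test images.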

Fig. 12

Confusion matrix

Classifier performance is assessed through four measures crucial for multi-class classification tasks: precision, accuracy, recall and F1 score. These indicators are detailed in Table 5. In addition, we present the ROC curves corresponding to each class [34].

Table 5 Classifier performance

Table 5 presents a comprehensive evaluation of the classification model's performance on the three classes, using accuracy, precision, recall and F1 score. For Class 1, the model achieved an accuracy of 97.78%, with precision, recall and F1 score of 0.97, showing that it correctly predicts the vast majority of instances in this class. Class 2 shows perfect performance, with values of 100% for all measures, meaning the model predicts every instance of this class accurately. For Class 3, accuracy is 97.78% with precision, recall and F1 score of 0.97, underlining the model's ability to correctly predict the majority of instances in this class. In summary, the model shows strong and consistent performance across all classes, reflecting its ability to make accurate predictions and correctly identify instances in each category.
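The per-class figures in Table 5 follow directly from the confusion matrix. The sketch below derives them one-vs-rest, again under the assumption that the two errors are Tower↔Statue confusions.

```python
import numpy as np

# Confusion matrix consistent with the reported counts
# (rows: true class, cols: predicted class).
cm = np.array([[29, 0, 1],
               [0, 30, 0],
               [1, 0, 29]])

def class_metrics(cm, k):
    """One-vs-rest accuracy, precision, recall and F1 for class k."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return (tp + tn) / cm.sum(), precision, recall, f1

for k in range(3):
    acc, p, r, f1 = class_metrics(cm, k)
    print(f"class {k + 1}: acc={acc:.4f} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

With this matrix, classes 1 and 3 each get an accuracy of 88/90 ≈ 97.78% and precision, recall and F1 of 29/30 ≈ 0.97, while class 2 scores 1.0 on every measure, matching Table 5.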

Overall, the model performed very well for all the classes, with very accurate predictions and the ability to identify most of the real instances of each class. Class 2 stood out in particular, with perfect performance in all the evaluation metrics [45, 46].

The proposed classification approach was also evaluated using ROC curves. The ROC curve provides a graphical summary of the classifier's capabilities. On the ROC graph, the vertical axis (y-axis) represents the model's sensitivity, i.e., its ability to correctly detect true positives among all real positive examples. The horizontal axis (x-axis) represents 1 − specificity, the false-positive rate; specificity itself is the model's ability to correctly classify negative examples, i.e., true negatives, among all real negative examples. By combining these two quantities, the ROC graph gives a visual representation of the model's overall performance across different classification thresholds. An ideal model would be represented by a point in the top left-hand corner of the graph, indicating maximum sensitivity and a false-positive rate of zero [47]. All ROC curves are grouped together in a single figure (Fig. 13). This figure shows that the proposed method offers optimum performance for all classes; the model's performance and classification capability can therefore be described as remarkable.
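A one-vs-rest ROC curve of this kind can be traced by sweeping the decision threshold over the class scores; the scores below are synthetic, purely to illustrate the computation.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep thresholds over sorted scores; return (fpr, tpr) arrays."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)            # true positives at each threshold
    fps = np.cumsum(1 - labels)        # false positives at each threshold
    tpr = tps / labels.sum()
    fpr = fps / (len(labels) - labels.sum())
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Synthetic one-vs-rest scores: every positive outscores every negative,
# so the curve hugs the top-left corner and the AUC is 1.
scores = [0.9, 0.8, 0.75, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
fpr, tpr = roc_points(scores, labels)
print(auc(fpr, tpr))
```

A curve that reaches the top-left corner, as here, corresponds to a classifier that separates the class perfectly at some threshold.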

Fig. 13

The ROC curves for each class

Figure 14 shows a comprehensive comparison of alternative methods for classifying stereoscopic images. The measures used for this comparison are the averages of precision, accuracy, recall and F1 score [48, 49], computed on the same reference classes. Method 1 follows the procedure of the proposed method but uses classical Krawtchouk moments; Method 2 follows the same methodology but uses quaternion Krawtchouk moments. Analysis of the results highlights the comparative performance of the three classification methods (the proposed method, Method 1 and Method 2). The proposed method performs best, with an average precision of 98.52%, an average recall of 98% and an average F1 score of 0.98. Method 1 achieves the lowest average performance, with an average precision of 90.73%, an average recall of 89% and an average F1 score of 0.87. Method 2 falls between the two, with an average precision of 92.35%, an average recall of 90% and an average F1 score of 0.90. Overall, the proposed method is the best performer.

Fig. 14

Comparison of average metrics for different methods

Following the completion of the various tests mentioned above, including visual observation, statistical analyses using MSE and SSIM measures, computation time evaluation, use of confusion matrices, ROC analysis, and other tests, a close inspection of the results clearly reveals that the advanced method based on stable Krawtchouk octonion moments stands out for its exceptional efficiency in both the representation and classification of stereo images. This finding significantly underscores the relevance of integrating octonion moments in this particular framework, highlighting their ability to excel in the faithful representation of stereo-image features. Furthermore, these findings reinforce the credibility of the proposed method as a promising choice for classification tasks, demonstrating its usefulness in diverse applications. The significance of these findings extends beyond the specific results obtained in this study, suggesting strong potential for octonion moments in the field of stereo image analysis. This breakthrough offers encouraging prospects for future developments, paving the way for innovative applications and significant advances in the field of stereo vision.

Conclusion

In conclusion, this paper addresses the crucial problem of object classification within stereoscopic images, an area of research that has received limited attention despite the growing popularity of such images. Adopting a two-stage methodology, we developed a comprehensive approach to this challenge. The first step involved setting up a balanced database of color stereoscopic images, comprising 600 images divided into three classes. Thanks to pre-processing techniques such as cleaning, scaling and sample augmentation, we were able to guarantee the quality and diversity of the data. The second stage was dedicated to the design and training of a CNN model based on matrix moment image data. The results obtained are extremely promising, with an average precision of 98.52%, an average recall of 98% and an average F1 score of 0.98. These performance measures attest to the remarkable effectiveness of our method for object classification in the context of stereoscopic images. The use of standard metrics such as precision, accuracy, recall, F1 score and ROC curves enabled a rigorous and quantitative evaluation of our approach. These robust results underline the relevance and applicability of our methodology for solving complex problems related to visual content analysis in the stereoscopic image domain. Finally, this study makes a major contribution to research by presenting an efficient and accurate solution for the categorization of objects within stereo images.

From a broader perspective, this research can pave the way for advancements in various domains where stereoscopic image analysis is crucial. These promising results and suggested future applications underscore the potential impact of this study on both current and future research efforts. This paper not only fills a gap in the literature but also sets the stage for further exploration and innovation in the field, such as the classification of stereo images associated with diabetic retinopathy (Fig. 15). As a research team, we aim to incorporate optimization methods and embedded boards into our future work, to turn these advances into reality.

Fig. 15

Stereo pair of fibrovascular proliferation in diabetic retinopathy (http://eye-pix.com/alignment-of-stereo-images/)