1 Introduction

In the last decade, the use of deep learning models has become ubiquitous for solving difficult and open problems in science and engineering. Convolutional Neural Networks (CNNs) were one of the first deep learning models [56, 88, 99], and their success in tackling the large-scale object recognition and classification problem (ImageNet challenge) [92] led to their application in other domains.

The core components of a CNN architecture are the convolution and pooling layers. A convolution layer is a variation of a fully connected layer (Perceptron), as shown in Fig. 1: a weight-sharing mechanism over locally connected inputs is applied [51]. This technique is inspired by the local receptive fields discovered by Hubel and Wiesel in their experiments with macaques [79].

Fig. 1

Fully connected layer (a) vs. convolutional layer (b). For an input array of \(2\times 3\) elements, the Perceptron uses 12 weights and 24 connections (not locally connected), while the convolution layer uses 4 weights and 8 connections. In the convolutional layer, we have a reduction in the number of parameters and connections, because a single weight is connected to several inputs and the weights are applied over locally connected inputs

Formally speaking, the convolution layer applies the mathematical definition of convolution between discrete signals; thus, for a bi-dimensional input:

$$\begin{aligned} \textbf{D}= [d(x,y)] \in \mathbb {R}^{N_1\times N_2}, \end{aligned}$$
(1.1)

the convolution with a kernel:

$$\begin{aligned} \textbf{W}= [w(x,y)] \in \mathbb {R}^{M_1\times M_2}, \end{aligned}$$
(1.2)

is defined as follows:

$$\begin{aligned} \textbf{F}= & \textbf{W} *\textbf{D} \nonumber \\= & \sum _{r=-\infty }^\infty \sum _{s=-\infty }^\infty \left[ w(r,s) d(x-r,y-s)\right] ; \end{aligned}$$
(1.3)

thus, \(\textbf{F}\in \mathbb {R}^{(N_1-M_1+1) \times (N_2-M_2+1)}\) is called the feature map.
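To make the definition concrete, the following minimal NumPy sketch (our own illustration, not taken from the cited works) computes the valid 2D convolution of Eq. (1.3) and returns a feature map of the stated size:

```python
import numpy as np

def conv2d_valid(D, W):
    """Valid 2D convolution of input D (N1 x N2) with kernel W (M1 x M2),
    following Eq. (1.3); the kernel is flipped, as in the mathematical
    definition of convolution (cross-correlation omits the flip)."""
    N1, N2 = D.shape
    M1, M2 = W.shape
    Wf = W[::-1, ::-1]                      # flip kernel in both axes
    F = np.zeros((N1 - M1 + 1, N2 - M2 + 1))
    for x in range(F.shape[0]):
        for y in range(F.shape[1]):
            F[x, y] = np.sum(Wf * D[x:x + M1, y:y + M2])
    return F

# Example: a 5x5 input and a 3x3 kernel give a 3x3 feature map.
D = np.arange(25, dtype=float).reshape(5, 5)
W = np.ones((3, 3)) / 9.0
print(conv2d_valid(D, W).shape)             # (3, 3)
```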

The convolution layer is typically followed by a pooling layer; this provides a sort of local invariance to small rotations and translations of the input features [95, 96]. Moreover, T. Poggio proved that the combination of convolution and pooling layers produces a signature that is invariant under a group of geometric transformations [5, 135, 136]. However, in the design of very deep architectures, researchers have encountered some difficulties, e.g. reducing the number of parameters without losing generalization, and finding fast optimization methods that adjust millions of parameters while avoiding the vanishing and exploding gradient problems [76, 131].

Fundamental theoretical as well as experimental analyses have shown that some algebraic systems, different from the real numbers, have the potential to solve these problems. For example, a complex-number representation avoids local minima caused by the hierarchical structure [124], exhibits better generalization [73], and learns faster [12]. Because of the Cayley–Dickson construction [13], it could be inferred that these properties would hold for quaternion-valued neural networks. Recent experimental work favors this conjecture: quaternion-valued neural networks show a reduction in the number of parameters and improved classification accuracies compared to their real-valued counterparts [8, 54, 77, 125, 130, 151, 181]. In addition, a quaternion representation can deal with four-dimensional signals as a single entity [21, 45, 75, 114, 142], efficiently models 3D transformations [66, 93], and captures internal latent relationships via the Hamilton product [127], among other properties.

Because of the diversity of deep learning models, this paper focuses on those using quaternion convolution as the main component. Consequently, we have identified three dominant conceptual trends: the classic, the geometric, and the equivariant approaches. These differ in the definition and interpretation of the quaternion convolution layer. The main contributions of this paper can be summarized as follows:

  1. This paper presents a classification of QCNN models based on the definition of quaternion convolution.

  2. This paper provides a description of all atomic components needed for implementing QCNN models, the motivation behind each component, the challenges that arise when they are applied in the quaternion domain, and future directions of research for each component.

  3. This paper presents an organized overview of the models found in the literature. They are organized by application domain; classified into the classic, geometric, or equivariant approach; and presented by the type of model: recurrent, residual, ConvNet, generative, or CAE.

The organization of the paper is as follows: In Sect. 2, we introduce fundamental concepts of quaternion algebra; then, in Sect. 3, we explain the three conceptual trends (classic, geometric, and equivariant), followed by a presentation of the key atomic components to construct QCNN architectures. Thereafter, in Sect. 4, we show a classification of current works by application, and present the diverse types of architectures. Finally, Sect. 5 presents open issues and guidelines for future work, followed by the conclusions in Sect. 6. Table 12 presents a list of acronyms used in the paper.

2 Quaternion Algebra

This mathematical system was developed by W.R. Hamilton (1805–1865) in the middle of the nineteenth century [63,64,65]. His work on this subject started by exploring ratios between geometric elements; consequently, he called the quotient of two vectors a quaternion, which is expressed in the quadrinomial form as follows [64]:

$$\begin{aligned} {\textbf {q}} = q_R+ q_I\hat{i}+ q_J\hat{j} + q_K\hat{k}, \end{aligned}$$
(2.1)

where \(q_R, q_I, q_J, q_K\) are scalars, and \(\hat{i}, \hat{j}, \hat{k}\) are three right versors, i.e. quotients of two perpendicular, equally long vectors, which are also square roots of negative unity.

In terms of modern mathematics, the quaternion algebra, \(\mathbb {H}\), is: the 4-dimensional vector space over the field of the real numbers, generated by the basis \(\{1,\hat{i},\hat{j},\hat{k}\}\), and endowed with the following multiplication rules (Hamilton product):

$$\begin{aligned} (1)(1)= & 1\end{aligned}$$
(2.2)
$$\begin{aligned} (1)(\hat{i})= & \hat{j}\hat{k} = -\hat{k}\hat{j} = \hat{i} \nonumber \\ (1)(\hat{j})= & \hat{k}\hat{i} = -\hat{i}\hat{k} = \hat{j} \nonumber \\ (1)(\hat{k})= & \hat{i}\hat{j} = -\hat{j}\hat{i} = \hat{k} \nonumber \\ \hat{i}^2= & \hat{j}^2 = \hat{k}^2 =-1 \end{aligned}$$
(2.3)

Thus, for two arbitrary quaternions: \({\textbf {p}} = p_R + p_I\hat{i} + p_J\hat{j} + p_K\hat{k}\) and \({\textbf {q}} = q_R + q_I\hat{i} + q_J\hat{j} + q_K\hat{k}\), their multiplication is calculated as follows:

$$\begin{aligned} {\textbf {p}} {\textbf {q}}= & p_R q_R - p_I q_I - p_J q_J - p_K q_K \nonumber \\ & + (p_R q_I + p_I q_R + p_J q_K - p_K q_J) \hat{i} \nonumber \\ & + (p_R q_J - p_I q_K + p_J q_R + p_K q_I) \hat{j} \nonumber \\ & + (p_R q_K + p_I q_J - p_J q_I + p_K q_R) \hat{k} \end{aligned}$$
(2.4)

Notice that each coefficient of the resulting quaternion is composed of real and imaginary parts of the factors \({\textbf {p}}\) and \({\textbf {q}}\). In this way, the Hamilton product captures interchannel relationships between both factors.
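As a quick illustration (a minimal sketch of our own), the Hamilton product of Eq. (2.4) can be implemented component-wise, storing a quaternion as an array \([q_R, q_I, q_J, q_K]\):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions given as arrays [R, I, J, K],
    following Eq. (2.4)."""
    pR, pI, pJ, pK = p
    qR, qI, qJ, qK = q
    return np.array([
        pR*qR - pI*qI - pJ*qJ - pK*qK,
        pR*qI + pI*qR + pJ*qK - pK*qJ,
        pR*qJ - pI*qK + pJ*qR + pK*qI,
        pR*qK + pI*qJ - pJ*qI + pK*qR,
    ])

# i * j = k, while j * i = -k (non-commutativity):
i, j = np.array([0., 1, 0, 0]), np.array([0., 0, 1, 0])
print(hamilton(i, j))   # [0. 0. 0. 1.]
print(hamilton(j, i))   # [ 0.  0.  0. -1.]
```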

Moreover, quaternion algebra is associative and non-commutative, and in the sense of W. K. Clifford (1845–1879), it is the even geometric algebra of the 3D Euclidean space, \(G(\mathbb {R}^3)\) [107, 152].

Next, some useful operations with quaternions are introduced.

Let \({\textbf {q}} = q_R+ q_I\hat{i}+ q_J\hat{j} + q_K\hat{k}\) be a quaternion; its conjugate is defined as:

$$\begin{aligned} \bar{{\textbf {q}}} = q_R - q_I\hat{i} - q_J\hat{j} - q_K\hat{k} \end{aligned}$$
(2.5)

and its magnitude is computed as follows:

$$\begin{aligned} \Vert {\textbf {q}}\Vert = \sqrt{{\textbf {q}}\bar{{\textbf {q}}}} \end{aligned}$$
(2.6)

Like complex numbers, quaternions can be represented in polar form [85, 164]:

$$\begin{aligned} {\textbf {q}}= \Vert {\textbf {q}}\Vert \left[ \cos (\theta ) +\sin (\theta ) \hat{v}\right] , \end{aligned}$$
(2.7)

where \(\hat{v}\) is a unit vector, and:

$$\begin{aligned} \theta = \text {atan} \left( \frac{\sqrt{q_I^2 + q_J^2 + q_K^2}}{q_R} \right) . \end{aligned}$$
(2.8)

Note that for a quaternion with zero vector part (\(q_I=q_J=q_K=0\)) the angle is \(\theta =0\); otherwise, the unit vector of the polar representation can be computed as follows:

$$\begin{aligned} \hat{v}=\frac{q_I\hat{i} + q_J\hat{j} + q_K\hat{k}}{\Vert q_I\hat{i} + q_J\hat{j} + q_K\hat{k}\Vert }. \end{aligned}$$
(2.9)

Even though other polar parametrizations have been proposed, see for example [4, 20, 21, 143], current QCNNs apply Eq. (2.7).

As was mentioned before, a quaternion can represent a geometric transformation. Let \({\textbf {w}}_\theta \) be a unitary versor expressed in polar form:

$$\begin{aligned} {\textbf {w}}_\theta = \cos (\theta ) +\sin (\theta ) (w_I\hat{i} + w_J\hat{j} + w_K\hat{k}), \end{aligned}$$
(2.10)

and let \(\textbf{q}\) be any quaternion; then, their multiplication:

$$\begin{aligned} {\textbf {p}}= {\textbf {w}}_\theta {\textbf {q}} \end{aligned}$$
(2.11)

applies a rotation, with angle \(\theta \), along an axis \({\textbf {w}}=w_I\hat{i}+w_J\hat{j}+w_K \hat{k}\).

Since we have not made any particular assumption on \({\textbf {q}}\), we are applying a rotation in the four-dimensional space of quaternions. Denoting by \(\Pi _1\) the orthogonal projection onto the plane defined by the scalar axis and the vector \(w_I\hat{i} + w_J\hat{j} + w_K\hat{k}\), and by \(\Pi _2\) the orthogonal projection onto the plane perpendicular to \(w_I\hat{i} + w_J\hat{j} + w_K\hat{k}\), it can be proved that this 4-dimensional rotation is composed of a simultaneous rotation of the elements on plane \(\Pi _1\) and of the elements on plane \(\Pi _2\) [164].

Alternatively, we can split the transformation as a sandwiching product:

$$\begin{aligned} {\textbf {p}}= {\textbf {w}}_{\frac{\theta }{2}} {\textbf {q}}\bar{{\textbf {w}}}_{\frac{\theta }{2}}; \end{aligned}$$
(2.12)

in this case, the angle of each versor, \({\textbf {w}}_{\theta /2}\), is halved.
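The following sketch (ours; the numeric example is arbitrary) applies the sandwiching product of Eq. (2.12) to rotate a pure quaternion about the z axis, building the half-angle versor \({\textbf {w}}_{\theta /2}\) explicitly:

```python
import numpy as np

def hamilton(p, q):
    pR, pI, pJ, pK = p; qR, qI, qJ, qK = q
    return np.array([pR*qR - pI*qI - pJ*qJ - pK*qK,
                     pR*qI + pI*qR + pJ*qK - pK*qJ,
                     pR*qJ - pI*qK + pJ*qR + pK*qI,
                     pR*qK + pI*qJ - pJ*qI + pK*qR])

def conjugate(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

theta = np.pi / 2                               # desired 3D rotation angle
axis = np.array([0.0, 0.0, 1.0])                # unit rotation axis (z)
w_half = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))

q = np.array([0.0, 1.0, 0.0, 0.0])              # pure quaternion for the vector (1, 0, 0)
p = hamilton(hamilton(w_half, q), conjugate(w_half))   # Eq. (2.12): p = w q conj(w)
print(np.round(p, 6))                           # [0. 0. 1. 0.] -> the vector (0, 1, 0)
```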

From the group theory perspective, the set of unitary versors lies on a 3-sphere, \(\mathbb {S}^3\), embedded in a 4D Euclidean space [66]; together with the Hamilton product they form a group, which is isomorphic to the 4D rotation group SO(4) [164], and to the special unitary group SU(2) [43, 66]. Moreover, there exists a two-to-one homomorphism with the rotation group SO(3) [43, 66].

Another operation of interest is quaternion convolution; because of the non-commutativity of quaternion multiplication, there are three different definitions of discrete quaternion convolution. First, left-sided quaternion convolution, defined as follows [44, 46, 133]:

$$\begin{aligned} (\textbf{w} *\textbf{q})(x,y) = \sum _{r=-\frac{L}{2}}^{\frac{L}{2}} \sum _{s=-\frac{L}{2}}^{\frac{L}{2}} [\textbf{w}(r,s) \textbf{q}(x-r,y-s)] \end{aligned}$$
(2.13)

Second, right-sided quaternion convolution, defined as follows [46]:

$$\begin{aligned} (\textbf{q} *\textbf{w})(x,y) = \sum _{r=-\frac{L}{2}}^{\frac{L}{2}} \sum _{s=-\frac{L}{2}}^{\frac{L}{2}} [\textbf{q}(x-r,y-s)\textbf{w}(r,s)] \end{aligned}$$
(2.14)

Third, two-sided quaternion convolution [46, 133, 141]:

$$\begin{aligned} (\textbf{w}_{\text {left}} *\textbf{q} *\textbf{w}_{\text {right}})(x,y) = \sum _{r=-\frac{L}{2}}^{\frac{L}{2}} \sum _{s=-\frac{L}{2}}^{\frac{L}{2}} [\textbf{w}_{\text {left}}(r,s)\textbf{q}(x-r,y-s)\textbf{w}_{\text {right}}(r,s)] \end{aligned}$$
(2.15)

where \({\textbf {q}}, {\textbf {w}}, {\textbf {w}}_{\text {left}}, {\textbf {w}}_{\text {right}} \in \mathbb {H}\), \(*\) represents the convolution operator, and the Hamilton product is applied between \(\textbf{q}\)’s and \(\textbf{w}\)’s.
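As a rough sketch of how Eq. (2.13) can be realized numerically (our own illustration; the cited works may differ in indexing, padding, and stride conventions), a quaternion image can be stored as an array of shape (N, M, 4) and the Hamilton product applied at every kernel position:

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product, vectorized over the last axis [R, I, J, K]."""
    pR, pI, pJ, pK = np.moveaxis(p, -1, 0)
    qR, qI, qJ, qK = np.moveaxis(q, -1, 0)
    return np.stack([pR*qR - pI*qI - pJ*qJ - pK*qK,
                     pR*qI + pI*qR + pJ*qK - pK*qJ,
                     pR*qJ - pI*qK + pJ*qR + pK*qI,
                     pR*qK + pI*qJ - pJ*qI + pK*qR], axis=-1)

def qconv2d_left(Q, W):
    """Left-sided quaternion convolution (Eq. (2.13)) in 'valid' mode.
    Q: (N, M, 4) quaternion input, W: (L, L, 4) quaternion kernel."""
    N, M, _ = Q.shape
    L = W.shape[0]
    Wf = W[::-1, ::-1]                        # kernel flip of the convolution definition
    F = np.zeros((N - L + 1, M - L + 1, 4))
    for x in range(F.shape[0]):
        for y in range(F.shape[1]):
            F[x, y] = hamilton(Wf, Q[x:x + L, y:y + L]).sum(axis=(0, 1))
    return F

Q = np.random.randn(8, 8, 4)
W = np.random.randn(3, 3, 4)
print(qconv2d_left(Q, W).shape)               # (6, 6, 4)
```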

Finally, Table 1 summarizes the notation that will be used in the rest of this paper.

Table 1 Definition of mathematical symbols

3 QCNN Components

In this section, we present the main building blocks for implementing quaternion-valued convolutional deep learning architectures. Future directions for each component are specified at the end of each subsection.

3.1 Quaternion Convolution Layers

Sect. 2 presented the different definitions of quaternion convolution. From these definitions, different conceptual approaches can be obtained by setting some restrictions on the quaternions, e.g. using unitary quaternions, or using just the real part of one of the quaternions. In the following paragraphs, we describe the three main conceptual trends found in most of the current works on QCNNs.

Let us assume a dataset where each sample has dimension \(N\times M\times 4\), i.e. an input can be represented as a 4-channel matrix of real numbers. Then each sample, \(\textbf{Q}\), is represented as an \(N\times M\) matrix where each element is a quaternion:

$$\begin{aligned} \textbf{Q}= [\textbf{q}(x,y)] \in \mathbb {H}^{N\times M}. \end{aligned}$$
(3.1)

Then \(\textbf{Q}\) can be decomposed into its real and imaginary components:

$$\begin{aligned} \textbf{Q}= {Q_R}+{Q_I}\hat{i}+{Q_J}\hat{j}+{Q_K}\hat{k} \end{aligned}$$
(3.2)

where \({Q_R}, {Q_I}, {Q_J}, {Q_K} \in \mathbb {R}^{N\times M}\).

In the same way, a convolution kernel of size \(L\times L\) is represented by a quaternion matrix, as follows:

$$\begin{aligned} \textbf{W}= [\textbf{w}(x,y)] \in \mathbb {H}^{L\times L} \end{aligned}$$
(3.3)

which can be decomposed as:

$$\begin{aligned} \textbf{W}= {W_R}+{W_I}\hat{i}+{W_J}\hat{j}+{W_K}\hat{k}, \end{aligned}$$
(3.4)

where \({W_R},{W_I}, {W_J}, {W_K} \in \mathbb {R}^{L\times L}\).

In the classic approach, the definition of quaternion convolution is a natural extension of the real and complex convolution; its role is to compute the correlation between input data and kernel patterns, in the quaternion domain. Then, Altamirano [3], Gaudet and Maida [54], and Parcollet et al. [130] define the convolution layer using left-sided convolution:

$$\begin{aligned} \textbf{F} = \textbf{W} *\textbf{Q}. \end{aligned}$$
(3.5)

Thus, \(\textbf{F}\in \mathbb {H}^{(N-L+1)\times (M-L+1)}\) represents the output of the layer, i.e. a quaternion feature map, and each element of the tensor is computed as follows:

$$\begin{aligned} \textbf{f}(x,y) = (\textbf{w} *\textbf{q})(x,y). \end{aligned}$$
(3.6)

This approach does not make any particular assumption about quaternions \(\textbf{w}\) and \(\textbf{q}\). Thus, the convolution represents the integral transformation of a quaternion function on a quaternion input signal.

It should be noted that the definition of Altamirano [3] is based on the work on Quaternion-valued Multilayer Perceptrons (QMLPs) by Arena et al. [6,7,8], while the definition of Gaudet and Maida [54] and Parcollet et al. [130] is derived from Trabelsi's research on complex-valued CNNs [157]. However, all the authors arrived at the same definition of the convolution layer.

Alternatively, we have the geometric approach, which builds on the previous work of Matsui et al. [112] and Isokawa et al. [81]. In this case, the quaternion product applies affine transformations over the input features; consequently, the quaternion convolution inherits this property. Thus, Zhu et al. [181] apply the two-sided convolution definition:

$$\begin{aligned} \textbf{F} = \textbf{W} *\textbf{Q} *\mathbf {\bar{W}} \end{aligned}$$
(3.7)

where \(\textbf{F}\in \mathbb {H}^{(N-L+1)\times (M-L+1)}\), and each element of the output is computed as follows:

$$\begin{aligned} \textbf{f}(x,y) =\sum _{r=-\frac{L}{2}}^{\frac{L}{2}} \sum _{s=-\frac{L}{2}}^{\frac{L}{2}} \frac{\textbf{w}(r,s)\textbf{q}(x-r,y-s)\bar{\textbf{w}}(r,s)}{\Vert \textbf{w}(r,s)\Vert }. \end{aligned}$$
(3.8)

In addition, each quaternion \(\textbf{w}\) is represented in its polar form:

$$\begin{aligned} \textbf{w}(r,s)=\Vert \textbf{w}(r,s)\Vert \left( \cos \frac{\theta (r,s)}{2}+\sin \frac{\theta (r,s)}{2} \textbf{u} \right) \end{aligned}$$
(3.9)

where \(\theta (r,s)\in [-\pi ,\pi ]\), and \(\textbf{u}\) represents the unitary rotation axis. Thus, quaternion convolution applies a fixed-axis rotation and scaling transformation (2 DoF) on the quaternion \(\textbf{q}\). Note that the role of the magnitude of the quaternion weight in the denominator of Eq. (3.8) is to compensate for the double multiplication by the magnitude of \(\textbf{w}(r,s)\), from the left and from the right. Moreover, for \(\Vert \textbf{w}(r,s)\Vert =0\) the term is set directly to \(\textbf{f}(x,y)=0\), avoiding division by zero.

Similarly, Hongo et al. [77] use the two-sided quaternion convolution definition, but add a threshold value for each component of the quaternion:

$$\begin{aligned} \textbf{F} = \textbf{W} *\textbf{Q} *\mathbf {\bar{W}} + \textbf{B}. \end{aligned}$$
(3.10)

where \(\textbf{B} \in \mathbb {H}^{(N-L+1)\times (M-L+1)}\).

In both cases, quaternion convolution applies geometric transformations over the input data. In a recent work, Matsumoto et al. [113] note the limited expressive ability of working with fixed axes, and propose a model that learns general rotation axes (4 DoF); interested readers are encouraged to see the full details of this model in [113].

Finally, concerning the equivariant approach, Shen et al. [151] propose a simplified version; they convolve the quaternion input, \(\textbf{Q} \in \mathbb {H}^{N\times M}\), with a kernel of real numbers, \(\textbf{W} \in \mathbb {R}^{L\times L}\):

$$\begin{aligned} \textbf{F}= & \textbf{W} *\textbf{Q} \nonumber \\= & \textbf{W} *\textbf{Q}_R +\textbf{W} *\textbf{Q}_I \hat{i}+\textbf{W} *\textbf{Q}_J \hat{j}+ \textbf{W} *\textbf{Q}_K \hat{k}. \end{aligned}$$
(3.11)

This version allows the extraction of equivariant features, i.e. if an input sample, \(\textbf{Q}\), produces a feature map, \(g(\textbf{Q})\), then the rotated input sample, \(R(\textbf{Q})\), will produce the rotated feature map, \(R(g(\textbf{Q}))\); thus:

$$\begin{aligned} g(R(\textbf{Q}))= R(g(\textbf{Q})). \end{aligned}$$
(3.12)

This type of convolution is especially useful when working with 3D point clouds, since the networks are robust to rotated input features, and the feature maps produced by inner layers are invariant to permutations of the input data points.
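Since Eq. (3.11) simply applies the same real-valued kernel to each quaternion component, it can be sketched with any off-the-shelf 2D convolution; the helper below is our own hedged illustration, not the authors' code:

```python
import numpy as np
from scipy.signal import convolve2d

def equivariant_qconv2d(Q, W):
    """Eq. (3.11): convolve a real kernel W (L x L) with each component of
    the quaternion input Q (N x M x 4) independently."""
    return np.stack([convolve2d(Q[..., c], W, mode='valid') for c in range(4)],
                    axis=-1)

Q = np.random.randn(16, 16, 4)     # e.g. point features encoded as quaternion channels
W = np.random.randn(3, 3)          # shared real-valued kernel
print(equivariant_qconv2d(Q, W).shape)   # (14, 14, 4)
```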

Most work on quaternion-valued CNNs is based on these works, or lies in one of the preceding categories. However, independently of the approach we follow, a related problem when implementing QCNNs is how to deal with multidimensional inputs. In real-valued CNNs, the common way of dealing with them is to define 2D convolution kernels with the same number of channels as the input data, apply 2D convolution separately for each channel, and then sum the resulting 2D outputs over all channels to produce a single-channel feature map. A different approach is to apply N-dimensional convolution; in this case, the multidimensional kernel is convolved over all the channels of the input data.

For quaternion-valued CNNs, current implementations use variations of the first approach; i.e. the input data is divided into 4-channel sub-inputs, thereafter quaternion convolution is computed for each sub-input. Before explaining the details, some notation is introduced.

Let \(\textbf{X} \in \mathbb {R}^{N\times M \times C}\) be an input, where N is the number of rows, M the number of columns, and C the number of channels, with C a multiple of 4; then \(\textbf{X}\) is partitioned as follows:

$$\begin{aligned} \textbf{X}= [\textbf{Q}_0, \textbf{Q}_1, \dots , \textbf{Q}_{(C/4)-1} ] \end{aligned}$$
(3.13)

where each \(\textbf{Q}_s \in \mathbb {H}^{N\times M}\), \(0\le s\le (C/4)-1\), is a quaternion channel.

Let \(\textbf{V} \in \mathbb {R}^{L\times L \times K}\), with K a multiple of 4, be the convolution kernel; then:

$$\begin{aligned} \textbf{V}= [\textbf{W}_0, \textbf{W}_1, \dots , \textbf{W}_{(K/4)-1} ] \end{aligned}$$
(3.14)

where each \(\textbf{W}_s \in \mathbb {H}^{L\times L}\), \(0\le s\le (K/4)-1\), is a quaternion channel.

Thus, there are three different ways of dealing with multidimensional quaternion inputs, see Fig. 2:

  1. Encoding set-up: Kernel and input have the same number of channels (\(K=C\)). In this case, each quaternion input channel is assigned to one quaternion kernel channel, and the convolution between them is computed using Eqs. (3.5), (3.7), (3.10) or (3.11); thus, if we use left-sided convolution, each individual output, \(\textbf{F}_s \in \mathbb {H}^{(N-L+1)\times (M-L+1)}\), is computed as follows:

    $$\begin{aligned} \textbf{F}_s= \textbf{W}_s * \textbf{Q}_s, \end{aligned}$$
    (3.15)

    where \(0\le s\le (C/4)-1\), and the final quaternion feature map is obtained by concatenating all outputs:

    $$\begin{aligned} \textbf{F}= [\textbf{F}_0, \textbf{F}_1, \dots , \textbf{F}_{(C/4)-1} ] \end{aligned}$$
    (3.16)

    This method produces an output with the same number of channels as the input data, and could be used in Convolutional Auto-Encoders (CAEs).

  2. Pyramidal set-up: Kernel and input have different numbers of channels (\(K\ne C\)), but both are multiples of 4. In [3], the computation of feature maps using a pyramidal approach is proposed: each kernel channel \(\textbf{W}_t\), where \(0\le t\le (K/4)-1\), is convolved with each sub-input \(\textbf{Q}_s\), where \(0\le s\le (C/4)-1\); hence, it produces C/4 quaternion outputs. Since each quaternion input channel is convolved with each quaternion kernel channel, see Fig. 2, a convolution kernel \(\textbf{V} \in \mathbb {R}^{L\times L \times K}\) will produce \(CK/16\) quaternion outputs:

    $$\begin{aligned} \textbf{F}= [\textbf{F}_0, \textbf{F}_1, \dots , \textbf{F}_{(CK/16)-1}]. \end{aligned}$$
    (3.17)

    Thus, if left-sided convolution is applied, each quaternion output channel is computed as follows:

    $$\begin{aligned} \textbf{F}_{(t*C/4)+s}= \textbf{W}_t * \textbf{Q}_s. \end{aligned}$$
    (3.18)

    Even though [3] used left-sided convolution, this approach is valid with any of the other convolution definitions. The intuition behind this approach is to detect a quaternionic pattern in any sub-input; however, its application is impractical because of the exponential growth of the number of channels in deep architectures. Computing summed outputs can alleviate the computational cost.

  3. Summed set-up: Similar to the former method, but in this approach each quaternion input channel is convolved with a different quaternion kernel channel:

    $$\begin{aligned} \textbf{F}_s= \textbf{W}_s * \textbf{Q}_s, \end{aligned}$$
    (3.19)

    where \(0\le s\le (C/4)-1\). Then, the final quaternion feature map, \(\textbf{F}\in \mathbb {H}^{N\times M}\), is obtained by summing all outputs:

    $$\begin{aligned} \textbf{F} = \sum _{s=0}^{(C/4)-1} \textbf{F}_s. \end{aligned}$$
    (3.20)
Fig. 2

Visualization of different approaches for computing quaternion convolution on a multichannel input. This example shows a 2D input with 2 quaternion channels; the output of an encoding set-up is shaded in pink, the output of the pyramidal set-up layer is framed in cyan, and the summed set-up is shaded in orange
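The channel bookkeeping of the three set-ups can be summarized in the following sketch (ours, with assumed toy shapes); the single-channel convolution re-used here is the left-sided one of Eq. (2.13), and the three set-ups follow Eqs. (3.15)–(3.20):

```python
import numpy as np

def hamilton(p, q):
    pR, pI, pJ, pK = np.moveaxis(p, -1, 0); qR, qI, qJ, qK = np.moveaxis(q, -1, 0)
    return np.stack([pR*qR - pI*qI - pJ*qJ - pK*qK,
                     pR*qI + pI*qR + pJ*qK - pK*qJ,
                     pR*qJ - pI*qK + pJ*qR + pK*qI,
                     pR*qK + pI*qJ - pJ*qI + pK*qR], axis=-1)

def qconv2d_left(Q, W):
    """Single-channel left-sided quaternion convolution in 'valid' mode."""
    N, M, _ = Q.shape; L = W.shape[0]; Wf = W[::-1, ::-1]
    F = np.zeros((N - L + 1, M - L + 1, 4))
    for x in range(F.shape[0]):
        for y in range(F.shape[1]):
            F[x, y] = hamilton(Wf, Q[x:x + L, y:y + L]).sum(axis=(0, 1))
    return F

# X: C/4 quaternion input channels; V: K/4 quaternion kernel channels.
X = [np.random.randn(8, 8, 4) for _ in range(2)]   # C = 8
V = [np.random.randn(3, 3, 4) for _ in range(2)]   # K = 8

# 1. Encoding set-up (K = C): one kernel channel per input channel, then concatenate.
F_enc = [qconv2d_left(Qs, Ws) for Qs, Ws in zip(X, V)]

# 2. Pyramidal set-up: every kernel channel against every input channel (C*K/16 outputs).
F_pyr = [qconv2d_left(Qs, Wt) for Wt in V for Qs in X]

# 3. Summed set-up: channel-wise convolutions summed into a single quaternion map.
F_sum = sum(qconv2d_left(Qs, Ws) for Qs, Ws in zip(X, V))
print(len(F_enc), len(F_pyr), F_sum.shape)          # 2 4 (6, 6, 4)
```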

3.1.1 Future Directions

As was mentioned before, N-dimensional convolution has been applied in real-valued CNNs for processing multichannel inputs. In a similar way, the equations presented can be extended to 8-channel inputs using an octonion or hypercomplex algebra, see for example [61, 147, 159, 169, 178]. For a larger number of inputs, a geometric algebra [71] representation can be applied. Some approaches in this direction can be found in [61, 178], which learn the algebra rules directly from the data. A different approach consists of adapting Clifford Neurons [16, 18, 19, 132] to the deep learning domain. Even though some works have been proposed in this line of thought [15, 177], the Clifford deep learning area is still in its infancy, and further work is required in the study of properties and the development of components, models, and applications.

Finally, in these types of deep learning architectures (quaternion, hypercomplex, or geometric), a major concern is the selection of the signature of the algebra, which embeds the data into different geometric spaces, and the processing takes distinct meanings accordingly. Thus, a sensible selection of the dimension and signature should be made according to the nature of the problem and the meaning of the input data as well as the intermediate layers.

3.2 Quaternion Fully Connected Layers

Let \(\textbf{Q}\) be an \(N_1\times N_2 \times N_3\) tensor representing the input to a fully connected layer; then each element of \(\textbf{Q}\) is a quaternion:

$$\begin{aligned} \textbf{Q}= [\textbf{q}(x,y,z)] \in \mathbb {H}^{N_1\times N_2 \times N_3}, \end{aligned}$$
(3.21)

where \(N_1, N_2, N_3\) are the height, width, and number of channels of the input.

Now, for the fully connected layer, a quaternion kernel, \(\textbf{W}\), of size \(N_1\times N_2 \times N_3\) is defined, where each element is a quaternion:

$$\begin{aligned} \textbf{W}= [\textbf{w}(x,y,z)] \in \mathbb {H}^{N_1\times N_2 \times N_3}. \end{aligned}$$
(3.22)

i.e. the kernel has the same height, width, and number of channels as the input.

Note that elements of the input and weight tensors are denoted as \(\textbf{q}(x,y,z)\) and \(\textbf{w}(x,y,z)\), respectively; the output of the layer will be a quaternion, \(\textbf{f} \in \mathbb {H}\).

Thus, for classic fully connected layers, the output, \(\textbf{f}\), is computed as follows [3]:

$$\begin{aligned} \textbf{f} = \sum _{r,s,t}^{N_1,N_2,N_3} \left[ \textbf{w}(r,s,t) \textbf{q}(r,s,t)\right] . \end{aligned}$$
(3.23)

Similarly, for geometric fully connected layers, the output is computed as follows [181]:

$$\begin{aligned} \textbf{f} = \sum _{r,s,t}^{N_1,N_2,N_3} \left[ \frac{1}{\Vert \textbf{w}(r,s,t)\Vert } \textbf{w}(r,s,t) \textbf{q}(r,s,t) \bar{\textbf{w}}(r,s,t) \right] . \end{aligned}$$
(3.24)

The difference between quaternion-valued and real-valued fully connected layers lies in the application of the Hamilton product, which captures interchannel relationships; in the former case, the output is a quaternion. Moreover, it should be noticed that for real-valued networks, fully connected layers are equivalent to inner product layers, whereas for quaternion-valued networks, the outputs of quaternion fully connected layers and quaternion inner product layers are different.
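A minimal sketch (ours, not code from the surveyed works) of the classic fully connected layer of Eq. (3.23), together with the geometric variant of Eq. (3.24); a small constant is added to the denominator only to guard the toy example against division by zero:

```python
import numpy as np

def hamilton(p, q):
    pR, pI, pJ, pK = np.moveaxis(p, -1, 0); qR, qI, qJ, qK = np.moveaxis(q, -1, 0)
    return np.stack([pR*qR - pI*qI - pJ*qJ - pK*qK,
                     pR*qI + pI*qR + pJ*qK - pK*qJ,
                     pR*qJ - pI*qK + pJ*qR + pK*qI,
                     pR*qK + pI*qJ - pJ*qI + pK*qR], axis=-1)

def qfc_classic(Q, W):
    """Eq. (3.23): sum of Hamilton products w(r,s,t) q(r,s,t) over the whole tensor."""
    return hamilton(W, Q).reshape(-1, 4).sum(axis=0)

def qfc_geometric(Q, W, eps=1e-8):
    """Eq. (3.24): sum of magnitude-normalized sandwich products w q conj(w)."""
    conjW = W * np.array([1.0, -1, -1, -1])
    norm = np.linalg.norm(W, axis=-1, keepdims=True) + eps   # eps: our numerical guard
    return (hamilton(hamilton(W, Q), conjW) / norm).reshape(-1, 4).sum(axis=0)

Q = np.random.randn(4, 4, 2, 4)   # N1 x N2 x N3 quaternion input
W = np.random.randn(4, 4, 2, 4)   # matching quaternion weights
print(qfc_classic(Q, W), qfc_geometric(Q, W))   # two quaternion outputs
```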

3.2.1 Future Directions

Current implementations of fully connected quaternion layers follow a classic or geometric approach; the equivariant approach should be implemented and tested.

3.3 Quaternion Pooling

A pooling layer introduces a sort of invariance to geometric transformations, such as small translations and rotations. Most of the current methods for quaternion pooling rely on channel-wise pooling, in a similar way as in real-valued CNNs. For example, [54, 113] use channel-wise global average pooling layers, and [100, 101, 181] apply channel-wise average as well as channel-wise max pooling layers, while [77, 119, 130] apply only channel-wise max pooling layers, defined as follows:

$$\begin{aligned} \text {SplitPool}(\textbf{Q})= & \max _{(x,y)}(q_R(x,y)) + \max _{(x,y)}(q_I(x,y))\hat{i} \nonumber \\ & + \max _{(x,y)}(q_J(x,y))\hat{j} + \max _{(x,y)}(q_K(x,y))\hat{k}, \end{aligned}$$
(3.25)

where \({\textbf {Q}}\) is a quaternion submatrix.

In contrast, [3, 151, 176] use a max-pooling approach, but instead of using the channel-wise maximum value, they select the quaternion with maximum amplitude within a region:

$$\begin{aligned} \text {FullyPool}(\textbf{Q}) = \textbf{q}({x'},{y'})\ s.t.\ ({x'},{y'}) = \underset{(x,y)}{\arg \max }\ (\Vert \textbf{q}(x,y)\Vert ). \end{aligned}$$
(3.26)

In addition, since this method can yield multiple maximum-amplitude quaternions, [176] applies the angle cosine theorem to discriminate between them. In this way, fully quaternion pooling takes advantage of the information contained in different channels.
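Both pooling variants can be sketched as follows (our own illustration); the split version takes the channel-wise maximum of Eq. (3.25), while the fully quaternion version keeps the quaternion of largest magnitude in the window, as in Eq. (3.26):

```python
import numpy as np

def split_max_pool(Q):
    """Eq. (3.25): channel-wise maximum over a pooling window Q of shape (h, w, 4)."""
    return Q.reshape(-1, 4).max(axis=0)

def fully_max_pool(Q):
    """Eq. (3.26): return the quaternion of maximum magnitude in the window.
    Ties are broken by the first occurrence in this sketch; [176] uses the
    angle cosine theorem instead."""
    flat = Q.reshape(-1, 4)
    return flat[np.argmax(np.linalg.norm(flat, axis=1))]

window = np.random.randn(2, 2, 4)
print(split_max_pool(window))   # generally a quaternion that is not in the window
print(fully_max_pool(window))   # one of the four quaternions of the window
```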

3.3.1 Future Directions

Future works should introduce novel fully quaternion pooling methods, emphasizing the use of interchannel relationships, and their advantages over split methods. For example: using polar representations, introducing quaternion measures [108], or taking inspiration from information theory.

3.4 Quaternion Batch Normalization

Internal covariate shift [80] is a statistical phenomenon that occurs during training: the change of the network parameters causes changes in the statistical distribution of the inputs to hidden layers. Whitening procedures [97, 165], i.e. applying linear transformations to obtain uncorrelated inputs with zero mean and unit variance, alleviate this phenomenon. Since whitening the input of each layer is computationally expensive, because it requires computing covariance matrices, their inverse square roots, and the derivatives of these transformations, Ioffe and Szegedy [80] introduced the batch normalization algorithm, which normalizes each dimension independently. It uses mini-batches to estimate the mean and variance of each channel, and transforms the channel to have zero mean and unit variance using the following equation:

$$\begin{aligned} \tilde{x}=\frac{x-E(x)}{\sqrt{\sigma ^2+\epsilon }}, \end{aligned}$$
(3.27)

where \(\epsilon \) is a constant added for numerical stability. In order to maintain the representational ability of the network, an affine transformation with two learnable parameters, \(\gamma \) and \(\beta \), is applied:

$$\begin{aligned} BN(\tilde{x})=\gamma \tilde{x}+\beta \end{aligned}$$
(3.28)

Even though this method does not produce uncorrelated inputs, it improves convergence time by enabling higher learning rates, and has allowed the training of deeper neural network models. On the other hand, uncorrelated inputs reduce overfitting and improve generalization [33].

Since channel-wise normalization does not ensure equal variance in the real and imaginary components, Gaudet and Maida [54] proposed a quaternion batch-normalization algorithm using the whitening approach [87], treating each component of the quaternion as an element of a four-dimensional vector. We call this approach Whitening Quaternion Batch Normalization (WQBN). Let \(\textbf{x}\) be a quaternion input variable, \(\textbf{x}=[x_R, x_I, x_J, x_K]^T\), \(E(\textbf{x})\) its expected value, and \(V(\textbf{x})\) its quaternion covariance matrix, both computed over a mini-batch. Then:

$$\begin{aligned} V(\textbf{x}) = \begin{bmatrix} v_{rr} & \quad v_{ri} & \quad v_{rj} & \quad v_{rk}\\ v_{ir} & \quad v_{ii} & \quad v_{ij} & \quad v_{ik}\\ v_{jr} & \quad v_{ji} & \quad v_{jj} & \quad v_{jk}\\ v_{kr} & \quad v_{ki} & \quad v_{kj} & \quad v_{kk} \end{bmatrix}, \end{aligned}$$
(3.29)

where subscripts represent the covariance between real or imaginary components of \(\textbf{x}\), e.g. \(v_{ij}=\text {cov}(x_I,x_J)\). Then, the Cholesky decomposition of \(V^{-1}\) is computed, and one of the resulting matrices, W, is selected. Thereafter, the whitened quaternion variable, \(\tilde{\textbf{x}}\), is calculated with the following matrix multiplication [54]:

$$\begin{aligned} \tilde{\textbf{x}}=W(\textbf{x}-E(\textbf{x})), \end{aligned}$$
(3.30)

and finally:

$$\begin{aligned} WQBN(\tilde{\textbf{x}})=\mathbf {\Gamma } \tilde{\textbf{x}}+\mathbf {\beta } \end{aligned}$$
(3.31)

where \(\beta \in \mathbb {H}\) is a trainable parameter, and:

$$\begin{aligned} \mathbf {\Gamma } = \begin{bmatrix} \Gamma _{rr} & \quad \Gamma _{ri} & \quad \Gamma _{rj} & \quad \Gamma _{rk}\\ \Gamma _{ri} & \quad \Gamma _{ii} & \quad \Gamma _{ij} & \quad \Gamma _{ik}\\ \Gamma _{rj} & \quad \Gamma _{ij} & \quad \Gamma _{jj} & \quad \Gamma _{jk}\\ \Gamma _{rk} & \quad \Gamma _{ik} & \quad \Gamma _{jk} & \quad \Gamma _{kk} \end{bmatrix}, \end{aligned}$$
(3.32)

is a symmetric matrix with trainable parameters.

In contrast, Yin et al. [176] apply the quaternion variance definition proposed in [162]:

$$\begin{aligned} V(\textbf{x}) = \frac{1}{T}\sum _{i=1}^T \textbf{v} \bar{\textbf{v}}, \end{aligned}$$
(3.33)

where \(\textbf{v}=\textbf{x}-E(\textbf{x})\). Note that in this case, the variance is a single real value. We call this approach the Variance Quaternion Batch Normalization (VQBN). Thus, the batch normalization is computed as follows:

$$\begin{aligned} VQBN(\textbf{x})=\mathbf {\gamma } \frac{\textbf{x}-E(\textbf{x})}{\sqrt{V(\textbf{x})^2+\epsilon }}+\mathbf {\beta }, \end{aligned}$$
(3.34)

where \(\mathbf {\gamma ,\epsilon }\in \mathbb {R}\), \(\mathbf {\beta }\in \mathbb {H}\) are trainable parameters.
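A hedged sketch of VQBN over a mini-batch of quaternion activations, following Eqs. (3.33) and (3.34) as printed (variable names are ours):

```python
import numpy as np

def vqbn(x, gamma=1.0, beta=np.zeros(4), eps=1e-5):
    """Variance Quaternion Batch Normalization (Eqs. (3.33)-(3.34)).
    x: (T, 4) mini-batch of quaternions; gamma, eps are real, beta is a quaternion."""
    mean = x.mean(axis=0)                       # component-wise expectation E(x)
    v = x - mean
    var = (v * v).sum(axis=1).mean()            # Eq. (3.33): (1/T) sum of v * conj(v), a real scalar
    return gamma * v / np.sqrt(var**2 + eps) + beta   # Eq. (3.34) as printed in the text

batch = np.random.randn(128, 4) * 3.0 + 1.0
out = vqbn(batch)
print(out.mean(axis=0))                         # approximately the quaternion beta (zero here)
```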

Recently, Grassucci et al. [58] noted that for proper quaternion random variables [29, 161], the covariance matrix in Eq. (3.29) becomes the diagonal matrix:

$$\begin{aligned} V(\textbf{x}) =4\sigma ^2 I, \end{aligned}$$
(3.35)

and the batch-normalization procedure is simplified to applying Eqs. (3.33) and (3.34) with \(V(\textbf{x})=2\sigma \).

A third approach is the one proposed in [151], where the authors define a rotation-equivariant normalization operation as follows:

$$\begin{aligned} RQBN(\textbf{x})=\frac{\textbf{x}}{\sqrt{E(\Vert \textbf{x}\Vert ^2)+\epsilon }}. \end{aligned}$$
(3.36)

Generative Adversarial Networks apply another type of normalization, called spectral normalization [115]. In these networks, having a Lipschitz-bounded discriminative function is crucial to mitigate the gradient explosion problem [11, 57, 180]; thus, based on the real-valued counterpart, Grassucci et al. [58] proposed a Quaternion Spectral Normalization algorithm that constrains the spectral norm of each layer. To explain this procedure, we introduce some definitions.

Let f be a generic function; it is said to be K-Lipschitz continuous if, for any two points, \(x_1,x_2\), it satisfies:

$$\begin{aligned} \frac{\Vert f(x_1)-f(x_2) \Vert }{|x_1-x_2|}\le K \end{aligned}$$
(3.37)

Let \(\sigma (\cdot )\) be the spectral norm of a matrix, i.e. its largest singular value. The Lipschitz norm of a function f, denoted by \(\Vert f\Vert _{\text {Lip}}\), is defined as follows:

$$\begin{aligned} \Vert f\Vert _{\text {Lip}}=\sup _x \sigma (\nabla f(x)). \end{aligned}$$
(3.38)

Thus, for a generic linear layer, \(f(x)=Wx+b\), whose gradient is W, the Lipschitz norm is:

$$\begin{aligned} \Vert f\Vert _{\text {Lip}}= \sigma (W). \end{aligned}$$
(3.39)

Now, let \(\textbf{W}\) be a quaternion matrix; its spectral norm, \(\sigma (\textbf{W})\), is computed by estimating the largest singular value of \(\textbf{W}\) via the power iteration method [58]. Then, Quaternion Spectral Normalization is applied, in a split way, using the following equations:

$$\begin{aligned} {W'}_R=\frac{W_R}{\sigma (W)}, {W'}_I=\frac{W_I}{\sigma (W)}, {W'}_J=\frac{W_J}{\sigma (W)}, \text {\ and\ } {W'}_K=\frac{W_K}{\sigma (W)}. \end{aligned}$$
(3.40)

For applying the Lipschitz bound to the whole network, the following relationship is used:

$$\begin{aligned} \Vert f_1\circ f_2\Vert _{\text {Lip}} \le \Vert f_1\Vert _{Lip}\cdot \Vert f_2\Vert _{\text {Lip}}. \end{aligned}$$
(3.41)

thus, constraining the Lipschitz norm of each layer to 1 bounds the Lipschitz norm of the whole network [58]. For details about the incorporation of the method into the training stage, consult [58, 115].
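The split normalization of Eq. (3.40) can be sketched as follows; estimating \(\sigma (\textbf{W})\) requires choosing a real representation of the quaternion weight matrix, and here we assume (our choice, not necessarily that of [58]) the real block matrix of the left Hamilton product per entry, on which a few power-iteration steps are run:

```python
import numpy as np

def real_block(W):
    """Real 4m x 4n representation of a quaternion matrix W (m, n, 4),
    using the left-multiplication matrix of each entry (our chosen representation)."""
    a, b, c, d = np.moveaxis(W, -1, 0)
    blocks = np.array([[a, -b, -c, -d],
                       [b,  a, -d,  c],
                       [c,  d,  a, -b],
                       [d, -c,  b,  a]])          # shape (4, 4, m, n)
    return blocks.transpose(2, 0, 3, 1).reshape(4 * W.shape[0], 4 * W.shape[1])

def spectral_normalize(W, n_iter=30):
    """Divide every quaternion component by an estimate of sigma(W), as in Eq. (3.40)."""
    A = real_block(W)
    u = np.random.randn(A.shape[0])
    for _ in range(n_iter):                       # power iteration for the largest singular value
        v = A.T @ u; v /= np.linalg.norm(v)
        u = A @ v;  u /= np.linalg.norm(u)
    sigma = u @ A @ v
    return W / sigma

W = np.random.randn(8, 8, 4)
print(round(np.linalg.norm(real_block(spectral_normalize(W)), 2), 3))   # ~1.0
```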

3.4.1 Future Directions

The WQBN algorithm produces uncorrelated, zero-mean, and unit-variance inputs, but is computationally expensive. In addition, Kessy et al. [87] state that there are infinitely many possible matrices satisfying \(V^{-1}=\bar{W}W\). In contrast, the VQBN algorithm produces zero-mean and unit-variance inputs (according to the Wang et al. definition [162], which averages the variance of the four channels), but since a single-value variance is used, scaling is isotropic, and the input channels remain correlated. However, this approach greatly reduces the computational time by avoiding the decomposition of the covariance matrix. Further theoretical and experimental analysis is required to grasp its advantages versus independent channel batch normalization.

3.5 Activation Functions

Biological neurons produce an output signal if a set of input stimuli surpasses a threshold value within a lapse of time; for artificial neurons, the role of the activation function is to simulate this triggering action. Mathematically, this behavior is modeled by a mapping: in the case of real-valued neurons, the domain and image are the field of real numbers, while complex and quaternion neurons map complex or quaternion inputs to complex or quaternion outputs, respectively. In addition, non-linear activation functions are required so that artificial neural networks work as universal interpolators of continuous quaternion-valued functions; related density theorems for quaternion-valued MLPs can be consulted in [8].

Since back-propagation has become the standard method for training artificial neural networks, the activation function is required to be analytic [72], i.e. the derivative exists at any point. In the complex domain, by Liouville's theorem, it is known that a bounded function, which is analytic at any point, is a constant function; conversely, non-constant complex functions have unbounded images [153]. Accordingly, some common real-valued functions, like the hyperbolic tangent and sigmoid, will diverge to infinity when they are extended to the complex domain [72]; this makes them unsuitable for representing the behavior of a biological neuron. A similar problem arises in the quaternion domain: “the only quaternion function regular with bounded norm in \(E^4\) is a constant” [40], where \(E^4\) stands for a 4-dimensional Euclidean space.

The relationship between regular functions and the existence of their quaternionic derivatives is stated by the Cauchy–Riemann–Fueter equation [155], leading to the result that the only globally analytic quaternion functions are certain linear and constant functions. Moreover, non-linear activation functions are required for constructing a neural network architecture that works as a universal interpolator of continuous quaternion-valued functions [8].

In this manner, a typical approach to circumvent this problem has been to relax the constraints and use non-linear, non-analytic quaternion functions satisfying input and output properties, where the learning dynamics are built using partial derivatives in the quaternion domain. An example of this approach is split quaternion functions, defined as follows:

$$\begin{aligned} f(\textbf{q})=f_R(q_R)+f_I(q_I)\hat{i}+f_J(q_J)\hat{j}+f_K(q_K)\hat{k}, \end{aligned}$$
(3.42)

where \(\textbf{q}\in \mathbb {H}\), \(f_R\), \(f_I\), \(f_J\), and \(f_K\) are mappings over the real numbers: \(f_{*}:\mathbb {R}\rightarrow \mathbb {R}\).

Nowadays, split quaternion functions remain the only type of activation function that has been implemented in QCNNs. Even though any type of split quaternion activation function used in QMLPs can be applied, the split quaternion ReLU is currently the most common activation function applied in QCNNs. For the sake of completeness, we present the activation functions found in current works.

Let \(\textbf{q}=q_R+q_I\hat{i}+q_J\hat{j}+q_K\hat{k} \in \mathbb {H}\); we have the following activation functions:

  1. Split Quaternion Sigmoid. It was introduced in [3, 10], and is defined as follows:

    $$\begin{aligned} {Q}S(\textbf{q})= S(q_R)+ S(q_I) \hat{i} + S(q_J) \hat{j} + S(q_K) \hat{k}, \end{aligned}$$
    (3.43)

    where \(S: \mathbb {R}\rightarrow \mathbb {R}\) is the real-valued sigmoid function:

    $$\begin{aligned} S(x)= \frac{1}{1+e^{-x}}. \end{aligned}$$
    (3.44)
  2. Split Quaternion Hyperbolic Tangent. It was introduced in [129, 181], and is defined as follows:

    $$\begin{aligned} {Q}tanh(\textbf{q})= \tanh (q_R)+ \tanh (q_I)\hat{i} + \tanh (q_J) \hat{j} + \tanh (q_K) \hat{k}, \end{aligned}$$
    (3.45)

    where \(\tanh : \mathbb {R}\rightarrow \mathbb {R}\) is the real-valued hyperbolic tangent function:

    $$\begin{aligned} \tanh (x)= & \frac{\sinh (x)}{\cosh (x)} \nonumber \\= & \frac{\exp (2x)-1}{\exp (2x)+1}. \end{aligned}$$
    (3.46)
  3. Split Quaternion Hard Hyperbolic Tangent. It was introduced in [127], and is defined as follows:

    $$\begin{aligned} {Q}H^2T(\textbf{q})= H^2T(q_R)+ H^2T(q_I) \hat{i} + H^2T(q_J) \hat{j} + H^2T(q_K) \hat{k}, \end{aligned}$$
    (3.47)

    where \(H^2T: \mathbb {R}\rightarrow \mathbb {R}\) [34]:

    $$\begin{aligned} H^2T(x)= {\left\{ \begin{array}{ll} -1 & \text {if } x<-1 \\ x & \text {if } -1\le x\le 1 \\ 1 & \text {if } x>1. \end{array}\right. } \end{aligned}$$
    (3.48)

    For this type of activation function, Collobert [34] proved that the hidden layers of an MLP work as local SVMs when the real-valued hard hyperbolic tangent is used.

  4. Split Quaternion ReLU. It was introduced in [54, 77, 176], and is defined as follows:

    $$\begin{aligned} {Q}ReLU(\textbf{q}) = ReLU(q_R) + ReLU(q_I)\hat{i} + ReLU(q_J)\hat{j} + ReLU(q_K)\hat{k}, \end{aligned}$$
    (3.49)

    where \(ReLU: \mathbb {R}\rightarrow \mathbb {R}\) is the real-valued ReLU function [50, 56]:

    $$\begin{aligned} ReLU(x)= \max (0, x). \end{aligned}$$
    (3.50)
  5. Split Quaternion Parametric ReLU. It was introduced in [130], and is defined as follows:

    $$\begin{aligned} {Q}PReLU(\textbf{q}) = PReLU(q_R)+ PReLU(q_I)\hat{i} + PReLU(q_J)\hat{j} + PReLU(q_K)\hat{k}, \end{aligned}$$
    (3.51)

    where \(PReLU: \mathbb {R}\rightarrow \mathbb {R}\) [67]:

    $$\begin{aligned} PReLU(x)= {\left\{ \begin{array}{ll} x & \text {if } x>0 \\ \alpha x & \text {if } x\le 0 \end{array}\right. } \end{aligned}$$
    (3.52)

    and \(\alpha \) is a parameter learned during the training stage, which controls the slope of the negative side of the ReLU function. Equivalently, we have:

    $$\begin{aligned} PReLU(x)= \max (0, x)+\alpha \min (0,x). \end{aligned}$$
    (3.53)
  6. Split Quaternion Leaky ReLU. It was introduced in [59] as a particular case of the \(\mathbb {Q}\)PReLU function; in this case, \(\alpha \) is a small constant value, e.g. 0.01 [109].

An advantage of using split activation functions is that by processing each channel separately, we can adopt existing frameworks without additional modifications of the source code; however, this separate processing does not adequately capture the cross-dynamics of the data channels.
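For instance, the split ReLU of Eq. (3.49) and the split PReLU of Eq. (3.53) become one-liners once quaternions are stored as a trailing axis of four real channels (our own sketch):

```python
import numpy as np

def split_qrelu(Q):
    """Split Quaternion ReLU (Eq. (3.49)): apply the real ReLU to every component."""
    return np.maximum(Q, 0.0)

def split_qprelu(Q, alpha=0.25):
    """Split Quaternion Parametric ReLU (Eq. (3.53)); alpha is learned in practice."""
    return np.maximum(Q, 0.0) + alpha * np.minimum(Q, 0.0)

Q = np.random.randn(2, 2, 4)          # a small quaternion feature map
print(split_qrelu(Q).min() >= 0.0)    # True
```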

A different approach to the design of quaternion activation functions comes from quaternion analysis [155]. So far, it is clear that the key problem is designing suitable activation functions and computing their quaternion derivatives; thus, some mathematicians have been working on redefining quaternion calculus.

For example, De Leo and Rotelli [38, 39], as well as Schwartz [144], have introduced the concept of a local quaternionic derivative. The trick was to extend “the concept of a derivative operator from those with constant (even quaternionic) coefficients, to one with variable coefficients depending upon the point of application... [thus] the derivative operator passes from a global form to local form” [39]. Using this novel approach, they define local analyticity, which, in contrast to global analyticity, does not reduce functions to a trivial class (constant or linear functions). Following this idea, Ujang et al. [158] define the concept of fully quaternion functions, and the properties they should fulfill to be locally analytic, non-linear, and suitable for gradient-based learning. These functions displayed better performance than split quaternion functions when applied to the design of adaptive filters. Opposed to split functions, fully quaternion functions capture interchannel relationships, making them suitable for quaternion-based learning; however, experimental comparison over a standard benchmark remains an open issue.

To the best of our knowledge, currently, the only fully quaternion activation function that has been applied in QCNNs is the Rotation-Equivariant ReLU function. It was proposed by Shen et al. [151] as part of a model that extracts rotation equivariant features. Let \(\{\textbf{q}_1, \textbf{q}_2, \dots , \textbf{q}_N\}\) be a set of quaternions; then, for a quaternion \(\textbf{q}_s\), with \(1\le s\le N\), the activation function is defined as follows:

$$\begin{aligned} {Q}REReLU (\textbf{q}_s)= \frac{\Vert \textbf{q}_s\Vert }{\max (\Vert \textbf{q}_s\Vert ,c)}\textbf{q}_s; \end{aligned}$$
(3.54)

where c is a positive constant, computed as follows: \(c=\frac{1}{N}\sum _{t=1}^N \Vert \textbf{q}_t\Vert \). Since c is the average of the magnitudes of all quaternion outputs in a neighborhood, if the magnitude of a specific quaternion output is above the average, the output remains the same (active); otherwise, the quaternion output is attenuated by the factor \(\Vert \textbf{q}_s\Vert /c\).
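A hedged sketch of the rotation-equivariant ReLU of Eq. (3.54), with c computed as the mean magnitude over the given set of quaternion outputs:

```python
import numpy as np

def qre_relu(Q):
    """Rotation-Equivariant ReLU (Eq. (3.54)). Q: (N, 4) set of quaternion outputs.
    Each quaternion is rescaled by ||q|| / max(||q||, c), with c the mean magnitude."""
    norms = np.linalg.norm(Q, axis=1)
    c = norms.mean()
    scale = norms / np.maximum(norms, c)
    return Q * scale[:, None]

Q = np.random.randn(10, 4)
out = qre_relu(Q)
print(np.linalg.norm(Q, axis=1).round(2))
print(np.linalg.norm(out, axis=1).round(2))   # magnitudes below the mean are attenuated
```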

3.5.1 Future Directions

So far, we have identified the fundamental trends of thought for activation functions: split quaternion functions, whose derivatives for training are computed in a channel-wise manner, and fully quaternion functions, whose derivatives are computed locally or using partial derivatives. Theoretical analysis, as well as some preliminary experimental results, has indicated a better performance of fully quaternion activation functions over others. However, these ideas have not been put into practice on QCNNs. Future works should focus on introducing novel fully quaternion activation functions that exploit specific properties of the quaternion representation, and on establishing proper benchmarks for the comparison of existing functions.

3.6 Training

From the point of view of supervised learning, the problem of training a QCNN can be stated as an optimization one:

$$\begin{aligned} \underset{\textbf{w}_{1},\dots ,\textbf{w}_{n}}{\textrm{argmin}}\ L(\textbf{w}_{1},\dots ,\textbf{w}_{n}) \end{aligned}$$
(3.55)

where L is the loss function, and \(\textbf{w}_{1},\dots ,\textbf{w}_{n}\) are all the quaternion weights of the network.

For solving the problem, current works extend the Gradient Descent Algorithm to the quaternion domain, and computation of the gradient is achieved by means of the backpropagation technique. However, due to the non-analytic condition of non-linear quaternionic functions, there are three approaches to computing derivatives during the training stage: use partial derivatives and the generalized quaternion chain rule [54], use local quaternionic derivatives [38, 39, 144] or use the Generalized Hamiltonian-Real calculus (GHR) [172].

Currently, all of the methods for training that have been tested on QCNNs rely on the first approach, i.e. the use of partial derivatives. These methods are adaptations of the QMLP backpropagation algorithm [7], or an extension of the generalized complex chain rule for real-valued loss functions [157]. Both approaches are equivalent and relax the analyticity condition by using partial derivatives with respect to the real and imaginary parts, as follows: Let L be a real-valued loss function and \(\textbf{q}=q_R+q_I\hat{i}+ q_J\hat{j} + q_K\hat{k}\), where \(q_R, q_I, q_J, q_K \in \mathbb {R}\), then [54, 155]:

$$\begin{aligned} \frac{\partial L}{\partial \textbf{q}} = \frac{\partial L}{\partial q_R}+ \frac{\partial L}{\partial q_I}\hat{i} + \frac{\partial L}{\partial q_J}\hat{j} + \frac{\partial L}{\partial q_K}\hat{k}. \end{aligned}$$
(3.56)

Now, assume \(\textbf{q}\) can be expressed in terms of a second quaternion variable, \(\textbf{w}=w_R+w_I \hat{i}+ w_J \hat{j} + w_K \hat{k}\), where \(w_R, w_I, w_J, w_K \in \mathbb {R}\). Then, the generalized quaternion chain rule is calculated as follows [54]:

$$\begin{aligned} \frac{\partial L}{\partial \textbf{w}}= & \frac{\partial L}{\partial q_R} \left( \frac{\partial q_R}{\partial w_R} +\frac{\partial q_R}{\partial w_I}\hat{i} + \frac{\partial q_R}{\partial w_J}\hat{j} + \frac{\partial q_R}{\partial w_K}\hat{k} \right) \nonumber \\ & + \frac{\partial L}{\partial q_I} \left( \frac{\partial q_I}{\partial w_R} +\frac{\partial q_I}{\partial w_I}\hat{i} + \frac{\partial q_I}{\partial w_J}\hat{j} + \frac{\partial q_I}{\partial w_K}\hat{k} \right) \nonumber \\ & + \frac{\partial L}{\partial q_J} \left( \frac{\partial q_J}{\partial w_R} +\frac{\partial q_J}{\partial w_I}\hat{i} + \frac{\partial q_J}{\partial w_J}\hat{j} + \frac{\partial q_J}{\partial w_K}\hat{k} \right) \nonumber \\ & + \frac{\partial L}{\partial q_K} \left( \frac{\partial q_K}{\partial w_R} +\frac{\partial q_K}{\partial w_I}\hat{i} + \frac{\partial q_K}{\partial w_J}\hat{j} + \frac{\partial q_K}{\partial w_K}\hat{k} \right) \end{aligned}$$
(3.57)

From these equations, the gradient descent algorithm using backpropagation was developed; see [3, 9, 10, 113] for a full derivation of the algorithm. The algorithmic form is summarized as follows:

Let \(\textbf{Q} = [\textbf{q}(x,y)] \in \mathbb {H}^{N\times M}\) be the input to a convolution layer, \(\textbf{W} = [\textbf{w}(x,y)] \in \mathbb {H}^{L\times L}\) be the weights of the convolution kernel, and \(\textbf{F} = [\textbf{f}(x,y)]\in \mathbb {H}^{(N-L+1)\times (M-L+1)}\) be the output of the layer; then, a quaternion weight, \(\textbf{w}(s,t)\), where \(1\le s,t\le L\), represents the weight connecting input \(\textbf{q}(u+s,v+t)\) with output \(\textbf{f}(u,v)\), where \(1\le u\le N-L+1\) and \(1\le v\le M-L+1\). Thus, the gradient of the loss function with respect to \(\textbf{w}(s,t)\) is computed using Eq. (3.57). Hence, given a positive constant learning rate, \(\epsilon \in \mathbb {R}^+\), the quaternion weight must be changed by a value proportional to the negative gradient:

$$\begin{aligned} \Delta \textbf{w}(s,t) = \epsilon \left( -\frac{\partial L}{\partial \textbf{w}(s,t)} \right) \end{aligned}$$
(3.58)

Let \(\textbf{d}(u,v)^{\text {top}}, \textbf{d}(s,t)^{\text {bottom}} \in \mathbb {H}\) be the errors propagated from the top layer and to the bottom layer, respectively; the weights are updated as follows:

$$\begin{aligned} \textbf{w}(s,t)=\textbf{w}(s,t) + \epsilon \textbf{d}(u,v)^{\text {top}} \mathbf {\bar{q}}(u+s,v+t) \end{aligned}$$
(3.59)

The bias term is updated using the following equation:

$$\begin{aligned} \textbf{b}(u,v)=\textbf{b}(u,v) + \epsilon \textbf{d}(u,v)^{\text {top}}, \end{aligned}$$
(3.60)

and the error is propagated to the bottom layer according to the following equation:

$$\begin{aligned} \textbf{d}(s,t)^{\text {bottom}} = \sum _{u}\sum _{v} (\mathbf {\bar{w}}(s,t) \textbf{d}(u,v)^{\text {top}}) \end{aligned}$$
(3.61)

Note that Eqs. (3.59) and (3.61) use quaternion products. Finally, for an activation layer, the error is propagated to the bottom layer according to the following equation:

$$\begin{aligned} \textbf{d}(s,t)^{\text {bottom}} = \textbf{d}(u,v)^{\text {top}} \odot f'(\textbf{q}(u+s,v+t)) \end{aligned}$$
(3.62)

where \(\odot \) is the Hadamard or Schur product (component-wise) [36].
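The update rules of Eqs. (3.58)–(3.61) can be condensed into the following sketch for a single convolutional weight and bias; it mirrors the split-gradient algorithm described above, with toy values and variable names of our own choosing:

```python
import numpy as np

def hamilton(p, q):
    pR, pI, pJ, pK = p; qR, qI, qJ, qK = q
    return np.array([pR*qR - pI*qI - pJ*qJ - pK*qK,
                     pR*qI + pI*qR + pJ*qK - pK*qJ,
                     pR*qJ - pI*qK + pJ*qR + pK*qI,
                     pR*qK + pI*qJ - pJ*qI + pK*qR])

def conj(q):
    return q * np.array([1.0, -1, -1, -1])

eps = 0.01                                   # learning rate
w = np.random.randn(4)                       # one quaternion kernel weight w(s, t)
b = np.random.randn(4)                       # bias b(u, v)
q_in = np.random.randn(4)                    # the input q(u+s, v+t) this weight touches
d_top = np.random.randn(4)                   # error arriving from the layer above

# Eq. (3.59): w <- w + eps * d_top * conj(q_in)   (quaternion product)
w = w + eps * hamilton(d_top, conj(q_in))
# Eq. (3.60): b <- b + eps * d_top
b = b + eps * d_top
# Eq. (3.61): error sent to the layer below (one term of the sum over u, v)
d_bottom_term = hamilton(conj(w), d_top)
print(w, b, d_bottom_term)
```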

Adopting the same definition of the chain rule, but a different definition of convolution, we have the work of Zhu et al. [181], who apply the two-sided convolution and a polar representation of the quaternion weights. Since their model applies rotation and scaling on the input features, the quaternion gradient for their model is simplified to a rotation transformation over the same axis, but with a reversed angle. A general model of this approach, with learnable arbitrary axes, is presented by [113]. Therefore, the training algorithm is similar to the one presented before, but Eq. (3.59) changes accordingly [81, 112, 113]:

$$\begin{aligned} \textbf{w}(s,t)= & \textbf{w}(s,t) \nonumber \\ & + \frac{\epsilon }{\Vert \textbf{w}(s,t)\Vert } \left\{ \frac{\textbf{d}(u,v)^{\text {top}} \cdot \left( \textbf{w}(s,t) \textbf{q}(u+s,v+t) \mathbf {\bar{w}}(s,t) \right) }{\Vert \textbf{w}(s,t)\Vert ^2} \textbf{w}(s,t)\right\} \nonumber \\ & - \frac{\epsilon }{\Vert \textbf{w}(s,t)\Vert } \left\{ 2\textbf{d}(u,v)^{\text {top}} \textbf{w}(s,t) \mathbf {\bar{q}}(u+s,v+t) \right\} \end{aligned}$$
(3.63)

and Eq. (3.61) changes to [81, 112, 113]:

$$\begin{aligned} \textbf{d}(s,t)^{\text {bottom}} = \sum _u \sum _v \frac{ \mathbf {\bar{w}}(s,t) \textbf{d}(u,v)^{\text {top}} \textbf{w}(s,t)}{\Vert \textbf{w}(s,t) \Vert } \end{aligned}$$
(3.64)

So far, we have reviewed the first approach for training; the second approach relies on developing novel quaternionic calculus tools. Using the concept of a local quaternionic derivative [38, 39, 144], T. Isokawa proposed a QMLP and a backpropagation algorithm [82, 120].

The third approach follows the same trend of developing novel quaternion calculus tools, but from a different perspective. Mandic et al. [110] start from the observation that, for gradient-based optimization, a common objective is to minimize a positive real function of quaternion variables, e.g. \(J(\textbf{e},\mathbf {\bar{e}})=\textbf{e}\mathbf {\bar{e}}\), and for that purpose the pseudo-gradient is used, i.e. the sum of component-wise gradients. Formalization of these ideas is achieved by “establishing the duality between derivatives of quaternion valued functions in \(\mathbb {H}\) and the corresponding quadrivariate real functions in \(\mathbb {R}^4\)” [110], leading to what is called Hamiltonian-Real calculus (HR). In addition, Mandic et al. [110] proved that, for a real function of a quaternion vector variable, the maximum change is in the direction of the conjugate gradient, establishing a general framework for quaternion gradient-based optimization. Going further, Xu and Mandic [172] proposed the product and chain rules for computing derivatives, as well as the quaternion counterparts of the mean value and Taylor's theorems; this establishes an alternative framework called Generalized HR calculus (GHR). Thereafter, novel quaternion gradient algorithms using GHR calculus were proposed [173, 174]. Although [175] shows that a QMLP trained with a GHR-based algorithm obtains better prediction gains on the 4D Saito's chaotic signal task than other quaternion-based learning algorithms [8, 17, 112], further experimental analysis on a standard benchmark and a proper comparison with real and complex counterparts are required. To date, there is no published work on the use of local quaternionic derivatives or GHR calculus for training QCNNs.

Besides the problem of computing derivatives of non-analytic quaternion activation functions, another problem is that algorithms such as gradient descent and backpropagation can be trapped in local minima. Therefore, QCNNs could be trained with alternative methods that do not rely on computing quaternion derivatives, such as evolutionary algorithms, ant colony optimization, particle swarm optimization, etc. [31]. In this case, general-purpose optimization algorithms will have the same performance on average [166, 167].

3.6.1 Future Directions

Novel methods for training should be applied, for example: modern quaternion calculus techniques, such as local or GHR calculus, and meta-heuristic algorithms working in the quaternion domain. In addition, fully quaternion loss functions should be introduced.

3.7 Quaternion Weight Initialization

At the beginning of this century, real-valued neural networks were showing the superiority of deep architectures; however, the standard gradient descent algorithm from random weight initialization performed poorly when used for training these models. To understand the reason for this behavior, Glorot and Bengio [55] established an experimental setup and observed that some activation functions can cause saturation in the top hidden layers. In addition, by theoretically analyzing the forward and backward propagation variances (expressed with respect to the input, output, and weight initialization randomness), they realized that, because of the multiplicative effect through layers, the variance of the back-propagated gradient might vanish or explode in very deep networks. Consequently, they proposed, and validated experimentally, a weight initialization procedure, which makes the variance dependent on each layer, and maintains the activation and back-propagated gradient variances as we move up or down the network. Their method, called normalized initialization, uses a scaled uniform distribution for initialization. However, one of its assumptions is that the network is in a linear regime at the initialization step; thus, this method works better for softsign units than for sigmoid or hyperbolic tangent ones [55]. Since the linearity assumption is invalid for activation functions such as ReLU, He et al. [67] developed an initialization method for non-linear ReLU and Parametric ReLU activation functions. Their initialization method uses a zero-mean Gaussian distribution with a specific standard deviation value, and surpasses the performance of the previous method for training extremely deep models, e.g. 30 convolution layers.

In the case of classic QCNNs, the starting point is the work of Trabelsi et al. [157], who extended these results to deep complex networks. Similar results are presented by Gaudet and Maida [54] for deep quaternion networks, and by Parcollet et al. [129] for quaternion recurrent networks. These works treat quaternion weights as 4-dimensional vectors whose components are normally distributed, centered at zero, and independent. Hence, it can be proved that the magnitude of the weights follows a 4DOF Rayleigh distribution [54, 129], which reduces weight initialization to selecting a single parameter, \(\sigma \), the mode of the distribution. Let \(n_\text {in}\) and \(n_\text {out}\) be the number of neurons of the input and output layers, respectively. If we select \(\sigma =\frac{1}{\sqrt{2(n_{\text {in}}+n_{\text {out}})}}\), we obtain a quaternion normalized initialization, which ensures that the variances of the quaternion input, output, and their gradients are the same. Alternatively, \(\sigma =\frac{1}{\sqrt{2n_{\text {in}}}}\) is used for quaternion ReLU initialization. The algorithm is presented in Fig. 3.
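A sketch of this quaternion-valued procedure, following the description above (Rayleigh-distributed magnitude, a random unit imaginary axis, and a uniformly distributed phase), is given below. The exact phase range and the per-component assembly are assumptions of this sketch and should be checked against Fig. 3 and [54, 129].

```python
import numpy as np

def quaternion_init(n_in, n_out, criterion="glorot", rng=None):
    """Return the four weight components (w_r, w_i, w_j, w_k) of shape (n_in, n_out)."""
    rng = rng or np.random.default_rng()
    # Mode of the Rayleigh distribution, as described in the text.
    if criterion == "glorot":
        sigma = 1.0 / np.sqrt(2.0 * (n_in + n_out))
    else:  # "he", for ReLU units
        sigma = 1.0 / np.sqrt(2.0 * n_in)

    shape = (n_in, n_out)
    magnitude = rng.rayleigh(scale=sigma, size=shape)

    # Random unit purely imaginary quaternion (rotation axis).
    axis = rng.uniform(-1.0, 1.0, size=(3,) + shape)
    axis /= np.linalg.norm(axis, axis=0, keepdims=True) + 1e-12

    # Random phase; the range is an assumption of this sketch.
    theta = rng.uniform(-np.pi, np.pi, size=shape)

    w_r = magnitude * np.cos(theta)
    w_i = magnitude * axis[0] * np.sin(theta)
    w_j = magnitude * axis[1] * np.sin(theta)
    w_k = magnitude * axis[2] * np.sin(theta)
    return w_r, w_i, w_j, w_k
```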

On the other hand, for geometric QCNNs, the quaternion weights represent an affine transformation composed of a scale factor, s, and a rotation angle \(\theta \). Thus, to keep the same variance of the gradients during training, Zhu et al. [181] proposed a simple initialization procedure using the uniform distribution, \(U[\cdot ]\):

$$\begin{aligned} s\sim & U \left[ -\frac{\sqrt{6}}{\sqrt{n_\text {in}+n_{\text {out}}}}, \frac{\sqrt{6}}{\sqrt{n_\text {in}+n_{\text {out}}}} \right] \nonumber \\ \theta\sim & U\left[ -\frac{\pi }{2}, \frac{\pi }{2} \right] \end{aligned}$$
(3.65)
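A corresponding sketch of this scale-and-rotation initialization (Eq. 3.65) could be:

```python
import numpy as np

def geometric_quaternion_init(n_in, n_out, rng=None):
    # Sample a scale factor and a rotation angle per weight, following Eq. (3.65).
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    s = rng.uniform(-limit, limit, size=(n_in, n_out))
    theta = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size=(n_in, n_out))
    return s, theta
```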
Fig. 3
figure 3

Quaternion-valued weight initialization

3.7.1 Future Directions

In the case of geometric QCNNs, further theoretical analysis of weight initialization techniques is required, as well as the extension for ReLU units. For equivariant QCNNs, weight initialization procedures have not been introduced or are not described in the literature. In addition, current works only consider the case of split quaternion activation functions. The propagation of variance using fully quaternion activation functions should be investigated.

4 Architectural Design for Applications

The previous section dealt with the individual building blocks of QCNNs; the possible ways of interconnecting them give rise to numerous models. There are three leading factors to consider when implementing applications:

  1. Domain of application. Current works primarily focus on three areas: vision, language, and forecasting.

  2. Mapping of the input data from real numbers to the quaternion domain.

  3. Topology. Based on current works, we have the following types:

    • ConvNets. They use convolution layers without additional tricks.

    • Residual. They use a shortcut connection from the input to forward blocks. This connection creates a pathway with an identity mapping from the input to the output of a residual block. In this way, it is easier for the solver to learn the desired function with reference to an identity mapping than to learn it from scratch.

    • Convolutional Auto-Encoders (CAE). Their aim is to reconstruct the input feature at the output.

    • Point-based. They focus on unordered sets of vectors, such as point-cloud input data.

    • Recurrent. They exploit connections that can create cycles, allowing the network to learn long-term temporal information.

    • Generative. They use an adversarial learning paradigm, where one player generates samples resembling the real data, and the other discriminates between real and fake data; the solution of the game is the model achieving a Nash equilibrium.

Fig. 4
figure 4

Residual QNN models. Some notation details: For convolution, parameters are presented in the format (number of kernels, size of kernel, stride); if the size of kernel or stride is the same for each dimension, a single number is shown, e.g. 3 is shown instead of \(3\times 3\). Symbol \(+\) inside a circle means summing. Dropout and flatten procedures are omitted. All blocks represent operations in the quaternion domain

Fig. 5
figure 5

Quaternion ConvNet models. Some notation details: For convolution, parameters are presented in the format (number of kernels, size of kernel, stride); if the size of kernel or stride is the same for each dimension, a single number is shown, e.g. 3 is shown instead of \(3\times 3\). LRN stands for Local Response Normalization. Dropout and flatten procedures are omitted. All blocks represent operations in the quaternion domain, except those preceded by the word “real”

Fig. 6
figure 6

Quaternion CAE and generative models. Some notation details: For convolution, parameters are presented in the format (number of kernels, size of kernel, stride); if the size of kernel or stride is the same for each dimension, a single number is shown, e.g. 3 is shown instead of \(3\times 3\). Symbol \(+\) inside a circle means summing while \(\cup \) inside a circle means concatenation. TConv stands for Decoding Transposed Convolution. Dropout and flatten procedures are omitted. All blocks represent operations in the quaternion domain, except those preceded by the word “real”

This section is devoted to showing how the blocks presented in Sect. 3 can be used to construct different architectures; we focus our analysis on the three points previously introduced.

The information is organized in three subsections corresponding to the domains of applications: vision, language, and forecasting. Within each subsection, the models are ordered according to the type of convolution involved: classic, geometric, or equivariant. In addition, different networks are presented according to their topology: ConvNet, Residual, CAE, Point-based, Recurrent, or Generative. Methods for mapping input data to the quaternion domain are presented in each subsection.

A graphic depiction of several models is shown in Figs. 4, 5, and 6. Some architectures available in the literature are presented from a high-level point of view, showing their topologies and the blocks they use. Information about the number of quaternion kernels, their size, and stride is included when available. Further implementation details, such as weight initialization and optimization methods, are omitted to avoid a cumbersome presentation. Readers interested in specific models are encouraged to consult the original sources.

In addition, Tables 2, 3, 4, 5, 6, 7, 8, 9 and 10 provide a comparison between different QCNNs. They provide extra information, such as the datasets on which the models were tested, performance metrics, and comparisons with real-valued models. For comparisons between quaternion and real-valued models, we only report real-valued architectures with a similar topology. The difference in the number of parameters between real-valued and quaternion-valued networks arises for the following reasons: some authors prefer to compare networks with the same number of real-valued and quaternion-valued units, where the quaternion-valued units have four times more parameters; others compare networks with the same number of parameters, either reducing the number of quaternion units to \(25\%\) of the real-valued network or quadrupling the number of real-valued units. In addition, some quaternion-valued networks use real-valued components in some parts of the network.

To conclude, Sect. 4.4 summarizes some insights obtained from these works (Table 10).

Table 2 Applications of QCNNs in Computer Vision, Part I: Image Classification
Table 3 Applications of QCNNs in Computer Vision, Part II
Table 4 Applications of QCNNs in Computer Vision, Part III
Table 5 Applications of QCNNs in Computer Vision, Part IV
Table 6 Applications of QCNNs in Computer Vision, Part V
Table 7 Applications of QCNNs in Image Processing, Part I
Table 8 Applications of QCNNs in Image Processing, Part II
Table 9 Applications of QCNNs in natural language processing
Table 10 Applications of QCNNs in forecasting

4.1 Vision

For mapping data to the quaternion domain, several methods are available in the literature; in the case of color images, the most common approach is to encode the red, green, and blue channels into the imaginary parts of the quaternion:

$$\begin{aligned} \textbf{q}=0+R\hat{i}+G\hat{j}+B\hat{k}, \end{aligned}$$
(4.1)

see for example [26, 27, 58,59,60, 77, 83, 119, 127, 146, 176, 181].
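As a minimal illustration of Eq. (4.1), an RGB image stored as an \(H\times W\times 3\) array can be embedded into a pure-quaternion array with a zero real part; the channel ordering below is the usual one but remains an assumption of this sketch.

```python
import numpy as np

def rgb_to_quaternion(image):
    """Map an H x W x 3 RGB array to an H x W x 4 pure-quaternion array (Eq. 4.1)."""
    h, w, _ = image.shape
    q = np.zeros((h, w, 4), dtype=image.dtype)
    q[..., 1:] = image  # R -> i, G -> j, B -> k; real part stays zero
    return q
```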

Another method, proposed by Gaudet and Maida [54], is to learn the imaginary components of the quaternion input with a residual block:

\(BN\rightarrow ReLU\rightarrow Conv\rightarrow BN\rightarrow ReLU\rightarrow Conv\),

where a shortcut connection sums the input to the output of the block.

In contrast, for grayscale images, Chen et al. [27] propose to map the grayscale values to the real part of the quaternion as follows:

$$\begin{aligned} \textbf{q}=\text {grayvalue}+0\hat{i}+0\hat{j}+0\hat{k} \end{aligned}$$
(4.2)

In the case of Polarimetric Synthetic Aperture Radar (PolSAR) images [106] containing scattering matrices [14, 90]:

$$\begin{aligned} \begin{bmatrix} S_{HH} & \quad S_{HV} \\ S_{VH} & \quad S_{VV} \end{bmatrix} \end{aligned}$$
(4.3)

Matsumoto et al. [113] propose to compute a Pauli decomposition [32] from the complex scattering matrix, as follows:

$$\begin{aligned} a = \frac{S_{HH}+S_{VV}}{\sqrt{2}}, b = \frac{S_{HH}-S_{VV}}{\sqrt{2}}, c = \frac{S_{HV}+S_{VH}}{\sqrt{2}}, \end{aligned}$$
(4.4)

and assign the square magnitudes of the components b, c, and a to the red, green, and blue channels, respectively, of an RGB image. The image is then mapped to the quaternion domain as usual. Alternatively, Matsumoto et al. [113] propose to transform the scattering matrix into 3D Stokes vectors [98] normalized by their total power; each component is then mapped to the imaginary parts of the quaternion.
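A sketch of the first (pseudocolor) mapping, assuming the four complex scattering channels are given as separate arrays of equal shape, could be:

```python
import numpy as np

def polsar_to_quaternion(s_hh, s_hv, s_vh, s_vv):
    """Pauli decomposition (Eq. 4.4) -> pseudocolor image -> pure quaternion.
    Inputs are complex-valued arrays of identical shape."""
    a = (s_hh + s_vv) / np.sqrt(2.0)
    b = (s_hh - s_vv) / np.sqrt(2.0)
    c = (s_hv + s_vh) / np.sqrt(2.0)
    # Square magnitudes of b, c, and a become the R, G, and B channels.
    rgb = np.stack([np.abs(b) ** 2, np.abs(c) ** 2, np.abs(a) ** 2], axis=-1)
    q = np.zeros(rgb.shape[:-1] + (4,))
    q[..., 1:] = rgb  # imaginary parts carry the pseudocolor channels
    return q
```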

Finally, for point clouds, Shen et al. [151] propose to map the (x, y, z) coordinates of each 3D point to a pure quaternion as follows:

$$\begin{aligned} \textbf{q}=0+x\hat{i}+y\hat{j}+z\hat{k}. \end{aligned}$$
(4.5)

4.1.1 Vision Under the Classic Convolution Paradigm

Some of the first models proposed in the QCNN literature were residual networks; in the quaternion domain, Gaudet and Maida [54] tested deep and shallow quaternion-valued residual nets. Both use three stages; the shallow network contains 2, 1, and 1 residual blocks per stage, while the deep network uses 10, 9, and 9 residual blocks per stage, see Fig. 4a. These models were applied to image classification tasks using the CIFAR-10 and CIFAR-100 datasets [91], and to the KITTI Road Estimation benchmark [49]. Recently, Sfikas et al. [148] proposed a standard residual model for keyword spotting in handwritten manuscripts, see Fig. 4c. Their model is applied to the PIOP-DAS [149] and GRPOLY-DB [53] datasets, which contain digitized documents written in modern Greek.

A different type of model is the ConvNet. The first proposal came from Yin et al. [176], who implemented a basic QCNN model, see Fig. 5f. Since quaternion models extract four times more features, they propose an attention module whose purpose is to filter out redundant features. These models were applied to image classification on the CIFAR-10 dataset [91]. In addition, they propose a similar model for double JPEG compression detection on the UCID dataset [78]. Thereafter, Jin et al. [83] proposed to include a deformable layer, leading to the Deformable Quaternion Gabor CNN (DQG-CNN), see Fig. 5h, which they apply to facial expression recognition using the Oulu-CASIA [156], MMI [126], and SFEW [41] datasets. Chen et al. [26] propose an architecture based on quaternion versions of Fully Convolutional Networks (FCN) [150]. Their model addresses the color image splicing localization problem and is tested on the CASIA v1.0, CASIA v2.0 [42], and Columbia color DVMM [122] datasets. The basis of their architecture is the Quaternion-valued Fully Convolutional Network (QFCN), see Fig. 5d. The final model is composed of three QFCNs working in parallel, each with different up-sampling layers: the first network has no extra connections, the second has a shortcut connection fusing the result of the fifth pooling layer with the output of the last layer, and the third combines the results of the third and fourth pooling layers with the output of the last layer. In addition, each network uses a super-pixel-enhanced pairwise Conditional Random Field module to improve the results of the QCNN. This fusion of outputs allows working with image content at different scales [103]. Thereafter, Chen et al. [27] proposed the quaternion-valued two-stream Region-CNN. They extend the real-valued RGB-N [179] to the quaternion domain, improve it for pixel-level processing, and implement two extra modules: an Attention Region Proposal Network, based on CBAM [168], for enhancing the features of important regions; and a Feature Pyramid Network, based on a quaternion-valued ResNet, to extract multi-scale features. Their results improve on those previously published in [26]. Because of the complexity of these models, it is recommended to consult them directly in [26, 27].

In the case of CAE’s, whose aim is to reconstruct the input feature at the output, Parcollet et al. [127] propose a quaternion convolutional encoder-decoder (QCAE), see Fig. 6a, and tested it on the KODAK PhotoCD dataset [48].

Another type of CAE’s is Variational Autoencoder, which estimates the probabilistic relationship between input and latent spaces. The Quaternion-valued Variational Autoencoder model (QVAE) was proposed by Grassucci et al. [59], see Fig. 6b, and applied it to reconstruct and generate faces of the CelebFaces Attributes Dataset (CelebA) [104].

A more sophisticated type of architecture is the generative model. Sfikas et al. [146] propose a quaternion generative adversarial network for text detection on inscriptions found on Byzantine monuments [89, 138]. The generator is a U-Net-like model [139], and the discriminator a cascade of convolution layers, see Fig. 6e, f; the activation function of the last layer sums the outputs of the real and imaginary parts of the quaternion to produce a real-valued output.

Grassucci et al. [58] adapted a Spectral Normalized GAN (SNGAN) [28, 116] to the quaternion domain and applied it to an image-to-image translation task using the CelebA-HQ [86] and 102 Oxford Flowers [123] datasets. The model is presented in Fig. 6c, d. Thereafter, Grassucci et al. [60] proposed the quaternion-valued version of the StarGANv2 model [30]. It is composed of generator, mapping, encoding, and discriminator networks; this model was evaluated on an image-to-image translation task using the CelebA-HQ dataset [86]. Because of the complexity of the model, it is recommended to consult the details directly in [60].

4.1.2 Vision Under the Geometric Paradigm

The only residual model within this paradigm is that of Hongo et al. [77], which is based on ResNet34 [68], see Fig. 4a. It is applied to image classification on the CIFAR-10 dataset [91].

Among ConvNet models, we have the work of Zhu et al. [181], who proposed a shallow QCNN and a quaternion VGG-S [145] model for image classification. These models are shown in Fig. 5b, c; the former was tested on the CIFAR-10 dataset [91], and the latter on the 102 Oxford Flowers dataset [123]. Hongo et al. [77] proposed a different QCNN model, see Fig. 5g, and tested it on image classification using the CIFAR-10 dataset [91]. Moreover, Matsumoto et al. [113] propose QCNN models for classifying pixels of PolSAR images; this type of image contains additional experimental features given in complex scattering matrices. For their experiments, they labeled each pixel of two images with one of four classes: water, grass, forest, or town. Then, they converted the complex scattering matrices into PolSAR pseudocolor features or into normalized Stokes vectors. By testing similar models under these two different representations, they found that the classification results largely depend on the input features. The proposed model is shown in Fig. 5j.

In the context of CAE’s, Zhu et al. [181] propose a U-Net-like encoder-decoder network [139] for the color image denoising problem. The model was tested for a denoising task on images of the 102 Oxford flower dataset [123], and on a subset of the COCO dataset [102], see details of the model in [111, 139, 160, 181].

4.1.3 Vision Under the Equivariant Paradigm

For tasks that use point clouds as input data, Shen et al. [151] modified the real-valued PointNet model [25] by exchanging all layers for rotation-equivariant quaternion modules and removing the Spatial Transformer module, since it discards rotation information. The rotation-equivariance properties of the modules were evaluated on the ShapeNet dataset [24]. They experimentally showed that point clouds reconstructed using the synthesized quaternion features have the same orientations as point clouds generated by directly rotating the original point cloud. In addition, they modified other models, i.e. PointNet++ [137], DGCNN [163], and PointConv [170], by replacing their components with equivariant counterparts, and tested them on a 3D shape classification task using the ModelNet40 [171] and 3D MNIST [37] datasets. Because of the variety of components and interconnections, the reader is directed to [25, 137, 151, 163, 170] for the details of these models.

4.2 Language

For language applications, all works that have been proposed are ConvNets.

Under the classic paradigm, Parcollet et al. [130] propose a Connectionist Temporal Classification QCNN (CTC-QCNN) model, see Fig. 5a, and test it on a phoneme recognition task with the TIMIT dataset [52]. The mapping of input signals to the quaternion domain is achieved by transforming the raw audio into 40-dimensional log mel-filter-bank coefficients with deltas, delta-deltas, and energy terms, and arranging the resulting vector into the components of an acoustic quaternion signal:

$$\begin{aligned} \mathbf {q(f,t)}=0+e(f,t)\hat{i}+ \frac{\partial e(f,t)}{\partial t} \hat{j} + \frac{\partial ^2 e(f,t)}{\partial t^2} \hat{k}. \end{aligned}$$
(4.6)
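A sketch of this mapping (Eq. 4.6) is given below, assuming the 40-dimensional log mel-filter-bank energies have already been computed frame by frame (e.g. with an external feature-extraction tool); the finite-difference computation of the deltas is a simplification of the standard regression formula.

```python
import numpy as np

def acoustic_quaternion(log_mel):
    """log_mel: array of shape (T, 40) with log mel-filter-bank energies.
    Returns an array of shape (T, 40, 4): [0, e, delta, delta-delta]."""
    delta = np.gradient(log_mel, axis=0)   # first time derivative
    delta2 = np.gradient(delta, axis=0)    # second time derivative
    zeros = np.zeros_like(log_mel)
    return np.stack([zeros, log_mel, delta, delta2], axis=-1)
```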

Another problem related to language is joint 3D Sound Event Localization and Detection (SELD); solving this task can help with activity recognition and with assisting hearing-impaired people, among other applications. The SELD problem consists in simultaneously solving: the Sound Event Detection (SED) problem, i.e. in a set of overlapping sound events, detecting the temporal activity of each sound event and associating a label to it, and the sound localization problem, which consists of estimating the spatial trajectory of each sound. The latter task can be simplified to determining the orientation of a sound source with respect to the microphone, which is called Direction-of-Arrival (DOA) estimation. For this problem, also following a classic approach, Comminiello et al. [35] propose a quaternion-valued recurrent network (QSELD-net), based on the real-valued model of Adavanne et al. [1]. The model has a first processing stage whose output is processed in parallel by two branches: the first performs a multi-label classification task (SED), while the second performs a multi-output regression task (DOA estimation). The model is shown in Fig. 5e, where N is the number of sound event classes to be detected; for DOA estimation, there are three times more outputs, since they represent the (x, y, z) coordinates for each sound event class. This model was tested on the Ambisonic, Anechoic and Synthetic Impulse Response (ANSYN) and the Ambisonic, Reverberant and Synthetic Impulse Response (RESYN) datasets [2], consisting of spatially located sound events in an anechoic/reverberant environment synthesized using artificial impulse responses. Each dataset comprises three subsets: no temporally overlapping sources (O1), a maximum of two temporally overlapping sources (O2), and a maximum of three temporally overlapping sources (O3). In this application, the input data is a multichannel audio signal; the real-valued SELD-net [1] and its quaternion-valued counterpart [35] apply the same feature extraction method: the spectrogram is computed using the M-point discrete Fourier transform to obtain a feature sequence of T frames containing the magnitude and phase components of each channel.
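As a rough illustration of this feature-extraction step, a sketch based on an M-point short-time Fourier transform could look as follows; the window function, frame length, and hop size are assumptions of this sketch and are not taken from [1, 35].

```python
import numpy as np

def spectrogram_features(audio, m=1024, hop=512):
    """audio: array of shape (num_samples, num_channels).
    Returns per-channel magnitude and phase of the M-point DFT,
    each of shape (T, m // 2 + 1, num_channels)."""
    num_samples, num_channels = audio.shape
    window = np.hanning(m)
    frames = range(0, num_samples - m + 1, hop)
    spec = np.stack([
        np.stack([np.fft.rfft(audio[t:t + m, ch] * window) for t in frames])
        for ch in range(num_channels)
    ], axis=-1)
    return np.abs(spec), np.angle(spec)
```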

Under the geometric paradigm, Muppidi and Radfar [119] proposed a QCNN for emotion classification from speech and evaluated it on the RAVDESS [105], IEMOCAP [23], and EMO-DB [22] datasets. They convert speech waveforms from the time domain to the frequency domain using the Fourier transform, then compute the Mel-spectrogram and convert it to an RGB image. Finally, the RGB images are processed with the model shown in Fig. 5i, where the Reset-ReLU activation function resets invalid values to the nearest point in the color space.

4.3 Forecasting

Neshat et al. [121] proposed a hybrid forecasting model, composed of QCNNs and bidirectional LSTM recurrent networks, for the prediction of short-term and long-term wind speeds. Historical meteorological wind data were collected from the Greek islands of Lesvos and Samothraki, located in the North Aegean Sea, and obtained from the Institute of Environmental Research and Sustainable Development (IERSD) and the National Observatory of Athens (NOA). This work integrates classic QCNNs within a complete system, and shows that better results are obtained with QCNNs than with other AI models or handcrafted techniques. Because of the complexity of the model, it is recommended to consult the details directly in [121].

4.4 Insights

A fair comparison between the performance of the models and their individual blocks is difficult because most of them are applied to different problems or use different datasets; moreover, most quaternion-valued models are based on their real-valued counterparts, and there are very few works based on incremental improvements over previous quaternion models. Nevertheless, from the works reviewed, the following insights can be established:

  1. Quaternion-valued models achieve results at least comparable to their real-valued counterparts, with a lower number of parameters. As stated in [148]: “\(\dots \)an operation that is written as \(y=Wx\), where \(y\in \mathbb {R}^{4K}\), \(x \in \mathbb {R}^{4L}\), and \(W \in \mathbb {R}^{4K\times 4L}\) is thus mapped to a quaternionic operation \(\textbf{y} = \textbf{W} \textbf{x}\), where \(\textbf{y} \in \mathbb {H}^K\), \(\textbf{x} \in \mathbb {H}^L\) and \(\textbf{W}\in \mathbb {H}^{K\times L}\). Parameter vector \(\textbf{W}\) only contains \(4 \times K \times L\) parameters compared to \(4 \times 4 \times K \times L\) of W, hence we have a 4x saving.” In addition, some works report faster convergence during training.

  2. For the image classification task, the CIFAR-10 dataset has become the standard benchmark. Although the classification errors reported in different works are not standardized (for example, some authors use data-augmentation techniques), the best reported result is from Gaudet and Maida [54], who use a classic approach and a deep residual model. In addition, residual models present the best results for image classification tasks.

  3. Attention modules filter out redundant features, and their use can improve the performance of the network [176].

  4. Variance Quaternion Batch Normalization (VQBN) experimentally outperforms Split Batch Normalization [176].

  5. The integration of quaternion Gabor filters and convolution layers enhances the ability of the model to capture features in terms of spatial localization, orientation selectivity, and spatial frequency selectivity [83, 117, 118].

  6. Quaternion generative models demonstrate better abilities to learn the properties of the color space, and generate RGB images with more defined objects than real-valued models [58, 127].

Finally, Table 11 presents information about the available source code. It only includes source code published by the authors of the papers; other implementations can be found on the popular Papers with Code website [154].

Table 11 Available source code of some QCNNs; all implementations lie in the classic approach

5 Further Discussion

Section 3 presented each individual block, along with a discussion of current issues, knowledge gaps, and future work for improvement. Here, we discuss future research directions and open questions that apply to any QCNN model, independently of its components. Moreover, some topics concern any type of quaternion-valued deep learning model.

5.1 On Proper Comparisons

When looking at the different models that have been proposed, one question that arises is: is it fair to compare a real-valued architecture with a quaternion-valued one using the same topology and number of parameters? Note that in the first case the optimization of the parameters occurs in a Euclidean 4-dimensional space, while in the second case the optimization occurs in the quaternion space, which can be connected to geometric spaces different from the Euclidean one. From this observation, we could argue that a more reasonable comparison would be the optimal real-valued network vs. the optimal quaternion-valued network, even if they have different topologies, connections, and numbers of weights. It is clear that a single real-valued convolution layer cannot capture inter-channel relationships, but could a real-valued multilayer or recurrent architecture capture inter-channel relationships without the use of Hamilton products? When are Hamilton products really needed?

Something that could shed light on these questions is the reflection of Sfikas et al. [148], who believe that the effectiveness of quaternion networks comes from “navigation on a much compact parameter space during learning” [148], as well as from an extra parameter-sharing trait.

5.2 Mapping from Data to Quaternions

Another question that arises is: how can we find an optimal mapping from the input data to the quaternion domain? Common sense suggests that finding a suitable mapping and connecting it with the right geometry could significantly improve the performance of quaternion networks; however, theoretical and experimental work is required in this direction. For example, for images, we can adopt different color models with a direct geometric relation to the quaternion space, while point clouds and deformable models can avoid algebraic singularities when the quaternion representation is used. Moreover, models for language tasks are in their infancy, and novel methods that take advantage of the quaternion space should be proposed for signal processing, speech recognition, and other tasks.

5.3 Extension to Hypercomplex and Geometric Algebras

Current QCNN models rely on processing 4-dimensional input data, which is mapped to the quaternion domain. If the input data have fewer than 4 channels, a common solution is to zero-pad some channels or to define a mapping to a 4-dimensional space.

In contrast, for more than 4 channels, there are several alternatives. The first is to map the input data to an n-dimensional space, where n is a multiple of four; then, we can process the input data by defining a quaternion kernel for each 4-channel block, see Fig. 2. Alternatively, an extension of quaternion convolution to the octonions, sedenions, or generalized hypercomplex numbers can be applied [159]. Moreover, geometric algebras [71, 84] can be used to generalize hypercomplex convolution to general n-dimensional spaces, not restricted to multiples of 4, and to connect with different geometries.

An approach in this direction is to parametrize the hypercomplex convolution in such a way that the convolution kernel is separated into a set of matrices containing the algebra multiplication rules and a set of matrices containing the filtering weights [61, 178]. Let S and D be the input and output channel dimensions of the data, respectively, and let \(\textbf{W}\in \mathbb {R}^{S\times D}\) be an element of the convolution kernel; then, it is decomposed into a sum of Kronecker products [70] as follows:

$$\begin{aligned} \textbf{W}= & \sum _{n=1}^{N}\textbf{A}_n \otimes \textbf{V}_n \nonumber \\= & \sum _{n=1}^{N} \begin{bmatrix} {a}_{1,1,n}\textbf{V}_n & \quad \cdots & \quad {a}_{1,N,n}\textbf{V}_n\\ \vdots & \quad \ddots & \quad \vdots \\ {a}_{N,1,n}\textbf{V}_n & \quad \cdots & \quad {a}_{N,N,n}\textbf{V}_n \end{bmatrix} \end{aligned}$$
(5.1)

where \(\textbf{A}_n\in \mathbb {R}^{N\times N}\), with \(n=1,\dots ,N\), are learnable matrices containing the algebra multiplication rules, each \(\textbf{V}_n\in \mathbb {R}^{\frac{S}{N}\times \frac{D}{N}\times K\times K}\) is a learnable matrix, and together they compose an element \(\textbf{W}\) of the convolution kernel.

The parameter N defines the dimension of the algebra; for \(N=2\) we are working in the complex domain, while \(N=4\) leads us to the quaternion space. For example, consider the product of quaternions \({\textbf {q}}\) and \({\textbf {p}}\):

$$\begin{aligned} {\textbf {q}} {\textbf {p}} = (q_R+ q_I\hat{i}+ q_J\hat{j} + q_K\hat{k})(p_R+ p_I\hat{i}+ p_J\hat{j} + p_K\hat{k}), \end{aligned}$$
(5.2)

which can be rewritten as:

$$\begin{aligned} {\textbf {q}} {\textbf {p}}= & \begin{bmatrix} q_R & \quad -q_I & \quad -q_J & \quad -q_K \\ q_I & \quad q_R & \quad -q_K & \quad q_J \\ q_J & \quad q_K & \quad q_R & \quad -q_I \\ q_K & \quad -q_J & \quad q_I & \quad q_R \\ \end{bmatrix} \begin{bmatrix} p_R \\ p_I \\ p_J \\ p_K \\ \end{bmatrix}, \end{aligned}$$
(5.3)

and their decomposition into the sum of Kronecker products is:

$$\begin{aligned} {\textbf {q}} {\textbf {p}}= & \begin{bmatrix} 1 & \quad 0 & \quad 0 & \quad 0 \\ 0 & \quad 1 & \quad 0 & \quad 0 \\ 0 & \quad 0 & \quad 1 & \quad 0 \\ 0 & \quad 0 & \quad 0 & \quad 1 \\ \end{bmatrix} \otimes \begin{bmatrix} p_R \\ \end{bmatrix}+ \begin{bmatrix} 0 & \quad -1 & \quad 0 & \quad 0 \\ 1 & \quad 0 & \quad 0 & \quad 0 \\ 0 & \quad 0 & \quad 0 & \quad -1 \\ 0 & \quad 0 & \quad 1 & \quad 0 \\ \end{bmatrix} \otimes \begin{bmatrix} p_I \\ \end{bmatrix}+ \nonumber \\ & \begin{bmatrix} 0 & \quad 0 & \quad -1 & \quad 0 \\ 0 & \quad 0 & \quad 0 & \quad 1 \\ 1 & \quad 0 & \quad 0 & \quad 0 \\ 0 & \quad -1 & \quad 0 & \quad 0 \\ \end{bmatrix} \otimes \begin{bmatrix} p_J \\ \end{bmatrix}+ \begin{bmatrix} 0 & \quad 0 & \quad 0 & \quad -1 \\ 0 & \quad 0 & \quad -1 & \quad 0 \\ 0 & \quad 1 & \quad 0 & \quad 0 \\ 1 & \quad 0 & \quad 0 & \quad 0 \\ \end{bmatrix} \otimes \begin{bmatrix} p_K \\ \end{bmatrix}. \end{aligned}$$
(5.4)

Since the matrices \(\textbf{A}_n\) and \(\textbf{V}_n\) are learned during training, this method does not require a prior definition of the multiplication rules; it only requires setting a single parameter, N, which represents the dimension of the algebra. Thus, the works of Grassucci et al. [61] and Zhang et al. [178] are an approach to learning latent mathematical structures directly from data; a minimal sketch of this construction is given below. Other ideas on this topic are discussed in the next section.
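As an illustration, the following sketch builds a single parameterized hypercomplex weight matrix \(\textbf{W}\) from learnable \(\textbf{A}_n\) and \(\textbf{V}_n\) via Eq. (5.1). Names and shapes are our own, follow only the notation above, and correspond to the fully connected case (kernel spatial dimensions omitted).

```python
import numpy as np

def phm_weight(A, V):
    """A: array of shape (N, N, N)      -- algebra 'multiplication rule' matrices.
    V: array of shape (N, S // N, D // N) -- filtering weight blocks.
    Returns W of shape (S, D) as the sum of Kronecker products (Eq. 5.1)."""
    return sum(np.kron(A_n, V_n) for A_n, V_n in zip(A, V))

# Example: N = 4 corresponds to a quaternion-like structure if A holds the
# matrices of Eq. (5.4); here A and V are random learnable placeholders only.
rng = np.random.default_rng(0)
N, S, D = 4, 8, 12
A = rng.normal(size=(N, N, N))
V = rng.normal(size=(N, S // N, D // N))
W = phm_weight(A, V)
print(W.shape)  # (8, 12)
```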

5.4 Machine-Learning Mathematical Structures

The use of quaternion-valued algorithms might lead to a faster search for solutions in the quaternion space, or to finding better solutions, since the machinery of the algorithm is built on quaternion properties and solutions are computed directly in the quaternion domain.

For example, QCNNs could be applied to synthesizing quaternion functions from data, in a similar way to the work of Lample and Charton [94] for real-valued functions. Another interesting question is which of the mathematical problems collected in [69] can be posed in the quaternion domain, and how to solve them with QCNN models.

Table 12 Acronym table

5.5 Quaternions on the Frequency Domain

A natural implementation of QCNNs in the frequency domain would be to compute the Quaternion Fourier Transform (QFT) [21, 74, 134] of the input and the kernels, and to multiply them in the frequency domain. Moreover, the computation of the convolution could be accelerated using fast QFT algorithms [140]. To date, works following this approach have not been published. However, using a bio-inspired approach, Moya-Sánchez et al. [118] proposed a monogenic convolution layer. A monogenic signal [47], \(I_M\), is a mapping of the input signal which simultaneously encodes local space and frequency characteristics:

$$\begin{aligned} I_M = I'+I_1\hat{i}+I_2\hat{j} \end{aligned}$$
(5.5)

where \(I_1\) and \(I_2\) are the Riesz transforms of the input in the x and y directions. In [118], the monogenic signal is computed in the Fourier domain, \(I'\) is computed as a convolution of the input with quaternion log-Gabor filters with learnable parameters, and the local phase and orientation are computed to achieve contrast invariance. In addition, their model provides sensitivity to orientations, resembling properties of the V1 cortical layer.
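As an illustration, the two Riesz components can be obtained in the Fourier domain, where their frequency responses are \(-\mathrm {i}\,u/\Vert (u,v)\Vert \) and \(-\mathrm {i}\,v/\Vert (u,v)\Vert \). The following sketch assumes a 2D grayscale array, omits the quaternion log-Gabor filtering of [118], and may differ from that work in sign conventions.

```python
import numpy as np

def riesz_components(image):
    """Return (I1, I2), the Riesz transforms of a 2D array along x and y."""
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]   # horizontal frequencies
    v = np.fft.fftfreq(rows)[:, np.newaxis]   # vertical frequencies
    norm = np.sqrt(u ** 2 + v ** 2)
    norm[0, 0] = 1.0                          # avoid division by zero at DC
    F = np.fft.fft2(image)
    i1 = np.real(np.fft.ifft2(-1j * u / norm * F))
    i2 = np.real(np.fft.ifft2(-1j * v / norm * F))
    return i1, i2
```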

5.6 Might the Classic, Geometric, and Equivariant Models be Special Cases of a Unified General Model?

To answer this question, let us recall that a quaternion is an element of a 4-dimensional space, and that unit quaternions, acting through the Hamilton product on the left and on the right, generate the rotation group SO(4). Thus, our point of view is that the classic model, which uses all four components, is the most general one. By restructuring the product as a sandwiching product and using an equivalent polar representation, a generalized version of the geometric model could be obtained (not a Euclidean or affine model, but a 3D projective one); in addition, for the equivariant model, the kernel is reduced merely to its real components. However, the connection to invariant theory should be investigated for each of these models, so that a stratified organization in the light of geometry might be achieved. For example, if we link a unified general model to a particular geometry and its invariant properties, we could go down the hierarchy of geometries by setting restrictions on the general model, until we obtain the rotation equivariant model or the others.

6 Conclusions

We have collected the substantial majority of the ideas published on QCNNs in recent years. In particular, since the convolution layer is the core component of this type of model, we presented a logical organization of QCNNs based on the definition of convolution, proposing three classes: classic, geometric, and rotation equivariant. For the other components, we presented the purpose of each block, the key problem to solve in its design, knowledge gaps, if any, and ideas for future improvement. In addition, a review of the models by application domain and topology type was presented, including the available source code. Further ideas for model design were also discussed.

Finally, most of the ideas that have been implemented are extensions of work on real-valued or complex-valued CNNs to the quaternion domain [62, 157], or adaptations of quaternion neural network models [128]. Further work is required to develop novel ideas and exploit the particularities of the quaternion representation and its connections to geometry, topology, functional analysis, and invariant theory. This paper presents, in an organized manner, the current advances in the development of QCNNs, in the hope that it serves as a starting point for subsequent research, as well as for those interested in implementing applications.