Introduction

The processing of high-dimensional data is one of the most difficult and time-consuming tasks for researchers, because it requires specialized approaches that preserve the relationships among features. Current research shows that quaternions can preserve the relationships among the features of high-dimensional data [1,2,3]. Recent work employs quaternions for applications involving 3D and 4D data, particularly for encoding and processing color image pixels with red, green, and blue channels [4, 5]. A quaternion is a four-dimensional hyper-complex number that extends the complex numbers (\({\mathbb {C}}\)); it was first proposed by the Irish mathematician Sir W. R. Hamilton in 1843 [6]. In modern mathematics, it forms a four-dimensional associative, normed, non-commutative division algebra over the real numbers. The quaternion algebra also appears in the classification of Clifford algebras, where it occupies a distinguished place [7]. In addition, the Frobenius theorem states that every finite-dimensional associative division algebra over the real numbers is isomorphic to the real numbers, the complex numbers, or the quaternions; the reals and the complexes are proper subrings of the quaternions [8]. Together with the octonions, these algebras form the Euclidean Hurwitz algebras, among which the quaternions are the largest associative one [9]. Furthermore, quaternions have been used to prove Lagrange's four-square theorem in number theory, which states that every non-negative integer can be expressed as the sum of the squares of four integers [10]. This theorem has noteworthy applications in mathematics, such as number theory and combinatorial design theory.

It is well known that a quaternion (\(q=\Re (q)+\Im _1(q)i + \Im _2(q)j+ \Im _3(q)k\)) consists of four orthogonal basis elements 1, i, j, k. These basis elements satisfy Hamilton’s rules (\(i \otimes i=-1; j \otimes j=-1; k \otimes k=-1; i \otimes j \otimes k=-1; i \otimes j=k=-j \otimes i; j \otimes k=i=-k \otimes j; k \otimes i=j=-i \otimes k\), where the symbol \(\otimes \) denotes the non-commutative quaternionic multiplication) [3]. The beauty of this number is that it processes four real numbers simultaneously as a single entity with one real (\(\Re (q)\)) and three distinct imaginary components (\(\Im _1(q),\Im _2(q),\Im _3(q)\)). Although quaternion multiplication is non-commutative, spatial rotation can be modeled and visualized by the quaternion Hamilton product. Because of its numerous merits, such as avoiding gimbal lock, robust and compact representation, and rotation in 4D space, the quaternion is an essential and sought-after tool for researchers working in 3D (using pure quaternions) and 4D environments. These merits underline the importance of quaternions in both pure and applied mathematics, primarily for representing 3D rotations in attitude control [11, 12], orbital mechanics [13, 14], molecular dynamics [15, 16], image and signal processing [17, 18], computer vision and robotics [19], bioinformatics [20, 21], and crystallographic texture [22]. Quaternionic functions can also be used to build physical models, much as functions of a complex variable are. For example, electric and magnetic fields can be represented by quaternionic functions in 3D space, and the Mandelbrot and Julia sets can be generated in 4D space [23].
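As an illustration of Hamilton's rules, the following NumPy sketch (a minimal illustration, not taken from the paper) implements the Hamilton product for quaternions stored as arrays \([\Re (q),\Im _1(q),\Im _2(q),\Im _3(q)]\) and verifies, for instance, that \(i \otimes j = k\) while \(j \otimes i = -k\).

```python
import numpy as np

def qmul(p, q):
    """Hamilton product p ⊗ q for quaternions stored as [re, i, j, k]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,   # real part
        pw*qx + px*qw + py*qz - pz*qy,   # i component
        pw*qy - px*qz + py*qw + pz*qx,   # j component
        pw*qz + px*qy - py*qx + pz*qw,   # k component
    ])

i = np.array([0., 1., 0., 0.])
j = np.array([0., 0., 1., 0.])
k = np.array([0., 0., 0., 1.])

assert np.allclose(qmul(i, j),  k)                   # i ⊗ j =  k
assert np.allclose(qmul(j, i), -k)                   # j ⊗ i = -k (non-commutativity)
assert np.allclose(qmul(i, i), [-1., 0., 0., 0.])    # i ⊗ i = -1
```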

The correlations among the components of quaternionic signals are essential for better learning and generalization in neural networks. The compact behavior and special representation ability of quaternions provide a simple, natural structure for learning the inter- and intra-dependencies among high-dimensional features. Recent studies [3, 24, 25] report that neural networks with quaternions require fewer parameters while offering superior learning and better generalization than networks with real or complex values. In addition, quaternion-valued neural networks (QVNNs) store and learn the spatial relationships in various transformations of 3D coordinates [2, 3] and among color pixels [26], whereas real- and complex-valued neural networks fail to do so. These qualities have motivated researchers to apply QVNNs in many fields, such as automatic speech recognition [27, 28], image classification [29], PolSAR land classification [30], prostate cancer Gleason grading [31], color image compression [32], facial expression recognition [33], robot manipulators [34], spoken language understanding [35], attitude control of spacecraft [36], and banknote classification [37]. Traditionally, however, 3D or 4D information has been processed with real-valued neural networks (RVNNs), in which each component is treated separately and the correlations among components are neglected [38]. Processing high-dimensional signals while neglecting the correlations among their components is not natural. The literature shows that a neural network in the quaternionic domain learns both the amplitude and the phase information of quaternionic signals effectively [39]. In [35], a real-valued multi-layer perceptron (MLP) and a quaternion-valued multi-layer perceptron (QMLP) were compared for the identification of spoken dialogues, and the QMLP required fewer epochs and achieved better accuracy than the MLP. From the perspective of a biological neuron, its action potential may exhibit heterogeneous pulse configurations and diverse separations among pulses. This configuration suggests modeling such signals as quaternionic signals containing amplitude and phase information (correlations among the quaternion's components), which is most promising.

Neural networks are preferred over other machine learning (ML) algorithms because they capture the non-linearity in data, which is important for researchers dealing with real-world problems. For handling non-linear relationships, a modified group method of data handling (GMDH) [40] was proposed, which improves accuracy by replacing the conventional polynomial functions with ML models. However, how well a neural network captures non-linearity depends on the selection of its activation functions. The main purpose of building an artificial neuron is to establish a mathematical model that approximates the functional capability of a biological neuron. The renowned and basic McCulloch–Pitts neuron [41] is a linear model in which the net action potential is calculated through a weighted summation of the inputs, and it has been further extended to the complex [42, 43] and quaternionic [44] domains for specific applications. However, this neuron model does not consider the possibility and capability of non-linear aggregation. It also neglects the role of dendrites in information processing within the cell body of the neural system. Many studies have shown that the computational capability of a biological neuron depends on how the input signals are aggregated in the cell body [45,46,47]. Moreover, aggregation based on non-linear combinations of inputs increases the computational power of artificial neurons [48,49,50]. In an experimental study of a cat's visual system, it was observed that binocular interaction in its neural system is aggregated multiplicatively [51].

Neurobiologists study non-linear aggregation based on the multiplication of input signals for three basic reasons: (1) how input signals with their synaptic weights are processed non-linearly through dendrites, (2) how to analyze the behavior of the output signal, and (3) how a simple but non-linear model can recognize complicated behavior using plausible biological mechanisms. These studies indicate that multiplicative aggregation plays a major role in the artificial neural system, particularly for computing motion perception and its learning [52,53,54]. Further, by the Stone–Weierstrass theorem, a multiplication-based network containing only input and output layers can approximate any function that is continuous on a finite interval [55]. Such networks have also been used for unsupervised learning, for example in blind source separation of non-linearly mixed signals [56, 57]. The computational capabilities of the multiplicative neuron and its learning methodology have been investigated in the real domain [58] and, most recently, in the complex domain [59], but not yet in the quaternionic domain. Past studies have also provided evidence that input signals are aggregated multiplicatively in the nervous systems of several animals [51, 60, 61]. Other non-linear neuron models based on higher order aggregation are addressed in [62, 63], but they are mathematically complex in structure and difficult to train, especially for quaternionic-valued signals. In particular, a polynomial neuron with a large number of inputs is exceedingly difficult to train because of the higher order terms. Thus, constructing non-linear neuron models with a simple structure is highly desirable for the neurocomputing community. These pieces of evidence motivated us to propose a simple, non-linear quaternionic-valued multiplicative (QVM) neuron model. This neuron model is a simpler special case of the polynomial neuron, and its training is performed with the standard backpropagation algorithm. The model processes quaternionic-valued signals with their quaternionic weights in a non-linear manner to produce a quaternionic-valued output signal. The linearity of the proposed neuron comes from the summation of the quaternionic-valued input–weight products, and the non-linearity is achieved by the non-commutative multiplication of all linearly associated quaternionic input–weight terms. The multiplication of quaternionic signals is restricted to sums of products of basis elements and excludes higher order expressions, which keeps the QVM neuron model simple. In this paper, different benchmark problems are considered to analyze the computational capabilities of the proposed neuron experimentally. Its performance is comparatively analyzed through network topology (parameters), number of average epochs, training MSE, training and testing error variances, and the Akaike information criterion (AIC) [64]. The proposed multiplication-based QVM neuron is compared experimentally with the summation-based conventional and root-power mean (RPM) neurons in the quaternionic domain [2].
The potential V of a conventional quaternionic neuron for N quaternionic-valued input signals (\(q_i; i=1\cdots N\)) is calculated as the weighted sum of the inputs; mathematically, it is represented as

$$\begin{aligned} V = \sum \limits _{i=1}^{N} w_{i} \otimes q_{i} + w_{0} \otimes q_{0} \end{aligned}$$

where the input weights (\(w_i\)) and the bias weight (\(w_0\)) are quaternions. The potential V of the RPM aggregation-based neuron is computed as a root-power mean of the weighted input signals and is given by

$$\begin{aligned} V = \left( \sum \limits _{i=1}^{N} w_{i} \otimes q_{i}^{\alpha } + w_{0} \otimes q_{0}^{\alpha } \right) ^{\frac{1}{\alpha }}, \end{aligned}$$
Fig. 1 Quaternionic-valued multiplicative neuron model

where the input weights (\(w_i\)) and the bias weight (\(w_0\)) are quaternions, and \(\alpha \) is the power coefficient used to adjust the degree of approximation. \(\alpha \) can take any value in the range (\(-\infty \), \(\infty \)); when \(\alpha =-1, 1, 2\), V acts as the harmonic, arithmetic, and quadratic mean, respectively. The RPM neuron thus reduces to the conventional neuron when \(\alpha =1\) (a short code sketch of both aggregations is given after the contribution list below). Apart from these neurons, a quaternion multi-valued network was proposed in [65]; it is also based on linear aggregation and neglects non-linear aggregation, which offers better computational power than linear aggregation. However, the activation function used in [65] is applied over all components of the quaternion jointly, which maintains the inter-relationships among the components. In contrast, the QVM neuron uses a general activation function, so any activation function can be selected for each component of the quaternionic signal depending on the intended application. The main contributions of this paper are as follows:

  1. A multiplicative neuron in the quaternionic domain (QVM), together with its backpropagation algorithm, is proposed.

  2. The QVM neuron is a non-linear neuron model with a simple structure (a lower class of the polynomial neuron).

  3. The proposed work is compared with the existing quaternionic-valued conventional and root-power mean neurons on benchmark problems.

  4. The proposed QVM neuron model reports better results in terms of network topology (parameters), average epochs, training MSE, training and testing error variances, and AIC.
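To make the two baseline aggregations concrete, the sketch below (an illustration, not the authors' code) evaluates the conventional and RPM potentials for a toy set of quaternionic inputs; the component-wise, signed interpretation of the power \(q^{\alpha }\) is an assumption, since the paper does not spell it out.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product (same helper as in the first sketch)."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]]) @ q

def conventional_potential(ws, qs, w0, q0):
    """V = Σ w_i ⊗ q_i + w_0 ⊗ q_0 (linear aggregation)."""
    return sum(qmul(w, q) for w, q in zip(ws, qs)) + qmul(w0, q0)

def rpm_potential(ws, qs, w0, q0, alpha=2.0):
    """V = (Σ w_i ⊗ q_i^α + w_0 ⊗ q_0^α)^(1/α).
    The power is applied component-wise with a signed magnitude
    (an assumption; the paper does not specify the quaternionic power)."""
    spow = lambda q, a: np.sign(q) * np.abs(q) ** a
    s = sum(qmul(w, spow(q, alpha)) for w, q in zip(ws, qs)) + qmul(w0, spow(q0, alpha))
    return spow(s, 1.0 / alpha)

rng = np.random.default_rng(0)
qs = [rng.standard_normal(4) for _ in range(3)]   # three toy quaternionic inputs
ws = [rng.standard_normal(4) for _ in range(3)]
w0, q0 = rng.standard_normal(4), np.ones(4)
print(conventional_potential(ws, qs, w0, q0))
print(rpm_potential(ws, qs, w0, q0, alpha=1.0))    # α = 1 reproduces the conventional potential
```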

The remainder of this paper is organized as follows. The section “Quaternionic-valued multiplicative neuron model” explains the design of the proposed quaternionic-valued multiplicative (QVM) neuron model. The section “Learning rule” presents the learning process of a three-layered quaternionic-valued neural network (QVNN) built from the proposed QVM neurons; its learning rule is derived step by step using a quaternionic backpropagation algorithm in Appendix A. The section “Performance assessment of QVM neuron using benchmark problems” investigates the performance of the proposed QVM neuron on a wide spectrum of benchmark problems, such as 3D and 4D chaotic time-series prediction, 3D transformations, and 3D face recognition. Finally, the section “Conclusion” concludes the paper and discusses future perspectives of this work.

Quaternionic-valued multiplicative neuron model

In the past literature, various neuron models based on linear or non-linear aggregation have been proposed. Most of them aggregate real-, complex-, or quaternionic-valued signals linearly [41, 42]. A few non-linear neurons have also been designed in the recent past to achieve better computational power [2, 3, 66]. These studies show that non-linear aggregative neurons have a learnability advantage over linear aggregation-based neurons. Although non-linear aggregation is computationally more expensive and structurally more complicated than linear aggregation, non-linear aggregation-based models require fewer neurons and layers during training. These advantages motivated us to design a computationally powerful, novel neuron model based on a simple non-linear aggregation of input signals, especially for non-commutative quaternionic-valued signals. Hence, for the processing of quaternionic-valued signals, a simple, non-linear quaternionic-valued multiplicative (QVM) neuron is proposed; its basic structure is shown in Fig. 1. The net potential function of the QVM neuron is governed by the non-linear aggregation of linearly associated quaternionic input signals with weights. Its non-linearity is accomplished by multiplying the multiple linearly weighted inputs. Its bias weights are adaptable during the learning process, which provides better accuracy.

The resulting quaternionic-valued multiplicative (QVM) neuron uses the Hamilton product to aggregate multiple weighted quaternion-valued inputs. As shown in Fig. 1, the proposed QVM neuron processes L quaternionic-valued inputs through a multiplicative aggregation with their corresponding weights and produces a quaternionic-valued output, where all inputs, weights, and the output are quaternions, e.g., \(q =\Re (q)+\Im _1(q)i+\Im _2(q)j+\Im _3(q)k\). Each input \([q_1,q_2,\ldots ,q_l,\ldots ,q_L]\) is paired, respectively, with the quaternionic-valued weights \([w_{1m},w_{2m},\ldots ,w_{lm},\ldots ,w_{Lm}]\), where m denotes the general mth quaternionic-valued neuron. Let \(q^0\) denote the bias input, where \(\Re (q^0)=\Im _1(q^0)=\Im _2(q^0)=\Im _3(q^0)=1\), and let \([w^0_{1m},w^0_{2m},\ldots ,w^0_{lm},\ldots ,w^0_{Lm}]\) denote the bias weight vector that contains the bias weight attached to each multiplicative term. The net internal potential \(V_m\) and the output \(Y_m\) of the mth QVM neuron are defined, respectively, as

$$\begin{aligned} V_m = \,\, ^{\otimes }\prod \limits _{l=1}^{L} \left( w_{lm}\otimes q_{l} + w^{0}_{lm}\otimes q^{0}\right) \end{aligned}$$
(1)

and

$$\begin{aligned} Y_m = f_{\mathbb {H}}(V_m). \end{aligned}$$
(2)

The \(\otimes \) notation in Eq. 1 denotes the non-commutative quaternionic multiplication operator, and the symbol \(^{\otimes }\prod \) represents the ordered quaternionic product of quaternionic variables, which multiplies its terms one by one by appending each new factor on the right (e.g., \(^{\otimes }\prod _{i=1}^{t}r_i=r_1\otimes r_2\otimes \cdots \otimes r_t\)). The quaternionic activation function in Eq. 2 is denoted by \(f_{\mathbb {H}}\).
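The following sketch (an illustration under our own conventions, not the authors' code) computes \(V_m\) and \(Y_m\) of Eqs. 1 and 2 for a toy set of inputs, restating the Hamilton product helper from the first sketch and using a split tanh as \(f_{\mathbb {H}}\).

```python
import numpy as np

def qmul(p, q):
    """Hamilton product (same helper as in the first sketch)."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]]) @ q

def qvm_potential(ws, bias_ws, qs, q0=np.ones(4)):
    """V_m = ⊗∏_l (w_lm ⊗ q_l + w0_lm ⊗ q0), multiplied left to right."""
    V = np.array([1., 0., 0., 0.])            # multiplicative identity quaternion
    for w, w0, q in zip(ws, bias_ws, qs):
        V = qmul(V, qmul(w, q) + qmul(w0, q0))
    return V

def qvm_output(ws, bias_ws, qs):
    """Y_m = f_H(V_m) with a split (component-wise) tanh activation."""
    return np.tanh(qvm_potential(ws, bias_ws, qs))

rng = np.random.default_rng(1)
qs = [rng.standard_normal(4) for _ in range(3)]            # L = 3 toy quaternionic inputs
ws = [0.1 * rng.standard_normal(4) for _ in range(3)]
bias_ws = [0.1 * rng.standard_normal(4) for _ in range(3)]
print(qvm_output(ws, bias_ws, qs))
```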

Learning rule

The proposed quaternionic-valued multiplicative (QVM) neuron can be embedded in multi-layered neural networks for quaternionic-valued signal processing. This network structure is similar to the multi-layered perceptron employed with a conventional neuron (linear aggregation of input signals) in the quaternionic domain [1, 67]. All conventional neurons are replaced by the proposed QVM neurons in both hidden and output layers of the network which is shown in Fig. 2. The output of the QVM neuron is obtained by applying a split-type quaternion version of the activation function (\(f_{\mathbb {H}}\)) on quaternionic-valued net internal potential (V). This activation function \(f_{\mathbb {H}}\) is the 4D extension of a real-valued activation function (\(f_{\mathbb {R}}\)) which is addressed in [1, 38, 68]. If the net internal potential of quaternionic-valued neuron is expressed as \(V=\Re (V)+\Im _1(V)i+\Im _2(V)j+\Im _3(V)k\), then its output is computed as follows:

$$\begin{aligned} f_{\mathbb {H}}(V)&= f_{\mathbb {R}}\big (\Re (V)\big )+f_{\mathbb {R}}\big (\Im _1(V)\big )i \nonumber \\&\quad +f_{\mathbb {R}}\big (\Im _2(V)\big )j+f_{\mathbb {R}}\big (\Im _3(V)\big )k. \end{aligned}$$
(3)
Fig. 2 Three-layered network with quaternionic-valued multiplicative neuron

Most non-linear neurons in the real/complex/quaternionic domain produce useful outputs only when a non-linear activation function is applied. However, very few fully non-linear and analytic quaternionic-valued activation functions are available, and they also require extensive and careful training [25]. The widely used split-type activation functions do not model the local interdependencies [68], but a split activation function keeps the computation of a neuron's output simple for both complex-valued and quaternionic-valued signals. It is also used because of prior investigation and better stability (non-split, fully quaternionic activation functions contain singularities), as addressed in [69]. To tackle this problem, the non-linear aggregation of inputs can be used, which enables us to choose from a wide range of split functions while preserving the local interdependencies to some degree. Equation 3 represents an activation function of quaternionic-valued signals (V) in which a real-valued activation \(f_{\mathbb {R}}(.)\) is applied separately to each component of V. Various linear and non-linear activation functions for real-valued signals and their computational capabilities have been investigated in the literature [70]. In this paper, we use linear, sigmoid, and hyperbolic-tangent activation functions for the considered benchmark problems; their selection depends on the specific application. In the learning algorithm, we consider a general activation function from which the specific activation function can be selected for the intended application.
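A minimal sketch of the split-type activation in Eq. 3 and its component-wise derivative (used later in the weight updates), assuming the linear, sigmoid, and tanh choices mentioned above:

```python
import numpy as np

def split_activation(V, f=np.tanh):
    """f_H(V): apply a real activation to each of the four quaternion components."""
    return f(V)                    # V is a length-4 array [re, i, j, k]

def split_derivative(V, name="tanh"):
    """Component-wise derivative f'_H(V) for the activations used in the paper."""
    if name == "tanh":
        return 1.0 - np.tanh(V) ** 2
    if name == "sigmoid":
        s = 1.0 / (1.0 + np.exp(-V))
        return s * (1.0 - s)
    return np.ones_like(V)         # linear activation

V = np.array([0.5, -1.2, 0.3, 2.0])
print(split_activation(V), split_derivative(V))
```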

The quaternionic version of the conventional neuron is replaced by the proposed quaternionic-valued multiplicative (QVM) neuron in a fully connected three-layered neural network containing L quaternionic input signals, M QVM neurons in the hidden layer, and N QVM neurons in the output layer, as shown in Fig. 2. In this network, all inputs, weights, and outputs, including the bias input and its weights, are quaternions. Let \(q_l\) and \(Y_n\) be the general \(l^{th} (l=1, \ldots , L)\) quaternionic input and \(n^{th} (n=1, \ldots , N)\) output signals, respectively. Let \(w_{lm}\) and \(w_{mn}\) be the general weights used for the connections from the lth input to the mth hidden neuron and from the mth hidden neuron to the nth output neuron, respectively, where \(m=1,\ldots , M\). Let the bias input be \(q^0=1+i+j+k\) and its quaternionic conjugate \(\overline{q^0}=1-i-j-k\). The net internal potential (\(V_m\)) of the mth QVM neuron in the hidden layer is defined as follows:

$$\begin{aligned} V_m&=\,\,^\otimes \prod \limits _{l=1}^{L} \left( w_{lm}\otimes q_{l} + w^{0}_{lm}\otimes q^{0}\right) \nonumber \\&= V_m^{\text {LT}}\otimes \varOmega _{lm}\otimes \ V_m^{\text {RT}}, \end{aligned}$$
(4)

where \(\varOmega _{lm} = w_{lm}\otimes q_{l}+w^{0}_{lm}\otimes q^{0}\) is the particular term that associates the lth input with the mth hidden neuron, \(V_m^{\text {LT}}=\,^\otimes \prod _{l'=1}^{l-1}\varOmega _{l'm}\) is the aggregation of all terms to the left (LT) of \(\varOmega _{lm}\), and, similarly, \( V_m^{\text {RT}}=\,^\otimes \prod _{l'=l+1}^{L}\varOmega _{l'm}\) aggregates all terms to the right (RT) of \(\varOmega _{lm}\). The output (\(Y_m\)) of the mth QVM neuron is obtained by applying the quaternionic activation function (\(f_{\mathbb {H}}\)) to \(V_m\) as follows:

$$\begin{aligned} Y_m = f_{\mathbb {H}}(V_m). \end{aligned}$$
(5)

Similarly, the net internal potential (\(V_n\)) of nth QVM neuron at output layer is defined as

$$\begin{aligned} V_n&= \,\,^\otimes \prod \limits _{m=1}^{M} \left( w_{mn}\otimes Y_{m} + w^{0}_{mn}\otimes q^{0}\right) \nonumber \\&= V_n^{\text {LT}}\otimes \varOmega _{mn}\otimes \ V_n^{\text {RT}}, \end{aligned}$$
(6)

where \(\varOmega _{mn} = w_{mn}\otimes Y_{m}+w^{0}_{mn}\otimes q^{0}\), \(V_n^{\text {LT}}=\,^\otimes \prod _{m'=1}^{m-1}\varOmega _{m'n}\), and \( V_n^{\text {RT}}=\,^\otimes \prod _{m'=m+1}^{M}\varOmega _{m'n}\). The output (\(Y_n\)) of the nth output QVM neuron is computed as

$$\begin{aligned} Y_n = f_{\mathbb {H}}(V_n). \end{aligned}$$
(7)

Let E be the real-valued cost function that gives the total error of the feed-forward network. It is computed as the mean square error (MSE) as follows:

$$\begin{aligned} E =\frac{1}{2N} \sum \limits _{n=1}^{N} e_n\otimes \overline{e_n}, \end{aligned}$$
(8)

where \(e_n\) represents the error at the nth QVM neuron of the output layer. This error is the difference between the desired output (\(Y^{D}_n\)) and the actual output (\(Y_n\)), i.e., \( e_n =Y^{D}_n - Y_n \). The expression (\(e_n\otimes \overline{e_n}\)) in Eq. 8 yields a real value, because the quaternionic multiplication of any quaternion with its conjugate gives a real number (its squared norm). Gradient descent-based quaternionic backpropagation (QBP) is used to derive the weight updates (\(\Delta w\)) by minimizing the cost function (E). The objective error function E is minimized by recursively updating all weights of the network. The weight update \(\Delta w\) is given by the negative quaternionic gradient of the real-valued cost function \((\nabla _w{E})\). This gradient is evaluated using partial derivatives with respect to the real and the three imaginary components of the quaternionic weights and is expressed as

$$\begin{aligned} \Delta w&= -\eta \nabla _w{E} \nonumber \\&= -\eta \left\{ \frac{\partial E}{\partial \Re (w)}+\frac{\partial E}{\partial \Im _1(w)}i+ \frac{\partial E}{\partial \Im _2(w)}j+\frac{\partial E}{\partial \Im _3(w)}k\right\} , \end{aligned}$$
(9)

where \(\eta \) is the learning rate, which lies in the interval (0,1]. The weight update \(\Delta w=\Delta w_{mn}\) (i.e., the update of the weights between the hidden and output layers) is deduced by the chain rule of differentiation with respect to \(w_{mn}\); its simplified form is expressed as

$$\begin{aligned} \Delta w_{mn}=\frac{\eta }{N}\xi _{n}\otimes \overline{Y_m} \end{aligned}$$
(10)
$$\begin{aligned} \Delta w_{mn}^0=\frac{\eta }{N}\xi _{n}\otimes \overline{q^0}, \end{aligned}$$
(11)

where

$$\begin{aligned} \xi _{n}= \overline{V_n^{\text {LT}}}\otimes \big \{ e_n \odot f_{\mathbb {H}}'(V_n) \big \}\otimes \overline{V_{n}^{\text {RT}}}. \end{aligned}$$

The symbol \(\odot \) denotes the component-wise multiplication of quaternionic variables (e.g., \(q_1 \odot q_2=\Re (q_1)\Re (q_2)+\Im _1(q_1)\Im _1(q_2)i+\Im _2(q_1)\Im _2(q_2)j+\Im _3(q_1)\Im _3(q_2)k\), where \(q_1\) and \(q_2\) are two arbitrary quaternionic variables). Similarly, the input–hidden weight update \(\Delta w=\Delta w_{lm}\) is derived by the chain rule, and its simplified form is expressed as

$$\begin{aligned} \Delta w_{lm} = \frac{\eta }{N} \xi _{m}\otimes \overline{q_l} \end{aligned}$$
(12)
$$\begin{aligned} \Delta w_{lm}^0 =\frac{\eta }{N} \xi _{m}\otimes \overline{q^0}, \end{aligned}$$
(13)

where

$$\begin{aligned} \xi _{m}= \overline{V_m^{\text {LT}}}\otimes \left\{ \left\{ \sum \limits _{n=1}^N\overline{w_{mn}}\otimes \xi _{n}\right\} \odot f'_{\mathbb {H}}(V_m)\right\} \otimes \overline{V_m^{\text {RT}}}. \end{aligned}$$

The complete derivation of all weight updates is given in Appendix A. The pseudocode of the learning procedure for the proposed network is also presented in the following figures:

(Pseudocode figures a–d)
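As a complement to the pseudocode, the following is a minimal NumPy sketch of the update form in Eqs. 10–13 for a single QVM neuron trained on one sample; it assumes a split tanh activation, restates the Hamilton product helper from the earlier sketches, and only illustrates the structure of the update, not the full network algorithm derived in Appendix A.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product (same helper as in the first sketch)."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]]) @ q

def qconj(q):
    """Quaternionic conjugate."""
    return q * np.array([1., -1., -1., -1.])

def qprod(terms):
    """Ordered product Ω_1 ⊗ Ω_2 ⊗ ... ⊗ Ω_t (identity for an empty list)."""
    V = np.array([1., 0., 0., 0.])
    for t in terms:
        V = qmul(V, t)
    return V

def qvm_step(ws, bias_ws, qs, y_desired, q0=np.ones(4), eta=0.1):
    """One gradient step for a single QVM neuron with split tanh activation."""
    omegas = [qmul(w, q) + qmul(w0, q0) for w, w0, q in zip(ws, bias_ws, qs)]
    V = qprod(omegas)
    Y = np.tanh(V)
    e = y_desired - Y
    delta = e * (1.0 - Y ** 2)                         # e ⊙ f'_H(V), component-wise
    for l in range(len(qs)):
        VL, VR = qprod(omegas[:l]), qprod(omegas[l + 1:])
        xi = qmul(qmul(qconj(VL), delta), qconj(VR))   # ξ = conj(V^LT) ⊗ {e ⊙ f'} ⊗ conj(V^RT)
        ws[l]      = ws[l]      + eta * qmul(xi, qconj(qs[l]))   # analogue of Eqs. 10/12
        bias_ws[l] = bias_ws[l] + eta * qmul(xi, qconj(q0))      # analogue of Eqs. 11/13
    return ws, bias_ws, float(np.dot(e, e))            # squared error for this sample
```

Repeating qvm_step over the training samples gives a simple stochastic-gradient flavor of the QBP update; the multi-layer version chains the hidden-layer term \(\xi _m\) through \(\sum _n \overline{w_{mn}}\otimes \xi _n\) as in the text.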

Performance assessment of QVM neuron using benchmark problems

In this section, we conduct various experiments to evaluate the training and testing performance of the proposed quaternionic-valued multiplicative (QVM) neuron-based neural network on a wide spectrum of benchmark problems, such as 3D and 4D chaotic time-series prediction, 3D transformations, and 3D face recognition. Its performance is compared against conventional and RPM neurons through network topology (parameters), number of average epochs, training MSE, training and testing error variances, and the Akaike information criterion (AIC). For the learning of 3D transformations (scaling, translation, and rotation), a small set of points lying on a 3D straight line is considered, and the generalization ability is tested on complex geometrical objects in 3D space. As a biometric application, we use two datasets of 3D human faces, one of the same person and one of different persons, for face classification. This application should give prospective researchers working in computer vision a useful starting point. All experiments are conducted ten times using the same number of parameters, the same learning rule, and the same number of epochs, with varying initial weights. The average of all statistical parameters is reported for the comparison of the networks.

Table 1 Comparative performance analysis for the prediction of 3D chaotic time-series of Chua’s circuit

3D chaotic time-series prediction problems

3D chaotic time-series prediction for Chua’s circuit

Chua’s circuit, which contains one locally active resistor, one Chua’s diode acting as a non-linear negative resistance, and three energy-storage elements (two capacitors and an inductor), generates a chaotic time-series. The circuit shows chaotic behavior because it satisfies all three criteria of a chaotic circuit: one or more non-linear elements, one or more locally active resistors, and three or more energy-storage elements. The dynamics of Chua’s circuit is governed by the following ordinary differential equations:

$$\begin{aligned} \frac{\textrm{d}x(t)}{\textrm{d}t}&= \alpha \Big [y(t) - x(t) - f\big (x(t) \big ) \Big ] \nonumber \\ \frac{\textrm{d}y(t)}{\textrm{d}t}&= x(t) - y(t) + z(t) \nonumber \\ \frac{\textrm{d}z(t)}{\textrm{d}t}&= - \beta y(t) - \gamma z(t), \end{aligned}$$
(14)

where the symbols \(\alpha \), \(\beta \), and \(\gamma \) (\(\alpha \ge 0, \beta \ge 0, \gamma \ge 0\)) are parameters determined by the specific values of the circuit components. The function \(f\big (x(t) \big )\) is a piecewise-linear function with three segments that describes the non-linear characteristic of the Chua’s diode. It is mathematically expressed as

$$\begin{aligned} f\big (x(t)\big ) = m_1 x(t) + \frac{1}{2} (m_0 - m_1) \big [ |x(t)+1|-|x(t)-1| \big ], \end{aligned}$$

where the symbols \(m_0\) and \(m_1\) denote the slopes of the inner and outer segments of the function \(f\big (x(t)\big )\), respectively. The variables x(t) and y(t) denote the voltages across the capacitors, and the variable z(t) denotes the current flowing through the inductor. The QVM neuron-based network can be used effectively to predict the chaotic behavior of Chua’s circuit, because the network processes a quaternionic-valued signal containing four real values as a single entity. For this problem, the data are represented as pure quaternions by setting the real part to zero or near zero (\(q=0+x(t)i +y(t)j +z(t)k\)).

This circuit exhibits the double-scroll chaotic attractor in 3D space for the specific parameter values \(\alpha =15.6, \beta =28, \gamma =0, m_0=-\,1.143\), and \(m_1=-\,0.714\). The chaotic system in Eq. 14 generates a chaotic time-series of 10,000 terms for the initial voltages \(x(t)=0.1\) and \(y(t)=0.1\), initial current \(z(t)=0.1\), and time step 0.01 s. The generated time-series is first normalized to \((-0.8, 0.8)\), and its terms are then transformed into pure quaternions. The initial 1000 of the 10,000 terms are used for training and the remaining 9000 terms for testing with the quaternionic-valued neural networks. We use the sigmoid and hyperbolic-tangent activation functions at the hidden and output layers of the networks, respectively. Table 1 presents the comparative training and testing analysis of the proposed QVM neuron model against the conventional and RPM neuron models; the QVM model reports better results in all respects, such as network topology, training MSE, variance, and AIC. The learning curves in Fig. 3 also show the slightly faster convergence of the proposed neuron model. The 3D graphical view of the testing result is demonstrated by the double-scroll chaotic attractor shown in Fig. 4. The training and testing results provide evidence that the QVM neuron is clearly superior to the conventional and RPM neurons.
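A minimal sketch of how such a series can be generated and quaternion-encoded, assuming a simple forward-Euler integrator and per-channel min–max normalization (the paper does not state the integrator or the normalization formula):

```python
import numpy as np

def chua_series(n=10_000, dt=0.01, alpha=15.6, beta=28.0, gamma=0.0,
                m0=-1.143, m1=-0.714, x=0.1, y=0.1, z=0.1):
    """Forward-Euler simulation of Eq. 14 with the parameters used in the paper."""
    f = lambda v: m1 * v + 0.5 * (m0 - m1) * (abs(v + 1.0) - abs(v - 1.0))
    out = np.empty((n, 3))
    for t in range(n):
        dx = alpha * (y - x - f(x))
        dy = x - y + z
        dz = -beta * y - gamma * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[t] = (x, y, z)
    return out

series = chua_series()
# normalize each channel to (-0.8, 0.8) and encode as pure quaternions 0 + x i + y j + z k
series = 1.6 * (series - series.min(0)) / (series.max(0) - series.min(0)) - 0.8
quats = np.hstack([np.zeros((len(series), 1)), series])
train, test = quats[:1000], quats[1000:]
```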

Fig. 3 Learning curves of the networks for 3D chaotic time-series prediction of Chua’s circuit

Fig. 4 Comparative testing demonstration for the prediction of 3D chaotic time-series of Chua’s circuit

Table 2 Comparative performance analysis for the prediction of 3D chaotic time-series of Lorenz system

3D chaotic time-series prediction for the Lorenz system

The Lorenz system is considered a benchmark problem for predicting chaotic behavior in 3D space. It arises in several interdisciplinary fields, such as atmospheric convection, thermosyphons, light amplification, chemical reactions, and electrical circuits. It behaves chaotically for specific values of its parameters and initial conditions. The system is expressed by the following three differential equations:

$$\begin{aligned} \frac{\textrm{d}x(t)}{\textrm{d}t}&= \sigma \big (y(t) - x(t) \big ) \nonumber \\ \frac{\textrm{d}y(t)}{\textrm{d}t}&= x(t) \big ( \rho - z(t) \big )- y(t) \nonumber \\ \frac{\textrm{d}z(t)}{\textrm{d}t}&= x(t)y(t)- \beta z(t), \end{aligned}$$
(15)

where the symbols \(\sigma \), \(\rho \), and \(\beta \) denote its constant parameters [\(\sigma = 15\), \(\rho = 28\), and \(\beta = 8/3\)]. The variables x(t), y(t), and z(t) vary with time and show chaotic behavior in 3D space for specific values of the parameters. As a chaotic system, it magnifies even a small perturbation of the initial conditions into a completely different series. Thus, the prediction of this series is challenging and demanding for the neurocomputing community. A network with QVM neurons can predict the sequence, because the variables can be expressed as a pure quaternionic signal (\(0+x(t)i+y(t)j+z(t)k\)) and processed as a single quaternionic number in the quaternionic-valued neural network, which preserves the amplitude and the correlations among the variables. A series of 6000 terms over the time interval 0–60 s is generated from Eq. 15 under the given initial conditions and parameters and then normalized to the range \((-\,0.8, 0.8)\). The first 500 terms of the normalized set are used to train the networks, and the remaining terms are used for testing. We use the sigmoid and linear activation functions for the neurons at the hidden and output layers of the network, respectively. The training and testing results reported in Table 2 show that a simple neural structure with the proposed neurons learns the chaotic behavior better and outperforms the conventional neural network. The testing results are demonstrated in Fig. 6 and reported through statistical parameters in Table 2. The faster learning of the network with the proposed QVM neurons over the conventional or RPM neuron-based quaternionic-valued networks is demonstrated in Fig. 5. Overall, all results reveal that the proposed QVM neuron is significantly better than the quaternionic versions of the conventional and RPM neurons.
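A corresponding sketch for the Lorenz series, again assuming forward-Euler integration with a 0.01 s step (6000 steps over 0–60 s) and an assumed initial state of (1, 1, 1), since the paper does not state it:

```python
import numpy as np

def lorenz_series(n=6000, dt=0.01, sigma=15.0, rho=28.0, beta=8.0/3.0,
                  x=1.0, y=1.0, z=1.0):
    """Forward-Euler simulation of Eq. 15 with the parameters given in the paper."""
    out = np.empty((n, 3))
    for t in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[t] = (x, y, z)
    return out

series = lorenz_series()
series = 1.6 * (series - series.min(0)) / (series.max(0) - series.min(0)) - 0.8
quats = np.hstack([np.zeros((len(series), 1)), series])   # pure-quaternion encoding
train, test = quats[:500], quats[500:]
```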

Fig. 5 Learning curves of the networks for 3D chaotic time-series prediction of the Lorenz system

Fig. 6 Comparative testing demonstration for the prediction of 3D chaotic time-series of the Lorenz system

4D time-series prediction problems

Circular noise-based linear autoregressive

The circular noise-based linear autoregressive process is a vital and standard benchmark for the prediction of quaternionic-valued signals. The stable version of the linear autoregressive filtering process is defined by the following recurrence:

$$\begin{aligned} O(n)&=1.73 \times O(n-1)-1.82 \times O(n-2)\nonumber \\&\quad +1.32 \times O(n-3)-0.45 \times O(n-4)+\psi (n), \end{aligned}$$
(16)

where \(\psi (n)=\Re (\psi (n))+\Im _1(\psi (n))i+\Im _2(\psi (n))j+\Im _3(\psi (n))k\) represents circular white noise as a quaternionic variable; its components (\(\Re (\psi (n))\), \(\Im _1(\psi (n))\), \(\Im _2(\psi (n))\), and \(\Im _3(\psi (n))\)) embed the noise in the linear autoregression and are each normally distributed with zero mean and unit variance (\({\mathcal {N}}(0,1)\)). Using Eq. 16, we generate 1500 quaternionic signals, of which 500 are used for training and the remaining 1000 for testing the trained networks. The networks constructed from conventional, RPM, and proposed QVM neurons are trained with the quaternionic version of the backpropagation algorithm. The comparative training results reported in Table 3 show that the network based on the proposed QVM neuron needs a smaller network, fewer parameters, and fewer epochs to converge than the conventional and RPM neurons in the quaternionic domain. The testing of the networks is compared through statistical parameters such as testing MSE, testing error variance, and AIC in Table 3. These results also show that the QVM neuron learns and generalizes this system significantly better than the quaternionic versions of the conventional and RPM neurons.
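A short sketch of how the quaternionic AR(4) series of Eq. 16 can be generated; the four start-up samples are an assumption (the paper does not specify them), and each noise component is drawn i.i.d. from \({\mathcal {N}}(0,1)\):

```python
import numpy as np

def qar_series(n_terms=1500, seed=0):
    """Quaternionic AR(4) process of Eq. 16 driven by circular white noise."""
    rng = np.random.default_rng(seed)
    O = [rng.standard_normal(4) for _ in range(4)]   # assumed start-up samples
    for _ in range(4, n_terms):
        psi = rng.standard_normal(4)                  # circular white noise, N(0,1) per component
        O.append(1.73 * O[-1] - 1.82 * O[-2] + 1.32 * O[-3] - 0.45 * O[-4] + psi)
    return np.array(O)

data = qar_series()
train, test = data[:500], data[500:]
```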

Table 3 Comparative performance analysis of networks for circular noise-based linear autoregressive filtering process
Table 4 Comparative training and testing performances for 4D chaotic time-series prediction of Saito’s circuit

4D chaotic time-series prediction for Saito’s circuit

Saito’s circuit, which contains one locally active resistor, one non-linear negative resistance, and three energy-storage elements (one capacitor and two inductors), shows chaotic behavior in 4D space because it satisfies the criteria of a chaotic circuit. Its dynamics is defined by the following differential equations:

$$\begin{aligned}&\begin{bmatrix} \textrm{d}x_1(t)/\textrm{d}t \\ \textrm{d}y_1(t)/\textrm{d}t \end{bmatrix} =\begin{bmatrix} 1 &{} -1 \\ -\alpha _1 &{} \alpha _1 \beta _1 \end{bmatrix} \begin{bmatrix} x_1(t)-\zeta p_1 h\big (z(t)\big )\\ y_1(t)-\zeta p_1 h\big (z(t)\big )/\beta _1 \end{bmatrix} \nonumber \\&\begin{bmatrix} \textrm{d}x_2(t)/\textrm{d}t \\ \textrm{d}y_2(t)/\textrm{d}t \end{bmatrix} =\begin{bmatrix} 1 &{} -1 \\ -\alpha _2 &{} \alpha _2 \beta _2 \end{bmatrix} \begin{bmatrix} x_2(t)-\zeta p_2 h\big (z(t)\big )\\ y_2(t)-\zeta p_2 h\big (z(t)\big )/\beta _2 \end{bmatrix}, \end{aligned}$$
(17)

where the symbols \(\alpha _1\), \(\beta _1\), \(\alpha _2\), \(\beta _2\), and \(\zeta \) denote parameters determined by the particular values of the circuit components, and

$$\begin{aligned} h\big (z(t)\big )= {\left\{ \begin{array}{ll} -1, &{} \text {if } z(t)< 1\\ 1, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

denotes the normalized hysteresis value; \(z(t)=x_1(t)+x_2(t)\), \(p_1=\beta _1/(1-\beta _1)\), and \(p_2=\beta _2/(1-\beta _2)\). This system is considered a benchmark problem for the prediction of chaotic behavior with quaternionic-valued neural networks. Its chaotic behavior depends on the parameter values (\(\alpha _1=7.5\), \(\beta _1=0.16\), \(\alpha _2=15\), \(\beta _2=0.097\), \(\zeta =1.3\)). Using Eq. 17, we generate 1500 terms of its time-series, starting the dynamics from a stationary state. Each term of this time-series, containing four components (\(x_1(t)\), \(y_1(t)\), \(x_2(t)\), and \(y_2(t)\)), is represented as a quaternion (\(x_1(t)+y_1(t)i +x_2(t)j +y_2(t)k\)). The first 500 terms are used to train the quaternionic-valued networks with the QBP learning algorithm, and the remaining 1000 terms are used for testing, i.e., predicting the chaotic time-series with the trained networks. During training, it is observed that the proposed QVM neuron-based network needs a smaller network topology (fewer parameters) and fewer average epochs to reach the threshold training MSE than the network containing conventional or RPM neurons, as reported in Table 4. The testing results of the trained networks are also comparatively analyzed through statistical parameters, such as testing MSE, testing variance, and AIC, in Table 4. These training and testing results again show the superiority of the proposed QVM neuron over the conventional and RPM neurons in the quaternionic domain.
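A sketch of how the 4D series can be generated from Eq. 17; the time step, the initial state, and the reading of h as the simple threshold given above are assumptions, and forward-Euler integration is used for brevity:

```python
import numpy as np

def saito_series(n=1500, dt=0.005, a1=7.5, b1=0.16, a2=15.0, b2=0.097, zeta=1.3,
                 x1=0.1, y1=0.0, x2=0.1, y2=0.0):
    """Forward-Euler simulation of Eq. 17; each row maps to x1 + y1 i + x2 j + y2 k."""
    p1, p2 = b1 / (1.0 - b1), b2 / (1.0 - b2)
    h = lambda z: -1.0 if z < 1.0 else 1.0          # normalized hysteresis as given in the text
    out = np.empty((n, 4))
    for t in range(n):
        hz = h(x1 + x2)
        u1, v1 = x1 - zeta * p1 * hz, y1 - zeta * p1 * hz / b1
        u2, v2 = x2 - zeta * p2 * hz, y2 - zeta * p2 * hz / b2
        dx1, dy1 = u1 - v1, -a1 * u1 + a1 * b1 * v1
        dx2, dy2 = u2 - v2, -a2 * u2 + a2 * b2 * v2
        x1, y1 = x1 + dt * dx1, y1 + dt * dy1
        x2, y2 = x2 + dt * dx2, y2 + dt * dy2
        out[t] = (x1, y1, x2, y2)
    return out

data = saito_series()
train, test = data[:500], data[500:]
```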

Fig. 7 Training with input–output mappings of 3D straight line for the learning of three basic transformations in 3D space: a scaling with scaling factor 2, b translation with 0.3 unit in positive x-, y-, and z-directions, and c \(\pi /4\) radian rotation about x-axis

Table 5 Comparative analysis of training and testing results for 3D scaling transformation problem
Table 6 Comparative analysis of training and testing results for 3D translational transformation problem
Table 7 Comparative analysis of training and testing results for the 3D rotational transformation problem
Fig. 8 Testing results of scaling transformation with scaling factor 2 using complicated 3D geometrical objects of a sphere, b ellipsoid, c torus, d cylinder, e cone, and f cube

Fig. 9 Testing results of translational transformation with 0.3 unit translation in positive x-, y-, and z-directions using complicated 3D geometrical objects of a sphere, b ellipsoid, c torus, d cylinder, e cone, and f cube

Fig. 10 Testing results of rotational transformation with \(\pi /4\) radian rotation around x-axis using complicated 3D geometrical objects of a sphere, b ellipsoid, c torus, d cylinder, e cone, and f cube

3D transformation problems

The transformation problem is an important and standard problem for analyzing the learning and generalization capability of 3D motion patterns with quaternionic-valued neural networks. Training for this problem is performed on a straight line in 3D space, and testing is done on various complex 3D geometrical structures. In this subsection, we present the learning and generalization of scaling, translation, and rotation transformations in 3D space with the conventional, RPM, and proposed QVM neurons. This problem is a benchmark that facilitates the visualization of objects with variable orientations and the interpretation of motion in 3D space. To learn these transformations with three-layered networks (2-M-2), we use a 3D straight line whose reference point (the midpoint of the line) is placed at the origin in 3D space, as shown in Fig. 7. During training, the networks take two inputs, a point lying on the straight line and the reference point, and produce two outputs, the transformed point of the straight line and the transformed reference point. It is observed that including the reference point provides better accuracy during testing. Simulation results for all transformations show that the network with the proposed neurons reduces the training error drastically and also provides better generalization and accuracy than the networks with conventional and RPM neurons in the quaternionic domain.

The three-layered network is trained on the input–output mapping of a 3D straight line containing only 21 points for three different transformations: (1) scaling (scaling factor 2), as shown in Fig. 7a, (2) translation (0.3 unit translation in the positive x-, y-, and z-directions), as shown in Fig. 7b, and (3) rotation (\(\pi /4\) radian rotation about the x-axis), as shown in Fig. 7c. For all these transformations, the QVM neuron-based network, with a smaller network topology (fewer parameters), converges faster to the threshold mean square error and achieves better training variance than the conventional/RPM neuron-based networks, as presented in Tables 5, 6, and 7, respectively. The generalization of the trained networks for all transformations is tested on various complex 3D geometrical objects: a sphere (400 points), an ellipsoid (400 points), a torus (900 points), a cylinder (500 points), a cone (500 points), and a cube (240 points), as shown in Figs. 8, 9, and 10, respectively. These figures show the excellent generalization ability of the proposed neuron over the conventional neuron for all transformations. The testing results in terms of variance are also reported in Tables 5, 6, and 7, which show the superiority of the proposed QVM neuron in all experiments.
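A sketch of how such training pairs can be built with pure quaternions; the particular direction of the 3D line is an assumption, and the rotation uses the standard quaternion sandwich product \(q' = r \otimes p \otimes \bar{r}\) with the Hamilton product helper from the first sketch:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product (same helper as in the first sketch)."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]]) @ q

# 21 points on a 3D straight line with its midpoint (reference point) at the origin
t = np.linspace(-1.0, 1.0, 21)
line = np.stack([t, 0.5 * t, -0.25 * t], axis=1)           # assumed line direction
pts = np.hstack([np.zeros((21, 1)), line])                  # pure quaternions 0 + xi + yj + zk

scaled     = 2.0 * pts                                      # scaling, factor 2
translated = pts + np.array([0.0, 0.3, 0.3, 0.3])           # +0.3 along x, y, and z

theta = np.pi / 4                                           # rotation by π/4 about the x-axis
r     = np.array([np.cos(theta / 2), np.sin(theta / 2), 0.0, 0.0])
r_bar = r * np.array([1., -1., -1., -1.])
rotated = np.array([qmul(qmul(r, p), r_bar) for p in pts])  # q' = r ⊗ p ⊗ r̄
```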

3D face recognition

3D face recognition is a challenging and important biometric problem for neurocomputing researchers because of the variations in facial orientation and expression in 3D space. The proposed network is used to recognize 3D faces and is compared with the conventional and RPM networks in the quaternionic domain. In this section, we conduct two separate experiments on two datasets of 3D human faces represented as point clouds. These faces have diverse orientations, head positions, and facial expressions. The first experiment is conducted on the first dataset, which contains five faces of the same person, and the second experiment on the second dataset, which contains five faces of different persons. In both experiments, a single human face is used for training and the rest for testing. The quaternionic signal-based networks learn the complicated 3D surface of one human face and recognize the surfaces of the other human faces in 3D space. The simple and small structure of the quaternionic-valued network with the proposed QVM neurons should provide a new research direction for researchers working with large 3D face datasets.

In the first experiment, we consider the first dataset, containing the 3D point clouds of five faces of the same person with variable orientations and expressions, as shown in Fig. 11. Each 3D face consists of 4654 cloud points. The quaternionic networks are trained using the one face shown in Fig. 11a, and the trained networks are tested on all five faces. The training results reported in Table 8 show that the QVM-based network, with a smaller structure, provides faster convergence of the training MSE with respect to epochs than the conventional/RPM neuron-based networks. Table 8 also reports that the testing errors over all five faces do not vary much. These results indicate that all faces in the first dataset belong to the same person despite minor variations in orientation and expression. Thus, the QVM neuron has better learning capability in 3D space than the conventional and RPM neurons in the quaternionic domain.

Fig. 11 Five 3D faces of the same person with different pose and orientation

Table 8 Performance comparison of training and testing processes for the 3D human faces of the same person having different orientations and poses

Similarly, the second experiment is conducted on the second dataset, which consists of the 3D point clouds of five faces of different persons, as shown in Fig. 12. Each face contains 6397 cloud points. The first face, shown in Fig. 12a, is used to train the quaternionic-valued neural networks, and all five faces, including the first, are used for testing to recognize whether a face belongs to the same or a different person. The training and testing analysis is reported in Table 9, which compares the networks in terms of the average epochs needed to reach the threshold MSE. The MSE of the proposed QVM neuron-based network also decreases faster with respect to epochs than that of the conventional network during training. The testing MSEs for all five faces reported in Table 9 show that the MSEs of the other four faces are considerably higher and more varied than that of the face used in training. These results indicate that the system can distinguish whether a face belongs to the same or a different person, even under minor variations in orientation and expression. These experiments also reveal the significant improvement in learning and generalization capability of the proposed QVM neuron over the conventional and RPM neurons.

Conclusion

For the processing of quaternionic-valued signals, a novel non-linear quaternionic-valued multiplicative (QVM) neuron with a simple structure is proposed in this paper. In the neurocomputing community, there has been a strong demand for a non-linear neuron model with a simple structure for quaternionic-valued signals, because a neuron based on non-linear aggregation learns high-dimensional data better than a linear one. Quaternionic-multiplication aggregation provides the proposed neuron's non-linearity while keeping its structure simple. The neuron also learns the amplitude and phase information of quaternionic-valued signals in 3D and 4D space. Many benchmark problems, such as 3D time-series prediction (Chua's circuit and the Lorenz system), 4D time-series prediction (the circular noise-based linear autoregressive system and Saito's circuit), 3D transformations, and 3D face recognition, are used to evaluate its performance. Its computational strength and approximation capability are illustrated through multiple sets of simulations and performance metrics such as training MSE, training error variance, testing error variance, and AIC. The training and testing results for these problems confirm that the proposed QVM neuron model achieves faster convergence and better generalization with a smaller network topology than the conventional and root-power mean (RPM) models in the quaternionic domain. This is due to the multiplicative aggregation of quaternionic-valued signals in the proposed QVM neuron.

Fig. 12 Five 3D faces of different persons

Table 9 Performance comparison of training and testing processes for the 3D human faces of different persons

Various deep learning networks, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and GANs, may be extended to the quaternionic domain with multiplicative interactions to solve real-life problems. These networks may employ the proposed neuron to achieve better computational ability, faster convergence, and better generalization with a smaller network topology (fewer parameters). This paper focuses on the practical and logical demonstration of the network based on the proposed QVM neuron model for processing quaternionic-valued signals. Convergence analysis and universal approximation may be proved theoretically in the future. For high-dimensional information processing, the QVM neuron may also be extended to the 8D octonionic and 16D sedenionic domains. The various deep network models may also be reconstructed with QVM neurons to realize faster convergence and noteworthy computational competence.