Abstract
A learning algorithm for a three-layered neural structure with novel non-linear quaternionic-valued multiplicative (QVM) neurons is proposed in this paper. The computing capability of non-linear aggregation in the cell body of biological neurons inspired the development of a non-linear neuron model. However, unlike linear neuron models, most non-linear neuron models are built on higher order aggregation, which is more mathematically complex and difficult to train. As a result, building non-linear neuron models with a simple structure is a challenging endeavor in the neurocomputing field. The proposed QVM neuron model was influenced by non-linear neuron models, yet it has a simple structure and great computational ability. The linearity of the suggested neuron is determined by the weight and bias associated with each quaternionic-valued input, and the non-linearity is accommodated by the non-commutative multiplication of all linearly combined quaternionic input-weight terms. To train three-layered networks with QVM neurons, the standard quaternionic-gradient-based backpropagation (QBP) algorithm is utilized. The computational and generalization capabilities of the QVM neuron are assessed through training and testing in the quaternionic domain using benchmark problems, such as 3D and 4D chaotic time-series prediction, 3D geometrical transformations, and 3D face recognition. The training and testing outcomes are compared to conventional and root-power mean (RPM) neurons in the quaternionic domain using training–testing MSEs, network topology (parameters), variance, and AIC as statistical measures. These findings show that networks with QVM neurons have greater computational and generalization capabilities than networks with conventional and RPM neurons in the quaternionic domain.
Introduction
The processing of high-dimensional data is one of the most difficult and time-consuming tasks for researchers, as it necessitates specialized approaches to maintain the relationships between features. Current research indicates that quaternions have the capacity to preserve the relationships between the components of high-dimensional data [1,2,3]. In recent research, quaternions have been employed in applications involving 3D and 4D issues, particularly the encoding and processing of color image pixels with red, green, and blue channels [4, 5]. A quaternion is a four-dimensional hyper-complex number extended from a complex number (\({\mathbb {C}}\)), first proposed by the Irish mathematician Sir W. R. Hamilton in 1843 [6]. In modern mathematics, it forms a four-dimensional associative, normed, non-commutative division algebra over the real numbers. The Clifford classification also covers the algebra of this number, where the quaternion presents a unique case [7]. In addition, by the Frobenius theorem, the quaternions contain two notable finite-dimensional division subrings: one, containing the real numbers, is a proper subring, and the other acts as the complex numbers [8]. These rings also form Euclidean Hurwitz algebras, where the quaternion presents an associative algebra [9]. Furthermore, quaternions have also been used to prove Lagrange's four-square theorem in number theory, which states that every non-negative integer can be expressed as the sum of the squares of four integers [10]. Thus, this theorem can be used in noteworthy applications of mathematics, such as number theory and combinatorial design theory.
It is well known that a quaternion (\(q=\Re (q)+\Im _1(q)i + \Im _2(q)j+ \Im _3(q)k\)) consists of four orthogonal bases 1, i, j, k. These bases satisfy Hamilton's rules (\(i \otimes i=-1; j \otimes j=-1; k \otimes k=-1; i \otimes j \otimes k=-1; i \otimes j=k=-j \otimes i; j \otimes k=i=-k \otimes j; k \otimes i=j=-i \otimes k\), where the symbol \(\otimes \) denotes the non-commutative quaternionic multiplication) [3]. The beauty of this number is that it processes four real numbers simultaneously in a single representation with one real (\(\Re (q)\)) and three distinct imaginary components (\(\Im _1(q),\Im _2(q),\Im _3(q)\)). Although the multiplication of quaternions is non-commutative in nature, spatial rotation can be modeled and visualized by the quaternion Hamilton product. Because of its numerous merits, such as avoiding gimbal lock, robust and compact representation, and rotation in 4D spaces, the quaternion is an essential and demanded tool for researchers working in 3D (using pure quaternions) and 4D environments. These merits emphasize the importance of quaternions in both pure and applied mathematics, primarily for visualizing 3D rotations in attitude control [11, 12], orbital mechanics [13, 14], molecular dynamics [15, 16], image and signal processing [17, 18], computer vision and robotics [19], bioinformatics [20, 21], and crystallographic texture [22]. Functions of a quaternionic variable can also be used to build physical models, much like functions of a complex variable. For example, the electric and magnetic fields can be represented using quaternionic-valued functions in 3D space, and the Mandelbrot and Julia sets in 4D space [23].
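Hamilton's rules above fully determine the quaternionic product; a minimal Python sketch (function and variable names are illustrative, not from the paper) that verifies the non-commutativity \(i\otimes j=k=-j\otimes i\):

```python
# Minimal sketch of the Hamilton (quaternionic) product; quaternions are
# stored as 4-tuples (real, i, j, k). Names are illustrative only.
def hamilton_product(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)

print(hamilton_product(i, j))  # (0.0, 0.0, 0.0, 1.0), i.e. k
print(hamilton_product(j, i))  # (0.0, 0.0, 0.0, -1.0), i.e. -k
```

Swapping the operands flips the sign of the product, which is exactly the non-commutativity the QVM neuron later exploits.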
The correlations among the components of quaternionic signals are requisite and essential elements in neural networks for better learning and generalization. The compact behavior and special representation ability of quaternions yield a simple, natural structure that learns the inter- and intra-dependencies among high-dimensional features. Recent studies [3, 24, 25] have reported that neural networks with quaternions require a smaller number of neural parameters while providing superior learning and better generalization capability compared to networks with real or complex values. Apart from these, quaternion-valued neural networks (QVNN) store and learn the spatial relationships in various transformations of 3D coordinates [2, 3] and between color pixels [26], whereas real/complex-valued neural networks fail. These qualities have motivated researchers to apply QVNN in many fields, such as automatic speech recognition [27, 28], image classification [29], PolSAR land classification [30], prostate cancer Gleason grading [31], color image compression [32], facial expression recognition [33], robot manipulators [34], spoken language understanding [35], attitude control of spacecraft [36], and banknote classification [37]. However, when 3D or 4D information is processed using a real-valued neural network (RVNN), all components are considered separately and the correlations among them are neglected, as addressed in [38]. Neglecting the correlations among the components is also not a natural way to process high-dimensional signals. In the literature, it has been shown that a neural network in the quaternionic domain learns the amplitude as well as the phase information of quaternionic signals effectively [39].
In [35], the real-valued multi-layer perceptron (MLP) and the quaternion-valued multi-layer perceptron (QMLP) are compared for the identification of spoken dialogues, and the results show that QMLP requires fewer epochs and achieves better accuracy than MLP. From the perspective of a biological neuron, its action potential may have heterogeneous pulse configurations and diverse separations among pulses. The configuration associated with the action potential suggests the idea of a quaternionic signal containing amplitude and phase information (correlations among a quaternion's components), which is most promising.
Neural networks are preferred over other machine learning (ML) algorithms because they capture non-linearity among data, which is important for researchers dealing with real-world problems. For handling non-linear relationships, a modified group method of data handling (GMDH) [40] was proposed that improves accuracy by replacing the conventional polynomial functions with ML models. However, the capture of non-linearity by neural networks depends on the selection of activation functions. The main purpose of building an artificial neuron is to establish a mathematical model that approximates the functional capability of the biological neuron. The renowned and basic McCulloch–Pitts neuron [41] is a linear model in which the net action potential is calculated through the weighted summation of inputs, and it has been further extended into the complex [42, 43] and quaternionic domains [44] for specific applications. However, this neuron model does not consider the possibility and capability of non-linear aggregation. It also neglects the role of dendrites in information processing in the cell body of the neural system. Much of the literature has reported that the computational capability of a biological neuron depends on how the input signals are aggregated in the cell body [45,46,47]. Also, aggregation based on non-linear combinations of inputs increases the computational power of artificial neurons [48,49,50]. In an experimental study of a cat's visual system, it was observed that the binocular interaction in its neural system is aggregated in a multiplicative fashion [51].
Neurobiologists are studying non-linear aggregation based on the multiplication of input signals with three basic intentions: (1) how the input signals with their synaptic weights are processed through dendrites in a non-linear manner, (2) how to analyze the behavior of the output signal, and (3) how a simple but non-linear model recognizes complicated behavior using plausible biological methods. These studies have indicated that multiplicative aggregation has a major role in the artificial neural system, particularly for computing motion perception and its learning [52,53,54]. Further, by the Stone–Weierstrass theorem, a multiplication-based network containing only input and output layers can model any function that is continuous on a finite interval [55]. This theorem also supports the unsupervised learning of such a network for the blind source separation problem containing non-linearly mixed signals [56, 57]. The computational capabilities of the multiplicative-aggregation neuron in the real domain, together with its learning methodology, were investigated in [58], and most recently, the computational power of a network containing multiplicative neurons in the complex domain has also been verified [59]; however, the computational capability in the quaternionic domain has not been investigated. Past studies have also presented evidence that the aggregation of input signals is based on multiplication in the nervous systems of several animals [51, 60, 61]. There are other non-linear neuron models based on higher order aggregation, addressed in [62, 63], but they are mathematically complex in structure and tough to train, especially for quaternionic-valued signals. Moreover, a polynomial neuron with a large number of inputs makes a network exceedingly difficult to train because of the higher order terms.
Thus, it is very demanding for the neurocomputing community to construct non-linear neuron models with a simple structure. These pieces of evidence influenced us to propose a simple and non-linear quaternionic-valued multiplicative (QVM) neuron model. This neuron model is a simpler form of the polynomial neuron, and its training is performed by the standard backpropagation algorithm. The model processes quaternionic-valued signals with their quaternionic weights in a non-linear manner to produce a quaternionic-valued output signal. The linearity of the proposed neuron is established by the summation of each quaternionic-valued input-weight product, and the non-linearity is achieved by the non-commutative multiplication of all linearly associated quaternionic input-weight terms. The multiplication of quaternionic signals is restricted to the sum of products of basis elements, excluding higher order expressions, which provides the simplicity of the QVM neuron model. In this paper, different benchmark problems are considered to analyze the computational capabilities of the proposed neuron experimentally. Its performance is comparatively analyzed through network topology (parameters), number of average epochs, training MSE, training and testing error variances, and the Akaike information criterion (AIC) [64]. The proposed multiplication-based QVM neuron is compared experimentally with summation-based conventional and root-power mean (RPM) neurons in the quaternionic domain [2]. The potential (V) of a quaternionic-conventional neuron for N quaternionic-valued signals (\(q_i; i=1\cdots N\)) is calculated as the weighted sum of the quaternionic input signals; mathematically, it is represented as
\(V = \sum _{i=1}^{N} w_i \otimes q_i + w_0\)
where the input weights (\(w_i\)) and the bias weight (\(w_0\)) are quaternions. The potential (V) of an RPM aggregation-based neuron is calculated as the weighted sum of RPMs of the input signals and is given by
where the input weights (\(w_i\)) and the bias weight (\(w_0\)) are quaternions, and \(\alpha \) is the power coefficient used to adjust the degree of approximation. \(\alpha \) can be set to any value in the range (\(-\infty \), \(\infty \)); when \(\alpha =-1; 1; 2\), V acts as the harmonic, arithmetic, and quadratic mean, respectively. The RPM neuron therefore becomes a conventional neuron when \(\alpha =1\). Apart from these neurons, a quaternion multi-valued network is proposed in [65]; it is also based on linear aggregation and neglects non-linear aggregation, which presents better computational power than linear aggregation. Moreover, the activation function used in [65] is applied over all components of the quaternion, which maintains the inter-relationships among its components. In contrast, the QVM neuron considers a general activation function: any activation function can be selected for each component of the quaternionic signal, depending on the intended application. The main contributions of this paper are as follows:
1. The multiplicative neuron in the quaternionic domain (QVM) with its backpropagation algorithm is proposed.
2. The QVM neuron is a non-linear neuron model with a simple structure (a lower class of polynomial neuron).
3. The proposed work is compared with existing quaternionic-valued conventional and root-power mean neurons through benchmark problems.
4. The proposed QVM neuron model reports better results in terms of network topology (parameters), average epochs, training MSE, training and testing error variances, and AIC.
The remainder of this paper is organized as follows: The section “Quaternionic-valued multiplicative neuron model” explains the design concept of the proposed quaternionic-valued multiplicative (QVM) neuron model. The section “Learning rule” presents the learning process of a three-layered quaternionic-valued neural network (QVNN) embedded with the proposed QVM neurons; its learning rules are derived step by step using the quaternionic-backpropagation algorithm in Appendix A. The section “Performance assessment of QVM neuron using benchmark problems” investigates the performance of the proposed QVM neuron through a wide spectrum of benchmark problems, such as 3D and 4D chaotic time-series prediction, 3D transformations, and 3D face recognition. Finally, the section “Conclusion” presents the conclusion and future perspectives of this work.
Quaternionic-valued multiplicative neuron model
In the past literature, various neuron models based on linear or non-linear aggregation have been proposed. Most of them aggregate real-, complex-, or quaternionic-valued signals linearly [41, 42], although a few non-linear neurons have also been designed in the recent past to achieve better computational power [2, 3, 66]. These works have shown that non-linear aggregative neurons have a learnability advantage over linear aggregation-based neurons. Non-linear aggregation is more computationally expensive and structurally complicated than linear aggregation, but non-linear aggregation-based models require fewer neurons and layers in training. These advantages have motivated us to design a computationally powerful and novel neuron model based on a simple non-linear aggregation of input signals, especially for non-commutative quaternionic-valued signals. Hence, for the processing of quaternionic-valued signals, a simple and non-linear quaternionic-valued multiplicative (QVM) neuron is proposed, and its basic structure is shown in Fig. 1. The net potential function of the QVM neuron is governed by the non-linear aggregation of linearly associated quaternionic input signals with weights. Its non-linearity is accomplished by the multiplication of multiple linearly weighted inputs. Its bias weights are adaptable during the learning process, which provides better accuracy.
The resulting quaternionic-valued multiplicative (QVM) neuron uses Hamilton's product in the aggregation of multiple weighted quaternion-valued inputs. The proposed QVM neuron, as shown in Fig. 1, processes L quaternionic-valued inputs with a multiplicative aggregation of their corresponding weights and produces a quaternionic-valued output, where all inputs, weights, and the output are quaternions, e.g., \(q =\Re (q)+\Im _1(q)i+\Im _2(q)j+\Im _3(q)k\). Each input \([q_1,q_2,\ldots ,q_l,\ldots ,q_L]\) is associated, respectively, with the quaternionic-valued weights \([w_{1m},w_{2m},\ldots ,w_{lm},\ldots ,w_{Lm}]\), where m denotes the general mth quaternionic-valued neuron. Let \(q^0\) denote the bias input, where \(\Re (q^0)=\Im _1(q^0)=\Im _2(q^0)=\Im _3(q^0)=1\), and let \([w^0_{1m},w^0_{2m},\ldots ,w^0_{lm},\ldots ,w^0_{Lm}]\) denote the bias weight vector that contains the bias weight for each input term. The net internal potential \(V_m\) and output \(Y_m\) of the mth QVM neuron are defined, respectively, as
\(V_m = {}^{\otimes }\prod _{l=1}^{L}\big (w_{lm}\otimes q_{l}+w^{0}_{lm}\otimes q^{0}\big )\)  (1)
and
\(Y_m = f_{\mathbb {H}}(V_m)\)  (2)
The notation \(\otimes \) presented in Eq. 1 denotes the non-commutative quaternionic multiplication operator, and the symbol \(^{\otimes }\prod \) represents the ordered quaternionic product that multiplies its terms one by one from the right end (e.g., \(^{\otimes }\prod _{i=1}^{t}r_i=r_1\otimes r_2\otimes \cdots \otimes r_t\)). The quaternionic activation function is denoted by \(f_{\mathbb {H}}\) in Eq. 2.
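The ordered product in Eq. 1 can be sketched in Python; a minimal illustration (all names are ours, and the right-end accumulation follows the convention just described) of the QVM net potential for tuple-encoded quaternions:

```python
# Illustrative sketch of the QVM net potential (Eq. 1): the ordered
# Hamilton product of the linearly weighted terms
# Omega_l = w_l (x) q_l + w0_l (x) q0, accumulated from the right end
# so the result reads Omega_1 (x) Omega_2 (x) ... (x) Omega_L.
# Quaternions are 4-tuples (real, i, j, k); names are ours.
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qvm_potential(inputs, weights, bias_weights, q0=(1.0, 1.0, 1.0, 1.0)):
    terms = [qadd(qmul(w, q), qmul(w0, q0))
             for q, w, w0 in zip(inputs, weights, bias_weights)]
    acc = terms[-1]                  # start from the rightmost term
    for t in reversed(terms[:-1]):   # multiply towards the left
        acc = qmul(t, acc)
    return acc
```

With a single input the potential reduces to \(w\otimes q + w^0\otimes q^0\); with several inputs the ordering of the factors matters because the product is non-commutative.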
Learning rule
The proposed quaternionic-valued multiplicative (QVM) neuron can be embedded in multi-layered neural networks for quaternionic-valued signal processing. This network structure is similar to the multi-layered perceptron employing conventional neurons (linear aggregation of input signals) in the quaternionic domain [1, 67]. All conventional neurons are replaced by the proposed QVM neurons in both the hidden and output layers of the network, as shown in Fig. 2. The output of the QVM neuron is obtained by applying a split-type quaternion version of the activation function (\(f_{\mathbb {H}}\)) on the quaternionic-valued net internal potential (V). This activation function \(f_{\mathbb {H}}\) is the 4D extension of a real-valued activation function (\(f_{\mathbb {R}}\)), as addressed in [1, 38, 68]. If the net internal potential of a quaternionic-valued neuron is expressed as \(V=\Re (V)+\Im _1(V)i+\Im _2(V)j+\Im _3(V)k\), then its output is computed as follows:
\(f_{\mathbb {H}}(V)= f_{\mathbb {R}}(\Re (V))+f_{\mathbb {R}}(\Im _1(V))i+f_{\mathbb {R}}(\Im _2(V))j+f_{\mathbb {R}}(\Im _3(V))k\)  (3)
Most non-linear neurons in the real/complex/quaternionic domain produce their outputs solely by applying a non-linear activation function. However, very few fully non-linear and analytic quaternionic-valued activation functions are available, and they require extensive and careful training [25]. The widely used split-type activation functions do not model the local interdependencies [68]. However, the split activation function provides simplicity in computing the output of a neuron for both complex-valued and quaternionic-valued signals. It is also used due to prior investigation and better stability (pure quaternion activation functions contain singularities), as addressed in [69]. To tackle this problem, the non-linear aggregation of inputs can be used, which enables us to choose from a wide range of split functions while preserving the local interdependencies to some degree. Equation 3 represents an activation function for quaternionic-valued signals (V) in which a real-valued activation \(f_{\mathbb {R}}(.)\) is applied separately to each component of V. In the literature, various linear and non-linear activation functions for real-valued signals have been addressed and their computational capabilities investigated [70]. In this paper, we have used linear, sigmoid, and hyperbolic-tangent activation functions for the considered benchmark problems. However, the selection criterion is application specific. In the learning algorithm, we consider a general activation function from which a specific activation function can be selected for the intended application.
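As a small illustration, the split-type activation of Eq. 3 applies one real function to each component; a sketch assuming tanh (function names are ours):

```python
import math

# Split-type activation (Eq. 3): a real activation is applied
# independently to each of the four quaternion components.
# `split_activation` is our name; besides tanh, the paper also uses
# linear and sigmoid variants.
def split_activation(v, f=math.tanh):
    return tuple(f(c) for c in v)

y = split_activation((0.5, -0.5, 2.0, 0.0))  # componentwise tanh
```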
The quaternionic version of the conventional neuron is replaced by the proposed quaternionic-valued multiplicative (QVM) neuron in a fully connected three-layered neural network containing L quaternionic input signals, M QVM neurons in the hidden layer, and N QVM neurons in the output layer, as shown in Fig. 2. In this network, all inputs, weights, and outputs, including the bias input and its weights, are quaternions. Let \(q_l\) and \(Y_n\) be the general \(l^{th} (l=1, \ldots , L)\) quaternionic input and \(n^{th} (n=1, \ldots , N)\) output signals, respectively. Let \(w_{lm}\) and \(w_{mn}\) be the general weights used for the connections from the lth input to the mth hidden neuron and from the mth hidden neuron to the nth output neuron, respectively, where \(m=1,\ldots , M\). Let the bias input be \(q^0=1+i+j+k\), with quaternionic conjugate \(\overline{q^0}=1-i-j-k\). The net internal potential (\(V_m\)) of the mth QVM neuron at the hidden layer is defined as follows:
\(V_m = {}^{\otimes }\prod _{l=1}^{L}\varOmega _{lm}= V_m^{\text {LT}}\otimes \varOmega _{lm}\otimes V_m^{\text {RT}}\)  (4)
where \(\varOmega _{lm} = w_{lm}\otimes q_{l}+w^{0}_{lm}\otimes q^{0}\) is the particular term that associates the lth input and the mth hidden neuron, \(V_m^{\text {LT}}={}^\otimes \prod _{p=1}^{l-1}\varOmega _{pm}\) is the aggregation function that aggregates all left terms (LT) of \(\varOmega _{lm}\), and similarly, \( V_m^{\text {RT}}={}^\otimes \prod _{p=l+1}^{L}\varOmega _{pm}\) aggregates all right terms (RT) of \(\varOmega _{lm}\). The output (\(Y_m\)) of the mth QVM neuron is obtained by the quaternionic activation function (\(f_{\mathbb {H}}\)) of \(V_m\) as follows:
\(Y_m = f_{\mathbb {H}}(V_m)\)  (5)
Similarly, the net internal potential (\(V_n\)) of the nth QVM neuron at the output layer is defined as
\(V_n = {}^{\otimes }\prod _{m=1}^{M}\varOmega _{mn}= V_n^{\text {LT}}\otimes \varOmega _{mn}\otimes V_n^{\text {RT}}\)  (6)
where \(\varOmega _{mn} = w_{mn}\otimes Y_{m}+w^{0}_{mn}\otimes q^{0}\), \(V_n^{\text {LT}}={}^\otimes \prod _{p=1}^{m-1}\varOmega _{pn}\), and \( V_n^{\text {RT}}={}^\otimes \prod _{p=m+1}^{M}\varOmega _{pn}\). The output (\(Y_n\)) of the nth output-layer QVM neuron is computed as
\(Y_n = f_{\mathbb {H}}(V_n)\)  (7)
Let E be the real-valued cost function which provides the total error of the feed-forward network. This function is computed by the mean square error (MSE) as follows:
where \(e_n\) represents the error at the nth QVM neuron of the output layer. This error is calculated by taking the difference between the desired output (\(Y^{D}_n\)) and the actual output (\(Y_n\)) (i.e., \( e_n =Y^{D}_n - Y_n \)). The expression (\(e_n\otimes \overline{e_n}\)) presented in Eq. 8 provides a real value, because the quaternionic multiplication of any quaternion with its conjugate yields a real number. The gradient descent-based quaternionic backpropagation (QBP) is used to deduce the weight updates (\(\Delta w\)) through the minimization of the cost function (E). The objective error function E is minimized by recursively updating all weights associated with the network. The weight update \(\Delta w\) is calculated from the negative quaternionic gradient of the real-valued cost function \((\nabla _w{E})\). This gradient is evaluated using partial derivatives with respect to the real and the three imaginary components of the quaternionic weights, which is expressed as
\(\Delta w = -\eta \nabla _w{E} = -\eta \left( \frac{\partial E}{\partial \Re (w)}+\frac{\partial E}{\partial \Im _1(w)}i+\frac{\partial E}{\partial \Im _2(w)}j+\frac{\partial E}{\partial \Im _3(w)}k\right) \)  (9)
where \(\eta \) represents the learning rate, which lies in the interval (0,1]. Any weight update \(\Delta w=\Delta w_{mn}\) (i.e., the weight update between the hidden and output layers) can be deduced by the chain rule of differentiation with respect to \(w_{mn}\), and its simplified form is expressed as
where
The symbol \(\odot \) denotes the component-wise multiplication of quaternionic variables (e.g., \(q_1 \odot q_2=\Re (q_1)\Re (q_2)+\Im _1(q_1)\Im _1(q_2)i+\Im _2(q_1)\Im _2(q_2)j+\Im _3(q_1)\Im _3(q_2)k\), where \(q_1\) and \(q_2\) are two arbitrary quaternionic variables). Similarly, the input-to-hidden weight update \(\Delta w=\Delta w_{lm}\) is derived by the chain rule of differentiation, and its simplified form is expressed as
where
The complete derivation of all weight updates is given in Appendix A. The pseudocode of the learning procedure for the proposed network is represented as:
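As a sanity check on the component-wise gradient described above, \(\nabla _w E\) can be approximated numerically by central differences over the four components of a weight; the sketch below assumes a single quaternionic weight and the real cost \(e\otimes \overline{e}\), and is only an illustrative check, not the paper's analytical QBP derivation:

```python
# Numerical check of the component-wise quaternionic gradient: the cost
# E = e (x) conj(e) for e = target - w (x) q reduces to |e|^2, and its
# partial derivatives w.r.t. the four components of w are approximated
# by central differences. Illustrative only (single weight).
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def cost(w, q, target):
    y = qmul(w, q)
    e = tuple(t - c for t, c in zip(target, y))
    return sum(c * c for c in e)  # e (x) conj(e) is the real value |e|^2

def numerical_gradient(w, q, target, h=1e-6):
    grad = []
    for idx in range(4):  # real, i, j, k components of w
        wp, wm = list(w), list(w)
        wp[idx] += h
        wm[idx] -= h
        grad.append((cost(tuple(wp), q, target)
                     - cost(tuple(wm), q, target)) / (2 * h))
    return tuple(grad)

# A gradient-descent step then updates w by -eta * grad, cf. the QBP rule.
```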
Performance assessment of QVM neuron using benchmark problems
In this section, we conduct various experiments to evaluate the training and testing performance of the proposed quaternionic-valued multiplicative (QVM) neuron-based neural network through a wide spectrum of benchmark problems, such as 3D and 4D chaotic time-series prediction, 3D transformations, and 3D face recognition. Its performance is compared against conventional and RPM neurons through network topology (parameters), number of average epochs, training MSE, training and testing error variances, and the Akaike information criterion (AIC). For the learning of 3D transformations (scaling, translation, and rotation), a set of a few points lying on a 3D straight line is considered, and the generalization abilities are tested through complex geometrical objects in 3D space. As a biometric application, we have opted for two primary datasets of the same and different 3D human faces for face classification. This application will surely provide ideas to prospective researchers working in computer vision. All experiments are conducted ten times using the same number of parameters and the same learning rule for the same number of epochs with varying initial weights. The average of all statistical parameters is reported for the comparison of the networks.
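For the AIC comparison, one commonly used MSE-based form is \(\text {AIC}=N\ln (\text {MSE})+2k\); the exact variant used in this paper follows its reference [64], so the sketch below is an assumed form for illustration:

```python
import math

# Assumed MSE-based form of the Akaike information criterion,
# AIC = N * ln(MSE) + 2k, with N samples and k trainable parameters.
# The paper's exact variant follows its reference [64]; this is an
# illustrative sketch only.
def aic(mse, n_samples, n_params):
    return n_samples * math.log(mse) + 2 * n_params

# A smaller AIC favors a network that reaches a similar MSE with fewer
# parameters, which is how the QVM topology advantage shows up.
```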
3D chaotic time-series prediction problems
3D chaotic time-series prediction for Chua’s circuit
Chua's circuit, containing one locally active resistor, one Chua's diode acting as a non-linear negative resistance, and three energy-storage elements (two capacitors and an inductor), generates a chaotic time-series. This circuit shows chaotic behavior because it satisfies all three criteria of a chaotic circuit: one or more non-linear elements, one or more locally active resistors, and three or more energy-storage elements. The dynamics of Chua's circuit are governed by the following ordinary differential equations:
\(\frac{\mathrm{d}x(t)}{\mathrm{d}t}=\alpha \big (y(t)-x(t)-f(x(t))\big ); \quad \frac{\mathrm{d}y(t)}{\mathrm{d}t}=x(t)-y(t)+z(t); \quad \frac{\mathrm{d}z(t)}{\mathrm{d}t}=-\beta y(t)-\gamma z(t)\)  (14)
where the symbols \(\alpha \), \(\beta \), and \(\gamma \) (\(\alpha \ge 0, \beta \ge 0, \gamma \ge 0\)) are parameters determined by the specific values of the circuit components. The function \(f\big (x(t) \big )\) is a three-segment piecewise-linear function that captures the variation of resistance with respect to the current flowing across Chua's diode. This function is mathematically expressed as
\(f\big (x(t)\big )=m_1 x(t)+\frac{1}{2}(m_0-m_1)\big (|x(t)+1|-|x(t)-1|\big )\)
where \(m_0\) and \(m_1\) denote the slopes of the inner and outer segments of \(f\big (x(t)\big )\), respectively. The variables x(t) and y(t) denote the voltages across the capacitors, and the variable z(t) denotes the current flowing through the inductor. The QVM neuron-based network is used effectively to predict the chaotic behavior of Chua's circuit, because the network processes a quaternionic-valued signal containing four real values as a single entity. For this problem, the signal can be encoded as a pure quaternion by setting the real part to zero or near zero (\(q=0+x(t)i +y(t)j +z(t)k\)).
This circuit exhibits the double-scroll chaotic attractor in 3D space under the specific parameter values \(\alpha =15.6, \beta =28, \gamma =0, m_0=-\,1.143\), and \(m_1=-\,0.714\). The chaotic system presented in Eq. 14 generates a chaotic time-series of 10,000 terms when we consider the initial voltages \(x(t)=0.1\) and \(y(t)=0.1\), the initial current \(z(t)=0.1\), and a time step of 0.01 s. The generated time-series is first normalized to \((-0.8, 0.8)\), and then its terms are transformed into pure quaternions. The first 1000 of the 10,000 terms are used for training, and the remaining 9000 terms are used for testing through the quaternionic-valued neural networks. At the hidden and output layers of the networks, we have used the sigmoid and hyperbolic-tangent activation functions, respectively. Table 1 presents a comparative analysis of the training and testing of the proposed QVM neuron model against the conventional and RPM neuron models, and reports better results in all respects, such as network topology, training MSE, variance, and AIC. The learning curves presented in Fig. 3 also show slightly faster convergence of the proposed neuron model. The 3D graphical view of the testing result is demonstrated by the double-scroll chaotic attractor shown in Fig. 4. The training and testing results provide evidence that QVM neurons are superior to conventional and RPM neurons.
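The data-generation procedure just described can be sketched as follows; this is our illustrative implementation, assuming the standard dimensionless Chua equations, simple Euler integration, and a global min-max normalization (the paper does not specify its integrator or normalization details):

```python
# Illustrative generation of the Chua training data: Euler integration
# of the (assumed) dimensionless Chua equations with the parameters from
# the text, pure-quaternion encoding (0, x, y, z), and a simple global
# min-max normalization. Integrator and normalization are assumptions.
def chua_series(n=10000, dt=0.01, alpha=15.6, beta=28.0, gamma=0.0,
                m0=-1.143, m1=-0.714, x=0.1, y=0.1, z=0.1):
    def f(v):  # three-segment piecewise-linear diode characteristic
        return m1 * v + 0.5 * (m0 - m1) * (abs(v + 1.0) - abs(v - 1.0))
    series = []
    for _ in range(n):
        dx = alpha * (y - x - f(x))
        dy = x - y + z
        dz = -beta * y - gamma * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        series.append((0.0, x, y, z))  # pure-quaternion encoding
    return series

def normalize(series, lo=-0.8, hi=0.8):
    flat = [c for q in series for c in q[1:]]
    mn, mx = min(flat), max(flat)
    scale = (hi - lo) / (mx - mn)
    return [(0.0,) + tuple(lo + (c - mn) * scale for c in q[1:])
            for q in series]

data = normalize(chua_series(n=1000))  # early terms train, later terms test
```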
3D chaotic time-series prediction for the Lorenz system
The Lorenz system is considered a benchmark problem for predicting chaotic behavior in 3D space. It arises in several interdisciplinary fields, such as atmospheric convection, thermosyphons, light amplification, chemical reactions, and electrical circuits. It behaves chaotically under specific values of the initial conditions and its parameters. The system is expressed by the following system of three differential equations:
\(\frac{\mathrm{d}x(t)}{\mathrm{d}t}=\sigma \big (y(t)-x(t)\big ); \quad \frac{\mathrm{d}y(t)}{\mathrm{d}t}=x(t)\big (\rho -z(t)\big )-y(t); \quad \frac{\mathrm{d}z(t)}{\mathrm{d}t}=x(t)y(t)-\beta z(t)\)  (15)
where the symbols \(\sigma \), \(\rho \), and \(\beta \) denote its constant parameters [\(\sigma = 15\), \(\rho = 28\), and \(\beta = 8/3\)]. The variables x(t), y(t), and z(t) vary with time and show chaotic behavior in 3D space under the specific parameter values. As in any chaotic system, even a small perturbation in the initial conditions is magnified significantly and yields a completely different series. Thus, the prediction of its series is challenging and demanding for the neurocomputing community. The network with QVM neurons can predict its sequence, because its variables can be expressed as a pure quaternionic signal (\(0+x(t)i+y(t)j+z(t)k\)) and processed as a single quaternionic number in the quaternionic-valued neural network, which preserves the amplitude and the correlation among the variables. Its 6000 terms over the time interval 0–60 s, under the chosen initial conditions and parameters, are generated by the system of equations presented in Eq. 15 and then normalized to the range \((-\,0.8, 0.8)\). The first 500 terms of the complete normalized set are used for training the networks, and the remaining terms are used for testing. We have used the sigmoid and linear activation functions for the neurons at the hidden and output layers of the network, respectively. The training and testing results reported in Table 2 show that a simple neural structure with the proposed neurons is capable of learning the chaotic behavior better and outperforms the conventional neural network. The testing results are demonstrated in Fig. 6 and also reported through statistical parameters in Table 2. The faster learning of the network with the proposed QVM neurons over the conventional or RPM neuron-based quaternionic-valued networks is demonstrated in Fig. 5. Overall, all results reveal that the proposed QVM neuron is significantly better than the quaternionic versions of the conventional and RPM neurons.
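The Lorenz series generation can be sketched similarly; the time step follows from 6000 terms over 0–60 s, while the initial condition here is an assumed placeholder, since the paper does not state it:

```python
# Illustrative Euler integration of the Lorenz equations, emitted as
# pure quaternions (0, x, y, z). The time step follows from 6000 terms
# over 0-60 s; the initial condition (1, 1, 1) is an assumed placeholder.
def lorenz_series(n=6000, dt=0.01, sigma=15.0, rho=28.0, beta=8.0 / 3.0,
                  x=1.0, y=1.0, z=1.0):
    out = []
    for _ in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out.append((0.0, x, y, z))
    return out
```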
4D time-series prediction problems
Circular noise-based linear autoregressive process
The circular noise-based linear autoregressive process is considered a vital and standard problem for the prediction of its quaternionic-valued signals. The stable version of the linear autoregressive filtering process is defined by the following recurrence function:
where \(\psi (n)=\Re (\psi (n))+\Im _1(\psi (n))i+\Im _2(\psi (n))j+\Im _3(\psi (n))k\) represents circular white noise as a quaternionic variable; its components (\(\Re (\psi (n))\), \(\Im _1(\psi (n))\), \(\Im _2(\psi (n))\), and \(\Im _3(\psi (n))\)) embed the noise in the linear autoregression and are normally distributed with zero mean and unit variance (\({\mathcal {N}}(0,1)\)). Using Eq. 16, we have generated 1500 quaternionic signals, of which 500 are used for training and the remaining 1000 for testing the trained networks. The networks constructed using conventional, RPM, and proposed QVM neurons are trained through the quaternionic version of the backpropagation algorithm. The comparative training results reported in Table 3 show that the network based on the proposed QVM neuron requires a smaller network, fewer parameters, and faster convergence than the conventional and RPM neurons in the quaternionic domain. The testing performance of the networks is compared through statistical parameters, such as testing MSE, testing error variance, and AIC, in Table 3. These results also indicate that the QVM neuron learns and generalizes this system significantly better than the quaternionic versions of the conventional and RPM neurons.
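As an illustration of this setup, a noise-driven quaternionic AR series can be generated as below. The AR coefficients are illustrative placeholders chosen merely to be stable, since Eq. 16 is not reproduced here; each quaternion component is driven by independent \({\mathcal {N}}(0,1)\) noise, as stated in the text:

```python
import numpy as np

def quaternion_ar(n=1500, coeffs=(0.4, -0.3, 0.2, -0.05), seed=0):
    """Noise-driven linear autoregressive quaternionic series.

    coeffs are illustrative placeholders (the coefficients of the paper's
    Eq. 16 are not reproduced here); the sum of their magnitudes is below
    1, so the recursion is stable.
    """
    rng = np.random.default_rng(seed)
    order = len(coeffs)
    y = np.zeros((n + order, 4))       # quaternions as [w, x, y, z] rows
    for t in range(order, n + order):
        psi = rng.standard_normal(4)   # circular white noise, one draw per component
        y[t] = sum(c * y[t - 1 - k] for k, c in enumerate(coeffs)) + psi
    return y[order:]

signals = quaternion_ar()
train, test = signals[:500], signals[500:]   # 500 training / 1000 testing signals
```

Because the AR filter is linear and acts identically on all four components, the series exercises exactly the kind of cross-component structure a quaternionic network is meant to preserve.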
4D chaotic time-series prediction for Saito’s circuit
Saito’s circuit, containing one locally active resistor, one non-linear negative resistance, and three energy-storage elements (one capacitor and two inductors), shows chaotic behavior in 4D space because it satisfies the chaotic criteria of the circuit. Its dynamics are defined by the following differential equations:
where the symbols \(\alpha _1\), \(\beta _1\), \(\alpha _2\), \(\beta _2\), and \(\zeta \) denote the parameters, which are computed from the particular values of the circuit components
presents the hysteresis value in normalized form; \(z(t)=x_1(t)+x_2(t)\), \(p_1=\beta _1/(1-\beta _1)\), and \(p_2=\beta _2/(1-\beta _2)\). This system is considered a benchmark problem for predicting its chaotic behavior through the quaternionic-valued neural network. Its chaotic behavior depends on its parameter values (\(\alpha _1=7.5\), \(\beta _1=0.16\), \(\alpha _2=15\), \(\beta _2=0.097\), \(\zeta =1.3\)). Using Eq. 17, we have generated 1500 terms of its time-series, with the dynamics initiated from a stationary state. Each term contains four components (\(x_1(t)\), \(y_1(t)\), \(x_2(t)\), and \(y_2(t)\)) and is represented as a quaternion (\(x_1(t)+y_1(t)i +x_2(t)j +y_2(t)k\)). The first 500 terms are used to train the quaternionic-valued networks through the QBP learning algorithm, and the remaining 1000 terms are used to test the trained networks' prediction of the chaotic time-series. In the training process, it is observed that the proposed QVM neuron-based network needs a smaller network topology (fewer parameters) and fewer average epochs to achieve the threshold training MSE than the networks containing conventional or RPM neurons, as reported in Table 4. The testing results of the trained networks are also comparatively analyzed through statistical parameters, such as testing MSE, testing variance, and AIC, in Table 4. These training and testing results again confirm the dominance of the proposed QVM neuron over the conventional and RPM neurons in the quaternionic domain.
3D transformation problems
The transformation problem is an important and standard problem for analyzing the learning and generalization of 3D motion patterns through the quaternionic-valued neural network. Training is performed on a straight line in 3D space, and testing is done on various complex 3D geometrical structures. In this subsection, we present the learning and generalization of scaling, translation, and rotation transformations in 3D space through conventional, RPM, and proposed QVM neurons. This problem is considered a benchmark that facilitates the visualization of objects with variable orientations and the interpretation of motion in 3D space. To learn these transformations through three-layered networks (2-M-2), we have used a 3D straight line whose reference point (the midpoint of the line) is placed at the origin in 3D space, as shown in Fig. 7. In training, the networks process two inputs: the first is a point lying on the straight line, and the second is its reference point. Likewise, the networks produce two outputs: the first for the transformed point of the straight line and the second for the transformed reference point. It is observed that including the reference point provides better accuracy during testing. Simulation results for all transformations show that the network with the proposed neurons reduces the training error drastically and provides better generalization and accuracy than the networks associated with conventional and RPM neurons in the quaternionic domain.
The three-layered network is trained using the input–output mapping of a 3D straight line containing only 21 points for three different transformations: (1) scaling (scaling factor 2), as shown in Fig. 7a; (2) translation (0.3 unit translation in the \(+\)ve x-, y-, and z-directions), as shown in Fig. 7b; and (3) rotation (\(\pi /4\) radian rotation about the x-axis), as shown in Fig. 7c. For all these transformations, the QVM neuron-based network, with a smaller network topology (fewer parameters), converges faster to the threshold mean square error and achieves better training variance than the conventional/RPM neuron-based networks, as presented in Tables 5, 6 and 7, respectively. The generalization of the trained networks for all transformations is tested through various complex 3D geometrical objects: a sphere (400 points), ellipsoid (400 points), torus (900 points), cylinder (500 points), cone (500 points), and cube (240 points), as shown in Figs. 8, 9, and 10, respectively. These figures show the excellent generalization ability of the proposed neuron over the conventional neuron for all transformations. The testing results in terms of variance are also reported in Tables 5, 6, and 7, which confirm the superiority of the proposed QVM neuron in all experiments.
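Although the networks learn these mappings from data, the rotation task itself has a closed-form quaternionic solution, which is what makes quaternions a natural encoding here. A minimal sketch of rotating a 3D point by \(\pi/4\) about the x-axis via the sandwich product \(q \otimes p \otimes q^*\) (function names are illustrative, not the paper's):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions [w, x, y, z] (non-commutative)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(point, axis, angle):
    """Rotate a 3D point via the sandwich product q ⊗ p ⊗ q*."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    p = np.concatenate([[0.0], point])      # embed the point as a pure quaternion
    return qmul(qmul(q, p), q_conj)[1:]

# pi/4 rotation about the x-axis, matching the third training task
print(rotate(np.array([0.0, 1.0, 0.0]), [1, 0, 0], np.pi / 4))  # ≈ [0, 0.7071, 0.7071]
```

The network's task is to recover exactly this kind of map from 21 input–output pairs and generalize it to unseen objects such as the sphere and torus.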
3D face recognition
3D face recognition is a challenging and important biometric problem for neurocomputing researchers due to variations in facial orientation and expression in 3D space. The proposed network is used to recognize 3D faces and is compared with the conventional and RPM networks in the quaternionic domain. In this section, we have conducted two experiments on two datasets of 3D human faces represented as point clouds. These faces have diverse orientations, head positions, and facial expressions. The first experiment is conducted on a dataset containing five faces of the same person, and the second on a dataset containing five faces of different persons. In both experiments, a single human face is selected for training and the rest for testing: the quaternionic signal-based networks learn the complicated 3D surface of one human face and recognize the surfaces of the remaining faces in 3D space. The simple and small structure of the quaternionic-valued network with the proposed QVM neurons offers a new research direction for those working with large 3D human-face datasets.
In the first experiment, we have considered the first dataset, containing 3D point clouds of five human faces of the same person with variable orientations and expressions, as shown in Fig. 11. Each 3D face consists of 4654 cloud points. The quaternionic-based network is trained on one face, shown in Fig. 11a, and the trained network is tested over all five faces. The training results reported in Table 8 show that the QVM-based network, with a smaller structure, provides faster convergence of the training MSE with respect to epochs than the conventional/RPM neuron-based networks. Table 8 also reports that the testing errors of all five faces do not vary much. These results indicate that all faces in the first dataset belong to the same person despite minor variations in face orientation and expression. Thus, the QVM neuron has better learning capability in 3D space than the conventional and RPM neurons in the quaternionic domain.
Similarly, the second experiment is conducted on the second dataset, consisting of 3D point clouds of five human faces of different persons, as shown in Fig. 12. Each face contains 6397 cloud points. The first face, shown in Fig. 12a, is used for training the quaternionic-valued neural networks, and all faces, including the first, are considered for testing to recognize faces of the same or different persons. The training and testing analysis reported in Table 9 presents the comparative performance of the networks in terms of threshold MSEs and average epochs. During training, the MSE of the proposed QVM neuron-based network also reduces with respect to epochs faster than that of the conventional network. The testing MSEs for all five faces reported in Table 9 show that the MSEs of the other four faces are considerably higher and more varied than that of the face used in training. These results show that the system distinguishes faces of the same person from those of different persons despite minor variations in orientation and expression. These experiments again reveal the significant improvement in learning and generalization capabilities of the proposed QVM neuron over the conventional and RPM neurons.
Conclusion
For the processing of quaternionic-valued signals, a novel non-linear quaternionic-valued multiplicative (QVM) neuron with a simple structure is proposed in this paper. In the neurocomputing community, there has been a strong demand for a non-linear neuron model with a simple structure for quaternionic-valued signals, since a neuron model based on non-linear aggregation learns high-dimensional data better than a linear model. Quaternionic-multiplication aggregation gives the proposed neuron both its non-linearity and its simple structure. This neuron also learns the amplitude and phase information of quaternionic-valued signals in 3D and 4D space. Many benchmark problems, such as 3D time-series prediction (Chua's circuit and the Lorenz system), 4D time-series prediction (circular noise-based linear autoregressive system and Saito's circuit), 3D transformations, and 3D face recognition, are used to evaluate its performance. The computational strength and approximation capability are illustrated through multiple sets of simulations and performance evaluation metrics, such as training MSE, training error variance, testing error variance, and AIC. Training and testing results for these problems confirm that the proposed QVM neuron model achieves faster convergence and better generalization with a smaller network topology than the conventional and root-power mean (RPM) models in the quaternionic domain. This is due to the multiplicative aggregation of quaternionic-valued signals in the proposed QVM neuron.
Various deep learning networks, such as varieties of convolutional neural networks (CNN), recurrent neural networks (RNN), GANs, etc., may be extended into the quaternionic domain with multiplicative interaction to solve real-life problems. These networks may employ the proposed neuron to achieve better computational ability, faster convergence, and better generalization with a smaller network topology (fewer parameters). This paper focuses on the practical and logical demonstration of the network based on the proposed QVM neuron model for the processing of quaternionic-valued signals. Convergence analysis and universal approximation may be proved theoretically in future work. For high-dimensional information processing, the QVM neuron may also be extended into the 8D octonionic and 16D sedenionic domains. Various deep network models may likewise be reconstructed using QVM neurons to realize faster convergence and noteworthy computational competence.
References
Kumar S, Tripathi BK (2018) High-dimensional information processing through resilient propagation in quaternionic domain. J Ind Inf Integr 11:41–49
Kumar S, Tripathi BK (2019) Root-power mean aggregation-based neuron in quaternionic domain. IETE J Res 65(4):557–575
Kumar S, Tripathi BK (2019) On the learning machine with compensatory aggregation based neurons in quaternionic domain. J Comput Des Eng 6(1):33–48
Yin Q, Wang J, Luo X, Zhai J, Jha SK, Shi Y-Q (2019) Quaternion convolutional neural network for color image classification and forensics. IEEE Access 7:20293–20301
Denis P, Carre P, Fernandez-Maloigne C (2007) Spatial and spectral quaternionic approaches for colour images. Comput Vis Image Underst 107(1–2):74–87
Hamilton WR (1866) Elements of quaternions. Longmans, Green, & Company
Kajiwara J, Li XD, Shon KH (2004) Regeneration in complex, quaternion and Clifford analysis. In: Finite or infinite dimensional complex analysis and applications. Springer, pp 287–298
Élashvili AG (1982) Frobenius Lie algebras. Funct Anal Appl 16(4):326–328
Hurwitz A (1922) Über die Komposition der quadratischen Formen. Math Ann 88(1–2):1–25
Sun Z-W (2017) Refining Lagrange’s four-square theorem. J Number Theory 175:167–190
Ma Y, Jiang B, Tao G, Cheng Y (2014) Actuator failure compensation and attitude control for rigid satellite by adaptive control using quaternion feedback. J Frankl Inst 351(1):296–314
Ariyibi SO, Tekinalp O (2020) Quaternion-based nonlinear attitude control of quadrotor formations carrying a slung load. Aerosp Sci Technol 105:105995
Andrle MS, Crassidis JL (2013) Geometric integration of quaternions. J Guid Control Dyn 36(6):1762–1767
Condurache D, Martinusi V (2010) Quaternionic exact solution to the relative orbital motion problem. J Guid Control Dyn 33(4):1035–1047
Chen P-C, Hologne M, Walker O (2017) Computing the rotational diffusion of biomolecules via molecular dynamics simulation and quaternion orientations. J Phys Chem B 121(8):1812–1823
Karney CF (2007) Quaternions in molecular modeling. J Mol Graph Model 25(5):595–604
Wu L, Zhang X, Chen H, Zhou Y (2020) Unsupervised quaternion model for blind colour image quality assessment. Signal Process 176:107708
Ell TA, Le Bihan N, Sangwine SJ (2014) Quaternion Fourier transforms for signal and image processing. Wiley, New York
Pervin E, Webb JA (1982) Quaternions in computer vision and robotics. Technical report, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA
Caudai C, Salerno E, Zoppè M, Tonazzini A (2015) Inferring 3D chromatin structure using a multiscale approach based on quaternions. BMC Bioinform 16(1):1–11
Hanson RM, Kohler D, Braun SG (2011) Quaternion-based definition of protein secondary structure straightness and its relationship to Ramachandran angles. Proteins Struct Funct Bioinform 79(7):2172–2180
Mason J, Schuh C (2008) Hyperspherical harmonics for the representation of crystallographic texture. Acta Mater 56(20):6141–6155
Barrallo J et al (2010) Expanding the Mandelbrot set into higher dimensions. In: Proceedings of Bridges 2010: mathematics, music, art, architecture, culture, pp 247–254
Kumar S, Tripathi BK (2019) On the learning machine with quaternionic domain neural network and its high-dimensional applications. J Intell Fuzzy Syst 36(6):5189–5202
Parcollet T, Morchid M, Linares G (2020) A survey of quaternion neural networks. Artif Intell Rev 53(4):2957–2982
Kusamichi H, Isokawa T, Matsui N, Ogawa Y, Maeda K (2004) A new scheme for color night vision by quaternion neural network. In: Proceedings of the 2nd international conference on autonomous robots and agents, vol 1315. Citeseer
Parcollet T, Ravanelli M, Morchid M, Linarès G, De Mori R (2018) Speech recognition with quaternion neural networks. arXiv preprint arXiv:1811.09678
Qiu X, Parcollet T, Ravanelli M, Lane N, Morchid M (2020) Quaternion neural networks for multi-channel distant speech recognition. arXiv preprint arXiv:2005.08566
Rao SP, Panetta K, Agaian S (2020) Quaternion based neural network for hyperspectral image classification. In: Mobile multimedia/image processing, security, and applications 2020, vol 11399. International Society for Optics and Photonics, p 113990S
Shang F, Hirose A (2013) Quaternion neural-network-based PolSAR land classification in Poincare-sphere-parameter space. IEEE Trans Geosci Remote Sens 52(9):5693–5703
Greenblatt A, Mosquera-Lopez C, Agaian S (2013) Quaternion neural networks applied to prostate cancer Gleason grading. In: 2013 IEEE international conference on systems, man, and cybernetics, IEEE, pp 1144–1149
Luo L, Feng H, Ding L (2010) Color image compression based on quaternion neural network principal component analysis. In: 2010 international conference on multimedia technology. IEEE, pp 1–4
Takahashi K, Takahashi S, Cui Y, Hashimoto M (2014) Remarks on computational facial expression recognition from HOG features using quaternion multi-layer neural network. In: International conference on engineering applications of neural networks. Springer, pp 15–24
Takahashi K (2018) Remarks on control of robot manipulator using quaternion neural network. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 560–565
Parcollet T, Morchid M, Bousquet P-M, Dufour R, Linarès G, De Mori R (2016) Quaternion neural networks for spoken language understanding. In: 2016 IEEE spoken language technology workshop (SLT). IEEE, pp 362–368
Zou A-M, Kumar KD (2013) Quaternion-based distributed output feedback attitude coordination control for spacecraft formation flying. J Guid Control Dyn 36(2):548–556
Huang X, Gai S (2020) Banknote classification based on convolutional neural network in quaternion wavelet domain. IEEE Access 8:162141–162148
Arena P, Fortuna L, Muscato G, Xibilia MG (1997) Multilayer perceptrons to approximate quaternion valued functions. Neural Netw 10(2):335–342
Muramoto N, Isokawa T, Nishimura H, Matsui N (2013) On processing three dimensional data by quaternionic neural networks. In: The 2013 international joint conference on neural networks (IJCNN). IEEE, pp 1–5
Amiri M, Soleimani S (2021) ML-based group method of data handling: an improvement on the conventional GMDH. Complex Intell Syst 7(6):2949–2960
Hayman S (1999) The McCulloch–Pitts model. In: IJCNN’99. International joint conference on neural networks. Proceedings (Cat. No. 99CH36339), vol 6. IEEE, pp 4438–4439
Nitta T (1997) An extension of the back-propagation algorithm to complex numbers. Neural Netw 10(8):1391–1415
Aizenberg I (2011) Complex-valued neural networks with multi-valued neurons, vol 353. Springer, Berlin
Nitta T (1995) A quaternary version of the back-propagation algorithm. In: Proceedings of ICNN’95-international conference on neural networks, vol 5. IEEE, pp 2753–2756
Mel BW (1994) Information processing in dendritic trees. Neural Comput 6(6):1031–1085
Payeur A, Béïque J-C, Naud R (2019) Classes of dendritic information processing. Curr Opin Neurobiol 58:78–85
London M, Häusser M (2005) Dendritic computation. Annu Rev Neurosci 28:503–532
Jiang T, Wang D, Ji J, Todo Y, Gao S (2015) Single dendritic neuron with nonlinear computation capacity: a case study on xor problem. In: 2015 IEEE international conference on progress in informatics and computing (PIC). IEEE, pp 20–24
Todo Y, Tamura H, Yamashita K, Tang Z (2014) Unsupervised learnable neuron model with nonlinear interaction on dendrites. Neural Netw 60:96–103
Stöckel A, Eliasmith C (2021) Passive nonlinear dendritic interactions as a computational resource in spiking neural networks. Neural Comput 33(1):96–128
Anzai A, Ohzawa I, Freeman RD (1999) Neural mechanisms for processing binocular information ii. Complex cells. J Neurophysiol 82(2):909–924
Koch C, Poggio T (1992) Multiplying with synapses and neurons. In: Single neuron computation. Elsevier, pp 315–345
Todo Y, Tang Z, Todo H, Ji J, Yamashita K (2019) Neurons with multiplicative interactions of nonlinear synapses. Int J Neural Syst 29(08):1950012
Schnupp JW, King AJ (2001) Neural processing: the logic of multiplication in single neurons. Curr Biol 11(16):R640–R642
Cotter NE (1990) The Stone–Weierstrass theorem and its application to neural networks. IEEE Trans Neural Netw 1(4):290–295
Gao P, Woo W, Dlay S (2006) Weierstrass approach to blind source separation of multiple nonlinearly mixed signals. IEE Proc Circuits Devices Syst 153(4):332–345
Gao P, Woo W, Dlay S (2006) Non-linear independent component analysis using series reversion and Weierstrass network. IEE Proc Vis Image Signal Process 153(2):115–131
Yadav RN, Kalra PK, John J (2007) Time series prediction with single multiplicative neuron model. Appl Soft Comput 7(4):1157–1163
Kumar S, Singh RK, Chaudhary A (2020) On the learning machine with amplificatory neuron in complex domain. Arab J Sci Eng 45(12):10287–10309
Anzai A, Ohzawa I, Freeman RD (1999) Neural mechanisms for processing binocular information i. Simple cells. J Neurophysiol 82(2):891–908
Roberts S (1987) Evidence for distinct serial processes in animals: the multiplicative-factors method. Anim Learn Behav 15(2):135–173
Fallahnezhad M, Moradi MH, Zaferanlouei S (2011) A hybrid higher order neural classifier for handling classification problems. Expert Syst Appl 38(1):386–393
Zhang M (2009) Artificial higher order neural network nonlinear models: Sas nlin or honns?. In: Artificial higher order neural networks for economics and business. IGI Global, pp 1–47
Wagenmakers E-J, Farrell S (2004) AIC model selection using Akaike weights. Psychonom Bull Rev 11(1):192–196
Greenblatt AB, Agaian SS (2018) Introducing quaternion multi-valued neural networks with numerical examples. Inf Sci 423:326–342
Tripathi BK, Kalra PK (2010) The novel aggregation function-based neuron models in complex domain. Soft Comput 14(10):1069–1081
Popa C-A (2018) Learning algorithms for quaternion-valued neural networks. Neural Process Lett 47(3):949–973
Isokawa T, Kusakabe T, Matsui N, Peper F (2003) Quaternion neural network and its application. In: International conference on knowledge-based and intelligent information and engineering systems. Springer, pp 318–324
Parcollet T, Ravanelli M, Morchid M, Linarès G, Trabelsi C, De Mori R, Bengio Y (2018) Quaternion recurrent neural networks. arXiv preprint arXiv:1806.04418
Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941
Acknowledgements
This work is financially supported by Technical Education Quality Improvement Programme (TEQIP-III), Dr. A.P.J. Abdul Kalam Technical University, Lucknow, U.P., INDIA (Grant No.: AKTU/Dean-PGSR/2019/CRIP/01).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: QVM neuron-based neural network and derivation of its learning rules
We consider a three-layered neural architecture (\(L-M-N\)) associated with the proposed quaternionic-valued multiplicative (QVM) neurons. This architecture contains L inputs, M hidden neurons, and N output neurons. The output (\(Y_m\)) of the mth QVM neuron at the hidden layer is computed by applying the split-type quaternionic-valued activation function (\(f_{\mathbb {H}}\)) to its net potential (\(V_m\)). The net internal potential (\(V_m\)) and output (\(Y_m\)) are, respectively, defined as follows:
In the same way, the net internal potential (\(V_n\)) and output (\(Y_n\)) of the nth QVM neuron at the output layer are, respectively, expressed as follows:
and
Let \(Y_n^D=\Re (Y_n^D) +\Im _1(Y_n^D)i+\Im _2(Y_n^D)j+\Im _3(Y_n^D)k\) denote the desired output of the nth QVM neuron at the output layer. The error (\(e_n=Y_n^D-Y_n\)) at the nth neuron is expressed as
Let E be the real-valued objective function which is evaluated as follows:
The quaternionic version of the gradient descent backpropagation (QBP) algorithm is used to minimize the error function (E). This algorithm recursively adjusts all weights of the network through weight updates. The weight update (\(\Delta w\)) converts the old weight (\(w^\textrm{old}\)) into the new weight (\(w^\textrm{new}\)) as
where the weight update (\(\Delta w\)) is derived from the negative quaternionic gradient of the cost function (\(\nabla _w{E}\)) with a constant learning rate (\(\eta \)). This gradient (\(\nabla _w{E}\)) comprises the derivatives with respect to all components (one real and three imaginary) of the quaternionic weight, which is expressed in the following weight update:
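Written out per component, the update described here takes the form below (a reconstruction consistent with the surrounding text; the paper's typeset Eq. A.8 expresses the same rule):

```latex
\Delta w \;=\; -\,\eta\,\nabla_{w} E
\;=\; -\,\eta \left( \frac{\partial E}{\partial \Re(w)}
  + \frac{\partial E}{\partial \Im_1(w)}\, i
  + \frac{\partial E}{\partial \Im_2(w)}\, j
  + \frac{\partial E}{\partial \Im_3(w)}\, k \right)
```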
For any weight \(w=w_{mn}\) connecting the mth hidden neuron to the nth output neuron, the derivative \(-\partial E/\partial \Re (w_{mn})\) required in Eq. A.8 is deduced using the chain rule, and its simplification yields
Similarly, the derivative \(-\partial E/\partial \Im _k(w_{mn})\) for each component (\(k=1, 2, 3\)) required in Eq. A.8 is deduced as
After substituting the derivatives \(-\partial E/\partial \Re (w_{mn})\) and \(-\partial E/\partial \Im _k(w_{mn})\) from Eqs. A.9 and A.10 in Eq. A.8, we get the simplified expression as
Equation A.11 requires the quaternionic gradients \(\nabla _{w_{mn}}\Re (V_n)\) and \(\nabla _{w_{mn}}\Im _k(V_n)\) for the simplification. Thus, we have considered Eq. A.3 and expressed mathematically as
where \(\varOmega _{mn} = w_{mn}\otimes Y_{m}+w^{0}_{mn}\otimes q^{0}\) is the particular term that associates mth hidden neuron and nth output neuron, \(V_n^{\text {LT}}=^\otimes \prod _{m=1}^{m-1}\varOmega _{mn}\) is the aggregation function that aggregates all left terms (LT) of \(\varOmega _{mn}\), and similarly, \( V_n^{\text {RT}}=^\otimes \prod _{m=m+1}^M\varOmega _{mn}\) aggregates all right terms (RT) of \(\varOmega _{mn}\). Now, the quaternionic gradients \(\nabla _{w_{mn}}\Re (V_n)\) and \(\nabla _{w_{mn}}\Im _k(V_n); k=1,2,3\) are derived using chain rule and their simplified forms are presented as follows:
and
After substituting the quaternionic gradients \(\nabla _{w_{mn}}\Re (V_n)\) and \(\nabla _{w_{mn}}\Im _k(V_n)\) from Eqs. A.13 to A.16 in Eq. A.11, we get the simplified expression as
where
Now, for the weight \(w=w_{lm}\) that connects the lth input to the mth hidden neuron, the derivatives \(-\partial E/\partial \Re (w_{lm})\) and \(-\partial E/\partial \Im _k(w_{lm}); k=1,2,3\) required in Eq. A.8 are deduced using the chain rule, similarly to the derivation of Eqs. A.9 and A.10
After substituting the quaternionic gradients Eqs. A.19 and A.20 in Eq. A.8, we get
For the derivation of quaternionic gradients \(\nabla _{w_{lm}}\Re (V_n)\) and \(\nabla _{w_{lm}}\Im _k(V_n)\), we have considered Eq. A.12 and rewritten as
where \(V'_n=V_n^{\text {LT}}\otimes \varOmega _{mn}=\,\,^\otimes \prod _{m=1}^{m}\varOmega _{mn}\). Now, the quaternionic gradients \(\nabla _{w_{lm}}\Re (V_n)\) and \(\nabla _{w_{lm}}\Im _k(V_n)\) are deduced from Eq. A.22 through the chain rule of derivation and we get
and
In Eqs. A.23–A.26, we need the quaternionic gradients \(\nabla _{w_{lm}}\Re (V'_n)\) and \(\nabla _{w_{lm}}\Im _k(V'_n)\). These gradients are also derived by the chain rule, similarly to the derivation of Eqs. A.23–A.26
Similarly, the quaternionic gradients \(\nabla _{w_{lm}}\Re (\varOmega _{mn})\) and \(\nabla _{w_{lm}}\Im _k(\varOmega _{mn})\) are derived using \(\varOmega _{mn} = w_{mn}\otimes Y_{m}+w^{0}_{mn}\otimes q^{0}\)
The gradients \(\nabla _{w_{lm}}\Re (V_m)\) and \(\nabla _{w_{lm}}\Im _k(V_m)\) are derived and simplified similarly to \(\nabla _{w_{lm}}\Re (V_n)\) and \(\nabla _{w_{lm}}\Im _k(V_n); k=1,2,3\) presented in Eqs. A.13–A.16, respectively. Therefore
After substituting the quaternionic gradients \(\nabla _{w_{lm}}\Re (V_m)\) and \(\nabla _{w_{lm}}\Im _k(V_m)\) from Eqs. A.35 to A.38 in Eqs. A.31 to A.34 and then simplifying, we get
The substitution of the quaternionic gradients \(\nabla _{w_{lm}}\Re (\varOmega _{mn})\) and \(\nabla _{w_{lm}}\Im _k(\varOmega _{mn}); k=1,2,3\) from Eqs. A.39 to A.42 into Eqs. A.27 to A.30 and subsequent simplification yield
Now, the quaternionic gradients \(\nabla _{w_{lm}}\Re (V'_n)\) and \(\nabla _{w_{lm}}\Im _k(V'_n)\) from Eqs. A.43 to A.46 are substituted in Eqs. A.23 to A.26, and then, the simplified expressions are presented as
Finally, the quaternionic gradients \(\nabla _{w_{lm}}\Re (V_n)\) and \(\nabla _{w_{lm}}\Im _k(V_n)\) from Eqs. A.47 to A.50 are substituted into Eq. A.21 to calculate the weight updates \(\Delta w_{lm}\) and \(\Delta w_{lm}^0\), and after simplification, we get
where
This completes the derivation of the learning rules for the network associated with the proposed QVM neurons.
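The forward computation and the left/right partial products used throughout this appendix can be illustrated numerically. A minimal sketch, assuming quaternions stored as [w, x, y, z] arrays and a component-wise (split) sigmoid for \(f_{\mathbb {H}}\); the bias term \(w^{0}\otimes q^{0}\) is simplified here to an additive quaternion \(w^{0}\), and all variable names are illustrative:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions [w, x, y, z] (non-commutative)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

ONE = np.array([1.0, 0.0, 0.0, 0.0])    # quaternion identity

def qvm_potential(xs, ws, w0s):
    """Net potential V = Omega_1 ⊗ ... ⊗ Omega_L with Omega_l = w_l ⊗ x_l + w0_l.

    The product order is fixed because quaternion multiplication is
    non-commutative.
    """
    V = ONE
    for x, w, w0 in zip(xs, ws, w0s):
        V = qmul(V, qmul(w, x) + w0)
    return V

def split_sigmoid(q):
    """Split-type activation f_H applied to each quaternion component."""
    return 1.0 / (1.0 + np.exp(-q))

def left_right(omegas, m):
    """Partial products V_LT = Omega_1 ⊗ ... ⊗ Omega_{m-1} and
    V_RT = Omega_{m+1} ⊗ ... ⊗ Omega_M around the m-th factor, as used in
    the gradient derivation (factors cannot be moved past each other)."""
    LT, RT = ONE, ONE
    for om in omegas[:m]:
        LT = qmul(LT, om)
    for om in omegas[m + 1:]:
        RT = qmul(RT, om)
    return LT, RT

rng = np.random.default_rng(0)
xs = rng.standard_normal((3, 4))        # three quaternionic inputs
ws = 0.1 * rng.standard_normal((3, 4))
w0s = 0.1 * rng.standard_normal((3, 4))

V = qvm_potential(xs, ws, w0s)
Y = split_sigmoid(V)                    # neuron output, components in (0, 1)

# Sanity check: LT ⊗ Omega_m ⊗ RT reproduces the full product for any m.
omegas = [qmul(w, x) + w0 for x, w, w0 in zip(xs, ws, w0s)]
LT, RT = left_right(omegas, 1)
assert np.allclose(qmul(qmul(LT, omegas[1]), RT), V)
```

The sanity check at the end mirrors the decomposition \(V_n = V_n^{\text{LT}} \otimes \varOmega _{mn} \otimes V_n^{\text{RT}}\) on which the chain-rule derivation rests.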
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Kumar, S., Singh, R.K. & Chaudhary, A. A novel non-linear neuron model based on multiplicative aggregation in quaternionic domain. Complex Intell. Syst. 9, 3161–3183 (2023). https://doi.org/10.1007/s40747-022-00911-6