Abstract
This paper presents the derivation of the enhanced gradient descent, conjugate gradient, scaled conjugate gradient, quasi-Newton, and Levenberg–Marquardt methods for training quaternion-valued feedforward neural networks, using the framework of the HR calculus. The strong performance of these algorithms in the real- and complex-valued cases motivated their extension to the quaternion domain. Experiments on time series prediction applications showed that the proposed training methods yield a significant performance improvement over the quaternion gradient descent algorithm.
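To make the setting concrete, the following is a minimal sketch of a quaternion gradient-style update for a single linear quaternion neuron, in the spirit of the HR-calculus-based learning rules the paper builds on (here a QLMS-style update w ← w + μ·e·conj(x)). All names (`qmul`, `qconj`, `w_true`, `mu`) and the toy setup are illustrative assumptions, not the paper's actual algorithms or network.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions given as 4-vectors (w, x, y, z)."""
    a, b, c, d = p
    e, f, g, h = q
    return np.array([
        a*e - b*f - c*g - d*h,
        a*f + b*e + c*h - d*g,
        a*g - b*h + c*e + d*f,
        a*h + b*g - c*f + d*e,
    ])

def qconj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

rng = np.random.default_rng(0)
w_true = rng.standard_normal(4)   # unknown quaternion weight to recover
w = np.zeros(4)                   # initial weight estimate
mu = 0.02                         # learning rate

for _ in range(500):
    x = rng.standard_normal(4)        # random quaternion input
    d = qmul(w_true, x)               # desired output d = w_true * x
    e = d - qmul(w, x)                # prediction error
    w = w + mu * qmul(e, qconj(x))    # gradient-style weight update

print(np.linalg.norm(w - w_true))     # residual error; should be near 0
```

Because quaternion multiplication is non-commutative, the placement of the conjugated input on the right of the error term matters; this ordering makes each update shrink the weight error by a factor of roughly (1 − μ|x|²) per step, so the estimate converges to `w_true` for a small enough learning rate.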
Cite this article
Popa, CA. Learning Algorithms for Quaternion-Valued Neural Networks. Neural Process Lett 47, 949–973 (2018). https://doi.org/10.1007/s11063-017-9716-1