
Learning Algorithms for Quaternion-Valued Neural Networks


Abstract

This paper presents the derivation of the enhanced gradient descent, conjugate gradient, scaled conjugate gradient, quasi-Newton, and Levenberg–Marquardt methods for training quaternion-valued feedforward neural networks within the framework of the HR calculus. The strong performance of these algorithms in the real- and complex-valued cases motivated their extension to the quaternion domain. Experiments with the proposed training methods on time-series prediction applications showed a significant performance improvement over the quaternion gradient descent algorithm.
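
To make the setting concrete, the following is a minimal Python/NumPy sketch of gradient-descent training for a single quaternion weight. It is an illustration under stated assumptions, not the paper's implementation: the helper names (qmul, qconj, train_single_neuron) are hypothetical, and the update w ← w + μ e ⊗ x̄ is the common QLMS-style form of the HR-calculus gradient of the squared error, rather than the enhanced or second-order methods the paper derives.

    # Illustrative sketch -- not the paper's algorithm.
    import numpy as np

    def qmul(p, q):
        """Hamilton product of two quaternions stored as [w, x, y, z] arrays."""
        pw, px, py, pz = p
        qw, qx, qy, qz = q
        return np.array([
            pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw,
        ])

    def qconj(q):
        """Quaternion conjugate: negate the three imaginary parts."""
        return q * np.array([1.0, -1.0, -1.0, -1.0])

    def train_single_neuron(samples, targets, lr=0.01, epochs=50):
        """Train one quaternion weight w so that w ⊗ x ≈ d, using the
        QLMS-style step w ← w + lr * (e ⊗ conj(x)), a common form of the
        HR-calculus gradient of the squared error |d − w ⊗ x|^2."""
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.1, size=4)
        for _ in range(epochs):
            for x, d in zip(samples, targets):
                e = d - qmul(w, x)              # quaternion prediction error
                w = w + lr * qmul(e, qconj(x))  # gradient-descent update
        return w

    # Toy usage: recover a known weight from noiseless data.
    rng = np.random.default_rng(1)
    w_true = np.array([0.5, -0.2, 0.1, 0.3])
    xs = [rng.normal(size=4) for _ in range(200)]
    ds = [qmul(w_true, x) for x in xs]
    print(np.round(train_single_neuron(xs, ds), 3))  # approaches w_true

Because x̄ ⊗ x = |x|^2 is a real scalar, each update scales the weight error by (1 − lr·|x|^2), so the iteration contracts toward the true weight for a sufficiently small step size.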



Author information


Corresponding author

Correspondence to Călin-Adrian Popa.


About this article


Cite this article

Popa, CA. Learning Algorithms for Quaternion-Valued Neural Networks. Neural Process Lett 47, 949–973 (2018). https://doi.org/10.1007/s11063-017-9716-1

