
Trainable back-propagated functional transfer matrices

  • Cheng-Hao Cai
  • Yanyan Xu
  • Dengfeng Ke
  • Kaile Su
  • Jing Sun
Article

Abstract

Functional transfer matrices consist of real-valued functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Unlike the linear connections of conventional weight matrices, these functional connections can represent nonlinear relations between two neighbouring layers. Neural networks built on such connections, called functional transfer neural networks, can be trained via back-propagation. On the two-spirals problem, functional transfer neural networks perform considerably better than conventional multi-layer perceptrons. On the MNIST handwritten digit recognition task, their performance is comparable to that of the conventional model. This study demonstrates that functional transfer matrices can outperform conventional weight matrices in specific cases and can therefore serve as alternatives to them.
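
As a rough illustration of the idea in the abstract, the sketch below replaces each scalar weight of a linear layer with a small parametric function of its input and sums the per-connection outputs, so that ordinary back-propagation updates the function parameters. The sine-shaped connection f_ij(x) = a_ij * sin(w_ij * x), the class name FunctionalTransferLayer and the use of PyTorch are illustrative assumptions, not the paper's exact formulation.

    # Minimal sketch of a functional-transfer layer (illustrative only).
    # Each connection (i, j) carries a trainable function instead of a
    # single scalar weight; here we assume f_ij(x) = a_ij * sin(w_ij * x).
    import torch
    import torch.nn as nn

    class FunctionalTransferLayer(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            # One trainable (a_ij, w_ij) pair per connection, replacing the
            # single scalar weight of a conventional linear layer.
            self.a = nn.Parameter(0.1 * torch.randn(out_features, in_features))
            self.w = nn.Parameter(0.1 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))

        def forward(self, x):
            # x: (batch, in) -> broadcast to (batch, out, in), apply the
            # connection function element-wise, then sum over the inputs.
            x = x.unsqueeze(1)                             # (batch, 1, in)
            activations = self.a * torch.sin(self.w * x)   # (batch, out, in)
            return activations.sum(dim=-1) + self.bias     # (batch, out)

    # Usage: stack such layers and train with ordinary back-propagation.
    net = nn.Sequential(FunctionalTransferLayer(2, 16),
                        FunctionalTransferLayer(16, 1))
    x = torch.randn(8, 2)
    y = torch.randint(0, 2, (8, 1)).float()
    loss = nn.functional.binary_cross_entropy_with_logits(net(x), y)
    loss.backward()  # gradients flow to a, w and bias of every layer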

Keywords

Functional transfer neural networks · Functional connections · Back-propagation


Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2016JX06) and the National Natural Science Foundation of China (No. 61472369).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, The University of Auckland, Auckland, New Zealand
  2. School of Information Science and Technology, Beijing Forestry University, Haidian District, China
  3. Institute of Automation, Chinese Academy of Sciences, Haidian District, China
  4. School of Information and Communication Technology, Griffith University, Brisbane, Australia
