
Trainable back-propagated functional transfer matrices


Abstract

Functional transfer matrices consist of real functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Unlike the linear connections in conventional weight matrices, functional connections can represent nonlinear relations between two neighbouring layers. Neural networks with such functional connections, called functional transfer neural networks, can be trained via back-propagation. On the two-spirals problem, functional transfer neural networks perform considerably better than conventional multi-layer perceptrons. On the MNIST handwritten digit recognition task, their performance is comparable to that of the conventional model. This study demonstrates that functional transfer matrices can outperform conventional weight matrices in specific cases, and can therefore serve as alternatives to them.
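For a concrete picture of the idea, the following is a minimal NumPy sketch of a single functional transfer layer. It is not the authors' implementation: it assumes, purely for illustration, the per-connection function f_ij(x) = w_ij·sin(u_ij·x) with trainable parameters w_ij and u_ij, whereas the actual function families used in the paper are listed in its Table 1 and in the linked GitHub repository. The point it illustrates is that each connection applies its own nonlinear function to the input unit before the results are summed, and that the function parameters receive gradients through ordinary back-propagation.

```python
import numpy as np

class FunctionalTransferLayer:
    """Illustrative layer whose connections are functions f_ij(x) = w_ij * sin(u_ij * x)."""

    def __init__(self, n_in, n_out, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.normal(0.0, 0.1, (n_in, n_out))  # per-connection amplitudes
        self.u = rng.normal(0.0, 0.1, (n_in, n_out))  # per-connection frequencies

    def forward(self, x):
        # x: (batch, n_in) -> y: (batch, n_out), with y_bo = sum_i w_io * sin(u_io * x_bi)
        self.x = x
        self.s = np.sin(x[:, :, None] * self.u[None, :, :])      # (batch, n_in, n_out)
        return np.einsum('bio,io->bo', self.s, self.w)

    def backward(self, grad_y, lr=0.01):
        # grad_y: (batch, n_out), gradient of the loss w.r.t. the layer output
        cos = np.cos(self.x[:, :, None] * self.u[None, :, :])    # (batch, n_in, n_out)
        grad_w = np.einsum('bio,bo->io', self.s, grad_y)
        grad_u = np.einsum('io,bio,bo->io', self.w, cos * self.x[:, :, None], grad_y)
        grad_x = np.einsum('io,bio,bo->bi', self.w * self.u, cos, grad_y)
        self.w -= lr * grad_w                                     # plain SGD update
        self.u -= lr * grad_u
        return grad_x                                             # gradient for the previous layer
```

A full network would stack several such layers, optionally followed by elementwise activations, and train them with the usual loss-gradient loop, exactly as with conventional weight matrices.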


Notes

  1. For details about layer-wise supervised training, please also refer to Algorithm 7 in the appendix of the referenced paper.

  2. The implementation of the functional transfer matrices in Table 1 can be downloaded from https://github.com/cchrewrite/Functional-Transfer-Neural-Networks/tree/master/Matrices.

  3. For more information about the CSTR machine learning platform, please refer to https://github.com/CSTR-Edinburgh/mlpractical.

  4. The “best” activation is the one that yields the highest accuracy within a group of models that share the same type of functional connection and the same depth but are trained with different values of γ.


Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2016JX06) and the National Natural Science Foundation of China (No. 61472369).

Author information

Corresponding author

Correspondence to Yanyan Xu.



Cite this article

Cai, CH., Xu, Y., Ke, D. et al. Trainable back-propagated functional transfer matrices. Appl Intell 49, 376–395 (2019). https://doi.org/10.1007/s10489-018-1266-3

