Abstract
This paper proposes the Mesh Neural Network (MNN), a novel architecture that allows neurons to be connected in any topology in order to route information efficiently. In MNNs, information is propagated between neurons through a state transition function; state and error gradients are then computed directly from the state updates, without a backward pass. The MNN architecture and its error propagation scheme are formalized and derived in tensor algebra. The proposed computational model can fully drive a gradient descent process and, owing to its expressivity and training efficiency compared with NNs based on back-propagation and computational graphs, is potentially suitable for very large scale sparse NNs.
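To illustrate the idea of forward-only gradient propagation through a state transition function, here is a minimal NumPy sketch. It is not the paper's exact formulation: the `tanh` activation, the dense weight matrix `W`, and the function names are illustrative assumptions. The sensitivity tensor `J[k, i, j] = ds[k]/dW[i, j]` is carried forward alongside the state, so the loss gradient is available without any backward computation.

```python
import numpy as np

def forward_with_grad(W, s0, T):
    """Run T MNN state transitions s_{t+1} = tanh(W @ s_t),
    propagating the sensitivity tensor J forward in time.

    J[k, i, j] = d s[k] / d W[i, j]; no backward pass is needed.
    """
    n = len(s0)
    s = s0.copy()
    J = np.zeros((n, n, n))
    for _ in range(T):
        a = W @ s
        d = 1.0 - np.tanh(a) ** 2  # tanh'(a)
        # d a[k] / d W[i, j] = delta_{ki} * s[j] + sum_m W[k, m] * J[m, i, j]
        Ja = np.einsum('km,mij->kij', W, J)
        Ja += np.eye(n)[:, :, None] * s[None, None, :]
        J = d[:, None, None] * Ja  # chain rule through the activation
        s = np.tanh(a)
    return s, J
```

For a quadratic loss L = ½‖s_T − y‖², the weight gradient is then simply `np.einsum('k,kij->ij', s_T - y, J)`, obtained entirely from the forward state updates.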
Acknowledgements
This research was partially carried out in the framework of the following projects: (i) PRA 2018_81 project entitled “Wearable sensor systems: personalized analysis and data security in healthcare” funded by the University of Pisa; (ii) CrossLab project (Departments of Excellence), funded by the Italian Ministry of Education and Research (MIUR); (iii) “KiFoot: Sensorized footwear for gait analysis” project, co-funded by the Tuscany Region (Italy) under the PAR FAS 2007-2013 fund and the FAR fund of the Ministry of Education, University and Research (MIUR).
Cite this article
Galatolo, F.A., Cimino, M.G.C.A. & Vaglini, G. Formal Derivation of Mesh Neural Networks with Their Forward-Only Gradient Propagation. Neural Process Lett 53, 1963–1978 (2021). https://doi.org/10.1007/s11063-021-10490-1