Abstract
We analyze algorithmic and computational aspects of biological phenomena, such as replication and programmed death, in the context of machine learning. We use two different measures of neuron efficiency to develop machine learning algorithms for adding neurons to the system (i.e., a replication algorithm) and removing neurons from the system (i.e., a programmed death algorithm). We argue that the programmed death algorithm can be used for compression of neural networks and the replication algorithm for improving the performance of already trained neural networks. We also show that a combined algorithm of programmed death and replication can improve the learning efficiency of arbitrary machine learning systems. The computational advantages of the bio-inspired algorithms are demonstrated by training feedforward neural networks on the MNIST dataset of handwritten digits.
Data availability
The MNIST dataset [24] analyzed during the current study is available in the MNIST database, http://yann.lecun.com/exdb/mnist/.
References
Galushkin AI (2007) Neural networks theory. Springer, Berlin, p 396
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Haykin S (1999) Neural networks: a comprehensive foundation. Prentice Hall, Hoboken
Vapnik VN (2000) The nature of statistical learning theory. Information Science and Statistics. Springer, New York
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. PNAS 79(8):2554–2558
Shwartz-Ziv R, Tishby N (2017) Opening the black box of deep neural networks via information. arXiv:1703.00810 [cs.LG]
Roberts D, Yaida S, Hanin B (2022) The principles of deep learning theory: an effective theory approach to understanding neural networks. Cambridge University Press, Cambridge
Vanchurin V (2021) Toward a theory of machine learning. Mach Learn: Sci Technol 2:035012
Vanchurin V, Wolf YI, Katsnelson MI, Koonin EV (2022) Towards a theory of evolution as multilevel learning. Proc Natl Acad Sci USA 119:e2120037119
Vanchurin V, Wolf YI, Koonin EV, Katsnelson MO (2022) Thermodynamics of evolution and the origin of life. Proc Natl Acad Sci USA 119:e2120042119
Katsnelson MI, Vanchurin V (2021) Emergent quantumness in neural networks. Found Phys 51(5):1–20
Katsnelson MI, Vanchurin V, Westerhout T (2021) Self-organized criticality in neural networks. arXiv:2107.03402
Vanchurin V (2022) Towards a theory of quantum gravity from neural networks. Entropy 24:7
Vanchurin V (2020) The world as a neural network. Entropy 22:1210
Hassibi B, Stork DG (1992) Second order derivatives for network pruning: optimal brain surgeon. Adv Neural Inform Proc Syst 5
Medeiros CMS, Barreto GA (2013) A novel weight pruning method for MLP classifiers based on the MAXCORE principle. Neural Comput Appl 22:71–84
Thomas P, Suhner MC (2015) A new multilayer perceptron pruning algorithm for classification and regression applications. Neural Process Lett 42(2):437–458
Augasta MG, Kathirvalavakumar T (2011) A novel pruning algorithm for optimizing feedforward neural network of classification problems. Neural Process Lett 34:241–258
Zeng X, Yeung DS (2006) Hidden neuron pruning of multilayer perceptrons using a quantified sensitivity measure. Neurocomputing 69:825–837
Kwok TY, Yeung DY (1997) Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Trans Neural Netw 8(3):630–645
Parekh R, Yang J, Honavar V (2000) Constructive neural-network learning algorithms for pattern classification. IEEE Trans Neural Netw 11(2):436–451
Islam MM, Sattar MA, Amin MF, Yao X, Murase K (2009) A new constructive algorithm for architectural and functional adaptation of artificial neural networks. IEEE Trans Syst Man Cybern Part B Cybern 39(6):1590–1605
Puma-Villanueva WJ, dos Santos EP, Von Zuben FJ (2012) A constructive algorithm to synthesize arbitrarily connected feedforward neural networks. Neurocomputing 75(1):14–32
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Acknowledgements
V.V. was supported in part by the Foundational Questions Institute (FQXi) and the Oak Ridge Institute for Science and Education (ORISE).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Appendix
Here, we list the pruning algorithms customized for a feedforward neural network with n neurons \(\{x_{1}(t),...x_{n} (t)\}=\{x_{1},...x_{n}\}\) in a hidden layer t.
A.1 Connection cut algorithm

1. Measure the variances of the neurons on levels t and \(t+1\):
$$\begin{aligned} C^t_{kk}=\langle \Delta x_k(t)^2\rangle ,\quad C^{t+1}_{ii}=\langle \Delta x_i(t+1)^2\rangle . \end{aligned}$$(71) 
2. Find the neuron l with minimal efficiency (28):
$$\begin{aligned} E_l=\min _k E_k =\min _k C^t_{kk} \sum _i \frac{(w_{ik}^t)^2}{C_{ii}^{t+1}}f' \left( \sum _jw_{ij}^{t} \langle x_{j}\rangle +b_{i}^{t} \right) ^2. \end{aligned}$$(72) 
3. Use \(x_l=\langle x_l\rangle \) as the linear dependence equation (39) with
$$\begin{aligned} a_{k\ne l}=0,\quad a_l=1,\quad a_0=\langle x_l\rangle . \end{aligned}$$(73) 
4. Remove neuron l from the net according to (41):
$$\begin{aligned} \sum _kw_{jk}^{t}x_{k}+b_{j}^{t}&\simeq \sum _{k\ne l} w_{jk}^{t}x_{k}+{\tilde{b}}_{j}^{t},\quad {\tilde{b}}_{j}^{t}=w_{jl}^{t} \langle x_l\rangle +b_{j}^{t}. \end{aligned}$$(74) 
5. Repeat while there are neurons with efficiency below the cutoff, or while the accuracy or loss remains acceptable.
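The steps above can be condensed into a short NumPy sketch. The tanh activation, the recorded batch of layer-t activations, and the function name are illustrative assumptions, not part of the original algorithm statement:

```python
import numpy as np

def connection_cut_prune(W, b, X, cutoff):
    """One pass of the connection cut sketch for a single hidden layer.

    W : (n_out, n_in) weights from layer t to layer t+1 (tanh units assumed)
    b : (n_out,) biases of layer t+1
    X : (n_samples, n_in) recorded activations x_k(t) on layer t
    """
    mean_x = X.mean(axis=0)
    C_t = X.var(axis=0)                          # variances C^t_kk, eq. (71)
    pre = W @ mean_x + b                         # sum_j w_ij <x_j> + b_i
    fp2 = (1.0 - np.tanh(pre) ** 2) ** 2         # f'(...)^2 for tanh units
    C_t1 = np.tanh(X @ W.T + b).var(axis=0)      # variances C^{t+1}_ii
    # Neuron efficiencies E_k, eq. (72)
    E = C_t * np.sum(W ** 2 * (fp2 / C_t1)[:, None], axis=0)
    l = int(np.argmin(E))
    if E[l] >= cutoff:
        return W, b, X, None                     # nothing below the cutoff
    # Replace x_l by its mean: fold w_jl <x_l> into the bias, eq. (74)
    b_new = b + W[:, l] * mean_x[l]
    W_new = np.delete(W, l, axis=1)
    X_new = np.delete(X, l, axis=1)
    return W_new, b_new, X_new, l
```

In practice the function would be called in a loop (step 5), re-measuring the variances after each removal, until the efficiency cutoff or the accuracy criterion stops the pruning.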
A.2 Probability algorithm

1. Measure the variances of the neurons on levels t and \(t+1\):
$$\begin{aligned} C^t_{kk}=\langle \Delta x_k(t)^2\rangle ,\quad C^{t+1}_{ii}=\langle \Delta x_i(t+1)^2\rangle . \end{aligned}$$(75) 
2. Find the neuron l with minimal efficiency (28):
$$\begin{aligned} E_l=\min _k E_k =\min _k C^t_{kk} \sum _i \frac{(w_{ik}^t)^2}{C_{ii}^{t+1}}f' \left(\sum _jw_{ij}^{t} \langle x_{j}\rangle +b_{i}^{t} \right)^2. \end{aligned}$$(76) 
3. Use the linear dependence equation (39) \(\sum _j a_j x_j=a_0\) with
$$\begin{aligned} a_j=\sum _i \frac{w_{il}^tw_{ij}^t}{C_{ii}^{t+1}}f' \left(\sum _kw_{ik}^{t} \langle x_{k}\rangle +b_{i}^{t} \right)^2,\quad a_0=\sum _ja_j\langle x_j\rangle . \end{aligned}$$(77) 
4. Remove neuron l from the net according to (41):
$$\begin{aligned} \sum _kw_{jk}^{t}x_{k}+b_{j}^{t}&\simeq \sum _{k\ne l} {\tilde{w}}_{jk}^{t}x_{k}+{\tilde{b}}_{j}^{t}, \end{aligned}$$(78)
where
$$\begin{aligned} {\tilde{w}}_{jk}^{t}=w_{jk}^{t}-w_{jl}^{t}\frac{a_{k}}{a_{l}},\quad {\tilde{b}}_{j}^{t}=w_{jl}^{t}\frac{ a_0}{a_{l}}+b_{j}^{t}. \end{aligned}$$(79) 
5. Repeat while there are neurons with efficiency below the cutoff, or while the accuracy or loss remains acceptable.
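A single removal step of this algorithm can be sketched in NumPy under the same illustrative assumptions as before (tanh units, a recorded activation batch); the minus sign in the weight update follows from substituting \(x_l\) out of the linear dependence equation \(\sum _j a_j x_j=a_0\):

```python
import numpy as np

def probability_prune(W, b, X):
    """One removal step of the probability-algorithm sketch, eqs. (75)-(79)."""
    mean_x = X.mean(axis=0)
    C_t = X.var(axis=0)                          # C^t_kk, eq. (75)
    pre = W @ mean_x + b
    fp2 = (1.0 - np.tanh(pre) ** 2) ** 2         # f'(...)^2 for tanh units
    C_t1 = np.tanh(X @ W.T + b).var(axis=0)      # C^{t+1}_ii
    # Efficiency of each neuron k on layer t, eq. (76)
    E = C_t * np.sum(W ** 2 * (fp2 / C_t1)[:, None], axis=0)
    l = int(np.argmin(E))
    # Coefficients of the linear dependence equation, eq. (77)
    a = np.sum(W[:, l][:, None] * W * (fp2 / C_t1)[:, None], axis=0)
    a0 = a @ mean_x                              # a_0 = sum_j a_j <x_j>
    # Redistribute neuron l's weights and update biases, eqs. (78)-(79)
    W_new = W - np.outer(W[:, l], a / a[l])      # column l becomes zero
    b_new = b + W[:, l] * (a0 / a[l])
    W_new = np.delete(W_new, l, axis=1)
    return W_new, b_new, l
```

Unlike the connection cut, the removed neuron's influence is spread over the surviving weights rather than absorbed only into the biases.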
A.3 Covariance algorithm

1. Measure the covariance matrix \(C^t_{kj}=\langle \Delta x_k\Delta x_j\rangle \) (10) of the neurons on hidden layer t and find its eigenvectors \(\textbf{v}\) and eigenvalues \(\lambda \) (11).

2. Find the neuron l and the eigenvalue \(\lambda _p\) with minimal efficiency (17):
$$\begin{aligned} E'_l = \min _k E'_k = \min _{i,k} \frac{ \lambda _i }{\left( \textbf{v}^{(i)}_k\right) ^2}=\frac{ \lambda _p }{\left( \textbf{v}^{(p)}_l\right) ^2}. \end{aligned}$$(80) 
3. Use \(\sum _{k}{} \textbf{v}^{(p)}_k x_{k}=\lambda _{p}\) as the linear dependence equation (39) with
$$\begin{aligned} a_k=\textbf{v}^{(p)}_k,\quad a_0=\lambda _{p}. \end{aligned}$$(81) 
4. Remove neuron l from the net according to (41):
$$\begin{aligned} \sum _kw_{jk}^{t}x_{k}+b_{j}^{t}&\simeq \sum _{k\ne l} {\tilde{w}}_{jk}^{t}x_{k}+{\tilde{b}}_{j}^{t}, \end{aligned}$$(82)
where
$$\begin{aligned} {\tilde{w}}_{jk}^{t}=w_{jk}^{t}-w_{jl}^{t}\frac{a_{k}}{a_{l}},\quad {\tilde{b}}_{j}^{t}=w_{jl}^{t}\frac{ a_0}{a_{l}}+b_{j}^{t}. \end{aligned}$$(83) 
5. Repeat while there are neurons with efficiency below the cutoff, or while the accuracy or loss remains acceptable.
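A minimal NumPy sketch of one removal step, using `numpy.linalg.eigh` for the symmetric covariance matrix; the function name and the small regularizer guarding zero eigenvector components are illustrative assumptions:

```python
import numpy as np

def covariance_prune(W, b, X):
    """One removal step of the covariance-algorithm sketch, eqs. (80)-(83).

    W : (n_out, n_in) weights from layer t to layer t+1
    b : (n_out,) biases of layer t+1
    X : (n_samples, n_in) recorded activations on hidden layer t
    """
    Xc = X - X.mean(axis=0)
    C = (Xc.T @ Xc) / len(X)                     # covariance C^t_kj, eq. (10)
    lam, V = np.linalg.eigh(C)                   # columns of V are eigenvectors
    # Efficiency lambda_i / (v^(i)_k)^2 for every (neuron k, eigenvector i),
    # eq. (80); the epsilon guards against exactly zero components
    eff = lam[None, :] / (V ** 2 + 1e-12)
    l, p = np.unravel_index(np.argmin(eff), eff.shape)
    a = V[:, p]                                  # a_k = v^(p)_k, eq. (81)
    a0 = lam[p]                                  # a_0 = lambda_p, as in the text
    # Redistribute neuron l's weights and update biases, eqs. (82)-(83)
    W_new = W - np.outer(W[:, l], a / a[l])
    b_new = b + W[:, l] * (a0 / a[l])
    W_new = np.delete(W_new, l, axis=1)
    return W_new, b_new, int(l)
```

Because the least efficient pair picks the smallest eigenvalue relative to the largest eigenvector component, \(a_l\) is generically the dominant component of \(\textbf{v}^{(p)}\), so the division by \(a_l\) is well conditioned.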
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Grabovsky, A., Vanchurin, V. Bio-inspired machine learning: programmed death and replication. Neural Comput & Applic 35, 20273–20298 (2023). https://doi.org/10.1007/s00521-023-08806-4
DOI: https://doi.org/10.1007/s00521-023-08806-4