Regularization theory in the study of generalization ability of a biological neural network model

This paper focuses on the generalization ability of a dendritic neuron model (a model of a simple neural network). The considered model is an extension of the Hodgkin-Huxley model. The Markov kinetic schemes have been used in the mathematical description of the model, while the Lagrange multipliers method has been applied to train the model. The generalization ability of the model is studied using a method known from the regularization theory, in which a regularizer is added to the neural network error function. The regularizers in the form of the sum of squared weights of the model (the penalty function), a linear differential operator related to the input-output mapping (the Tikhonov functional), and the square norm of the network curvature are applied in the study. The influence of the regularizers on the training process and its results are illustrated with the problem of noise reduction in images of electronic components. Several metrics are used to compare results obtained for different regularizers.


Introduction
This article is the latest part of a series reporting the present author's research into a set of models of biological dendritic neurons and neurons with a point-like structure. The article presents the results of research into the generalization ability of the stochastic kinetic model of a biological dendritic neuron, which stems from the model proposed by A.L. Hodgkin and A.F. Huxley in [8]. The generalization ability of the stochastic kinetic model of a biological neuron with a point-like structure was studied in [16]. Models of biological neural networks were considered in [15], which showed the discretization of the model equations in time and space, and in [13], where the training of neural network models was discussed, albeit without analyzing the generalization ability of these models.

Communicated by: Pavel Solin. Aleksandra Świetlicka (aleksandra.swietlicka@put.poznan.pl), Institute of Automation and Robotics, Poznan University of Technology, ul. Piotrowo 3A, 60-965 Poznań, Poland.
The generalization ability determines whether a neural network is capable of returning a proper solution for data that did not appear in the training set. Moreover, good generalization prevents the training algorithm from over-training the neural network.
As shown in [16], a biological neuron model has the ability to generalize. The model of a biological neural network, understood as a biological dendritic neuron, may be assumed to have enhanced capabilities compared to the model of a neuron with a point-like structure, since each point of such a network may perform consecutive steps of the algorithm for which the network has been designed. Therefore, to be able to treat the results rendered by a neural network as valid, it is necessary to examine the generalization ability of the biological neural network model.
Currently, the most common method used to study the generalization ability of a neural network model is the method adopted from regularization theory [2,7]. This method consists in adding a so-called regularizer to the error function of the network. Depending on how the regularizer is defined, it can penalize the curvature of the neural network [10], smooth and thereby stabilize the solution [7], or smooth the curvature, making the output of the network locally invariant to small perturbations of the input [6].
In this study, three kinds of regularizer are used: the penalty function, the Tikhonov functional, and the square norm of the network curvature. The penalty function is the sum of squared weights of the model [10], the Tikhonov functional is a linear differential operator related to input-output mapping [7], while the network curvature is defined as the second derivative of the output with respect to the input of the model (second-order differential operator) [6].
To show how the regularizer impacts the training process of the model of a biological neural network, this network was used to reduce noise in images of selected electronic components, such as a printed circuit board or an integrated circuit. The noise was introduced, for example, by covering the camera lens with a thin plastic film.
The model of a biological neural network was implemented in MATLAB. The network was trained with the Lagrange multipliers method [1]. The formulation of the problem made it possible to use the built-in MATLAB fsolve function.
The paper is organized as follows. A concise mathematical description of the biological neural network model is given in Section 2. The regularizers used in the study are described in Section 3. The application of the Lagrange multipliers method to train the neural network model is discussed in Section 4. Section 5 presents the results of the training process. Section 6 concludes the paper.

Model
The paper considers the stochastic kinetic model of a biological dendritic neuron (for simplicity, we will refer to it as a biological neural network throughout the paper). The original model, which was used to derive the kinetic model, was proposed and described in detail in the paper by A.L. Hodgkin and A.F. Huxley in 1952 [8]. The biological foundations of the model have been thoroughly studied and presented in a number of publications, e.g., in [3,5,8,9,14]; therefore, in this paper, we only provide a concise mathematical description of the model.
The main equation describing the considered model has the following form [13]:

$$C_m \frac{\partial V}{\partial t} = \frac{a}{2R}\frac{\partial^2 V}{\partial x^2} - g_{Na}[m_3h_0](V - E_{Na}) - g_K[n_4](V - E_K) - g_L(V - E_L), \qquad (1)$$

where V is the potential on the cell membrane, C_m is the membrane capacitance, a is the dendrite radius, R is the axial resistivity, and the remaining terms (those that do not include a derivative) represent the ion current components related to the respective types of ions (sodium, potassium, and chlorine), with E_Na, E_K, and E_L denoting the corresponding reversal potentials. The variables [m_3h_0] and [n_4] are the fractions of open sodium and potassium channels, which evolve according to the Markov kinetic schemes (2) and (3). Their values are obtained from the normal distribution; a detailed description of the procedure for obtaining these values is included, among others, in [5,12-14]. The forms of the transfer functions α_i(V) and β_i(V), which appear in the kinetic schemes, are given in Table 1 [8].
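In the deterministic limit, the kinetic scheme reduces to the classical Hodgkin-Huxley gating dynamics. The following sketch is a simplified single-compartment (point) version rather than the dendritic cable model of (1): it illustrates how the membrane potential responds to an input current, using the classical parameter values and rate functions from [8]; the function names and the Euler scheme are illustrative, not the paper's implementation.

```python
import numpy as np

# Classical Hodgkin-Huxley rate functions (V in mV, resting potential at 0) [8].
def alpha_n(V): return 0.01 * (10.0 - V) / (np.exp((10.0 - V) / 10.0) - 1.0)
def beta_n(V):  return 0.125 * np.exp(-V / 80.0)
def alpha_m(V): return 0.1 * (25.0 - V) / (np.exp((25.0 - V) / 10.0) - 1.0)
def beta_m(V):  return 4.0 * np.exp(-V / 18.0)
def alpha_h(V): return 0.07 * np.exp(-V / 20.0)
def beta_h(V):  return 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)

def simulate(I0=10.0, T=50.0, dt=0.01):
    """Euler integration of the point-membrane equation driven by a constant current I0."""
    C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3       # uF/cm^2, mS/cm^2
    ENa, EK, EL = 115.0, -12.0, 10.6             # mV (shifted frame)
    V, m, h, n = 0.0, 0.05, 0.6, 0.32            # approximate resting state
    trace = []
    for _ in range(int(T / dt)):
        # m^3 h and n^4 play the role of [m3h0] and [n4]:
        # fractions of open Na+ and K+ channels.
        INa = gNa * m**3 * h * (V - ENa)
        IK = gK * n**4 * (V - EK)
        IL = gL * (V - EL)
        V += dt * (I0 - INa - IK - IL) / C
        m += dt * (alpha_m(V) * (1.0 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1.0 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1.0 - n) - beta_n(V) * n)
        trace.append(V)
    return np.array(trace)
```

For a sustained current of 10 µA/cm² the trace exhibits the characteristic spiking of the Hodgkin-Huxley membrane; the stochastic kinetic model of the paper replaces these deterministic gating fractions with Markov-chain state occupancies drawn as described above.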
The structure of the biological neural network considered in this paper is given in Fig. 1 (structure of the dendritic neuron [13]; the points on the structure represent the places where the potential will be determined). To make the mathematical description easier to comprehend, we introduce the symbol V_i(x) for the potential at time t + Δt at point x of the branch on which point i lies. This notation also applies to the variables [m_3h_0] and [n_4]. The initial condition (4) applies in the discretization of (1) [4], where I_0 is a known value; the remaining boundary conditions are equal to zero. Taking condition (4) into consideration, we can write five equations (6), which describe all the points of the neural network model given in Fig. 1 [13]. A detailed description of the discretization of a model of this type is given in [15].
It is noticeable that a biological neural network is modeled in a similar way to an artificial neural network: it has an input (the current i_0(t)) and an output (the cell membrane potential V), while selected parameters of the model are treated as weights. In our case, the weights are g_Na, g_K, and g_L, where ǧ are the initial values of the conductances, given in Table 2, while g̃ are the proper weights of the model, which are updated during the training process.

Generalization terms
In this paper, three types of regularizer have been used to study the generalization ability of the biological neural network. As in dynamic neural networks, the regularizer is appended to the error function of the neural network [6]. Since the network considered in this paper is a five-point structure, the error function has the following form [15]:

$$E = \frac{1}{2}\sum_{i=1}^{5}\left(V_i - V_i^*\right)^2 + \lambda E_\lambda. \qquad (8)$$

The first term in (8) is the standard error of the neural network, where V_i^* is the reference voltage (the target of the training) at the i-th point of the neural network structure. The term λE_λ denotes the regularizer, where E_λ is the regularization functional and λ is the regularization parameter, which ensures a trade-off between the task performed by the regularizer and the accuracy of the network's input-output mapping.
The first form of E_λ considered in this study is the so-called penalty function:

$$E_\lambda = \frac{1}{2}\int \left(\tilde{g}_{Na}^2 + \tilde{g}_K^2 + \tilde{g}_L^2\right) dt. \qquad (9)$$

To simplify the analysis, we have assumed that the weights are constant; therefore, we can omit the integral in (9). The next considered form of E_λ is the Tikhonov functional:

$$E_\lambda = \frac{1}{2}\left\|\frac{\partial V}{\partial I}\right\|^2, \qquad (10)$$

where ‖·‖ denotes the standard Euclidean norm. Since in our case the derivative of the potential V with respect to the current I is a single value, the norm may be omitted in (10). The last considered form of E_λ is the square norm of the network curvature (a second-order differential operator):

$$E_\lambda = \frac{1}{2}\left(\frac{\partial^2 V}{\partial I^2}\right)^2. \qquad (11)$$

The regularizers vary in the way they affect the neural network. The penalty function allows us to regularize the model's structure, which is essential since overfitting of the neural network structure may lead to over-training the network. The Tikhonov functional introduces the additional assumption that the input-output mapping should be smooth, which means that similar values of the input current I correspond to similar values of the output potential V [7]. Finally, as a result of minimizing the square norm of the network curvature, the derivative ∂V/∂I becomes locally invariant with respect to small changes in I. The application of this regularizer allows us to obtain a better mapping of the training set.
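The three regularization functionals can be sketched numerically as follows. This is a minimal illustration assuming the network response is available as sampled arrays V over an input-current grid I; the function names are illustrative, and the derivatives in (10) and (11) are approximated with finite differences rather than computed analytically as in the paper.

```python
import numpy as np

def penalty(weights):
    """Penalty function (9): half the sum of squared model weights
    (the integral is omitted because the weights are assumed constant)."""
    return 0.5 * np.sum(np.asarray(weights) ** 2)

def tikhonov(V, I):
    """Tikhonov functional (10): squared first derivative of the
    input-output mapping, approximated by finite differences."""
    dV_dI = np.gradient(V, I)
    return 0.5 * np.sum(dV_dI ** 2)

def curvature(V, I):
    """Square norm of the network curvature (11): squared second
    derivative of the input-output mapping."""
    d2V_dI2 = np.gradient(np.gradient(V, I), I)
    return 0.5 * np.sum(d2V_dI2 ** 2)

def total_error(V, V_star, lam, regularizer):
    """Regularized error function in the spirit of (8), for sampled responses."""
    return 0.5 * np.sum((V - V_star) ** 2) + lam * regularizer
```

Note how the regularizers differ in what they tolerate: a perfectly linear input-output mapping has zero curvature, so only the Tikhonov term penalizes its slope, while the penalty function ignores the mapping entirely and acts on the weights alone.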

Training
The stochastic version of the kinetic model of a biological neural network is trained using the method of Lagrange multipliers. This method searches for the minimum (or maximum) of a function subject to specific constraints. In our case, we minimize the error function (8), while the constraints are represented by the system of equations (6). To find the minimum of the function E, let us formulate the following auxiliary function:

$$L = E + \sum_{i=1}^{5} \lambda_i \Phi_i, \qquad (12)$$

where λ_i are the so-called Lagrange multipliers and Φ_i = 0 denote the constraint equations (6). We calculate the derivatives of the function L with respect to all parameters of the biological neural network model: V_1, V_2, V_3, V_4, V_5, g̃_Na, g̃_K, g̃_L, and λ_1, λ_2, λ_3, λ_4, λ_5. In the next step, we equate all the derivatives to zero, thus obtaining a system of equations whose solution is the minimum of the function E. For clarity, the derivatives of (12) are given below as three separate systems of equations, (13)-(15). Only the system of equations (14) changes depending on the form of the regularizer used.
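The paper solves the resulting stationarity system with MATLAB's fsolve; the same pattern can be sketched in Python with scipy.optimize.fsolve. The toy problem below (minimize ½‖w‖² subject to w₁ + w₂ = 1, both chosen purely for illustration) mimics the structure of (12): form L, differentiate with respect to the weights and the multiplier, and solve ∇L = 0 as a root-finding problem.

```python
from scipy.optimize import fsolve

def grad_L(z):
    """Stationarity conditions dL/dw1 = dL/dw2 = dL/dlam = 0 for
    L(w, lam) = 0.5*(w1**2 + w2**2) + lam*(w1 + w2 - 1)."""
    w1, w2, lam = z
    return [w1 + lam,        # dL/dw1
            w2 + lam,        # dL/dw2
            w1 + w2 - 1.0]   # dL/dlam: the constraint itself

# Solve the system of stationarity equations from a zero initial guess.
w1, w2, lam = fsolve(grad_L, x0=[0.0, 0.0, 0.0])
# Constrained minimum: w1 = w2 = 0.5, with multiplier lam = -0.5.
```

In the paper's setting, grad_L would contain the thirteen stationarity equations in V_1, ..., V_5, g̃_Na, g̃_K, g̃_L, and λ_1, ..., λ_5, with the regularizer-dependent system (14) swapped in as appropriate.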
If the regularizer is not applied, then P_1 = P_2 = P_3 = 0 (or, equivalently, λ = 0). For the regularizer in the form of the penalty function, P_1, P_2, and P_3 are the derivatives of λE_λ with respect to the weights g̃_Na, g̃_K, and g̃_L. For the Tikhonov functional and for the square norm of the network curvature, P_1, P_2, and P_3 take a common form with auxiliary terms that are specific to each regularizer: terms involving the first derivative ∂V/∂I for the Tikhonov functional, and terms (20) involving the second derivative ∂²V/∂I² for the square norm of the network curvature.

Results
In order to assess the influence of the regularizer on the process of training a neural network model, we conducted a test consisting in noise reduction in images of electronic components. Figures 2(A1) and 3(A1) show two original images, to which noise was added in the following ways: the lens was covered with a thin plastic film before taking the picture (Fig. 2(B1, C1)), and the image was digitally blurred (Fig. 3(B1, C1)). The histograms in Figs. 2 and 3 (A2, B2, C2) depict the difference between the original pictures and their distorted versions.
The task of the neural network model was to gradually reduce the noise in the images. The image was fed to the neural network input in the form of the current I. The potential V obtained at any output point of the neural network structure was assumed to represent the denoised image, ideally as close to the original as possible. All final points of the considered structure rendered the same response, since the neural network structure is symmetric.
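The mapping between image and model quantities can be sketched as follows, under the assumption (not fixed in the text) that pixel intensities are scaled linearly into an admissible input-current range and the output potential is scaled back; all names, ranges, and the stand-in network response are illustrative.

```python
import numpy as np

def pixels_to_current(img, i_max=10.0):
    """Scale 8-bit pixel intensities to input currents in [0, i_max]."""
    return img.astype(float) / 255.0 * i_max

def potential_to_pixels(V, v_min, v_max):
    """Scale output potentials back to the 8-bit intensity range."""
    scaled = (V - v_min) / (v_max - v_min)
    return np.clip(np.round(scaled * 255.0), 0, 255).astype(np.uint8)

# Per-pixel processing: each pixel current is fed to the network, and the
# response at any final point of the (symmetric) structure is read out.
noisy = np.array([[0, 128, 255]], dtype=np.uint8)
I = pixels_to_current(noisy)
V = 2.0 * I                      # stand-in for the trained network response
denoised = potential_to_pixels(V, v_min=0.0, v_max=20.0)
```

With a linear stand-in response, the round trip reproduces the input exactly; the trained kinetic model replaces this placeholder with the (nonlinear) membrane response.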
In the test, we considered various values of the regularization factor λ (λ = 0.01, 0.02, 0.1, 0.2, 0.5, 1, 2, 5, 10), as well as the case λ = 0 (which is equivalent to not having a regularizer at all). We considered a number of measures to compare the results of the training process with and without the regularizer.
The first measure is the training time. Example training times of the biological neural network model for an image of approximately 100,000 pixels are given in Table 3. Except for one case, the training time improves by up to a factor of two compared to the case without a regularizer (λ = 0).
The root mean square error (RMSE) is the second measure. Example RMSE results for a reconstructed image of approximately 100,000 pixels are given in Table 4. For λ = 0 (no regularizer used), the RMSE is of the order of 10^-2; the addition of a regularizer improves it by up to an order of magnitude.

Finally, the MATLAB fsolve function allows us to determine the so-called first-order optimality measure, which indicates how close the obtained solution is to the sought solution. This measure is related to the Karush-Kuhn-Tucker conditions of the method of Lagrange multipliers [11]. The measure must be zero at a minimum (a necessary condition for optimality of a solution), but a point at which the measure equals zero need not be a minimum.

Figure 4 shows the minimal, maximal, and average distance between the vectors representing the original image and the image reconstructed by the model of the biological neural network. Except for individual cases, these values improve when a regularizer is added to the error function, compared to the case λ = 0.

Figure 5 features example images that have been poorly reconstructed by the neural network model; the respective histograms in Fig. 5 show the difference between the reconstructions and the original images. The most difficult areas to reconstruct are the edges and the dark parts of the images. Poor reconstruction may occur when the model is over-trained and consequently cannot find a solution if the processed image is not in the training set. Also, during initialization the values of the weights are selected randomly; as a result, the algorithm which minimizes the error function may find a local minimum which it cannot leave. However, these situations are exceptional for the presented model of a biological neural network. In most cases, the network reconstructs images which are very close to the originals.
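The quantitative measures above can be reproduced with a few lines of code. This is a sketch assuming the images are compared as arrays of intensities, with the minimal, maximal, and average distances computed over per-row vectors; the exact vectorization used for Fig. 4 is not specified in the text, so that choice is an assumption.

```python
import numpy as np

def rmse(original, reconstructed):
    """Root mean square error between two images of equal shape."""
    diff = original.astype(float) - reconstructed.astype(float)
    return np.sqrt(np.mean(diff ** 2))

def row_distances(original, reconstructed):
    """Euclidean distances between corresponding row vectors; returns
    the minimal, maximal, and average distance (cf. Fig. 4)."""
    diff = original.astype(float) - reconstructed.astype(float)
    d = np.linalg.norm(diff, axis=1)
    return d.min(), d.max(), d.mean()
```

Comparing these measures across the values of λ listed above (including λ = 0) is then a simple loop over reconstructions.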

Summary
The paper analyzed the generalization ability of a stochastic kinetic model of a biological neural network. The neural network model was based on the Hodgkin-Huxley model extended with Markov kinetic schemes, which describe in detail the processes taking place on the cell membrane. The generalization ability of the model was studied using a method known from regularization theory, which consists in adding a regularizer to the error function of the biological neural network model. The following forms of regularizer were studied: the penalty function, the Tikhonov functional, and the square norm of the network curvature. Different regularizers affect the training process in different ways, depending on whether they act on the neural network structure (which is minimized) or on the input-output mapping (which is smoothed). The results presented in the paper show that the use of a regularizer improves the measures used in the comparisons, such as the RMSE and the first-order optimality measure. The most visible effect is that of the square norm of the network curvature, thanks to which the RMSE is improved by up to one order of magnitude, and the first-order optimality measure is improved by several orders of magnitude, from values of the order of 10^2 to values of the order of 10^-8.