Asymptotic Expansion for Neural Network Operators of the Kantorovich Type and High Order of Approximation

In this paper, we study the rate of pointwise approximation for neural network operators of the Kantorovich type. This result is obtained by first proving a certain asymptotic expansion for the above operators and then by establishing a Voronovskaja-type formula. A central role in the above results is played by the truncated algebraic moments of the density functions generated by suitable sigmoidal functions. Furthermore, to improve the rate of convergence, we consider finite linear combinations of the above neural network type operators and, also in the latter case, we obtain a Voronovskaja-type theorem. Finally, concrete examples of sigmoidal activation functions are discussed in detail, together with the case of the rectified linear unit (ReLU) activation function, which is widely used in connection with deep neural networks.


Introduction
The theory of neural network (NN) operators arose in 1992 with the pioneering work of Cardaliaguet and Euvrard [15]; in the following years, it has been widely studied by several authors under different aspects, see, e.g., [12,13,32,33]. The main advantage of the above theory lies in its connection with artificial neural networks and their applications to Approximation Theory. For a complete overview concerning applications of neural networks and learning theory, see, e.g., the very complete monograph of Cucker and Zhou [29], and also [9,37,42,43,46-49,53,56].
A generalization of the above NN operators to the L^p-setting has been proved in [21]; there, the authors showed that any multivariate L^p-data can be approximated in the L^p-norm by means of the NN operators of the Kantorovich type. However, even if their natural setting is that of the L^p-spaces, the above operators also converge pointwise and uniformly when continuous functions are considered.
Such operators are called "of Kantorovich type", since their coefficients are defined by means of suitable averages of the function f to be approximated.
Usually, the activation function of the NN operators is a sigmoidal function ([30]); the reason is that a sigmoidal curve allows one to simulate the two states of a biological neuron, that is, the activated and the non-activated one, see, e.g., [9,38,39,44].
Recently, a new unbounded activation function has also been investigated, namely the so-called ReLU (rectified linear unit) activation function, defined as the positive part of x, for every x ∈ R. The ReLU activation function grows linearly as x → +∞, and it has proved to be very suitable for training deep (i.e., multi-layer) neural networks, see, e.g., [31,51].
Due to its definition, it is easy to show that the ReLU activation function can be used to express the well-known ramp (or cut) sigmoidal function (see [36]); furthermore, it is also well known that the ramp function can serve as the activation function in the NN operators. Hence, the above relation implies that the ReLU activation function can also be used in NN operators.
The main purpose of this paper is to study the pointwise rate of approximation for the NN operators of the Kantorovich type. The above aim is pursued by adopting the following strategy: first, an asymptotic expansion for the above operators is established; then, a Voronovskaja-type theorem is proved.
The asymptotic formula allows us to expand the above approximation operators, when they are evaluated at a sufficiently smooth function f, in terms of the derivatives of the approximated function f evaluated at suitable nodes. A central role in the asymptotic expansion is played by the truncated algebraic moments of the density function φ_σ(x); the density function is generated by suitable sigmoidal functions, and it can be considered as the kernel of the operators.
The asymptotic formula can be used to prove a Voronovskaja-type theorem (see, e.g., [6-8]), from which we establish the rate of pointwise approximation of the operators under investigation.
Furthermore, to improve the above convergence results and to obtain a family of NN operators with a higher order of approximation, we also consider suitable families of finite linear combinations of the above NN operators of the Kantorovich type. Also for these families of operators, we derive asymptotic and Voronovskaja-type formulas, following the same steps described above. Finally, we discuss in detail several examples of sigmoidal functions for which the above results hold. In particular, we consider the cases of the logistic and hyperbolic tangent sigmoidal functions, which are of crucial importance in the theory of learning in artificial neural networks, and the case of the sigmoidal functions generated by the central B-splines, which are useful to construct high-order convergence NN operators.

Notations and Preliminary Results
In this paper, we denote by C(I) the space of all functions f : I → R which are continuous on I := [−1, 1], and by C^r(I), r ∈ N+, the space of functions of C(I) having continuous derivatives f^(s) on I, for every 1 ≤ s ≤ r, s ∈ N+. The above spaces will be considered endowed with the usual max-norm ‖·‖_∞. Now, denoting by Φ : R → R a given function, we can recall the following definitions.

Definition 1.
Let h ∈ N be fixed. We define the truncated algebraic moment of Φ of order h by:

m_h^n(Φ, u) := Σ_{k=−n}^{n−1} Φ(u − k) (k − u)^h, u ∈ R,

for every n ∈ N+.

Definition 2.
Let h ≥ 0 be fixed. We call discrete absolute moment of Φ of order h the quantity:

M_h(Φ) := sup_{u∈R} Σ_{k∈Z} |Φ(u − k)| |u − k|^h.

The discrete absolute moments of a given function are widely used tools to establish the convergence of families of linear operators, see, e.g., [6,7,55].
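A small numerical sketch of the two moments above may be useful. Here we take the logistic sigmoidal function of Sect. 5 and its density φ_σ(x) = (σ(x + 1) − σ(x − 1))/2, which is the standard definition in this literature (an assumption here, since the explicit formula for φ_σ does not survive in this copy); all function names are illustrative.

```python
import math

def sigma(x):
    # logistic sigmoidal function
    return 1.0 / (1.0 + math.exp(-x))

def phi(x):
    # density generated by sigma (assumed standard form from this literature)
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

def truncated_moment(h, u, n):
    # truncated algebraic moment of order h (Definition 1): sum over k = -n, ..., n-1
    return sum(phi(u - k) * (k - u) ** h for k in range(-n, n))

def absolute_moment(h, K=60, grid=50):
    # discrete absolute moment of order h (Definition 2); the sup over u in R
    # is approximated on a grid of one period [0, 1), tails truncated at |k| <= K
    return max(
        sum(abs(phi(u - k)) * abs(u - k) ** h for k in range(-K, K + 1))
        for u in (j / grid for j in range(grid))
    )
```

For the logistic density one finds numerically that the moment of order 0 is (up to truncation error) equal to 1, in accordance with the constants m_0 = 1 computed later in Sect. 5.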
Remark 5. Note that, if we remove condition (Σ2) on σ and assume directly that φ_σ satisfies assertion (iii) of Lemma 4, the theory still holds, see, e.g., [25]. As a consequence of the above observation, we can apply the above theory to C^2 as well as to non-smooth sigmoidal functions for which the corresponding φ_σ satisfies (iii) of Lemma 4 and φ_σ(2) > 0.
For a proof of properties (i)-(v), see [24]; for properties (vi)-(viii), see Lemma 2.6 of [28]. Now, we recall the definition of the NN operators of the Kantorovich type.

Definition 6. Let f : I → R be a locally integrable function. The Kantorovich-type NN operators K_n(f, ·), activated by a sigmoidal function σ and acting on f, are defined by:

K_n(f, x) := [Σ_{k=−n}^{n−1} (n ∫_{k/n}^{(k+1)/n} f(u) du) φ_σ(nx − k)] / [Σ_{k=−n}^{n−1} φ_σ(nx − k)], x ∈ I.

Clearly, for every n, condition (v) of Lemma 4 implies that the denominator above is strictly positive; hence, the operators are well defined, e.g., in the case of locally integrable and bounded functions on I. Furthermore, it turns out that ‖K_n(f, ·)‖_∞ ≤ ‖f‖_∞.

Remark 7. It is well known (see, e.g., [21]) that the family K_n(f, ·) converges pointwise at each point of continuity of any bounded f. The convergence turns out to be uniform on I if f is continuous on the whole of I.
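The operators of Definition 6 are straightforward to evaluate numerically. The sketch below assumes the standard form of the operators recalled above (weights given by the cell averages n∫_{k/n}^{(k+1)/n} f, kernel φ_σ(nx − k), normalized by the kernel sum) and uses the logistic sigmoidal function of Sect. 5; the cell averages are computed by Simpson's rule, a choice made here only for the sketch.

```python
import math

def sigma(x):
    # logistic sigmoidal function
    return 1.0 / (1.0 + math.exp(-x))

def phi(x):
    # density generated by sigma: phi(x) = (sigma(x + 1) - sigma(x - 1)) / 2
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

def K(f, x, n):
    # Kantorovich-type NN operator of Definition 6 on I = [-1, 1];
    # the averages n * integral_{k/n}^{(k+1)/n} f(u) du are computed
    # with Simpson's rule on each cell [k/n, (k+1)/n]
    num = den = 0.0
    for k in range(-n, n):
        a, b = k / n, (k + 1) / n
        avg = (f(a) + 4.0 * f(0.5 * (a + b)) + f(b)) / 6.0  # mean value of f on the cell
        w = phi(n * x - k)
        num += avg * w
        den += w
    return num / den
```

Since the weights are non-negative and normalized, K_n reproduces constant functions exactly and satisfies ‖K_n f‖_∞ ≤ ‖f‖_∞, which can be checked directly on simple test functions.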

Asymptotic Formulas for the NN Operators of the Kantorovich Type
The main aim of this section is to study the order of pointwise approximation for the operators K_n by means of a Voronovskaja-type theorem.
From now on, when we refer to a sigmoidal function σ, we always consider a sigmoidal function σ satisfying conditions (Σ1), (Σ2), and (Σ3) introduced in Sect. 2.
We begin by establishing the following asymptotic expansion for the operators K_n.

Theorem 8. Let σ be a sigmoidal function which satisfies assumption (Σ3) with α > r, for some r ∈ N+. Moreover, let f ∈ C^r(I) be fixed. Then, the following asymptotic formula holds: for every x ∈ I, where m_0^n(φ_σ, nx) > 0, for every x ∈ I and n ∈ N+.
Proof. From the local Taylor formula (for details, see, e.g., [8, p. 283]), we have, for every x, u ∈ I: where h(y) is a suitable bounded function such that h(y) → 0 as y → 0. Then, for every fixed x ∈ I, we can write what follows: Let us analyze the term I_1. Using the binomial theorem, we have: Now, let us analyze the term I_2. Let ε > 0 be fixed. Since h(y) → 0 as y → 0, there exists γ > 0 such that |h(y)| < ε whenever |y| ≤ γ. Hence, we can write: Concerning I_{2,1}, we observe that, if u ∈ [k/n, (k + 1)/n], we have: for every sufficiently large n, and hence: We now consider I_{2,2}. By the boundedness of h(y) and arguing as for I_{2,1}, we get: for n sufficiently large, by property (vii) of Lemma 4 and a trivial identity. This concludes the proof.
As a consequence of the previous result, we can prove the following Voronovskaja-type theorem.

Theorem 9. Let σ be a sigmoidal function which satisfies assumption (Σ3) with α > 1. Suppose that, for every x ∈ R: where m_h ∈ R, θ_h > 0, h = 0, 1, and the o-term tends to zero uniformly with respect to x ∈ R. Then, for any f ∈ C^1(I), we have: where m_0 > 0 in view of Lemma 4 (v).
Proof. From the asymptotic formula of Theorem 8, we have: and the result follows by taking the limit as n → +∞.
Remark 10. The previous theorem shows that the order of pointwise approximation by means of the Kantorovich NN operators is at least O(n^{−1}) as n → +∞, when f ∈ C^1(I).
Moreover, we can observe that, under the assumptions of Theorem 9, it is not hard to show that, if f ∈ C^1(I), a corresponding uniform estimate holds. Thus, it seems clear that the best possible order of uniform approximation that can be achieved by the operators K_n is O(1/n), as n → +∞. This shows that, in general, a better order of approximation cannot be obtained; indeed, in the main examples of sigmoidal functions (see Sect. 5 below) m_1 = 0, and then the limit of Theorem 9 gives the exact order of approximation. This order is also less rapid than that achieved in the case of the classical NN operators (see [28]), which is O(1/n^2) as n → +∞.
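The O(1/n) behaviour can be observed numerically. In the logistic case of Sect. 5 (where m_0 = 1 and m_1 = 0), the Kantorovich averages introduce a shift of 1/(2n), so the Voronovskaja limit evaluates to f′(x)/2; this value is derived here from those constants and is not quoted from the text, so it should be read as an assumption of the sketch. With f(u) = u² and x = 0.3, the quantity n(K_n(f, x) − f(x)) should then settle near f′(x)/2 = 0.3.

```python
import math

def sigma(x):
    # logistic sigmoidal function
    return 1.0 / (1.0 + math.exp(-x))

def phi(x):
    # density generated by sigma: phi(x) = (sigma(x + 1) - sigma(x - 1)) / 2
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

def K(f, x, n):
    # Kantorovich-type NN operator; cell averages computed by Simpson's rule
    num = den = 0.0
    for k in range(-n, n):
        a, b = k / n, (k + 1) / n
        avg = (f(a) + 4.0 * f(0.5 * (a + b)) + f(b)) / 6.0
        w = phi(n * x - k)
        num += avg * w
        den += w
    return num / den

# rescaled error n * (K_n(f, x) - f(x)) for f(u) = u^2 at x = 0.3;
# under m_0 = 1, m_1 = 0 the limit is f'(x)/2 = 0.3 (derived value, see lead-in)
x, n = 0.3, 400
val = n * (K(lambda u: u * u, x, n) - x * x)
```

The remaining deviation from the limit is itself of order 1/n, consistently with the asymptotic expansion of Theorem 8.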

High-Order Convergence of NN Operators of the Kantorovich Type
To construct NN operators of the Kantorovich type with a higher order of approximation than K_n, we adopt the following strategy. Let r ∈ N+, r > 1, and α_j ∈ R \ {0}, j = 1, ..., r, be fixed, such that:

Σ_{j=1}^{r} α_j = 1. (4)

Let us define:

K_n^r(f, x) := Σ_{j=1}^{r} α_j K_{jn}(f, x), x ∈ I,

where f is any bounded, locally integrable function. It is clear that condition (4) ensures that K_n^r preserves the approximation properties of K_n. Obviously, also in this section, when we refer to σ, we always consider sigmoidal functions satisfying (Σi), i = 1, 2, 3. Now, we can prove the following asymptotic and Voronovskaja-type formulas for K_n^r.
Theorem 11. Let σ be a sigmoidal function which satisfies assumption (Σ3) with α > r > 1, for some r ∈ N+. Moreover, let f ∈ C^r(I) be fixed. Then: as n → +∞ and for every x ∈ I. Furthermore, suppose that: for n ∈ N+ sufficiently large, where m_h ∈ R, and that the α_j, j = 1, ..., r, satisfy the linear algebraic system (7). Then, the Voronovskaja formula (8) holds.

Proof. We simply observe that, proceeding as in Theorem 8, we can write: hence, the asymptotic formula (5) follows arguing as for I_{2,1} and I_{2,2} in the proof of Theorem 8. Now, to prove the second part of the theorem, it is sufficient to observe that, under the additional assumptions on the truncated algebraic moments, we have: for n ∈ N+ sufficiently large and x ∈ I. Then: and passing to the limit as n → +∞, the proof of the Voronovskaja formula (8) follows.
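The explicit form of the linear algebraic system (7) is lost in this copy. In the construction used below in Sect. 5 (where, e.g., K_n^{2,d} = −K_n + 2K_{2n}), the coefficients of the combination Σ_j α_j K_{jn} solve Σ_j α_j = 1 together with Σ_j α_j j^{−h} = 0 for h = 1, ..., r − 1, so that the first r − 1 asymptotic terms cancel; this is an assumed reconstruction of (7), sketched below in exact rational arithmetic.

```python
from fractions import Fraction

def combination_coefficients(r):
    # Solve the assumed r x r system:
    #   sum_j a_j = 1,  sum_j a_j / j**h = 0  (h = 1, ..., r - 1),
    # by Gauss-Jordan elimination over the rationals.
    A = [[Fraction(1, j ** h) for j in range(1, r + 1)] for h in range(r)]
    b = [Fraction(1)] + [Fraction(0)] * (r - 1)
    for col in range(r):
        piv = next(i for i in range(col, r) if A[i][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(r):
            if i != col and A[i][col] != 0:
                f = A[i][col] / A[col][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[col])]
                b[i] -= f * b[col]
    return [b[i] / A[i][i] for i in range(r)]
```

For r = 2 this yields (α_1, α_2) = (−1, 2), matching the combination −K_n + 2K_{2n} used in Sect. 5; for r = 3 it yields (1/2, −4, 9/2).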
The previous theorem shows that the order of pointwise approximation achieved by the NN operators K_n^r(f, x) is at least O(n^{−r}) as n → +∞, when functions belonging to C^r(I), r ∈ N+, are approximated. Again, it is quite simple to observe that, under the assumptions of Theorem 11 and for f ∈ C^r(I), a corresponding uniform estimate holds.

Applications to Special Cases
In this section, we discuss the results proved above for some concrete cases of sigmoidal functions.

Applications with the Logistic Function
First of all, we consider the case of the Kantorovich NN operators activated by the well-known logistic function (see, e.g., [20,40,41] and Fig. 1):

σ(x) := (1 + e^{−x})^{−1}, x ∈ R.

NN operators activated by the logistic function have been widely studied, e.g., in [12,32].
Obviously, σ is a smooth function, and it satisfies all the assumptions (Σi), i = 1, 2, 3. Furthermore, by its exponential decay to zero as x → −∞, condition (Σ3) is satisfied for every α > 0. Hence, it turns out that M_h(φ_σ) < +∞ for every h ≥ 0, in view of Lemma 4 (vi). The above function is very useful in the theory of artificial neural networks, since it is used as an activation function in neuronal models to which training algorithms are applied, see, e.g., [14,16,45,48,52]. Now, to apply the results proved in the previous sections, the truncated algebraic moments of the function φ_σ(x) must be computed. In general, it is possible to compute the truncated algebraic moments of a given function by exploiting a well-known result of Fourier analysis, i.e., the so-called Poisson summation formula (see, e.g., [11]). In particular, since (Σ3) holds for every α > 0, by Lemma 4 (iv) we have that φ_σ ∈ L^1(R), and then the usual L^1-Fourier transform can be used to apply the above-mentioned Poisson summation formula. Following the procedure given in [28], it turns out that φ_σ satisfies assumption (3) with m_0 = 1, m_1 = 0, and θ_0, θ_1 > 1. Then, from Theorem 9, we can obtain what follows.
Corollary 12. Let σ(x) be the logistic function, and let f ∈ C^1(I) be fixed. Then, for every x ∈ I, we have:

lim_{n→+∞} n [K_n(f, x) − f(x)] = f′(x)/2.

Moreover, φ_σ(x) does not satisfy assumption (6) of Theorem 11, which, hence, cannot be used to construct the operators K_n^r.
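The constants m_0 = 1 and m_1 = 0 obtained above via the Poisson summation formula can be checked by direct summation; the sketch below evaluates the (bilateral, truncated) moment sums of φ_σ for the logistic function at a few sample points.

```python
import math

def phi(x):
    # density generated by the logistic function sigma(t) = 1 / (1 + e^{-t})
    s = lambda t: 1.0 / (1.0 + math.exp(-t))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def moments(u, K=60):
    # numerical moment sums of order 0 and 1 at the point u,
    # truncated at |k| <= K (the exponential decay makes the tails negligible)
    m0 = sum(phi(u - k) for k in range(-K, K + 1))
    m1 = sum(phi(u - k) * (k - u) for k in range(-K, K + 1))
    return m0, m1
```

For the order 0 the value 1 is in fact a telescoping identity, valid for every sigmoidal function; for the order 1, the deviation from 0 is numerically negligible, in accordance with m_1 = 0.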

Applications with Hyperbolic Tangent Sigmoidal Function
In this section, we study the case of the Kantorovich NN operators activated by the well-known hyperbolic tangent sigmoidal function (see, e.g., [25]):

σ_h(x) := (1 + tanh x)/2, x ∈ R.

The graph of σ_h(x), together with its density function φ_{σ_h}(x), is plotted in Fig. 2.
The NN operators activated by the hyperbolic tangent sigmoidal function have been widely studied, see, e.g., [22,23,33]. Again, we can observe that σ_h is a smooth function satisfying (Σi), i = 1, 2, 3; it tends to zero exponentially as x → −∞, and so (Σ3) holds for every α > 0. Now, computing again the truncated algebraic moments of φ_{σ_h} by the Poisson summation formula (as in the case of the logistic function), it turns out that m_0 = 1, m_1 = 0, and θ_0, θ_1 > 1 (see [28] again). Hence, we can obtain the following corollary.
Corollary 13. Let σ_h(x) be the hyperbolic tangent sigmoidal function, and let f ∈ C^1(I) be fixed. Then, for every x ∈ I, we have:

lim_{n→+∞} n [K_n(f, x) − f(x)] = f′(x)/2.

Also in this case, assumption (6) of Theorem 11 is not satisfied.
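The same numerical check can be repeated for the hyperbolic tangent case, with the normalization σ_h(x) = (1 + tanh x)/2 assumed above; here the order 1 sum is zero only up to a numerically small periodic term, so a coarser tolerance is appropriate.

```python
import math

def phi_h(x):
    # density generated by the hyperbolic tangent sigmoidal function
    # sigma_h(t) = (1 + tanh t) / 2  (assumed normalization)
    s = lambda t: 0.5 * (1.0 + math.tanh(t))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def moments_h(u, K=60):
    # truncated numerical moment sums of order 0 and 1 at the point u
    m0 = sum(phi_h(u - k) for k in range(-K, K + 1))
    m1 = sum(phi_h(u - k) * (k - u) for k in range(-K, K + 1))
    return m0, m1
```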
To provide examples of sigmoidal functions useful to construct high-order NN operators, we can consider what follows.

Applications with Sigmoidal Functions Generated by B-splines
In [22], the well-known central B-splines of order d ≥ 1, defined by (see, e.g., [5,17,19,27,31]):

M_d(x) := (1/(d − 1)!) Σ_{i=0}^{d} (−1)^i (d choose i) (d/2 + x − i)_+^{d−1}, x ∈ R,

have been used to introduce a class of sigmoidal functions. Here, (x)_+ := max{x, 0} denotes the "positive part" of x ∈ R. The sigmoidal function σ_{M_d}(x) associated with the central B-spline M_d is defined by the following formula:

σ_{M_d}(x) := ∫_{−∞}^{x} M_d(t) dt, x ∈ R.

Consequently, the corresponding density function has the following expression:

φ_{σ_{M_d}}(x) = (1/2) ∫_{x−1}^{x+1} M_d(t) dt, x ∈ R.

By simple computations, it is easy to prove that assumptions (Σi), i = 1, 2, 3, are satisfied by σ_{M_d} as well. In particular, since the central B-splines have compact support, (Σ3) is satisfied by σ_{M_d}(x) for every α > 0. Exploiting relation (10), the truncated algebraic moments of φ_{σ_{M_d}} can be computed explicitly.

Remark 15. As shown in Remark 5, the previous results still hold also in the case of the well-known (non-differentiable) ramp function σ_R, since the corresponding density function satisfies condition (iii) of Lemma 4. Note that (see [18,26]) the sigmoidal function σ_{M_1}(3·) coincides with the ramp function σ_R. Then, if we recall the definition of the well-known ReLU activation function (see, e.g., [36]):

ψ_ReLU(x) := (x)_+, x ∈ R,

it turns out that σ_R can be written as a linear combination of translates of ψ_ReLU; hence, K_n^{σ_R} can be considered as an NN operator activated by the above linear combination of ReLU activation functions. For more details concerning the usefulness of ψ_ReLU, see, e.g., [2,3].
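The central B-spline formula and the ReLU/ramp relation above can be sketched directly. The ramp below is a unit-slope ramp written as a combination of two translated ReLUs; the paper's σ_R may be scaled differently (cf. the relation with σ_{M_1}(3·)), so the normalization is illustrative only.

```python
from math import comb, factorial

def M(d, x):
    # central B-spline of order d, via the formula with the positive part (x)_+
    s = 0.0
    for i in range(d + 1):
        t = d / 2.0 + x - i
        if t > 0.0:
            s += (-1) ** i * comb(d, i) * t ** (d - 1)
    return s / factorial(d - 1)

def relu(x):
    # ReLU activation: psi(x) = max(x, 0) = (x)_+
    return max(x, 0.0)

def ramp(x):
    # unit ramp expressed as a linear combination of two translated ReLUs
    # (illustrative normalization; rises from 0 to 1 on [-1/2, 1/2])
    return relu(x + 0.5) - relu(x - 0.5)
```

For instance, M_2 is the triangular hat on [−1, 1] with M_2(0) = 1, and M_3(0) = 3/4; the ramp takes the value 1/2 at the origin and is constant outside [−1/2, 1/2], as expected of a sigmoidal function.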
As shown above, the truncated algebraic moments of φ_{σ_{M_d}} are exactly equal to suitable constants. Then, the sigmoidal functions σ_{M_d} can be used to generate the high-order convergence operators K_n^r. Here, we consider linear combinations with suitable coefficients α_j; in this way, we obtain the operators K_n^{2,d} and K_n^{3,d}. Clearly, an analogue of Corollary 16 can also be formulated for finite linear combinations of Kantorovich NN operators K_n with r > 3, to achieve an even faster convergence.

Acknowledgments. This work has been realized within the project "… processi innovativi per lo sviluppo di una banca di immagini mediche per fini diagnostici", funded by the Fondazione Cassa di Risparmio di Perugia, 2018.
Funding Open access funding provided by Universitá degli Studi di Perugia within the CRUI-CARE Agreement.

Definition 3.
Let σ : R → R be a measurable function. We call σ a sigmoidal function if:

lim_{x→−∞} σ(x) = 0 and lim_{x→+∞} σ(x) = 1.

Since the Fourier transform of M_d is known explicitly at the points 2πj, j ∈ N (with i the imaginary unit), for n sufficiently large we obtain that m_0^n(φ_{σ_{M_d}}, x) = 1 and m_1^n(φ_{σ_{M_d}}, x) = 0. Hence:

Corollary 14. Let σ_{M_d}(x), d ∈ N+, be the sigmoidal function generated by the central B-spline M_d(x), and let f ∈ C^1(I) be fixed. Then, for every x ∈ I, we have:

lim_{n→+∞} n [K_n(f, x) − f(x)] = f′(x)/2.

Moreover, the following linear combinations can be considered:

K_n^{2,d}(f, x) := −K_n^{σ_{M_d}}(f, x) + 2 K_{2n}^{σ_{M_d}}(f, x),

together with an analogous three-term combination K_n^{3,d}(f, x).

Corollary 16. Let σ_{M_d}(x), d ∈ N+, be the sigmoidal function generated by the central B-spline M_d(x), and let x ∈ I be fixed. For any f ∈ C^2(I), there holds:

lim_{n→+∞} n^2 [K_n^{2,d}(f, x) − f(x)] = −f″(…),

while, for any f ∈ C^3(I), we have:

lim_{n→+∞} n^3 [K_n^{3,d}(f, x) − f(x)] = 3f(…).