Introduction

Solving differential equations makes it possible to define models for a variety of pattern recognition [1] and primarily function approximation problems, applying genetic programming techniques [2] or an artificial neural network (ANN) construction [3]. A common ANN operating principle is based on the entire similarity of newly presented input patterns with the trained ones. A principal shortcoming of this functionality in general is the inability to generalize the relations among input pattern data. The ANN utilizes only input variables of absolute interval values, which cannot describe a wider range of applied data. ANN generalization from the training data set may be difficult or problematic if the model has not been trained with inputs in the range covered by the testing data [4]. If the data involve relations that may take a stronger or weaker character, the neural network model should generalize them so that they can also be applied to different interval values. The differential polynomial neural network (D-PNN) is a new neural network type, which creates and resolves an unknown partial differential equation (DE) of a multi-parametric function approximation. The DE is replaced by producing a sum of fractional polynomial derivative terms, forming a system model of dependent variables. In contrast with the ANN approach, each neuron of the D-PNN can take a direct part in the calculation of the total network output. Analogous to ANN function approximation (and pattern identification), this study tried to create a neural network whose function estimation (or pattern recognition) is based on any dependent data relations. In the case of function approximation, the output of the neural network is a functional value. In the case of pattern identification, its response should be the same for all input vectors whose variables keep up the trained dependencies, no matter what values they take. However, the principle of both types is the same, analogous to the ANN approach [5].

y = a_0 + \sum_{i=1}^{m} a_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk} x_i x_j x_k + \dots
(1)

m – number of variables; A(a_1, a_2, …, a_m), … – vectors of parameters

X(x_1, x_2, …, x_m) – vector of input variables

The D-PNN's block skeleton is formed by the GMDH (Group Method of Data Handling) polynomial neural network, which was created by the Ukrainian scientist Aleksey Ivakhnenko in 1968, when the back-propagation technique was not yet known [6]. A general connection between input and output variables can be expressed by the Volterra functional series, a discrete analogue of which is the Kolmogorov-Gabor polynomial (1). This polynomial can approximate any stationary random sequence of observations and can be computed by either adaptive methods or a system of Gaussian normal equations [7]. GMDH decomposes the complexity of a process into many simpler relationships, each described by a low-order polynomial (2) for every pair of input values.

y = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2
(2)
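As an illustration of the low-order pair polynomial (2), the following sketch (not from the paper; it assumes NumPy, and the function names are illustrative) fits its six coefficients by ordinary least squares:

```python
import numpy as np

def fit_gmdh_pair(xi, xj, y):
    """Least-squares fit of the low-order GMDH polynomial (2):
    y = a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi^2 + a5*xj^2."""
    X = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def eval_gmdh_pair(a, xi, xj):
    return a[0] + a[1]*xi + a[2]*xj + a[3]*xi*xj + a[4]*xi**2 + a[5]*xj**2

# toy usage: recover a quadratic dependence from a few random samples
rng = np.random.default_rng(0)
xi, xj = rng.uniform(1, 10, 50), rng.uniform(1, 10, 50)
y = 2 + 0.5 * xi * xj + xj**2
print(np.round(fit_gmdh_pair(xi, xj, y), 3))   # close to [2, 0, 0, 0.5, 0, 1]
```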

Partial differential equation construction

The basic idea of the D-PNN is to create and resolve a generally true partial differential equation (3), which is not known in advance and can describe a system of dependent variables, with a special type of fractional multi-parametric polynomials (4), i.e. sum derivative terms.

a + \sum_{i=1}^{n} b_i \frac{\partial u}{\partial x_i} + \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} \frac{\partial^2 u}{\partial x_i \partial x_j} + \dots = 0, \qquad u = \sum_{k=1} u_k
(3)

u = f(x_1, x_2, …, x_n) – searched function of all input variables

a, B(b_1, b_2, …, b_n), C(c_11, c_12, …) – polynomial parameters

Elementary methods of DE solution express the solution in terms of special elementary functions and polynomials (e.g. Bessel functions, Fourier or power series). The numerical integration of differential equation solutions is based on using:

  • rational integral functions

  • trigonometric series

Partial DE terms are formed by an adapted application of the method of integral analogues, which replaces mathematical operators and symbols of a DE by ratios of the corresponding values. Derivatives are replaced by their integral analogues, i.e. derivative operators are removed and simultaneously all operators are replaced by similarity or proportion signs in the equations, and all vectors are replaced by their absolute values [8]. However, it should also be possible to form the sum derivative terms replacing a general partial DE (3) using other mathematical techniques, e.g. wave series and others.

y_i = \frac{\sqrt[m+1]{\left(a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n + a_{n+1} x_1 x_2 + \dots\right)^{n}}}{b_0 + b_1 x_1 + \dots} = \frac{\partial^m f(x_1, x_2, \dots, x_n)}{\partial x_1 \partial x_2 \cdots \partial x_m}, \qquad Y = \sum_{i=1} y_i = 0
(4)

n – combination degree of the n-input-variable polynomial of the numerator

m – combination degree of the denominator; w_t – weights of the terms

The fractional polynomials (4), defining partial relations of n input variables, represent the sum derivative terms (neurons) of a DE. The numerator of eq. (4) is a complete n-variable polynomial, which realizes a new partial function u of formula (3). The denominator of eq. (4) is a derivative part, which gives a partial mutual change of some combination of input variables. It arises from the partial derivation of the complete n-variable polynomial with respect to the variables of the competent combination. The root functions of the numerator (4) take the polynomial to the corresponding combination degree but need not be used at all if they are not necessary. They may be adapted to enable the D-PNN to generate an adequate range of desired output values.

Multi-parametric function approximation

The D-PNN can approximate a multi-parametric function through a general partial sum DE solution (3). Consider first only linear data relations describing the DE, e.g. a simple sum function y_t = x_1 + x_2 (however, it could be any linear function). The network with 2 inputs, forming 1 functional output value y = f(x_1, x_2), should approximate the true function y_t by the replacing sum derivative terms of the DE (5). It consists of only 1 block of 2 neurons, the terms of both derivative variables x_1 and x_2 (Figure 1).

y = w_1 \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2}{b_0 + b_1 x_1} + w_2 \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2}{b_0 + b_1 x_2}
(5)
Figure 1. 1-block D-PNN.
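The following minimal sketch (an illustration, not the author's implementation; the function name and the hand-set parameter values are assumptions) evaluates the two derivative terms of eq. (5) and shows one parameterization that reproduces the sum function y_t = x_1 + x_2 exactly:

```python
def block_output(x1, x2, a, b1, b2, w):
    """Sum of the two derivative terms (neurons) of eq. (5): both share
    the numerator polynomial; the denominators differ in the derivative
    variable (x1 or x2)."""
    num = a[0] + a[1]*x1 + a[2]*x2 + a[3]*x1*x2
    term1 = w[0] * num / (b1[0] + b1[1]*x1)   # term of derivative variable x1
    term2 = w[1] * num / (b2[0] + b2[1]*x2)   # term of derivative variable x2
    return term1 + term2

# hand-set parameterization reproducing y_t = x1 + x2: numerator x1*x2,
# denominators x1 and x2, unit weights
print(block_output(358.0, 2.0, a=[0, 0, 0, 1], b1=[0, 1], b2=[0, 1], w=[1, 1]))  # 360.0
```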

The D-PNN can be trained with only a very small data set (6 samples), involving a wide range of input values <5, 500>. Figure 2 shows the approximation errors (y-axis) of the trained network, i.e. the differences between the true and the estimated function, for random input vectors with dependent variables. The x-axis thus represents the ideal function.

Figure 2. Approximation of the 2-variable function.

Output errors can result from some disproportionally dependent random vector values (Figure 2) that the D-PNN was not trained with, e.g. 360 = 358 + 2. Figure 2 shows only the results for the 2-variable 2D function f(x_1, x_2) = x_1 + x_2. The testing random output function values f(x_1, x_2) displayed on the x-axis (Figure 2) exceed the maximal trained sum value 500, while the approximation error increases only slowly.

If the number of input variables is increased to 3, the DE composition can apply polynomials of a higher combination degree (=3), which raises the number of sum derivative terms. The 3-variable D-PNN for a linear true function approximation (e.g. y_t = x_1 + x_2 + x_3) can again contain 1 block of 6 neurons, the DE terms of all 1- and 2-combination derivative variables of the complete DE, e.g. (6)(7).

y_1 = w_1 \frac{\sqrt[3]{\left(a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_1 x_2 + \dots + a_7 x_1 x_2 x_3\right)^2}}{b_0 + b_1 x_1}
(6)
y_4 = w_4 \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_1 x_2 + \dots + a_7 x_1 x_2 x_3}{b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2}
(7)

The training data set for the 3-variable function required an extension (in comparison with 2 variables) to enable the D-PNN to reach a desired approximation error. The parameter optimization may apply a proper differential evolution algorithm (EA), supplied with sufficient random mutations to prevent the parameter adjustment from converging before the desired error is reached [9]. Not every experiment results in a functional model.
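As a hedged sketch of such a parameter search (not the author's code; SciPy's differential_evolution is used here as a stand-in for the EA described above, and the interval <0.5, 1.5> is applied as hard bounds for simplicity):

```python
import numpy as np
from scipy.optimize import differential_evolution

# small training set for y_t = x1 + x2, with values within <5, 500>
rng = np.random.default_rng(1)
X = rng.uniform(5, 500, size=(6, 2))
y_t = X[:, 0] + X[:, 1]

def model(p, x1, x2):
    """Two derivative terms of eq. (5) with a flat parameter vector p."""
    a0, a1, a2, a3, b10, b11, b20, b21, w1, w2 = p
    num = a0 + a1*x1 + a2*x2 + a3*x1*x2
    return w1 * num / (b10 + b11*x1) + w2 * num / (b20 + b21*x2)

def rmse(p):
    pred = model(p, X[:, 0], X[:, 1])
    return np.sqrt(np.mean((pred - y_t) ** 2))

bounds = [(0.5, 1.5)] * 10   # initial parameter interval from the text, used as bounds
res = differential_evolution(rmse, bounds, seed=1, tol=1e-8, maxiter=2000)
print(res.fun)               # training RMSE of the evolved parameters
```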

Multi-layered backward D-PNN

The multi-layered D-PNN, consisting of blocks of neurons, forms composite polynomial functions (functions of functions) (8) in each subsequent hidden layer. Each block contains a single polynomial (without the derivative part) forming its output, which enters the next hidden layer (Figure 3). Neurons do not affect the block output but are applied directly as the sum derivative terms of the total output calculation (DE composition). The blocks of the 2nd and following hidden layers also form additional extended neurons, i.e. composite terms (CT), which define derivatives of composite functions, applying the reverse outputs and inputs of the back-connected blocks of the previous layers. These partial derivatives with respect to variables of the previous layers are calculated according to the composite function derivation rules (9)(10) and formed by products of partial derivatives of the outer and inner functions [10].

y_i = \phi_i(X) = \phi_i(x_1, x_2, \dots, x_n), \qquad i = 1, \dots, m
(8)
F(x_1, x_2, \dots, x_n) = f(y_1, y_2, \dots, y_m) = f\left(\phi_1(X), \phi_2(X), \dots, \phi_m(X)\right)
(9)
\frac{\partial F}{\partial x_k} = \sum_{i=1}^{m} \frac{\partial f(y_1, y_2, \dots, y_m)}{\partial y_i} \cdot \frac{\partial \phi_i(X)}{\partial x_k}, \qquad k = 1, \dots, n
(10)
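A quick symbolic check of the composite derivation rule (10), assuming SymPy and arbitrarily chosen inner and outer polynomials (an illustration only):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# inner block polynomials (8): y_i = phi_i(X)
phi1 = 1 + x1 + x2 + x1*x2
phi2 = 2 + x1**2 + x2

# outer block polynomial (9): F = f(y1, y2)
y1, y2 = sp.symbols('y1 y2')
f = 1 + y1 + y2 + y1*y2
F = f.subs({y1: phi1, y2: phi2})

# chain rule (10): dF/dx1 = sum_i df/dy_i * dphi_i/dx1
chain = sum(sp.diff(f, yi).subs({y1: phi1, y2: phi2}) * sp.diff(phi, x1)
            for yi, phi in [(y1, phi1), (y2, phi2)])
assert sp.simplify(chain - sp.diff(F, x1)) == 0
print("composite derivation rule (10) verified for this example")
```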
Figure 3. The 3-variable multi-layered 2-combination D-PNN; back-connections of the 3rd-layer 1st block.

The 1st block of the last (3rd) hidden layer (Figure 3) forms 2 neurons of its own input variables as 2 simple terms (11) of the DE (3). It also creates 4 compound terms of the 2nd (previous) hidden layer, using the reverse outputs and inputs of the 2 bound blocks with respect to 4 derivative variables (12). As the couples of variables of the inner functions can differ from each other, their partial derivatives are 0, so the sum of formula (10) consists of only 1 term. Thus each neuron of the D-PNN represents a DE term. Compound terms can likewise be created with respect to the 1st hidden layer variables, e.g. (13). The 3 back-joint blocks form 8 CT of the DE, and this can be performed well by a recursive algorithm (a schematic sketch follows after eq. (13)).

y_1^1 = w_1 \frac{\sqrt[7]{\left(a_0 + a_1 x_1'' + a_2 x_2'' + a_3 x_1'' x_2''\right)^4}}{b_0 + b_1 x_1''} = w_1 \frac{\sqrt[7]{\left(x_1'''\right)^4}}{b_0 + b_1 x_1''}
(11)
y_3^1 = w_3 \frac{\left(\sqrt[3]{x_1'''}\right)^2}{\sqrt[3]{x_2''}} \cdot \frac{\sqrt[3]{\left(x_1''\right)^2}}{b_0 + b_1 x_1'}
(12)
y_7^1 = w_7 \frac{\sqrt[7]{x_1'''}}{x_2''} \cdot \frac{x_1''}{x_2'} \cdot \frac{x_1'}{b_0 + b_1 x_1}
(13)
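The recursive construction of the composite terms can be sketched as follows (a schematic illustration only, not the author's algorithm; the block names and the toy structure are assumptions). It only enumerates which terms arise; the actual fractional polynomials are given by (11)-(13):

```python
def composite_terms(block):
    """Schematic recursion: each block yields one simple term per own
    input variable and, for every back-connected block of the previous
    layer, one composite term per term of that block."""
    terms = [f"d/d{v} via {block['name']}" for v in block['inputs']]
    for prev in block.get('back', []):
        terms += [f"{block['name']} o {t}" for t in composite_terms(prev)]
    return terms

# toy structure mimicking Figure 3: one 3rd-layer block bound to two
# 2nd-layer blocks, each bound to two 1st-layer blocks (2 inputs each)
l1 = [{'name': f'L1B{i}', 'inputs': ['x1', 'x2']} for i in range(3)]
l2 = [{'name': 'L2B1', 'inputs': ["x1'", "x2'"], 'back': l1[:2]},
      {'name': 'L2B2', 'inputs': ["x1'", "x3'"], 'back': l1[1:]}]
l3 = {'name': 'L3B1', 'inputs': ["x1''", "x2''"], 'back': l2}

print(len(composite_terms(l3)))   # 2 simple + 4 (2nd layer) + 8 (1st layer) = 14
```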

The D-PNN should create a functional value around the desired output. As the input vector variables can take a wide range of values (Figure 2), the combination polynomials produce large output values. Therefore the multiplication of (10) was replaced by a division operator in the fractions of the compound terms (11)(12)(13) without a negative effect, reducing the combination degree of the composite term polynomials for each previous joint layer. Without this modification the root exponents of the CT fractions would require an adjustment. The numerator exponents are adapted to the calculation of the current layer, as the combination degree of the polynomials doubles in each following hidden layer (11)(12)(13).

Each neuron has an adjustable term weight w_i, but not every neuron may participate in the total network output calculation (DE composition). The selection of an optimal neuron combination can easily be performed by a proper genetic algorithm (GA) [11]. The parameters of the polynomials are represented by real numbers, whose random initial values are generated from the interval <0.5, 1.5>. They are adjusted simultaneously with the GA best-fit neuron combination search in the initial phase of the DE composition. It would be desirable to apply an adequate gradient steepest-descent method [12] in conjunction with the EA [13]. The D-PNN can be trained with only a small input–output data set, just as the GMDH polynomial neural network is [14]. The D-PNN's total output Y is the arithmetic mean of all active neuron output values (14), to prevent the number of neurons from influencing it.

Y = \frac{\sum_{i=1}^{k} y_i}{k}, \qquad k = \text{actual number of active neurons}
(14)
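A minimal sketch of the total output (14), illustrative only; the binary mask stands in for a GA-selected neuron combination:

```python
import numpy as np

def total_output(neuron_outputs, active_mask):
    """Total output (14): arithmetic mean of the active neurons only,
    so the number of selected neurons does not scale the result."""
    active = np.asarray(neuron_outputs)[np.asarray(active_mask, dtype=bool)]
    return active.mean() if active.size else 0.0

# e.g. a GA individual could be encoded as the binary active_mask
print(total_output([3.0, 5.0, 100.0, 4.0], [1, 1, 0, 1]))  # 4.0
```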

Experiments

The presented 3-variable multi-layered D-PNN (Figure 3) is able to approximate any linear function, e.g. the simple sum f(x_1, x_2, x_3) = x_1 + x_2 + x_3. The D-PNN and ANN comparison processes the same 12 fixed training data samples. The progress curves are typical of all the following experiments (benchmarks) (Figure 4). The approximation accuracy of both methods is comparable on the trained interval values <10, 300>; however, the ANN approximation ability falls rapidly outside of this range, while the D-PNN alternating errors grow only slowly. The ANN, with 2 hidden layers of neurons, applied the sigmoidal activation function and the standard back-propagation algorithm. The D-PNN output typically has a wave-like behavior, as its model is composed of sum DE terms.

Figure 4. Comparison of the approximation of the 3-variable linear function.

The approximation of non-linear functions requires the extension of the D-PNN block and neuron polynomials (11)(12)(13) with square-power variables. The polynomials are the same as those applied by the GMDH algorithm (2). The competent square-power (16) and combination (17) derivatives form additional sum terms of the 2nd-order partial DE (15). The compound neurons of these derivatives are also formed according to the composite function derivative rules [10].

F\left(x_1, x_2, u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_1 \partial x_2}, \frac{\partial^2 u}{\partial x_2^2}\right) = 0
(15)

where F(x_1, x_2, u, p, q, r, s, t) is a function of 8 variables

y_{10} = w_{10} \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_2^2 + a_5 x_1 x_2}{b_0 + b_1 x_1 + b_2 x_1^2} = \frac{\partial^2 f(x_1, x_2)}{\partial x_1^2}
(16)
y_{12} = w_{12} \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_2^2 + a_5 x_1 x_2}{b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2} = \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2}
(17)
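A sketch of the extended 2nd-order terms (16) and (17) with the shared GMDH-style numerator (2); the function and parameter names are illustrative assumptions, not the author's code:

```python
def second_order_terms(x1, x2, a, b_sq, b_mix, w10, w12):
    """Square-power (16) and mixed-combination (17) derivative terms
    of the 2nd-order DE (15)."""
    num = a[0] + a[1]*x1 + a[2]*x2 + a[3]*x1**2 + a[4]*x2**2 + a[5]*x1*x2
    y10 = w10 * num / (b_sq[0] + b_sq[1]*x1 + b_sq[2]*x1**2)                     # ~ d2f/dx1^2
    y12 = w12 * num / (b_mix[0] + b_mix[1]*x1 + b_mix[2]*x2 + b_mix[3]*x1*x2)    # ~ d2f/dx1dx2
    return y10, y12
```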

Figures 5 and 6 compare the D-PNN and ANN approximations of some benchmark growing non-linear functions. The 24 training data samples were randomly generated by the benchmark functions from the interval <10, 400> for both network models. The parameter and weight adjustment of both methods proved heavily time-consuming and did not succeed in every experiment. The optimal number of the D-PNN's derivative neurons for the non-linear benchmark models was around 100. Experiments with other benchmarks (e.g. x_1^2 + x_2^3 + x_3^4) resulted in similar outcome graphs. The D-PNN wavelet output can cover a wider range of untrained testing data interval values than the ANN. The inaccuracies of both methods intensify on untrained interval values, as the benchmarks involve power functions. The presented experiments verified the capability of the neural network to approximate any multi-parametric function, although the operating principles of the two techniques differ essentially.
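A small sketch of this kind of experiment setup (assumed helper names, not the paper's code): it generates random benchmark training samples from the described interval and computes the RMSE measure discussed with Figure 7:

```python
import numpy as np

def make_benchmark_data(f, n=24, lo=10.0, hi=400.0, seed=0):
    """Random training samples from <10,400>, as described for Figures 5 and 6."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n, 3))
    return X, f(X[:, 0], X[:, 1], X[:, 2])

def rmse(y_pred, y_true):
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

X, y = make_benchmark_data(lambda x1, x2, x3: x1**2 + x2**2 + x3**2)
```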

Figure 5. Comparison of the approximation of f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2.

Figure 6. Comparison of the approximation of f(x_1, x_2, x_3) = x_1 + x_2^2 + x_3^3.

Figure 7a-c shows D-PNN models with different final training root mean square errors (RMSE). A decrease in the RMSE results in a lower generalization on untrained data interval values (Figure 7c). Vice versa, a greater RMSE means the model is valid for a wider data range, while the inaccuracies on the training data interval are overvalued (Figure 7a). This effect became evident in all experiments. As a result, the D-PNN should not be trained to the minimal achievable error value in order to obtain the optimal generalization on testing data. The applied incomplete adjustment and selection methods require improvements, which could yield better results. The presented D-PNN operating principle differs by far from other commonly applied neural network techniques. The benchmark results suggest that the D-PNN will succeed in forming complex models that can be defined in the form of multi-parametric functions.

Figure 7. Models of the function f(x_1, x_2, x_3) = (x_1 + x_2 + x_3)^2 with decreasing training errors (a, b, c).

Conclusion

The D-PNN is a new type of neural network, whose function approximation and identification of the dependence of variables are based on a generalization of data relations. It does not utilize absolute values of variables but relative ones, which can better describe a wide range of input data interval values. The D-PNN constructs a general partial differential equation, which defines a system model of dependent variables, applying integral fractional polynomial sum terms. An acceptable implementation of the sum derivative terms is the principal part of a partial DE substitution. Artificial neural network pattern identification and function approximation models are simpler techniques, based only on whole-pattern similarity relations. A real-data example might be the weather forecast for 1 locality, e.g. the prediction of static pressure values applying some trained data relations of a few nearby localities of the surrounding areas. Phases of a constant time interval of this very complex system could define the input vectors of the training data set. The estimated multi-parametric function values, i.e. the next system states of a selected locality after a time delay, form the desired network outputs. The D-PNN could create better long-time models based on a partial sum DE solution than a standard ANN time-series prediction (which is also based on entire pattern definitions).

Author’s contributions

The author designed a new type of neural network based on the GMDH polynomial neural network, which forms its skeleton structure. The proposed neural network generates a sum of relative derivative polynomial terms as a general differential equation model description. It replaces and resolves the sum partial differential equation as an approximation of an unknown multi-parametric function defined by several discrete point observations. The network was tested with some benchmark functions and compared with artificial neural network models. Real-data application models will follow the test experiments.