Probabilistic neural network training procedure based on Q(0)-learning algorithm in medical data classification
Abstract
In this article, an iterative procedure is proposed for the training process of the probabilistic neural network (PNN). In each stage of this procedure, the Q(0)-learning algorithm is utilized for the adaptation of the PNN smoothing parameter (σ). Four classes of PNN models are considered in this study. In the first, simplest model, the smoothing parameter takes the form of a scalar; in the second model, σ is a vector whose elements are computed with respect to the class index; the third model has a smoothing parameter vector whose components are determined separately for each input attribute; finally, the last and most complex of the analyzed networks uses a matrix of smoothing parameters in which each element depends on both the class and the input feature index. The main idea of the presented approach is the appropriate update of the smoothing parameter values according to the Q(0)-learning algorithm. The proposed procedure is verified on six repository data sets. The prediction ability of the algorithm is assessed by computing the test accuracy on 10 %, 20 %, 30 % and 40 % of examples drawn randomly from each input data set. The results are compared with the test accuracy obtained by a PNN trained using the conjugate gradient procedure, the support vector machine algorithm, the gene expression programming classifier, the k–Means method, the multilayer perceptron, the radial basis function neural network and the learning vector quantization neural network. It is shown that the presented procedure can be applied to the automatic adaptation of the smoothing parameter of each of the considered PNN models and that it constitutes an alternative training method. A PNN trained by the Q(0)-learning based approach is a classifier that can be regarded as one of the top models in data classification problems.
Keywords
probabilistic neural network, smoothing parameter, training procedure, Q(0)-learning algorithm, reinforcement learning, accuracy

1 Introduction
Probabilistic neural network (PNN) is an example of a radial basis function based model effectively used in data classification problems. It was proposed by Donald Specht [37, 38] and, as a data classifier, draws the attention of researchers in the field of data mining. For example, it is applied in medical diagnosis and prediction [23, 25, 28], image classification and recognition [7, 20, 27], bearing fault detection [32], digital image watermarking [45], earthquake magnitude prediction [1] and classification in a time-varying environment [31].
PNN is a feedforward neural network with a complex structure. It is composed of an input layer, a pattern layer, a summation layer and an output layer. Despite its complexity, PNN has only a single training parameter: the smoothing parameter of the probability density functions (PDFs) which are utilized for the activation of the neurons in the pattern layer. Thereby, computing the network response requires only a single input-output signal pass. However, only a well-chosen value of the smoothing parameter yields correct model responses in terms of generalization ability. The value of σ must be estimated on the basis of the PNN's classification performance, which is usually done in an iterative manner.
Within the process of smoothing parameter estimation, two issues must be addressed. The first pertains to the form of σ in the PDFs of the pattern layer neurons of PNN. Four possible approaches are applied, i.e. a single parameter for the whole model [37, 38], a single parameter for each class [1], a single parameter for each data attribute [11, 39], and a single parameter for each attribute and class [7, 11, 12].
The second problem related to smoothing parameter estimation for PNN concerns the computation of the σ value. In the literature, different procedures have been developed. For example, in [39], conjugate gradient descent (ascent) is used to iteratively find the set of σ's which maximizes the optimization criterion. Chtioui et al. [7] exploit the conjugate gradient method and the approximate Newton algorithm to determine the smoothing parameters associated with each data attribute and class. In [12], the authors utilize the particle swarm optimization algorithm to estimate the matrix of the smoothing parameters for the probability density functions in the pattern layer. An interesting study is presented in [47], where a gap-based approach for smoothing parameter adaptation is proposed. The authors provide a formula for σ on the basis of the gap computed between the two nearest points of the data set. The solution is applied to PNNs for which the smoothing parameter takes the form of a scalar and of a vector whose elements are associated with each data feature.
As one can observe, the choice of the smoothing parameter plays a crucial role in the training process of the probabilistic neural network. This fact is of particular importance when PNN has a different σ for: each class, each attribute, or each class and attribute. The task of smoothing parameter selection can then be considered a high-dimensional function optimization problem. Reinforcement learning (RL) is an efficient method for solving such problems, e.g. finding extrema of some family of functions [46] or computing the set of optimal weights for a multilayer perceptron [40]. The RL method is also frequently applied in various engineering tasks. It is used in non-stationary serial supply chain inventory control [18], adaptive control of nonlinear objects [43], adjusting robot behavior in autonomous navigation systems [26] and path planning for improving the positioning accuracy of a mobile microrobot [22]. There are also studies which propose the use of RL in non-technical domains, e.g. in the explanation of dopamine neuronal activity [5] or in an educational system to improve the pedagogical policy [16].
In this work, we introduce a novel procedure for the computation of the smoothing parameter of the PNN model. The procedure uses the Q(0)-learning algorithm. The method adjusts the smoothing parameter according to four different strategies: a single σ for the whole network, a single σ for each class, a single σ for each data attribute, and a single σ for each data attribute and each class. The results of our proposed solution are compared to the outcomes of a PNN for which the smoothing parameter is calculated using the conjugate gradient procedure and, additionally, to the support vector machine classifier, gene expression programming algorithm, k–Means clustering method, multilayer perceptron, radial basis function neural network and learning vector quantization neural network in medical data classification problems.
The authors of the present study have already proposed the application of a reinforcement learning algorithm to the computation of the smoothing parameter of radial basis function based neural networks [19]. In that work, the stateless Q-learning algorithm was used for the adaptive computation of the smoothing parameter of the networks.
This paper is organized as follows. Section 2 discusses the probabilistic neural network, highlighting its basics, structure, principle of operation and the problem of smoothing parameter selection. Section 3 presents the basics of the reinforcement learning algorithm applied in this work, namely the Q(0)-learning algorithm. In Section 4, we present the proposed procedure: the problem statement is provided, the general idea of applying the Q(0)-learning algorithm to the choice of the smoothing parameter is described and, finally, the details of the algorithm are given. Section 5 presents the data sets used in this research, the algorithm settings and the obtained empirical results, along with an illustration of the PNN training process. In this part of our work, we compare the performance of our method with the efficiency of the PNN whose σ is determined by means of the conjugate gradient method and, additionally, with the efficiency of the reference classifiers and neural networks. Finally, in Section 6, we conclude our work.
2 Probabilistic neural network
Probabilistic neural network is a data classification model which implements the Bayesian decision rule. This rule is defined as follows. Assume that: (1) there is a data pattern \(\mathbf{x}\in \mathbb{R}^{n}\) which belongs to one of the predefined classes g=1,…,G; (2) the probability that x belongs to class g equals \(p_{g}\); (3) the cost of classifying x into class g is \(c_{g}\); (4) the probability density functions \(y_{1}(\mathbf{x}),y_{2}(\mathbf{x}),\ldots ,y_{G}(\mathbf{x})\) for all classes are known. Then, according to the Bayes theorem, for g≠h, the vector x is classified to class g if \(p_{g}c_{g}y_{g}(\mathbf{x})>p_{h}c_{h}y_{h}(\mathbf{x})\). Usually \(p_{g}=p_{h}\) and \(c_{g}=c_{h}\); thus x is classified to class g if \(y_{g}(\mathbf{x})>y_{h}(\mathbf{x})\).
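As a minimal illustration (not the paper's implementation), with equal priors and costs the rule reduces to picking the class with the largest density at x; the one-dimensional Gaussian densities below are toy stand-ins for the class-conditional PDFs:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density, used here as a toy class-conditional PDF."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def bayes_decide(x, pdfs, priors=None, costs=None):
    """Bayesian decision rule: pick the class g maximizing p_g * c_g * y_g(x).
    With equal priors and costs this compares the densities y_g(x) directly."""
    G = len(pdfs)
    priors = np.ones(G) / G if priors is None else np.asarray(priors)
    costs = np.ones(G) if costs is None else np.asarray(costs)
    scores = [p * c * f(x) for f, p, c in zip(pdfs, priors, costs)]
    return int(np.argmax(scores))  # class index 0..G-1

# two toy classes with modes at 0 and 3
pdfs = [lambda x: gauss_pdf(x, 0.0, 1.0), lambda x: gauss_pdf(x, 3.0, 1.0)]
print(bayes_decide(0.5, pdfs))  # 0: x lies nearer the first class's mode
```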
In real data classification problems, the probability density functions \(y_{g}(\mathbf{x})\) are not known, since the data distribution is usually unknown. Therefore, some approximation of the PDF must be determined. Such an approximation can be obtained using the Parzen method [29]. Commonly, the Gaussian function is chosen as the kernel since it satisfies the conditions required by Parzen's method.
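A Parzen-window estimate with Gaussian kernels can be sketched as follows (a generic sketch with a single scalar smoothing parameter sigma, as in the simplest PNN model):

```python
import numpy as np

def parzen_gaussian_pdf(x, samples, sigma):
    """Parzen-window estimate of a class-conditional PDF at point x,
    built from the training samples of one class (rows of `samples`).
    Each sample contributes one isotropic Gaussian kernel of width sigma."""
    x = np.atleast_2d(x)
    samples = np.atleast_2d(samples)
    n = samples.shape[1]                              # number of features
    sq_dist = np.sum((samples - x) ** 2, axis=1)      # squared distances to x
    kernels = np.exp(-sq_dist / (2 * sigma ** 2))
    norm = (2 * np.pi) ** (n / 2) * sigma ** n        # Gaussian normalization
    return kernels.mean() / norm

samples = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
print(parzen_gaussian_pdf([0.5, 0.5], samples, sigma=1.0))
```

In a PNN, one such estimate per class plays the role of \(y_{g}(\mathbf{x})\) in the decision rule above.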
3 Reinforcement learning
3.1 Introduction
Reinforcement learning addresses the problem of an agent that must learn to perform a task through trial-and-error interaction with an unknown environment. The agent and the environment interact continuously until a terminal state is reached. The agent senses the environment and selects an action to perform. Depending on the effect of its action, the agent obtains a reward. Its goal is to maximize the discounted sum of future reinforcements r _{ t } received in the long run from any time step t, which is usually formalized as \(\sum \nolimits _{t=0}^{\infty }\gamma ^{t}r_{t}\), where \(\gamma \in \left [ 0,1\right ] \) is the agent's discount rate [41].
The mathematical model of the reinforcement learning method is a Markov Decision Process (MDP). An MDP is defined as the quadruple \(\langle S, A, P_{s_{t}s_{t+1}}^{a_{t}}, r_{t} \rangle \), where S is a set of states, A is a set of actions, \(P_{s_{t}s_{t+1}}^{a_{t}}\) denotes the probability of the transition to the state s _{ t+1}∈S after the execution of the action a _{ t }∈A in the state s _{ t }∈S, and r _{ t } is the reward received after this transition.
3.2 Q(0)-learning
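The core of the method is the standard one-step temporal-difference update \(Q(s_{t},a_{t}) \leftarrow Q(s_{t},a_{t}) + \alpha \left( r_{t} + \gamma \max _{a}Q(s_{t+1},a) - Q(s_{t},a_{t}) \right)\). A minimal tabular sketch (the state/action sizes are toy placeholders; α and γ match the values used later in Section 5.2):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.01, gamma=0.95):
    """One-step Q(0)-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((5, 3))            # 5 states, 3 actions, initialized to zero
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])                  # moved toward the TD target by a factor alpha
```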
4 Application of Q(0)-learning based procedure to the adaptation of PNN's smoothing parameter
4.1 Problem statement
The task is to find the optimal value of the smoothing parameter which maximizes the accuracy (10). For the PNNC, PNNV and PNNVC models, this is a multidimensional optimization problem. As the solution, we propose a new procedure based on the Q(0)-learning algorithm. The set of system states S, the set of actions A and the reinforcement signal r required by the Q(0)-learning method are defined along with the description of the algorithm.
4.2 General idea
For the adaptation of the smoothing parameter, a procedure based on the Q(0)-learning algorithm is proposed for the PNNS, PNNC, PNNV and PNNVC models. The introduction of the Q(0)-learning algorithm rests on the assumption that in the PNN training process it is possible to distinguish two interacting elements: the environment and the agent. The environment is composed of the data set used for the training process, the PNN model and the accuracy measure. The agent, on the basis of the policy represented by the action value function Q, chooses an action a _{ t } in a state s _{ t }. The action a _{ t } is used to modify the smoothing parameter. In this work, the state is represented by the accuracy measure. This has a natural interpretation, since the state defined in such a way is a function of the PNN output, which depends on the smoothing parameter. The output of PNN is computed for the training and test sets in order to determine the training and test accuracies. On the basis of the training accuracy, the next state s _{ t+1} and the reinforcement signal r _{ t } are computed. The reinforcement signal provides information about the change of the training accuracy, taking a negative value when the accuracy decreases and a positive value when it increases. The interaction between the agent and the environment results in both the modification of the action value function Q and the change of the smoothing parameter.
The main aim of the proposed procedure is to train the PNN model on the training set so as to maximize the training accuracy (10). Additionally, PNN is tested by computing the accuracy on the test set. The highest test accuracy and its corresponding value of the smoothing parameter are stored. Finding the highest test accuracy of PNN provides the optimal smoothing parameter in terms of prediction ability.

In every training step t of a stage, the agent performs the following operations:

- choose an action \(a_{t}\) using the actual policy derived from the action value function Q;
- update a single element of \(\boldsymbol{\sigma}_{V}\) with the value of \(a_{t}\);
- compute the training and test accuracy \(Acc_{t}\) according to (10);
- update the maximal test accuracy \(Acc_{\max }\) and the corresponding \(\boldsymbol {\sigma }_{\max }^{(m)}\);
- calculate the reinforcement signal \(r_{t}\) on the basis of the training accuracy;
- update the action value function Q.
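The steps above can be sketched as one training iteration (a sketch only: `acc_fn` is a hypothetical stand-in for evaluating the PNN training accuracy of (10), the state grid and the ±1 reward follow the verbal descriptions in this section, and α, γ match Section 5.2):

```python
import numpy as np

rng = np.random.default_rng(0)

def training_step(Q, sigma, actions, state, acc_fn, idx,
                  alpha=0.01, gamma=0.95, eps=0.05):
    """One iteration: pick an action, modify one element of sigma_V,
    observe the new accuracy, reward the change, update Q."""
    # epsilon-greedy choice of a_t from the current policy
    if rng.random() < eps:
        a = int(rng.integers(len(actions)))
    else:
        a = int(np.argmax(Q[state]))
    sigma[idx] += actions[a]                      # modify one element of sigma_V
    acc = acc_fn(sigma)                           # training accuracy in [0, 1]
    next_state = int(round(acc * (Q.shape[0] - 1)))
    r = 1.0 if acc > state / (Q.shape[0] - 1) else -1.0  # reward accuracy gains
    Q[state, a] += alpha * (r + gamma * np.max(Q[next_state]) - Q[state, a])
    return Q, sigma, next_state

Q = np.zeros((11, 2))                             # 11 states for l_T = 10
sigma = np.array([1.0])
Q, sigma, s = training_step(Q, sigma, actions=[0.1, -0.1], state=5,
                            acc_fn=lambda s: 0.6, idx=0, eps=0.0)
print(s)  # 6: accuracy 0.6 maps onto state 6 of the 0..10 grid
```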
Once the first stage of the procedure is completed, the second one begins. Here, \(\boldsymbol {\sigma }_{\max }^{(2)}\) is initialized with the optimal value from the previous stage and the action set is changed. In our approach, this change consists in decreasing all action values by an order of magnitude. The PNNV model is trained using Algorithm 1 and the smoothing parameter vector which maximizes the training accuracy is updated.
The procedure is performed M times, each time: (1) updating \(\boldsymbol {\sigma }_{\max }^{(m)}\) on the basis of \(\boldsymbol {\sigma }_{\max }^{(m-1)}\), (2) decreasing the absolute values of the actions and (3) finding new smoothing parameter values which maximize the training accuracy. This type of approach, in which the initial absolute values of the actions are large, makes it possible to select the smoothing parameter values within a broad range. The iterative decrease of the actions in subsequent stages narrows the range of \(\boldsymbol {\sigma }_{\max }^{(m)}\). This, in turn, allows us to search for a more nearly optimal parameter of the PNNV model.
Once all M stages are performed, the highest test accuracy \(Acc_{\max }\) is computed for \(\boldsymbol {\sigma }_{\max }^{(M)}\). This solution provides the highest prediction ability of PNNV.
As shown, the above procedure utilizes RL in the problem of smoothing parameter adjustment for a classification task. However, it is also possible to combine PNN and RL in another way. In the work of Heinen and Engel [15], a new incremental probabilistic neural network (IPNN) is proposed. The matrix of the smoothing parameters of IPNN is used for action selection in an RL problem; IPNN is therefore utilized as the action value function approximator.
4.3 Application of the Q(0)-learning algorithm to adaptive computation of the smoothing parameter
In this subsection, we explain the details of applying the Q(0)-learning algorithm to the adaptive selection of σ _{ V } for the PNNV classifier. As mentioned before, the algorithm is described only for this type of network, since Q(0)-learning works in a similar manner for PNNS, PNNC and PNNVC. The only difference is the number of smoothing parameters which have to be updated. For PNNV, there are n parameters of the model, while for PNNS, PNNC and PNNVC, there exist 1, G and n×G smoothing parameters, respectively.
The use of the Q(0)-learning algorithm for the choice of the σ _{ V } parameter requires the definition of the set of system states, the action set and the reinforcement signal.
Definition 1
The set of system states is defined by the accuracy measure: \(S=\left \lbrace 0,\frac {1}{l_{T}},\frac {2}{l_{T}},\ldots ,\frac {l_{T}-1}{l_{T}},1\right \rbrace \). The elements of S are real values from the interval [0,1]. The total number of states is therefore l _{ T }+1.
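Since the accuracy computed on l _{ T } training patterns is a ratio of correct classifications, it falls exactly on this grid; a small sketch:

```python
def accuracy_state(n_correct, l_T):
    """Accuracy on l_T training patterns equals n_correct / l_T,
    so it always lands on one of the l_T + 1 grid points of S."""
    assert 0 <= n_correct <= l_T
    return n_correct / l_T

l_T = 10
S = [i / l_T for i in range(l_T + 1)]   # the state set for l_T = 10
print(len(S))                            # l_T + 1 = 11 states
print(accuracy_state(7, l_T) in S)       # True
```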
Definition 2
A ^{(m)} is the symmetric set of actions of the following form: \(A^{(m)}=\left \lbrace a_{1}^{(m)},a_{2}^{(m)},\ldots ,a_{p^{(m)}}^{(m)},-a_{p^{(m)}}^{(m)},\ldots ,-a_{2}^{(m)},-a_{1}^{(m)}\right \rbrace \), where p ^{(m)} denotes half of the cardinality of this set in stage m of the procedure.
In each stage of the procedure, the smoothing parameters of PNNV are increased or decreased by the element values of the A ^{(m)} action set. The action set proposed for the first stage (A ^{(1)}) allows the modification of σ _{ V } by large values. This makes it possible to search for optimal parameters inside a broad range of values. At most, the elements of σ _{ V } can be modified by ±10. The first stage of the procedure ends with finding a candidate for the optimum of the smoothing parameter. Subsequent decreases of the absolute values of the actions in A ^{(2)} shrink the domain of possible optimal parameter values. Finally, in A ^{(3)}, the absolute values of the actions are so small that the smoothing parameters of PNNV change only slightly. A large change of σ _{ V } in the third stage of the procedure is not required because the optimal modification route has already been established (in the first two stages).
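The staged shrinking of the action magnitudes can be sketched as follows (the base magnitudes 10, 5, 1 are illustrative placeholders; the paper's concrete six-element sets are those of (12)):

```python
def action_set(stage, base=(10.0, 5.0, 1.0)):
    """A symmetric six-element action set whose magnitudes shrink by an
    order of magnitude per stage, as described for A^(1), A^(2), A^(3)."""
    scale = 10.0 ** (-(stage - 1))            # stage 1: x1, stage 2: x0.1, ...
    positives = [a * scale for a in base]
    return positives + [-a for a in reversed(positives)]

print(action_set(1))  # [10.0, 5.0, 1.0, -1.0, -5.0, -10.0]
print(action_set(2))  # [1.0, 0.5, 0.1, -0.1, -0.5, -1.0]
```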
In order to maximize the training accuracy of PNNV, the actual reinforcement signal r _{ t } should reward an agent when the training accuracy increases and punish an agent when the accuracy decreases. This idea can be simply formalized as follows.
Definition 3
Such a form of the reinforcement signal, combined with the action value function update, strengthens the agent's assessment of whether the choice of an action is beneficial or not.
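The formula of Definition 3 is not reproduced above; a plausible minimal form, following the verbal description in Section 4.2 (positive on an accuracy increase, negative on a decrease), is:

```python
def reinforcement(acc_prev, acc_curr):
    """Reward the agent when the training accuracy grows, punish it when
    the accuracy drops. (The paper's exact Definition 3 may differ; this
    sketch follows the verbal description in Section 4.2.)"""
    if acc_curr > acc_prev:
        return 1.0
    if acc_curr < acc_prev:
        return -1.0
    return 0.0

print(reinforcement(0.80, 0.85))  # 1.0
print(reinforcement(0.85, 0.80))  # -1.0
```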
Algorithm 1 shows the application of the Q(0)-learning method to the adaptive choice of σ _{ V } for the PNNV classifier. This algorithm is executed in each stage m of the procedure shown in Fig. 3.
The algorithm starts with the initialization of \(Acc_{\max }\) on the basis of the smoothing parameter values found in the previous stage of the procedure, except in the first stage, when σ _{ V } is initialized with ones. \(Acc_{\max }\) stores the maximal test accuracy computed on the test set during the training process. Then, in step 2, the action value function Q is set to zero.
Algorithm 1: The proposed algorithm of smoothing parameter adaptation of the PNNV model with the use of the Q(0)-learning method, for stage m of the procedure
It is worth noting that the type of PNN model influences the number of smoothing parameter updates. For PNNS, PNNC, PNNV and PNNVC, the number of smoothing parameter updates equals \(t_{\max }\), \(t_{\max }\times G\), \(t_{\max }\times n\) and \(t_{\max }\times n\times G\), respectively.
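These update counts can be checked with a one-liner (using, for example, the dermatology data's n = 34 features and G = 6 classes with the paper's \(t_{\max }=100\)):

```python
def n_sigma_updates(t_max, n, G, model):
    """Number of smoothing-parameter updates per stage for each PNN
    variant (t_max training steps; n features; G classes)."""
    return t_max * {"PNNS": 1, "PNNC": G, "PNNV": n, "PNNVC": n * G}[model]

# dermatology data: n = 34 features, G = 6 classes, t_max = 100
print(n_sigma_updates(100, 34, 6, "PNNVC"))  # 20400
```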
5 Experiments
In this section, we present the simulation results in the classification of medical databases obtained by PNNS, PNNC, PNNV and PNNVC trained by the proposed procedure. These results are compared with the outcomes obtained by PNN trained using the conjugate gradient procedure (PNNVC–CG), support vector machine (SVM) algorithm, gene expression programming (GEP) classifier, k–Means method, multilayer perceptron (MLP), radial basis function neural network (RBFN) and learning vector quantization neural network (LVQN). The data sets used in these experiments are also briefly described and the adjustments of the algorithm are provided. Moreover, the illustration of the PNNS training process is presented.
5.1 Data sets used in the study

The following data sets are used in this study:

- Wisconsin breast cancer database [24], which consists of 683 instances with 9 attributes. The data is divided into two groups: 444 benign cases and 239 malignant cases.
- Pima Indians diabetes data set [36], which includes 768 cases having 8 features. Two classes of data are considered: samples tested negative (500 records) and samples tested positive (268 records).
- Haberman's survival data [21], which contains 306 patients who underwent surgery for breast cancer. For each instance, 3 variables are measured. The 5-year survival status establishes two input classes: patients who survived 5 years or longer (225 records) and patients who died within 5 years (81 records).
- Cardiotocography data set [3], which comprises 2126 measurements of fetal heart rate and uterine contraction features on 22-attribute cardiotocograms classified by expert obstetricians. The classes are coded into three states: normal (1655 cases), suspect (295 cases) and pathological (176 cases).
- Dermatology data [13], which includes 358 instances, each with 34 features. Six data classes are considered: psoriasis (111 cases), lichen planus (71 cases), seborrheic dermatitis (60 cases), chronic dermatitis (48 cases), pityriasis rosea (48 cases) and pityriasis rubra pilaris (20 cases).
- Statlog heart database [3], which consists of 270 instances and 13 attributes. There are two classes to be predicted: absence (150 cases) or presence (120 cases) of heart disease.
5.2 Algorithms’ settings
In the case of the proposed algorithm, the initial values of the action value function Q are set to zero. Three six-element action sets proposed in (12) are used. The maximum number of training steps \(t_{\max }=100\) is assumed. We apply such a value of \(t_{\max }\) in order to show that a relatively small number of training steps suffices to achieve satisfactory results. Additionally, the Q(0)-learning algorithm requires an appropriate selection of its intrinsic parameters: the greedy parameter, the update rate and the discount factor.
The greedy parameter ε determines the probability of random action selection and must be taken from the interval \(\left [ 0,1\right ] \). If ε=0.05, on average only 5 out of 100 action selections are made randomly from the action set. The remaining 95 % of selections are performed according to the learned policy represented by the Q–table. If the elements of the Q–table are all equal (as in the initial iterations of Algorithm 1), the actions are selected randomly. In this work, the greedy parameter is chosen experimentally from the set \(\left \lbrace 0.5, 0.05, 0.005 \right \rbrace \). Unfortunately, for \(t_{\max }=100\), the use of ε=0.5 does not yield repeatable results. In turn, for ε=0.005, it is observed that some actions are never selected. Therefore, ε=0.05 is utilized in the experiments.
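A sketch of ε-greedy selection with the chosen ε = 0.05 (the Q-row values below are toy numbers):

```python
import numpy as np

rng = np.random.default_rng(42)

def epsilon_greedy(q_row, eps=0.05):
    """Pick a random action with probability eps, otherwise the action
    with the highest Q-value in the current state (ties -> lowest index)."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

q_row = np.array([0.0, 0.3, 0.1])
picks = [epsilon_greedy(q_row, eps=0.05) for _ in range(1000)]
print(picks.count(1) / len(picks))  # roughly 0.95 + 0.05/3, i.e. about 0.97
```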
The α parameter determines the update rate of the action value function Q. A small value of this factor increases the duration of the training process, while a large value introduces oscillations of the Q elements [34]. The proper selection of α has a significant influence on the convergence of the training process. From the theoretical point of view, α should be large enough to overcome initial conditions and possible random fluctuations, and should decrease over time. However, in practical applications, constant values of this factor are mostly used. Admittedly, this approach does not assure the convergence of the learning process, but a stable policy can be reached. In our study, we choose α experimentally from the set \(\left \lbrace 0.1, 0.01, 0.001 \right \rbrace \). For all three parameter values, similar results are obtained. In the final simulation, we assume α=0.01.
The discount factor γ determines the relative importance of short- and long-term rewards. This parameter is mostly picked arbitrarily near 1, e.g. 0.8 [6], 0.9 [2, 33] or 0.95 [4]. In this contribution, γ=0.95.
PNNVC–CG used in the simulations is the probabilistic neural network trained by the conjugate gradient procedure. The model is a built-in tool of the DTREG predictive modeling software [35]. In the experiments, we use the network for which the smoothing parameter is adapted for each input feature and class separately. The starting values of the smoothing parameters for PNNVC–CG are between 0.0001 and 10 [35].
SVM algorithm [42] is used in this work as the data classifier. The model is trained by means of the SMO algorithm [30] available in Matlab's Bioinformatics Toolbox. Multiclass classification problems are solved by applying the one-against-all method. In all data classification cases, the radial basis function kernel is applied with an experimental grid search over both the C constraint and the sc spread constant: \(C=\left \{10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}\right \}\) and s c={0.08, 0.2, 0.3, 0.5, 0.8, 1.2, 1.5, 2, 5, 10, 50, 80, 100, 200, 500}, respectively.
The head size, the number of genes within each chromosome, the linking functions between genes, the computing functions in the head, the fitness functions and the genetic operators used for the GEP classifier are listed below:

Head size: 3, 4, 5, 6, 7, 8
Number of genes: 1, 2, …, 12
Linking functions: Addition, Multiplication, Logical OR
Computing functions: +, −, ∗, /, −x, 1/x, \(b/\left (1+\exp \left (-ax\right )\right )\), \(\exp \left (-\left (x-a\right )^{2}/\left (2b^{2}\right )\right )\)
Fitness functions: Sensitivity/Specificity; Number of hits with penalty; Mean squared error
Genetic operators: Mutation = 0.044; Inversion = 0.1; IS Transposition = 0.1; RIS Transposition = 0.1; Gene Transposition = 0.1; One-Point Recombination = 0.3; Two-Point Recombination = 0.3; Gene Recombination = 0.1
The k–Means clustering algorithm [14] is used in the comparison for classification purposes. Predictions are made for unknown cases by assigning them the category of the closest cluster center. In the simulations, the number of clusters which provides the highest test accuracy is selected.
MLP neural network is simulated with one or two hidden layers activated by the transfer functions from the set {linear, hyperbolic tangent, logistic}. The same set of transfer functions is applied for the output neurons, for which the sum squared error function is calculated. The number of hidden layer neurons is optimized in order to minimize the network error. The model is trained with gradient descent with momentum and adaptive learning rate backpropagation algorithms [8].
For RBFN and LVQN neural networks, the number of hidden neurons is selected empirically from the set {2,4,6,…,100}. The optimal number of hidden neurons is taken so that the sum squared error for each model is minimized. The spread constant in RBFN hidden layer activation function is chosen experimentally from the interval [0.1,10] with the step size 0.1.
5.3 Empirical results
In this study, the performance of the PNN models, for which the smoothing parameter is determined using the Q(0)-learning based procedure, is evaluated on the input data partitioned in the following way. Firstly, the testing subsets are created by a random extraction of 10 %, 20 %, 30 % and 40 % of cases out of the input database. Then, the training sets are created from the remaining patterns, i.e. 90 %, 80 %, 70 % and 60 % of the data, respectively. This type of data division is introduced on purpose, since considering all possible training–test subsets is computationally prohibitive: the number of ways of dividing l training patterns into v sets, each of size k, equals \( l! / \left (v! \cdot (k!)^{v}\right ) \) [17].
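For intuition, the count grows combinatorially even for tiny l; a direct evaluation of the formula:

```python
from math import factorial

def n_partitions(l, v):
    """Number of ways to split l patterns into v unordered subsets of
    equal size k = l / v:  l! / (v! * (k!)**v)."""
    assert l % v == 0
    k = l // v
    return factorial(l) // (factorial(v) * factorial(k) ** v)

print(n_partitions(6, 3))   # 15 ways to split 6 items into 3 pairs
```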
The remaining classifiers used in the comparative research: PNNVC–CG, SVM, GEP, k–Means, MLP, RBFN and LVQN are trained and validated on the same data subsets. The use of the same training/test sets for all the models makes the obtained results comparable.
The results can be summarized as follows:

1. In the classification of the Wisconsin breast cancer data, out of all compared models, PNNC and PNNVC reach the highest average test accuracy, equal to 99.0 %. In the Haberman and dermatology data classification problems, the highest average test accuracy is obtained by PNNVC (81.2 %) and PNNV (97.6 %), respectively.
2. The SVM model provides the highest average test accuracy in the classification of the Pima Indians diabetes data set (77.2 %) and the cardiotocography database (97.2 %). In these two tasks, PNNV is the second best model, with test accuracy lower by 0.5 % and 1.8 %, respectively. For the Statlog classification problem, the GEP algorithm yields the highest average test accuracy, which equals 94.6 %. This result is followed by the outcomes of PNNVC, PNNV and PNNC.
3. Except for the dermatology classification problem, PNNVC–CG turns out to be the worst classifier. The k–Means algorithm and the remaining reference neural networks (MLP, RBFN and LVQN) achieve lower test accuracy than the PNNV, PNNVC, SVM and GEP classifiers.
The test accuracy values (in %) determined for four considered trainingtest subsets for Wisconsin breast cancer data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  97.8  97.1  93.4  98.5  98.5  96.7  2.3 
PNNC  99.6  98.5  97.8  100.0  100.0  99.0  1.0 
PNNV  99.0  98.5  96.3  100.0  100.0  98.4  1.5 
PNNVC  99.6  98.5  97.8  100.0  100.0  99.0  1.0 
PNNVC–CG  95.3  96.6  90.5  89.7  96.6  93.1  3.4 
SVM  98.9  97.6  95.6  97.1  98.9  97.3  1.4 
GEP  99.3  97.6  96.3  98.5  98.5  96.8  1.5 
k–Means  96.7  97.1  94.9  98.5  98.5  96.8  1.5 
MLP  97.7  96.4  93.9  96.8  97.7  96.2  1.7 
RBFN  97.4  96.1  95.6  95.6  97.4  96.2  0.9 
LVQN  97.8  97.6  92.3  96.9  97.8  96.2  2.6 
The test accuracy values (in %) determined for four considered trainingtest subsets for Pima Indians diabetes data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  69.7  66.1  69.5  75.3  75.3  70.1  3.8 
PNNC  69.1  71.3  72.1  75.3  75.3  71.9  2.6 
PNNV  73.9  76.5  76.0  80.5  80.5  76.7  2.8 
PNNVC  74.6  76.5  75.3  79.2  79.2  76.4  2.0 
PNNVC–CG  65.5  30.5  68.2  66.3  68.2  57.6  18.1 
SVM  78.2  78.7  75.3  76.6  78.7  77.2  1.5 
GEP  72.6  77.4  76.0  80.5  80.5  76.6  3.3 
k–Means  68.3  70.0  70.8  71.4  71.4  70.2  1.2 
MLP  73.2  73.6  74.8  73.2  74.8  73.7  0.8 
RBFN  65.8  67.8  65.6  68.8  68.8  67.0  1.6 
LVQN  65.8  65.9  67.4  66.1  67.4  66.3  0.7 
The test accuracy values (in %) determined for four considered trainingtest subsets for Haberman survival data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  77.0  75.0  73.0  74.2  77.0  75.8  1.5 
PNNC  79.5  77.2  78.7  77.4  79.5  78.2  1.1 
PNNV  76.2  78.3  82.0  87.1  87.1  80.9  4.8 
PNNVC  77.9  79.3  80.3  87.1  87.1  81.2  4.1 
PNNVC–CG  50.0  69.6  68.8  51.6  69.6  60.0  10.6 
SVM  78.7  76.1  78.7  80.6  80.6  78.5  1.9 
GEP  75.4  75.0  73.8  77.4  77.4  75.4  1.5 
k–Means  69.7  69.6  70.5  67.7  70.5  69.4  1.2 
MLP  74.2  74.2  74.9  76.4  76.4  74.9  1.0 
RBFN  74.6  73.9  75.4  74.2  75.4  74.5  0.7 
LVQN  76.4  75.8  78.4  74.2  78.4  76.2  1.7 
The test accuracy values (in %) determined for four considered trainingtest subsets for cardiotocography data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  88.9  84.4  85.9  85.0  88.9  86.0  2.0 
PNNC  90.2  88.4  87.3  93.0  93.0  89.7  2.5 
PNNV  97.3  93.9  93.2  97.2  97.3  95.4  2.2 
PNNVC  95.8  91.9  93.2  94.9  95.8  93.9  1.7 
PNNVC–CG  11.9  72.0  74.5  65.4  84.5  58.5  32.0 
SVM  97.2  94.4  97.2  98.1  98.1  97.2  0.7 
GEP  92.4  92.3  94.1  96.7  96.7  93.9  2.1 
k–Means  89.4  89.7  85.6  93.9  93.9  89.7  3.4 
MLP  89.1  89.0  89.5  91.7  91.7  89.8  1.3 
RBFN  77.9  77.8  77.9  77.6  77.9  77.8  0.1 
LVQN  78.1  77.8  77.9  77.6  78.1  77.9  0.2 
The test accuracy values (in %) determined for four considered trainingtest subsets for dermatology data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  85.9  89.6  87.5  91.7  91.7  88.7  2.5 
PNNC  89.4  90.6  90.3  94.4  94.4  91.2  2.2 
PNNV  96.5  97.2  95.8  100.0  100.0  97.6  1.8 
PNNVC  90.1  96.2  91.7  97.2  97.2  93.8  3.4 
PNNVC–CG  55.6  89.6  86.2  94.4  94.4  81.5  17.5 
SVM  98.6  96.2  97.2  94.4  98.6  96.6  1.8 
GEP  97.2  98.1  94.4  94.4  98.1  96.0  1.9 
k–Means  87.3  89.6  88.9  88.9  89.6  88.7  1.0 
MLP  73.9  70.7  75.6  76.7  76.7  74.2  2.6 
RBFN  78.9  79.3  76.4  80.6  80.6  78.8  1.8 
LVQN  66.4  31.1  73.2  69.4  73.2  60.0  19.5 
The test accuracy values (in %) determined for the four considered training–test subsets for the Statlog heart data set
Data partitions [%]  

Model  60/40  70/30  80/20  90/10  max  avr  sd 
PNNS  75.9  87.6  77.8  92.6  92.6  83.5  8.0 
PNNC  79.6  90.1  79.6  96.3  96.3  86.4  8.2 
PNNV  83.3  91.4  83.3  100.0  100.0  89.5  8.0 
PNNVC  83.3  91.4  87.0  100.0  100.0  90.4  7.2 
PNNVC–CG  63.0  59.3  51.9  74.1  74.1  62.1  9.3 
SVM  80.6  90.1  81.5  88.9  90.1  85.3  4.9 
GEP  76.9  86.4  81.5  88.9  88.9  83.4  5.3 
k–Means  63.9  76.9  66.7  81.5  81.5  70.0  7.8 
MLP  75.8  86.4  76.5  87.8  87.8  81.6  6.4 
RBFN  81.5  88.9  81.5  92.6  92.6  86.1  5.6 
LVQN  76.9  86.9  75.2  82.6  86.9  80.4  5.4 
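As a reading aid for the tables above, the last three columns can be reproduced from the four partition results. The snippet below is a minimal sketch which assumes that avr is the arithmetic mean and sd the sample standard deviation; this convention matches, e.g., the k–Means row of the Haberman survival table.

```python
import statistics

def summarize(accuracies):
    """Reproduce the max/avr/sd columns of the tables from the four
    per-partition test accuracies (in %), rounded to one decimal."""
    return (max(accuracies),
            round(statistics.mean(accuracies), 1),
            round(statistics.stdev(accuracies), 1))

# k-Means row of the Haberman survival table (60/40, 70/30, 80/20, 90/10):
print(summarize([69.7, 69.6, 70.5, 67.7]))  # (70.5, 69.4, 1.2)
```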
5.4 Illustration of the PNN training process
The changes of the smoothing parameter values observed during the training process result from the operation of the proposed procedure. The magnitude of these changes decreases in the subsequent stages of the procedure (e.g., Figs. 4, 6 and 9). Large modifications of the smoothing parameter make it possible either to find the optimal σ after a small number of steps (t=6 in Fig. 5) or to narrow the range of its possible optimal values (Fig. 9).
Another feature worth noting is that the reinforcement signal follows the changes of the training accuracy: r becomes negative when Acc^train decreases and positive when Acc^train increases.
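The coupling between the reinforcement signal and the training accuracy can be sketched as a stateless Q(0)-learning loop. The action set (multiplicative updates of σ), the learning rate, the ε-greedy exploration and the reward definition below are illustrative assumptions for a scalar-σ PNN (the PNNS model), not the paper's exact settings.

```python
import random

def train_sigma(acc_train, sigma0=0.5, steps=30, alpha=0.1, epsilon=0.2):
    """Stateless Q(0)-learning sketch for tuning the PNN smoothing
    parameter sigma. Actions scale sigma up or down; the reward r is
    the change in training accuracy, so r > 0 when Acc_train improves
    and r < 0 when it deteriorates."""
    actions = (0.9, 1.1)           # shrink or grow sigma (assumed action set)
    q = {a: 0.0 for a in actions}  # one Q-value per action (no state)
    sigma, best_sigma = sigma0, sigma0
    prev_acc = best_acc = acc_train(sigma)
    for _ in range(steps):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=q.get)
        sigma *= a
        acc = acc_train(sigma)
        r = acc - prev_acc           # reinforcement signal
        q[a] += alpha * (r - q[a])   # stateless Q(0) update
        if acc > best_acc:
            best_acc, best_sigma = acc, sigma
        prev_acc = acc
    return best_sigma, best_acc
```

With `acc_train` returning the training accuracy of a PNN built with a given σ, the loop keeps the best σ seen so far, mirroring the shrinking update magnitudes described above.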
On the basis of the figures, the following observations can be made: (i) in the dermatology and Statlog heart classification tasks, the maximal value of Acc^test is obtained for Acc^train = 100 %; in the remaining classification problems, the maximal value of Acc^train does not guarantee the highest test accuracy; (ii) only for the Haberman survival data set is it impossible to achieve 100 % training accuracy; (iii) the Haberman survival and Statlog heart classification problems confirm that it is necessary to perform all stages of the procedure: in the first case, 100 % training accuracy is not reached in any stage, and in the second, the maximum value of Acc^test = 92.6 % is obtained only in the third stage of the procedure.
6 Conclusions
In this article, a procedure based on the Q(0)-learning algorithm was proposed for the adaptive selection and computation of the smoothing parameters of the probabilistic neural network. All possible classes of PNN models were considered; these models differ in the representation of the smoothing parameter. The application of a Q(0)-learning based procedure to PNN parameter tuning is the novel element of this work. It is also worth noting that a comparison of all types of probabilistic neural networks has not previously been presented in the literature.
The proposed approach was tested on six data sets and compared with PNN trained by the conjugate gradient procedure, the SVM algorithm, the GEP classifier, the k–Means method, the multilayer perceptron, the radial basis function neural network and the learning vector quantization neural network. In three classification problems, at least one of the PNNC, PNNV or PNNVC models trained by the proposed procedure provided the highest average accuracy. In four out of six cases, PNNS was the second-worst classifier. This indicates that representing the smoothing parameter as a vector or a matrix contributes to a higher prediction ability of the PNN. Moreover, PNN trained by the conjugate gradient procedure yielded the lowest accuracy in all six classification cases. Thus, proposing an alternative method for probabilistic neural network training is by all means justified.
Acknowledgements
This work was supported in part by Rzeszow University of Technology Grant No. U–235/DS and U–8613/DS.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.