Probabilistic neural network training procedure based on Q(0)-learning algorithm in medical data classification
Abstract
In this article, an iterative procedure is proposed for the training process of the probabilistic neural network (PNN). In each stage of this procedure, the Q(0)-learning algorithm is utilized for the adaptation of the PNN smoothing parameter (σ). Four classes of PNN models are considered in this study. In the first, simplest model, the smoothing parameter takes the form of a scalar; in the second model, σ is a vector whose elements are computed with respect to the class index; the third model has a smoothing parameter vector whose components are determined depending on each input attribute; finally, the last and most complex of the analyzed networks uses a matrix of smoothing parameters, where each element depends on both the class and the input feature index. The main idea of the presented approach is the appropriate update of the smoothing parameter values according to the Q(0)-learning algorithm. The proposed procedure is verified on six repository data sets. The prediction ability of the algorithm is assessed by computing the test accuracy on 10 %, 20 %, 30 % and 40 % of examples drawn randomly from each input data set. The results are compared with the test accuracy obtained by a PNN trained using the conjugate gradient procedure, the support vector machine algorithm, the gene expression programming classifier, the k–Means method, the multilayer perceptron, the radial basis function neural network and the learning vector quantization neural network. It is shown that the presented procedure can be applied to the automatic adaptation of the smoothing parameter of each of the considered PNN models and that it constitutes an alternative training method. A PNN trained by the Q(0)-learning based approach is a classifier that can be counted among the top models in data classification problems.
Keywords
probabilistic neural network, smoothing parameter, training procedure, Q(0)-learning algorithm, reinforcement learning, accuracy
1 Introduction
Probabilistic neural network (PNN) is an example of a radial basis function based model used effectively in data classification problems. It was proposed by Donald Specht [37, 38] and, as a data classifier, has drawn the attention of researchers in the data mining domain. For example, it is applied in medical diagnosis and prediction [23, 25, 28], image classification and recognition [7, 20, 27], bearing fault detection [32], digital image watermarking [45], earthquake magnitude prediction [1] and classification in a time-varying environment [31].
PNN is a feed-forward neural network with a complex structure. It is composed of an input layer, a pattern layer, a summation layer and an output layer. Despite its complexity, PNN has only a single training parameter: the smoothing parameter of the probability density functions (PDFs) which are utilized for the activation of the neurons in the pattern layer. Computing the network response therefore requires only a single input–output signal pass. However, only a properly chosen value of the smoothing parameter ensures the correctness of the model's response in terms of generalization ability. The value of σ must be estimated on the basis of the PNN's classification performance, which is usually done in an iterative manner.
Within the process of the smoothing parameter estimation two issues must be addressed. The first one pertains to the selection of σ in PDF for the pattern layer neurons of PNN. Four possible approaches are applied, i.e. a single parameter for the whole model [37, 38], a single parameter for each class [1], a single parameter for each data attribute [11, 39], and a single parameter for each attribute and a class [7, 11, 12].
The second problem related to the smoothing parameter estimation for PNN is concerned with the computation of the σ value. In the literature, different procedures have been developed. For example, in [39], the conjugate gradient descent (ascent) is used to iteratively find the set of σ's which maximizes the optimization criterion. Chtioui et al. [7] exploit the conjugate gradient method and the approximate Newton algorithm to determine the smoothing parameters associated with each data attribute and class. In [12], the authors utilize the particle swarm optimization algorithm to estimate the matrix of the smoothing parameters for the probability density functions in the pattern layer. An interesting study is presented in [47], where a gap-based approach for smoothing parameter adaptation is proposed. The authors provide the formula for σ on the basis of the gap computed between the two nearest points of the data set. The solution is applied to PNNs for which the smoothing parameter takes the form of a scalar and of a vector whose elements are associated with each data feature.
As one can observe, the choice of the smoothing parameter plays a crucial role in the training process of the probabilistic neural network. This fact is of particular importance when PNN has a different σ for: each class, each attribute, or each class and attribute. The task of smoothing parameter selection can then be considered a high-dimensional function optimization problem. The reinforcement learning (RL) algorithm is an efficient method for solving such problems, e.g. finding extrema of some family of functions [46] or computing the set of optimal weights for a multilayer perceptron [40]. The RL method is also frequently applied in various engineering tasks. It is used in nonstationary serial supply chain inventory control [18], adaptive control of nonlinear objects [43], adjusting robot behavior in an autonomous navigation system [26] and path planning for improving the positioning accuracy of a mobile microrobot [22]. There are also studies which propose the use of RL in non-technical domains, e.g. in the explanation of dopamine neuronal activity [5] or in an educational system to improve the pedagogical policy [16].
In this work, we introduce a novel procedure for the computation of the smoothing parameter of the PNN model. This procedure uses the Q(0)-learning algorithm. The method adjusts the smoothing parameter according to four different strategies: single σ for the whole network, single σ for each class, single σ for each data attribute and single σ for each data attribute and each class. The results of our proposed solution are compared to the outcomes of PNN for which the smoothing parameter is calculated using the conjugate gradient procedure and, additionally, to the support vector machine classifier, gene expression programming algorithm, k–Means clustering method, multilayer perceptron, radial basis function neural network and learning vector quantization neural network in medical data classification problems.
The authors of the present study have already proposed the application of the reinforcement learning algorithm to the computation of the smoothing parameter of radial basis function based neural networks [19]. In that work, the stateless Q-learning algorithm was used for the adaptive computation of the smoothing parameter of the networks.
This paper is organized as follows. Section 2 discusses the probabilistic neural network, highlighting its basics, structure, principle of operation and the problem of smoothing parameter selection. Section 3 presents the basics of the reinforcement learning algorithm applied in this work, namely the Q(0)-learning algorithm. In Section 4, we present the proposed procedure: the problem statement is provided, the general idea of applying the Q(0)-learning algorithm to the choice of the smoothing parameter is described and, finally, the details of the algorithm are given. Section 5 presents the data sets used in this research, the algorithm settings and the obtained empirical results, along with an illustration of the PNN training process. In this part of our work, we compare the performance of our method with that of the PNN whose σ is determined by means of the conjugate gradient method and, additionally, with the reference classifiers and neural networks. Finally, in Section 6, we conclude our work.
2 Probabilistic neural network
Probabilistic neural network is a data classification model which implements the Bayesian decision rule. Assume that: (1) there is a data pattern \(\mathbf{x}\in \mathbb{R}^{n}\) which belongs to one of the predefined classes g=1,…,G; (2) the probability of x belonging to the class g equals p_{g}; (3) the cost of classifying x into class g is c_{g}; (4) the probability density functions y_{1}(x),y_{2}(x),…,y_{G}(x) for all classes are known. Then, according to the Bayes theorem, for g≠h, the vector x is classified into the class g if p_{g}c_{g}y_{g}(x)>p_{h}c_{h}y_{h}(x). Usually p_{g}=p_{h} and c_{g}=c_{h}; thus, if y_{g}(x)>y_{h}(x), the vector x is classified into the class g.
In real data classification problems, the probability density functions y_{g}(x) are not known, since the data set distribution is usually unknown. Therefore, an approximation of each PDF must be determined. Such an approximation can be obtained using the Parzen method [29]. The Gaussian function is a common choice for the PDF kernel since it satisfies the conditions required by Parzen's method.
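To make this construction concrete, the following minimal sketch (our own illustration, not the paper's implementation) classifies a pattern with a PNN that uses a single scalar σ. The Gaussian kernel's normalizing constant is dropped, since with equal priors and costs it cancels in the comparison y_g(x) > y_h(x):

```python
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=1.0):
    """Classify x with a Parzen-window PNN using a scalar smoothing
    parameter: average a Gaussian kernel over each class's training
    patterns and pick the class with the largest density estimate."""
    classes = np.unique(y_train)
    densities = []
    for g in classes:
        Xg = X_train[y_train == g]                      # patterns of class g
        sq_dist = np.sum((Xg - x) ** 2, axis=1)         # ||x - x_i||^2
        kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian kernel
        densities.append(kernel.mean())                 # Parzen PDF estimate
    return classes[int(np.argmax(densities))]           # Bayes rule: max y_g(x)

# toy example: two well-separated 2-D classes
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.2, 0.0]), X, y, sigma=0.5))  # -> 0
```

The choice of `sigma` directly shapes the density estimates, which is why the rest of the paper is devoted to its adaptation.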
3 Reinforcement learning
3.1 Introduction
Reinforcement learning addresses the problem of an agent that must learn to perform a task through trial-and-error interaction with an unknown environment. The agent and the environment interact continuously until the terminal state is reached. The agent senses the environment and selects an action to perform. Depending on the effect of its action, the agent obtains a reward. Its goal is to maximize the discounted sum of future reinforcements r_{t} received in the long run from any time step t, which is usually formalized as \(\sum \nolimits _{t=0}^{\infty }\gamma ^{t}r_{t}\), where \(\gamma \in \left [ 0,1\right ] \) is the agent's discount rate [41].
The mathematical model of the reinforcement learning method is a Markov Decision Process (MDP). An MDP is defined as the quadruple \(\langle S, A, P_{s_{t}s_{t+1}}^{a_{t}}, r_{t} \rangle \), where S is a set of states, A is a set of actions, \(P_{s_{t}s_{t+1}}^{a_{t}}\) denotes the probability of the transition to the state s_{t+1}∈S after the execution of the action a_{t}∈A in the state s_{t}∈S, and r_{t} is the reward received after this transition.
3.2 Q(0)-learning
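Q(0)-learning is the one-step (λ = 0) tabular Q-learning method: after each transition, the action value of the visited state–action pair is moved toward the temporal-difference target by the update rate α. A minimal sketch (the helper name `q_update` is ours; the α and γ values follow the settings discussed in Section 5.2):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.01, gamma=0.95):
    """One-step Q-learning update of the tabular action value function
    Q with shape (n_states, n_actions)."""
    td_target = r + gamma * np.max(Q[s_next])  # reward plus best next-state value
    Q[s, a] += alpha * (td_target - Q[s, a])   # move Q(s, a) toward the target
    return Q

Q = np.zeros((3, 2))
q_update(Q, s=0, a=1, r=1.0, s_next=2, alpha=0.5)
print(Q[0, 1])  # 0.5 * (1.0 + 0.95 * 0.0 - 0.0) = 0.5
```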
4 Application of Q(0)-learning based procedure to the adaptation of PNN’s smoothing parameter
4.1 Problem statement
The task is to find the optimal value of the smoothing parameter which maximizes the accuracy (10). For the PNNC, PNNV and PNNVC models, this is a multidimensional optimization problem. As the solution, we propose a new procedure based on the Q(0)-learning algorithm. The set of system states S, the set of actions A and the reinforcement signal r required by the Q(0)-learning method are defined along with the description of the algorithm.
4.2 General idea
For the adaptation of the smoothing parameter, the procedure based on the Q(0)-learning algorithm is proposed for the PNNS, PNNC, PNNV and PNNVC models. The introduction of the Q(0)-learning algorithm is based on the assumption that in the PNN training process it is possible to distinguish two elements which interact with each other: the environment and the agent. The environment is composed of the data set used for the training process, the PNN model and the accuracy measure. The agent, on the basis of the policy represented by the action value function Q, chooses an action a_{t} in a state s_{t}. The action a_{t} is used to modify the smoothing parameter. In this work, the state is represented by the accuracy measure. This has a natural interpretation, since the state defined in this way is a function of the PNN output, which depends on the smoothing parameter. The output of PNN is computed for the training and test set in order to determine the training and test accuracies. On the basis of the training accuracy, the next state s_{t+1} and the reinforcement signal r_{t} are computed. The reinforcement signal provides information about the change of the training accuracy, taking a negative value when the accuracy decreases and a positive value when the accuracy increases. The interaction between the agent and the environment results in both the modification of the action value function Q and the change of the smoothing parameter.
The main assumption of the proposed procedure is to perform the training of the PNN model on the training set in order to maximize the training accuracy (10). Additionally, PNN is tested by computing the accuracy on the test set. The highest test accuracy and its corresponding value of the smoothing parameter are stored. Finding the highest test accuracy of PNN provides the optimal smoothing parameter in terms of the prediction ability.
In each stage, the following steps are repeated:
1. choose an action a_{t} using the actual policy derived from the action value function Q;
2. update a single element of σ_{V} with the value of a_{t};
3. compute the training and test accuracy Acc_{t} according to (10);
4. update the maximal test accuracy \(Acc_{\max }\) and the corresponding \(\boldsymbol {\sigma }_{\max }^{(m)}\);
5. calculate the reinforcement signal r_{t} on the basis of the training accuracy;
6. update the action value function Q.
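The steps listed above can be sketched as a single inner-loop function. Here `eval_acc`, the accuracy-to-state discretization, the action set and the ±1 reward are simplified stand-ins for the paper's accuracy (10), Definitions 1–3 and the action sets of (12), not the paper's exact implementation:

```python
import numpy as np

def training_step(Q, sigma_v, i, s, acc_prev, eval_acc, rng,
                  actions=(-1.0, -0.1, 0.1, 1.0),
                  eps=0.05, alpha=0.01, gamma=0.95):
    """One inner-loop step: eps-greedy action choice, update of a single
    sigma component, reward from the change in training accuracy, and
    the one-step Q-learning update."""
    # step 1: eps-greedy choice of an action from the policy stored in Q
    if rng.random() < eps:
        a = int(rng.integers(len(actions)))
    else:
        a = int(np.argmax(Q[s]))
    sigma_v[i] += actions[a]                 # step 2: modify one element of sigma_V
    acc = eval_acc(sigma_v)                  # step 3: training accuracy for new sigma
    r = 1.0 if acc > acc_prev else -1.0      # step 5: sign of the accuracy change
    s_next = int(round(acc * (Q.shape[0] - 1)))  # discretize accuracy into a state
    # step 6: one-step Q-learning update of the action value function
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return s_next, acc

# toy environment: accuracy peaks when the single sigma component equals 1
eval_acc = lambda sv: float(np.clip(1.0 - abs(sv[0] - 1.0) / 10.0, 0.0, 1.0))
rng = np.random.default_rng(0)
Q = np.zeros((11, 4))                        # 11 discrete accuracy states, 4 actions
sigma = np.array([3.0])
acc = eval_acc(sigma)
s = int(round(acc * 10))
for _ in range(50):
    s, acc = training_step(Q, sigma, 0, s, acc, eval_acc, rng)
```

Step 4 (tracking the maximal test accuracy) is omitted here for brevity; in the full procedure it runs alongside the loop.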
Once the first stage of the procedure is completed, the second one begins. Here, \(\boldsymbol {\sigma }_{\max }^{(2)}\) is initialized with the optimal value from the previous stage and the action set is changed: all action magnitudes are decreased by an order of magnitude. The PNNV model is trained using Algorithm 1 and the smoothing parameter vector which maximizes the training accuracy is updated.
The procedure is performed M times, each time: (1) updating \(\boldsymbol {\sigma }_{\max }^{(m)}\) on the basis of \(\boldsymbol {\sigma }_{\max }^{(m-1)}\), (2) decreasing the absolute values of the actions and (3) finding new smoothing parameter values which maximize the training accuracy. This type of approach, in which the initial absolute values of the actions are large, makes it possible to select the smoothing parameter values within a broad range. The iterative decrease of the action magnitudes in subsequent stages narrows the range of \(\boldsymbol {\sigma }_{\max }^{(m)}\). This, in turn, allows a finer search for the optimal parameters of the PNNV model.
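One way to realize the stage-wise shrinking of the symmetric action set is sketched below; the base magnitudes are illustrative, not the paper's (12), though the sets are six-element as in Section 5.2:

```python
def action_set(m, base=(10.0, 5.0, 1.0)):
    """Symmetric action set A^(m): the first-stage positive magnitudes,
    scaled down by an order of magnitude per stage m = 1, 2, 3, together
    with their negatives, ordered as in Definition 2."""
    scale = 10.0 ** -(m - 1)
    pos = [a * scale for a in base]          # a_1 > a_2 > ... > a_p
    return [-a for a in pos] + list(reversed(pos))

print(action_set(1))  # [-10.0, -5.0, -1.0, 1.0, 5.0, 10.0]
print(action_set(2))  # [-1.0, -0.5, -0.1, 0.1, 0.5, 1.0]
```

With this construction, the first stage explores σ over a wide range (up to ±10 per step), while later stages only fine-tune the values found so far.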
Once all M stages are performed, the highest test accuracy \(Acc_{\max }\) is computed for \(\boldsymbol {\sigma }_{\max }^{(M)}\). Such a solution provides the highest prediction ability of PNNV.
As shown, the above procedure utilizes RL in the problem of smoothing parameter adjustment to perform a classification task. However, it is also possible to combine PNN and RL in the other way. In the work of Heinen and Engel [15], a new incremental probabilistic neural network (IPNN) is proposed. The matrix of the smoothing parameters of IPNN is used for the action selection in the RL problem. IPNN is therefore utilized as the action value function approximator.
4.3 Application of the Q(0)-learning algorithm to adaptive computation of the smoothing parameter
In this subsection, we explain the details concerning the application of the Q(0)-learning algorithm to the problem of σ_{V} adaptive selection for the PNNV classifier. As mentioned before, the algorithm is solely highlighted for this type of network since Q(0)-learning works in a similar manner for PNNS, PNNC and PNNVC. The only difference is related to the number of smoothing parameters which have to be updated. For PNNV, there are n parameters of the model, while for PNNS, PNNC and PNNVC, there exist 1, G and n×G smoothing parameters, respectively.
The use of the Q(0)-learning algorithm for the choice of σ_{V} parameter requires the definition of the set of the system states, the action set and the reinforcement signal.
Definition 1
The set of system states is defined by the accuracy measure: \(S=\left \lbrace 0,\frac {1}{l_{T}},\frac {2}{l_{T}},\ldots ,\frac {l_{T}-1}{l_{T}},1\right \rbrace \). S takes the real values from the interval [0,1]. The total number of states is therefore l_{T}+1.
Definition 2
A^{(m)} is the symmetric set of actions of the following form: \(A^{(m)}=\left \lbrace -a_{1}^{(m)},-a_{2}^{(m)},\ldots ,-a_{p^{(m)}}^{(m)},a_{p^{(m)}}^{(m)},\ldots ,a_{2}^{(m)},a_{1}^{(m)}\right \rbrace \), where p^{(m)} denotes half of the cardinality of this set in stage m of the procedure.
In each stage of the procedure, the smoothing parameters of PNNV are increased or decreased by the element values of the A^{(m)} action set. The form of the action set in the first stage (A^{(1)}) allows the modification of σ_{V} by large values, which gives the possibility of searching for optimal parameters inside a broad range. At most, the elements of σ_{V} can be modified by the value of ±10. The first stage of the procedure ends with a candidate for the optimum of the smoothing parameter. The subsequent decrease of the absolute action values in A^{(2)} shrinks the domain of possible optimal parameter values. Finally, in A^{(3)}, the absolute values of the actions are so small that the smoothing parameters of PNNV change only slightly. A large change of σ_{V} in the third stage of the procedure is not required because the optimal modification route has already been established in the first two stages.
In order to maximize the training accuracy of PNNV, the reinforcement signal r_{t} should reward the agent when the training accuracy increases and punish it when the accuracy decreases. This idea can be formalized simply as follows.
Definition 3
Such a form of the reinforcement signal, combined with the action value function update, strengthens or weakens the confidence that the choice of an action is beneficial.
Algorithm 1 shows the application of the Q(0)-learning method to the adaptive choice of σ_{V} for the PNNV classifier. This algorithm is executed in each m-th stage of the procedure shown in Fig. 3.
The algorithm starts with the initialization of \(Acc_{\max }\) on the basis of the smoothing parameter values found in the previous stage of the procedure, except in the first stage, where σ_{V} is initialized with ones. \(Acc_{\max }\) stores the maximal test accuracy computed on the test set during the training process. Then, in step 2, the action value function Q is set to zero.
Algorithm 1 The proposed algorithm of smoothing parameter adaptation of the PNNV model with the use of the Q(0)-learning method for the m-th stage of the procedure
It is worth noting that the type of the PNN model influences the number of smoothing parameter updates. For PNNS, PNNC, PNNV and PNNVC, the number of smoothing parameter updates equals \(t_{\max }\), \(t_{\max }\times G\), \(t_{\max }\times n\) and \(t_{\max }\times n\times G\), respectively.
5 Experiments
In this section, we present the simulation results in the classification of medical databases obtained by PNNS, PNNC, PNNV and PNNVC trained by the proposed procedure. These results are compared with the outcomes obtained by PNN trained using the conjugate gradient procedure (PNNVC–CG), support vector machine (SVM) algorithm, gene expression programming (GEP) classifier, k–Means method, multilayer perceptron (MLP), radial basis function neural network (RBFN) and learning vector quantization neural network (LVQN). The data sets used in these experiments are also briefly described and the adjustments of the algorithm are provided. Moreover, the illustration of the PNNS training process is presented.
5.1 Data sets used in the study
Wisconsin breast cancer database [24] that consists of 683 instances with 9 attributes. The data is divided into two groups: 444 benign cases and 239 malignant cases.
Pima Indians diabetes data set [36] that includes 768 cases having 8 features. Two classes of data are considered: samples tested negative (500 records) and samples tested positive (268 records).
Haberman’s survival data [21] that contains 306 patients who underwent surgery for breast cancer. For each instance, 3 variables are measured. The 5-year survival status establishes two input classes: patients who survived 5 years or longer (225 records) and patients who died within 5 years (81 records).
Cardiotocography data set [3] that comprises 2126 measurements of fetal heart rate and uterine contraction features on 22 attribute cardiotocograms classified by expert obstetricians. The classes are coded into three states: normal (1655 cases), suspect (295 cases) and pathological (176 cases).
Dermatology data [13] that includes 358 instances each of 34 features. Six data classes are considered: psoriasis (111 cases), lichen planus (71 cases), seborrheic dermatitis (60 cases), chronic dermatitis (48 cases), pityriasis rosea (48 cases) and pityriasis rubra pilaris (20 cases).
Statlog heart database [3] that consists of 270 instances and 13 attributes. There are two classes to be predicted: absence (150 cases) or presence (120 cases) of heart disease.
5.2 Algorithms’ settings
In the case of the proposed algorithm, the initial values of the action value function Q are set to zero. Three six–element action sets proposed in (12) are used. The maximum number of the training steps \(t_{\max }=100\) is assumed. We apply such a value of \(t_{\max }\) in order to show that at a relatively small number of training steps it is possible to achieve satisfactory results. Additionally, the Q(0)-learning algorithm requires appropriate selection of its intrinsic parameters: the greedy parameter, the update rate and the discount factor.
The greedy parameter ε determines the probability of random action selection and must be taken from the interval \(\left [ 0,1\right ] \). If ε=0.05, on average 5 out of 100 actions are chosen randomly from the action set. The remaining 95 % of action selections are performed according to the learned policy represented by the Q–table. If the elements of the Q–table are the same (as in the initial iterations of Algorithm 1), the actions are selected randomly. In this work, the greedy parameter is chosen experimentally from the set \(\left \lbrace 0.5, 0.05, 0.005 \right \rbrace \). Unfortunately, for \(t_{\max }=100\), the use of ε=0.5 does not yield repeatable results. In turn, for ε=0.005, it is observed that some actions are never selected. Therefore, ε=0.05 is utilized in the experiments.
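The ε-greedy selection described above, with ties in the Q-table broken at random (the function name and table shapes are our own illustration), can be sketched as:

```python
import numpy as np

def select_action(Q, s, eps, rng):
    """eps-greedy selection: with probability eps pick a random action,
    otherwise the greedy action from the Q-table; ties (e.g. the all-zero
    initial table) are broken at random."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    best = np.flatnonzero(Q[s] == Q[s].max())  # all maximizing actions
    return int(rng.choice(best))               # random tie-breaking

rng = np.random.default_rng(1)
Q = np.zeros((5, 6))        # 5 states, a six-element action set
Q[2, 3] = 1.0
# with eps = 0 the greedy action in state 2 is always action 3
print(select_action(Q, 2, eps=0.0, rng=rng))  # -> 3
```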
The α parameter determines the update rate for the action value function Q. The small value of this factor increases the time of the training process. Its large value introduces the oscillations of Q elements [34]. The proper selection of α has a significant influence on the convergence of the training process. From the theoretical point of view, one requires that α is large enough to overcome any initial conditions or possible random fluctuations and it should decrease its value in time. However, in practical applications, the constant values of this factor are mostly used. Admittedly, this approach does not assure the convergence of the learning process, but a stable policy can be reached. In our study, we choose α experimentally from the set \(\left \lbrace 0.1, 0.01, 0.001 \right \rbrace \). For all three parameter values, similar results are obtained. In the final simulation, we assume α=0.01.
The discount factor γ determines the relative importance of immediate and future rewards. This parameter is mostly picked arbitrarily near 1, e.g. 0.8 [6], 0.9 [2, 33] or 0.95 [4]. In this contribution, γ=0.95.
PNNVC–CG used in the simulations is the probabilistic neural network trained by the conjugate gradient procedure. The model is a built-in tool of the DTREG predictive modeling software [35]. In the experiments, we use the network for which the smoothing parameter is adapted for each input feature and class separately. The starting values of the smoothing parameters for PNNVC–CG are between 0.0001 and 10 [35].
The SVM algorithm [42] is used in this work as a data classifier. The model is trained by means of the SMO algorithm [30] available in Matlab's Bioinformatics Toolbox. Multiclass classification problems are solved by applying the one-against-all method. In all data classification cases, the radial basis function kernel is applied with an experimental grid search for both the C constraint and the sc spread constant: \(C=\left \{10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}\right \}\) and sc={0.08, 0.2, 0.3, 0.5, 0.8, 1.2, 1.5, 2, 5, 10, 50, 80, 100, 200, 500}, respectively.
The head size, the number of genes within each chromosome, the linking functions between genes, the computing functions in the head, the fitness functions and the genetic operators used for the GEP classifier

| Parameter | Values |
|---|---|
| Head size | 3, 4, 5, 6, 7, 8 |
| Number of genes | 1, 2, …, 12 |
| Linking function | Addition, Multiplication, Logical OR |
| Computing functions | +, −, ∗, /, −x, 1/x, \(b/\left (1+\exp \left (ax\right )\right )\), \(\exp \left (-\left (x-a\right )^{2}/\left (2b^{2}\right )\right )\) |
| Fitness function | Sensitivity/Specificity, Number of hits with penalty, Mean squared error |
| Genetic operators | Mutation = 0.044, Inversion = 0.1, IS Transposition = 0.1, RIS Transposition = 0.1, Gene Transposition = 0.1, One-Point Recombination = 0.3, Two-Point Recombination = 0.3, Gene Recombination = 0.1 |
The k–Means clustering algorithm [14] is used in the comparison for classification purposes. The predictions are made for the unknown cases by assigning them the category of the closest cluster center. In the simulations, the number of clusters that provides the highest test accuracy is selected.
MLP neural network is simulated with one or two hidden layers activated by the transfer functions from the set {linear, hyperbolic tangent, logistic}. The same set of transfer functions is applied for the output neurons, for which the sum squared error function is calculated. The number of hidden layer neurons is optimized in order to minimize the network error. The model is trained with gradient descent with momentum and adaptive learning rate backpropagation algorithms [8].
For RBFN and LVQN neural networks, the number of hidden neurons is selected empirically from the set {2,4,6,…,100}. The optimal number of hidden neurons is taken so that the sum squared error for each model is minimized. The spread constant in RBFN hidden layer activation function is chosen experimentally from the interval [0.1,10] with the step size 0.1.
5.3 Empirical results
In this study, the performance of the PNN models, for which the smoothing parameter is determined using the Q(0)-learning based procedure, is evaluated on the input data partitioned in the following way. Firstly, the testing subsets are created by applying a random extraction of 10 %, 20 %, 30 % and 40 % of cases out of the input database. Then, the training sets are created using the rest of the patterns, i.e. 90 %, 80 %, 70 % and 60 % of data, respectively. This type of data division is introduced on purpose, since considering all possible training–test subsets is computationally prohibitive: the number of ways of dividing l training patterns into v sets, each of size k, equals \( l! / \left (v! \cdot (k!)^{v}\right ) \) [17].
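The partition count above grows very quickly even for small data sets, as a direct computation shows (the helper name is ours):

```python
from math import factorial

def n_partitions(l, v, k):
    """Number of ways of dividing l patterns into v sets of size k each
    (l = v * k): l! / (v! * (k!)**v)."""
    assert l == v * k
    return factorial(l) // (factorial(v) * factorial(k) ** v)

print(n_partitions(6, 3, 2))    # 6 patterns into 3 pairs -> 15
print(n_partitions(12, 4, 3))   # already 15400 for 12 patterns
```

For realistic sample sizes such as those in Section 5.1 the count is astronomically large, which motivates the fixed random partitions used here.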
The remaining classifiers used in the comparative research: PNNVC–CG, SVM, GEP, k–Means, MLP, RBFN and LVQN are trained and validated on the same data subsets. The use of the same training/test sets for all the models makes the obtained results comparable.
- 1. In the classification of the Wisconsin breast cancer data, out of all compared models, PNNC and PNNVC reach the highest average test accuracy, equal to 99.0 %. In the case of the Haberman and dermatology data classification problems, the highest average test accuracy is obtained for PNNVC (81.2 %) and PNNV (97.6 %), respectively.
- 2. The SVM model provides the highest average test accuracy in the classification of the Pima Indians diabetes data set (77.2 %) and the cardiotocography database (97.2 %). In these two classification tasks, PNNV is the second best model with the test accuracy lower by 0.5 % and 1.8 %, respectively. For the Statlog classification problem, the GEP algorithm yields the highest average test accuracy, which equals 94.6 %. This result is followed by the outcomes of PNNVC, PNNV and PNNC.
- 3. Except for the dermatology classification problem, PNNVC–CG turns out to be the worst classifier. The k–Means algorithm and the remaining reference neural networks (MLP, RBFN and LVQN) achieve lower test accuracy than the PNNV, PNNVC, SVM and GEP classifiers.
The test accuracy values (in %) determined for four considered training-test subsets for Wisconsin breast cancer data set
Data partitions (training/test) in %:

| Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
|---|---|---|---|---|---|---|---|
PNNS | 97.8 | 97.1 | 93.4 | 98.5 | 98.5 | 96.7 | 2.3 |
PNNC | 99.6 | 98.5 | 97.8 | 100.0 | 100.0 | 99.0 | 1.0 |
PNNV | 99.0 | 98.5 | 96.3 | 100.0 | 100.0 | 98.4 | 1.5 |
PNNVC | 99.6 | 98.5 | 97.8 | 100.0 | 100.0 | 99.0 | 1.0 |
PNNVC–CG | 95.3 | 96.6 | 90.5 | 89.7 | 96.6 | 93.1 | 3.4 |
SVM | 98.9 | 97.6 | 95.6 | 97.1 | 98.9 | 97.3 | 1.4 |
GEP | 99.3 | 97.6 | 96.3 | 98.5 | 98.5 | 96.8 | 1.5 |
k–Means | 96.7 | 97.1 | 94.9 | 98.5 | 98.5 | 96.8 | 1.5 |
MLP | 97.7 | 96.4 | 93.9 | 96.8 | 97.7 | 96.2 | 1.7 |
RBFN | 97.4 | 96.1 | 95.6 | 95.6 | 97.4 | 96.2 | 0.9 |
LVQN | 97.8 | 97.6 | 92.3 | 96.9 | 97.8 | 96.2 | 2.6 |
The test accuracy values (in %) determined for four considered training-test subsets for Pima Indians diabetes data set
Data partitions (training/test) in %:

| Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
|---|---|---|---|---|---|---|---|
PNNS | 69.7 | 66.1 | 69.5 | 75.3 | 75.3 | 70.1 | 3.8 |
PNNC | 69.1 | 71.3 | 72.1 | 75.3 | 75.3 | 71.9 | 2.6 |
PNNV | 73.9 | 76.5 | 76.0 | 80.5 | 80.5 | 76.7 | 2.8 |
PNNVC | 74.6 | 76.5 | 75.3 | 79.2 | 79.2 | 76.4 | 2.0 |
PNNVC–CG | 65.5 | 30.5 | 68.2 | 66.3 | 68.2 | 57.6 | 18.1 |
SVM | 78.2 | 78.7 | 75.3 | 76.6 | 78.7 | 77.2 | 1.5 |
GEP | 72.6 | 77.4 | 76.0 | 80.5 | 80.5 | 76.6 | 3.3 |
k–Means | 68.3 | 70.0 | 70.8 | 71.4 | 71.4 | 70.2 | 1.2 |
MLP | 73.2 | 73.6 | 74.8 | 73.2 | 74.8 | 73.7 | 0.8 |
RBFN | 65.8 | 67.8 | 65.6 | 68.8 | 68.8 | 67.0 | 1.6 |
LVQN | 65.8 | 65.9 | 67.4 | 66.1 | 67.4 | 66.3 | 0.7 |
The test accuracy values (in %) determined for four considered training-test subsets for Haberman survival data set
Data partitions (training/test) in %:

| Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
|---|---|---|---|---|---|---|---|
PNNS | 77.0 | 75.0 | 73.0 | 74.2 | 77.0 | 75.8 | 1.5 |
PNNC | 79.5 | 77.2 | 78.7 | 77.4 | 79.5 | 78.2 | 1.1 |
PNNV | 76.2 | 78.3 | 82.0 | 87.1 | 87.1 | 80.9 | 4.8 |
PNNVC | 77.9 | 79.3 | 80.3 | 87.1 | 87.1 | 81.2 | 4.1 |
PNNVC–CG | 50.0 | 69.6 | 68.8 | 51.6 | 69.6 | 60.0 | 10.6 |
SVM | 78.7 | 76.1 | 78.7 | 80.6 | 80.6 | 78.5 | 1.9 |
GEP | 75.4 | 75.0 | 73.8 | 77.4 | 77.4 | 75.4 | 1.5 |
k–Means | 69.7 | 69.6 | 70.5 | 67.7 | 70.5 | 69.4 | 1.2 |
MLP | 74.2 | 74.2 | 74.9 | 76.4 | 76.4 | 74.9 | 1.0 |
RBFN | 74.6 | 73.9 | 75.4 | 74.2 | 75.4 | 74.5 | 0.7 |
LVQN | 76.4 | 75.8 | 78.4 | 74.2 | 78.4 | 76.2 | 1.7 |
The test accuracy values (in %) determined for four considered training-test subsets for cardiotocography data set
Data partitions (training/test) in %:

| Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
|---|---|---|---|---|---|---|---|
PNNS | 88.9 | 84.4 | 85.9 | 85.0 | 88.9 | 86.0 | 2.0 |
PNNC | 90.2 | 88.4 | 87.3 | 93.0 | 93.0 | 89.7 | 2.5 |
PNNV | 97.3 | 93.9 | 93.2 | 97.2 | 97.3 | 95.4 | 2.2 |
PNNVC | 95.8 | 91.9 | 93.2 | 94.9 | 95.8 | 93.9 | 1.7 |
PNNVC–CG | 11.9 | 72.0 | 74.5 | 65.4 | 84.5 | 58.5 | 32.0 |
SVM | 97.2 | 94.4 | 97.2 | 98.1 | 98.1 | 97.2 | 0.7 |
GEP | 92.4 | 92.3 | 94.1 | 96.7 | 96.7 | 93.9 | 2.1 |
k–Means | 89.4 | 89.7 | 85.6 | 93.9 | 93.9 | 89.7 | 3.4 |
MLP | 89.1 | 89.0 | 89.5 | 91.7 | 91.7 | 89.8 | 1.3 |
RBFN | 77.9 | 77.8 | 77.9 | 77.6 | 77.9 | 77.8 | 0.1 |
LVQN | 78.1 | 77.8 | 77.9 | 77.6 | 78.1 | 77.9 | 0.2 |
The test accuracy values (in %) determined for four considered training-test subsets for the dermatology data set
Data partitions [%]
Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
---|---|---|---|---|---|---|---|
PNNS | 85.9 | 89.6 | 87.5 | 91.7 | 91.7 | 88.7 | 2.5 |
PNNC | 89.4 | 90.6 | 90.3 | 94.4 | 94.4 | 91.2 | 2.2 |
PNNV | 96.5 | 97.2 | 95.8 | 100.0 | 100.0 | 97.6 | 1.8 |
PNNVC | 90.1 | 96.2 | 91.7 | 97.2 | 97.2 | 93.8 | 3.4 |
PNNVC–CG | 55.6 | 89.6 | 86.2 | 94.4 | 94.4 | 81.5 | 17.5 |
SVM | 98.6 | 96.2 | 97.2 | 94.4 | 98.6 | 96.6 | 1.8 |
GEP | 97.2 | 98.1 | 94.4 | 94.4 | 98.1 | 96.0 | 1.9 |
k–Means | 87.3 | 89.6 | 88.9 | 88.9 | 89.6 | 88.7 | 1.0 |
MLP | 73.9 | 70.7 | 75.6 | 76.7 | 76.7 | 74.2 | 2.6 |
RBFN | 78.9 | 79.3 | 76.4 | 80.6 | 80.6 | 78.8 | 1.8 |
LVQN | 66.4 | 31.1 | 73.2 | 69.4 | 73.2 | 60.0 | 19.5 |
The test accuracy values (in %) determined for four considered training-test subsets for the Statlog heart data set
Data partitions [%]
Model | 60/40 | 70/30 | 80/20 | 90/10 | max | avr | sd |
---|---|---|---|---|---|---|---|
PNNS | 75.9 | 87.6 | 77.8 | 92.6 | 92.6 | 83.5 | 8.0 |
PNNC | 79.6 | 90.1 | 79.6 | 96.3 | 96.3 | 86.4 | 8.2 |
PNNV | 83.3 | 91.4 | 83.3 | 100.0 | 100.0 | 89.5 | 8.0 |
PNNVC | 83.3 | 91.4 | 87.0 | 100.0 | 100.0 | 90.4 | 7.2 |
PNNVC–CG | 63.0 | 59.3 | 51.9 | 74.1 | 74.1 | 62.1 | 9.3 |
SVM | 80.6 | 90.1 | 81.5 | 88.9 | 90.1 | 85.3 | 4.9 |
GEP | 76.9 | 86.4 | 81.5 | 88.9 | 88.9 | 83.4 | 5.3 |
k–Means | 63.9 | 76.9 | 66.7 | 81.5 | 81.5 | 70.0 | 7.8 |
MLP | 75.8 | 86.4 | 76.5 | 87.8 | 87.8 | 81.6 | 6.4 |
RBFN | 81.5 | 88.9 | 81.5 | 92.6 | 92.6 | 86.1 | 5.6 |
LVQN | 76.9 | 86.9 | 75.2 | 82.6 | 86.9 | 80.4 | 5.4 |
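The summary columns in the tables above can be reproduced with a short script. The sd column appears to be the sample standard deviation (n−1 denominator), which the sketch below assumes:

```python
import statistics

def row_stats(accuracies):
    """Summarize the per-partition test accuracies the way the tables do:
    the maximum of the four values, their mean (avr), and the sample
    standard deviation (sd, n-1 denominator)."""
    return (max(accuracies),
            round(statistics.mean(accuracies), 1),
            round(statistics.stdev(accuracies), 1))

# PNNS row of the Statlog heart table (60/40, 70/30, 80/20, 90/10 splits)
print(row_stats([75.9, 87.6, 77.8, 92.6]))  # → (92.6, 83.5, 8.0)
```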
5.4 Illustration of the PNN training process
The changes of the smoothing parameter values during the training process result from the implementation of the proposed procedure. The magnitude of these changes becomes smaller in the subsequent stages of the procedure (e.g., Figs. 4, 6 and 9). Large modifications of the smoothing parameter make it possible either to find the optimal σ after a small number of steps (t=6 in Fig. 5), or to narrow the range of its possible optimal values (Fig. 9).
Another interesting feature worth noting is that the reinforcement signal follows the changes of the training accuracy: r becomes negative when Acc^{train} decreases and takes a positive value when Acc^{train} increases.
On the basis of the figures, the following observations can also be made: (i) in the dermatology and Statlog heart data classification tasks, the maximal value of Acc^{test} is obtained for Acc^{train}=100 %, whereas in the remaining classification problems the maximal value of Acc^{train} does not guarantee the highest test accuracy; (ii) only for the Haberman survival data set is it impossible to achieve 100 % training accuracy; (iii) the Haberman survival and Statlog heart classification problems confirm that it is necessary to perform all stages of the procedure: in the first case, 100 % training accuracy is not reached in any stage, and in the second, the maximum value of Acc^{test}=92.6 % is obtained in the third stage of the procedure.
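The interplay described above — a reinforcement signal that tracks changes in training accuracy and drives updates of σ — can be illustrated with a minimal, single-state Q(0)-learning sketch. The action set (multiplicative factors 0.9 and 1.1), the step count, and the constants `alpha`, `gamma` and `eps` are illustrative assumptions, not the paper's actual settings, and `train_acc` stands in for training and evaluating the PNN at a given σ:

```python
import random

def q0_sigma_stage(train_acc, sigma0, actions=(0.9, 1.1),
                   steps=30, alpha=0.5, gamma=0.9, eps=0.2):
    """One illustrative stage of a Q(0)-learning loop that scales the
    smoothing parameter sigma up or down.  train_acc(sigma) is assumed
    to return the PNN training accuracy for a given sigma."""
    q = {a: 0.0 for a in actions}            # single-state Q-table
    sigma, best_sigma = sigma0, sigma0
    acc = best_acc = train_acc(sigma)
    for _ in range(steps):
        # epsilon-greedy action selection
        a = (random.choice(actions) if random.random() < eps
             else max(q, key=q.get))
        new_sigma = sigma * a
        new_acc = train_acc(new_sigma)
        # reinforcement follows the change of training accuracy:
        # positive when Acc_train rises, negative when it falls
        r = 1.0 if new_acc > acc else (-1.0 if new_acc < acc else 0.0)
        # Q(0) update (one-step Q-learning, no eligibility traces)
        q[a] += alpha * (r + gamma * max(q.values()) - q[a])
        sigma, acc = new_sigma, new_acc
        if acc > best_acc:
            best_acc, best_sigma = acc, sigma
    return best_sigma, best_acc
```

Running several such stages with progressively smaller action factors would mirror the shrinking σ modifications visible in the cited figures.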
6 Conclusions
In this article, a procedure based on the Q(0)-learning algorithm was proposed for the adaptive selection and computation of the smoothing parameters of the probabilistic neural network. All possible classes of PNN models were considered; these models differ in the representation of the smoothing parameter. The application of a Q(0)-learning based procedure to PNN parameter tuning is the element of novelty. It is also worth noting that a comparison of all types of probabilistic neural networks has not previously been presented in the literature.
The proposed approach was tested on six data sets and compared with PNN trained by the conjugate gradient procedure, the SVM algorithm, the GEP classifier, the k–Means method, the multilayer perceptron, the radial basis function neural network and the learning vector quantization neural network. In three classification problems, at least one of the PNNC, PNNV or PNNVC models trained by the proposed procedure provided the highest average accuracy. In four out of six cases, PNNS was the second-worst classifier, which indicates that representing the smoothing parameter as a vector or a matrix contributes to a higher prediction ability of the PNN. Moreover, PNN trained by the conjugate gradient procedure obtained the lowest accuracy in all six data classification cases; thus, proposing an alternative method for probabilistic neural network training is by all means justified.
Acknowledgements
This work was supported in part by Rzeszow University of Technology Grant No. U–235/DS and U–8613/DS.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.