Adaptive neuro-fuzzy inference system (ANFIS)
Jang [15] presented in 1993 a hybrid model that inherits the merits of neural networks (NN) and fuzzy logic in a single framework called ANFIS. It employs the Takagi–Sugeno inference model, which generates a nonlinear mapping from the input space onto the output using fuzzy IF–THEN rules. ANFIS comprises five layers, as illustrated in Fig. 1.
Let x and y be the crisp inputs to node i. In the first layer, the output of each node is stated as
$$ O_{1,i} = \mu_{{A_{i} }} \left( x \right),\quad i = 1,2 $$
(1)
$$ O_{1,i} = \mu_{{B_{i - 2} }} \left( y \right),\quad i = 3,4 $$
(2)
where Ai and Bi−2 are the linguistic labels associated with the membership functions \({\mu }_{{A}_{i}}\left(x\right)\) and \({\mu }_{{B}_{i-2}}\left(y\right)\), respectively. These functions are defined by the following generalized Gaussian function [1]:
$$ \mu_{{A_{i} }} (x) = e^{{ - \left( {\frac{{x - p_{i} }}{{\alpha_{i} }}} \right)^{2} }} $$
(3)
where pi and αi are the mean and standard deviation of the data, respectively. In the literature, they are known as the premise parameter set.
In the second layer, each node produces the firing strength by the following rule:
$$ O_{2,i} = w_{i} = \mu_{{A_{i} }} (x) \cdot \mu_{{B_{i - 2} }} (y),\quad i = 1,2 $$
(4)
The output of each node, in the third layer, is the normalized firing strength obtained by the following equation:
$$ O_{3,i} = \overline{w}_{i} = \frac{{w_{i} }}{{w_{1} + w_{2} }},\quad i = 1,2 $$
(5)
The fourth layer is composed of adaptive nodes and each adaptive node creates the output, according to the function:
$$ O_{4,i} = \overline{w}_{i} f_{i} = \overline{w}_{i} \left( {p_{i} x + q_{i} y + r_{i} } \right) $$
(6)
where pi, qi and ri are the consequent parameters of the ith adaptive node.
Finally, the fifth layer contains a single node whose value, the overall output, is defined as:
$$ O_{5} = \sum\limits_{i} {\overline{w}_{i} f_{i} } $$
(7)
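The five-layer computation of Eqs. (1)–(7) can be sketched as follows. This is a minimal illustration for a two-input, two-rule Takagi–Sugeno system; the function and parameter names are ours, not taken from any standard ANFIS implementation.

```python
import numpy as np

def gaussian_mf(x, p, alpha):
    """Generalized Gaussian membership function (Eq. 3)."""
    return np.exp(-((x - p) / alpha) ** 2)

def anfis_forward(x, y, premise, consequent):
    """One forward pass through the five ANFIS layers for a
    two-input, two-rule system.
    premise: {"A": (p, alpha), "B": (p, alpha)} arrays of length 2
    consequent: array of shape (2, 3), rows (p_i, q_i, r_i)"""
    (pA, aA), (pB, aB) = premise["A"], premise["B"]
    # Layer 1: membership grades (Eqs. 1-2)
    muA = gaussian_mf(x, pA, aA)
    muB = gaussian_mf(y, pB, aB)
    # Layer 2: firing strengths (Eq. 4)
    w = muA * muB
    # Layer 3: normalized firing strengths (Eq. 5)
    w_bar = w / w.sum()
    # Layer 4: consequent outputs weighted by normalized strengths (Eq. 6)
    p, q, r = consequent.T
    f = p * x + q * y + r
    # Layer 5: overall output as the weighted sum (Eq. 7)
    return float(np.sum(w_bar * f))
```

With both consequent rows set to (1, 0, 0), every rule output equals x, so the network output reduces to x regardless of the firing strengths, which is a quick sanity check of the normalization in layer 3.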
The adjustable premise and consequent parameters play a vital role in deciding the performance of ANFIS. Tuning them is difficult: a wide search space, slow convergence, easy entrapment in local optima, inconsistent accuracy and high computational time are recurring issues. Hybrid optimization algorithms are therefore widely employed to minimize these drawbacks.
Differential evolution
Differential evolution (DE) is an evolutionary algorithm introduced by Storn and Price in 1997 [21]. The algorithm comprises three operations: mutation, crossover and selection. A wrapper method is adopted for choosing the feature set during feature selection. Consider an optimization problem of dimension d with d parameters; a population of n solution vectors xi, i = 1, 2, …, n, is first generated. For each solution at any generation t, the chromosome is represented by
$$ x_{i}^{t} = \left( {x_{1,i}^{t} ,x_{2,i}^{t} , \ldots ,x_{d,i}^{t} } \right) $$
(8)
Mutation: At any generation t, for each target vector xi, three distinct vectors xp, xq and xr are chosen at random. The donor vector is then given by
$$ v_{i}^{t + 1} = x_{p}^{t} + F\left( {x_{q}^{t} - x_{r}^{t} } \right) $$
(9)
F is a real constant factor in the range [0, 2], called the differential weight or mutation factor; values between 0 and 1 are the usual choice for stability. F is a positive control parameter that scales the amplification of the difference vector, and its value has to be chosen carefully: small values produce small mutation steps and lengthen the convergence time of the algorithm, while large values facilitate exploration but tend to overshoot good optima. To improve local exploration and maintain diversity, small F values are usually preferred.
Crossover: This process is controlled by the crossover constant Cr, which ranges between 0 and 1. Cr influences the diversity of the algorithm, as it controls the number of elements that change; larger values introduce more variation into the new population and therefore increase exploration, so a compromise has to be struck to ensure both local and global search capabilities. Crossover is performed on each parameter: a uniformly distributed random number ri ∈ [0, 1] is generated, and the jth component of the trial vector ui is computed as
$$ u_{j,i}^{t + 1} = \left\{ {\begin{array}{*{20}c} {v_{j,i}^{t + 1} } & {{\text{if}}\;\;r_{i} \le C_{r} ,} \\ {x_{j,i}^{t} } & {\text{otherwise,}} \\ \end{array} } \right.\;\;\;j = 1,2, \ldots ,d. $$
(10)
In this way, each component is randomly exchanged with the corresponding component of the donor.
Selection: Selection follows the same principle as in GA; the fitter individual, i.e., the one with the lower cost value, is retained:
$$ x_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {u_{i}^{t + 1} } \hfill & {{\text{if}}\;\;f\left( {u_{i}^{t + 1} } \right) \le f\left( {x_{i}^{t} } \right),} \hfill \\ {x_{i}^{t} } \hfill & {\text{otherwise}.} \hfill \\ \end{array} } \right. $$
(11)
The search performance depends on controlling the two most sensitive parameters, the crossover probability Cr and the differential weight F. Cr = 0.5 is found suitable in most cases, and the population size n can be chosen between 5d and 10d. Pseudocode for DE is depicted in Fig. 2.
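The mutation, crossover and selection steps above can be sketched as a minimal DE/rand/1/bin loop. The function name, the bounds handling and the clipping of trial vectors are our own illustrative choices, not part of the original algorithm.

```python
import numpy as np

def differential_evolution(cost, bounds, n=40, F=0.8, Cr=0.5,
                           max_gen=200, seed=0):
    """Minimal DE sketch: minimize `cost` over box `bounds`,
    a list of (low, high) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    pop = rng.uniform(lo, hi, size=(n, d))
    fit = np.array([cost(x) for x in pop])
    for _ in range(max_gen):
        for i in range(n):
            # Mutation (Eq. 9): three distinct vectors, none equal to i
            p, q, r = rng.choice([j for j in range(n) if j != i],
                                 size=3, replace=False)
            v = pop[p] + F * (pop[q] - pop[r])
            # Binomial crossover (Eq. 10) with constant Cr
            mask = rng.random(d) <= Cr
            mask[rng.integers(d)] = True   # keep at least one donor gene
            u = np.clip(np.where(mask, v, pop[i]), lo, hi)
            # Selection (Eq. 11): keep the fitter of trial and target
            fu = cost(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

For example, minimizing the 2-D sphere function `lambda v: float(np.sum(v**2))` over `[(-5, 5)] * 2` drives the cost to near zero within the default generation budget.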
Glowworm swarm optimization (GSO)
This algorithm, proposed by Krishnanand and Ghose [16], is a simple method with few parameters to adjust and is said to have a good convergence rate. Glowworms locate one another and share information by emitting a rhythmic, beam-like glow. Each glowworm finds neighbors within its search scope and moves from its initial position toward better positions, so that the swarm eventually gathers at one or more extremum points. In GSO, the attraction between individual glowworms is proportional to brightness and inversely proportional to the distance between them. The fitness function depends on the positions of the individuals. Pseudocode is given in Fig. 3.
The steps of the GSO process are detailed below.
1. Initialize the parameters: n—number of glowworms, l0—initial luciferin value, r0—initial dynamic decision-domain radius, s—step size, nt—neighborhood threshold, ρ—luciferin decay coefficient, γ—luciferin update coefficient, β—decision-domain update coefficient, rs—maximum search radius, t—iteration number.
2. The fitness value J(xi(t)) is transmuted into a luciferin value li(t) as given below:
$$ l_{i} (t) = (1 - \rho )l_{i} (t - 1) + \gamma J(x_{i} (t)),\;\;\;x_{i} (t){-\!\!-}{\text{position of glowworm}}\;i\;{\text{at time}}\;t $$
(12)
3. Within its dynamic decision domain \(r_{d}^{i} (t)\), each glowworm i selects the individuals with higher luciferin values to form its neighborhood set Ni(t):
$$ N_{i} (t) = \left\{ {j:\left\| {x_{j} (t) - x_{i} (t)} \right\| \le r_{d}^{i} (t);\;l_{i} (t) < l_{j} (t)} \right\} $$
(13)
4. The probability that individual i moves toward neighbor j is given by
$$ P_{ij} (t) = \frac{{l_{j} (t) - l_{i} (t)}}{{\sum\nolimits_{{k \in N_{i} (t)}} {l_{k} (t) - l_{i} (t)} }} $$
(14)
5. The position of individual i is updated by
$$ x_{i} (t + 1) = x_{i} (t) + s\left( {\frac{{x_{j} (t) - x_{i} (t)}}{{\left\| {x_{j} (t) - x_{i} (t)} \right\|}}} \right) $$
(15)
6. The dynamic decision domain is updated as
$$ r_{d}^{i} (t + 1) = \min \left\{ {r_{s} ,\max \left\{ {0,r_{d}^{i} (t) + \beta \left( {n_{t} - \left| {N_{i} (t)} \right|} \right)} \right\}} \right\} $$
(16)
The initial neighborhood range of each glowworm is denoted by r0, and nt is the parameter that controls the number of neighbors.
7. The iterations continue until the maximum luciferin value is obtained.
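Steps 1–7 can be sketched as follows. The parameter defaults, the clipping to the search box and the zero-distance guard are our own illustrative assumptions, not part of the original algorithm.

```python
import numpy as np

def gso(J, bounds, n=30, l0=5.0, r0=3.0, rs=3.0, s=0.03,
        rho=0.4, gamma=0.6, beta=0.08, nt=5, max_iter=200, seed=0):
    """Sketch of GSO for maximizing J over box `bounds`."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    x = rng.uniform(lo, hi, size=(n, d))   # step 1: initialize positions
    l = np.full(n, l0)                     # luciferin values
    rd = np.full(n, r0)                    # decision-domain radii
    for _ in range(max_iter):
        # Step 2: luciferin update (Eq. 12)
        l = (1 - rho) * l + gamma * np.array([J(xi) for xi in x])
        new_x = x.copy()
        for i in range(n):
            # Step 3: neighbors inside r_d with higher luciferin (Eq. 13)
            dist = np.linalg.norm(x - x[i], axis=1)
            N = np.where((dist <= rd[i]) & (l > l[i]))[0]
            if N.size:
                # Step 4: move probabilities (Eq. 14)
                p = (l[N] - l[i]) / np.sum(l[N] - l[i])
                j = rng.choice(N, p=p)
                # Step 5: unit-step move toward the chosen neighbor (Eq. 15)
                diff = x[j] - x[i]
                dn = np.linalg.norm(diff)
                if dn > 0:                 # guard against coincident positions
                    new_x[i] = np.clip(x[i] + s * diff / dn, lo, hi)
            # Step 6: decision-domain update (Eq. 16)
            rd[i] = min(rs, max(0.0, rd[i] + beta * (nt - N.size)))
        x = new_x
    best = int(np.argmax(l))               # step 7: brightest glowworm
    return x[best], float(l[best])
```

Because the movement is driven only by relative luciferin levels, the swarm can gather at several peaks of a multimodal J rather than a single global optimum.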
FCM clustering
Defining the membership functions is the most vital step in ANFIS, and it is essentially a clustering problem; hence, FCM is employed to obtain a smaller number of fuzzy rules. In FCM, the degree to which each data point belongs to the different clusters is obtained by minimizing the objective function:
$$ T_{r} = \sum\limits_{i = 1}^{N} {\sum\limits_{t = 1}^{C} {g_{it}^{r} \left\| {x_{i} - c_{t} } \right\|^{2} } } ,\;\;\;1 < r < \infty $$
(17)
where r is a real number > 1, and git denotes the degree of membership of the measured datum xi ∈ Rd in the cluster with center ct ∈ Rd. Minimizing the above equation results in a fuzzy partitioning, with the memberships git and the cluster centers ct updated using Eqs. (18) and (19):
$$ g_{it} = \frac{1}{{\sum\nolimits_{k = 1}^{C} {\left( {\frac{{\left\| {x_{i} - c_{t} } \right\|}}{{\left\| {x_{i} - c_{k} } \right\|}}} \right)^{{\frac{2}{r - 1}}} } }} $$
(18)
$$ c_{t} = \frac{{\sum\nolimits_{i = 1}^{N} {g_{it}^{r} x_{i} } }}{{\sum\nolimits_{i = 1}^{N} {g_{it}^{r} } }} $$
(19)
The iteration stops when \(\max_{it} \{ |g_{it}^{(k + 1)} - g_{it}^{(k)} |\} < \epsilon\), where ε ∈ [0, 1] is a stopping criterion and k is the iteration index. The previous steps are repeated until this condition is attained.
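The alternating updates of Eqs. (18) and (19) together with the stopping test above can be sketched as follows. The random initialization of the partition matrix and the small floor on distances (to avoid division by zero at a center) are our own assumptions.

```python
import numpy as np

def fcm(X, C, r=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means sketch: X has shape (N, d), C is the number
    of clusters, r > 1 is the fuzzifier of Eq. 17."""
    rng = np.random.default_rng(seed)
    N = len(X)
    g = rng.random((N, C))
    g /= g.sum(axis=1, keepdims=True)       # random fuzzy partition
    for _ in range(max_iter):
        # Eq. 19: cluster centers as membership-weighted means
        gr = g ** r
        centers = (gr.T @ X) / gr.sum(axis=0)[:, None]
        # Eq. 18: memberships from pairwise distance ratios
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)      # guard against zero distance
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (r - 1.0))
        g_new = 1.0 / ratio.sum(axis=2)
        # Stopping test: largest membership change below eps
        if np.max(np.abs(g_new - g)) < eps:
            g = g_new
            break
        g = g_new
    return g, centers
```

Each row of the returned membership matrix sums to one by construction of Eq. (18), so the resulting cluster centers can be used directly to initialize the Gaussian membership functions of ANFIS.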
Proposed methodology
In the proposed work, a novel predictive model for medical diagnosis is built by using a modified glowworm swarm optimization (GSO) algorithm to enhance the performance of ANFIS. To prevent GSO from getting trapped in local minima, DE is used to improve its behavior. The methodology is illustrated in Fig. 4. The proposed approach, called the DE–GSO–ANFIS model, enhances ANFIS by supplying the best weights between layers 4 and 5.
The input data are obtained from the datasets and divided into training and test sets, normally in a 70:30 ratio. Fuzzy C-means (FCM) is applied to compute the required number of membership functions through clustering [31]: the dataset is grouped into clusters, and ANFIS uses these outcomes to initiate the rest of the process. The weights of ANFIS are adapted using the DE–GSO algorithm, and exploration then starts, with DE–GSO searching for solutions in the problem search space.
DE is employed to generate the initial population of GSO. GSO then uses this population to search for the best possible weights for ANFIS. The fitness value of the population is given by
$$ {\text{Obj}}_{{{\text{func}}}} = \min \sum\limits_{i = 1}^{N} {\left( {{\text{Obs}}_{i} - {\text{Pred}}_{i} } \right)^{2} } $$
(20)
where Obsi is the ith observed value and Predi is the ith predicted value.
This function is the sum of squared errors between the observed and predicted values, and the best solution is the one with the minimum objective value. The chosen weights are therefore updated so as to reduce the error between the observed and predicted values during training, and are then carried forward to the ANFIS that produces the problem outcomes. The training phase stops when the stop conditions (maximum number of iterations, or error below a small threshold) are satisfied; otherwise, the DE–GSO loop continues until the maximum number of iterations is reached. ANFIS is then constructed from the parameters of the best solution. In the testing phase, the best detected weights are supplied to the ANFIS to generate the results.
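Eq. (20) amounts to a sum-of-squared-errors fitness over the training set. A minimal sketch follows, where `anfis_predict` is a hypothetical callable standing in for the ANFIS forward pass with the candidate layer-4/5 weights; its name and signature are our own, not from the source.

```python
import numpy as np

def fitness(weights, anfis_predict, X, y_obs):
    """Sum-of-squared-errors objective of Eq. 20 (lower is better).
    `anfis_predict(xi, weights)` is a hypothetical ANFIS forward pass."""
    y_pred = np.array([anfis_predict(xi, weights) for xi in X])
    return float(np.sum((y_obs - y_pred) ** 2))
```

DE–GSO would call this function once per candidate weight vector and keep the candidate with the smallest value, which is exactly the selection criterion described above.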