An extended Takagi–Sugeno–Kang inference system (TSK+) with fuzzy interpolation and its rule base generation
Abstract
A rule base covering the entire input domain is required for conventional Mamdani inference and Takagi–Sugeno–Kang (TSK) inference. Fuzzy interpolation enhances conventional fuzzy rule inference systems by allowing the use of sparse rule bases, in which certain inputs are not covered by any rule. Given that almost all existing fuzzy interpolation approaches were developed to support Mamdani inference, this paper presents a novel fuzzy interpolation approach that extends TSK inference. This paper also proposes a data-driven rule base generation method to support the extended TSK inference system. The proposed system enhances conventional TSK inference in two ways: (1) it works with incomplete or unevenly distributed data sets, or with incomplete expert knowledge, that yield only a sparse rule base, and (2) it simplifies complex fuzzy inference systems by using more compact rule bases without sacrificing system performance. The experimentation shows that the proposed system overall outperforms the existing approaches while using smaller rule bases.
Keywords
Fuzzy inference system, TSK, Fuzzy rule base generation, Fuzzy interpolation
1 Introduction
Fuzzy inference mechanisms are built upon fuzzy logic to map system inputs and outputs. A typical fuzzy inference system consists of a rule base and an inference engine. A number of inference engines have been developed, with the Mamdani inference (Mamdani (1977)) and the TSK inference (Takagi and Sugeno (1985)) being the most widely used. The Mamdani fuzzy model is more intuitive and suitable for handling linguistic inputs; its outputs are usually fuzzy sets, and thus, a defuzzification process is often required. In contrast, the TSK inference approach produces crisp outputs directly, as TSK fuzzy rules use polynomials as rule consequences. There are generally two types of rule bases used to support the two fuzzy inference engines, which are Mamdani-style rule bases and TSK-style rule bases accordingly.
A rule base, Mamdani-style or TSK-style, can either be translated from expert knowledge or extracted from data. A rule base produced by a knowledge-driven approach is therefore essentially a representation of human experts’ knowledge in the format of fuzzy rules (Negnevitsky (2005)). To enable this approach, the problem has to be comprehensible to humans and fully understood by human experts, who express it as linguistic rules; these are then interpreted as fuzzy rules by specifying the membership functions of the linguistic words. Recognising that expert knowledge may not always be available, data-driven approaches were proposed, which extract fuzzy rules from a set of training data using machine learning approaches (Rezaee and Zarandi (2010)). Both the Mamdani and TSK inference approaches are only workable with dense rule bases, each of which covers the entire input domain.
Fuzzy interpolation relaxes the requirement of dense rule bases from conventional fuzzy inference systems (Kóczy and Hirota (1993); Yang et al. (2017c)). When a given observation does not overlap with any rule antecedent, certain conclusions can still be generated by means of interpolation. In addition, fuzzy interpolation helps in system complexity reduction by removing the rules that can be approximated by their neighbours. A number of fuzzy interpolation approaches have been developed, such as Chen and Hsin (2015), Huang and Shen (2006, 2008), Kóczy and Hirota (1997), Naik et al. (2014), Yang and Shen (2011), Yang et al. (2017a), Shen and Yang (2011) and Yang and Shen (2013). However, all these existing fuzzy interpolation approaches were extensions of the Mamdani inference.
A novel fuzzy interpolation approach, which extends the TSK inference, is presented in this paper. The proposed fuzzy inference engine is workable with sparse, dense or imbalanced TSK-style rule bases, which is a further development on the seminal work of Li et al. (2017). In addition, a data-driven TSK-style rule base generation approach is also proposed to extract compact and concise rule bases from incomplete, imbalanced, normal or over-dense data sets. Note that sparse and imbalanced data sets are still commonly seen, regardless of the magnitude of the data sets in the era of big data. The proposed approach has been applied to two benchmark problems and a real-world problem in the field of cyber security. The experimentation demonstrated the power of the proposed approach in enhancing the conventional TSK inference method by means of broader applicability and better system efficiency, and competitive performance in reference to other machine learning approaches.
The rest of the paper is organised as follows. Section 2 introduces the theoretical underpinnings of the TSK fuzzy inference model and the TSK rule base generation approaches. Section 3 presents the extended TSK system, and Sect. 4 discusses the proposed rule base generation approach. Section 5 details the experimentation for demonstration and validation. Section 6 concludes the paper and suggests probable future developments.
2 Background
The conventional TSK inference system and rule base generation approaches are briefly reviewed in this section.
2.1 TSK inference
- 1 Determine the firing strength of each rule \(R_r\) (\(r\in \{1,2,\ldots ,n\}\)) by integrating the similarity degrees between its antecedents and the given inputs:
$$\begin{aligned} \alpha _r = S(A_{1}^*,A_{1r}) \wedge \ldots \wedge S(A_m^*,A_{mr}), \end{aligned}$$ (2)
where \(\wedge \) is a t-norm usually implemented as a minimum operator, and \(S(A_s^*,A_{sr})\) (\(s\in \{1,2,\ldots , m\}\)) is the similarity degree between fuzzy sets \(A_s^*\) and \(A_{sr}\):
$$\begin{aligned} S(A_s^*,A_{sr}) = \max \{\min \{\mu _{A_s^*}(x),\mu _{A_{sr}}(x)\}\}, \end{aligned}$$ (3)
where \(\mu _{A_{s}^*}(x) \text { and } \mu _{A_{sr}}(x)\) are the degrees of membership for a given value x within the domain.
- 2 Calculate the sub-output led by each rule \(R_r\) based on the given observation (\(A_1^*,\ldots ,A_m^*\)):
$$\begin{aligned} \begin{aligned}&f_r(x_1^*,\ldots ,x_m^*) \\&\quad = \beta _{0r}+\beta _{1r}Rep(A_1^*)+ \cdots + \beta _{mr}Rep(A_m^*), \end{aligned} \end{aligned}$$ (4)
where \(Rep(A^*_s)\) is the representative value or defuzzified value of fuzzy set \(A^*_s\), which is often calculated as the centre of area of the membership function.
- 3 Generate the final output by integrating all the sub-outputs from all the rules:
$$\begin{aligned} y=\frac{\displaystyle \sum \nolimits _{r=1}^{n}\alpha _r f_r(x_1^*,\ldots ,x_m^*)}{\displaystyle \sum \nolimits _{r=1}^{n}\alpha _r}. \end{aligned}$$ (5)
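The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. It assumes triangular antecedents written as `(a1, a2, a3)` and crisp (singleton) inputs, in which case the max-min similarity of Eq. 3 reduces to the membership value of the input and \(Rep(A_s^*)\) is the input itself. The function names `tri_membership` and `tsk_infer` are hypothetical.

```python
def tri_membership(x, a1, a2, a3):
    """Membership of a crisp value x in the triangular fuzzy set (a1, a2, a3)."""
    if x == a2:
        return 1.0
    if a1 < x < a2:
        return (x - a1) / (a2 - a1)
    if a2 < x < a3:
        return (a3 - x) / (a3 - a2)
    return 0.0


def tsk_infer(rules, x):
    """Conventional TSK inference for a crisp input vector x.

    rules: list of (antecedents, betas), where antecedents is a list of
    triangles (one per input) and betas is [beta_0, beta_1, ..., beta_m].
    Returns None when no rule fires (the denominator of Eq. 5 is zero).
    """
    numerator = 0.0
    denominator = 0.0
    for antecedents, betas in rules:
        # Eq. 2 with the minimum t-norm; Eq. 3 reduces to membership for crisp x.
        alpha = min(tri_membership(xs, *tri) for xs, tri in zip(x, antecedents))
        # Eq. 4: first-order polynomial consequent.
        f = betas[0] + sum(b * xs for b, xs in zip(betas[1:], x))
        numerator += alpha * f
        denominator += alpha
    if denominator == 0.0:
        return None
    # Eq. 5: firing-strength-weighted average of the sub-outputs.
    return numerator / denominator
```

For an input that lies outside every antecedent, all firing strengths are zero and Eq. 5 is undefined, so the sketch returns `None`; this is the situation that fuzzy interpolation is intended to address.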
2.2 TSK rule base generation
The antecedent variables of a TSK rule are represented as fuzzy sets and the consequence is represented by a linear polynomial function, as shown in Eq. 1. Data-driven fuzzy rule extraction approaches first partition the problem domain into multiple regions or rule clusters. Then, each region is represented by a TSK rule. Various clustering algorithms, such as K-Means (MacQueen (1967)) and its variations, can be used to divide the problem domain into sub-regions or rule clusters (Chen and Linkens (2004); Rezaee and Zarandi (2010)).
Given a rule cluster that contains a set of multi-dimensional data instances, a typical TSK fuzzy rule extraction process is performed in two steps: (1) rule antecedent determination and (2) consequent polynomial determination (Nandi and Klawonn (2007)). The rule antecedent determination process prescribes a fuzzy set, such as a Gaussian membership function (Chen and Linkens (2004)), to represent the information of the cluster on each dimension. The consequent polynomial can usually be determined by employing linear regression (Nandi and Klawonn (2007); Rezaee and Zarandi (2010)). Once each rule cluster has been expressed by a TSK rule, the TSK rule base can be assembled by combining all the extracted rules.
Note that a 0-order TSK fuzzy model is required if only symbolic labels are included in the output of the data set (Kerk et al. (2016)). In this case, the step of consequent polynomial determination is usually omitted. Instead, discrete crisp numbers are typically used to represent the symbolic output values. Accordingly, the rule base generation process differs: the labelled data set is first divided into multiple sub-data sets, each sharing the same label. Then, a clustering algorithm is applied to each sub-data set to generate rule clusters. Finally, each rule cluster is represented as the antecedents of a rule, with an integer number used as the 0-order consequence representing the label.
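The 0-order procedure described above can be illustrated with the following Python sketch, given under stated assumptions. The clustering step is passed in as a function (`cluster_fn`, a hypothetical parameter) and defaults to treating each sub-data set as a single cluster, standing in for an algorithm such as K-Means. Each antecedent is built as a triangle (minimum, mean, maximum) of the cluster on that dimension, which is an illustrative choice and not necessarily the construction used elsewhere in this paper.

```python
from collections import defaultdict


def split_by_label(instances, label_to_int):
    """Group labelled instances into sub-data sets keyed by integer consequence."""
    subsets = defaultdict(list)
    for features, label in instances:
        subsets[label_to_int[label]].append(features)
    return dict(subsets)


def zero_order_rules(instances, label_to_int, cluster_fn=lambda rows: [rows]):
    """Build 0-order TSK rules as (antecedent triangles, integer consequence).

    cluster_fn maps a list of feature tuples to a list of clusters; the
    default (one cluster per label) is a placeholder for a real clustering step.
    """
    rules = []
    subsets = split_by_label(instances, label_to_int)
    for consequence, rows in sorted(subsets.items()):
        for cluster in cluster_fn(rows):
            antecedents = []
            for column in zip(*cluster):
                # Assumed triangle: (min, mean, max) of the cluster on this input.
                antecedents.append((min(column), sum(column) / len(column), max(column)))
            rules.append((antecedents, consequence))
    return rules
```

Supplying a real clustering routine as `cluster_fn` would yield several rules per label, each with the same integer consequence.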
3 TSK inference with fuzzy interpolation (TSK+)
The conventional TSK fuzzy inference system is extended in this section by allowing the interpolation and extrapolation of inference results. The extended system, termed the TSK+ inference system, is thus workable with sparse, dense and imbalanced rule bases.
3.1 Modified similarity measure
Conventional TSK will fail if a given input does not overlap any rule antecedent in the rule base. This can be addressed using fuzzy interpolation such that the inference consequence can be approximated from the neighbouring rules of the given input. In order to enable this, the measure of firing strength used in the conventional TSK inference is modified based on a revised similarity measure proposed in Chen and Chen (2003). In particular, the similarity measure proposed in Chen and Chen (2003) is not sensitive to distance in addition to membership functions. This similarity measure is further extended in this subsection such that its sensitivity to distance is flexible and configurable to support the development of TSK+ inference engine.
- 1. A larger value of \(S(A,A')\) represents a higher degree of similarity between fuzzy sets A and \(A'\);
- 2. \(S(A,A') = 1\) if and only if fuzzy sets A and \(A'\) are identical;
- 3. \(S(A,A') > 0\) unless (\(a_1=a_2=a_3=0\) and \(a_1'=a_2' =a_3' =1\)) or (\(a_1=a_2=a_3=1\) and \(a_1'=a_2'=a_3' =0\)).
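As an illustration of the three properties, the following Python sketch implements one simple similarity measure that satisfies them. It is an assumption made for illustration and is not claimed to be the measure defined in this paper. It takes triangular fuzzy sets \((a_1,a_2,a_3)\) with \(a_1\le a_2\le a_3\) normalised to [0, 1], and multiplies a shape term by an exponential distance factor controlled by a hypothetical sensitivity parameter `h` (with `h = 0` the distance between the sets is ignored).

```python
import math


def rep(a):
    """Representative value of a triangular fuzzy set (its centre of area)."""
    return sum(a) / 3.0


def similarity(a, b, h=0.0):
    """Illustrative similarity of two triangular fuzzy sets on [0, 1].

    a, b: tuples (a1, a2, a3) with a1 <= a2 <= a3, all within [0, 1].
    h: hypothetical, non-negative distance-sensitivity parameter.
    """
    # Shape term: 1 minus the mean absolute difference of the three points.
    shape_term = 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / 3.0
    # Distance factor: lies in (0, 1] and equals 1 only when the
    # representative values coincide (or when h = 0).
    distance_factor = math.exp(-h * abs(rep(a) - rep(b)))
    return shape_term * distance_factor
```

Because the distance factor is strictly positive, the measure is zero only when the shape term is zero, which under the ordering constraint happens only for the two extreme pairs listed in the third property.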
3.2 Extended TSK inference
4 Sparse TSK rule base generation
A data-driven TSK-style rule base generation approach for the proposed TSK+ inference engine is presented in this section, which is outlined in Fig. 1. Given a data set \(\mathbb {T}\) which might be sparse, unevenly distributed, or dense, the system firstly groups the data instances into clusters using certain clustering algorithms. Then, each cluster is expressed as a TSK rule by employing linear regression. From this, an initial rule base is generated by combining all the extracted rules. Finally, the initialised rule base is optimised by applying the genetic algorithm (GA), which fine-tunes the membership functions of fuzzy sets in the rule base.
4.1 Rule base initialisation
Centroid-based clustering algorithms are traditionally employed in TSK fuzzy modelling to group similar objects together for rule extraction (Chen and Linkens (2004)), which is also the case in this work. However, differing from existing TSK-style rule base generation approaches, the proposed system is workable with dense data sets, sparse data sets and unevenly distributed data sets. Therefore, a two-level clustering scheme is applied in this work. The first level of clustering divides the given (dense/sparse) data set into multiple sub-data sets using the sparse K-Means clustering algorithm (Witten and Tibshirani (2010)). Owing to the nature of sparse K-Means clustering, the resulting sub-data sets are generally considered to be dense. The second level of clustering is applied to each obtained dense sub-data set to generate rule clusters for TSK fuzzy rule extraction by employing the standard K-Means clustering algorithm (MacQueen (1967)). Note that the number of clusters has to be pre-defined for both sparse K-Means and standard K-Means, which is discussed first below.
4.1.1 Number of clusters determination
4.1.2 Dense sub-data set generation
4.1.3 Rule cluster generation
4.1.4 Fuzzy rule extraction
Each determined cluster from the above steps is utilised to form one TSK fuzzy rule. A number of approaches have been proposed to use a Gaussian membership function to represent a cluster, such as Rezaee and Zarandi (2010). However, given the fact that most of the real-world data are not normally distributed, the cluster centroid usually is not identical with the centre of the Gaussian membership function, and thus, Gaussian membership functions may not be able to accurately represent the distribution of the calculated clusters. In order to prevent this and also keep computational efficiency as stated in Sect. 3, triangular membership functions are utilised in this work.
First-order polynomials are typically used as the consequences of TSK fuzzy rules. That is, \(y =\beta _{0r}+\beta _{1r}x_1 + \cdots +\beta _{mr}x_m\), where the parameters \(\beta _{0r}\) and \(\beta _{sr},\ s\in \{1,2,\ldots ,m\}\) are estimated using a linear regression approach. Locally weighted regression (LWR) is adopted in this work, owing to its ability to generate an independent model that is only related to the given cluster of data in the training data set (Nandi and Klawonn (2007); Rezaee and Zarandi (2010)). The rule consequence degenerates to 0-order if the values in the output dimension are discrete integer numbers. From this, the raw rule base is initialised by combining all the extracted rules, which is of the form of Eq. 1.
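To make the extraction of a single rule concrete, the following Python sketch builds one triangular antecedent and one first-order consequent from a cluster with a single input variable. It is a simplified stand-in: ordinary least squares is used here in place of LWR, the triangle is assumed to be (minimum, mean, maximum) of the cluster, and the function name `extract_rule` is hypothetical.

```python
def extract_rule(xs, ys):
    """Return ((a1, a2, a3), (beta0, beta1)) for a one-input rule cluster.

    xs, ys: input and output values of the data instances in the cluster.
    The consequent y = beta0 + beta1 * x is fitted by ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumed antecedent: support spans the cluster, core at the cluster mean.
    triangle = (min(xs), mean_x, max(xs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # With no spread in x the slope is undefined; fall back to a constant.
    beta1 = sxy / sxx if sxx > 0 else 0.0
    beta0 = mean_y - beta1 * mean_x
    return triangle, (beta0, beta1)
```

When all outputs in the cluster share the same value, the fitted slope is zero and the consequent reduces to a constant, which mirrors the 0-order case mentioned above.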
4.2 Rule base optimisation
The generated raw rule base is optimised in this section by fine-tuning the membership functions using a general-purpose optimisation search algorithm, the genetic algorithm (GA). GA has been successfully utilised in rule base optimisation, for example in Mucientes et al. (2009) and Tan et al. (2016). Briefly, GA is an adaptive heuristic search algorithm for solving both constrained and unconstrained optimisation problems, based on the evolutionary idea of natural selection that mimics biological evolution. The algorithm first initialises the population with random individuals. It then selects a number of individuals for reproduction by applying the genetic operators. The offspring and some of the selected existing individuals jointly form the next generation. The algorithm repeats this process until a satisfactory solution is generated or a maximum number of generations has been reached.
4.2.1 Problem representation
4.2.2 Population initialisation
The initial population \(\mathbb {P}= \{\textit{I}_1, \textit{I}_2,\ldots ,\textit{I}_{|\mathbb {P}|} \}\) is formed by taking the initialised rule base and its random variations. In order to guarantee that all the varied fuzzy sets are valid and convex, the constraint \(a_{sr}^{1}< a_{sr}^{2} < a_{sr}^{3}\) is applied to the genes representing each fuzzy set. The size of the population \(|\mathbb {P}|\) is a problem-specific adjustable parameter, typically ranging from tens to thousands, with 20–30 being used most commonly (Naik et al. (2014)).
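A possible way to realise this initialisation is sketched below in Python. The chromosome layout is an assumption: a flat list in which every three consecutive genes are the points of one triangular fuzzy set. The perturbation range `scale` and the function names are likewise hypothetical. Each perturbed triple is sorted and re-drawn if necessary so that the constraint \(a_{sr}^{1}< a_{sr}^{2} < a_{sr}^{3}\) holds.

```python
import random


def perturb_rule_base(genes, scale, rng):
    """Randomly vary a chromosome while keeping a1 < a2 < a3 for each fuzzy set."""
    varied = []
    for i in range(0, len(genes), 3):
        while True:
            # Perturb the three points of one fuzzy set, then restore their order.
            points = sorted(g + rng.uniform(-scale, scale) for g in genes[i:i + 3])
            if points[0] < points[1] < points[2]:
                break
        varied.extend(points)
    return varied


def init_population(genes, size, scale=0.5, seed=0):
    """Population = the initialised rule base plus (size - 1) random variations."""
    rng = random.Random(seed)
    return [list(genes)] + [perturb_rule_base(genes, scale, rng) for _ in range(size - 1)]
```

Keeping the unperturbed rule base as the first individual means the optimised result can never be worse than the initial rule base, provided the selection scheme retains the best individual.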
4.2.3 Objective function
4.2.4 Selection
4.2.5 Reproduction
Once a number of parents are selected, they breed individuals for the next generation using the genetic operators crossover and mutation, as shown in Fig. 4. In particular, crossover swaps contiguous parts of the genes of two individuals. In this work, the single-point crossover approach is adopted, which selects a crossover point and swaps all data beyond that point between the two parent chromosomes to generate two children. Note that the crossover point can only lie between genes that represent two different fuzzy sets, such that all the fuzzy sets remain valid and convex throughout the reproduction process.
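The restriction on the crossover point can be illustrated with the short Python sketch below. It assumes, as a simplification, that a chromosome is a flat list in which each fuzzy set occupies `genes_per_set` consecutive genes (three for a triangle); the function name is hypothetical. The cut index is always a multiple of `genes_per_set`, so no fuzzy set is ever split between the two parents.

```python
import random


def single_point_crossover(parent_a, parent_b, rng, genes_per_set=3):
    """Single-point crossover with the cut restricted to fuzzy-set boundaries."""
    n_sets = len(parent_a) // genes_per_set
    if n_sets < 2:
        # With a single fuzzy set there is no valid interior boundary.
        return list(parent_a), list(parent_b)
    # Choose one of the interior boundaries between consecutive fuzzy sets.
    cut = genes_per_set * rng.randint(1, n_sets - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Since every triple in a child is copied intact from one parent, the children inherit the validity of the parents' fuzzy sets without any repair step.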
4.2.6 Iteration and termination
The selection and reproduction processes are iterated until the pre-defined maximum number of iterations is reached or the value of the objective function regarding an individual is less than a pre-specified threshold. When the termination condition is satisfied, the fittest individual in the current population is the optimal solution.
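The two termination conditions can be sketched as a generic loop in Python. This is only an outline under assumptions: the objective is minimised, selection is simplified to keeping the better half of the population (a plain truncation scheme, not necessarily the selection method used in this work), and `objective` and `reproduce` are caller-supplied, hypothetical functions.

```python
import random


def run_ga(population, objective, reproduce, max_iterations, threshold, seed=0):
    """Iterate selection and reproduction until one termination condition holds."""
    rng = random.Random(seed)
    for _ in range(max_iterations):
        population = sorted(population, key=objective)
        # Stop early once the fittest individual is below the threshold.
        if objective(population[0]) < threshold:
            break
        # Simplified selection: keep the better half as parents (assumption).
        parents = population[: max(2, len(population) // 2)]
        offspring = [
            reproduce(rng.choice(parents), rng.choice(parents), rng)
            for _ in range(len(population) - len(parents))
        ]
        population = parents + offspring
    # The fittest individual of the final population is returned.
    return min(population, key=objective)
```

Because the parents are carried over unchanged, the best objective value in the population cannot increase from one generation to the next.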
5 Experimentation
Two nonlinear mathematical models and a well-known real-world data set (KDD Cup 99 data set) are employed in this section to validate and evaluate the proposed system.
5.1 Experiment 1
5.1.1 Rule base initialisation
In order to demonstrate the proposed TSK+ rule base generation approach, a sparse training data set \(\mathbb {T}\) was manually generated from Eq. 17. The data set is composed of 300 data points sparsely distributed within the \([10,30] \times [10,30]\) domain covering 57% of the input domain. The distribution of this training data set is illustrated in a 2-dimensional plane in Fig. 5b. The key steps of TSK rule base initialisation using the training data set are summarised below.
SSE and performance improvement
No. of k | SSE | PI |
---|---|---|
1 | 2961.91 | |
2 | 2248.98 | 712.93 |
3 | 1763.74 | 485.24 |
4 | 1478.32 | 285.42 |
5 | 1286.41 | 191.91 |
6 | 1180.48 | 105.93 |
7 | 1079.48 | 101.00 |
8 | 997.626 | 81.85 |
9 | 946.238 | 51.39 |
10 | 880.43 | 65.81 |
11 | 840.865 | 39.57 |
12 | 808.51 | 32.36 |
13 | 767.61 | 40.90 |
14 | 709.997 | 57.61 |
15 | 689.301 | 20.70 |
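The tabulated values are consistent with PI being the reduction in SSE obtained when the number of clusters is increased from \(k-1\) to k. This relation is inferred from the numbers in the table rather than quoted from a definition. A short Python check, with the SSE values copied from the table above:

```python
# SSE values for k = 1..15, copied from the table above.
sse = [2961.91, 2248.98, 1763.74, 1478.32, 1286.41, 1180.48, 1079.48, 997.626,
       946.238, 880.43, 840.865, 808.51, 767.61, 709.997, 689.301]


def performance_improvement(sse_values):
    """PI for k = 2..K, taken as the SSE reduction from k - 1 to k clusters."""
    return [prev - cur for prev, cur in zip(sse_values, sse_values[1:])]


pi = performance_improvement(sse)
for k, value in enumerate(pi, start=2):
    print(f"k={k}: PI={value:.2f}")
```

The improvement shrinks quickly for small k and then flattens, which is the pattern that an Elbow-style choice of k relies on.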
SSE and PI for each sub-data set (\(k=\{1,2,\ldots ,10\}\))
k | \(T_1\) SSE | \(T_1\) PI | \(T_2\) SSE | \(T_2\) PI | \(T_3\) SSE | \(T_3\) PI | \(T_4\) SSE | \(T_4\) PI | \(T_5\) SSE | \(T_5\) PI | \(T_6\) SSE | \(T_6\) PI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 159.9 | | 271.6 | | 128.3 | | 222.28 | | 133.98 | | 259.65 | |
2 | 109.42 | 50.48 | 187.65 | 83.95 | 68.61 | 59.69 | 140.16 | 82.12 | 96.06 | 37.92 | 187.95 | 71.70 |
3 | 81.44 | 27.98 | 149.88 | 37.78 | 56.66 | 11.95 | 114.23 | 25.93 | 66.42 | 29.64 | 147.02 | 40.93 |
4 | 66.85 | 14.86 | 126.35 | 23.53 | 46.16 | 10.50 | 99.48 | 14.75 | 52.46 | 13.96 | 123.73 | 23.29 |
5 | 56.45 | 10.13 | 109.73 | 16.62 | 39.87 | 6.29 | 85.06 | 14.42 | 42.79 | 9.67 | 109.39 | 14.34 |
6 | 48.04 | 8.41 | 94.14 | 15.59 | 33.04 | 6.83 | 77.22 | 7.84 | 37.69 | 5.10 | 95.25 | 14.104 |
7 | 42.01 | 6.03 | 83.67 | 10.47 | 28.33 | 4.71 | 69.90 | 7.32 | 33.52 | 4.17 | 85.26 | 9.99 |
8 | 36.22 | 5.79 | 77.63 | 6.04 | 26.26 | 2.07 | 61.87 | 8.03 | 29.18 | 4.34 | 80.06 | 5.2 |
9 | 32.4 | 3.82 | 74.63 | 8.00 | 24.03 | 2.23 | 56.17 | 5.70 | 25.01 | 4.17 | 74.62 | 5.44 |
10 | 28.77 | 3.63 | 67.11 | 7.52 | 22.69 | 1.34 | 52.39 | 3.78 | 24.29 | 0.72 | 68.56 | 6.06 |
Step 3 Rule base initialisation A rule was extracted from each cluster and all the extracted rules jointly initialised the rule base. The generated 28 TSK fuzzy rules are detailed in Table 4.
5.1.2 TSK fuzzy interpolation
Given any input, overlapped with any rule antecedent or not, the proposed TSK+ inference engine is able to generate an output. For instance, a randomly generated testing data point was \((A_1^*=(27.37, 27.37, 27.37)\), \(A_2^*=(13.56, 13.56, 13.56))\). The proposed TSK+ inference engine firstly calculated the similarity degrees between the given input and the antecedents of every rule (\(S(A_1^*,A_{r1}), S(A_2^*,A_{r2}), r=\{1,2,\ldots ,28\}\)) using Eq. 7, with the results listed in Table 5.
From the calculated similarity degrees, the firing strength of each rule \(FS_r\) was computed using Eq. 2, and the sub-consequence from each rule was calculated using Eq. 4, which are also shown in Table 5. From this, the final output of the given input was calculated using Eq. 5 as \(y=-\,0.902\).
5.1.3 Rule base optimisation
The value of k for each sub-data set
Sub-data set | \(T_1\) | \(T_2\) | \(T_3\) | \(T_4\) | \(T_5\) | \(T_6\) |
---|---|---|---|---|---|---|
Determined number of k | 5 | 5 | 3 | 4 | 6 | 5 |
Generated raw TSK rule base
No. | Input 1: \(a_{11}\) | Input 1: \(a_{12}\) | Input 1: \(a_{13}\) | Input 2: \(a_{21}\) | Input 2: \(a_{22}\) | Input 2: \(a_{23}\) | Output: \(\beta _0\) | Output: \(\beta _1\) | Output: \(\beta _2\) |
---|---|---|---|---|---|---|---|---|---|
1 | 10.08 | 12.72 | 15.36 | 26.22 | 27.00 | 27.78 | \(-\) 0.14 | 0.06 | \(-\) 0.38 |
2 | 13.22 | 14.87 | 16.52 | 27.87 | 28.84 | 29.82 | \(-\) 0.00 | 0.28 | \(-\) 8.24 |
3 | 11.82 | 13.02 | 14.22 | 22.97 | 24.46 | 25.96 | \(-\) 0.15 | \(-\) 0.02 | 1.72 |
4 | 10.90 | 13.68 | 16.45 | 18.85 | 20.50 | 22.16 | \(-\) 0.04 | \(-\) 0.20 | 4.47 |
5 | 10.06 | 11.18 | 12.30 | 14.89 | 16.91 | 18.93 | 0.25 | \(-\) 0.06 | \(-\) 1.49 |
6 | 13.16 | 14.57 | 15.99 | 15.63 | 17.69 | 19.76 | 0.00 | \(-\) 0.21 | 4.30 |
7 | 19.24 | 20.48 | 21.71 | 26.83 | 28.25 | 29.68 | 0.14 | \(-\) 0.06 | \(-\) 1.21 |
8 | 19.59 | 20.52 | 21.45 | 22.15 | 22.89 | 23.64 | 0.24 | 0.08 | \(-\) 6.50 |
9 | 19.20 | 20.39 | 21.58 | 23.90 | 24.66 | 25.42 | 0.30 | 0.00 | \(-\) 6.00 |
10 | 10.16 | 11.26 | 12.37 | 10.85 | 12.44 | 14.02 | 0.12 | 0.09 | \(-\) 2.22 |
11 | 13.36 | 15.15 | 16.95 | 10.05 | 10.78 | 11.51 | \(-\) 0.01 | 0.30 | \(-\) 2.85 |
12 | 12.25 | 14.14 | 16.03 | 11.75 | 13.42 | 15.09 | 0.05 | 0.10 | \(-\) 2.21 |
13 | 20.51 | 21.24 | 21.97 | 15.28 | 17.48 | 19.69 | \(-\) 0.17 | 0.12 | 1.32 |
14 | 19.27 | 20.31 | 21.35 | 19.30 | 20.33 | 21.35 | 0.13 | \(-\) 0.00 | \(-\) 2.49 |
15 | 16.64 | 18.59 | 20.54 | 15.11 | 16.71 | 18.32 | \(-\) 0.27 | \(-\) 0.00 | 5.31 |
16 | 20.95 | 21.42 | 21.89 | 11.14 | 12.25 | 13.35 | \(-\) 0.17 | \(-\) 0.13 | 4.92 |
17 | 19.23 | 20.61 | 21.98 | 12.46 | 13.57 | 14.69 | \(-\) 0.28 | \(-\) 0.01 | 5.71 |
18 | 19.10 | 20.14 | 21.19 | 10.00 | 10.81 | 11.62 | \(-\) 0.06 | \(-\) 0.03 | 1.52 |
19 | 24.39 | 25.46 | 26.53 | 28.19 | 28.94 | 29.68 | \(-\) 0.00 | \(-\) 0.27 | 8.02 |
20 | 24.02 | 25.03 | 26.04 | 21.77 | 24.08 | 26.39 | \(-\) 0.01 | 0.05 | \(-\) 0.13 |
21 | 27.38 | 28.63 | 29.89 | 21.95 | 24.91 | 27.88 | \(-\) 0.27 | \(-\) 0.00 | 8.17 |
22 | 24.32 | 26.04 | 27.75 | 17.93 | 19.32 | 20.71 | 0.03 | 0.28 | \(-\) 6.23 |
23 | 24.31 | 25.73 | 27.15 | 15.75 | 16.37 | 16.98 | 0.09 | 0.15 | \(-\) 5.72 |
24 | 28.40 | 29.19 | 29.99 | 15.89 | 18.20 | 20.52 | 0.23 | 0.03 | \(-\) 7.43 |
25 | 26.03 | 27.77 | 29.52 | 10.50 | 11.39 | 12.29 | 0.07 | \(-\) 0.18 | \(-\) 0.01 |
26 | 24.15 | 25.29 | 26.42 | 10.16 | 11.62 | 13.08 | 0.03 | \(-\) 0.26 | 1.80 |
27 | 28.40 | 29.16 | 29.92 | 12.79 | 13.89 | 14.99 | 0.29 | \(-\) 0.02 | \(-\) 8.38 |
28 | 25.07 | 26.62 | 28.17 | 12.92 | 13.92 | 14.92 | 0.18 | \(-\) 0.06 | \(-\) 4.66 |
5.1.4 Results comparison
To enable a direct comparison with the approaches proposed in Bellaaj et al. (2013) and Tan et al. (2016), the proposed approach was also applied to 36 randomly generated testing data points. The sum of errors over the 36 testing data points produced by the proposed approach is shown in Table 7, alongside those of the compared approaches. The proposed TSK+ outperformed the approaches proposed in Bellaaj et al. (2013) and Tan et al. (2016), although fewer rules were used. Another advantage of the proposed approach is that the number of rules was determined automatically by the system, without requiring any human input. In addition, the optimisation process significantly improved the system performance, reducing the sum of errors from 3.38 to 1.78.
5.2 Experiment 2
Sub-result from each rule and its calculation details
p | \(S(A_1^*,A_{p1})\) | \(S(A_2^*,A_{p2})\) | \(FD_p\) | Consequence |
---|---|---|---|---|
1 | 0.0053 | 0.059 | 0.053 | \(-\) 0.014 |
2 | 0.020 | 0.003 | 0.003 | \(-\) 0.012 |
3 | 0.004 | 0.010 | 0.004 | \(-\) 0.014 |
4 | 0.001 | 0.836 | 0.001 | 0.006 |
5 | 0.007 | 0.455 | 0.007 | 0.004 |
6 | 0.016 | 0.780 | 0.016 | 0.023 |
7 | 0.467 | 0.157 | 0.157 | 0.170 |
8 | 0.461 | 0.004 | 0.004 | 0.008 |
9 | 0.449 | 0.052 | 0.052 | 0.118 |
10 | 0.012 | 0.954 | 0.012 | 0.018 |
11 | 0.024 | 0.870 | 0.024 | 0.024 |
12 | 0.002 | 0.939 | 0.002 | 0.005 |
13 | 0.211 | 0.848 | 0.211 | \(-\) 0.430 |
14 | 0.567 | 0.812 | 0.567 | \(-\) 0.996 |
15 | 0.440 | 0.480 | 0.440 | 0.114 |
16 | 0.591 | 0.941 | 0.591 | \(-\) 0.913 |
17 | 0.480 | 0.969 | 0.480 | \(-\) 1.042 |
18 | 0.414 | 0.871 | 0.414 | \(-\) 0.232 |
19 | 0.926 | 0.009 | 0.009 | 0.005 |
20 | 0.932 | 0.004 | 0.004 | 0.014 |
21 | 0.925 | 0.077 | 0.077 | 0.005 |
22 | 0.927 | 0.868 | 0.868 | \(-\) 0.937 |
23 | 0.918 | 0.736 | 0.736 | \(-\) 0.471 |
24 | 0.932 | 0.616 | 0.616 | \(-\) 1.030 |
25 | 0.948 | 0.902 | 0.902 | \(-\) 0.622 |
26 | 0.947 | 0.966 | 0.947 | \(-\) 0.547 |
27 | 0.906 | 0.913 | 0.906 | \(-\) 0.849 |
28 | 0.920 | 0.964 | 0.920 | \(-\) 0.589 |
GA parameters
Parameters | Values |
---|---|
Population Size | 100 |
Max value in fitness function | 2 |
Crossover rate | 0.85 |
Mutation rate | 0.05 |
Maximum iteration | 10,000 |
Termination threshold | 0.01 |
In this case, a randomly generated dense training data set was used, comprising 1681 data points distributed within the range of \([-10,10] \times [-10,10]\). By employing the proposed approach, 14 TSK fuzzy rules in total were generated. To allow a direct comparison, the mean squared error (MSE) was used as the performance measure of the models, following the work presented in Evsukoff et al. (2002) and Rezaee and Zarandi (2010). The detailed calculations of this experiment are omitted here. The MSE values produced by the different approaches, with the specified number of rules, are listed in Table 8. The results demonstrated that the proposed system TSK+ performed competitively with only 14 TSK fuzzy rules.
5.3 Experiment 3
This section considers a well-known real-world data set, NSL-KDD-99 (Tavallaee et al. (2009)), which has been widely used as a benchmark (Bostani and Sheikhan (2017); Wang et al. (2010); Yang et al. (2017b)). This data set is a modified version of the KDD Cup 99 data set, which was generated in a military network environment. It contains 125,973 data instances with 41 attributes and 1 label which indicates the type of connection. In particular, normal connections and four types of attacks are labelled in the data set, namely Denial of Service Attacks (DoS), User to Root Attacks (U2R), Remote to User Attacks (R2U) and Probes.
5.3.1 Data set pre-processing
In the work of Yang et al. (2017b), the four most important attributes were selected from the original 41 using expert knowledge, in order to remove noise and redundancy. This experiment also took these four attributes as system inputs; the application of automatic feature selection and reduction approaches, such as Zheng et al. (2015), remains future work. The four selected input features are listed in Table 9. The data set in this experiment was also normalised in an effort to reduce the potential noise in this real-world data set.
5.3.2 TSK+ model construction
As the labels are symbolic values, 0-order TSK-style fuzzy rules were used. In order to construct a 0-order TSK rule base, the training data set was divided into 5 sub-data sets based on the five symbolic labels, which are represented using five integer numbers. The sizes of the sub-data sets and their corresponding integer labels are listed in Table 10. The rule base generation process is summarised in four steps below.
Step 1 Dense sub-data set generation The sparse K-Means was applied on each sub-data set \(\mathbb {T}_j, 1\le j\le 5\) to generate dense sub-data sets. Taking the second sub-data set \(\mathbb {T}_2\) as an example, the performance improvement led by the increment of \(k_2,\ (k_2\in \{1,2,\ldots ,10\})\) is shown in Fig. 9. Following the Elbow approach, 3 was selected as the number of clusters, i.e. \(k_2 = 3\). Denote the 3 generated dense sub-sets as (\(T_{21},T_{22},T_{23}\)).
Selected input features
Feature # | Feature |
---|---|
5 | Size from source to destination |
6 | Size from destination to source |
23 | Number of connections in the past 2 seconds |
35 | Different services rate for destination host |
Data details regarding the types of connections
Datasets | No. of instances | Classes | Rule consequence |
---|---|---|---|
\(\mathbb {T}_1\) | 53,874 | Normal traffic | 1 |
\(\mathbb {T}_2\) | 36,741 | DoS | 2 |
\(\mathbb {T}_3\) | 42 | U2R | 3 |
\(\mathbb {T}_4\) | 796 | R2U | 4 |
\(\mathbb {T}_5\) | 9,325 | Probes | 5 |
Rules and clusters for each set of data instances with the same type of connection
Dense sub-data set | \(T_{11}\) | \(T_{12}\) | \(T_{13}\) | \(T_{21}\) | \(T_{22}\) | \(T_{23}\) | \(T_{31}\) | \(T_{41}\) | \(T_{42}\) | \(T_{43}\) | \(T_{51}\) | \(T_{52}\) | \(T_{53}\) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type of connection (data set) | Normal (\(\mathbb {T}_1\)) | Normal (\(\mathbb {T}_1\)) | Normal (\(\mathbb {T}_1\)) | DoS (\(\mathbb {T}_2\)) | DoS (\(\mathbb {T}_2\)) | DoS (\(\mathbb {T}_2\)) | U2R (\(\mathbb {T}_3\)) | R2U (\(\mathbb {T}_4\)) | R2U (\(\mathbb {T}_4\)) | R2U (\(\mathbb {T}_4\)) | Probes (\(\mathbb {T}_5\)) | Probes (\(\mathbb {T}_5\)) | Probes (\(\mathbb {T}_5\)) |
Index i of rule cluster \(RC_i\) | 1–4 | 5–7 | 8–11 | 12–14 | 15–18 | 19–24 | 25–30 | 31–35 | 36–38 | 39–41 | 42–44 | 45 | 46 |
Index i of rule \(R_i\) | 1–4 | 5–7 | 8–11 | 12–14 | 15–18 | 19–24 | 25–30 | 31–35 | 36–38 | 39–41 | 42–44 | 45 | 46 |
Performance comparison (classification accuracy, %)
Approach | Normal traffic | DoS | U2R | R2U | Probes |
---|---|---|---|---|---|
Decision tree (Wang et al. (2010)) | 91.22 | 97.24 | 15.38 | 1.43 | 78.13 |
Naive Bayes (Wang et al. (2010)) | 89.22 | 96.65 | 7.69 | 8.57 | 88.13 |
BPNN (Wang et al. (2010)) | 89.75 | 97.20 | 23.08 | 5.71 | 88.75 |
FC-ANN (Wang et al. (2010)) | 91.32 | 96.70 | 76.92 | 58.57 | 80.00 |
MOPF (Bostani and Sheikhan (2017)) | N/A | 96.89 | 77.98 | 81.13 | 85.92 |
TSK+ without GA (the proposed approach) | 77.10 | 94.07 | 57.69 | 55.29 | 78.71 |
TSK+ (the proposed approach) | 93.10 | 97.84 | 65.38 | 84.65 | 85.69 |
5.3.3 TSK+ model evaluation
Once the TSK fuzzy rule base had been generated, it could be used for connection classification by the proposed TSK+ inference engine. The rule base and the inference engine TSK+ jointly formed the fuzzy model, which was validated and evaluated using a testing data set. The testing data set contains 22,544 data samples provided by Tavallaee et al. (2009). The testing data set was also extracted from the original KDD Cup 99 data set, but it does not share any data instance with the training data set NSL-KDD-99.
Note that the testing data set has been used in a number of projects with different classification approaches. In particular, decision tree, naive Bayes, back-propagation neural network (BPNN), fuzzy clustering-artificial neural network (FC-ANN) have been employed in Wang et al. (2010), and modified optimum-path forest (MOPF) was applied in Bostani and Sheikhan (2017). The accuracy of the classification results for each class of network traffic generated by different approaches including the proposed one with the initialised rule base and the optimised rule base is listed in Table 12.
The results show that the proposed TSK+ overall outperformed all other approaches, and that the proposed rule base optimisation approach significantly improved the system performance, by over 10% on average. In particular, the proposed system achieved better accuracy in predicting normal connections, DoS and R2U than all other approaches; it performed worse than FC-ANN and MOPF on the U2R class; and on the Probes class it produced a result similar to the best existing performance achieved by the other approaches.
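The average improvement brought by the optimisation can be checked directly from the last two rows of Table 12. The Python snippet below assumes that the tabulated values are accuracies in per cent and that the average is the unweighted mean over the five classes; both are interpretations rather than statements from the text.

```python
# Per-class accuracies copied from the last two rows of Table 12.
without_ga = {"Normal": 77.10, "DoS": 94.07, "U2R": 57.69, "R2U": 55.29, "Probes": 78.71}
with_ga = {"Normal": 93.10, "DoS": 97.84, "U2R": 65.38, "R2U": 84.65, "Probes": 85.69}

# Gain per class, in percentage points.
gains = {cls: with_ga[cls] - without_ga[cls] for cls in with_ga}
mean_gain = sum(gains.values()) / len(gains)
print(f"mean gain: {mean_gain:.2f} percentage points")
```

Under these assumptions the mean gain is about 12.76 percentage points, with the largest single gain on the R2U class.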
6 Conclusion
This paper proposed a fuzzy inference system, TSK+, which extends the conventional TSK inference system such that it is also applicable to sparse rule bases and unevenly distributed rule bases. This is achieved by allowing the interpolation and extrapolation of the output from all rules, even if the given input does not overlap with any rule antecedent. This paper also proposed a novel data-driven rule base generation approach, which is workable with sparse data sets, dense data sets and unevenly distributed data sets. The system was validated and evaluated by applying it to two benchmark problems and one real-world data set. The experimental results demonstrated the wide applicability of the proposed system, with compact rule bases and competitive performance.
The proposed system can be further enhanced in the following areas. Firstly, the sparsity-aware possibilistic clustering algorithm (Xenaki et al. (2016)) was designed to work with sparse data sets, and thus, it is desired to investigate how this clustering algorithm can support the proposed TSK+ inference system. Secondly, it is worthwhile to study how the proposed sparse rule base generation approach can help in generating Mamdani-style fuzzy rule bases. Finally, it would be interesting to define the sparsity or density of a data set such that more accurate clustering results can be generated during rule cluster generation.
Notes
Acknowledgements
This work is supported by a Northumbria University PhD scholarship, the National Natural Science Foundation of China (No. 61502068) and the China Postdoctoral Science Foundation (Nos. 2013M541213 and 2015T80239).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
References
- Baker JE (1985) Adaptive selection methods for genetic algorithms. In: Proceedings of an international conference on genetic algorithms and their applications, Hillsdale, New Jersey, pp 101–111
- Bellaaj H, Ketata R, Chtourou M (2013) A new method for fuzzy rule base reduction. J Intell Fuzzy Syst 25(3):605–613
- Bostani H, Sheikhan M (2017) Modification of supervised OPF-based intrusion detection systems using unsupervised learning and social network concept. Pattern Recognit 62:56–72
- Chang PC, Fan CY (2008) A hybrid system integrating a wavelet and TSK fuzzy rules for stock price forecasting. IEEE Trans Syst Man Cybern Part C Appl Rev 38(6):802–815
- Chen MY, Linkens D (2004) Rule-base self-generation and simplification for data-driven fuzzy models. Fuzzy Sets and Syst 142:243–265
- Chen S, Hsin W (2015) Weighted fuzzy interpolative reasoning based on the slopes of fuzzy sets and particle swarm optimization techniques. IEEE Trans Cybern 45(7):1250–1261
- Chen SJ, Chen SM (2003) Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Trans Fuzzy Syst 11(1):45–56
- Evsukoff A, Branco AC, Galichet S (2002) Structure identification and parameter optimization for non-linear fuzzy modeling. Fuzzy Sets and Syst 132(2):173–188
- Huang Z, Shen Q (2006) Fuzzy interpolative reasoning via scale and move transformations. IEEE Trans Fuzzy Syst 14(2):340–359
- Huang Z, Shen Q (2008) Fuzzy interpolation and extrapolation: A practical approach. IEEE Trans Fuzzy Syst 16(1):13–28
- Johanyák ZC, Kovács S (2005) Distance based similarity measures of fuzzy sets. In: Proceedings of SAMI 2005
- Kerk YW, Pang LM, Tay KM, Lim CP (2016) Multi-expert decision-making with incomplete and noisy fuzzy rules and the monotone test. In: 2016 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 94–101
- Kóczy L, Hirota K (1993) Approximate reasoning by linear rule interpolation and general approximation. Int J Approx Reason 9(3):197–225
- Kóczy L, Hirota K (1997) Size reduction by interpolation in fuzzy rule bases. IEEE Trans Syst Man Cybern Part B Cybern 27(1):14–25
- Kodinariya TM, Makwana PR (2013) Review on determining number of cluster in k-means clustering. Int J 1(6):90–95
- Li J, Qu Y, Shum HPH, Yang L (2017) TSK inference with sparse rule bases. Springer International Publishing, Cham, pp 107–123
- MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA, pp 281–297
- Mamdani EH (1977) Application of fuzzy logic to approximate reasoning using linguistic synthesis. Comput IEEE Trans C 26(12):1182–1191
- Mucientes M, Vidal JC, Bugarín A, Lama M (2009) Processing time estimations by variable structure TSK rules learned through genetic programming. Soft Comput 13(5):497–509
- Naik N, Diao R, Shen Q (2014) Genetic algorithm-aided dynamic fuzzy rule interpolation. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) 2014, pp 2198–2205
- Nandi AK, Klawonn F (2007) Detecting ambiguities in regression problems using TSK models. Soft Comput 11(5):467–478
- Negnevitsky M (2005) Artificial intelligence: a guide to intelligent systems. Pearson Education, London
- Rezaee B, Zarandi MF (2010) Data-driven fuzzy modeling for Takagi-Sugeno-Kang fuzzy system. Inf Sci 180(2):241–255
- Shen Q, Yang L (2011) Generalisation of scale and move transformation-based fuzzy interpolation. JACIII 15(3):288–298. https://doi.org/10.20965/jaciii.2011.p0288
- Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. Syst Man and Cybern IEEE Trans SMC 15(1):116–132
- Tan Y, Li J, Wonders M, Chao F, Shum HPH, Yang L (2016) Towards sparse rule base generation for fuzzy rule interpolation. In: IEEE world congress on computational intelligence international conference
- Tavallaee M, Bagheri E, Lu W, Ghorbani A (2009) A detailed analysis of the KDD Cup 99 data set. In: The second IEEE symposium on computational intelligence for security and defence applications
- Wang G, Hao J, Ma J, Huang L (2010) A new approach to intrusion detection using artificial neural networks and fuzzy clustering. Exp Syst Appl 37(9):6225–6232
- Witten DM, Tibshirani R (2010) A framework for feature selection in clustering. J Am Stat Assoc 105(490):713–726
- Xenaki SD, Koutroumbas KD, Rontogiannis AA (2016) Sparsity-aware possibilistic clustering algorithms. IEEE Trans Fuzzy Syst 24(6):1611–1626
- Yang L, Shen Q (2011) Adaptive fuzzy interpolation. Fuzzy Syst IEEE Trans 19(6):1107–1126
- Yang L, Shen Q (2013) Closed form fuzzy interpolation. Fuzzy Sets and Syst 225:1–22
- Yang L, Chao F, Shen Q (2017) Generalized adaptive fuzzy rule interpolation. IEEE Trans Fuzzy Syst 25(4):839–853
- Yang L, Li J, Fehringer G, Barraclough P, Sexton G, Cao Y (2017) Intrusion detection system by fuzzy interpolation. In: 2017 IEEE international conference on fuzzy systems
- Yang L, Zuo Z, Chao F, Qu Y (2017) Fuzzy rule interpolation and its application in control. In: Ramakrishnan S (ed) Modern Fuzzy Control Systems and Its Applications, InTech, Rijeka
- Zheng L, Diao R, Shen Q (2015) Self-adjusting harmony search-based feature selection. Soft Comput 19(6):1567–1579
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.