Abstract
A rule base covering the entire input domain is required for the conventional Mamdani inference and Takagi–Sugeno–Kang (TSK) inference. Fuzzy interpolation enhances conventional fuzzy rule inference systems by allowing the use of sparse rule bases, in which certain inputs are not covered. Given that almost all of the existing fuzzy interpolation approaches were developed to support the Mamdani inference, this paper presents a novel fuzzy interpolation approach that extends the TSK inference. This paper also proposes a data-driven rule base generation method to support the extended TSK inference system. The proposed system enhances the conventional TSK inference in two ways: (1) it works with incomplete or unevenly distributed data sets, or with incomplete expert knowledge, that entail only a sparse rule base, and (2) it simplifies complex fuzzy inference systems by using more compact rule bases without sacrificing system performance. The experimentation shows that the proposed system overall outperforms the existing approaches while using smaller rule bases.
1 Introduction
Fuzzy inference mechanisms are built upon fuzzy logic to map system inputs and outputs. A typical fuzzy inference system consists of a rule base and an inference engine. A number of inference engines have been developed, with the Mamdani inference (Mamdani (1977)) and the TSK inference (Takagi and Sugeno (1985)) being the most widely used. The Mamdani fuzzy model is more intuitive and suitable for handling linguistic inputs; its outputs are usually fuzzy sets, and thus, a defuzzification process is often required. In contrast, the TSK inference approach produces crisp outputs directly, as TSK fuzzy rules use polynomials as rule consequences. There are generally two types of rule bases used to support the two fuzzy inference engines, namely Mamdani-style rule bases and TSK-style rule bases, respectively.
A rule base, Mamdani-style or TSK-style, can either be translated from expert knowledge or extracted from data. The rule base produced by the knowledge-driven approaches is therefore essentially a representation of the human experts’ knowledge in the format of fuzzy rules (Negnevitsky (2005)). In order to enable this approach, a problem has to be human comprehensible and fully understood by human experts as linguistic rules, which are then interpreted as fuzzy rules by specifying the membership functions of linguistic words. Recognising that expert knowledge may not always be available, data-driven approaches were proposed, which extract fuzzy rules from a set of training data using machine learning approaches (Rezaee and Zarandi (2010)). Both Mamdani and TSK inference approaches are only workable with dense rule bases, each of which covers the entire input domain.
Fuzzy interpolation relaxes the requirement of dense rule bases from conventional fuzzy inference systems (Kóczy and Hirota (1993); Yang et al. (2017c)). When a given observation does not overlap with any rule antecedent, certain conclusions can still be generated by means of interpolation. In addition, fuzzy interpolation helps in system complexity reduction by removing the rules that can be approximated by their neighbours. A number of fuzzy interpolation approaches have been developed, such as Chen and Hsin (2015), Huang and Shen (2006, 2008), Kóczy and Hirota (1997), Naik et al. (2014), Yang and Shen (2011), Yang et al. (2017a), Shen and Yang (2011) and Yang and Shen (2013). However, all these existing fuzzy interpolation approaches were extensions of the Mamdani inference.
A novel fuzzy interpolation approach, which extends the TSK inference, is presented in this paper. The proposed fuzzy inference engine is workable with sparse, dense or imbalanced TSK-style rule bases, which is a further development on the seminal work of Li et al. (2017). In addition, a data-driven TSK-style rule base generation approach is also proposed to extract compact and concise rule bases from incomplete, imbalanced, normal or over-dense data sets. Note that sparse and imbalanced data sets are still commonly seen, regardless of the magnitude of the data sets in the era of big data. The proposed approach has been applied to two benchmark problems and a real-world problem in the field of cyber security. The experimentation demonstrated the power of the proposed approach in enhancing the conventional TSK inference method by means of broader applicability and better system efficiency, and competitive performance in reference to other machine learning approaches.
The rest of the paper is organised as follows. Section 2 introduces the theoretical underpinnings of the TSK fuzzy inference model and the TSK rule base generation approaches. Section 3 presents the extended TSK system, and Sect. 4 discusses the proposed rule base generation approach. Section 5 details the experimentation for demonstration and validation. Section 6 concludes the paper and suggests probable future developments.
2 Background
The conventional TSK inference system and rule base generation approaches are briefly reviewed in this section.
2.1 TSK inference
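Suppose a TSK-style fuzzy rule base comprises n rules, each with m antecedents:
$$\begin{aligned} R_r:\ &\textbf{IF } x_1 \text{ is } A_{1r} \text{ and } \cdots \text{ and } x_m \text{ is } A_{mr} \\ &\textbf{THEN } y = f_r(x_1,\ldots ,x_m) = \beta _{0r}+\beta _{1r}x_1+\cdots +\beta _{mr}x_m, \quad r\in \{1,2,\ldots ,n\}, \end{aligned} \tag{1}$$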
where \(\beta _{0r}\) and \(\beta _{sr}\) (\(r\in \{1,2,\ldots ,n\}\) and \(s\in \{1,2,\ldots ,m\}\)) are constant parameters of the linear functions of rule consequences. The consequence polynomials degenerate to constants \(\beta _{0r}\) when the outputs are discrete crisp numbers (to represent symbolic values). Given an input vector \((A_1^*,\ldots ,A_m^*)\), the TSK engine performs inference in the following steps:

1 Determine the firing strength of each rule \(R_r\) (\(r\in \{1,2,\ldots ,n\}\)) by integrating the similarity degrees between its antecedents and the given inputs:
$$\begin{aligned} \alpha _r = S(A_{1}^*,A_{1r}) \wedge \cdots \wedge S(A_m^*,A_{mr}), \end{aligned} \tag{2}$$
where \(\wedge \) is a t-norm, usually implemented as a minimum operator, and \(S(A_s^*,A_{sr})\) (\(s\in \{1,2,\ldots , m\}\)) is the similarity degree between fuzzy sets \(A_s^*\) and \(A_{sr}\):
$$\begin{aligned} S(A_s^*,A_{sr}) = \max _{x}\{\min \{\mu _{A_s^*}(x),\mu _{A_{sr}}(x)\}\}, \end{aligned} \tag{3}$$
where \(\mu _{A_{s}^*}(x) \text { and } \mu _{A_{sr}}(x)\) are the degrees of membership for a given value x within the domain.

2 Calculate the sub-output led by each rule \(R_r\) based on the given observation (\(A_1^*,\ldots ,A_m^*\)):
$$\begin{aligned} f_r(x_1^*,\ldots ,x_m^*) = \beta _{0r}+\beta _{1r}Rep(A_1^*)+ \cdots + \beta _{mr}Rep(A_m^*), \end{aligned} \tag{4}$$
where \(Rep(A^*_s)\) is the representative value or defuzzified value of fuzzy set \(A^*_s\), which is often calculated as the centre of area of the membership function.

3 Generate the final output by integrating all the sub-outputs from all the rules:
$$\begin{aligned} y=\frac{\displaystyle \sum \nolimits _{r=1}^{n}\alpha _r f_r(x_1^*,\ldots ,x_m^*)}{\displaystyle \sum \nolimits _{r=1}^{n}\alpha _r}. \end{aligned} \tag{5}$$
It is clear from Eq. 3 that the firing strength will be 0 if a given input vector does not overlap with any rule antecedent. In this case, no rule will be fired and the conventional TSK approach will fail.
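The three inference steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes crisp inputs, so that Eq. 3 reduces to the membership degree of the input in the antecedent and \(Rep(A_s^*)\) is the input value itself, and it assumes triangular antecedents. The names `tri_membership` and `tsk_infer` and the rule layout are hypothetical.

```python
def tri_membership(x, fs):
    """Membership degree of crisp x in the triangular fuzzy set fs = (a1, a2, a3)."""
    a1, a2, a3 = fs
    if a1 == a2 == a3:                      # singleton (crisp) fuzzy set
        return 1.0 if x == a1 else 0.0
    if x < a1 or x > a3:
        return 0.0
    if x == a2:
        return 1.0
    if x < a2:
        return (x - a1) / (a2 - a1)
    return (a3 - x) / (a3 - a2)


def tsk_infer(rules, x):
    """Conventional TSK inference for a crisp input vector x (assumption).

    rules: list of (antecedents, betas), where antecedents is a list of m
    triangles and betas = (b0, b1, ..., bm).
    Returns None when no rule fires (all firing strengths are 0).
    """
    num = den = 0.0
    for antecedents, betas in rules:
        # Eq. 2 with the minimum t-norm; Eq. 3 reduces to membership for crisp x.
        alpha = min(tri_membership(xs, fs) for xs, fs in zip(x, antecedents))
        # Eq. 4: linear consequent evaluated at the input.
        f = betas[0] + sum(b * xs for b, xs in zip(betas[1:], x))
        num += alpha * f
        den += alpha
    if den == 0.0:
        return None                         # sparse rule base: inference fails
    return num / den                        # Eq. 5


rules = [([(0.0, 1.0, 2.0)], (1.0, 0.0)),
         ([(4.0, 5.0, 6.0)], (3.0, 0.0))]
covered = tsk_infer(rules, [1.5])           # only the first rule fires
uncovered = tsk_infer(rules, [3.0])         # falls in the gap between the rules
```

The second call returns `None` because the input lies in the gap between the two antecedents, which is exactly the failure case that motivates fuzzy interpolation.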
2.2 TSK rule base generation
The antecedent variables of a TSK rule are represented as fuzzy sets and the consequence is represented by a linear polynomial function, as shown in Eq. 1. Data-driven fuzzy rule extraction approaches first partition the problem domain into multiple regions or rule clusters. Then, each region is represented by a TSK rule. Various clustering algorithms, such as K-Means (MacQueen (1967)) and its variations, can be used to divide the problem domain into sub-regions or rule clusters (Chen and Linkens (2004); Rezaee and Zarandi (2010)).
Given a rule cluster that contains a set of multidimensional data instances, a typical TSK fuzzy rule extraction process is performed in two steps: (1) rule antecedent determination and (2) consequent polynomial determination (Nandi and Klawonn (2007)). The rule antecedent determination process prescribes a fuzzy set to represent the information of the cluster on each dimension, such as Gaussian membership functions (Chen and Linkens (2004)). The consequent polynomial can usually be determined by employing linear regression (Nandi and Klawonn (2007); Rezaee and Zarandi (2010)). Once each rule cluster has been expressed by a TSK rule, the TSK rule base can be assembled by combining all the extracted rules.
Note that a 0-order TSK fuzzy model is required if only symbolic labels are included in the output of the data set (Kerk et al. (2016)). In this case, the step of consequent polynomial determination is usually omitted. Instead, discrete crisp numbers are typically used to represent the symbolic output values. Accordingly, the rule base generation process differs: the labelled data set is first divided into multiple sub-data sets, each sharing the same label. Then, a clustering algorithm is applied to each sub-data set to generate rule clusters. Finally, each rule cluster is represented as the antecedents of a rule, with an integer number used as the 0-order consequence representing the label.
3 TSK inference with fuzzy interpolation (TSK+)
The conventional TSK fuzzy inference system is extended in this section by allowing the interpolation and extrapolation of inference results. The extended system is thus workable with sparse rule bases, dense rule bases and imbalanced rule bases, and is termed the TSK+ inference system.
3.1 Modified similarity measure
Conventional TSK will fail if a given input does not overlap any rule antecedent in the rule base. This can be addressed using fuzzy interpolation such that the inference consequence can be approximated from the neighbouring rules of the given input. In order to enable this, the measure of firing strength used in the conventional TSK inference is modified based on a revised similarity measure proposed in Chen and Chen (2003). In particular, the similarity measure proposed in Chen and Chen (2003) is not sensitive to distance in addition to membership functions. This similarity measure is further extended in this subsection such that its sensitivity to distance is flexible and configurable to support the development of TSK+ inference engine.
It has been proven in the literature that different types of membership functions do not pose a significant difference in inference results if the membership functions are properly fine-tuned (Chang and Fan (2008)). Based on this, only triangular membership functions are used in this work for computational efficiency. Given two triangular fuzzy sets \(A=(a_1,a_2,a_3)\) and \(A'=(a_1',a_2',a_3')\) in a normalised variable domain, their similarity degree \(S(A,A^{'})\) can be calculated as (Chen and Chen (2003)):
Equation 6 is extended in this work by introducing a configurable parameter as:
where d, termed as \(distance\ factor\), is a function of the distance between the two concerned fuzzy sets:
where \(\Vert A,A'\Vert \) represents the distance between the two fuzzy sets, usually defined as the Euclidean distance of their representative values, and s (\(s > 0\)) is an adjustable sensitivity factor. A smaller value of s leads to a similarity degree which is more sensitive to the distance between the two fuzzy sets. The constant 5 in the equation ensures that the distance factor is normalised as 1 when the distance between two given fuzzy sets is 0 (i.e. the two fuzzy sets have the same representative values). According to Eq. 8, the distance factor is not considered when fuzzy sets \(A \text { and } A'\) are both crisp. This is because the shapes of the fuzzy sets need to be considered by the representative values as contributing elements of the distance factor when the objects are fuzzy sets, but there is no point in considering this element if the objects are crisp numbers (Johanyák and Kovács (2005)).
The modified similarity measure \(S(A,A')\) between fuzzy sets A and \(A'\) has the following properties:

1. A larger value of \(S(A,A')\) represents a higher similarity degree between fuzzy sets A and \(A'\);

2. \(S(A,A') = 1\) if and only if fuzzy sets A and \(A'\) are identical;

3. \(S(A,A') > 0\) unless (\(a_1=a_2=a_3=0\) and \(a_1'=a_2' =a_3' =1\)) or (\(a_1=a_2=a_3=1\) and \(a_1'=a_2'=a_3' =0\)).
3.2 Extended TSK inference
Given a rule base as specified in Eq. 1 and an input vector \((A_1^*,\ldots ,A_m^*)\), the TSK+ engine performs inference using the same steps as those detailed in Sect. 2.1, except that Eq. 3 is replaced by Eq. 7. According to the third property of the modified similarity measure discussed above, \(S(A_s^{*},A_{sr}) >0\) unless \(A_s^*\) and \(A_{sr}\) take boundary crisp values 0 and 1. This means the firing strength of any rule \(R_r\) is always greater than 0, i.e. \(\alpha _r >0\), except for the special case when only boundary crisp values are involved. As a result, every rule in the rule base contributes to the final inference result to a certain degree. Therefore, even if the given observation does not overlap with any rule antecedent in the rule base, an inference result can still be generated, which significantly improves the applicability of the conventional TSK inference system.
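The effect of a strictly positive similarity measure can be illustrated with a short sketch. The `similarity` function below is a stand-in chosen only because it satisfies the three properties listed in Sect. 3.1 on a normalised \([0,1]\) domain; it is an assumption made for illustration and should not be read as Eq. 7, since it omits the distance factor entirely. The names `rep` and `tsk_plus_infer` are likewise hypothetical.

```python
def similarity(a, b):
    """Stand-in similarity of two triangular fuzzy sets on a [0, 1] domain.

    Assumption for illustration only (not Eq. 7): one minus the mean absolute
    difference of the three defining points. It equals 1 only for identical
    sets and is 0 only for the crisp boundary pair 0 versus 1.
    """
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / 3.0


def rep(fs):
    """Representative value: centre of area of a triangular fuzzy set."""
    return sum(fs) / 3.0


def tsk_plus_infer(rules, inputs):
    """TSK-style inference where firing strengths come from `similarity`.

    rules: list of (antecedents, betas); inputs: list of m triangular sets.
    """
    num = den = 0.0
    for antecedents, betas in rules:
        alpha = min(similarity(obs, fs) for obs, fs in zip(inputs, antecedents))
        f = betas[0] + sum(b * rep(obs) for b, obs in zip(betas[1:], inputs))
        num += alpha * f
        den += alpha
    return num / den    # den > 0 except in the crisp boundary case


rules = [([(0.0, 0.1, 0.2)], (1.0, 0.0)),
         ([(0.8, 0.9, 1.0)], (3.0, 0.0))]
# The observation overlaps neither antecedent, yet both rules contribute.
y = tsk_plus_infer(rules, [(0.5, 0.5, 0.5)])
```

Because the observation is equally similar to both antecedents in this toy rule base, the result lies midway between the two constant consequents.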
4 Sparse TSK rule base generation
A data-driven TSK-style rule base generation approach for the proposed TSK+ inference engine is presented in this section, which is outlined in Fig. 1. Given a data set \(\mathbb {T}\) which might be sparse, unevenly distributed, or dense, the system firstly groups the data instances into clusters using certain clustering algorithms. Then, each cluster is expressed as a TSK rule by employing linear regression. From this, an initial rule base is generated by combining all the extracted rules. Finally, the initialised rule base is optimised by applying the genetic algorithm (GA), which fine-tunes the membership functions of fuzzy sets in the rule base.
4.1 Rule base initialisation
Centroid-based clustering algorithms are traditionally employed in TSK fuzzy modelling to group similar objects together for rule extraction (Chen and Linkens (2004)), which is also the case in this work. However, differing from existing TSK-style rule base generation approaches, the proposed system is workable with dense data sets, sparse data sets and unevenly distributed data sets. Therefore, a two-level clustering scheme is applied in this work. The first level of clustering divides the given (dense/sparse) data set into multiple sub-data sets using the sparse K-Means clustering algorithm (Witten and Tibshirani (2010)). Based on the feature of the sparse K-Means clustering, those divided sub-data sets are generally considered to be dense. The second level of clustering is applied to each obtained dense sub-data set to generate rule clusters for TSK fuzzy rule extraction by employing the standard K-Means clustering algorithm (MacQueen (1967)). Note that the number of clusters has to be predefined for both sparse K-Means and the standard K-Means, which is discussed first below.
4.1.1 Number of clusters determination
A number of approaches have been proposed in the literature to determine the value of k, such as the Elbow method, cross-validation, the Bayesian Information Criterion (BIC)-based approach, and the Silhouette-based approach (Kodinariya and Makwana (2013)). In particular, the Elbow method is fast and effective, and this approach is therefore employed in this work. This approach determines the number of clusters based on the criterion that adding another cluster does not lead to a much better modelling result with respect to a given objective function. For instance, for a given problem, the relationship between performance improvement and the value of k is shown in Fig. 2. The value of k in this case can be determined as 4, which is the obvious turning point (or the Elbow point).
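One simple way to automate the Elbow criterion is sketched below. The rule used here, stopping once the improvement relative to the initial objective value drops below a tolerance, is an assumption made for illustration; the text does not state how the turning point is detected numerically. The function name `elbow_k`, the tolerance `tol` and the sample values are hypothetical.

```python
def elbow_k(sse, tol=0.1):
    """Pick the number of clusters from a list of objective values.

    sse[i] is the within-cluster SSE obtained with k = i + 1 clusters and
    sse[0] must be positive. The returned k is the last one whose addition
    still improved the objective by at least `tol` (relative to sse[0]).
    """
    for i in range(1, len(sse)):
        improvement = (sse[i - 1] - sse[i]) / sse[0]
        if improvement < tol:
            return i            # adding cluster i + 1 no longer pays off
    return len(sse)             # no flattening observed within the tested range


# Hypothetical SSE values for k = 1..6; the curve flattens after k = 4.
chosen_k = elbow_k([100.0, 60.0, 35.0, 20.0, 18.0, 17.0])
```

In practice the tolerance would need tuning per problem, which is why the elbow point is often still confirmed visually from a plot such as Fig. 2.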
4.1.2 Dense sub-data set generation
Sparse K-Means is an extension of the standard K-Means for handling sparse data sets (Witten and Tibshirani (2010)). Assuming k clusters are required, sparse K-Means also starts with the initialisation of k centroids (usually randomly). This is followed by the assignment of data instances to centroids and the updating of centroids based on the assignments, which are iterated until there is no change in the assignments. Different from the standard K-Means, which assigns objects with the goal of minimising the within-cluster sum of squares error (SSE), the sparse K-Means assigns objects by maximising the between-cluster sum of squares (BCSS), which is defined as:
where p is the total number of data instances in the given data set, m is the number of input features in the given data set, \(\mu _q\) is the mean of all the elements on the qth feature, and \(x_{tq}\) is the qth feature of the tth data point in the given data set.
The within-cluster sum of squares error (SSE) is defined as:
$$\begin{aligned} SSE = \sum _{j=1}^{k}\sum _{t=1}^{p_j} \parallel x_{jt} - v_j \parallel ^2, \end{aligned} \tag{10}$$
where k is the number of clusters determined by the Elbow approach, \(p_j\) is the number of data instances in the jth cluster, \(x_{jt}\) is the tth data point in the jth cluster, \(v_j\) is the jth cluster centre, and \(\parallel x_{jt} - v_j \parallel \) is the Euclidean distance between \(x_{jt}\) and \(v_j\). Note that if the labels in a given data set are symbolic values, only 0-order TSK rules are required and thus Eq. 9 becomes:
4.1.3 Rule cluster generation
Once the given training data set \(\mathbb {T}\) has been divided into k dense sub-data sets, K-Means is applied to each determined sub-data set \(T_i\) (\(1\le i \le k\)) to generate rule clusters, each representing a rule. Assume that \(k_i\) clusters are required for a sub-data set \(T_i\). K-Means is initialised by \(k_i\) random cluster centroids. It then assigns every data instance to one cluster by minimising the SSE:
$$\begin{aligned} SSE_i = \sum _{j=1}^{k_i}\sum _{t=1}^{p_j^i} \parallel x_{jt}^i - v_j^i \parallel ^2, \end{aligned} \tag{12}$$
where \(p_{j}^i\) is the number of data points in the jth cluster of the sub-data set \(T_i\), \(x_{jt}^i\) is the tth data point in the jth cluster in the sub-data set \(T_i\), \(v_j^i\) is the centre of the jth cluster in the sub-data set \(T_i\), and \(\parallel x_{jt}^i - v_j^i \parallel \) is the Euclidean distance between \(x_{jt}^i\) and \(v_j^i\). Once all the data instances are assigned, the algorithm updates the cluster centroids according to the newly assigned members. These two steps are iterated until there is no change in object assignments. After K-Means is applied, the given training data set \(\mathbb {T}\) is divided into \(n = \sum _{i=1}^{k}k_i\) clusters. For simplicity, the generated rule clusters are jointly represented as \(\{RC_1, RC_2, \ldots , RC_n\}\).
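The assignment and update steps described above correspond to the standard K-Means (Lloyd) iteration, which can be sketched in plain Python as follows. This is a generic illustration and not the code used in the experiments; the function names, the seeding strategy and the toy data are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Standard K-Means on a list of tuples; returns (assignment, centroids, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # k random data points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:        # no change in assignments: converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                         # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, sse


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centroids, sse = kmeans(points, k=2)
```

Running this routine on each dense sub-data set \(T_i\) with its own \(k_i\) would yield the rule clusters, and the returned SSE values are what the Elbow method compares across candidate values of \(k_i\).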
4.1.4 Fuzzy rule extraction
Each determined cluster from the above steps is utilised to form one TSK fuzzy rule. A number of approaches have been proposed to use a Gaussian membership function to represent a cluster, such as Rezaee and Zarandi (2010). However, given the fact that most real-world data are not normally distributed, the cluster centroid is usually not identical to the centre of the Gaussian membership function, and thus, Gaussian membership functions may not be able to accurately represent the distribution of the calculated clusters. In order to prevent this and also retain computational efficiency as stated in Sect. 3, triangular membership functions are utilised in this work.
Suppose that a data set has m input features and a single output feature. Given a rule cluster \(RC_r\) \((1\le r \le n)\), a TSK fuzzy rule \(R_r\) can be extracted from the cluster as follows:
Without loss of generality, take the sth dimension (\(1 \le s\le m\)) of rule cluster \(RC_r\) as an example, denoted as \(RC_r^s\). Suppose that \(RC_r^s\) has \(p_r\) elements, i.e. \(RC_r^s = \{x_s^1, x_s^2, \ldots ,x_s^{p_r}\}\). As only triangular fuzzy sets are used in this work, fuzzy set \(A_{sr}\) can be precisely represented as \((a^1_{sr}, a^2_{sr}, a^3_{sr})\). The core of the triangular fuzzy set is set as the cluster centroid, that is, \(a^2_{sr} = \sum _{q=1}^{p_r} x_s^q/p_r\); and the support of the fuzzy set is set as the span of the cluster, i.e. \((a^1_{sr}, a^3_{sr}) = (\min \{x^1_s, x^2_s, \ldots , x_s^{p_r}\}, \max \{x^1_s, x^2_s, \ldots , x^{p_r}_s\})\).
First-order polynomials are typically used as the consequences of TSK fuzzy rules. That is, \(y =\beta _{0r}+\beta _{1r}x_1 + \cdots +\beta _{mr}x_m\), where the parameters \(\beta _{0r}\) and \(\beta _{sr},\ s\in \{1,2,\ldots ,m\}\) are estimated using a linear regression approach. The locally weighted regression (LWR) is adopted in this work, due to its ability to generate an independent model that is only related to the given cluster of data in the training data set (Nandi and Klawonn (2007); Rezaee and Zarandi (2010)). The rule consequence will degenerate to 0-order if the values in the output dimension are discrete integer numbers. From this, the raw rule base is initialised by combining all the extracted rules, which is of the form of Eq. 1.
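The antecedent construction just described (support from the cluster span, core from the cluster mean) is straightforward to express in code. The following sketch assumes a rule cluster is given as a list of input tuples; the function name `triangular_antecedents` and the sample cluster are hypothetical.

```python
def triangular_antecedents(cluster):
    """Build one triangular fuzzy set (a1, a2, a3) per input dimension.

    cluster: list of m-dimensional input points belonging to one rule cluster.
    For each dimension, a1 and a3 are the minimum and maximum of the cluster
    (the support) and a2 is the mean (the core).
    """
    return [(min(col), sum(col) / len(col), max(col)) for col in zip(*cluster)]


cluster = [(1.0, 10.0), (2.0, 14.0), (6.0, 12.0)]
antecedents = triangular_antecedents(cluster)
```

Note that the resulting triangle is generally asymmetric, since the mean need not lie midway between the minimum and the maximum, which is how the representation follows skewed clusters.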
4.2 Rule base optimisation
The generated raw rule base is optimised in this section by fine-tuning the membership functions using a general-purpose search algorithm, the genetic algorithm (GA). GA has been successfully utilised in rule base optimisation, such as in Mucientes et al. (2009) and Tan et al. (2016). Briefly, GA is an adaptive heuristic search algorithm for solving both constrained and unconstrained optimisation problems, based on the evolutionary idea of natural selection that mimics biological evolution. The algorithm firstly initialises the population with random individuals. It then selects a number of individuals for reproduction by applying the genetic operators. The offspring and some of the selected existing individuals jointly form the next generation. The algorithm repeats this process until a satisfactory solution is generated or a maximum number of generations has been reached.
4.2.1 Problem representation
Assume that an initialised TSK rule base is comprised of n rules as expressed in Eq. 1. A chromosome or individual, denoted as \(\textit{I}\), in the GA is used to represent a potential solution, which is designed to represent the rule base in this proposed system, as illustrated in Fig. 3.
4.2.2 Population initialisation
The initial population \(\mathbb {P}= \{\textit{I}_1, \textit{I}_2,\ldots ,\textit{I}_{|\mathbb {P}|} \}\) is formed by taking the initialised rule base and its random variations. In order to guarantee that all the varied fuzzy sets are valid and convex, the constraint \(a_{sr}^{1}< a_{sr}^{2} < a_{sr}^{3}\) is applied to the genes representing each fuzzy set. The size of the population \(|\mathbb {P}|\) is a problem-specific adjustable parameter, typically ranging from tens to thousands, with 20–30 being used most commonly (Naik et al. (2014)).
4.2.3 Objective function
An objective function is used in the GA to determine the quality of individuals. The objective function in this work is defined as the root mean square error (RMSE). Given a training data set \(\mathbb {T} \) and an individual \(I_i\), \(1\le i \le |\mathbb {P}|\), the RMSE value can be calculated as:
$$\begin{aligned} RMSE_i = \sqrt{\frac{\sum _{j=1}^{|\mathbb {T}|}(z_j - \hat{z}_j)^2}{|\mathbb {T}|}}, \end{aligned} \tag{14}$$
where \(|\mathbb {T}|\) is the size of the given training data set, \(z_j\) is the label of the jth training data instance, and \(\hat{z}_j\) represents the output value led by the proposed TSK+ inference approach. The individual with the smallest value of RMSE represents the fittest solution in the population.
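As a brief illustration, the objective function can be written as a helper that scores any inference routine against labelled data. The sketch assumes the individual has already been decoded into a callable `predict`; that callable, the function name `rmse` and the toy data are hypothetical.

```python
import math


def rmse(predict, data):
    """Root mean square error of `predict` over data = [(input, label), ...]."""
    squared = sum((z - predict(x)) ** 2 for x, z in data)
    return math.sqrt(squared / len(data))


# Toy check: a constant predictor that is off by 1 on both instances.
data = [(0.0, 1.0), (1.0, 3.0)]
error = rmse(lambda x: 2.0, data)
```

In the GA, `predict` would wrap the TSK+ inference with the rule base encoded by the individual being evaluated.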
4.2.4 Selection
A number of individuals need to be selected for reproduction, which is implemented in this work by the fitness proportionate selection method, also known as roulette wheel selection. Assuming that \(f_i\) is the fitness of individual \(\textit{I}_i\) in the current population \(\mathbb {P}\), its probability of being selected to generate the next generation is:
$$\begin{aligned} P(I_i) = \frac{f_i}{\sum _{j=1}^{|\mathbb {P}|} f_j}, \end{aligned} \tag{15}$$
where \(|\mathbb {P}|\) is the size of the population. The fitness value \(f_i\) of an individual \(\textit{I}_i\) in the proposed system was determined by adopting the linear-ranking algorithm (Baker (1985)) given as:
where \(r_i\) is the ranking position of individual \(I_i\) in the ordered population \(\mathbb {P}\), and max is the bias or selective pressure towards the fittest individuals in the population.
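Selection can be sketched as follows. The linear-ranking formula in the code is one commonly used form attributed to Baker, with rank 1 for the worst individual and rank \(|\mathbb {P}|\) for the best; it is stated here as an assumption, since the exact expression is not reproduced in this text. The names `linear_rank_fitness`, `roulette_select` and `pressure` (standing for the parameter max) are hypothetical.

```python
import random


def linear_rank_fitness(errors, pressure=1.5):
    """Rank-based fitness from objective values (smaller error is better).

    Assumed form: f = 2 - max + 2 * (max - 1) * (rank - 1) / (n - 1), where the
    worst individual has rank 1 and the best has rank n. Requires n >= 2.
    """
    n = len(errors)
    worst_first = sorted(range(n), key=lambda i: errors[i], reverse=True)
    fitness = [0.0] * n
    for rank, i in enumerate(worst_first, start=1):
        fitness[i] = 2 - pressure + 2 * (pressure - 1) * (rank - 1) / (n - 1)
    return fitness


def roulette_select(fitness, rng):
    """Fitness proportionate selection: returns the index of one individual."""
    pick = rng.uniform(0, sum(fitness))
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if pick <= running:
            return i
    return len(fitness) - 1      # guard against floating-point round-off


fitness = linear_rank_fitness([0.9, 0.1, 0.5])   # RMSE values of three individuals
rng = random.Random(42)
counts = [0, 0, 0]
for _ in range(3000):
    counts[roulette_select(fitness, rng)] += 1
```

Ranking before selection keeps the selection pressure bounded even when raw RMSE values differ by orders of magnitude, which is a common reason for preferring it over using the raw objective directly.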
4.2.5 Reproduction
Once a number of parents are selected, they then breed some individuals for the next generation using the genetic operators crossover and mutation, as shown in Fig. 4. In particular, crossover swaps contiguous parts of the genes of two individuals. In this work, the single-point crossover approach is adopted, which selects an index point and swaps all data beyond this point between the two parent chromosomes to generate two children. Note that the crossover point can only lie between genes that represent two different fuzzy sets, such that all the fuzzy sets remain valid and convex throughout the reproduction process.
The second genetic operator, mutation, is used to maintain genetic diversity from one generation of the population to the next, which is analogous to biological mutation. Mutation alters one or more gene values in a chromosome from their initial state. A predefined mutation rate is used to control the percentage of occurrence of mutations. In this work, in order to make sure the resulting fuzzy sets are valid and convex, the constraint \(a_{sr}^1 \le a_{sr}^2 \le a_{sr}^3\) is applied to the genes representing each fuzzy set during the mutation operation. The newly bred individuals and some of the best individuals in the current generation \(\mathbb {P}\) jointly form the next generation of the population (\(\mathbb {P}^{'}\)).
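The two operators can be sketched under a simplifying assumption: a chromosome is a flat list of genes in which every consecutive triple encodes one triangular fuzzy set, and consequent parameters are left out. The repair strategy in `mutate` (re-sorting the affected triple) is one possible way of enforcing the ordering constraint and is not taken from the text; the function names and the `step` parameter are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover with the cut restricted to fuzzy-set boundaries.

    Chromosomes need at least two fuzzy sets (six genes) so that a cut exists.
    """
    n_sets = len(parent_a) // 3
    cut = 3 * rng.randint(1, n_sets - 1)         # always a multiple of 3
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def mutate(chromosome, rate, rng, step=0.05):
    """Perturb one gene per fuzzy set with probability `rate`, then repair."""
    child = list(chromosome)
    for start in range(0, len(child), 3):
        if rng.random() < rate:
            gene = start + rng.randrange(3)
            child[gene] += rng.uniform(-step, step)
            # Repair (assumption): re-sort so that a1 <= a2 <= a3 still holds.
            child[start:start + 3] = sorted(child[start:start + 3])
    return child


rng = random.Random(1)
a = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
b = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
child_a, child_b = crossover(a, b, rng)
mutant = mutate(a, rate=1.0, rng=rng)
```

Because the cut never falls inside a triple, crossover alone cannot produce an invalid fuzzy set; only mutation needs the explicit repair step.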
4.2.6 Iteration and termination
The selection and reproduction processes are iterated until the predefined maximum number of iterations is reached or the value of the objective function regarding an individual is less than a prespecified threshold. When the termination condition is satisfied, the fittest individual in the current population is the optimal solution.
5 Experimentation
Two nonlinear mathematical models and a well-known real-world data set (KDD Cup 99 data set) are employed in this section to validate and evaluate the proposed system.
5.1 Experiment 1
A 3-dimensional nonlinear function has been used as a benchmark by a number of projects, including recent ones such as Bellaaj et al. (2013), Tan et al. (2016) and Li et al. (2017), and is reconsidered in this section for a comparative study. The problem is given below:
which takes two inputs, \(x_1\) (\(x_1 \in [10,30]\)) and \(x_2\) (\(x_2 \in [10,30]\)), and produces a single output \(y=f(x_1,x_2)\) (\(y \in [-1,1]\)), as illustrated in Fig. 5a.
5.1.1 Rule base initialisation
In order to demonstrate the proposed TSK+ rule base generation approach, a sparse training data set \(\mathbb {T}\) was manually generated from Eq. 17. The data set is composed of 300 data points sparsely distributed within the \([10,30] \times [10,30]\) domain, covering 57% of the input domain. The distribution of this training data set is illustrated in a 2-dimensional plane in Fig. 5b. The key steps of TSK rule base initialisation using the training data set are summarised below.
Step 1 Dense sub-data set generation The sparse training data set \(\mathbb {T}\) was firstly divided into a number of dense sub-data sets by applying the sparse K-Means clustering algorithm. In particular, the number of sub-data sets was determined by the Elbow method as discussed in Sect. 4.1.1. The performance improvements (PIs) against the increasing number of clusters are listed in Table 1 and shown in Fig. 6. It is clear from the figure that the performance improvement decreased rapidly when k increased from 2 to 6, before flattening out after 6. Therefore, 6 was taken as the number of clusters. The application of sparse K-Means led to 6 dense sub-data sets, as demonstrated in Fig. 7.
Step 2 Rule cluster generation Once the sparse training data set \(\mathbb {T}\) was divided into 6 dense sub-data sets (\(T_1,T_2,\ldots ,T_6\)), the standard K-Means clustering algorithm was employed on each determined sub-data set to group similar data points into rule clusters. The application of the Elbow approach led to a set of SSEs and PIs as listed in Table 2, which in turn determined the value of k for each sub-data set as listed in Table 3.
Step 3 Rule base initialisation A rule was extracted from each cluster and all the extracted rules jointly initialised the rule base. The generated 28 TSK fuzzy rules are detailed in Table 4.
5.1.2 TSK fuzzy interpolation
Given any input, overlapped with any rule antecedent or not, the proposed TSK+ inference engine is able to generate an output. For instance, a randomly generated testing data point was \((A_1^*=(27.37, 27.37, 27.37)\), \(A_2^*=(13.56, 13.56, 13.56))\). The proposed TSK+ inference engine firstly calculated the similarity degrees between the given input and the antecedents of every rule (\(S(A_1^*,A_{r1}), S(A_2^*,A_{r2}), r=\{1,2,\ldots ,28\}\)) using Eq. 7, with the results listed in Table 5.
From the calculated similarity degrees, the firing strength of each rule \(FS_r\) was computed using Eq. 2, and the sub-consequence from each rule was calculated using Eq. 4, which are also shown in Table 5. From this, the final output of the given input was calculated using Eq. 5 as \(y=\,0.902\).
5.1.3 Rule base optimisation
In order to achieve the optimal performance, the generated raw rule base was fine-tuned using the GA as detailed in Sect. 4.2. The GA parameters used in this experiment are listed in Table 6. The population was initialised as the individuals representing the raw rule base and its random variations. The performance against the number of iterations is shown in Fig. 8, which clearly demonstrates the performance improvements led by the GA.
5.1.4 Results comparison
In order to enable a direct comparative study with the approaches proposed in Bellaaj et al. (2013) and Tan et al. (2016), the proposed approach was also applied to 36 randomly generated testing data points. The sum of errors for the 36 testing data points led by the proposed approach is shown in Table 7, in addition to those led by the compared approaches. The proposed TSK+ outperformed the approaches proposed in Bellaaj et al. (2013) and Tan et al. (2016), although fewer rules were used. Another advantage of the proposed approach is that the number of rules was determined automatically by the system without the requirement of any human input. In addition, the optimisation process noticeably improved the system performance by reducing the sum of errors from 3.38 to 1.78.
5.2 Experiment 2
The proposed approach is not only able to deal with sparse or unevenly distributed data sets, but also able to work with dense data sets. In order to evaluate its ability in handling dense data sets, a two-input and single-output nonlinear mathematical model expressed in Eq. 18 was employed as a test bed, given that it has been used by Evsukoff et al. (2002) and Rezaee and Zarandi (2010).
In this case, a randomly generated dense training data set was used, including 1681 data points distributed within the range of \([-10,10] \times [-10,10]\). By employing the proposed approach, 14 TSK fuzzy rules in total were generated. To allow a direct comparison, the mean squared error (MSE) was used as the measurement of models, following the work presented in Evsukoff et al. (2002) and Rezaee and Zarandi (2010). The detailed calculations of this experiment are omitted here. The MSE values led by different approaches with the specified number of rules are listed in Table 8. The results demonstrated that the proposed system TSK+ performed competitively with only 14 TSK fuzzy rules.
5.3 Experiment 3
This section considers a well-known real-world data set, NSL-KDD-99 (Tavallaee et al. (2009)), which has been widely used as a benchmark (Bostani and Sheikhan (2017); Wang et al. (2010); Yang et al. (2017b)). This data set is a modified version of the KDD Cup 99 data set generated in a military network environment. It contains 125,973 data instances with 41 attributes and 1 label which indicates the type of connection. In particular, normal connections and four types of attacks are labelled in the data set, including Denial of Service Attacks (DoS), User to Root Attacks (U2R), Remote to User Attacks (R2U) and Probes.
5.3.1 Data set preprocessing
In the work of Yang et al. (2017b), the four most important attributes were selected from the original 41 using expert knowledge in order to remove noise and redundancy. This experiment also took these four attributes as system inputs; the application of automatic feature selection and reduction approaches, such as Zheng et al. (2015), remains as future work. The four selected input features are listed in Table 9. The data set in this experiment has also been normalised in an effort to reduce the potential noise in this real-world data set.
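The text does not state which normalisation was applied. As one common possibility, the sketch below rescales each input feature to the range [0, 1] (min-max normalisation). The choice of method and the function name are assumptions made for illustration.

```python
def min_max_normalise(rows):
    """Rescale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalised = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - low) / span if span else 0.0)
        normalised.append(scaled)
    return normalised


# Hypothetical two-feature samples for illustration only.
example_rows = [[0, 5], [10, 5], [5, 5]]
example_normalised = min_max_normalise(example_rows)
```

Bringing all features to a common scale matters here because the subsequent clustering steps rely on distances between data points.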
5.3.2 TSK+ model construction
As the labels are symbolic values, 0-order TSK-style fuzzy rules were used. In order to construct a 0-order TSK rule base, the training data set was divided into 5 sub-data sets based on the five symbolic labels, which are represented using five integer numbers. The sizes of the sub-data sets and their corresponding integer labels are listed in Table 10. The rule base generation process is summarised in four steps below.
Step 1 Dense sub-data set generation The sparse K-Means was applied on each sub-data set \(\mathbb {T}_j, 1\le j\le 5\) to generate dense sub-data sets. Taking the second sub-data set \(\mathbb {T}_2\) as an example, the performance improvement led by the increment of \(k_2,\ (k_2\in \{1,2,\ldots ,10\})\) is shown in Fig. 9. Following the Elbow approach, 3 was selected as the number of clusters, i.e. \(k_2 = 3\). Denote the 3 generated dense subsets as (\(T_{21},T_{22},T_{23}\)).
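The Elbow approach used to pick the number of clusters can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain K-Means with a deterministic farthest-point initialisation rather than the sparse K-Means used in this step, and it automates the elbow reading with a hypothetical threshold rule (stop when adding a cluster reduces the cost by less than 10% of the single-cluster cost), whereas the elbow is usually read from a plot such as Fig. 9.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50):
    """Plain K-Means; returns (centres, within-cluster sum of squares)."""
    # Deterministic farthest-point initialisation (an assumption for this sketch).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centres))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        new_centres = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    cost = sum(min(squared_distance(p, c) for c in centres) for p in points)
    return centres, cost


def elbow_k(points, max_k, threshold=0.1):
    """Smallest k for which moving to k + 1 clusters gains less than
    `threshold` times the single-cluster cost (hypothetical elbow rule)."""
    costs = [k_means(points, k)[1] for k in range(1, max_k + 1)]
    for k in range(1, max_k):
        if costs[k - 1] - costs[k] < threshold * costs[0]:
            return k
    return max_k


# Hypothetical toy data: three well-separated groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (20, 0), (20, 1), (21, 0), (21, 1)]
selected_k = elbow_k(toy_points, max_k=5)
```

On data with clearly separated groups, the cost drops sharply until the number of clusters matches the number of groups and flattens afterwards, which is the behaviour the Elbow approach exploits.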
Step 2 Rule cluster generation The standard K-Means clustering algorithm was applied on each \(T_{ji}\) (\(j \in \{1,2, \ldots , 5\}\), with i ranging from 1 to the number of clusters determined by the Elbow approach) to generate rule clusters, each representing a rule. Again, take \(\mathbb {T}_2\) as an example for illustration. The determined cluster numbers \(k_{2i}\) for each dense sub-data set \(T_{2i}\) are shown in the third column of Table 11. In total, 13 rule clusters were generated from the sub-data set \(\mathbb {T}_{2}\). The rule cluster generation process for the other dense sub-data sets is not detailed here, but the generated rule clusters for the given training data set are summarised in Table 11.
Step 3 Raw rule base generation A rule is extracted from each generated rule cluster. Taking rule cluster \(RC_{12}\) as an example, which is the first determined rule cluster in \(T_{21}\), the corresponding rule \(R_{12}\) can be extracted. In particular, the consequence of rule \(R_{12}\) is the integer number representing the class of connections, and the rule antecedents are four triangular fuzzy sets (\(A_{(12)1},\ A_{(12)2},\ A_{(12)3},\ A_{(12)4}\)) produced by the approach discussed in Sect. 4.1.4. The generated rule \(R_{12}\) is:
According to Table 11, 46 rules in total were generated to initialise the rule base. The detailed initialised rule base can be found in Appendix I.
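The extraction of a 0-order rule from a rule cluster can be illustrated with the sketch below. The details of Sect. 4.1.4 are not repeated here, so the construction is an assumption: each antecedent is taken as a triangular fuzzy set whose support spans the cluster's value range in that feature and whose peak sits at the cluster mean. The function names and the dictionary layout are hypothetical.

```python
def triangular_set(values):
    """Return (left, peak, right): support from the value range, peak at the mean.
    This construction is an assumption, not necessarily that of Sect. 4.1.4."""
    return (min(values), sum(values) / len(values), max(values))


def extract_zero_order_rule(cluster, class_label):
    """Build one 0-order TSK rule from a rule cluster: one triangular
    antecedent per input feature and a constant integer consequent."""
    antecedents = [triangular_set(column) for column in zip(*cluster)]
    return {"antecedents": antecedents, "consequent": class_label}


# Hypothetical two-feature cluster for illustration only.
example_cluster = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0)]
example_rule = extract_zero_order_rule(example_cluster, class_label=2)
```

Applying such a function once per rule cluster yields one rule per cluster, which matches the one-to-one correspondence between rule clusters and rules described in Step 3.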
Step 4 Rule base optimisation The GA was applied to optimise the membership functions of the fuzzy sets involved in the extracted fuzzy rules. The same GA parameters used in Experiment 1 as listed in Table 6 were also used in this example, but the number of iterations was increased to 20,000. The system performance against the number of iterations used in GA is shown in Fig. 10. The optimised rule base is attached in Appendix II.
5.3.3 TSK+ model evaluation
Once the TSK fuzzy rule base had been generated, it could be used for connection classification by the proposed TSK+ inference engine. The rule base and the inference engine TSK+ jointly formed the fuzzy model, which was validated and evaluated using a testing data set. The testing data set contains 22,544 data samples provided by Tavallaee et al. (2009). It was also extracted from the original KDD Cup 99 data set, but it does not share any data instance with the NSL-KDD-99 training data set.
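How a 0-order TSK rule base can produce a class label is sketched below. Several parts are assumptions made for illustration: the similarity function is a simple stand-in that decays with distance from each antecedent's peak (it is not the similarity measure defined for TSK+), the per-feature similarities are combined with the minimum, and the weighted average of the integer consequents is rounded to the nearest integer to obtain a class.

```python
def similarity(value, triangle):
    """Stand-in similarity in (0, 1]: decays with distance from the peak.
    It never reaches zero, so every rule contributes to the output."""
    _, peak, _ = triangle
    return 1.0 / (1.0 + abs(value - peak))


def classify(sample, rules):
    """Weighted average of constant consequents, rounded to an integer label."""
    weights = [
        min(similarity(v, a) for v, a in zip(sample, rule["antecedents"]))
        for rule in rules
    ]
    weighted_sum = sum(w * rule["consequent"] for w, rule in zip(weights, rules))
    return round(weighted_sum / sum(weights))


# Hypothetical single-feature rule base for illustration only.
example_rules = [
    {"antecedents": [(-1.0, 0.0, 1.0)], "consequent": 1},
    {"antecedents": [(9.0, 10.0, 11.0)], "consequent": 3},
]
label_near_first = classify([0.0], example_rules)
label_near_second = classify([10.0], example_rules)
```

Because the stand-in similarity is never zero, an input that overlaps no antecedent still yields an output, which mirrors the interpolation and extrapolation behaviour that distinguishes TSK+ from conventional TSK inference.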
Note that the testing data set has been used in a number of projects with different classification approaches. In particular, decision tree, naive Bayes, back-propagation neural network (BPNN) and fuzzy clustering-artificial neural network (FC-ANN) have been employed in Wang et al. (2010), and modified optimum-path forest (MOPF) was applied in Bostani and Sheikhan (2017). The accuracy of the classification results for each class of network traffic generated by different approaches, including the proposed one with the initialised rule base and the optimised rule base, is listed in Table 12.
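A per-class accuracy of the kind reported for each class of network traffic can be computed as in the sketch below. It assumes that the accuracy of a class means the fraction of testing samples of that class that were assigned the correct label; the function name and the toy labels are illustrative only.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly labelled samples within each true class."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical labels for illustration only (not results from Table 12).
example_true = [1, 1, 2, 2, 2]
example_predicted = [1, 2, 2, 2, 3]
example_accuracy = per_class_accuracy(example_true, example_predicted)
```

Reporting accuracy per class rather than overall is useful for this data set because the class sizes differ greatly, so an overall figure would be dominated by the largest classes.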
The results show that the proposed TSK+ overall outperformed all other approaches, and that the proposed rule base optimisation approach improved the system performance by over 10% on average. In particular, the proposed system achieved better accuracy than all other approaches on the prediction of normal connections, DoS and R2U; it performed worse than FC-ANN and MOPF on the U2R class; and for the Probes class its result was similar to the best performance achieved by the other approaches.
6 Conclusion
This paper proposed a fuzzy inference system, TSK+, which extends the conventional TSK inference system such that it is also applicable to sparse rule bases and unevenly distributed rule bases. This is achieved by allowing the interpolation and extrapolation of the output from all rules even if the given input does not overlap with any rule antecedents. This paper also proposed a novel data-driven rule base generation approach, which is workable with sparse data sets, dense data sets and unevenly distributed data sets. The system was validated and evaluated by applying it to two benchmark problems and one real-world data set. The experimental results demonstrated the wide applicability of the proposed system with compact rule bases and competitive performance.
The proposed system can be further enhanced in the following areas. Firstly, the sparsity-aware possibilistic clustering algorithm (Xenaki et al. (2016)) was designed to work with sparse data sets, and thus it is desirable to investigate how this clustering algorithm can support the proposed TSK+ inference system. Secondly, it is worthwhile to study how the proposed sparse rule base generation approach can help in generating Mamdani-style fuzzy rule bases. Finally, it would be interesting to define the sparsity or density of a data set such that more accurate clustering results can be generated during rule cluster generation.
References
Baker JE (1985) Adaptive selection methods for genetic algorithms. In: Proceedings of an international conference on genetic algorithms and their applications, Hillsdale, New Jersey, pp 101–111
Bellaaj H, Ketata R, Chtourou M (2013) A new method for fuzzy rule base reduction. J Intell Fuzzy Syst 25(3):605–613
Bostani H, Sheikhan M (2017) Modification of supervised OPF-based intrusion detection systems using unsupervised learning and social network concept. Pattern Recognit 62:56–72
Chang PC, Fan CY (2008) A hybrid system integrating a wavelet and TSK fuzzy rules for stock price forecasting. IEEE Trans Syst Man Cybern Part C Appl Rev 38(6):802–815
Chen MY, Linkens D (2004) Rule-base self-generation and simplification for data-driven fuzzy models. Fuzzy Sets and Syst 142:243–265
Chen S, Hsin W (2015) Weighted fuzzy interpolative reasoning based on the slopes of fuzzy sets and particle swarm optimization techniques. IEEE Trans Cybern 45(7):1250–1261
Chen SJ, Chen SM (2003) Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Trans Fuzzy Syst 11(1):45–56
Evsukoff A, Branco AC, Galichet S (2002) Structure identification and parameter optimization for nonlinear fuzzy modeling. Fuzzy Sets and Syst 132(2):173–188
Huang Z, Shen Q (2006) Fuzzy interpolative reasoning via scale and move transformations. IEEE Trans Fuzzy Syst 14(2):340–359
Huang Z, Shen Q (2008) Fuzzy interpolation and extrapolation: A practical approach. IEEE Trans Fuzzy Syst 16(1):13–28
Johanyák ZC, Kovács S (2005) Distance based similarity measures of fuzzy sets. In: Proceedings of SAMI 2005
Kerk YW, Pang LM, Tay KM, Lim CP (2016) Multi-expert decision-making with incomplete and noisy fuzzy rules and the monotone test. In: 2016 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 94–101
Kóczy L, Hirota K (1993) Approximate reasoning by linear rule interpolation and general approximation. Int J Approx Reason 9(3):197–225
Kóczy L, Hirota K (1997) Size reduction by interpolation in fuzzy rule bases. IEEE Trans Syst Man Cybern Part B Cybern 27(1):14–25
Kodinariya TM, Makwana PR (2013) Review on determining number of cluster in K-means clustering. Int J 1(6):90–95
Li J, Qu Y, Shum HPH, Yang L (2017) TSK inference with sparse rule bases. Springer International Publishing, Cham, pp 107–123
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA., pp 281–297
Mamdani EH (1977) Application of fuzzy logic to approximate reasoning using linguistic synthesis. Comput IEEE Trans C 26(12):1182–1191
Mucientes M, Vidal JC, Bugarín A, Lama M (2009) Processing time estimations by variable structure TSK rules learned through genetic programming. Soft Comput 13(5):497–509
Naik N, Diao R, Shen Q (2014) Genetic algorithm-aided dynamic fuzzy rule interpolation. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) 2014, pp 2198–2205
Nandi AK, Klawonn F (2007) Detecting ambiguities in regression problems using TSK models. Soft Comput 11(5):467–478
Negnevitsky M (2005) Artificial intelligence: a guide to intelligent systems. Pearson Education, London
Rezaee B, Zarandi MF (2010) Data-driven fuzzy modeling for Takagi–Sugeno–Kang fuzzy system. Inf Sci 180(2):241–255
Shen Q, Yang L (2011) Generalisation of scale and move transformation-based fuzzy interpolation. JACIII 15(3):288–298. https://doi.org/10.20965/jaciii.2011.p0288
Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. Syst Man and Cybern IEEE Trans SMC 15(1):116–132
Tan Y, Li J, Wonders M, Chao F, Shum HPH, Yang L (2016) Towards sparse rule base generation for fuzzy rule interpolation. In: IEEE world congress on computation intelligence international conference
Tavallaee M, Bagheri E, Lu W, Ghorbani A (2009) A detailed analysis of the KDD Cup 99 data set. In: The second IEEE symposium on computational intelligence for security and defence applications
Wang G, Hao J, Ma J, Huang L (2010) A new approach to intrusion detection using artificial neural networks and fuzzy clustering. Exp Syst Appl 37(9):6225–6232
Witten DM, Tibshirani R (2010) A framework for feature selection in clustering. J Am Stat Assoc 105(490):713–726
Xenaki SD, Koutroumbas KD, Rontogiannis AA (2016) Sparsity-aware possibilistic clustering algorithms. IEEE Trans Fuzzy Syst 24(6):1611–1626
Yang L, Shen Q (2011) Adaptive fuzzy interpolation. Fuzzy Syst IEEE Trans 19(6):1107–1126
Yang L, Shen Q (2013) Closed form fuzzy interpolation. Fuzzy Sets and Syst 225:1–22
Yang L, Chao F, Shen Q (2017a) Generalized adaptive fuzzy rule interpolation. IEEE Trans Fuzzy Syst 25(4):839–853
Yang L, Li J, Fehringer G, Barraclough P, Sexton G, Cao Y (2017b) Intrusion detection system by fuzzy interpolation. In: 2017 IEEE international conference on fuzzy systems
Yang L, Zuo Z, Chao F, Qu Y (2017c) Fuzzy rule interpolation and its application in control. In: Ramakrishnan S (ed) Modern Fuzzy Control Systems and Its Applications, InTech, Rijeka
Zheng L, Diao R, Shen Q (2015) Self-adjusting harmony search-based feature selection. Soft Comput 19(6):1567–1579
Acknowledgements
This work is supported by a Northumbria University PhD scholarship, the National Natural Science Foundation of China (No. 61502068) and the China Postdoctoral Science Foundation (Nos. 2013M541213 and 2015T80239).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by P. Angelov, F. Chao.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Li, J., Yang, L., Qu, Y. et al. An extended Takagi–Sugeno–Kang inference system (TSK+) with fuzzy interpolation and its rule base generation. Soft Comput 22, 3155–3170 (2018). https://doi.org/10.1007/s00500-017-2925-8
DOI: https://doi.org/10.1007/s00500-017-2925-8
Keywords
 Fuzzy inference system
 TSK
 Fuzzy rule base generation
 Fuzzy interpolation