Improving fuzzy rule interpolation performance with information gain-guided antecedent weighting
Abstract
Fuzzy rule interpolation (FRI) makes inference possible when dealing with a sparse and imprecise rule base. However, most FRI approaches assume that all rule antecedents are of equal significance when performing interpolation. This assumption may degrade interpolative reasoning, producing inaccurate or incorrect interpolated results. To improve accuracy by mitigating the drawback of the equal significance assumption, this paper presents a novel inference system in which an information gain (IG)-guided fuzzy rule interpolation method is embedded. In particular, the rule antecedents in FRI are weighted using IG, which evaluates their relative importance with respect to the consequent for decision making. The computation of antecedent weights is enabled by an innovative reverse engineering process that artificially converts fuzzy rules into training samples. The antecedent weighting scheme is integrated with scale and move transformation-based interpolation (though other FRI techniques may be improved in the same manner). An illustrative example demonstrates the execution of the proposed approach, and systematic comparative experimental studies demonstrate its potential.
Keywords
Fuzzy rule interpolation, Antecedent weighting, Reverse engineering

1 Introduction
Fuzzy set theory (Zadeh 1965) has developed rapidly in a variety of scientific areas, including mathematics, engineering and computer science. As an effective tool for addressing imprecision and vagueness in modelling and reasoning, it has been successfully applied to many real-world problems, such as systems control, fault diagnosis and computer vision. In particular, fuzzy expert systems have been developed using the idea of linguistic reasoning (also known as approximate reasoning), which reflects the way human beings think and has led to new, more human-like intelligent systems.
In general, an approximate reasoning system can be formalised as a fuzzy if–then rule-based inference mechanism that derives a conclusion from an input observation. Various techniques have been established to implement generalised modus ponens, which enables reasoning with imprecise inputs, mostly following the basic idea of the Compositional Rule of Inference (CRI) (Zadeh 1973). However, CRI cannot draw a conclusion when the rule base is sparse rather than dense. Here, sparseness does not refer to the number of rules in a given rule base, but to how well the rule antecedents cover the universe of discourse. That is, an input observation may have no overlap with any of the available rules, so no rule can be fired to derive the required consequent by applying CRI.
Fuzzy rule interpolation (FRI) (Kóczy and Hirota 1993a, b) plays a significant role in such sparse fuzzy rule-based reasoning systems. It addresses a limitation of conventional fuzzy reasoning that relies solely on CRI: the antecedents of the rules in a given rule base may not cover the whole problem domain. For an observation that matches no rule, FRI still provides an estimate by computing an interpolated consequent.
A number of FRI methods have been proposed and improved in the literature (Hsiao et al. 1998; Chang et al. 2008; Huang and Shen 2006; Yang and Shen 2011; Yang et al. 2017; Jin et al. 2014). However, common approaches assume that the rule antecedents are of equal significance when searching for the rules used to perform interpolation. This can lead to inaccurate or incorrect interpolative results, because in many applications of (fuzzy) decision systems, a decision is typically reached by aggregating conditional attributes, each of which generally contributes differently to the decision-making process. Weighted FRI methods (Diao et al. 2014) have therefore been introduced to remedy this equal significance assumption. For example, a heuristic method based on a genetic algorithm has been applied to learn the weights of rule antecedents (Chen and Chang 2011), but this substantially increases the computational overhead. An alternative is for experts to subjectively predefine the weights of the rule antecedents, but this may restrict the adaptivity of the rules and, therefore, the flexibility of the resulting fuzzy system (Li et al. 2005).
To assess the relative significance of attributes with regard to the decision variable, information gain has been commonly used in data-driven learning algorithms (Mitchell 1997). Building on this property of information gain, this paper presents an innovative approach to rule interpolation. Information gain is integrated within an FRI process to estimate the relative importance of the rule antecedents in a given rule base. The required information gains are estimated from an artificially generated decision table, obtained through a reverse engineering process that converts a given sparse rule base into a training data set. The proposed work helps minimise the drawback of the equal significance assumption made in common FRI techniques, thereby improving the performance of FRI. In particular, the paper presents an information gain-guided FRI method based on the popular scale and move transformation-based FRI (TFRI) (Huang and Shen 2006). However, alternative FRI techniques may be employed for the same purpose if preferred.
The remainder of this paper is structured as follows. Section 2 outlines the background required for the present development, including TFRI, the basic concepts of information gain, and a simple iterative rule induction method (for providing the initial rule base). Section 3 describes the proposed information gain-guided fuzzy rule interpolation approach, with a case study illustrating its execution process. Section 4 details the results of comparative experimental evaluations, supported by statistical tests and analysis. Finally, Sect. 5 concludes the paper and outlines several directions for further study.
2 Background work
2.1 Transformation-based FRI
An FRI system can be defined as a tuple \(\langle R,Y \rangle \), where \(R = \{r^1,r^2,\ldots ,r^N\}\) is a non-empty finite set of fuzzy rules (the rule base), and Y is a non-empty finite set of variables (interchangeably termed attributes). \(Y = A \cup \{z\}\), where \(A = \{a_j \mid j=1,2,\ldots ,m\}\) is the set of antecedent variables, and z is the consequent variable appearing in the rules. Without loss of generality, a given rule \(r^i \in R\) and an observation \(o^*\) can be expressed in the following format:
\(r^i\): if \(a_1\) is \(A_1^i\) and \(a_2\) is \(A_2^i\) and \(\cdots \) and \(a_m\) is \(A_m^i\), then z is \(z^i\)
\(o^*\): \(a_1\) is \(A_1^*\) and \(a_2\) is \(A_2^*\) and \(\cdots \) and \(a_m\) is \(A_m^*\)
where \(A_j^i\) represents the value (or fuzzy set) of the antecedent variable \(a_j\) in the rule \(r^i\), and \(z^i\) denotes the value of the consequent variable z in \(r^i\).
Once the distances between a given observation and all rules in the rule base have been calculated, the n rules with the smallest distances are chosen as the n closest rules to the observation. In most applications of TFRI, n is taken to be 2. The n closest rules form the basis for constructing a so-called intermediate rule \(r^{\prime }\). This construction process computes intermediate antecedent fuzzy sets \(A^{\prime }_j,j=1,2,\ldots ,m\), and an intermediate consequent fuzzy set \(z^{\prime }\), resulting in an artificially created rule:
\(r^{\prime }\) : if \(a_1\) is \(A_1^{\prime }\) and \(a_2\) is \(A_2^{\prime }\) and \(\cdots \) and \(a_m\) is \(A_m^{\prime }\), then z is \(z^{\prime }\)
which is in effect a weighted aggregation of the n selected closest rules.
Then, the antecedent values of the intermediate rule are transformed through a process of scale and move modification so that they match the corresponding parts of the observation, and the transformation factors \(s_{A_j}\) and \(m_{A_j}, j=1,2,\ldots ,m\), calculated for each antecedent are recorded. Finally, the interpolated consequent is obtained by applying the recorded factors to the consequent of the intermediate rule. This in effect implements fuzzy, or generalised, modus ponens.
The above process of scale and move transformations for interpolating the consequent variable is summarised in Fig. 2, and can be represented collectively and concisely by \(z^* = T(z^{\prime },s_z,m_z)\), which highlights the two key transformations required. For the detailed computations involved in TFRI, see the original work (Huang and Shen 2006, 2008).
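The selection and intermediate-rule steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: fuzzy sets are triangular, the representative value of a set \((a_1,a_2,a_3)\) is taken as the mean of its vertices, per-attribute distances are normalised by each variable's range, the selected rules are combined with inverse-distance weights, and each intermediate antecedent is finally shifted so that its representative value equals that of the observation. The function names are hypothetical; the intermediate consequent and the scale and move transformations are omitted, and the exact formulation is given by Huang and Shen (2006, 2008).

```python
from math import sqrt
from typing import List, Sequence, Tuple

# A triangular fuzzy set (a1, a2, a3) with a1 <= a2 <= a3.
Tri = Tuple[float, float, float]


def rep(a: Tri) -> float:
    """Assumed representative value of a triangular set: the mean of its three vertices."""
    return sum(a) / 3.0


def rule_distance(antecedents: Sequence[Tri], obs: Sequence[Tri], spans: Sequence[float]) -> float:
    """Euclidean combination of per-attribute distances, each normalised by the variable's range."""
    return sqrt(sum(((rep(r) - rep(o)) / s) ** 2 for r, o, s in zip(antecedents, obs, spans)))


def closest_rules(rules, obs: Sequence[Tri], spans: Sequence[float], n: int = 2) -> List[int]:
    """Indices of the n rules, given as (antecedents, consequent), nearest to the observation."""
    order = sorted(range(len(rules)), key=lambda i: rule_distance(rules[i][0], obs, spans))
    return order[:n]


def intermediate_term(selected: Sequence[Tri], observed: Tri, span: float) -> Tri:
    """Combine one antecedent of the selected rules with inverse-distance weights, then shift
    the result so that its representative value equals that of the observed term."""
    dists = [abs(rep(a) - rep(observed)) / span for a in selected]
    if 0.0 in dists:
        # Exact match on this attribute: reuse the matching term unchanged.
        combined = selected[dists.index(0.0)]
    else:
        inv = [1.0 / d for d in dists]
        weights = [v / sum(inv) for v in inv]
        combined = tuple(sum(w * a[k] for w, a in zip(weights, selected)) for k in range(3))
    shift = rep(observed) - rep(combined)
    return tuple(v + shift for v in combined)
```

For n = 2, the inverse-distance weights reduce to weighting the second rule by \(d_1/(d_1+d_2)\), where \(d_i\) is its distance to the observation, so an observation lying halfway between two rules draws equally on both.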
2.2 Information gain
Information gain has been widely adopted in the development of learning classifier algorithms, to measure how well a given attribute may separate the training examples according to the underlying classes (Mitchell 1997). It is defined via the entropy metric in information theory (Shannon 2001), which is commonly used to characterise the disorder or uncertainty of a system.
From the perspective of entropy evaluation over U, the second part of Eq. (6) shows that the entropy is measured via weighted entropies calculated over the partition of O induced by the attribute \(a_k\). The larger the information gain \(IG(O,a_k)\), the better \(a_k\) partitions the given examples. A high information gain therefore implies a significant reduction in entropy, or uncertainty, obtained by taking that attribute into account.
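For reference, the standard definitions (Mitchell 1997) are given below, with notation assumed here rather than copied from Eq. (6): O is a set of training examples spanning c classes, \(p_i\) is the proportion of O belonging to class i, \(V(a_k)\) is the set of values of \(a_k\), and \(O_v\) is the subset of O in which \(a_k\) takes the value v. The subtracted sum is the weighted entropy over the partition of O induced by \(a_k\).

```latex
\mathrm{Entropy}(O) = -\sum_{i=1}^{c} p_i \log_2 p_i ,
\qquad
IG(O, a_k) = \mathrm{Entropy}(O) - \sum_{v \in V(a_k)} \frac{|O_v|}{|O|}\, \mathrm{Entropy}(O_v)
```

For example, with two equally frequent classes \(\mathrm{Entropy}(O) = 1\) bit, and an attribute that splits O into pure subsets attains \(IG(O,a_k) = 1\).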
2.3 Iterative rule base generation
A data-driven rule base learning mechanism extracts rules from raw data to generate a rule base, where each rule associates a set of antecedents with a corresponding consequent (Wang and Mendel 1992; Hong and Lee 1996). Rule base generation can also follow an iterative procedure (Hoffmann 2004; Galea and Shen 2006) that incrementally adds new rules to the rule base. This section outlines an iterative rule base generation procedure, which repeatedly extracts rules from data, one at a time, into an emerging rule base.
Before the iterative procedure is executed, the domains of all r antecedent attributes and of the consequent attribute are evenly partitioned into \(m_1, m_2, \ldots , m_r\) and \(m_c\) fuzzy regions, respectively, where \(m_c\) denotes the number of regions for the consequent attribute. Each fuzzy region is assigned a membership function (implemented with triangular membership functions in this work for simplicity). This divides the antecedent space of an emerging rule into a hypercube, in which each hypergrid stands for a combination of particular fuzzy regions of the r antecedent attributes.
The iterative process begins with the complete data set of instances D. An instance is said to hit a hypergrid if the corresponding combination of fuzzy regions gives that instance its largest membership value. The hypergrid most covered by the instances in D receives the most hits. A threshold \(\delta \) is used to determine whether the most covered hypergrid can form a rule to be added to the rule base R: if the highest number of hits exceeds the threshold, a rule is extracted from this hypergrid.
The antecedent values of the rule returned by this iteration are the fuzzy values associated with the corresponding hypergrid. The rule consequent adopts the fuzzy value, among the \(m_c\) consequent values, that receives the highest number of hits from these instances. The instances that hit this hypergrid are then removed from the data set, and the iterative process repeats on the remaining data to generate the next rule. However, if the proportion of hit instances is less than \(\delta \), no rule is generated from this hypergrid, because such a small number of hits may simply be due to noise, and the iterative procedure terminates.
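A compact Python sketch of this procedure is given below for a classification setting. Several details are assumptions not fixed by the text: each domain is split into evenly spaced triangular regions (at least two per attribute), an instance hits the hypergrid formed by its highest-membership region on every attribute, the consequent is the majority class among the hit instances, and \(\delta\) is read as a proportion of the original data set. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def memberships(x: float, lo: float, hi: float, k: int) -> List[float]:
    """Membership of x in k >= 2 evenly spaced triangular fuzzy regions over [lo, hi]."""
    step = (hi - lo) / (k - 1)
    return [max(0.0, 1.0 - abs(x - (lo + i * step)) / step) for i in range(k)]


def hit_region(x: float, lo: float, hi: float, k: int) -> int:
    """Index of the fuzzy region in which x has its largest membership."""
    mu = memberships(x, lo, hi, k)
    return max(range(k), key=mu.__getitem__)


def generate_rules(data: Sequence[Tuple[Sequence[float], Hashable]],
                   bounds: Sequence[Tuple[float, float]],
                   regions: Sequence[int],
                   delta: float) -> List[Tuple[Tuple[int, ...], Hashable]]:
    """Iteratively extract (hypergrid, class) rules from labelled data."""
    remaining = list(data)
    rules = []
    while remaining:
        # The hypergrid hit by each remaining instance.
        grids = [tuple(hit_region(x, lo, hi, k) for x, (lo, hi), k in zip(xs, bounds, regions))
                 for xs, _ in remaining]
        grid, hits = Counter(grids).most_common(1)[0]
        if hits / len(data) < delta:
            break  # too few hits: possibly noise, so stop
        labels = [y for g, (_, y) in zip(grids, remaining) if g == grid]
        rules.append((grid, Counter(labels).most_common(1)[0][0]))
        # Drop the instances covered by the new rule and continue with the rest.
        remaining = [inst for g, inst in zip(grids, remaining) if g != grid]
    return rules
```

With \(\delta = 0.2\), for instance, a hypergrid must be hit by at least 20% of the original instances before it yields a rule, so isolated points never become rules.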
This simple iterative rule generation procedure is used to learn the rule base for the inference system proposed in Sect. 3 (assuming that no rules are provided by domain experts). If the generated rule base is dense, any standard fuzzy inference technique (e.g., CRI) can be employed to perform classification when a new input observation is provided. Otherwise, an observation that does not match any learned rule is passed to the fuzzy rule interpolation process; if it does match a rule in the sparse rule base, CRI is used as usual.
3 Antecedent-weighted TFRI
3.1 Illustrative case
To illustrate the proposed work, a simple fuzzy classification problem (Yuan and Shaw 1995) is used here, involving a small training set of 16 instances. The system decides which sports activity to undertake (volleyball, swimming or weight lifting) given the status of four conditional attributes regarding the weather: temperature (hot, mild and cool), outlook (sunny, cloudy and rain), humidity (humid and normal) and wind (windy and not windy). The original rule base for this problem is listed below; Rule 6 (in parentheses) is the rule removed to form a sparse rule base.
1. If Temperature is Hot and Outlook is Sunny, then Swimming.
2. If Temperature is Hot and Outlook is Cloudy, then Swimming.
3. If Outlook is Rain, then Weight lifting.
4. If Temperature is Mild and Wind is Windy, then Weight lifting.
5. If Temperature is Mild and Wind is Not Windy, then Volleyball.
6. (If Temperature is Cool, then Weight lifting.)
3.2 Turning rules into training data via reverse engineering
Given a rule base, the proposed information gain-guided TFRI begins with a reverse engineering procedure that converts the rules into a set of artificial training samples, forming a decision table from which the required information gains are calculated. This development is based on an examination of how TFRI performs its task. Its first key stage is the selection of the n closest fuzzy rules when an observation is presented that does not match any existing rule in the sparse rule base (so that CRI is not applicable).
In conventional TFRI algorithms, all antecedent attributes of the rules are assumed to be of equal significance when searching for the subset of rules closest to the observation, since the original approaches can neither assess nor make use of the relative importance or ranking of these attributes. Information gain offers an intuitively sound and implementation-wise straightforward mechanism for evaluating the relative significance of attributes.
The question is what data are available to act as learning examples for computing the information gains. TFRI works with a sparse rule base: when an observation is given, it is expected to produce an interpolated result for the consequent variable. Without loss of generality, it is presumed that, owing to the sparseness of domain knowledge, insufficient example data are available to support the computation of the required information gains. However, any TFRI method does use a given sparse rule base involving a set of variables \(Y=A \cup \{z\}\) (as shown in Sect. 2.1). This set of rules can be translated into an artificial decision table (i.e., a set of artificially generated training examples), where each row represents a particular rule. In any data-driven learning mechanism, rules are learned from given data samples; translating rules back into data is therefore a reverse engineering process of data-driven learning.

The reverse engineering procedure consists of two steps:

1. Identifying all antecedent variables appearing in the rules and the value domains of these variables; and
2. Iteratively expanding each existing rule so that it involves all domain variables: if an antecedent variable is not involved in a rule, that rule is replaced by q rules, where q is the cardinality of the value domain of that variable, such that the variable takes a different possible value from its domain in each of the expanded rules.
Table 1 Rule base in illustrative case
Rules  Temperature  Outlook  Humidity  Wind  Decision
\(r^1\)  Hot  Sunny  –  –  Swimming 
\(r^2\)  Hot  Cloudy  –  –  Swimming 
\(r^3\)  –  Rain  –  –  Weight lifting 
\(r^4\)  Mild  –  –  Windy  Weight lifting 
\(r^5\)  Mild  –  –  Not windy  Volleyball 
The above procedure makes logical sense. This is because for any rule, if a variable is missing from the rule antecedent, it means that it does not matter what value it takes and the rule will lead to the same consequent value, provided that those variables that do appear in the rule are satisfied.
The rule base of Sect. 3.1 can be reformulated as shown in Table 1. Following the two-step procedure, 32 training samples are generated, as listed in Table 9 in “Appendix A”. The reverse engineering process can be explained using the illustrative case. Without loss of generality, assume that the first given rule is used to create artificial data first, so that part of the emerging artificial decision table is constructed from this rule. Note that Humidity and Wind are missing from Rule 1: if Temperature takes the value Hot and Outlook takes Sunny, the rule is satisfied and the consequent variable Decision takes the value Swimming, whatever values Humidity and Wind take. That is, Rule 1 can be expanded into the first four data in Table 9, each with Humidity and Wind taking one of their two possible values. Similarly, more artificial data can be created by translating and expanding the remaining original rules.
Comparing both the antecedent values and the consequents in Table 9 shows that several identical samples are generated from different original rules. Retaining only one copy of each results in a total of 30 training samples. Note that such an artificially constructed decision table may appear to contain inconsistent data, i.e., samples with the same values for the respective antecedent attributes but different consequents (e.g., the two inconsistent pairs italicised in Table 9). This does not matter, as the eventual rule-based inference, including rule interpolation, uses the original sparse rule base rather than these artificially generated data. The data are created only to help assess the relative significance of individual variables through the estimation of their respective information gains. Indeed, it is precisely because some variables may lead to potentially inconsistent implications in a given problem that their different abilities to influence the consequent can be distinguished. This in turn enables the information gains of individual antecedent variables to be measured, as described below.
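The two-step procedure and the subsequent information gain calculation can be sketched in Python for the rule base of Table 1. This is a minimal sketch: it treats all values as crisp labels, keeps one copy of each identical sample, and normalises the gains by their sum. That normalisation is an assumption, since the exact scheme behind Table 3 is not stated in this section, so the resulting weights need not match Table 3 exactly. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2

# Value domains of the illustrative case (step 1 of the procedure).
DOMAINS = {
    "Temperature": ["Hot", "Mild", "Cool"],
    "Outlook": ["Sunny", "Cloudy", "Rain"],
    "Humidity": ["Humid", "Normal"],
    "Wind": ["Windy", "Not windy"],
}
VARS = list(DOMAINS)

# Sparse rule base of Table 1; a variable absent from a dict corresponds to "–".
RULES = [
    ({"Temperature": "Hot", "Outlook": "Sunny"}, "Swimming"),
    ({"Temperature": "Hot", "Outlook": "Cloudy"}, "Swimming"),
    ({"Outlook": "Rain"}, "Weight lifting"),
    ({"Temperature": "Mild", "Wind": "Windy"}, "Weight lifting"),
    ({"Temperature": "Mild", "Wind": "Not windy"}, "Volleyball"),
]


def expand(rules, domains):
    """Step 2: expand each rule over every value of each variable it does not mention."""
    rows = []
    for antecedent, decision in rules:
        choices = [[antecedent[v]] if v in antecedent else domains[v] for v in domains]
        rows.extend((combo, decision) for combo in product(*choices))
    return rows


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, k):
    """Information gain of the k-th antecedent variable over (antecedents, decision) rows."""
    remainder = 0.0
    for value in {a[k] for a, _ in rows}:
        subset = [d for a, d in rows if a[k] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([d for _, d in rows]) - remainder


expanded = expand(RULES, DOMAINS)        # 32 rows
unique = list(dict.fromkeys(expanded))   # identical samples kept once: 30 rows

# Antecedent combinations with more than one decision (the inconsistent pairs).
decisions = defaultdict(set)
for antecedents, decision in unique:
    decisions[antecedents].add(decision)
conflicts = [a for a, ds in decisions.items() if len(ds) > 1]

gains = {v: information_gain(unique, k) for k, v in enumerate(VARS)}
weights = {v: g / sum(gains.values()) for v, g in gains.items()}
print(weights)
```

Because every rule is expanded symmetrically over Humidity, both Humidity values see the same class distribution, so its gain is zero, in line with Table 3.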
3.3 Weighting of individual variables
Table 2 Weighted decision table with information gain calculated for each antecedent variable
Rules  \(a_1\)  \(a_2\)  \(\cdots \)  \(a_m\)  z
\(r^1\)  \(A_1^1\)  \(A_2^1\)  \(\cdots \)  \(A_m^1\)  \(z^1\) 
\(r^2\)  \(A_1^2\)  \(A_2^2\)  \(\cdots \)  \(A_m^2\)  \(z^2\) 
\(\vdots \)  \(\vdots \)  \(\vdots \)  \(\ddots \)  \(\vdots \)  \(\vdots \) 
\(r^N\)  \(A_1^N\)  \(A_2^N\)  \(\cdots \)  \(A_m^N\)  \(z^N\) 
Weight  \(IG_1\)  \(IG_2\)  \(\cdots \)  \(IG_m\) 
Table 3 Normalised information gains calculated using 30 training samples
Antecedent  Temperature  Outlook  Humidity  Wind
Normalised IG  0.5000  0.4515  0.0000  0.0485
Table 4 Observation in illustrative example
Antecedent attribute  Temperature  Outlook  Humidity  Wind
Observed value  0.91  0.42  0.5  0.51
Fuzzy value  Hot  Mild  Cool  Sunny  Cloudy  Rain  Humid  Normal  Windy  Not windy
Membership value  0.0  0.0  0.775  0.0  0.733  0.0  0.5  0.5  0.49  0.51
3.4 Weighted TFRI
Given the weights associated with the rule antecedent attributes, TFRI can be modified. This modification involves three key stages, as detailed below.
3.4.1 Weight-guided selection of n closest rules
Choosing the n closest rules in this way allows rules that involve antecedent variables regarded as more significant to be selected with priority. Note that the normalisation term \(\frac{1}{\sum _{t=1}^m IG_t^2}\) is a constant and can therefore be omitted from the computation, since the distance \(\tilde{d}(r^p,o^*)\) is calculated only to rank the rules, for which the relative distance measures suffice.
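The weighted distance itself is not reproduced in this section. As an assumption consistent with the normalisation term above, the sketch below scales each per-attribute distance by its information gain before taking the normalised Euclidean combination, and shows that dropping the constant factor leaves the ranking unchanged. The IG values are those of Table 3; the per-attribute distances for the two rules are hypothetical, as are the function names.

```python
from math import sqrt
from typing import Dict, List, Sequence


def weighted_distance(attr_dists: Sequence[float], ig: Sequence[float], normalise: bool = True) -> float:
    """Rule-to-observation distance with each per-attribute distance scaled by its IG.

    The optional factor 1 / sum(IG_t^2) plays the role of the constant normalisation term.
    """
    total = sum((w * d) ** 2 for w, d in zip(ig, attr_dists))
    if normalise:
        total /= sum(w ** 2 for w in ig)
    return sqrt(total)


def rank_rules(attr_dists_by_rule: Dict[str, Sequence[float]], ig: Sequence[float],
               normalise: bool = True) -> List[str]:
    """Rule identifiers sorted from closest to farthest under the weighted distance."""
    return sorted(attr_dists_by_rule,
                  key=lambda r: weighted_distance(attr_dists_by_rule[r], ig, normalise))


# Normalised information gains from Table 3 (Temperature, Outlook, Humidity, Wind).
IG = [0.5000, 0.4515, 0.0000, 0.0485]
# Hypothetical per-attribute distances between two rules and an observation.
dists = {"rA": [0.1, 0.1, 0.9, 0.9], "rB": [0.5, 0.5, 0.0, 0.0]}

print(rank_rules(dists, IG))            # ['rA', 'rB']: close on the informative attributes
print(rank_rules(dists, [1, 1, 1, 1]))  # ['rB', 'rA']: equal weights favour the other rule
```

The example shows how weighting can change which rules are selected: rule rA differs from the observation mostly on Humidity and Wind, which carry little information gain, so it moves ahead of rB once the weights are applied.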
To continue the illustration with the case study, suppose that the membership functions used to describe the antecedent and consequent variables are defined as given in Fig. 5 of “Appendix B”. Also, suppose that the observation of Table 4 (involving only singleton fuzzy sets) is presented, resulting in the membership values shown in the bottom row of Table 4. This observation does not match any of the rules in the sparse rule base, so no rule can be fired directly and FRI is applied to derive a conclusion. Both the information gain-guided TFRI (IGTFRI) and the original TFRI are employed here for comparison. Given the rule base and the observation, the two methods select different closest rules: TFRI selects Rules 4 and 5, whereas IGTFRI selects Rules 3 and 5.
3.4.2 Weighted parameters for intermediate-rule construction
3.4.3 Weighted transformation
Returning to the illustrative case, applying the above improved TFRI with weighted parameters leads to the following intermediate rule, constructed from Rules 3 and 5:
If Temperature is (0.78,0.91,1.03) and Outlook is (0.31,0.47,0.47) and Humidity is (0.50,0.50,0.50) and Wind is (0.20,0.66,0.66), then Decision is (2.49,2.49,2.49)
By contrast, the intermediate rule constructed by TFRI from its two closest rules, Rules 4 and 5, is:
If Temperature is (0.61,0.91,1.21) and Outlook is (0.42,0.42,0.42) and Humidity is (0.50,0.50,0.50) and Wind is (0.01,0.51,1.01), then Decision is (2.51,2.51,2.51)
In this simplified case, where the observations are all singleton fuzzy sets, the above intermediate results imply that the final interpolated result of IGTFRI is \(\tilde{z^*}=(2.49,2.49,2.49)\), obtained by the IG-guided transformation \(T(\tilde{z^{\prime }}=(2.49,2.49,2.49),\tilde{s_z}=0,\tilde{m_z}=0)\), and that the result of the standard TFRI is \(z^*=(2.51,2.51,2.51)\), obtained by the transformation \(T(z^{\prime }=(2.51,2.51,2.51),s_z=0,m_z=0)\). After defuzzification (to obtain a classification result), the conclusions drawn by these two methods are Weight lifting and Volleyball, respectively. The outcome of IGTFRI has the better intuitive appeal for this particular observation. Indeed, in the original rule base for this illustrative case (Yuan and Shaw 1995), the observation used here matches Rule 6 (i.e., the rule purposefully removed to form the sparse rule base), which, if fired, would give the same decision as the consequent interpolated by the proposed IGTFRI method.
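As an arithmetic check on the two intermediate rules listed above, and assuming that the representative value of a triangular set \((a_1,a_2,a_3)\) is the mean of its vertices, each intermediate antecedent has, to two decimal places, the same representative value as the corresponding observed value in Table 4. This is consistent with each intermediate antecedent being positioned at the observation. The constant names are illustrative.

```python
# Intermediate antecedents (Temperature, Outlook, Humidity, Wind) from the two rules above.
IGTFRI_RULE = [(0.78, 0.91, 1.03), (0.31, 0.47, 0.47), (0.50, 0.50, 0.50), (0.20, 0.66, 0.66)]
TFRI_RULE = [(0.61, 0.91, 1.21), (0.42, 0.42, 0.42), (0.50, 0.50, 0.50), (0.01, 0.51, 1.01)]
OBSERVED = [0.91, 0.42, 0.5, 0.51]  # singleton observation of Table 4


def rep(a):
    """Assumed representative value of a triangular set (a1, a2, a3): the vertex mean."""
    return sum(a) / 3.0


for name, rule in (("IGTFRI", IGTFRI_RULE), ("TFRI", TFRI_RULE)):
    reps = [round(rep(a), 2) for a in rule]
    print(name, reps, reps == OBSERVED)
```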
The workflow of the construction of the intermediate rule and of the computation of the interpolative results for both methods is outlined in Fig. 6 in “Appendix C”.
This illustrative case is very simple, involving only a small number of instances and a rather specific rule base, so it is not surprising that the original TFRI and the proposed IGTFRI produce similar interpolated values. Although the example already demonstrates the strength of the proposed approach, the following section evaluates it systematically using more complex datasets.
4 Experimental evaluation
This section presents a systematic experimental evaluation of the proposed inference system, in which the information gain-guided TFRI approach is embedded. The work is assessed on pattern classification over nine benchmark datasets. Classification results are compared with those obtained by the original TFRI method and with those of standard Mamdani inference (Mamdani and Assilian 1999), which does not involve rule interpolation but directly fires the (possibly partially) matched rules. In addition, statistical analysis is used to further evaluate the performance of the proposed approach against the original TFRI.
4.1 Experimental setup
4.1.1 Datasets
Table 5 Datasets used
Dataset  # Attributes  # Classes  # Instances
Iris  4  3  150 
Diabetes  8  2  768 
Phoneme  5  2  5404 
Appendicitis  7  2  106 
Magic  10  2  1902 
NewThyroid  5  3  215 
Banana  2  2  5300 
Haberman  3  2  306 
Monk2  6  2  432 
4.1.2 Experimental methodology
Table 6 Average classification accuracy (%) and standard deviation with 10 \(\times \) 10-fold cross-validation
Dataset  CRI  TFRI  IGTFRI
Iris  66.66 ± 0.25  76.99 ± 0.16*  82.53 ± 0.13* 
Diabetes  32.10 ± 0.08  62.50 ± 0.06*  68.49 ± 0.05* 
Phoneme  38.40 ± 0.09  60.53 ± 0.05*  66.18 ± 0.07* 
Appendicitis  32.27 ± 0.10  57.72 ± 0.12*  69.69 ± 0.13* 
Magic  49.15 ± 0.05  58.40 ± 0.09*  64.67 ± 0.05* 
NewThyroid  43.33 ± 0.28  47.43 ± 0.24*  53.28 ± 0.22* 
Banana  44.83 ± 0.08  60.49 ± 0.05*  63.27 ± 0.04* 
Haberman  54.00 ± 0.09  71.73 ± 0.08*  77.47 ± 0.07* 
Monk2  32.63 ± 0.05  60.01 ± 0.11*  63.31 ± 0.06* 
Average  43.70 ± 0.12  61.75 ± 0.11  67.65 ± 0.09 
4.2 Results and discussion
4.2.1 Comparison on overall classification accuracy
Table 6 shows the classification performance over the nine datasets, measured by the average accuracy and the standard deviation (SD) over 10 \(\times \) 10-fold cross-validation. In particular, the CRI column presents the results obtained by applying the compositional rule of inference directly, firing only the matched rules; the TFRI column shows the results obtained by the original TFRI; and the IGTFRI column summarises the results obtained by the information gain-guided TFRI approach. A pairwise t test (\(p=0.05\)) further validates the experimental evaluation. An asterisk (*) after a result in the TFRI column indicates that the improvement of the original TFRI over CRI is statistically significant; similarly, an asterisk (*) in the IGTFRI column indicates that the improvement of IGTFRI over TFRI is statistically significant.
Table 7 Confusion matrix of TFRI on the Diabetes dataset, averaged over \(10\times 10\)-fold cross-validation
  Classified positive  Classified negative
Actual positive  9.5  17.6
Actual negative  11.2  38.5
Table 8 Confusion matrix of IGTFRI on the Diabetes dataset, averaged over \(10\times 10\)-fold cross-validation
  Classified positive  Classified negative
Actual positive  17.2  9.9
Actual negative  14.3  35.4
4.2.2 Comparison on false negatives and false positives
Apart from classification accuracy, in many real-world applications it is worth examining the rates of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). To keep the experimental investigation focused without overcomplicating it, the Diabetes dataset, a binary classification problem, is selected for this comparison. Tables 7 and 8 show the confusion matrices obtained by the original TFRI and by IGTFRI, respectively. ‘Positive’ in both tables denotes an instance in which a person is diagnosed as having diabetes. The numbers in both tables are averages of the results obtained in \(10\times 10\)-fold cross-validation.
First, recall from Table 6 that the classification accuracy of TFRI is 62.5%, which IGTFRI improves to 68.49%. Comparing Tables 7 and 8 shows that, as the classification accuracy increases with IGTFRI, the FN rate falls substantially from 64.94 to 36.53% [where the FN rate is calculated as FN/(TP \(+\) FN)]. This is highly desirable in medical diagnosis, since the rate of missed disease detection (i.e., the proportion of cases in which the disease is tested as absent when it is actually present) is reduced. Although the number of FP increases slightly, the diagnostic sensitivity (true positive rate) also rises substantially, by 28.41 percentage points on average. This promising result indicates a considerable improvement in the decisions made by IGTFRI.
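The rates quoted above can be reproduced directly from the averaged confusion matrices of Tables 7 and 8. The short sketch below (function name hypothetical) computes accuracy, FN rate and sensitivity for both methods.

```python
def diagnostic_rates(tp: float, fn: float, fp: float, tn: float) -> dict:
    """Accuracy, false negative rate and sensitivity from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fn_rate": fn / (tp + fn),       # FN / (TP + FN): missed detections
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN): true positive rate
    }


# Averaged confusion matrices on the Diabetes dataset (Tables 7 and 8).
tfri = diagnostic_rates(tp=9.5, fn=17.6, fp=11.2, tn=38.5)
ig_tfri = diagnostic_rates(tp=17.2, fn=9.9, fp=14.3, tn=35.4)

for name, r in (("TFRI", tfri), ("IGTFRI", ig_tfri)):
    print(name, {k: round(100 * v, 2) for k, v in r.items()})
print("sensitivity gain:", round(100 * (ig_tfri["sensitivity"] - tfri["sensitivity"]), 2))
```

Since FN rate and sensitivity share the denominator TP + FN (27.1 actual positives per fold in both tables), the 28.41-point gain in sensitivity equals the fall in the FN rate.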
5 Conclusion
This paper has presented a novel fuzzy rule-based inference system for situations in which the rule base is sparse. The proposed information gain-guided fuzzy rule interpolation approach is embedded in this system, where the rule antecedent variables are weighted by their information gains. In particular, the computation is enabled by an innovative reverse engineering procedure that converts fuzzy rules into training samples. The proposed method is illustrated by a case study with a small data set and is systematically evaluated on benchmark classification problems over nine datasets. The experimental results confirm that the relative significance of individual rule antecedent variables can be captured by information gains, which serve as weights on the variables to guide FRI. This substantially improves the performance of interpolative reasoning, owing to the ability of information gains to differentiate the significance of different antecedent variables.
While promising, this work can be improved further. The present implementation assumes a data-driven rule learning mechanism that converts a given dataset into rules using a simple fuzzification procedure, so the rule base may become very large when the dataset is large. Alternative rule induction techniques (e.g., those reported in (Janikow 1998; Afify 2016)) that generate a more compact rule base could further improve the performance of the interpolation method. Introducing information gain to support weighted rule interpolation may also incur additional computational overhead compared with the original TFRI algorithm; an experimental analysis of the runtime cost relative to TFRI forms another interesting piece of further work. Finally, the current approach assumes a fixed (sparse) rule base. However, running rule interpolation generates intermediate fuzzy rules, which could be collected and refined to form additional rules for subsequent inference, thereby enriching the rule base and avoiding unnecessary interpolation afterwards (Naik et al. 2017).
Acknowledgements
The first author is grateful to the China Scholarship Council and Aberystwyth University for their support in this research. The authors would like to thank the reviewers of the original version of this paper that was presented at the 16th UK Workshop on Computational Intelligence, 2016; their constructive comments have helped improve this work significantly, leading to it receiving one of the two best paper awards at the Workshop.
Funding This study was partly funded by the National Key Research and Development Program of China (Grant No. 2016YFB0502502).
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
References
 A Asuncion DN (2007) UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets.html
Afify AA (2016) A fuzzy rule induction algorithm for discovering classification rules. J Intell Fuzzy Syst 30(6):3067–3085
Alcalá J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2010) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2–3):255–287
Chang YC, Chen SM, Liau CJ (2008) Fuzzy interpolative reasoning for sparse fuzzy-rule-based systems based on the areas of fuzzy sets. IEEE Trans Fuzzy Syst 16(5):1285–1301
Chen SM, Chang YC (2011) Weighted fuzzy rule interpolation based on GA-based weight-learning techniques. IEEE Trans Fuzzy Syst 19(4):729–744
Diao R, Jin S, Shen Q (2014) Antecedent selection in fuzzy rule interpolation using feature selection techniques. In: 2014 IEEE international conference on fuzzy systems (FUZZ-IEEE). IEEE, pp 2206–2213
Galea M, Shen Q (2006) Simultaneous ant colony optimization algorithms for learning linguistic fuzzy rules. In: Abraham A, Grosan C, Ramos V (eds) Swarm intelligence in data mining. Springer, Berlin, pp 75–99
Hoffmann F (2004) Combining boosting and evolutionary algorithms for learning of fuzzy classification rules. Fuzzy Sets Syst 141(1):47–58
Hong TP, Lee CY (1996) Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets Syst 84(1):33–47
Hsiao WH, Chen SM, Lee CH (1998) A new interpolative reasoning method in sparse rule-based systems. Fuzzy Sets Syst 93(1):17–22
Huang Z, Shen Q (2006) Fuzzy interpolative reasoning via scale and move transformations. IEEE Trans Fuzzy Syst 14(2):340–359
Huang Z, Shen Q (2008) Fuzzy interpolation and extrapolation: a practical approach. IEEE Trans Fuzzy Syst 16(1):13–28
Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans Syst Man Cybern Part B (Cybern) 28(1):1–14
Jin S, Diao R, Quek C, Shen Q (2014) Backward fuzzy rule interpolation. IEEE Trans Fuzzy Syst 22(6):1682–1698
Kóczy L, Hirota K (1993a) Approximate reasoning by linear rule interpolation and general approximation. Int J Approx Reason 9(3):197–225
Kóczy L, Hirota K (1993b) Interpolative reasoning with insufficient evidence in sparse fuzzy rule bases. Inf Sci 71(1–2):169–201
Li YM, Huang DM, Zhang LN, et al (2005) Weighted fuzzy interpolative reasoning method. In: Proceedings of 2005 international conference on machine learning and cybernetics, 2005, vol 5. IEEE, pp 3104–3108
Mamdani E, Assilian S (1999) An experiment in linguistic synthesis with a fuzzy logic controller. Int J Hum Comput Stud 51(2):135–147
Mitchell TM (1997) Machine learning. McGraw-Hill Science/Engineering/Math
Naik N, Diao R, Shen Q (2017) Dynamic fuzzy rule interpolation and its application to intrusion detection. IEEE Trans Fuzzy Syst
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mob Comput Commun Rev 5(1):3–55
Wang LX, Mendel JM (1992) Generating fuzzy rules by learning from examples. IEEE Trans Syst Man Cybern 22(6):1414–1427
Yang L, Shen Q (2011) Adaptive fuzzy interpolation. IEEE Trans Fuzzy Syst 19(6):1107–1126
Yang L, Chao F, Shen Q (2017) Generalized adaptive fuzzy rule interpolation. IEEE Trans Fuzzy Syst 25(4):839–853
Yuan Y, Shaw MJ (1995) Induction of fuzzy decision trees. Fuzzy Sets Syst 69(2):125–139
Zadeh L (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans Syst Man Cybern 3:28–44
Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.