1 Introduction

Fuzzy set theory (Zadeh 1965) and its extension, fuzzy logic, have been successfully applied to many real-world problems (e.g., Ross 2005; Terano et al. 2014; Zimmermann 2011). In particular, fuzzy expert systems are popular due to their ability to exploit the tolerance for imprecision, partial truth and approximation, in an effort to closely resemble human activity and reasoning intuition. Many of these systems have been developed using the idea of approximate reasoning (also known as linguistic reasoning), reflecting the manner of human cogitation and leading to new, more human-interpretable, intelligent systems.

In general, an approximate reasoning system can be formalized as a fuzzy if–then rule-based inference mechanism that derives a conclusion given an input observation. It consists of linguistic variables, fuzzy rules and a fuzzy inference method. Linguistic variables facilitate the interpretation of linguistic expressions in terms of fuzzy quantities with certain underlying mathematical semantics. Fuzzy inference rules are a set of rules that associate the input and output variables of a given physical system or other phenomenon, capturing their relationships; they may be learned from historical data, acquired directly from domain experts, or obtained through a mixture of both. Based on such rules, a fuzzy inference mechanism is encoded to implement the process of approximate reasoning, by manipulating the fuzzy inference rules in response to any new input data. As such, the fuzzy rule base is the essential component of any approximate reasoning model, storing the knowledge required for inference and determining which computational techniques to use.

Another main component of an approximate reasoning system is the mechanism that computes the output given an input and the rule base. A variety of potentially useful methods exist in the literature. Many of them implement generalised modus ponens, mostly by following the basic idea of the Compositional Rule of Inference (CRI, Zadeh 1973). CRI has been widely and successfully applied; for example, the famous Mamdani’s fuzzy logic controller (Mamdani and Assilian 1999) was implemented this way. Nevertheless, CRI can only realise its full potential, deriving reasonable and accurate outcomes, when working with dense (or complete) rule bases whose rules cover the entire problem space. This implies that any unknown observation must match at least one fuzzy rule in the rule base for the mechanism to work.

In many circumstances, a dense rule base cannot realistically be obtained, and only an incomplete rule base is available instead. A number of reasons may lead to such incomplete rule bases, the most common being (Baranyi et al. 1999; Tikk and Baranyi 2000): (1) the knowledge available about the modelled problem is incomplete, regardless of how the rule base is constructed, be it from human expertise or by machine learning techniques; and (2) the number of rules in a rule base is deliberately reduced in order to lower the complexity of the resultant fuzzy system.

Unfortunately, CRI is unable to draw a conclusion when a rule base is sparse rather than dense. Sparse, or incomplete, rule bases as considered here do not refer to the number of rules in a given rule base, but to how well the antecedents of the rules cover the problem domain over the universe of discourse. That is, an input observation may have no overlap with any of the available rules and hence, no rule can be fired to derive the required consequent by directly applying CRI.

1.1 Research context

Resolving real-world problems frequently involves the use of such sparse rule bases, since a dense rule base may be impracticable in a multidimensional environment, where the number of rules grows exponentially with the number of input variables (for a fixed number k of fuzzy linguistic labels per variable, a complete grid over m variables needs \(k^{m}\) rules). It is therefore desirable to develop more advanced inference mechanisms that work with incomplete rule bases. Fuzzy rule interpolation (FRI) facilitates approximate reasoning in fuzzy rule-based systems when only sparse knowledge is available (Kóczy and Hirota 1993a, b). It addresses the key limitation of CRI, which requires a dense fuzzy rule base that fully covers the entire problem domain. Taking a different underlying approach from CRI and many of its derivatives, FRI reasons by manipulating rules that bear similarity to an unmatched observation, without resorting to direct pattern matching. This constitutes a significant breakthrough in fuzzy rule-based inference for situations where only an incomplete rule base is available. Working with sparse knowledge, FRI attempts to reduce, if not completely remove, the restriction of CRI in cases where no conclusion can be derived because no rule matches a new observation. It thereby offers an alternative way to infer an approximate, interpolated outcome, accomplishing so-called fuzzy interpolative reasoning.

FRI essentially makes two contributions to the development of fuzzy rule-based systems. It not only supports reasoning with sparse rule bases (Burkhardt and Bonissone 1992), but also offers the potential for a reverse application, where the rule base is so dense that model simplification is required. That is, FRI can be utilised to reduce the complexity of fuzzy rule bases through, say, a procedure that iteratively replaces two existing rules with an interpolated one (Koczy and Hirota 1997), thereby eliminating those fuzzy rules that can be approximated from their neighbouring ones. Nonetheless, this is not the motivation of the great majority of FRI techniques developed in the literature and hence, is beyond the scope of this review. This paper focuses on the former issue, that is, performing inference with a sparse rule base. From this viewpoint, the goal of FRI is not to produce an interpolated rule through interpolative reasoning, but to compute an interpolated consequent that corresponds to the input observation. In so doing, FRI accomplishes inference for observations that would otherwise have no conclusion drawn, owing to the sparseness of the fuzzy rule base.

As an inference mechanism, FRI begins by selecting the nearest neighbouring (i.e., closest) rules in the sparse fuzzy rule base with regard to the given unmatched observation. These chosen rules form the basis for conducting fuzzy interpolation. Two major types of approach have been seen in the literature to implement fuzzy interpolative reasoning: (1) \(\alpha\)-cut based interpolation and (2) intermediate rule based interpolation. This grouping depends upon whether the interpolated result is computed by first constructing and then transforming an intermediate rule. As such, FRI methods may also be organised into two groups, termed non-transformation based and transformation based FRI, respectively (Chen and Adam 2018). The seminal work on fuzzy interpolative reasoning, namely the techniques reported in Kóczy and Hirota (1993a, b) and their extensions, forms the most typical non-transformation based FRI. Among those relying on transforming intermediate rules, a family of scale and move transformation-based FRI methods (termed T-FRI), such as those given in Huang and Shen (2006, 2008), Jin et al. (2014), Yang et al. (2017) and Naik et al. (2017b), has been widely studied and applied despite its relative recency. The general FRI methodologies form the first main topic of this survey.

In resolving practical real-world problems, multidimensional input variables are common. Fortunately, many FRI methods in the literature are capable of performing interpolation with fuzzy rules that involve multiple antecedent variables. Nonetheless, these FRI approaches share a common problem: the antecedent attributes within the rules are presumed to be of equal significance for interpolation. Thus, inaccurate or even incorrect interpolated outcomes may result, since different domain attributes may generally make different contributions to the decision-making process.

Recently, however, a number of methods have been proposed for FRI working on multiple antecedents associated with different weights (e.g., Chen and Chang 2011b; Chen and Chen 2016; Chen et al. 2009; Cheng et al. 2015; Diao et al. 2014). They make significant contributions to enhancing inference accuracy by moving away from the assumption of equal antecedent weights when dealing with sparse rule bases. To achieve the goal of weighted FRI, two closely related questions are shared by the weighted FRI methods: (1) how the weights are generated; and (2) whether and how the weights are integrated within the underlying, non-weighted FRI. A variety of weighted FRI approaches have been developed to address these two concerns. The weights of different rule antecedent variables can be generated either by domain experts assigning predefined weights or by running automatic weight learning algorithms. The latter is generally preferred, as predefining weights requires human intervention and hence, reduces the flexibility and level of automation of the resulting fuzzy systems. Regarding the second question, different weighted FRI approaches integrate the weights within the original FRI in rather different ways. These two issues can be dealt with independently, or be implemented in the form of a “wrapper” approach that combines FRI-based inference with learning weights from data. Discussions about the techniques for weighted FRI form the second focus of this survey.

The efficacy of the inference mechanism introduced by an FRI method may be revealed through its utilisation in resolving real-world application problems. As with many practical applications of classic fuzzy reasoning tools, FRI has strengthened systems control, with successful examples including the simulation of automated guided vehicles (Kovács and Kóczy 1999), surveillance navigation control of mobile robots (Vincze and Kovács 2008), and general behaviour-based control (Kovács and Kóczy 2004). The work on dynamic FRI (Naik et al. 2017b) offers significant opportunities for selecting, combining and generalising informative, frequently used interpolated rules, so as to enrich the existing rule base while performing interpolation. It provides promising solutions to cyber-security problems, including network security analysis, intelligent intrusion detection (Naik et al. 2017b) and firewall reinforcement (especially for Microsoft Windows Firewall) (Naik et al. 2017a). FRI has also produced impressive results in practical pattern recognition tasks, for example: classic classification and prediction problems (Li et al. 2018b, 2020a; Chen and Chen 2016) using weighted FRI techniques; computer vision and image super resolution (Yang et al. 2019); and disease diagnosis in general, with mammographic mass risk analysis (Li et al. 2019) and colorectal polyp detection (Nagy et al. 2018) in particular. Further applications of FRI are found in function approximation (Wong and Gedeon 2000; Berecz 2009) and student academic performance evaluation (Johanyák 2010).

1.2 Motivational objectives

This paper aims to provide a comprehensive review of FRI techniques that enable approximate reasoning in the context of sparse rule bases, covering both the conventional FRI methods and the recent advances (particularly in weighted FRI mechanisms). To be more specific, the objectives of this survey are three-fold:

  • To offer an up-to-date tutorial of the key developments regarding fuzzy rule-based inference which are tailored to situations where only incomplete domain knowledge is available;

  • To provide a systematic comparison between different approaches, so that readers can make an informed choice of the potentially suitable FRI technique(s) to apply to their specific domain problems; and

  • To promote the advanced weighted FRI mechanisms, informing readers about the benefits of using these most recently developed algorithms, which ensure not only the effectiveness of approximate reasoning but also its efficiency (as only the two nearest neighbouring rules are required each time FRI is performed).

1.3 Main contributions

By achieving the above objectives, this survey offers the following major contributions to the relevant literature: (C1) The basic idea of FRI techniques is summarised and the main FRI approaches are categorised, with the properties of the representatives of each category discussed, offering a comprehensive tutorial of the FRI literature. (C2) The methodologies of weighted FRI are introduced systematically, in terms of both the mechanisms for weight generation and the techniques for integrating weights within the otherwise unweighted underlying FRI methods, with all reviewed algorithms concisely described using a unified pseudocode format to ease understanding. (C3) Comparisons between different FRI techniques are provided with respect to a series of commonly used criteria, not only within each of the two main categories of approaches (unweighted and weighted), but also between the two categories themselves, highlighting the advantages of running a weighted FRI method over its unweighted counterpart.

1.4 Paper structure

The remainder of this paper is organised as follows. Section 2 first introduces the basic notations used throughout the survey. Sections 3 and 4 review the techniques of unweighted FRI and weighted FRI, respectively, including a comparative analysis within each category. Section 4 also includes a qualitative comparison between the two major FRI categories themselves. Finally, Section 5 concludes the survey and points out interesting further work in this exciting field. To help readers digest the material and draw informed conclusions from different parts of this paper, the structural organisation of this survey is illustrated in Fig. 1, including an outline of the key content and concepts contained within each section, in relation to the major contributions identified above.

Fig. 1 Organisation of survey structure

2 Preliminaries for FRI in fuzzy rule-based systems

Surveying a large body of literature on fuzzy rule interpolation, covering both recent and current developments across a diversity of approaches, is a significant undertaking. Thus, the typical structure used for presenting literature reviews in the field of computational intelligence is adopted in this work. In particular, the preliminary theoretical foundations shared by the different approaches to be reviewed are provided herein, to help reduce potential repetition in the subsequent main body of the survey.

This section first explains the fundamental notations adopted to express fuzzy rules and observations, both being the key components in any FRI system that performs fuzzy reasoning with incomplete knowledge. This is followed by an illustration of fuzzy membership functions (MFs), which are used to describe the antecedent and consequent parts of a fuzzy rule or a given observation. Then, the main categorisation of MFs is outlined, with triangular MFs being highlighted and described in slightly more detail.

In general, a fuzzy rule-based system, within which FRI works, has as its key component a set of if–then rules, each of which takes fuzzy or crisp terms that represent specifications of the input variables and associates these with the output of a certain problem description. A rule may involve multiple output attributes as well as multiple input variables, but a multiple-output rule can always be equivalently expressed by several single-output rules. Without loss of generality, only rules that have a single output attribute are considered in this work.

Formally, a typical fuzzy rule model essentially contains two key elements \(\langle R,Y \rangle\) in describing a given problem: A non-empty finite set of domain attributes \(Y = A \cup \{z\}\), where \(A = \{a_{j}|j=1,2,\dots ,m\}\) represents the set of input antecedent attributes and z stands for the consequent, and a non-empty finite set of fuzzy rules \(R = \{r^{1},r^{2},\dots ,r^{N}\}\). In many conventional fuzzy rule-based systems, including systems implemented with FRI techniques, a given rule \(r^{i} \in R\) and an observation \(o^{*}\) are often expressed generally as follows:

$$\begin{aligned} \begin{aligned} r^{i}\, :&\,if\;a_{1}\; is \; A_{1}^{i} \; and \; a_{2} \; is \; A_{2}^{i} \; and \; \cdots \; and \; a_{m}\; is \; A_{m}^{i}, \; then \; z\; is \; B^{i} \\ o^{*}\; :&\; a_{1}\; is \; A_{1}^{*}\; and\; a_{2}\; is \; A_{2}^{*}\; and \; \cdots \; and \; a_{m}\; is \; A_{m}^{*} \end{aligned} \end{aligned}$$
(1)

where \(A_{j}^{i}\) and \(A_{j}^{*}\) denote the fuzzy set values taken by the antecedent attribute \(a_{j}\) in \(r^{i}\) and \(o^{*}\), respectively; and \(B^{i}\) represents the fuzzy set value of the consequent attribute z in \(r^{i}\).
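As a minimal sketch of how Eq. (1) might be held in memory, the Python below stores each fuzzy value as a tuple of characteristic points, anticipating the triangular sets adopted later in this section. The class names FuzzyRule and Observation and the example values are hypothetical illustrations, not part of any FRI library; the observation deliberately has no consequent field, since its then-part is what has to be inferred.

```python
from dataclasses import dataclass
from typing import Tuple

CPs = Tuple[float, ...]  # characteristic points of one fuzzy value, in ascending order


@dataclass(frozen=True)
class FuzzyRule:
    """r^i: if a_1 is A_1^i and ... and a_m is A_m^i, then z is B^i."""
    antecedents: Tuple[CPs, ...]  # A_1^i, ..., A_m^i
    consequent: CPs               # B^i


@dataclass(frozen=True)
class Observation:
    """o^*: a_1 is A_1^* and ... and a_m is A_m^* (no then-part to store)."""
    antecedents: Tuple[CPs, ...]  # A_1^*, ..., A_m^*


# A sparse rule base R = {r^1, r^2} over m = 2 attributes, and an observation o^*
R = [
    FuzzyRule(antecedents=((0, 1, 2), (0, 2, 4)), consequent=(0, 1, 2)),
    FuzzyRule(antecedents=((8, 9, 10), (10, 12, 14)), consequent=(10, 11, 12)),
]
o_star = Observation(antecedents=((4, 5, 6), (5, 7, 9)))
m = len(o_star.antecedents)
assert all(len(r.antecedents) == m for r in R)  # every rule uses the same m attributes
```

Here the first attribute value of o^* does not overlap that of either rule, which is exactly the sparse situation FRI is designed for.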

In the above rule representation, the logical connective “and” between each pair of conjunct propositions (each stating that an antecedent attribute takes a fuzzy value) indicates that the fuzzy values taken by the two attributes form a compound fuzzy value. The compound value is computed by applying a t-norm operator (e.g., the popular min operation) to the two conjunct values (Bede 2013). Thus, the “if-part” of a fuzzy rule can be interpreted as follows, where the operator \(\wedge\) may simply be implemented with min:

$$\begin{aligned} A_{1}^{i}(a_{1}) \, \wedge \, A_{2}^{i}(a_{2}) \, \wedge \, \cdots \, \wedge \, A_{m}^{i}(a_{m}) \end{aligned}$$
(2)
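For the simple case of a crisp input \((x_{1},\dots ,x_{m})\), Eq. (2) with min as the t-norm can be sketched as follows. The helper names triangular and firing_degree are hypothetical, and triangular membership functions (introduced later in this section) are assumed only to give the antecedents a concrete shape. The second call illustrates the sparse situation discussed in Sect. 1: once an input falls outside the support of any conjunct, the rule fires with degree 0, so CRI obtains nothing from that rule.

```python
from typing import Callable, Sequence

MembershipFn = Callable[[float], float]


def triangular(a1: float, a2: float, a3: float) -> MembershipFn:
    """Membership function of a triangular fuzzy set with CPs a1 < a2 < a3."""
    def mu(x: float) -> float:
        if x <= a1 or x >= a3:
            return 0.0                      # outside the open support
        if x <= a2:
            return (x - a1) / (a2 - a1)     # rising edge
        return (a3 - x) / (a3 - a2)         # falling edge
    return mu


def firing_degree(antecedent_mfs: Sequence[MembershipFn], x: Sequence[float]) -> float:
    """Eq. (2) with the min t-norm: A_1(x_1) AND ... AND A_m(x_m)."""
    return min(mu(xj) for mu, xj in zip(antecedent_mfs, x))


rule_if_part = [triangular(0, 2, 4), triangular(10, 12, 14)]
print(firing_degree(rule_if_part, [1.0, 13.0]))  # 0.5
print(firing_degree(rule_if_part, [6.0, 13.0]))  # 0.0, since 6 lies outside A_1's support
```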

This interpretation equally applies to the term “and” in an observation. However, a given observation is deliberately denoted differently from a rule above, with the value of every attribute involved labelled with an asterisk. This explicitly differentiates it from the antecedent part of a rule since, unlike a rule, an observation does not have a “then-part”. It is this “conclusion” part that is unknown and is to be inferred, using an FRI technique if no rule matches the observation. Throughout this paper, unless otherwise stated, any notation marked with an asterisk is associated with an observation.

Fuzzy values of both the rule antecedents and the consequent are in general represented by fuzzy sets. The concept of fuzzy sets was introduced by Zadeh (1965). Informally, the definition of a fuzzy set given by Zadeh can be stated as follows: A fuzzy set is a class with a continuum of membership grades. Thus, a fuzzy set A in a universe of discourse X is characterised by a membership function (MF) A which associates each element \(x \in X\) with a real number \(A(x) \in [0,1]\). This is interpreted such that A(x) is the membership grade of x belonging to the fuzzy set A (Bede 2013).

As can be seen from the above, the MF \(A: X \rightarrow [0,1]\) distinguishes fuzzy sets from classical Boolean sets. Unlike a classical set with clear boundaries, i.e., \(x \in A\) or \(x \notin A\), which excludes any other possibility, the membership function enables fuzzy sets to model partial degrees to which a variable or attribute is deemed to take a certain underlying real or categorical value. Such fuzzy sets are often assigned linguistic terms to help capture and reflect human interpretation of imprecise measurements or descriptions.

In particular, when the universe of discourse X is the real line \({\mathbb {R}}\), any type of continuous function can be used as an MF, provided that a set of parameters is given to specify the appropriate meaning of the MF. In this case, it is impractical to list all the pairs defining an MF, even when imposing the constraint that all MFs are convex, which eases the expression of a common-sense interpretation of belongingness. Fortunately, only a small number of MF types are typically used in practice. They fall into two main categories according to their smoothness and linearity: (1) polygonal (piecewise linear) fuzzy sets, including triangular, trapezoidal and hexagonal MFs, etc., and (2) nonlinear fuzzy sets, typically including Gaussian, generalised bell-shaped and sigmoid MFs.

Polygonal fuzzy sets are generally represented by their characteristic points (CPs) in ascending order [which are defined mathematically as the odd points of the membership function (Huang and Shen 2008)], and nonlinear ones by the defining parameters that are used to specify each nonlinear function. The choice of different MFs relies on the specific requirements of a given application. Amongst the family of all possible functions, triangular MFs and trapezoidal MFs have been used most extensively, especially for real-time implementations, thanks to their simple representation and computational efficiency. In the literature, generally speaking, different MFs have been exploited to implement the proposed FRI methods. Nonetheless, procedures employing triangular MFs and/or trapezoidal MFs may be seen as specific cases of those which utilise more complex polygonal fuzzy sets. It is difficult to have a generic closed form representation that unifies all FRI processes as they are dependent upon the MFs used (Yang and Shen 2013). In this work, for illustrative and demonstrative consistency and simplicity, as well as for their popularity in the literature, triangular MFs are employed to describe and contrast all FRI methods.

As shown in Fig. 2, a normal and convex triangular fuzzy set A is illustrated with its three ascending-ordered CPs, i.e., \((a_{1},a_{2},a_{3})\), where the first and third CPs stand for the two extreme points of the support, with a membership value of 0, and the middle one stands for the normal point of the fuzzy set, with a membership value of 1. For a fuzzy rule base consisting of rules in the form of Eq. (1), the triangular fuzzy values \(A_{j}^{i}\), \(A_{j}^{*}\), \(B^{i}\), and the consequent \(B^{*}\) to be computed by an FRI process (\(i=1,2,\dots ,N,j=1,2,\dots ,m\)) are therefore represented by their corresponding CPs: \((a_{j1}^{i},a_{j2}^{i},a_{j3}^{i})\), \((a_{j1}^{*},a_{j2}^{*},a_{j3}^{*})\), \((b_{1}^{i},b_{2}^{i},b_{3}^{i})\), and \((b_{1}^{*},b_{2}^{*},b_{3}^{*})\), respectively.

Fig. 2 Normal and convex triangular membership function
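To make the CP representation concrete, the following minimal Python sketch stores a triangular set as its three CPs and returns the \(\alpha\)-cut \([inf(A_{\alpha }), sup(A_{\alpha })]\) used by the \(\alpha\)-cut based methods in Sect. 3. The class name Triangle is a hypothetical illustration. Consistent with Eq. (12) later in this paper, the 0-cut is taken to be the closed support \([a_{1},a_{3}]\) rather than the whole universe of discourse.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Triangle:
    """Normal, convex triangular fuzzy set given by ascending CPs (a1, a2, a3)."""
    a1: float
    a2: float
    a3: float

    def __post_init__(self) -> None:
        if not (self.a1 <= self.a2 <= self.a3):
            raise ValueError("characteristic points must be in ascending order")

    def alpha_cut(self, alpha: float) -> Tuple[float, float]:
        """Closed interval [inf A_alpha, sup A_alpha] for alpha in [0, 1].

        alpha = 0 gives the closed support [a1, a3]; alpha = 1 gives the normal point.
        """
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        lower = self.a1 + alpha * (self.a2 - self.a1)  # walk up the left edge
        upper = self.a3 - alpha * (self.a3 - self.a2)  # walk up the right edge
        return lower, upper


A = Triangle(2, 4, 8)
print(A.alpha_cut(0.5))  # (3.0, 6.0)
```

Because a triangle is linear between its CPs, the two bounds move linearly in \(\alpha\), which is what allows triangular-set FRI methods to work with only the 0-cut and 1-cut.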

3 Fuzzy rule interpolation techniques

This section categorises the classical FRI methods and details their representatives, discussing the typical pros and cons of different approaches.

3.1 Categorisation of FRI approaches

In the literature, various FRI approaches have been proposed to perform fuzzy interpolative reasoning, following the seminal work of Kóczy and Hirota (1993a, b). In general, the existing methodologies can be grouped into two categories:

  1. \(\alpha\)-cut/non-transformation based FRI: see Table 1 for a summary, with Table 2 listing further developments belonging to this category.

  2. Intermediate rule/transformation based FRI: see Table 3 for a summary, with Table 4 listing a particular family of scale and move transformation based FRI (denoted as T-FRI hereafter), which are the most popular in the recent literature.

This categorisation depends on whether an intermediate fuzzy rule is constructed and then utilised in order to derive the interpolated result.

The \(\alpha\)-cut based FRI approaches, also known as non-transformation based methods, directly interpolate the results by computing each \(\alpha\)-cut level, given at least two fuzzy rules adjacent to an unmatched observation. Considerable work on this type of approach was reported in the early stages of the investigation of fuzzy interpolative reasoning. In particular, the very first method proposed, termed the KH method after its inventors (Kóczy and Hirota 1993a, b), is the most typical \(\alpha\)-cut based algorithm for FRI. As indicated earlier, Table 1 also summarises several other alternative \(\alpha\)-cut based methods from different perspectives.

Table 1 \(\alpha\)-cut (non-transformation) based FRI methods
Table 2 Family of KH FRI

Transformation-based approaches work by first computing an intermediate rule. The required consequent for an unknown observation is obtained through a two-step procedure that manipulates the neighbouring rules selected for the observation. First, an intermediate rule is artificially constructed such that its antecedent is as “close” to the observation as possible (given a certain distance metric, often the Euclidean one), and an intermediate consequent is computed from the constructed rule antecedent. Since there may still be a difference between the antecedent of the intermediate rule and the observation, the second step works on the principle of analogical reasoning (Bouchon-Meunier and Valverde 1999; Turksen and Zhong 1988). It derives the conclusion by transforming the intermediate consequent according to the similarity measured between the antecedent of the intermediate rule and the observation, analogously to the transformation that maps the intermediate rule antecedent onto the given observation. As one of the outstanding intermediate rule based FRI methods, the foundational T-FRI methodology was first introduced by Huang and Shen (2006, 2008). Many follow-on developments and modifications to this seminal approach have been proposed over the last two decades.

Table 3 Intermediate rule (transformation) based FRI methods
Table 4 Family of scale and move transformation based FRI (T-FRI)

Apart from the above two major groups of FRI methods, there are alternative FRI techniques for conducting fuzzy interpolative reasoning, as summarised in Table 5, which shows the diversity of this research area. To demonstrate the basic ideas of typical FRI methods, several commonly used FRI approaches from each of the two main categories are reviewed below. As indicated previously, triangular fuzzy membership functions, as defined in Sect. 2, are employed throughout unless otherwise stated, both for consistency in demonstrating the ideas and for computational efficiency.

Table 5 Alternative FRI techniques

3.2 Representative \(\alpha\)-cut based FRI

As the seminal approach to FRI, \(\alpha\)-cut based interpolation is essentially a fuzzy extension of classical linear interpolation between given points, with fuzzy rules playing the role of the points. The interpolated result is generated by computing linear interpolation at each \(\alpha\)-cut level and then aggregating the results. Theoretically, for arbitrarily shaped, convex and normal fuzzy sets, an infinite number of \(\alpha\)-levels should be taken into consideration to obtain an approximate conclusion. In practice, however, to keep computational requirements acceptable, most \(\alpha\)-cut based methods only take a finite number of \(\alpha\)-levels (usually two, three or four) into account, with the resulting points connected piecewise linearly to yield an approximation of the consequent.

3.2.1 KH: foundational linear FRI

This section first formulates the basic idea of the most famous \(\alpha\)-cut based FRI, named KH linear FRI (after its inventors Kóczy and Hirota 1993a, b), in general form, followed by its practical implementation using triangular membership functions in a multidimensional setting.

3.2.1.1 Core principle

The KH rule interpolation offers an initial proposal for fuzzy interpolative reasoning through manipulating \(\alpha\)-cut distances. When a given observation fails to match any rule in the sparse rule base for firing, an interpolated consequent is constructed by performing a linear aggregation of the rule consequents of a number (usually two) of selected neighbouring rules closest to the observation. The aggregation operation complies with the general principle of similarity-based analogical reasoning, such that

The closer a rule’s antecedent \(A^{i}\) (which is a logical aggregation of individual attribute values \(A^{i}_{j}\)) to the observation \(o^{*}\), the closer the rule’s consequent \(B^{i}\) to the outcome \(B^{*}\) that corresponds to \(o^{*}\).

The similarity measure employed is specified by the fuzzy distances defined between a rule antecedent and the observation. That is, the smaller the distance between \(A^{i}\) and \(o^{*}\), the more similar they are, and the more the corresponding \(B^{i}\) is deemed to contribute to the consequent being sought.

Suppose that there are two rules \(r^{i}\) and \(r^{j}\) in the rule base R, which are formulated as shown in Eq. (1). Given an observation \(o^{*}\) [again, as per Eq. (1)], the notion of linear rule interpolation can be written as:

$$\begin{aligned} \frac{{\tilde{d}}\big (A^{*},A^{i}\big )}{{\tilde{d}}\big (A^{*},A^{j}\big )} = \frac{{\tilde{d}}\big (B^{*},B^{i}\big )}{{\tilde{d}}\big (B^{*},B^{j}\big )} \end{aligned}$$
(3)

where

$$\begin{aligned} {\tilde{d}}\big (A^{*},A^{i}\big ) = \sqrt{{\tilde{d}}_{i1}^{2}+{\tilde{d}}_{i2}^{2}+\cdots +{\tilde{d}}_{im}^{2}} \qquad {\tilde{d}}_{it} = {\tilde{d}}\big (A_{t}^{*},A_{t}^{i}\big ),t=1,2,\ldots ,m \end{aligned}$$
(4)

and \({\tilde{d}}\) denotes the fuzzy distance between the two membership functions.

Fuzzy distance between two fuzzy sets is interpreted as a pair of lower and upper fuzzy distances between their \(\alpha\)-cut sets, with respect to the Resolution Principle (Kóczy and Hirota 1993a). For a particular \(\alpha \in [0,1]\), the lower fuzzy distance \({\tilde{d}}_{L}(A,B)\) and upper fuzzy distance \({\tilde{d}}_{U}(A,B)\) are denoted as:

$$\begin{aligned} {\tilde{d}}_{L}(A,B) = D(inf(A_{\alpha }),inf(B_{\alpha })) \qquad {\tilde{d}}_{U}(A,B) = D(sup(A_{\alpha }),sup(B_{\alpha })) \end{aligned}$$
(5)

where D denotes the Minkowski distance, and inf(.) and sup(.) are the infimum and supremum of the \(\alpha\)-cut concerned, respectively. Hence, the formula of linear rule interpolation [i.e., Eq. (3)] can be rewritten as:

$$\begin{aligned} \frac{{\tilde{d}}_{L}\big (A_{\alpha }^{*},A_{\alpha }^{i}\big )}{{\tilde{d}}_{L}\big (A_{\alpha }^{*},A_{\alpha }^{j}\big )}= & {} \frac{{\tilde{d}}_{L}\big (B_{\alpha }^{*},B_{\alpha }^{i}\big )}{{\tilde{d}}_{L}\big (B_{\alpha }^{*},B_{\alpha }^{j}\big )} \nonumber \\ \frac{{\tilde{d}}_{U}\big (A_{\alpha }^{*},A_{\alpha }^{i}\big )}{{\tilde{d}}_{U}\big (A_{\alpha }^{*},A_{\alpha }^{j}\big )}= & {} \frac{{\tilde{d}}_{U}\big (B_{\alpha }^{*},B_{\alpha }^{i}\big )}{{\tilde{d}}_{U}\big (B_{\alpha }^{*},B_{\alpha }^{j}\big )} \end{aligned}$$
(6)

This leads to the solution for \(min\{B_{\alpha }^{*}\}\) and \(max\{B_{\alpha }^{*}\}\) being:

$$\begin{aligned} min\big \{B_{\alpha }^{*}\big \}= & {} \frac{w_{\alpha L}^{i} min\big \{B_{\alpha }^{i}\big \} + w_{\alpha L}^{j} min\big \{B_{\alpha }^{j}\big \}}{w_{\alpha L}^{i} + w_{\alpha L}^{j}} \nonumber \\ max\big \{B_{\alpha }^{*}\big \}= & {} \frac{w_{\alpha U}^{i} max\big \{B_{\alpha }^{i}\big \} + w_{\alpha U}^{j} max\big \{B_{\alpha }^{j}\big \}}{w_{\alpha U}^{i} + w_{\alpha U}^{j}} \end{aligned}$$
(7)

where

$$\begin{aligned} w_{\alpha L}^{i}= & {} \frac{1}{{\tilde{d}}_{L}\big (A_{\alpha }^{*},A_{\alpha }^{i}\big )} \qquad w_{\alpha L}^{j} = \frac{1}{{\tilde{d}}_{L}\big (A_{\alpha }^{*},A_{\alpha }^{j}\big )} \nonumber \\ w_{\alpha U}^{i}= & {} \frac{1}{{\tilde{d}}_{U}\big (A_{\alpha }^{*},A_{\alpha }^{i}\big )} \qquad w_{\alpha U}^{j} = \frac{1}{{\tilde{d}}_{U}\big (A_{\alpha }^{*},A_{\alpha }^{j}\big )} \end{aligned}$$
(8)

The \(\alpha\)-cut of the conclusion is then given by

$$\begin{aligned} B_{\alpha }^{*} = \big [min\big \{B_{\alpha }^{*}\big \},max\big \{B_{\alpha }^{*}\big \}\big ] \end{aligned}$$
(9)

and the interpolated conclusion can therefore be obtained by means of the Resolution Principle, such that

$$\begin{aligned} B^{*} = \bigcup _{\alpha \in [0,1]} B_{\alpha }^{*} \end{aligned}$$
(10)
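As an illustration only, the following minimal Python sketch applies Eqs. (7) and (8) to one-dimensional triangular fuzzy sets written as (a1, a2, a3). It assumes that the Minkowski distance D reduces to |x - y| in one dimension and that the observation lies strictly between the antecedents of the two flanking rules, so that no distance is zero. It evaluates the cuts at \(\alpha = 0\) and \(\alpha = 1\), which, under the piecewise linearity assumed for triangular sets, give the three characteristic points of \(B^{*}\). The names `alpha_cut`, `kh_alpha_cut` and `kh_triangle` are hypothetical.

```python
from typing import Tuple

Tri = Tuple[float, float, float]  # normal triangular fuzzy set (a1, a2, a3)
Rule = Tuple[Tri, Tri]            # (antecedent A, consequent B)


def alpha_cut(t: Tri, alpha: float) -> Tuple[float, float]:
    """Return [inf, sup] of the alpha-cut of a normal triangular fuzzy set."""
    a1, a2, a3 = t
    return a1 + alpha * (a2 - a1), a3 - alpha * (a3 - a2)


def kh_alpha_cut(a_star: Tri, rule_i: Rule, rule_j: Rule, alpha: float) -> Tuple[float, float]:
    """Interpolate B*_alpha = [min, max] from two flanking rules via Eqs. (7)-(8)."""
    (a_i, b_i), (a_j, b_j) = rule_i, rule_j
    lo_s, hi_s = alpha_cut(a_star, alpha)
    lo_i, hi_i = alpha_cut(a_i, alpha)
    lo_j, hi_j = alpha_cut(a_j, alpha)
    # Weights are inverse lower/upper distances (Eq. 8); assumed non-zero.
    wl_i, wl_j = 1.0 / abs(lo_s - lo_i), 1.0 / abs(lo_s - lo_j)
    wu_i, wu_j = 1.0 / abs(hi_s - hi_i), 1.0 / abs(hi_s - hi_j)
    bl_i, bu_i = alpha_cut(b_i, alpha)
    bl_j, bu_j = alpha_cut(b_j, alpha)
    b_min = (wl_i * bl_i + wl_j * bl_j) / (wl_i + wl_j)
    b_max = (wu_i * bu_i + wu_j * bu_j) / (wu_i + wu_j)
    return b_min, b_max


def kh_triangle(a_star: Tri, rule_i: Rule, rule_j: Rule) -> Tri:
    """Assemble B* = (b1, b2, b3) from the cuts at alpha = 0 and alpha = 1."""
    b1, b3 = kh_alpha_cut(a_star, rule_i, rule_j, 0.0)
    core_lo, core_hi = kh_alpha_cut(a_star, rule_i, rule_j, 1.0)
    # For triangles both ends of the alpha = 1 cut coincide; average for safety.
    return b1, (core_lo + core_hi) / 2.0, b3
```

For example, with rules (0, 1, 2) -> (0, 2, 4) and (6, 7, 8) -> (10, 12, 14) and observation (2, 3, 4), `kh_triangle` returns approximately (3.33, 5.33, 7.33), preserving the 1:2 distance ratio of Eq. (6).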
3.2.1.2 Multidimensional implementation

The foundational KH FRI works effectively and efficiently for simple linear problems. It has subsequently been developed to address sparse rule interpolation in more complex situations, for instance those involving multiple rules with multiple antecedent variables (see Tikk et al. 2002; Wong et al. 2005). Owing to the piecewise linearity presumed by KH interpolation, given triangular membership functions, the interpolated outcome \(B^{*}=(b_{1}^{*},b_{2}^{*},b_{3}^{*})\) can be determined from its two \(\alpha\)-cut sets (at \(\alpha = 0\) and \(\alpha = 1\)), with the three characteristic points taking the values

$$\begin{aligned} b_{t}^{*} = \frac{\sum _{i=1}^{n} \frac{1}{\sqrt{\sum _{j=1}^{m} {\big (a_{jt}^{i}-a_{jt}^{*}\big )^{2}}}} b_{t}^{i}}{\sum _{i=1}^{n} \frac{1}{\sqrt{\sum _{j=1}^{m} {\big (a_{jt}^{i}-a_{jt}^{*}\big )^{2}}}}} \end{aligned}$$
(11)

where n is the number of neighbouring rules used for interpolation, m is the number of antecedent attributes in each rule, and \(t=1,2,3\). This computation of the interpolated fuzzy set \(B^{*}\) follows exactly the general form of Eq. (7), where

$$\begin{aligned} b_{1}^{*}= & {} min\big \{B_{0}^{*}\big \} \nonumber \\ b_{3}^{*}= & {} max\big \{B_{0}^{*}\big \}, \qquad \alpha =0 \nonumber \\ b_{2}^{*}= & {} min\big \{B_{1}^{*}\big \} = max\big \{B_{1}^{*}\big \}, \qquad \alpha =1 \end{aligned}$$
(12)
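A minimal Python sketch of Eq. (11) follows. It assumes triangular sets given as (a1, a2, a3) tuples and an observation that does not coincide with any rule at a characteristic point, so every distance is non-zero; `kh_multi` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Tri = Tuple[float, float, float]
Rule = Tuple[Sequence[Tri], Tri]  # (antecedents A_1..A_m, consequent B)


def kh_multi(obs: Sequence[Tri], rules: Sequence[Rule]) -> Tri:
    """Eq. (11): inverse-distance weighting of each characteristic point t."""
    points = []
    for t in range(3):
        num = den = 0.0
        for antecedents, consequent in rules:
            # Euclidean distance over the m antecedents at point t (assumed > 0)
            dist = math.sqrt(sum((a[t] - o[t]) ** 2 for a, o in zip(antecedents, obs)))
            num += consequent[t] / dist
            den += 1.0 / dist
        points.append(num / den)
    return points[0], points[1], points[2]
```

With two rules whose two antecedents are (0, 1, 2) and (4, 5, 6) respectively, consequents (0, 1, 2) and (8, 9, 10), and both observed values equal to (1, 2, 3), the nearer rule receives three times the weight and the result is approximately (2, 3, 4).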

3.2.2 CCL rule interpolation

As one of the most widely used \(\alpha\)-cut based FRI methods, the CCL rule interpolation (named after its inventors, Chang et al. 2008) offers an alternative means of fuzzy interpolative reasoning that exploits the areas of the fuzzy sets involved in the rules and in the (unmatched) observation. The idea is to preserve logically consistent properties with respect to the ratio of fuzziness (RF), which is determined by the areas of the fuzzy sets concerned. That is, it seeks consistency between the RF of the (to be) interpolated consequent over the observation and the RF of the consequent value over the antecedent value of each rule used for interpolation. More specifically, the RF between two fuzzy values A and B is defined by

$$\begin{aligned} RF(A,B) = \frac{S(A)}{S(B)} \end{aligned}$$
(13)

where S(A) and S(B) denote the areas of the fuzzy sets A and B, respectively.
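For the normal triangular sets used in the rest of this section, the area under the membership function is half the support length, so Eq. (13) reduces to a ratio of support lengths. The sketch below assumes unit height and uses hypothetical function names; the left and right sub-areas it defines are the quantities that CCL manipulates later.

```python
from typing import Tuple

Tri = Tuple[float, float, float]  # normal triangular fuzzy set (a1, a2, a3)


def left_area(t: Tri) -> float:
    """Area of the sub-triangle to the left of the normal point a2."""
    return (t[1] - t[0]) / 2.0


def right_area(t: Tri) -> float:
    """Area of the sub-triangle to the right of the normal point a2."""
    return (t[2] - t[1]) / 2.0


def area(t: Tri) -> float:
    """Total area S(t) = (a3 - a1) / 2 for a unit-height triangle."""
    return left_area(t) + right_area(t)


def ratio_of_fuzziness(a: Tri, b: Tri) -> float:
    """Eq. (13): RF(A, B) = S(A) / S(B); assumes S(B) > 0."""
    return area(a) / area(b)
```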

The CCL FRI method presents a flexible interpolative reasoning framework that allows different types of membership function (MF), including various polygonal and Gaussian MFs. It can also handle the general case of multiple fuzzy rules involving multiple antecedent variables. For simplicity and consistency, the core computations are summarised below for triangular fuzzy membership functions.

First, the normal point \(b_{2}^{*}\) of the (to be) interpolated consequent \(B^{*}\) is determined by linear interpolation [Eq. (15)], together with the areas \(S_{K}(B^{*})\) of \(B^{*}\) [Eq. (14)], such that

$$\begin{aligned} S_{K}\big (B^{*}\big )= & {} {\left\{ \begin{array}{ll} \left( \sum \limits _{j=1}^{m} S_{K}\big (A_{j}^{*}\big )\right) \times \left( \sum \limits _{\begin{array}{c} i=1, \\ \exists j S_{K}\big (A_{j}^{i}\big )>0 \end{array}}^{n} W_{i} \times \frac{S_{K}(B^{i})}{\sum _{j=1}^{m} S_{K}\big (A_{j}^{i}\big )} \right) , &{} \quad {\text {if}}\; \exists ij S_{K}\big (A_{j}^{i}\big )>0 \\ \frac{\sum _{j=1}^{m} S_{K}\big (A_{j}^{*}\big )}{m},&{}\quad {\text {if}}\; \forall ij S_{K}\big (A_{j}^{i}\big )=0 \end{array}\right. } \end{aligned}$$
(14)
$$\begin{aligned} b_{2}^{*}= & {} \sum _{i=1}^{n} W_{i}b_{2}^{i} \end{aligned}$$
(15)

in which n is the number of selected rules for interpolation, and \(W_{i}\) is the aggregated rule weight, which is calculated by

$$\begin{aligned} W_{i} = \frac{\sum _{j=1}^{m} w_{ij}}{\sum _{i=1}^{n}\sum _{j=1}^{m} w_{ij}}, \qquad w_{ij} = 1 - \left| \frac{a_{j2}^{i}-a_{j2}^{*}}{max_{a_{j2}}-min_{a_{j2}}}\right| \end{aligned}$$
(16)

where \(max_{a_{j2}}\) and \(min_{a_{j2}}\), used for normalisation, denote the maximal and minimal values within \(\{a_{j2}^{i} | i=1,2,\dots ,n\}\).
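The weighting of Eq. (16) and the normal point of Eq. (15) can be sketched as below. This minimal illustration assumes that each rule is summarised by the normal points \(a_{j2}^{i}\) of its antecedents and that the normalisation range of every attribute is non-zero; the function names are hypothetical. Note that \(w_{ij}\) may become negative if the observation lies outside the range spanned by the rules.

```python
from typing import List, Sequence


def ccl_weights(rule_a2: Sequence[Sequence[float]], obs_a2: Sequence[float]) -> List[float]:
    """Aggregated rule weights W_i of Eq. (16).

    rule_a2[i][j] is the normal point a_{j2}^i of rule i; obs_a2[j] is a_{j2}^*.
    """
    n, m = len(rule_a2), len(obs_a2)
    spans = []
    for j in range(m):
        column = [rule_a2[i][j] for i in range(n)]
        spans.append(max(column) - min(column))  # assumed non-zero
    raw = [sum(1.0 - abs((rule_a2[i][j] - obs_a2[j]) / spans[j]) for j in range(m))
           for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def ccl_normal_point(weights: Sequence[float], b2: Sequence[float]) -> float:
    """Eq. (15): b2* as the W-weighted sum of the consequent normal points."""
    return sum(w * b for w, b in zip(weights, b2))
```

For two single-antecedent rules with normal points 1 and 5 and an observation at 2, the weights are 0.75 and 0.25; with consequent normal points 2 and 10 this places \(b_{2}^{*}\) at 4.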

Given the three characteristic points created from the two \(\alpha\)-cut sets (at \(\alpha =0,1\)), a triangular fuzzy set is divided into two smaller sub-triangles, as shown in Fig. 3 (more complex polygonal fuzzy sets with many characteristic points may yield more triangular or even trapezoidal sub-polygons, but the same idea applies). From this, the left triangular area \(S_{L}(B^{*})\) (i.e., the part of the area of a triangular fuzzy set to the left of the normal point) and the right triangular area \(S_{R}(B^{*})\) of the fuzzy set \(B^{*}\) are calculated by Eq. (14), with \(K\in \{L,R\}\) in the subscript of \(S_{K}\). This equation reveals the basic idea of the CCL rule interpolation: the RF from the observation viewpoint is constructed by the weighted aggregation of the RF of the involved rules, thereby leading to the derivation of the area of the interpolated fuzzy set.

Fig. 3 Left area \(S_{L}\) and right area \(S_{R}\) of triangular fuzzy set

Finally, the left and right extreme points of the support for the interpolated result \(B^{*}\) are derived from the resulting triangular areas as follows:

$$\begin{aligned} b_{1}^{*} = b_{2}^{*}-2S_{L}(B^{*}), \qquad b_{3}^{*} = b_{2}^{*}+2S_{R}(B^{*}) \end{aligned}$$
(17)
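Continuing the triangular case, the following minimal sketch evaluates Eq. (14) for K = L and K = R and then recovers the support endpoints via Eq. (17). It assumes unit-height triangles and takes the rule weights \(W_{i}\) and the normal point \(b_{2}^{*}\) as inputs (for example as obtained from Eqs. (15) and (16)); the function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Tri = Tuple[float, float, float]
Rule = Tuple[Sequence[Tri], Tri]  # (antecedents A_1..A_m, consequent B)


def left_area(t: Tri) -> float:
    return (t[1] - t[0]) / 2.0


def right_area(t: Tri) -> float:
    return (t[2] - t[1]) / 2.0


def ccl_area(obs: Sequence[Tri], rules: Sequence[Rule], weights: Sequence[float],
             s_k: Callable[[Tri], float]) -> float:
    """Eq. (14): S_K(B*) for one side K, given the side's area function s_k."""
    obs_sum = sum(s_k(a) for a in obs)
    if any(s_k(a) > 0 for ants, _ in rules for a in ants):
        agg = 0.0
        for w, (ants, cons) in zip(weights, rules):
            ant_sum = sum(s_k(a) for a in ants)
            if ant_sum > 0:  # only rules with at least one non-crisp side K
                agg += w * s_k(cons) / ant_sum
        return obs_sum * agg
    return obs_sum / len(obs)  # every rule antecedent is crisp on side K


def ccl_interpolate(obs: Sequence[Tri], rules: Sequence[Rule],
                    weights: Sequence[float], b2_star: float) -> Tri:
    """Eq. (17): b1* = b2* - 2 S_L(B*) and b3* = b2* + 2 S_R(B*)."""
    s_l = ccl_area(obs, rules, weights, left_area)
    s_r = ccl_area(obs, rules, weights, right_area)
    return b2_star - 2.0 * s_l, b2_star, b2_star + 2.0 * s_r
```

With rules (0, 1, 2) -> (1, 2, 3) and (4, 5, 6) -> (8, 10, 12), weights 0.75 and 0.25, observation (1, 2, 3) and \(b_{2}^{*}=4\), both sub-areas come out as 0.625, giving \(B^{*}=(2.75, 4, 5.25)\).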

3.3 Representative intermediate rule based FRI

This section reviews the underlying interpolation mechanism of intermediate rule (or transformation) based FRI. In particular, a more detailed description is given of the scale and move transformation-based FRI approach, since it has been investigated continuously for over a decade and widely applied.

3.3.1 Representative value of fuzzy set

Before going through the details of intermediate rule based FRI techniques, a very important concept adopted within this type of interpolation algorithm needs to be introduced: the Representative Value (Rep) of a fuzzy set. Many variations exist in the literature (e.g., Baranyi et al. 2004; Chen and Chang 2011b; Chen et al. 2009; Huang and Shen 2008), under different names, such as representative value in Huang and Shen (2008), reference point in Baranyi et al. (2004), and characteristic value in Chen and Ko (2008) and Chen et al. (2009). Nonetheless, they carry similar interpretations, described below under the term representative value.

The representative value of a fuzzy set is a single value that captures, in a simplified way, important information contained in the set, such as the “most typical” overall location of the fuzzy set in its domain and its geometric shape. Since there is no unified definition, the Rep value may in certain situations be defined as the defuzzified value of the fuzzy set, if that is preferred. What is important is that, within a particular FRI method, all Rep values are computed in the same way.

More formally, following the most popular approach in the literature, given an arbitrary polygonal fuzzy set \(A=(a_{1},a_{2},\dots ,a_{n})\), where \(a_{i},i=1,2,\dots ,n\) denote the characteristic points of the polygon, its representative value Rep(A) is defined by:

$$\begin{aligned} Rep(A) = \sum _{i=1}^{n} w_{i}a_{i} \end{aligned}$$
(18)

where \(w_{i}\) is the weight assigned to the characteristic point \(a_{i}\). The simplest case, named the average Rep, assigns the same weight to all points, i.e., \(w_{i}=1/n\). For a triangular fuzzy set \(A=(a_{1},a_{2},a_{3})\) as shown in Fig. 2 (of Sect. 2), Rep(A) is commonly defined in the literature as follows (though its centre of gravity may also be used as an alternative if preferred):

$$\begin{aligned} Rep(A) = \frac{a_{1}+a_{2}+a_{3}}{3} \end{aligned}$$
(19)

The definition of representative values for more complex membership functions can be found in Huang and Shen (2008).

Apart from its geometrical meaning, the Rep value also simplifies the definition of the distance between fuzzy sets, which measures their degree of “closeness”. In the simplest case, the distance between two fuzzy sets A and B can be defined by

$$\begin{aligned} d(A,B) = |Rep(A)-Rep(B)| \end{aligned}$$
(20)

which is a crisp distance, in contrast with those used in \(\alpha\)-cut distance based methods (Baranyi et al. 2004). The distance definition employed by each FRI approach is specified when that method is described.
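A minimal sketch of Eqs. (18) to (20) follows. It assumes the characteristic points are supplied as a plain sequence and defaults to the average Rep when no weights are given; `rep` and `rep_distance` are hypothetical names.

```python
from typing import Optional, Sequence


def rep(points: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Eq. (18); without weights this is the average Rep (Eq. (19) for triangles)."""
    if weights is None:
        weights = [1.0 / len(points)] * len(points)
    return sum(w * a for w, a in zip(weights, points))


def rep_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (20): crisp distance between two fuzzy sets via their Rep values."""
    return abs(rep(a) - rep(b))
```

For instance, the triangular sets (0, 3, 6) and (6, 9, 12) have average Rep values 3 and 9, so their distance is 6; the same function also accepts polygonal sets with more characteristic points.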

3.3.2 Scale and move transformation-based FRI (T-FRI)

The scale and move transformation based FRI (T-FRI) is one of the most general and advanced intermediate rule based FRI mechanisms. One of the key aims of its development has been to eliminate an important practical issue of earlier FRI work, namely that the interpolated outcomes were not guaranteed to be convex and, in certain cases, were not even fuzzy sets. The fundamental idea of T-FRI is presented in Huang and Shen (2006, 2008); it can handle both interpolation and extrapolation of multiple multi-antecedent rules with complex polygonal, Gaussian and bell-shaped fuzzy membership functions. The following outlines the key computational steps of T-FRI working with multiple fuzzy rules, each of which in general involves multiple antecedents.

Given a sparse rule base R and an observation \(o^{*}\), in the form of Eq. (1), T-FRI works by running a computational process as highlighted in Fig. 4, involving four core procedures as summarised below.

Fig. 4 Framework of scale and move transformation-based FRI

  • Step 1: Closest rules selection

This procedure provides the basis for performing FRI when \(o^{*}\) does not match any of the rules in the rule base. Intuitively, it searches for a certain number of rules that are closest to the observation. The distance between an observation \(o^{*}\) and a rule \(r^{q}\), or between any two rules \(r^{p},r^{q}\in R\), is determined by aggregating the distances between the corresponding fuzzy values of all the attributes they share:

$$\begin{aligned} d(v,r^{q}) = \frac{1}{\sqrt{m}}\sqrt{\sum _{j=1}^{m} d\big (A_{j}^{v},A_{j}^{q}\big )^{2}} \end{aligned}$$
(21)

where v is \(o^{*}\) or \(r^{p}\) (so \(A_{j}^{v}\) is \(A_{j}^{*}\) or \(A_{j}^{p}\)), depending on whether the distance is between an observation and a rule or between two rules. The n closest rules to \(o^{*}\) are thus those yielding the n smallest values of this distance. Within this aggregation, the distance between a pair of antecedent fuzzy sets is calculated as:

$$\begin{aligned} d\big (A_{j}^{v},A_{j}^{q}\big ) = \frac{\left| Rep\big (A_{j}^{v}\big )-Rep\big (A_{j}^{q}\big )\right| }{max_{A_{j}}-min_{A_{j}}} \end{aligned}$$
(22)

This uses the Rep values of the corresponding fuzzy sets (defined in Sect. 3.3.1) and normalises the otherwise absolute distance, where \(max_{A_{j}}\) and \(min_{A_{j}}\) denote the maximal and minimal values of the attribute \(a_{j}\), respectively. This normalisation ensures that distance measures are compatible with each other across different attribute domains.
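Step 1 can be sketched as follows. This minimal illustration assumes triangular antecedents, the average Rep of Eq. (19), and known attribute ranges \(max_{A_{j}}-min_{A_{j}}\) passed in as `ranges`; `closest_rules` and the other names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Tri = Tuple[float, float, float]


def rep(t: Tri) -> float:
    return sum(t) / 3.0  # average Rep, Eq. (19)


def set_distance(a: Tri, b: Tri, attr_range: float) -> float:
    """Eq. (22): normalised distance between two fuzzy values of one attribute."""
    return abs(rep(a) - rep(b)) / attr_range


def rule_distance(obs: Sequence[Tri], antecedents: Sequence[Tri],
                  ranges: Sequence[float]) -> float:
    """Eq. (21): aggregate the per-attribute distances over all m attributes."""
    m = len(obs)
    total = sum(set_distance(o, a, r) ** 2 for o, a, r in zip(obs, antecedents, ranges))
    return math.sqrt(total) / math.sqrt(m)


def closest_rules(obs: Sequence[Tri], rule_antecedents: Sequence[Sequence[Tri]],
                  ranges: Sequence[float], n: int) -> List[int]:
    """Indices of the n rules whose antecedents are closest to the observation."""
    dists = [rule_distance(obs, ants, ranges) for ants in rule_antecedents]
    return sorted(range(len(dists)), key=dists.__getitem__)[:n]
```

For a single attribute with range 10, rules with antecedents (0, 1, 2), (4, 5, 6) and (8, 9, 10), and observation (5, 6, 7), the distances are 0.5, 0.1 and 0.3, so the two closest rules are the second and third.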

  • Step 2: Intermediate fuzzy rule construction

From the above, the n rules with the smallest distances to a given observation are chosen. Next, an intermediate fuzzy rule \(r^{\prime }\) is constructed, forming the starting point of the transformation process in T-FRI. In most applications of T-FRI, n is taken to be 2 purely for computational efficiency, but often at the expense of interpolative accuracy [if all rule antecedents are regarded as having equal significance (Li et al. 2020b)].

The construction procedure computes the antecedent fuzzy sets \(A_{j}^{\prime }, j=1,\dots ,m\) and the corresponding consequent fuzzy set \(B^{\prime }\) of the intermediate rule:

$$\begin{aligned} r^{\prime } \text{: } \text{ if } a_{1} \text{ is } A_{1}^{\prime } \text{ and } a_{2} \text{ is } A_{2}^{\prime } \text{ and } \cdots \text{ and } a_{m} \text{ is } A_{m}^{\prime } \text{, } \text{ then } z \text{ is } B^{\prime } \end{aligned}$$

which is a weighted aggregation of the n closest rules. Let \(w_{j}^{i},i\in \{1,\dots ,n\}\), denote the weight with which the jth antecedent of the ith fuzzy rule contributes to the construction of the jth antecedent \(A_{j}^{\prime }\) of the intermediate fuzzy rule:

$$\begin{aligned} w_{j}^{i} = \frac{1}{1+d\big (A_{j}^{i},A_{j}^{*}\big )} \end{aligned}$$
(23)

where \(d(A_{j}^{i},A_{j}^{*})\) is calculated as per Eq. (22). Then,

$$\begin{aligned} A_{j}^{\prime } = A_{j}^{\prime \prime } + \delta _{A_{j}} \big (max_{A_{j}}-min_{A_{j}}\big ) \end{aligned}$$
(24)

with

$$\begin{aligned} A_{j}^{\prime \prime } = \sum _{i=1,\dots ,n} \hat{w_{j}^{i}}A_{j}^{i} \end{aligned}$$
(25)

where \(\hat{w_{j}^{i}}\) denotes the normalised weight and \(\delta _{A_{j}}\) is a constant (termed the shift factor of \({A_{j}}\)), defined respectively by

$$\begin{aligned} \hat{w_{j}^{i}} = \frac{w_{j}^{i}}{\sum _{t=1,\dots ,n}w_{j}^{t}}, \qquad \delta _{A_{j}} = \frac{Rep(A_{j}^{*})-Rep(A_{j}^{\prime \prime })}{max_{A_{j}}-min_{A_{j}}} \end{aligned}$$
(26)

The consequent value of the intermediate rule is constructed in the same manner as above, that is

$$\begin{aligned} B^{\prime } = B^{\prime \prime } + \delta _{z} \big (max_{z}-min_{z}\big ) \end{aligned}$$
(27)

where \(max_{z}\) and \(min_{z}\) are the maximal and minimal values of the consequent attribute, and \(B^{\prime \prime }\) is the weighted aggregation of the consequent values \(B^{i},i=1,\dots ,n\), of the n closest rules:

$$\begin{aligned} B^{\prime \prime } = \sum _{i=1,\dots ,n} \hat{w_{z}^{i}}B^{i} \end{aligned}$$
(28)

with \(\hat{w_{z}^{i}}\) being the mean of the normalised antecedent weights \(\hat{w_{j}^{i}}\) of the ith rule:

$$\begin{aligned} \hat{w_{z}^{i}} = \frac{1}{m} \sum _{j=1}^{m} \hat{w_{j}^{i}} \end{aligned}$$
(29)

and the shift factor \(\delta _{z}\) of the consequent is the mean of \(\delta _{A_{j}},j=1,\dots ,m\)

$$\begin{aligned} \delta _{z} = \frac{1}{m} \sum _{j=1}^{m} \delta _{A_{j}} \end{aligned}$$
(30)
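Step 2 can be illustrated with the minimal sketch below. It assumes triangular sets, the average Rep, and known ranges for every antecedent attribute and for the consequent. The shift factor is computed as the signed normalised difference between \(Rep(A_{j}^{*})\) and \(Rep(A_{j}^{\prime \prime })\), so that the shifted antecedent \(A_{j}^{\prime }\) has the same Rep value as the observation, which the subsequent Rep-preserving transformations rely on. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Tri = Tuple[float, float, float]
Rule = Tuple[Sequence[Tri], Tri]  # (antecedents A_1..A_m, consequent B)


def rep(t: Tri) -> float:
    return sum(t) / 3.0  # average Rep, Eq. (19)


def shift(t: Tri, offset: float) -> Tri:
    return (t[0] + offset, t[1] + offset, t[2] + offset)


def weighted_sum(sets: Sequence[Tri], weights: Sequence[float]) -> Tri:
    """Point-wise weighted sum of triangular sets, as in Eqs. (25) and (28)."""
    return tuple(sum(w * s[k] for w, s in zip(weights, sets)) for k in range(3))


def intermediate_rule(rules: Sequence[Rule], obs: Sequence[Tri],
                      ranges: Sequence[float], z_range: float) -> Tuple[List[Tri], Tri]:
    n, m = len(rules), len(obs)
    w_hat = [[0.0] * m for _ in range(n)]  # normalised weights, indexed [i][j]
    antecedents: List[Tri] = []
    deltas: List[float] = []
    for j in range(m):
        # Eq. (23) using the normalised distance of Eq. (22)
        raw = [1.0 / (1.0 + abs(rep(r[0][j]) - rep(obs[j])) / ranges[j]) for r in rules]
        for i in range(n):
            w_hat[i][j] = raw[i] / sum(raw)  # Eq. (26), left
        a_dd = weighted_sum([r[0][j] for r in rules], [w_hat[i][j] for i in range(n)])
        delta = (rep(obs[j]) - rep(a_dd)) / ranges[j]  # signed shift factor
        deltas.append(delta)
        antecedents.append(shift(a_dd, delta * ranges[j]))  # Eq. (24)
    w_z = [sum(w_hat[i]) / m for i in range(n)]  # Eq. (29)
    b_dd = weighted_sum([r[1] for r in rules], w_z)  # Eq. (28)
    delta_z = sum(deltas) / m  # Eq. (30)
    return antecedents, shift(b_dd, delta_z * z_range)  # Eq. (27)
```

With rules (0, 1, 2) -> (0, 2, 4) and (6, 7, 8) -> (10, 12, 14), observation (2, 3, 4), antecedent range 10 and consequent range 20, the normalised weights are 7/13 and 6/13, the intermediate antecedent coincides with the observation, and the intermediate consequent is (40/13, 66/13, 92/13).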
  • Step 3: Scale and move factors calculation

As a transformation based FRI method, the goal of the transformation process T in T-FRI is to scale and move an intermediate fuzzy set \(A_{j}^{\prime }\), such that the transformed shape and representative value coincide with those of the observed value \(A_{j}^{*}\). This process is implemented in the following two stages:

  1. Scale operation from \(A_{j}^{\prime }\) to \(\hat{A_{j}^{\prime }}\) (denoting the scaled intermediate fuzzy set).

To implement this, the required scale rate \(s_{A_{j}}\) is first determined. The specification of the scale (and subsequent move) factors depends on the fuzzy membership functions used; given a triangular fuzzy set \(A_{j}^{\prime }=(a_{j1}^{\prime },a_{j2}^{\prime },a_{j3}^{\prime })\), the scale rate \(s_{A_{j}}\) is:

$$\begin{aligned} s_{A_{j}} = \frac{a_{j3}^{*}-a_{j1}^{*}}{a_{j3}^{\prime }-a_{j1}^{\prime }} \end{aligned}$$
(31)

which expands or contracts the support length of \(A_{j}^{\prime }\), i.e., \(a_{j3}^{\prime }-a_{j1}^{\prime }\), so that it becomes the same as that of \(A_{j}^{*}\). The scaled intermediate fuzzy set \(\hat{A_{j}^{\prime }}\), which has the same representative value as \(A_{j}^{\prime }\), is then obtained such that

$$\begin{aligned} \hat{a_{j1}^{\prime }}= & {} \frac{\big (1+2s_{A_{j}}\big )a_{j1}^{\prime }+\big (1-s_{A_{j}}\big )a_{j2}^{\prime }+\big (1-s_{A_{j}}\big )a_{j3}^{\prime }}{3} \nonumber \\ \hat{a_{j2}^{\prime }}= & {} \frac{\big (1-s_{A_{j}}\big )a_{j1}^{\prime }+\big (1+2s_{A_{j}}\big )a_{j2}^{\prime }+\big (1-s_{A_{j}}\big )a_{j3}^{\prime }}{3} \nonumber \\ \hat{a_{j3}^{\prime }}= & {} \frac{\big (1-s_{A_{j}}\big )a_{j1}^{\prime }+\big (1-s_{A_{j}}\big )a_{j2}^{\prime }+\big (1+2s_{A_{j}}\big )a_{j3}^{\prime }}{3} \end{aligned}$$
(32)
  2. Move operation from \(\hat{A_{j}^{\prime }}\) to \(A_{j}^{*}\).

Given the scaled intermediate fuzzy set \(\hat{A_{j}^{\prime }}\), the move ratio \(m_{A_{j}}\) can then be determined. As indicated above, the move operation shifts the position of \(\hat{A_{j}^{\prime }}\) so that it coincides with that of \(A_{j}^{*}\), while maintaining its representative value \(Rep(\hat{A_{j}^{\prime }})\). This is achieved using the move ratio \(m_{A_{j}}\):

$$\begin{aligned} m_{A_{j}} = {\left\{ \begin{array}{ll} \frac{3\big (a_{j1}^{*}-\hat{a_{j1}^{\prime }}\big )}{\hat{a_{j2}^{\prime }}-\hat{a_{j1}^{\prime }}}, &{} \quad {\text {if }} a_{j1}^{*} \ge \hat{a_{j1}^{\prime }} \\ \frac{3\big (a_{j1}^{*}-\hat{a_{j1}^{\prime }}\big )}{\hat{a_{j3}^{\prime }}-\hat{a_{j2}^{\prime }}},&{}\quad {\text {otherwise}} \end{array}\right. } \end{aligned}$$
(33)

This step computes and records all such scale rates and move ratios for use in the subsequent, and final, procedure to obtain the required consequent value, in response to the observation otherwise unmatched by any rule in the sparse rule base.
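The two stages of Step 3 can be sketched for triangular sets as follows. This minimal illustration assumes that \(A_{j}^{\prime }\) and \(A_{j}^{*}\) already share the same Rep value (as Step 2 intends) and that the relevant supports are non-degenerate, so no denominator is zero; the function names are hypothetical.

```python
from typing import Tuple

Tri = Tuple[float, float, float]


def scale(t: Tri, s: float) -> Tri:
    """Eq. (32): rescale the support by s while preserving the average Rep."""
    a1, a2, a3 = t
    return (((1 + 2 * s) * a1 + (1 - s) * a2 + (1 - s) * a3) / 3,
            ((1 - s) * a1 + (1 + 2 * s) * a2 + (1 - s) * a3) / 3,
            ((1 - s) * a1 + (1 - s) * a2 + (1 + 2 * s) * a3) / 3)


def scale_rate(a_prime: Tri, a_star: Tri) -> float:
    """Eq. (31): ratio of the observation's support length to that of A'."""
    return (a_star[2] - a_star[0]) / (a_prime[2] - a_prime[0])


def move_ratio(a_hat: Tri, a_star: Tri) -> float:
    """Eq. (33): move ratio from the scaled set towards the observation."""
    diff = a_star[0] - a_hat[0]
    if a_star[0] >= a_hat[0]:
        return 3 * diff / (a_hat[1] - a_hat[0])
    return 3 * diff / (a_hat[2] - a_hat[1])


def scale_and_move_factors(a_prime: Tri, a_star: Tri) -> Tuple[float, float]:
    """Return (s_Aj, m_Aj) for one antecedent attribute."""
    s = scale_rate(a_prime, a_star)
    return s, move_ratio(scale(a_prime, s), a_star)
```

For \(A_{j}^{\prime }=(0, 2, 4)\) and \(A_{j}^{*}=(-1.5, 3, 4.5)\), both with Rep value 2, scaling by 1.5 gives (-1, 2, 5), and the move ratio is -0.5.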

  • Step 4: Scale and move transformations

After calculating the necessary scale and move factors (i.e., \(s_{A_{j}}\) and \(m_{A_{j}},j=1,\dots ,m\)), this procedure completes the T-FRI process, deriving the required consequent value of \(B^{*}\). This follows the intuition of similar observations leading to similar consequents, a heuristic fundamental to analogical approximate reasoning. For this, the transformation factors on the antecedent attributes are aggregated. In all conventional T-FRI methods, this is implemented by averaging them:

$$\begin{aligned} s_{z} = \frac{1}{m} \sum _{j=1}^{m} s_{A_{j}} \qquad m_{z} = \frac{1}{m} \sum _{j=1}^{m} m_{A_{j}} \end{aligned}$$
(34)

This leads to the scaled intermediate consequent \(\hat{B^{\prime }}=(\hat{b_{1}^{\prime }},\hat{b_{2}^{\prime }},\hat{b_{3}^{\prime }})\):

$$\begin{aligned} \hat{b_{1}^{\prime }}= & {} \frac{(1+2s_{z})b_{1}^{\prime }+(1-s_{z})b_{2}^{\prime }+(1-s_{z})b_{3}^{\prime }}{3} \nonumber \\ \hat{b_{2}^{\prime }}= & {} \frac{(1-s_{z})b_{1}^{\prime }+(1+2s_{z})b_{2}^{\prime }+(1-s_{z})b_{3}^{\prime }}{3} \nonumber \\ \hat{b_{3}^{\prime }}= & {} \frac{(1-s_{z})b_{1}^{\prime }+(1-s_{z})b_{2}^{\prime }+(1+2s_{z})b_{3}^{\prime }}{3} \end{aligned}$$
(35)

where \(B^{\prime }=(b_{1}^{\prime },b_{2}^{\prime },b_{3}^{\prime })\) is the fuzzy value of the previously computed intermediate consequent. From this, again by analogy to the transformation required for the antecedent to match the observation, the move transformation is applied, resulting in the final, required interpolated consequent \(B^{*}=(b_{1}^{*},b_{2}^{*},b_{3}^{*})\):

$$\begin{aligned} b_{1}^{*}&= \hat{b_{1}^{\prime }} + m_{z} \gamma \nonumber \\ b_{2}^{*}&= \hat{b_{2}^{\prime }} - 2m_{z} \gamma \qquad \gamma = {\left\{ \begin{array}{ll} \frac{\hat{b_{2}^{\prime }}-\hat{b_{1}^{\prime }}}{3},&{}\quad {\text {if}}\; m_{z} \ge 0 \\ \frac{\hat{b_{3}^{\prime }}-\hat{b_{2}^{\prime }}}{3},&{}\quad {\text {otherwise}} \end{array}\right. } \nonumber \\ b_{3}^{*}&= \hat{b_{3}^{\prime }} + m_{z} \gamma \end{aligned}$$
(36)

For illustration, Fig. 5 outlines the scale and move transformation process (i.e., Steps 3 and 4 of a typical T-FRI method): the scale and move factors of each rule antecedent are calculated in the upper box, and the interpolated result is obtained by the corresponding transformations shown underneath. For conciseness, such a process can be collectively represented by \(B^{*} = T(B^{\prime },s_{z},m_{z})\), emphasising the significance of both the scale and move transformations.
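Step 4 can likewise be sketched for triangular sets. The minimal illustration below averages the per-antecedent factors as in Eq. (34), scales the intermediate consequent \(B^{\prime }\) by Eq. (35) and moves it by Eq. (36); the function names are hypothetical.

```python
from typing import Sequence, Tuple

Tri = Tuple[float, float, float]


def scale(t: Tri, s: float) -> Tri:
    """Eq. (35): scale transformation with the aggregated rate s_z."""
    b1, b2, b3 = t
    return (((1 + 2 * s) * b1 + (1 - s) * b2 + (1 - s) * b3) / 3,
            ((1 - s) * b1 + (1 + 2 * s) * b2 + (1 - s) * b3) / 3,
            ((1 - s) * b1 + (1 - s) * b2 + (1 + 2 * s) * b3) / 3)


def move(t: Tri, m: float) -> Tri:
    """Eq. (36): move transformation with the aggregated ratio m_z."""
    b1, b2, b3 = t
    gamma = (b2 - b1) / 3 if m >= 0 else (b3 - b2) / 3
    return b1 + m * gamma, b2 - 2 * m * gamma, b3 + m * gamma


def t_fri_consequent(b_prime: Tri, s_list: Sequence[float], m_list: Sequence[float]) -> Tri:
    """B* = T(B', s_z, m_z), with s_z and m_z as the means of Eq. (34)."""
    s_z = sum(s_list) / len(s_list)
    m_z = sum(m_list) / len(m_list)
    return move(scale(b_prime, s_z), m_z)
```

For example, with \(B^{\prime }=(0, 2, 4)\), scale rates [1, 2] and move ratios [0, -1], the averaged factors are 1.5 and -0.5, and the result is (-1.5, 3, 4.5). Both transformations keep the average Rep of the consequent unchanged.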

Fig. 5 Fuzzy rule interpolation via scale and move transformations

3.3.3 Representative modifications to scale and move transformation-based FRI

Following the generic and seminal ideas of the above-reviewed T-FRI approach, a large family of works has been proposed to improve it further, an overview of which was given earlier in Table 4. This section outlines representative methods within this family.

  • Adaptive T-FRI (Yang and Shen 2011; Yang et al. 2017)

This work is motivated by the observation that interpolated results may be inconsistent after a sequence of T-FRI operations. The potential reasons have been analysed to include defective interpolated rules and inaccurate interpolative transformations. Adaptive fuzzy interpolation enhances the original T-FRI with the ability to identify and correct defective rules in interpolative transformations, facilitating the removal of certain inconsistencies. This is accomplished through two sub-systems: (1) a diagnostic sub-system constructed using the general diagnostic engine, where the inconsistent interpolated results are recorded in an ATMS (assumption-based truth maintenance system) (de Kleer 1986); and (2) a corrective sub-system derived from a fuzzy extension to the traditional numerical interpolation theory and its application in approximation computation. However, this work focuses on the implementation of adaptive T-FRI involving just two multiple-antecedent rules. Moreover, further investigation is required to reveal whether it can handle situations where extrapolation is necessary (since the original T-FRI is able to deal with extrapolation in the same manner as with interpolation).

  • Backward T-FRI (Jin et al. 2014, 2019)

Conventional FRI generally executes in a “forward” manner, where the consequent is interpolated given the rule base and an observation with all antecedent attribute values available. Nevertheless, situations may arise where certain crucial antecedents, which may also be involved in the subsequent interpolation process, are absent from the given observation, so that the final interpolated conclusion cannot be derived. This important issue is addressed by a modification of T-FRI, termed backward T-FRI, which provides a series of solutions for handling both the single missing antecedent and the multiple missing antecedents problems. The single missing antecedent issue is resolved by implementing a four-step computational procedure (mirroring that presented in Sect. 3.3.2) of the original T-FRI, resulting in the reverse calculation of the relevant parameters corresponding to the unknown antecedent value. The general backward T-FRI with multiple missing antecedent values is addressed by two procedures: (1) a direct extension of the method for the single missing value case, by estimating parameter combinations that would lead to the closest resemblance of the original (missing) values; and (2) an approach for the removal of possible missing antecedent values through a process of verifying interpolative results against (past) observations. Backward T-FRI is proven to preserve many crucial properties that the original T-FRI possesses, e.g., the capability of handling multiple multi-antecedent rules and the maintenance of convexity and normality of interpolated results. Whilst backward T-FRI helps address the problem of missing antecedent attribute values, it does not totally remove this problem, especially when the extent of missing values becomes substantial.

  • Dynamic T-FRI (Naik et al. 2017b)

A great majority of the existing transformation based FRI mechanisms work on a static sparse rule base. However, the use of a static rule base may limit the effectiveness of FRI, owing to the absence of the most current (dynamic) rules, as the requirements of fuzzy systems may change over time. Yet, a large number of intermediate fuzzy rules are typically generated by this type of FRI method while executing rule interpolation. Collectively, they may gradually cover regions that were not covered by the original sparse rule base, thereby offering potentially valuable information for updating it. From this observation, the work of Naik et al. (2017b) makes use of such intermediate rules, which most transformation based FRI methods otherwise discard once the required outcomes have been obtained, to develop a dynamic T-FRI mechanism. It enriches the rule base by refining and promoting the intermediate rules, gaining efficiency by allowing more direct rule firing while minimising future runs of the interpolation procedure. It is implemented by first partitioning the interpolated rules into hypercubes, where the nonempty ones are fed as the input into a Genetic Algorithm-based clustering algorithm to find the “best” cluster arrangement. An iterative process is then run to select the densest clusters that have accumulated a sufficient number of candidate rules for achieving rule aggregation and promotion. The practical significance of this approach is clear. Further improvement is, however, still possible, for example by employing more effective and efficient clustering and optimisation methods to replace the relevant components within the current implementation.

  • Higher order T-FRI (Chen et al. 2016; Chen and Shen 2017)

A common implementation shared by most of the established FRI methods is that fuzzy membership functions in the rules and observations are generally defined with conventional type-1 fuzzy sets, for the interpretation and treatment of uncertainty. Very little of the existing FRI work can conjunctively handle more than one form of uncertainty in the rules or observations, even though there may be cases in which more complicated fuzzy set representations become necessary (Fu and Shen 2010). In response, a higher order FRI has been developed, allowing for the representation and manipulation of different types of uncertainty in knowledge within the common T-FRI framework. It works by first encoding uncertain knowledge with a higher order representation and then deriving the final conclusions by performing higher order interpolation over models of such representation. In particular, two common types of technique for uncertainty representation are exploited, resulting in: (1) a rough-fuzzy set based rule interpolation approach (Chen et al. 2016), which facilitates the representation of uncertain fuzzy membership functions with rough-fuzzy approximations; and (2) an interval type-2 fuzzy set-based approach (Chen and Shen 2017), which works in the same manner as the rough-fuzzy-based T-FRI. Within either method, the concept of representative value of a fuzzy set also plays an indispensable role, as it does within the original type-1 T-FRI. Another type-2 fuzzy set-based FRI method can be found in Chen and Lee (2011). Such methods require modifications specific to each particular uncertainty representation, which inevitably increases the computational complexity; this is the price paid for a much more general T-FRI mechanism, which collapses back to the type-1 method if all the uncertainty involved can be sufficiently captured by type-1 fuzzy sets.

  • Other T-FRI-like approaches

Apart from the above-outlined modifications, which directly build on and improve the original T-FRI method, there are other proposals for enhancing fuzzy interpolative reasoning that are analogous to the basic ideas of T-FRI (Li et al. 2005). For instance, in Chen and Adam (2018), ranking values of arbitrary polygonal fuzzy sets are used to express the characteristic points of the underlying membership functions; these play a role in the modified transformation-based FRI process similar to that of the Rep values in T-FRI. In addition, the scale and move transformations involved in T-FRI are replaced with the distance ratio and move rate, respectively, to transform the constructed intermediate rule into the final interpolated outcome.

Another variation of T-FRI is reported in Chen and Ko (2008), named CK FRI hereafter to acknowledge its inventors. The classical Rep values are substituted by characteristic values in this work, facilitating not only a simplified representation of a fuzzy set but also the definition of the distance between fuzzy terms. Where polygonal fuzzy sets are involved, the interpolated fuzzy set is derived by calculating each of its characteristic points from a series of \(\alpha\)-cuts. In particular, the normal points (whose membership is 1) are determined first, aiding the subsequent calculation of the remaining points. From this, two transformations, namely increment and ratio transformations, are executed to convert the average consequent into the final interpolated outcome, such that the similarity degree between these two is analogous to that between the average antecedent and the observation. Building further on this work, two enhanced transformations have been introduced (Chen et al. 2009) to support weighted approaches to FRI (which will be addressed separately later in this review).

3.3.4 Generalised function-based FRI

Bearing significant similarity to the intermediate rule based FRI methods outlined above is another approach, herein referred to as generalised function-based for convenience. Example methods in this family include those reported by Baranyi et al. (1995, 1996a, b, 1998, 2004) and Baranyi and Kóczy (1996a, b). Unlike the \(\alpha\)-cut based interpolation algorithms, given an unmatched observation, this approach infers the conclusion based on the interpolation of fuzzy relations instead of \(\alpha\)-cut distances. It works through two major steps, which are briefly outlined below for completeness. Further details can be found in the relevant references provided.

Given two fuzzy rules (say, \(r^{1},r^{2}\)) and an observation (\(o^{*}\)) in the form of Eq. (1), the core of a generalised function-based FRI method can be described through the following two stages.

3.3.4.1 Generation of interpolated firing rule

The aim of this first stage is to create an intermediate rule \(r^{\prime }\) whose antecedent is as “close” to the observation (\(A^{*}\)) as possible. Here, “close” means that at least partial overlap is ensured between the observation and the intermediate rule, so that the resulting intermediate rule can subsequently be fired (see the next stage). Denote the procedure of this stage by

$$\begin{aligned} r^{\prime } = f^{Interpolation}\big (r^{1},r^{2}\big ) \end{aligned}$$
(37)

where \(f^{Interpolation}\) represents a mapping from a pair of rules into the set of all possible rules of the form of Eq. (1). Two types of algorithm may be utilised to implement this stage of the approach:

  • Fuzzy relation interpolation, which includes any of the solid cutting methods (Baranyi et al. 1995, 1996a, b; Baranyi and Kóczy 1996a, b), and those based on the fixed point law or fixed value law (Ding et al. 1989, 1992; Mukaidono et al. 1990; Shen et al. 1993, 1988).

  • Semantic relation interpolation, which includes any of the semantic revision methods (Ding et al. 1989, 1992; Mukaidono et al. 1990; Shen et al. 1993, 1988), using the semantic revision principle to describe the relation between the antecedent and consequent fuzzy sets within an interpolated intermediate rule.

3.3.4.2 Inference with single rule firing

The second stage fires the intermediate rule returned by the first. This is enabled by temporarily regarding the newly generated intermediate rule as one of the existing rules in the rule base, and by computing the overlap between the observation and the antecedent of the intermediate rule. The procedure implementing this stage can be generally denoted by

$$\begin{aligned} B^{*} = f^{Inference}(r^{\prime },A^{*}) \end{aligned}$$
(38)

Exactly which mechanism is used to implement rule firing may vary across the FRI methods in this family. Any method reported in Ding et al. (1989, 1992), Mukaidono et al. (1990) and Shen et al. (1993, 1988) may be utilised to fire the transformed intermediate rule directly and compute the required final consequent.

For simplicity, the above description has focused on interpolation with a single rule antecedent. As with the \(\alpha\)-cut based and T-FRI approaches, the generalised function-based mechanism has also been extended to FRI with multiple rule antecedents and to fuzzy extrapolation. More details can be found in Baranyi et al. (2004) and its derivatives. The overall workflow of such a method is illustrated in Fig. 6. Conceptually, this is very similar to the underlying approach taken by T-FRI.

Fig. 6 Workflow of generalised function-based fuzzy rule interpolation

3.3.5 Geometry-based linear FRI (GLFRI)

As can be seen from the above, numerous contributions have been made to fuzzy interpolative reasoning over a given sparse rule base. The review so far has been organised in terms of two fundamentally different approaches: \(\alpha\)-cut based and intermediate rule based. Whilst individual methods within each group share a number of characteristics, it is difficult to express their underlying interpolation patterns in a closed mathematical form (Yang and Shen 2013). A recent study (Das et al. 2019), referred to as geometry-based linear FRI (GLFRI), has attempted to address this problem using the theory of fuzzy geometry, through which the basic idea of FRI can be visualised geometrically.

GLFRI belongs to the intermediate rule based FRI scheme and implements fuzzy interpolative reasoning in two steps: an intermediate rule is first derived as a convex combination of adjacent rules, and the final conclusion is then obtained from the intermediate rule through a geometrical transformation. In particular, fuzzy rules are interpreted as fuzzy sets, represented as fuzzy points in a multidimensional space, so that the rules expressed as such points may be joined by fuzzy line segments. The first step constructs an intermediate rule corresponding to the observation via a convex combination of the observation's neighbouring rules. Since the fuzzy rules are equivalently represented as fuzzy points, the point signifying the required intermediate rule can be obtained by classic linear interpolation of crisp values, manipulating the fuzzy line segments formed by the point expressing the observation and its neighbouring points (i.e., rules). In the second step, expansion or contraction parameters are calculated by comparing the antecedent part of the resulting intermediate point with the observation; these parameters are then used to transform the consequent of the intermediate rule to derive the expected conclusion.
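To make the first step concrete, the following Python sketch treats each single-antecedent rule as a point built from two triangular fuzzy sets (left, core, right) and forms the intermediate rule as a convex combination of two neighbouring rules. Deriving the interpolation weight `lam` from the cores of the observation and of the two antecedents, and the names `convex_combination` and `intermediate_rule`, are assumptions made for illustration rather than the exact formulation of Das et al. (2019); the second (expansion/contraction) step is not sketched.

```python
from typing import Tuple

Tri = Tuple[float, float, float]  # (left, core, right) of a triangular fuzzy set
Rule = Tuple[Tri, Tri]            # (antecedent, consequent), i.e. one point in 6-D


def convex_combination(p: Tri, q: Tri, lam: float) -> Tri:
    """Point-wise convex combination (1 - lam) * p + lam * q of two fuzzy points."""
    return tuple((1 - lam) * x + lam * y for x, y in zip(p, q))


def intermediate_rule(r1: Rule, r2: Rule, obs: Tri) -> Rule:
    """Sketch of a GLFRI-style first step: place the intermediate rule on the
    segment joining r1 and r2, using the observation's core to fix lam
    (an assumption of this sketch)."""
    (a1, b1), (a2, b2) = r1, r2
    lam = (obs[1] - a1[1]) / (a2[1] - a1[1])
    if not 0.0 <= lam <= 1.0:
        raise ValueError("observation core must lie between the two antecedent cores")
    return convex_combination(a1, a2, lam), convex_combination(b1, b2, lam)


r1 = ((0.0, 1.0, 2.0), (0.0, 2.0, 4.0))
r2 = ((6.0, 7.0, 8.0), (10.0, 12.0, 14.0))
print(intermediate_rule(r1, r2, (3.0, 4.0, 5.0)))
# -> ((3.0, 4.0, 5.0), (5.0, 7.0, 9.0))
```

Here the observation's core sits halfway between the two antecedent cores, so the intermediate rule is the midpoint of the segment joining the two rule points.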

Whilst GLFRI is promising in expressing the computational process of FRI mathematically in a closed form, with geometrical visualisation, this initial attempt considers only convex, normal fuzzy sets with a singleton core. Extending the study to non-normal and non-convex fuzzy sets would form an interesting line of investigation.

3.4 Comparison of representative FRI methods over common criteria

From the above, it can be seen that every FRI algorithm, be it \(\alpha\)-cut based or intermediate rule based, produces an inference consequent in response to an unmatched observation by interpolating the fuzzy rules in a given rule base, thereby achieving the goal of interpolative reasoning. However, there are significant differences between these approaches. This subsection compares a number of representative FRI methods reviewed previously, with respect to criteria commonly utilised in the literature for evaluating interpolation performance.

Theoretically, FRI is essentially a mapping (Tikk et al. 2011; Jenei 2001) from the input space \({\mathscr {A}}\) to the output space \({\mathscr {Z}}\), where the fuzzy subsets of \({\mathscr {A}}\) and \({\mathscr {Z}}\) (whose collections are denoted by \({\mathscr {F}}({\mathscr {A}})\) and \({\mathscr {F}}({\mathscr {Z}})\), respectively) serve as the values of the rule antecedent attributes and of the rule consequent [as defined in Eq. (1)]. That is, given a rule base R, each rule \(r^{i} \in R\) comprises the values \(\{A_{1}^{i},A_{2}^{i},\dots ,A_{m}^{i}\} \in {\mathscr {F}}({\mathscr {A}})\) of the m antecedent variables and the rule consequent value \(B^{i} \in {\mathscr {F}}({\mathscr {Z}})\). FRI seeks to define a mapping \(I: {\mathscr {F}}({\mathscr {A}}) \rightarrow {\mathscr {F}}({\mathscr {Z}})\) that associates an observation \(A^{*} (=\{A_{1}^{*},A_{2}^{*},\dots ,A_{m}^{*}\}) \in {\mathscr {F}}({\mathscr {A}})\) with an interpolated conclusion \(I(A^{*})=B^{*}\), where \(B^{*} \in {\mathscr {F}}({\mathscr {Z}})\). FRI methods are therefore required to satisfy certain common properties as mapping functions, and these properties also form the general criteria for comparing them.

Table 6 summarises the criteria most commonly used in the literature for FRI evaluation. Any FRI method is expected to satisfy at least a certain number of these properties to be effective in performing interpolative reasoning. Over the history of FRI development, a number of approaches reported in the early stages have been analysed and compared against these criteria in previous work (Johanyák and Kovács 2006; Tikk et al. 2011), especially the \(\alpha\)-cut based FRI methods, including the seminal linear interpolation mechanism introduced in Kóczy and Hirota (1993a, b) and its derivatives. To avoid redundancy, that comparative discussion is not repeated in full in the present review. Instead, particular attention is paid to more recently developed FRI approaches, including many popular transformation-based techniques. Table 7 summarises the results of evaluating representatives of such FRI methods over the common criteria.

Table 6 Commonly adopted criteria for FRI evaluation

In general, an FRI method need not fulfil all of these criteria. However, it is expected to satisfy most of them, together with any further problem-specific requirements of a given application. This also indicates the trend in the development of FRI techniques: to remedy the drawbacks of existing FRI methods and to satisfy more criteria. For example, the very first proposal for FRI, KH linear interpolation (Kóczy and Hirota 1993a), is well known for not always guaranteeing the convexity of the derived fuzzy sets (i.e., C2 in Table 6), although they may be normal. This has led to much attention being paid to building FRI mechanisms that ensure not only normality but also convexity of the inferred consequents, which, in turn, has led to many advanced variations of the KH method. A number of recently developed FRI approaches satisfy many key criteria, including C1–C7, as shown in Table 7. Also, criteria C9, C10 and C11 are increasingly in demand as more sophisticated fuzzy systems with more significant interpolative reasoning power are constructed.

Table 7 Evaluation of representative FRI methods over common criteria

4 Weighted FRI methodologies

In conventional fuzzy interpolative reasoning systems, multiple rules are generally involved, each concerning multiple antecedent attributes. However, these antecedent attributes are assumed to be of equal significance when working together within the rule interpolation process. Recently, a number of methods have been proposed to weight the rule antecedents and to integrate the weights into traditional algorithms in which attributes are not weighted.

This section reviews such advances in weighted fuzzy interpolative reasoning. Table 8 lists the methods reviewed, each assigned an acronym, formed from its inventors' initials and the year of the relevant publication, that serves as its short name. The rest of this section first summarises five typical approaches, then presents a brief comparison amongst them, and finishes with a qualitative, overall comparison between the weighted FRI methods and the original non-weighted approaches.

Table 8 Weighted fuzzy interpolative reasoning schemes with short names

4.1 Typical weighted FRI approaches

This subsection reviews five representative weighted FRI mechanisms. As indicated previously, two key issues, namely weight learning and the integration of weights into FRI, are the main concerns in implementing any weighted FRI approach. The description of each method is therefore composed of three parts: the first two report the development regarding these two issues, and the third draws summarising remarks. To facilitate understanding, all weighted FRI methods are outlined using unified pseudo code for their main procedural steps.

4.1.1 Center of gravity-based weighted FRI (LHTZ2005)

A weighted FRI method is presented in Li et al. (2005) as the first approach to fuzzy interpolative reasoning supported with antecedent weights. This is referred to as the LHTZ2005 method in Table 8 and hereafter. All weights in this approach are represented by trapezoidal fuzzy sets.

4.1.1.1 Learning weights

LHTZ2005 involves little learning; instead, the weights of the individual rule antecedents are predefined by domain experts. The same antecedent attribute may be assigned different weights in different rules of the rule base. For instance, two weighted fuzzy rules used for weighted interpolation are represented as follows:

$$\begin{aligned} \begin{aligned} r^{1}\,:\; if \;a_{1} \; is \; A_{1}^{1} \; \big (AW_{1}^{1}\big ) \; and \; a_{2} \; is \;A_{2}^{1} \;\big (AW_{2}^{1}\big ) \; and \cdots \; and \; a_{m} \; is \; A_{m}^{1} \; \big (AW_{m}^{1}\big ),\; then \; z \; is \; B^{1} (C_{1}) \\ r^{2}\; :\; if \; a_{1} \; is \; A_{1}^{2} \; \big (AW_{1}^{2}\big ) \; and \; a_{2} \; is \; A_{2}^{2} \; \big (AW_{2}^{2}\big ) \; and \cdots \; and \; a_{m} \; is \; A_{m}^{2} \; \big (AW_{m}^{2}\big ),\; then \; z \; is \; B^{2} (C_{2}) \end{aligned} \end{aligned}$$
(39)

where \(AW_{j}^{i}\) stands for the weight of the jth antecedent variable (\(j=1,2,\dots ,m\)) in rule \(r^{i}\), \(i=1,2\), and \(C_{i}\) is the certainty factor of \(r^{i}\). Note that all \(AW_{j}^{i}\) and \(C_{i}\) are specified as trapezoidal fuzzy numbers, which may significantly increase the computational effort involved.

4.1.1.2 Weighting FRI

This weighted fuzzy interpolative reasoning process essentially extends the FRI mechanism of Huang and Shen (2003), which uses only triangular fuzzy sets. In this work, the center of gravity (COG) of a fuzzy set is used to represent the fuzzy set for simplicity. In particular, the COG of a trapezoidal fuzzy set \(A=(a,b,c,d)\) is defined as a pair \((h_{L},h_{R})\):

$$\begin{aligned} h_{L}=\frac{1}{3} (a+b+d) \qquad h_{R}=\frac{1}{3} (a+c+d) \end{aligned}$$
(40)

where a, b, c and d denote the characteristic points of A, with a and d having a membership degree of 0, and b and c being the normal points (i.e., the two end points of the core of the trapezoid, with a membership degree of 1).

The distance between two trapezoidal fuzzy numbers \(A_{1}\) and \(A_{2}\) is defined using their COG pairs [namely, \((h_{1L},h_{1R})\) and \((h_{2L},h_{2R})\)] as follows:

$$\begin{aligned} d(A_{1},A_{2})=\frac{1}{2} (h_{2R}+h_{2L}-h_{1R}-h_{1L}) \end{aligned}$$
(41)
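Eqs. (40) and (41) can be checked numerically with the small Python sketch below. The tuple layout `(a, b, c, d)` and the function names are illustrative choices, not taken from Li et al. (2005).

```python
from typing import Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d) with a <= b <= c <= d


def cog_pair(t: Trapezoid) -> Tuple[float, float]:
    """COG pair (h_L, h_R) of a trapezoidal fuzzy set, following Eq. (40)."""
    a, b, c, d = t
    return (a + b + d) / 3, (a + c + d) / 3


def cog_distance(t1: Trapezoid, t2: Trapezoid) -> float:
    """Signed distance d(A1, A2) of Eq. (41): half the difference between the
    sums of the two COG pairs; positive when A2's pair lies to the right."""
    h1l, h1r = cog_pair(t1)
    h2l, h2r = cog_pair(t2)
    return 0.5 * (h2r + h2l - h1r - h1l)


print(cog_pair((0, 1, 1, 2)))                    # triangular case (b == c): h_L == h_R
print(cog_distance((0, 1, 2, 3), (3, 4, 5, 6)))  # approximately 3.0
```

For a triangular set (b = c) the two COG components coincide, and the distance changes sign when its arguments are swapped.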

From this, the main execution steps of LHTZ2005 can be summarised as in Algorithm 1. Note that the weights of rule antecedents are set in bold wherever they are integrated within FRI, in order to highlight the weighting mechanism.

Algorithm 1 Main steps of LHTZ2005

4.1.1.3 Remarks
  1. The weights of rule antecedent variables in this approach are assumed to be predefined, which requires human intervention and hence reduces the flexibility of the resulting fuzzy interpolative reasoning system.

  2. As reflected in Algorithm 1, the individual weights of antecedent variables are only involved in calculating the aggregation factors \(\lambda _{j},j=1,2\) while constructing the new inference rule \(r^{\prime }\) (Line 1). The aggregation over the scale and move rates to compute the consequent variable is simply implemented by an algebraic average of the corresponding antecedent items (Line 3), without using the externally assigned weights that signify their individual significance levels in influencing the consequent.

  3. Computational complexity may increase significantly due to the use of trapezoidal fuzzy sets to represent the weights, but interpretability may be improved if these weights are associated with domain-specific linguistic terms (which is possible given that they are defined by domain experts).

4.1.2 GA-based weighted FRI (CC2011a)

The genetic algorithm (GA)-based weighted fuzzy interpolative reasoning method integrates the weights of rule antecedents within the underlying FRI procedure it adopts. This work is referred to as CC2011a, with details given in Chen and Chang (2011b). In this method, the weights of the antecedent variables are learned automatically by a GA-based weight-learning algorithm, and fuzzy sets are represented by polygonal or bell-shaped membership functions.

4.1.2.1 Learning weights

The learning method for generating the optimal weights of the rule antecedent variables is based on the CHC algorithm (Eshelman 1991), a variant of traditional GAs (Holland 1975). This GA-based learning mechanism encodes the weights of individual antecedent attributes into a chromosome, in which each gene represents an individual attribute weight.

An initial population is randomly generated, forming the starting point of the evolutionary weight learning process. Each chromosome in the current population is decoded into a set of weight values, which are then employed in the weighted FRI. The weighted interpolative scheme is triggered for a set of training samples, and the interpolated outcomes are recorded. The selection of “good” chromosomes depends on a predefined fitness function that compares the inferred outputs with the target outputs of the training samples. A crossover operation is then carried out among the selected chromosomes, forming the next generation. Once a predefined maximum number of generations is reached, this iterative weight learning procedure terminates and the chromosome with the largest fitness value is deemed optimal. The final learned weights for the rule antecedent variables are obtained by decoding this optimal chromosome. In so doing, the GA-based weight learning scheme follows a so-called “wrapper” approach, which interleaves weight learning with the weighted FRI procedure: the weights obtained in the current generation must be integrated within FRI to enable the evaluation of their fitness values.
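The wrapper loop described above can be sketched as follows. This is a plain elitist GA with truncation selection and uniform crossover, not the CHC algorithm itself (CHC additionally uses incest prevention and cataclysmic restarts, omitted here). The negative mean squared error used as fitness and the stand-in `toy_fri` are hypothetical placeholders for the fitness function and weighted FRI of CC2011a.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (observation values, target output)
WeightedFRI = Callable[[Sequence[float], Sequence[float]], float]


def fitness(weights: Sequence[float], samples: List[Sample], fri: WeightedFRI) -> float:
    """Negative mean squared error of the weighted FRI outputs (higher is better)."""
    return -sum((fri(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def learn_weights(samples: List[Sample], fri: WeightedFRI, m: int,
                  pop_size: int = 20, generations: int = 50, seed: int = 0):
    """Wrapper-style weight learning with a plain elitist GA (not CHC itself).

    Each chromosome holds m genes, one weight in [0, 1] per antecedent attribute;
    every fitness evaluation re-runs the weighted FRI on all training samples.
    """
    rng = random.Random(seed)

    def score(w: List[float]) -> float:
        return fitness(w, samples, fri)

    pop = [[rng.random() for _ in range(m)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: max(2, pop_size // 2)]  # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            # Uniform crossover: each gene is copied from one of the two parents.
            children.append([rng.choice(genes) for genes in zip(p, q)])
        pop = parents + children
    pop.sort(key=score, reverse=True)
    history.append(score(pop[0]))
    return pop[0], history


def toy_fri(weights: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical stand-in for a weighted FRI: a weighted sum of the inputs."""
    return sum(w * v for w, v in zip(weights, x))


samples = [((x0, x1), 0.8 * x0 + 0.2 * x1) for x0 in range(3) for x1 in range(3)]
best, history = learn_weights(samples, toy_fri, m=2)
```

Each call to `score` re-runs the (here, toy) weighted FRI over every training sample, which is precisely why wrapper schemes are computationally expensive; because the fitter half always survives, the best fitness never decreases from one generation to the next.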

4.1.2.2 Weighting FRI

A key concept facilitating this weighted interpolative reasoning is the ratio of fuzziness (RF) (Chen and Chang 2011b). For polygonal fuzzy sets, the degree of fuzziness is computed from the areas of the fuzzy sets. Let A and B be two polygonal fuzzy sets; the ratio of fuzziness RF(A, B) of A to B is defined as follows:

$$\begin{aligned} RF(A,B)=\frac{S(A)}{S(B)} \end{aligned}$$
(42)

where S(A) and S(B) denote the area of the membership function of A and that of B, respectively.

The central idea for computing the interpolated consequent in response to an (unmatched) observation is as follows. The algorithm attempts to keep the RF of the fuzzy set of each attribute in the observation to that of the corresponding antecedent of a rule selected for interpolation equal to the RF of the required consequent fuzzy set (to be computed) to that of an artificially constructed intermediate rule consequent. The intermediate rule consequent is generated here by aggregating the consequents of the rules involved in interpolation. In effect, this idea reflects fuzzy or generalised modus ponens, akin to the approach taken by T-FRI.
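The following Python sketch computes S(A) for a polygonal fuzzy set by the trapezoidal rule over its characteristic points, and from it RF(A, B) as in Eq. (42). The helper `target_consequent_area` is a hypothetical illustration of the equal-ratio idea for a single antecedent attribute only; how CC2011a aggregates several attributes and constructs the full shape of \(B^{*}\) is not reproduced, and bell-shaped sets are not covered.

```python
from typing import Sequence, Tuple

Polygon = Tuple[Sequence[float], Sequence[float]]  # (characteristic points, membership degrees)


def area(fs: Polygon) -> float:
    """S(A): area under a piecewise-linear membership function (trapezoidal rule)."""
    xs, mus = fs
    return sum(0.5 * (mus[k] + mus[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))


def ratio_of_fuzziness(a: Polygon, b: Polygon) -> float:
    """RF(A, B) = S(A) / S(B), following Eq. (42)."""
    return area(a) / area(b)


def target_consequent_area(obs: Polygon, rule_ante: Polygon, inter_cons: Polygon) -> float:
    """Area the interpolated consequent must have so that its RF to the
    intermediate consequent equals the RF of the observation to the rule antecedent."""
    return ratio_of_fuzziness(obs, rule_ante) * area(inter_cons)


tri = ((0.0, 1.0, 2.0), (0.0, 1.0, 0.0))              # area 1
trap = ((0.0, 1.0, 3.0, 4.0), (0.0, 1.0, 1.0, 0.0))   # area 3
wide = ((0.0, 3.0, 6.0, 9.0), (0.0, 1.0, 1.0, 0.0))   # area 6
print(ratio_of_fuzziness(tri, trap), target_consequent_area(tri, trap, wide))
```

An observation one third as “fuzzy” as the matched antecedent thus asks for a consequent one third as fuzzy as the intermediate consequent (here, an area of about 2).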

The weights of the rule antecedent variables are integrated within FRI by following the routine which is summarised in Algorithm 2. In order to emphasise the role that those antecedent weights play in the entire weighted FRI procedure, the individual weights are highlighted in bold in this algorithm description.

Algorithm 2 Integration of antecedent weights in CC2011a

4.1.2.3 Remarks
  1. The GA-based weight learning scheme requires many predefined parameters, such as the fitness function and the maximum number of iterations.

  2. In the evolutionary learning process, updating the weights requires repeated runs of the weighted FRI to compute the consequent using the current weights, in order to evaluate their fitness. This means that the weight learning process is affected by the implementation of the underlying FRI process itself.

  3. The individual weights of rule antecedent attributes are only involved in the aggregation to obtain the rule weights, as shown in Line 3. They are not integrated with the underlying FRI.

4.1.3 Piecewise fuzzy entropies-based weighted FRI (CC2016)

In Chen and Chen (2016), another weighted fuzzy interpolative reasoning method is proposed by exploiting the concept of piecewise fuzzy entropies. This method, referred to as CC2016 hereafter, can handle fuzzy sets defined by polygonal and bell-shaped membership functions. In this method, the same antecedent variable may be assigned different weights when it is involved in different rules used for interpolation.

4.1.3.1 Learning weights

In CC2016, a fuzzy rule is generally represented as follows:

$$\begin{aligned} r^{i}\,:\; if \; a_{1} \; is \; A_{1}^{i} \; \big (AW_{1}^{i}\big ) \; and \; a_{2} \; is \; A_{2}^{i} \; \big (AW_{2}^{i}\big ) \; and \cdots \; and \; a_{m} \; is \; A_{m}^{i} \; \big (AW_{m}^{i}\big ), \; then \; z \; is \; B^{i} \end{aligned}$$

where \(AW_{j}^{i}\) stands for the weight of the jth antecedent variable in rule \(r^{i}\). As indicated above, the weight of a given antecedent variable may differ across the fuzzy rules in which it is involved. Such weights of rule antecedent attributes are generated during the weighted FRI process itself, as explained next.

4.1.3.2 Weighting FRI

The fuzzy sets in this work are assumed to be polygonal, represented by their characteristic points (CPs) paired with the corresponding membership degrees. Let A be a polygonal fuzzy set in the universe of discourse, characterised by n CPs; then

$$\begin{aligned} A=\big (a_{0},a_{1},\dots ,a_{l},a_{c},a_{r},\dots ,a_{n-1};\mu _{0},\mu _{1},\dots ,\mu _{n-1}\big ) \end{aligned}$$

where \(a_{l}\) and \(a_{r}\) are called the “left normal point” and the “right normal point”, respectively, and \(a_{c}=\frac{a_{l}+a_{r}}{2}\) is the “central point”, with \(\mu _{0}=\mu _{n-1}=0\) and \(\mu _{l}=\mu _{r}=1\).

The basic idea is to construct an interpolated consequent fuzzy set \(B^{*}\) with regard to an input observation, by estimating its n CPs and the corresponding membership values such that

$$\begin{aligned} B^{*}=\big (b_{0}^{*},b_{1}^{*},\dots ,b_{n-1}^{*};\mu _{0}^{*},\mu _{1}^{*},\dots ,\mu _{n-1}^{*}\big ) \end{aligned}$$

The key step in estimating the membership degrees is computing the piecewise fuzzy entropies of the fuzzy sets involved. The concept of piecewise fuzzy entropy is defined via the notion of non-probabilistic fuzzy entropy of a fuzzy set (see Al-Sharhan et al. 2001; De Luca and Termini 1972). In particular, the piecewise fuzzy entropy \(H_{t-1,t}(A)\) between the (\(t-1\))th CP and the tth CP of a polygonal fuzzy set A is specified as follows:

$$\begin{aligned} H_{t-1,t}(A)=-K \sum _{s=t-1}^{t} \big [\mu _{s} \log _{10}(\mu _{s}) + (1-\mu _{s})\log _{10}(1-\mu _{s})\big ] \end{aligned}$$
(43)

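where \(K=1/n\), \(1\le t \le n-1\), and \(\mu _{s}\) denotes the degree of membership of the characteristic point \(a_{s}\). Since \(\mu _{0}=\mu _{n-1}=0\) and \(\mu _{l}=\mu _{r}=1\), the usual convention \(0 \log _{10} 0 = 0\) is assumed.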
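Eq. (43) translates directly into code. The sketch below assumes the membership degrees are given as a sequence indexed by CP and applies the convention \(0 \log _{10} 0 = 0\); the function names are illustrative only.

```python
import math
from typing import Sequence


def _term(p: float) -> float:
    """p * log10(p), using the convention 0 * log10(0) = 0."""
    return p * math.log10(p) if p > 0 else 0.0


def piecewise_entropy(mus: Sequence[float], t: int) -> float:
    """Piecewise fuzzy entropy H_{t-1,t}(A) of Eq. (43), where mus holds the
    membership degrees of the n characteristic points of a polygonal set A."""
    n = len(mus)
    if not 1 <= t <= n - 1:
        raise ValueError("t must satisfy 1 <= t <= n - 1")
    k = 1.0 / n
    return -k * sum(_term(mus[s]) + _term(1.0 - mus[s]) for s in (t - 1, t))


mus = (0.0, 0.5, 1.0, 0.5, 0.0)  # five CPs of a polygonal fuzzy set
print([round(piecewise_entropy(mus, t), 4) for t in range(1, len(mus))])
# -> [0.0602, 0.0602, 0.0602, 0.0602]
```

Each piece in this example pairs one crisp point (membership 0 or 1) with one point of membership 0.5, hence the equal values; a piece whose two CPs both have membership 0 or 1 has zero entropy.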

The weighted interpolative approach is summarised in Algorithm 3, with the individual weights highlighted in bold where they are learned and used.

Algorithm 3 Weighted interpolation of CC2016

4.1.3.3 Remarks
  1. As indicated above, in this method the weights of individual rule antecedent variables are assigned differently when different rules involving them are taken into consideration.

  2. The antecedent attribute weights are generated during the weighted FRI process, as shown in Line 5 of the algorithm.

  3. The individual weights of rule antecedent attributes are only involved in the aggregation stage to obtain the overall rule weights, as shown in Line 6. Unfortunately, this useful information is not integrated within the rest of the fuzzy interpolative reasoning process.

4.1.4 Weighted increment and ratio transformation-based weighted FRI (CKCP2009)

Another weighted FRI method, referred to as CKCP2009 hereafter, is presented in Chen et al. (2009). It uses weighted increment transformation and weighted ratio transformation to enable weighted fuzzy interpolative reasoning. A “wrapper” algorithm is implemented to tune the optimal weights of the antecedent variables appearing in a fuzzy rule automatically, and the method can deal with polygonal, Gaussian and bell-shaped membership functions.

4.1.4.1 Learning weights

The weights of individual rule antecedent variables are learned automatically with a “wrapper” mechanism, in which the weighted interpolation process is iteratively triggered to update the current weights. In particular, the weight learning procedure within this weighted FRI is tailored to a certain system control problem, where one input may lead to several states indicating the current values of the observation. The weight learning process is summarised below.

All weights are initialised to the same value to start the first iteration. A set of training samples, as rule antecedent attribute values, is then fed to the FRI system together with the current weights, resulting in the next states of these variables. To adjust the weight of each rule antecedent attribute, a gradient-descent training method is utilised, where a predefined fitness function over the rule antecedent variable is evaluated using the value recorded in its final state. The fitness function yields the prediction error for each antecedent variable in the current iteration, and the weights are then modified to minimise this error before being employed in the next iteration. The iterative weight updating process terminates when a preset maximum number of iterations is reached.

4.1.4.2 Weighting FRI

As with many FRI methods reviewed previously that use Rep values, a unique real value is defined here and associated with a fuzzy set to reflect the key information on its overall location in its domain. In CKCP2009, for a polygonal fuzzy set \(A=(a_{0},a_{1},\dots ,a_{n-1})\), the characteristic value CV(A) (termed Rep elsewhere) is defined as follows:

$$\begin{aligned} CV(A)=\frac{a_{0}+a_{1}+\cdots +a_{n-1}}{n} \end{aligned}$$
(44)

The distance between two polygonal fuzzy sets P and Q is then specified using their CV values such that

$$\begin{aligned} d(P,Q)=|CV(P)-CV(Q)| \end{aligned}$$
(45)

Given these notions, the weighted FRI method can be summarised as shown in Algorithm 4 (Chen et al. 2009), where the multiplication operation between a polygonal fuzzy set A and a real value w (\(w \in [0,1]\)) is defined by

$$\begin{aligned} A \otimes w = (a_{0},a_{1},\dots ,a_{n-1}) \otimes w = (wa_{0},wa_{1},\dots ,wa_{n-1}) \end{aligned}$$
(46)
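Eqs. (44)–(46) are simple enough to express in a few lines of Python. The sketch below represents a polygonal fuzzy set by the tuple of its characteristic points only (membership degrees are not needed for these three operations), which is an illustrative simplification; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def cv(points: Sequence[float]) -> float:
    """Characteristic value CV(A): mean of the n characteristic points, Eq. (44)."""
    return sum(points) / len(points)


def cv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """d(P, Q) = |CV(P) - CV(Q)|, Eq. (45)."""
    return abs(cv(p) - cv(q))


def scale_by_weight(points: Sequence[float], w: float) -> Tuple[float, ...]:
    """A (x) w of Eq. (46): multiply every characteristic point by w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return tuple(w * a for a in points)


print(cv((0, 1, 2, 3)), cv_distance((0, 1, 2, 3), (4, 5, 6, 7)))  # -> 1.5 4.0
print(scale_by_weight((2.0, 4.0, 6.0), 0.5))                     # -> (1.0, 2.0, 3.0)
```

Because CV is an arithmetic mean, \(CV(A \otimes w)=w\,CV(A)\), so weighting a fuzzy set by w scales its characteristic value by the same factor.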

As with the other weighted FRI approaches, the individual weights are highlighted in bold within the algorithm description.

Algorithm 4 Weighted FRI of CKCP2009

4.1.4.3 Remarks
  1. This weight learning scheme is an iterative process integrated within the weighted interpolation procedure, whose outcome must be collected to evaluate the fitness function and update the current weights.

  2. Although the weights are designed to be tuned automatically, the approach is tailored to a specific problem, where the fitness function for each rule antecedent attribute is predefined. This limits the generality of the underlying techniques.

  3. This algorithm reflects the intuition in approximate reasoning that “how an observation is transformed from an intermediate antecedent fuzzy set should be reflected in how the interpolated outcome is transformed from the intermediate consequent”. This is essentially the same idea as that adopted by the conventional T-FRI.

4.1.5 Attribute weighted scale and move transformation-based FRI (LSLYS2018)

A weighted extension of T-FRI, the scale and move transformation-based FRI method (Huang and Shen 2006, 2008), is presented in Li et al. (2018b) and is referred to as LSLYS2018 hereafter. It learns weights from the given sparse rule base only, to determine the relative importance of rule antecedent attributes automatically. Moreover, the obtained attribute weights are applied within each core step of T-FRI.

4.1.5.1 Learning weights

Unlike many of the weight learning schemes reported earlier, in LSLYS2018 the individual weights of rule antecedent attributes are learned automatically from the given sparse rule base only, without acquiring any observations or requiring any human intervention. This attribute weighting scheme is enabled by an innovative Reverse Engineering procedure that generates an artificial training decision table from the given sparse rule base. The essential idea is to reformulate all rules in the rule base into a common representation, where each (possibly) missing value of any rule antecedent is replaced by one of the alternative fuzzy values from its domain. All these reformulated rules, artificial or original, are collated to evaluate the relative significance of the individual attributes.

The weights of the attributes are individually measured using a feature ranking method, implemented by modifying the feature evaluation procedure of a selected feature selection (FS) technique. Five approaches are reported for such usage, including individual feature ranking-based methods [namely, Information Gain (IG), Relief-F, Laplacian Score (LS) and Local Learning-based Clustering for FS (LLCFS)] and a feature subset evaluation-based method [rough set-based FS (RSFS)]. References for these FS methods can be found in Li et al. (2018b). Any of them may be employed, but only one of the five weighting schemes is used at a time to implement the weighted T-FRI. The resulting learned weights are uniquely associated with each rule antecedent variable, regardless of which rule it is involved in.
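The following Python sketch illustrates the two ingredients described above under simplifying assumptions: antecedent values are treated as discrete linguistic labels, a missing value is written as `None`, and information gain (one of the five rankings mentioned) is computed on the artificial decision table and normalised to sum to 1. The data layout and function names are hypothetical; the actual procedure in Li et al. (2018b) may differ in detail.

```python
import itertools
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Rule = Tuple[Tuple[Optional[str], ...], str]  # (antecedent labels, consequent label)


def expand_rules(rules: List[Rule], domains: Sequence[Sequence[str]]) -> List[Rule]:
    """Artificial decision table: every missing (None) antecedent value is
    replaced, in turn, by each alternative linguistic term of its domain."""
    table = []
    for ante, cons in rules:
        options = [domains[j] if v is None else [v] for j, v in enumerate(ante)]
        table.extend((combo, cons) for combo in itertools.product(*options))
    return table


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of consequent labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def info_gain_weights(table: List[Rule]) -> List[float]:
    """Information gain of each attribute w.r.t. the consequent, normalised to sum to 1."""
    labels = [cons for _, cons in table]
    base, m = entropy(labels), len(table[0][0])
    gains = []
    for j in range(m):
        groups: Dict[str, List[str]] = {}
        for ante, cons in table:
            groups.setdefault(ante[j], []).append(cons)
        cond = sum(len(g) / len(table) * entropy(g) for g in groups.values())
        gains.append(base - cond)
    total = sum(gains)
    return [g / total for g in gains] if total > 0 else [1.0 / m] * m


rules = [(("low", None), "small"), (("high", None), "large")]
domains = [["low", "high"], ["low", "high"]]
table = expand_rules(rules, domains)
print(info_gain_weights(table))  # -> [1.0, 0.0]
```

In the example, attribute 1 fully determines the consequent while attribute 2 carries no information, so all of the weight goes to attribute 1; when the ranking scores are equal, every attribute receives 1/m.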

4.1.5.2 Weighting FRI

The weighted T-FRI algorithm presented in LSLYS2018 is summarised in Algorithm 5. In particular, the individual attribute weights are integrated with every procedure of the underlying non-weighted algorithm. Hence, three procedures involve such integration: the selection of the nearest neighbouring rules, the construction of intermediate rules, and the computation of scale and move transformation factors. All computational steps in the original T-FRI that take an evenly weighted average of attribute values are replaced by a weighted aggregation of the corresponding components (as highlighted in bold in Algorithm 5). As this is a weighted extension of the conventional T-FRI described in Sect. 3.3.2, the general rule interpolation process remains the same as the original, to which the relevant computations can be referred.
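As a rough illustration of where the weights enter, the Python sketch below shows two of the three weighted procedures: selecting the nearest rules with an attribute-weighted distance on (normalised) Rep values, and replacing a plain average of per-attribute quantities with a weighted mean. The weighted Euclidean form of the distance and all function names are assumptions made for this sketch rather than the exact formulas of LSLYS2018; intermediate-rule construction is not shown.

```python
import math
from typing import List, Sequence


def weighted_distance(obs: Sequence[float], rule: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Assumed weighted Euclidean distance between an observation and a rule
    antecedent, computed on normalised Rep values of each attribute."""
    return math.sqrt(sum(w * (o - r) ** 2 for w, o, r in zip(weights, obs, rule)))


def nearest_rules(obs: Sequence[float], rules: List[Sequence[float]],
                  weights: Sequence[float], k: int = 2) -> List[int]:
    """Indices of the k rules closest to the observation under the weighted distance."""
    order = sorted(range(len(rules)),
                   key=lambda i: weighted_distance(obs, rules[i], weights))
    return order[:k]


def weighted_aggregate(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean replacing a plain average of per-attribute quantities
    (e.g. scale or move factors); equal weights recover the plain mean."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


weights = [0.9, 0.1]
obs = [0.5, 0.5]
rules = [[0.5, 0.0], [0.0, 0.5], [0.7, 0.9]]
print(nearest_rules(obs, rules, weights))          # -> [0, 2]
print(weighted_aggregate([2.0, 4.0], [0.5, 0.5]))  # -> 3.0
```

Under equal weights the closest rule in this example would instead be rule 2, showing how attribute weights can change which rules are selected for interpolation; with weights of 1/m the weighted mean reduces to the plain average used by the original T-FRI.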

Algorithm 5 Weighted T-FRI of LSLYS2018

4.1.5.3 Remarks
  1. The entire weight learning scheme exploits only the knowledge already available, i.e., the fuzzy rules in the given sparse rule base, without acquiring any observations or requiring human intervention.

  2. Such a learning method is independent of the underlying FRI mechanism. Note that in Table 8, the two weighted FRI approaches reported in LSLYS19 (Li et al. 2020b) are weighted extensions of original non-weighted FRI methods [namely KH (Kóczy and Hirota 1993a; Wong et al. 2005) and CCL (Chang et al. 2008)]. Analogous to the weighted T-FRI herein, the weights of antecedent attributes are generated in the same manner and are then integrated with the two conventional FRI methods throughout each computational procedure.

  3. This weighted T-FRI method does not prescribe which feature ranking mechanism should be used to generate the required weights. Indeed, any available feature ranking method may be used to assess the relative significance of individual antecedent attributes, offering flexibility in performing interpolative reasoning.

  4. Individual attribute weights are integrated with every procedure of the conventional T-FRI. Nonetheless, this weighted T-FRI algorithm readily degenerates to its original form when all antecedent attributes are assumed to be of equal significance. This is because the weight of each attribute is normalised over the ranking scores derived from a given feature ranking method, which results in an identical weight \(AW_{j}=1/m, j=1,2,\dots ,m\) for every rule antecedent if all weights are assumed to be equal.

4.2 Comparison of weighted FRI methods

The above five weighted fuzzy interpolative reasoning mechanisms are typical approaches from the viewpoints of weight learning and FRI weighting. This subsection contrasts these approaches and categorises the other weighted FRI methods listed in Table 8 in relation to the distinctive features of these five approaches.

4.2.1 Weight learning mechanisms

As reflected in the preceding subsections, the typical approaches to weighted fuzzy interpolative reasoning exhibit basic properties along which other weighted FRI methods can be grouped and compared.

4.2.1.1 Predefined versus automatically learned

The initial idea for obtaining the weights of rule antecedent attributes is simply to predefine them using expertise acquired directly from domain experts. This approach includes the early work reported in LHTZ2005 and CC2008. It requires human intervention and hence reduces the flexibility of the resulting fuzzy systems. Automated weight learning schemes are therefore preferable, and indeed all of the remaining methods in Table 8 pursue alternative ways to learn weights automatically.

4.2.1.2 Unique weight versus multiple weights for an antecedent attribute

In general, weighted FRI works with fuzzy rule bases that involve multiple rule antecedent variables. Different significance levels are associated with different variables to indicate their different contributions towards the conclusion. In the literature, some methods learn a unique weight for each antecedent attribute, independent of the rules in which that attribute appears, whilst others assign different weights to the same attribute in different rules. The former includes the work in CC2011a, CC2008, CH2014, DJS2014, CC2011b, LSLYS18 and LSLYS19, and the latter includes LHTZ2005, CC2016, CKCP2009 and CA2018. When an antecedent attribute may be assigned multiple weights, depending on the rules in which it appears, the overall rule model becomes more complicated and harder to interpret. Moreover, more specific information about the antecedent variables of observations may be needed in order to compute the characteristic points of the corresponding fuzzy sets. This may include, for example, information on the central points (Chen and Chen 2016) or the ranking values (Cheng et al. 2015) of the fuzzy sets, at the expense of more computation to produce the weights. In addition, the weights can then only be computed while the weighted FRI system is running, once an observation is provided.

4.2.1.3 Filter schemes versus wrapper schemes

The terms filter and wrapper are used to group weight learning schemes according to whether a weighted FRI method is repeatedly invoked during weight generation. That is, weight learning methods that follow the filter scheme are independent of the weighted FRI process, whereas wrapper methods need to exploit the outcome of the weighted FRI in order to evaluate the “goodness” or quality of the current weights. The filter approach is taken by CC2016, DJS2014, CA2018, LSLYS18 and LSLYS19, and the wrapper approach by CC2011a, CKCP2009, CH2014 and CC2011b. Since wrapper methods use interpolated results to construct their fitness functions (in order to update the weights in each iteration), they may achieve very high accuracy, but at a relatively high computational cost.
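The contrast between the two schemes can be sketched in Python. The filter function scores each attribute from the data alone (absolute Pearson correlation with the target is an arbitrary illustrative choice), whereas the wrapper function runs a reasoner once per candidate weight vector and keeps the best-scoring candidate found by a simple random search. Here `loo_accuracy`, a leave-one-out weighted nearest-neighbour classifier, is a hypothetical and deliberately cheap stand-in for a full weighted FRI run; none of these names come from the cited methods.

```python
import math
import random
from typing import Callable, Sequence


def filter_weights(X: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Filter scheme: score attributes from the data alone, never calling the reasoner.
    |Pearson r| with the target is used purely as an illustrative ranking score."""
    n, m = len(X), len(X[0])
    my = sum(y) / n
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    scores = []
    for j in range(m):
        col = [x[j] for x in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((a - mx) ** 2 for a in col))
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [1.0 / m] * m


def loo_accuracy(X, y, w) -> float:
    """Hypothetical stand-in for a weighted FRI run: leave-one-out accuracy of a
    weighted nearest-neighbour classifier over the training samples."""
    correct = 0
    for i, xi in enumerate(X):
        # (weighted squared distance, label) for every other sample
        dists = [(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, xi, xk)), y[k])
                 for k, xk in enumerate(X) if k != i]
        correct += min(dists)[1] == y[i]
    return correct / len(X)


def wrapper_weights(evaluate: Callable[[list[float]], float], m: int,
                    iterations: int = 200, seed: int = 0) -> list[float]:
    """Wrapper scheme: random search over normalised weight vectors, scoring each
    candidate by actually running the reasoner through `evaluate`."""
    rng = random.Random(seed)
    best = [1.0 / m] * m
    best_score = evaluate(best)
    for _ in range(iterations):
        raw = [rng.random() for _ in range(m)]
        total = sum(raw)
        candidate = [r / total for r in raw]
        score = evaluate(candidate)  # one full reasoning run per candidate
        if score > best_score:
            best, best_score = candidate, score
    return best


# Toy data: the class depends on attribute 0 only.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8], [1.0, 0.5]]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(filter_weights(X, y))  # attribute 0 receives most of the weight
w = wrapper_weights(lambda cand: loo_accuracy(X, y, cand), m=2)
print(w, loo_accuracy(X, y, w))
```

Even in this toy setting the cost difference is visible: the filter makes a single pass over the data, while the wrapper calls the reasoner `iterations + 1` times.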

4.2.2 Weighting FRI procedures

This aspect concerns how the generated weights of rule antecedent attributes are integrated within the underlying FRI, so as to reflect the relative significance of each individual attribute in deriving the interpolated results. From the typical weighted FRI mechanisms reviewed previously, the following observations can be drawn:

  1.

    Most existing techniques work by assigning an overall weight to each rule before using the weighted rules in FRI. Such rule weights are normally computed by aggregating the weights calculated for the individual rule antecedent variables, which involves an additional weight aggregation procedure. Approaches that exploit the individual antecedent weights directly are the most recent developments in the literature. Examples include weighted T-FRI, weighted KH and weighted CCL (see Li et al. 2018b, 2020b for details), all of which use the individual antecedent weights to improve the original unweighted methods, reflecting the importance of each attribute in influencing the conclusion given an unmatched observation.

  2.

    Learned weights are seldom systematically integrated within all major components of a weighted FRI algorithm; rather, they are involved only in certain computational subroutines. As such, information about the significance of domain attributes is not exploited to its full potential. Fortunately, the recently developed weighted FRI offers the possibility of a general weighting scheme that enables different unweighted FRI methods to be supported with antecedent weights in a common manner. This facilitates transferring a developed weighting scheme from one FRI mechanism to another once the weights of rule antecedent attributes are available.
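The first observation above can be illustrated with a minimal Python sketch, assuming that a rule's overall weight is obtained by aggregating its per-rule antecedent weights. The helper `aggregate_rule_weight`, the arithmetic-mean default and the two rules are hypothetical choices for illustration, not taken from any specific method in Table 8. The example also shows why the second observation matters: two rules that emphasise different attributes can end up with the same rule weight, so the per-attribute information is lost once it is collapsed.

```python
from statistics import fmean
from typing import Callable, Sequence


def aggregate_rule_weight(antecedent_weights: Sequence[float],
                          agg: Callable[[Sequence[float]], float] = fmean) -> float:
    """Collapse one rule's per-attribute weights into a single overall rule weight.
    The arithmetic mean is an illustrative default; min or product are other options."""
    return agg(antecedent_weights)


# Two hypothetical rules over three antecedent attributes, each with its own attribute weights.
rule_a = [0.6, 0.2, 0.2]  # emphasises attribute 1
rule_b = [0.2, 0.2, 0.6]  # emphasises attribute 3

# Both rules receive the same overall weight under mean and under min aggregation,
# so which attribute each rule emphasises is no longer visible to the FRI step.
print(aggregate_rule_weight(rule_a), aggregate_rule_weight(rule_b))
print(aggregate_rule_weight(rule_a, min), aggregate_rule_weight(rule_b, min))
```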

4.3 Weighted versus non-weighted FRI

In Sect. 3 and earlier in the present section, the conventional (unweighted or flat) FRI techniques and the recently developed approaches to weighted FRI have been systematically reviewed, and the relative pros and cons of the individual methods within each of these two categories have been pointed out. Detailed quantitative comparisons between them are beyond the scope of this review, but such results can be found in the references that report the individual FRI methods (e.g., Li et al. 2018b, 2020b). Nonetheless, to complete this review of existing weighted FRI mechanisms, it is helpful to contrast the performance of these two categories of approaches qualitatively.

As indicated in Sect. 4.1.5, three major approaches to weighted FRI have been proposed in the literature, each a weighted extension of its original (namely, T-FRI, KH and CCL) that computes and takes into account the weights of rule antecedent attributes. The resulting weighted T-FRI, weighted KH and weighted CCL have been comprehensively compared against their non-weighted originals on a wide range of benchmark classification problems, in terms of both effectiveness and efficiency. First, the empirical results available in the literature consistently show that the weighted fuzzy interpolative reasoning mechanisms outperform their non-weighted counterparts in the accuracy of the interpolated results, and hence in effectiveness, especially when the rule bases are considerably sparse.

As the weights are integrated within the computational process of FRI, the time complexity of a weighted algorithm is a natural concern. Investigations have therefore also been carried out to reveal how much extra computational effort running a weighted approach may require. Such a comparison only makes sense if it is conducted between the weighted methods and their underlying non-weighted originals, just as when their relative accuracy was assessed. Importantly, the attribute weights can also be exploited to modify the selection of the nearest neighbouring rules (as shown in Algorithm 5) that are used to implement weighted rule interpolation. Comparative experiments have therefore been conducted in the literature for cases where different numbers of closest rules (from 2 to 6) are selected for interpolation. The running-time results show no significant increase in time cost for a weighted FRI method compared with its unweighted original when the same number of rules is used, for each of the three FRI methods. This is better than a first glance at the extended algorithms might suggest.
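The following Python sketch illustrates how antecedent weights can change which rules are selected as nearest neighbours. It assumes triangular fuzzy sets, uses the T-FRI representative value (the mean of the three defining points), and measures rule-observation distance as a weighted Euclidean distance over range-normalised representative values. This distance form, the function names and the toy rules are assumptions made for illustration rather than a reproduction of Algorithm 5.

```python
import math
from typing import Sequence

Triangle = tuple[float, float, float]  # (a0, a1, a2) with a0 <= a1 <= a2


def rep(t: Triangle) -> float:
    """T-FRI representative value of a triangular fuzzy set: the mean of its three points."""
    return sum(t) / 3.0


def weighted_distance(rule: Sequence[Triangle], obs: Sequence[Triangle],
                      weights: Sequence[float], ranges: Sequence[float]) -> float:
    """Assumed weighted Euclidean distance between a rule's antecedents and an observation,
    computed over representative values normalised by each attribute's range."""
    return math.sqrt(sum(w * ((rep(a) - rep(b)) / r) ** 2
                         for a, b, w, r in zip(rule, obs, weights, ranges)))


def nearest_rules(rules: Sequence[Sequence[Triangle]], obs: Sequence[Triangle],
                  weights: Sequence[float], ranges: Sequence[float], k: int = 2) -> list[int]:
    """Indices of the k rules closest to the observation under the given attribute weights."""
    return sorted(range(len(rules)),
                  key=lambda i: weighted_distance(rules[i], obs, weights, ranges))[:k]


# Toy sparse rule base with two antecedent attributes, each ranging over [0, 10].
ranges = [10.0, 10.0]
rules = [
    [(4.0, 5.0, 6.0), (8.0, 9.0, 10.0)],  # rule 0: matches attribute 1, far on attribute 2
    [(7.0, 8.0, 9.0), (4.0, 5.0, 6.0)],   # rule 1: far on attribute 1, matches attribute 2
    [(6.0, 7.0, 8.0), (6.0, 7.0, 8.0)],   # rule 2: moderately close on both
]
obs = [(4.0, 5.0, 6.0), (4.0, 5.0, 6.0)]

print(nearest_rules(rules, obs, [0.5, 0.5], ranges))  # → [2, 1]
print(nearest_rules(rules, obs, [0.9, 0.1], ranges))  # → [0, 2]
```

With equal weights the two closest rules are rules 2 and 1; giving the first attribute a weight of 0.9 selects rules 0 and 2 instead, although the cost of the selection itself is unchanged.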

Although running time does increase (generally only slightly) as more closest rules are exploited, for all FRI methods and regardless of whether a weighted FRI method is used, a more important conclusion has been drawn from observing how classification accuracy changes with the number of neighbouring rules used. In general, the weighted FRI methods (whichever one is used) require only the minimum number (i.e., 2) of nearest neighbouring rules to achieve their best performance. This finding significantly enhances algorithmic efficiency, as it avoids involving more rules in the interpolative process.

Finally, it is worth pointing out that, in terms of the criteria commonly adopted in the literature to evaluate FRI methods (as discussed in Sect. 3.4), weighted approaches naturally inherit the properties of their respective originals. However, there is also a limited body of work on weighted fuzzy interpolative reasoning schemes that differ from those focused on in this review. For example, Chen et al. (2013b) construct a weighted FRI method based on interval type-2 fuzzy sets. Such work, however, involves higher-order representations and hence substantially more complex computation than the methods reviewed above. Because these schemes are significantly more sophisticated in their underlying mathematical representation and computational implementation, and have been less studied empirically so far, their details are omitted in this paper.

5 Conclusion

This survey has reviewed a range of important techniques for approximate reasoning with incomplete and imprecise knowledge. It has demonstrated that fuzzy rule interpolation (FRI) can perform approximate inference where traditional rule-based methodologies fail, namely in situations where no existing rule in a sparse rule base matches a novel observation. The paper has reviewed the general FRI methodologies, highlighting the strengths and limitations of classical approaches. It has also comprehensively analysed a family of recent FRI algorithms that address a common and important problem shared by many conventional FRI mechanisms: the assumption that all antecedent attributes are of equal significance. In particular, this survey has compared different FRI techniques, qualitatively outlining the advantages of each. This work therefore enables readers to make an informed choice of the FRI technique(s) potentially suitable for their specific domain problems.

Whilst it has been shown that FRI techniques in general, and weighted FRI schemes in particular, are able to strengthen the power of approximate reasoning, there is much room for further improvement. As such, this survey also offers a number of suggestions for future research in this important area. For instance, the current weighted FRI works on a static rule base. Yet a number of intermediate fuzzy rules are typically generated while executing transformation-based rule interpolation. The idea of dynamic FRI can therefore be exploited to enrich the rule base by refining and promoting these intermediate rules, gaining efficiency by allowing more direct rule firing without running the interpolation procedure. Also, all mainstream work carried out so far in the FRI literature concerns Mamdani-type fuzzy rules. Recently, research has been reported on extending conventional T-FRI methods to build FRI mechanisms for Takagi-Sugeno-Kang (TSK) fuzzy models. These initial attempts all follow the conventional unweighted approach. It would therefore be interesting to extend such work further within the weighted FRI framework. A natural starting point would be to directly mirror the weight-learning techniques developed for Mamdani model-based T-FRI to estimate the antecedent weights of the otherwise unweighted conditional attributes in a TSK model. How the resulting weights could be integrated with TSK model-based FRI requires further investigation.

Furthermore, the curse of dimensionality remains a challenging problem in all fuzzy rule-based reasoning systems, including those supported by an FRI procedure. The performance of an FRI system might be improved by increasing the number of input variables and the number of linguistic values that each input may take. However, this would inevitably cause an exponential growth in the number of fuzzy rules required, making the system impractically complex. Even ignoring computational complexity, the fundamental problem that FRI addresses is the lack of fuzzy rules in the first place; acquiring a much larger rule base is itself a significant challenge. Hierarchical fuzzy inference models may provide an effective way to alleviate such difficulties, by decomposing the system into multiple fuzzy sub-systems with low-dimensional inputs. Thus, another important direction for future study is to investigate how a hierarchical fuzzy model may be integrated with a given FRI procedure.
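A short calculation shows the scale of the problem. The sketch below assumes a complete grid rule base, in which a flat system with \(m\) inputs and \(k\) linguistic values per input needs \(k^m\) rules, and compares it with one common hierarchical arrangement: a chain of \(m-1\) two-input sub-systems whose intermediate outputs also take \(k\) values, which needs \((m-1)k^2\) rules. Both the chain structure and the function names are illustrative assumptions.

```python
def flat_rule_count(m: int, k: int) -> int:
    """Rules in a complete flat rule base: one per combination of k linguistic values over m inputs."""
    return k ** m


def hierarchical_rule_count(m: int, k: int) -> int:
    """Rules in an assumed chain of m - 1 two-input sub-systems (m >= 2), where each intermediate
    output also takes k linguistic values, so every sub-system needs k**2 rules."""
    return (m - 1) * k ** 2


for m in (2, 4, 6, 8):
    # Columns: number of inputs, flat rule count, hierarchical rule count (k = 5 values per input).
    print(m, flat_rule_count(m, 5), hierarchical_rule_count(m, 5))
```

For \(m=8\) and \(k=5\), for example, the flat rule base needs 390,625 rules, while the assumed hierarchical chain needs 175.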