1 Introduction

In recent years, deep neural networks (DNNs) have gained huge popularity among researchers and the general public thanks to the eye-catching performance they have demonstrated on many challenging tasks involving visual and audio information (LeCun et al. 2015; Litjens et al. 2017; Li et al. 2018). The breakthroughs achieved by DNNs have also led to the recent surge of artificial intelligence (AI) and machine learning research. With the exponentially increasing amount of data available for analysis and the stronger capability and greater availability of computational hardware, powerful AI models have been constructed, clearly illustrating super-human machine intelligence as the impetus to revolutionize our industries, society, and everyday lives (Angelov and Gu 2018a; Hagras 2018).

Despite the great promise for a technology revolution, concerns about the understandability and trustworthiness of complicated machine learning systems have been raised frequently across a wide range of research communities and industries (Angelov and Gu 2018a; Hagras 2018; Angelov et al. 2021). For example, state-of-the-art DNNs are often characterized as typical “black box” models that lack transparency (Zhou and Feng 2017). Such models usually have a huge number of parameters that are learned from very large amounts of training data but have no clear physical meaning. Their learning performance relies on careful parameter tuning, whilst the training and internal reasoning processes are hardly understandable or interpretable to humans (Feng et al. 2018). The “black box” nature is not specific to DNNs, but is shared by many mainstream models such as support vector machine (Cristianini and Shawe-Taylor 2000), random forest (Breiman 2001), and learning vector quantization (Kohonen and Maps 1995). Whilst decision tree (Quinlan 1986) and k-nearest neighbour (Cunningham and Delany 2007) are known to be interpretable on small-scale, simple problems, their interpretability and explainability are usually very limited when applied to large-scale, high-dimensional, complex problems (Hagras 2018). Explaining the decisions made by these state-of-the-art models is often challenging and problematic. The lack of explainable features has also hindered their applicability in financial, healthcare and other safety-critical applications, where being able to understand the rationale behind the models’ decisions is the key to trust (Dosilovic et al. 2018; Tjoa and Guan 2020).

Fuzzy rule-based systems provide an effective solution for constructing models that offer both great prediction precision and high-level model transparency while being capable of incorporating real-world uncertainties (Angelov and Gu 2018a; Barredo Arrieta et al. 2020). At the core of such systems is the fuzzy set theory defined by Lotfi A. Zadeh in the mid-1960s (Zadeh 1965). Built upon a solid theoretical foundation, fuzzy rule-based systems attempt to mimic human reasoning and decision-making processes rather than trying to represent the human brain (Hagras 2018). The use of linguistic IF-THEN fuzzy rules in fuzzy rule-based systems is a natural and intuitive representation of knowledge that is easy for humans to interpret. The human-like fuzzy inferencing process enables an explicit explanation for each decision made by fuzzy rule-based models. The decisions can be examined and verified by domain experts, and direct human interaction is also facilitated. With a transparent system structure, an explainable internal reasoning mechanism and intuitive IF-THEN rules to represent abstract knowledge, fuzzy rule-based systems have increasingly regained attention in the current move towards explainable AI (Garibaldi 2019).

This paper provides a systematic review of modern methods for autonomously constructing zero-order and first-order fuzzy rule-based predictive models from data. The review focuses mostly on the autonomous learning features of three different classes of fuzzy rule-based systems, namely evolving fuzzy systems, evolutionary fuzzy systems, and reinforcement learning-based fuzzy systems. The main purpose of this overview is to recall the basics of fuzzy systems, to revisit and evaluate different approaches for building such systems, and to summarize the related applications tackling real-world problems. The latest trends and challenges for the development of fuzzy models are also analysed, which may be useful to learners intending to quickly grasp the current progress in this exciting research area as a whole. Due to the wide range of topics covered, this paper does not aim to provide an exhaustive list of related works but instead focuses on a smaller portion of more representative and popular ones, in order to present the general concepts and principles. Interested readers are referred to previous reviews (Herrera 2008; Fazzolari et al. 2013; Fernández et al. 2015; Fernandez et al. 2019; Škrjanc et al. 2019; Campos Souza 2020; Leite et al. 2020) and monographs (Cordón et al. 2001a; Kasabov 2007; Angelov et al. 2010; Lughofer 2011a; Angelov 2012; Angelov and Gu 2018b) for more details about the developments of fuzzy systems and their applications.

To summarize, the main contributions of this paper are as follows:

  1. An overview of the basic concepts of fuzzy sets and zero-order and first-order fuzzy systems.

  2. A systematic review of the autonomous learning of evolving, evolutionary and reinforcement learning-based fuzzy systems.

  3. A critical comparison amongst the three classes of fuzzy systems from learning and application perspectives.

  4. A comprehensive discussion on the open issues of fuzzy systems research, highlighting directions for future development.

The remainder of this paper is organized as follows. Section 2 recalls the basic concepts of fuzzy sets and fuzzy systems. The structure and parameter learning schemes of mainstream evolving fuzzy systems are reviewed in Sect. 3. Section 4 summarizes the evolutionary tuning and learning schemes employed by mainstream evolutionary fuzzy systems. Section 5 describes the use of reinforcement learning in constructing fuzzy systems. A comparison between the aforementioned three classes of fuzzy systems is presented in Sect. 6. Section 7 introduces the real-world applications in which these fuzzy systems have been implemented. A case study on stock price prediction is presented in Sect. 8. Finally, this paper concludes with a discussion of challenges and future directions.

2 Preliminaries: fuzzy sets and fuzzy systems

In this section, basic concepts of fuzzy sets and fuzzy rule-based systems are presented.

2.1 Fuzzy sets

Fuzzy set theory was introduced by Zadeh (1965) to capture the fuzziness that is naturally contained in human language, judgment, evaluation, and decisions. Fuzzy sets extend classical set theory in that a data sample can partially belong to multiple sets at the same time with different degrees of membership. Compared with classical bivalent sets (or crisp sets), fuzzy sets allow an extra degree of flexibility to incorporate real-world uncertainties (Zimmermann 2010).

A fuzzy set \(A\) is characterized by a membership function denoted by \({\mu }_{A}\left(x\right)\) with a value range of \([0,1]\) (Zadeh 1965). The value of \({\mu }_{A}\left(x\right)\) represents the degree of truth, fulfilment or satisfaction of the fact that \(x\) belongs to \(A\) (Angelov and Gu 2018b). A higher value of \({\mu }_{A}\left(x\right)\), namely one closer to 1, means that the data sample \(x\) is more likely to be a full member of \(A\), and vice versa. In the extreme case where \({\mu }_{A}\left(x\right)\) only takes the values 1 or 0, meaning that \(x\) either belongs or does not belong to \(A\), the fuzzy set reduces to a crisp set with a Boolean membership function. Therefore, fuzzy sets can be viewed as a generalized form of classical bivalent sets.

Commonly used membership functions in fuzzy sets include, but are not limited to, the Gaussian, triangular, trapezoidal and Cauchy types. Among these, the Gaussian type membership function, as formulated by Eq. (1), is typically considered to be the most widely used one, thanks to its good generalization capabilities and coverage of the whole feature space (Angelov and Zhou 2008).

$${\mu }_{A}\left(x\right)={e}^{-\frac{{\left(x-p\right)}^{2}}{2{\sigma }^{2}}}$$
(1)

where \(p\) and \(\sigma\) are the respective focal point and spread of the fuzzy set \(A\).
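As a minimal illustration of Eq. (1), the Gaussian membership degree can be computed as sketched below. The function name `gaussian_membership` and the example values are assumptions made for illustration only.

```python
import math


def gaussian_membership(x: float, p: float, sigma: float) -> float:
    """Membership degree of x in a fuzzy set with focal point p and spread sigma (Eq. (1))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))


# A sample located at the focal point is a full member of the set;
# the membership degree decays smoothly as the sample moves away from it.
at_focal_point = gaussian_membership(5.0, p=5.0, sigma=2.0)
one_spread_away = gaussian_membership(7.0, p=5.0, sigma=2.0)
```

Because the Gaussian function never reaches exactly zero, every point of the feature space receives a non-zero membership degree, which is the coverage property mentioned above.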

In practice, the type of membership function to be used is usually determined by human experts based on their preferences and experience, and may differ significantly from real data distribution (Angelov and Yager 2012). Parameters of membership functions can be either handcrafted with human expertise or learned from data by learning algorithms (Angelov and Gu 2018b).

2.2 Fuzzy systems

Fuzzy rule-based systems are one of the most important application fields of fuzzy set theory. Fuzzy rule-based systems can be interpreted as a set of intuitive and easy-to-interpret IF-THEN fuzzy rules. Fuzzy rule-based systems can also be represented as neuro-fuzzy systems, which can be viewed as a special type of multi-layer feedforward neural networks.

A standard fuzzy rule-based system is composed of four functional components, namely, fuzzifier (maps numerical crisp inputs to fuzzy inputs), inference engine (maps fuzzy inputs to fuzzy outputs using fuzzy rules), knowledge base (contains all fuzzy rules) and defuzzifier (maps fuzzy outputs to numerical crisp outputs) (Fernandez et al. 2019), see Fig. 1.

Fig. 1 Standard structure of fuzzy rule-based systems

The knowledge base is the key component for decision-making, and it is represented by intuitive and easy-to-interpret IF-THEN fuzzy rules. Fuzzy rules are usually composed of two parts, namely, premise (IF) part stating conditions on the input(s) and consequent (THEN) part describing the corresponding output(s) (Adriaenssens et al. 2004).

The two most widely used types of fuzzy rules are:

  (1) Zadeh-Mamdani type (Zadeh 1973; Mamdani and Assilian 1975), introduced by Zadeh, Mamdani and Assilian in the 1970s.

  (2) Takagi-Sugeno-Kang type (Takagi and Sugeno 1985; Sugeno and Kang 1988), introduced by Takagi, Sugeno and Kang in the 1980s.

A typical Zadeh-Mamdani type fuzzy rule-based system is composed of fuzzy rules in the form of Eq. (2):

$$\begin{array}{cc}{\mathbf{R}}_{i}:&\begin{array}{c}IF~\left({x}_{1}~is~ {A}_{i,1}\right)~AND~\left({x}_{2}~is~{A}_{i,2}\right)~AND~\dots~AND~\left({x}_{N} ~is~{A}_{i,N}\right)\\ THEN~\left({y}_{i}~is~{A}_{i,out}\right)\end{array}\end{array}$$
(2)

where \({\mathbf{R}}_{i}\) is the ith fuzzy rule; \({\varvec{x}}={\left[{x}_{1},{x}_{2},\dots ,{x}_{N}\right]}^{T}\) is the \(N\times 1\) dimensional system input; \({x}_{n}\) is the nth input variable; \({A}_{i,n}\) is the linguistic term of the nth premise fuzzy set of \({\mathbf{R}}_{i}\); \({A}_{i,out}\) is the linguistic term of the output; and \({y}_{i}\) is the output of \({\mathbf{R}}_{i}\).

In contrast, a typical Takagi-Sugeno-Kang type fuzzy rule-based system consists of fuzzy rules in the following form:

$$\begin{array}{cc}{\mathbf{R}}_{i}:& \begin{array}{c}IF~\left({x}_{1}~is~{A}_{i,1}\right)~AND~\left({x}_{2}~is~{A}_{i,2}\right)~AND~\dots~AND~\left({x}_{N}~is~{A}_{i,N}\right)\\ THEN~\left({y}_{i}={{\varvec{a}}}_{i}^{T}\overline{{\varvec{x}} }\right)\end{array}\end{array}$$
(3)

where \(\overline{{\varvec{x}} }={\left[1,{{\varvec{x}}}^{T}\right]}^{T}\); \({{\varvec{a}}}_{i}={\left[{a}_{0},{a}_{1},{a}_{2}\dots ,{a}_{N}\right]}^{T}\) is the \((N+1)\times 1\) dimensional consequent parameter vector of \({\mathbf{R}}_{i}\).

It is clear from Eqs. (2) and (3) that both Zadeh-Mamdani type and Takagi-Sugeno-Kang type fuzzy rules have the same linguistic forms of premise parts, but differ in the consequent parts. The consequent part of a Takagi-Sugeno-Kang type fuzzy rule is a linear regression function, while the consequent part of a Zadeh-Mamdani type fuzzy rule is a singleton. Thus, Zadeh-Mamdani type fuzzy rules are also called zero-order fuzzy rules, while Takagi-Sugeno-Kang type fuzzy rules are called first-order fuzzy rules.

Angelov and Yager (2012) proposed a novel type of fuzzy rules called AnYa, which simplifies the antecedent parts of fuzzy rules to numerical vectors that serve as prototypes learned from data. This type of fuzzy rules also removes the need to define membership functions for each variable. A zero-order AnYa type fuzzy rule can be formulated as follows (Angelov and Yager 2012),

$$\begin{array}{cc}{\mathbf{R}}_{i}:& \begin{array}{c}IF \left({\varvec{x}}\sim {{\varvec{p}}}_{i}\right)\\ THEN \left({y}_{i}~is~{A}_{i,out}\right)\end{array}\end{array}$$
(4)

and a first-order AnYa type fuzzy rule is expressed as:

$$\begin{array}{cc}{\mathbf{R}}_{i}:& \begin{array}{c}IF \left({\varvec{x}}\sim {{\varvec{p}}}_{i}\right)\\ THEN \left({y}_{i}={{\varvec{a}}}_{i}^{T}\overline{{\varvec{x}} }\right)\end{array}\end{array}$$
(5)

where \({{\varvec{p}}}_{i}\) is the prototype of \({\mathbf{R}}_{i}\); “\(\sim\)” denotes similarity. One can see from Eqs. (4) and (5) that AnYa type fuzzy rules have the same consequent parts as the Zadeh-Mamdani type and Takagi-Sugeno-Kang type fuzzy rules. AnYa fuzzy rule-based systems also follow the same standard fuzzy inference procedure.

2.3 Fuzzy systems for prediction and classification

A typical fuzzy predictor is a system that uses first-order fuzzy rules. Taking the Takagi-Sugeno-Kang type fuzzy rules (Eq. (3)) as an example, the input–output relationship of a standard multi-input single-output fuzzy system is mathematically modelled as (Angelov 2012; Angelov and Gu 2018b):

$$\widehat{y}=f\left({\varvec{x}}\right)=\sum_{i=1}^{L}{\overline{\lambda }}_{i}{y}_{i}=\sum_{i=1}^{L}{\overline{\lambda }}_{i}{{\varvec{a}}}_{i}^{T}\overline{{\varvec{x}} }$$
(6)

where \(L\) is the number of fuzzy rules in the system; \({y}_{i}\) is the output of the ith fuzzy rule, \({\mathbf{R}}_{i}\); \({\overline{\lambda }}_{i}\) is the normalized firing strength of \({\mathbf{R}}_{i}\) calculated by Eq. (7):

$${\overline{\lambda }}_{i}=\frac{{\lambda }_{i}}{\sum_{j=1}^{L}{\lambda }_{j}}$$
(7)

It can be seen from Eqs. (6) and (7) that, given a particular input \({\varvec{x}}\), the overall system output is computed as a fuzzily weighted sum of the individual fuzzy rules’ outputs following the so-called “centre of gravity” principle. The fuzzy weights are defined by the firing strengths of the fuzzy rules (Angelov 2012; Angelov and Yager 2012; Angelov and Gu 2018b).

The most commonly used way to calculate the firing strength \({\lambda }_{i}\) of a fuzzy rule \({\mathbf{R}}_{i}\) from the membership values is the product t-norm:

$${\lambda }_{i}=\prod_{n=1}^{N}{\mu }_{{A}_{i,n}}\left({x}_{n}\right)$$
(8)

Note that the t-norm is predominantly used by fuzzy systems with axis-parallel rules to aggregate the membership values of different attributes (Angelov and Filev 2004; Márquez et al. 2007). For fuzzy systems with non-axis-parallel rules, multivariate Gaussian type membership functions are typically employed, and the membership values produced by the multivariate membership functions are used directly as the firing strengths of the fuzzy rules (Lemos et al. 2011; Lughofer et al. 2015; Pratama et al. 2015).
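The inference procedure of Eqs. (6)–(8) can be sketched in a few lines for a system with axis-parallel rules and Gaussian membership functions (Eq. (1)). This is only an illustrative sketch: the dictionary-based rule representation, the function names and the two example rules are assumptions, not part of any specific system reviewed here.

```python
import math


def firing_strength(x, focal_points, spreads):
    """Product t-norm over per-attribute Gaussian memberships (Eqs. (1) and (8))."""
    strength = 1.0
    for x_n, p_n, s_n in zip(x, focal_points, spreads):
        strength *= math.exp(-((x_n - p_n) ** 2) / (2.0 * s_n ** 2))
    return strength


def tsk_predict(x, rules):
    """First-order output as the normalized firing-strength-weighted sum (Eqs. (6)-(7))."""
    x_bar = [1.0] + list(x)  # extended input [1, x^T]^T
    strengths = [firing_strength(x, r["p"], r["sigma"]) for r in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    y_hat = 0.0
    for lam, rule in zip(strengths, rules):
        y_i = sum(a * xb for a, xb in zip(rule["a"], x_bar))  # linear consequent
        y_hat += (lam / total) * y_i
    return y_hat


# Two hypothetical rules: y = x1 + x2 around the origin, y = 10 around (4, 4).
rules = [
    {"p": [0.0, 0.0], "sigma": [1.0, 1.0], "a": [0.0, 1.0, 1.0]},
    {"p": [4.0, 4.0], "sigma": [1.0, 1.0], "a": [10.0, 0.0, 0.0]},
]
y_mid = tsk_predict([2.0, 2.0], rules)  # equidistant from both focal points
```

At the midpoint between the two focal points both rules fire equally, so the output is the plain average of the two local linear models; closer to either focal point the corresponding rule dominates.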

A multi-input single-output (MISO) first-order fuzzy predictor can be directly applied to binary classification problems by using the following simple rule to convert continuous system outputs into discrete class labels (“0” and “1”):

$$\widehat{y}=\left\{\begin{array}{cc}\mathrm{class }\,1& if\, f\left({\varvec{x}}\right)>0.5\\ \mathrm{class }\,0& else\end{array}\right.$$
(9)

A MISO fuzzy predictor can also be generalized to a multi-input multi-output (MIMO) one for multi-class classification problems:

$$\widehat{{\varvec{y}}}=f\left({\varvec{x}}\right)=\sum_{i=1}^{L}{\overline{\lambda }}_{i}{{\varvec{y}}}_{i}=\sum_{i=1}^{L}{\overline{\lambda }}_{i}{\mathbf{a}}_{i}^{T}\overline{{\varvec{x}} }$$
(10)

where \(\widehat{{\varvec{y}}}={\left[{\widehat{y}}_{1},{\widehat{y}}_{2},\dots ,{\widehat{y}}_{W}\right]}^{T}\) is the \(W\times 1\) dimensional system output; \({\mathbf{a}}_{i}=[{{\varvec{a}}}_{i,1},{{\varvec{a}}}_{i,2},\dots ,{{\varvec{a}}}_{i,W}]\) is a \((N+1)\times W\) dimensional consequent parameter matrix of \({\mathbf{R}}_{i}\); \({{\varvec{y}}}_{i}\) is the \(W\times 1\) dimensional output of \({\mathbf{R}}_{i}\); and the MIMO Takagi-Sugeno-Kang type fuzzy rule, \({\mathbf{R}}_{i}\) takes the following form (Angelov 2010):

$$\begin{array}{cc}{\mathbf{R}}_{i}:& \begin{array}{c}IF~\left({x}_{1}~is~{A}_{i,1}\right)~AND~\left({x}_{2}~is~{A}_{i,2}\right)~AND\dots~AND~\left({x}_{N}~is~{A}_{i,N}\right)\\ THEN~\left({y}_{i,1}={{\varvec{a}}}_{i,1}^{T}\overline{{\varvec{x}} }\right)~AND~\left({y}_{i,2}={{\varvec{a}}}_{i,2}^{T}\overline{{\varvec{x}} }\right)~AND~\dots~AND~\left({y}_{i,W}={{\varvec{a}}}_{i,W}^{T}\overline{{\varvec{x}} }\right)\end{array}\end{array}$$
(11)

Accordingly, the class labels are determined by:

$$\begin{array}{cc}\widehat{y}=\mathrm{class }{~i}^{*};& {i}^{*}=\underset{i=\mathrm{1,2},\dots ,W}{\mathrm{argmax}}\left({\widehat{y}}_{i}\right)\end{array}$$
(12)
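The two decision rules of Eqs. (9) and (12) amount to thresholding and an arg-max over the system outputs, as the following sketch shows. The function names are assumptions, and classes are indexed from 1 to follow the notation of Eq. (12).

```python
def binary_label(f_x: float) -> int:
    """Eq. (9): class 1 if the MISO system output exceeds 0.5, otherwise class 0."""
    return 1 if f_x > 0.5 else 0


def multiclass_label(y_hat) -> int:
    """Eq. (12): 1-based index of the largest of the W outputs of a MIMO system."""
    best = max(range(len(y_hat)), key=lambda i: y_hat[i])
    return best + 1


label_binary = binary_label(0.73)
label_multi = multiclass_label([0.1, 0.7, 0.2])
```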

Compared with first-order fuzzy rules, zero-order fuzzy rules (Eq. (2)) are more widely used for constructing classification models. For a standard zero-order fuzzy rule-based classifier, the class label of a given input sample \({\varvec{x}}\) is determined by the fuzzy rule that produces the highest firing strength calculated by Eq. (8) (Angelov and Gu 2018b; Angelov and Zhou 2008; Ishibuchi et al. 1995):

$$\begin{array}{cc}\widehat{y}={A}_{{i}^{*},out};& {i}^{*}=\underset{i=\mathrm{1,2},\dots ,L}{\mathrm{argmax}}\left({\lambda }_{i}\right)\end{array}$$
(13)

where the linguistic terms of zero-order fuzzy rules, \({A}_{1,out}\), \({A}_{2,out}\),…,\({A}_{L,out}\) are the class labels.
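The “winner-takes-all” decision of Eq. (13) can be sketched as follows, again assuming Gaussian membership functions and the product t-norm of Eq. (8). The rule representation and the example rules are hypothetical.

```python
import math


def firing_strength(x, focal_points, spreads):
    """Product t-norm over per-attribute Gaussian memberships (Eqs. (1) and (8))."""
    strength = 1.0
    for x_n, p_n, s_n in zip(x, focal_points, spreads):
        strength *= math.exp(-((x_n - p_n) ** 2) / (2.0 * s_n ** 2))
    return strength


def classify(x, rules):
    """Eq. (13): return the consequent label of the rule with the highest firing strength."""
    winner = max(rules, key=lambda r: firing_strength(x, r["p"], r["sigma"]))
    return winner["label"]


# Two hypothetical zero-order rules whose consequents are class labels.
rules = [
    {"p": [0.0, 0.0], "sigma": [1.0, 1.0], "label": "A"},
    {"p": [5.0, 5.0], "sigma": [1.0, 1.0], "label": "B"},
]
label_near_a = classify([1.0, 0.5], rules)
label_near_b = classify([4.0, 6.0], rules)
```

Since the decision is attributed to a single winning rule, the prediction can be explained by pointing to that rule's premise, which is one reason zero-order rules are popular for classification.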

3 Evolving fuzzy systems

The concept of evolving fuzzy systems was first conceived around the beginning of the twenty-first century (Angelov and Buswell 2001, 2002; Kasabov and Song 2002; Angelov 2002; Angelov and Filev 2002, 2003). Evolving fuzzy systems are a class of fuzzy systems that are capable of self-developing and self-updating the system structure and parameters online from data streams (Ge and Zeng 2020; Gu and Shen 2021). A typical evolving fuzzy system can learn from streaming data “on the fly” in a single-pass manner, efficiently transforming the learned knowledge into human-interpretable fuzzy rules. It is capable of capturing concept drifts and/or shifts in the data streams and self-evolving its structure and parameters to adapt to the dynamically changing data patterns (Lughofer and Angelov 2011). As an effective and promising tool for handling streaming data problems, evolving fuzzy systems have been intensively researched in the past two decades.

The general framework of evolving fuzzy systems is depicted in Fig. 2, where one can see that the evolving mechanism of a typical evolving fuzzy system consists of two key schemes, namely (1) structure evolving and (2) parameter updating (Ge and Zeng 2018a, 2020; Rong et al. 2018). The structure evolving scheme mostly concerns fuzzy rule generation, merging, pruning and splitting, as well as the learning of premise parameters, namely the prototypes of fuzzy rules. The parameter updating scheme is mostly for learning the meta-parameters of the system and the consequent parameters of fuzzy rules. In the rest of this section, a review of the evolving mechanisms used by a selected group of highly representative evolving fuzzy systems in the literature is presented. An index of the evolving fuzzy systems reviewed in this section is presented in Table 1 for readers’ convenience. Since a large part of the literature has been reviewed recently (Škrjanc et al. 2019; Campos Souza 2020; Leite et al. 2020), this section is more focused on general concepts and principles. As stated in Sect. 1, this section only considers zero-order and first-order evolving fuzzy systems. For higher order ones, such as evolving type-2 fuzzy systems, more details can be found in Škrjanc et al. (2019).

Fig. 2 General framework of evolving fuzzy systems (Ge and Zeng 2020)

Table 1 Index of evolving fuzzy systems covered in this review

3.1 Structure learning schemes

Evolving fuzzy systems may employ different structure evolving schemes for fuzzy rule identification. The key idea of such structure evolving schemes is to group streaming data into clusters using recursive clustering and associate each individual cluster with a particular fuzzy rule in the rule base. In this way, the identified local models of data are converted into human-interpretable fuzzy rules.

3.1.1 Rule generation

Rule generation is the key component of the structure evolving scheme because evolving fuzzy systems begin with no fuzzy rules in the knowledge base. Fuzzy rules are added to the knowledge base one by one during the learning process from streaming data “on the fly” in an exploratory way to capture concept drifts and shifts (Lughofer and Angelov 2011). Typically, an evolving fuzzy system adds new fuzzy rules when unfamiliar data patterns are observed in new data samples. As the existing fuzzy rules in the knowledge base fail to describe the novel data patterns well, the system needs to construct new fuzzy rules to adapt to the changes. Different evolving fuzzy systems use different approaches to identify new fuzzy rules from data streams (Rong et al. 2018). To date, the most widely used approaches include the density/potential criterion, distance criterion, error criterion, firing strength criterion and statistical contribution criterion (Rong et al. 2018; Bao et al. 2018). In this subsection, the evolving fuzzy systems listed in Table 1 are categorized according to the criteria they utilize for fuzzy rule generation.

3.1.1.1 Density/potential criterion

The evolving Takagi-Sugeno (eTS) model (Angelov and Filev 2004), as one of the earliest evolving fuzzy systems, employs the potential criterion to identify new fuzzy rules. The potential of a data sample is calculated, in the form of a Cauchy function, from the average Euclidean distance between this sample and all other samples in the data space. If the potential of a new data sample is greater than the potentials of all the existing cluster centres, a new cluster is identified with this new sample as its focal point, namely, cluster centre, and a new fuzzy rule is initialized accordingly. The evolving fuzzy rule-based classifiers, eClass1 and eClass0 (Angelov and Zhou 2008), also employ the potential criterion for fuzzy rule identification in the same way as eTS. The simplified evolving Takagi-Sugeno (Simpl_eTS) model (Angelov and Filev 2005) uses a simplified variation of potential named scatter. The scatter of a data sample is defined as the average Euclidean distance between this sample and all other samples. A new data sample is recognized as the focal point of a new cluster if its scatter value is either greater than or smaller than those of all existing focal points. The meta-cognitive neuro-fuzzy inference system (McFIS) (Subramanian and Suresh 2012) uses both the potential and error criteria for new rule identification. If the prediction error on the current input sample and its spherical potential, which is defined as the absolute value of the quantity in squared distance mapping, are both greater than the predefined thresholds, a new rule will be initialized. In the autonomous learning multi-model (ALMMo) system (Angelov et al. 2018), another variation of potential called global density is used for fuzzy rule identification. The global density is calculated based on the Euclidean distance between the data sample and the arithmetic mean of all the observed data samples in the entire data space. The global density measures the similarity between a data sample and the global data pattern. A new fuzzy rule will be initialized by a new data sample if its global density is either smaller than or greater than those of all existing focal points. If this condition is satisfied, it suggests that the new sample describes the global data pattern better than others (in this case, its global density is currently the greatest) or it represents a new unfamiliar pattern that cannot be represented by existing focal points (in this case, its global density is currently the smallest). The self-organizing fuzzy inference system (SOFIS) (Gu and Angelov 2018a) initializes new fuzzy rules by identifying the focal points from data based on their multimodal density values and mutual distances. The multimodal density of an individual sample is defined as the global density weighted by the corresponding frequency of this sample being observed in the data space. The jointly evolving and compressing fuzzy system (JECFS) (Huang et al. 2021) uses the same mechanism as ALMMo to learn new fuzzy rules from data.
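The global density condition described above for ALMMo can be sketched as follows. This is a simplified batch illustration under stated assumptions: the Cauchy-type form of the density (squared distance to the global mean, normalized by the average squared deviation of the observed samples) is an assumed concrete instance of the verbal description, and the mean and variance are recomputed from all stored samples, whereas evolving systems update such statistics recursively. All names are hypothetical.

```python
def global_density(x, samples):
    """Assumed Cauchy-type density of x relative to the global mean of the observed samples."""
    n, dim = len(samples), len(x)
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    variance = sum(sum((s[d] - mean[d]) ** 2 for d in range(dim)) for s in samples) / n
    dist2 = sum((x[d] - mean[d]) ** 2 for d in range(dim))
    if variance == 0.0:
        return 1.0 if dist2 == 0.0 else 0.0
    return 1.0 / (1.0 + dist2 / variance)


def needs_new_rule(x, focal_points, samples):
    """New rule if the density of x is above or below the densities of all focal points."""
    if not focal_points:
        return True  # an evolving system starts with an empty rule base
    d_x = global_density(x, samples)
    d_focal = [global_density(p, samples) for p in focal_points]
    return d_x > max(d_focal) or d_x < min(d_focal)


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
focal_points = [[0.0, 0.0], [1.0, 1.0]]
far_sample_triggers = needs_new_rule([5.0, 5.0], focal_points, samples)
covered_sample_triggers = needs_new_rule([1.0, 0.0], focal_points, samples)
```

In this example the distant sample has the lowest density of all candidates and therefore starts a new rule, while the sample lying between the existing focal points in density terms does not.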

3.1.1.2 Distance criterion

The dynamic evolving neural-fuzzy inference system (DENFIS) (Kasabov and Song 2002), which is another early evolving fuzzy system, is the most representative model that uses the distance criterion for rule generation. In DENFIS, a newly observed data sample is recognized as the focal point of a new cluster if its Euclidean distances to the existing cluster centres exceed a predefined distance threshold, suggesting that this new sample is spatially distant from the identified local models. Flexible fuzzy inference systems (FLEXFIS) (Lughofer 2008) use an approach very similar to that of DENFIS for identifying new fuzzy rules. The distance threshold (the so-called vigilance parameter) used by FLEXFIS is normalized by the input dimensionality to avoid generating excessive clusters and fuzzy rules. The Gaussian evolving fuzzy modelling system (eMG) (Lemos et al. 2011) automatically adds a new fuzzy rule to the rule base if the following two conditions are both satisfied: (1) the compatibility, which is defined as a Mahalanobis distance-based similarity measure, between the current data sample and the prototypes of existing fuzzy rules falls below a predefined threshold; (2) the compatibility of the most compatible prototype to the current data sample (namely, the one with the highest compatibility value) has frequently failed to exceed the predefined compatibility threshold over a given period of time. The simplified evolving Takagi-Sugeno neuro-fuzzy (Simpl_eTS+) model (Angelov 2011) establishes a new fuzzy rule if the Euclidean distance between the current data sample and the global mean is either greater than or smaller than all the distances between the existing focal points and the global mean. Generalized smart evolving fuzzy systems (GS-EFS) (Lughofer et al. 2015), evolving fuzzy models (EFuMo) (Dovžan et al. 2015) and the evolving possibilistic fuzzy modelling system (ePFM) (Maciel et al. 2017) are representative fuzzy models that utilize the distance criterion, employing the Mahalanobis distance as the similarity measure. GS-EFS follows the same rule adding scheme as FLEXFIS but sets an individual distance threshold for each fuzzy rule, called the local vigilance parameter. EFuMo normalizes the calculated Mahalanobis distances with the learned fuzzy covariance matrices and further requires a predefined number of consecutive input samples to satisfy the distance criterion-based rule adding condition before adding a new rule. ePFM (Maciel et al. 2017) calculates the Mahalanobis distances from the current input sample to the prototypes of existing rules. A new fuzzy rule is added to the system on the conditions that (1) the distance between the current input sample and the nearest prototype is greater than a threshold derived from a Chi-square distribution and (2) the nearest prototype has been surrounded by a sufficient number of data samples. The parsimonious learning machine (PALM) (Ferdaus et al. 2019) measures the input and output coherence utilizing the maximal information compression index (Mitra et al. 2002) and adds a new fuzzy rule if the similarities between the hyperplanes of learned fuzzy rules and the current input sample drop below the predefined threshold. Different from evolving fuzzy systems that only use the distance criterion for fuzzy rule generation, the correntropy-based evolving fuzzy neural system (CEFNS) (Bao et al. 2018) and its modified version called the recursive maximum correntropy-based evolving fuzzy system (RMCEFS) (Rong et al. 2019) also employ an additional error criterion. In CEFNS and RMCEFS, a new fuzzy rule will be added to the system knowledge base on condition that the Euclidean distance between the current input sample and the nearest focal point is greater than the distance threshold and the current output error does not exceed a predefined range.
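The basic Euclidean distance criterion, as described above for DENFIS, can be sketched as a single pass over a data stream. This is a deliberately reduced illustration: it only decides when a new focal point is created and omits the centre and radius updates that actual evolving clustering methods perform. The function names and the threshold value are assumptions.

```python
import math


def is_new_focal_point(x, centres, threshold):
    """True if x is farther than the threshold from every existing cluster centre."""
    if not centres:
        return True  # the first sample always starts the first cluster
    return min(math.dist(x, c) for c in centres) > threshold


def cluster_stream(stream, threshold):
    """Single-pass processing: each sample is seen once and may become a focal point."""
    centres = []
    for x in stream:
        if is_new_focal_point(x, centres, threshold):
            centres.append(list(x))
    return centres


stream = [[0.0, 0.0], [0.5, 0.0], [3.0, 3.0], [3.2, 2.9], [0.0, 4.0]]
centres = cluster_stream(stream, threshold=1.0)
```

The threshold directly controls the granularity of the rule base: a smaller value yields more clusters, and hence more fuzzy rules, which is why FLEXFIS normalizes it by the input dimensionality.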

3.1.1.3 Error criterion

The self-organizing fuzzy neural network (SOFNN) (Leng et al. 2004, 2005) uses the error criterion for adding fuzzy rules. The error criterion is derived from the geometric growing criterion (Kadirkamanathan and Niranjan 1993) and satisfies the ε-completeness of fuzzy rules (Lee 1990). Based on the error criterion, SOFNN will add a new fuzzy rule if the system output error exceeds a certain threshold given the current input sample. Both the evolving fuzzy neural network (EFuNN) (Kasabov 2001) and the self-constructing fuzzy neural network (SCFNN) (Lin et al. 2001) utilize the error and firing strength criteria together for adding fuzzy rules. EFuNN (Kasabov 2001) decides to add a new fuzzy rule if either of the following two conditions is satisfied: (1) the current output error is greater than the predefined error threshold; or (2) the firing strength of the current data sample is below the firing strength threshold. However, in SCFNN (Lin et al. 2001), a more conservative fuzzy rule adding strategy is considered, which requires the two conditions used by EFuNN to be satisfied at the same time. The same strategy as SCFNN is also used in the evolving fuzzy system with self-learning/adaptive thresholds (EFS-SLAT) (Ge and Zeng 2020). The incremental fuzzy c-regression clustering-based system (InFuR) (Blazic and Skrjanc 2020) adds a new fuzzy rule to the rule base if the differences between the individual fuzzy rules’ outputs with respect to the current input sample and the targeted system output all exceed the predetermined threshold. However, if the differences are all far greater than the threshold, InFuR will declare the current input sample an outlier and discard it instead. The statistically evolving fuzzy inference system (SEFIS) (Yang et al. 2022a) uses a mechanism similar to that of SOFNN (Leng et al. 2004, 2005) for recruiting new fuzzy rules. Once the prediction error on the current input sample exceeds a predetermined soft threshold, a new fuzzy rule is added to the rule base of SEFIS.
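The difference between the two ways of combining the error and firing strength criteria described above can be made explicit with a small sketch. The function names, arguments and threshold values are assumptions chosen for illustration; the actual systems compute the error and the firing strength in their own specific ways.

```python
def add_rule_if_either(error, max_firing, error_threshold, firing_threshold):
    """Permissive combination: one violated condition is enough (as described for EFuNN)."""
    return error > error_threshold or max_firing < firing_threshold


def add_rule_if_both(error, max_firing, error_threshold, firing_threshold):
    """Conservative combination: both conditions must hold (as described for SCFNN)."""
    return error > error_threshold and max_firing < firing_threshold


# A sample that is well covered by an existing rule but poorly predicted.
permissive = add_rule_if_either(0.4, 0.9, error_threshold=0.2, firing_threshold=0.3)
conservative = add_rule_if_both(0.4, 0.9, error_threshold=0.2, firing_threshold=0.3)
```

For such a sample the permissive strategy grows the rule base while the conservative one does not, leaving the error to be reduced by updating the consequent parameters of the existing rules.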

3.1.1.4 Firing strength criterion

The self-organizing fuzzy modified least-squares (SOFMLS) network proposed in Jesús Rubio (2009) is an early work that employs the firing strength criterion for new fuzzy rule identification. In SOFMLS, a new fuzzy rule is initialized by the current input sample if the normalized firing strengths produced by the fuzzy rules within the rule base are all smaller than a predefined threshold. Later works, such as generic self-evolving Takagi-Sugeno-Kang (GSETSK) fuzzy neural network (Nguyen et al. 2015), local error optimization approach for learning evolving fuzzy system (LEOA) (Ge and Zeng 2018a) and self-evolving fuzzy system (SEFS) (Ge and Zeng 2018b) also use the same firing strength criterion-based evolving scheme as SOFMLS for fuzzy rule generation. The evolving neuro-fuzzy model (ENFM) (Soleimani-B et al. 2010) adds a new rule if the maximum firing strength produced by the existing fuzzy rules on the current input sample is below the predefined threshold. In (Lughofer and Pratama 2018), an alternative rule adding mechanism combining the firing strength criterion and the degree of nonlinearity of the current model is introduced to GS-EFS. Spatio-temporal fuzzy inference system (SPATFIS) (Samanta et al. 2019) uses the maximum membership values assigned to input samples per attribute as the criterion for fuzzy rule generation. A new fuzzy rule will be added to the knowledge base if the maximum membership value assigned to one of the variables of the current input sample is smaller than the predefined threshold.

3.1.1.5 Statistical contribution criterion

Sequential adaptive fuzzy inference system (SAFIS) (Rong et al. 2006) was the first to propose the statistical contribution criterion for growing the fuzzy rule base. Combining it with the distance criterion, SAFIS adds a new fuzzy rule if the estimated statistical contribution of the new rule to the system outputs exceeds a predefined threshold and, at the same time, the Euclidean distance between the current input sample and the nearest focal point is greater than the distance threshold. Extended sequential adaptive fuzzy inference system (ESAFIS) (Rong et al. 2011) employs the same rule adding mechanism as SAFIS but with a simplified method to calculate the statistical contributions of fuzzy rules. In parsimonious network based on fuzzy inference system (PANFIS) (Pratama et al. 2014b) and generic evolving neuro-fuzzy inference system (GENEFIS) (Pratama et al. 2014a), the statistical contribution of each new input sample is calculated and a new fuzzy rule is initialized if the statistical contribution exceeds a predefined threshold.

A brief summary of the fuzzy rule generation criteria used by mainstream evolving fuzzy systems is given in Table 2.

Table 2 Summary of fuzzy rule generation criteria used by mainstream evolving fuzzy systems

3.1.2 Rule merging

Rule merging is a very useful component of the structure evolving scheme to resolve rule conflict and simplify the knowledge base. Typically, an evolving fuzzy system decides to merge two or multiple fuzzy rules together if they are highly similar to each other. In this subsection, the commonly used rule merging mechanisms are reviewed.

EFuNN (Kasabov 2001) will aggregate multiple fuzzy rules together into one if the radius of the cluster associated with the aggregated new fuzzy rule is less than a predefined maximum radius. SOFNN (Leng et al. 2004, 2005) is equipped with a very strict rule merging mechanism that only combines those fuzzy rules with the same premise part. ENFM (Soleimani-B et al. 2010) decides to merge two fuzzy rules if the Mahalanobis distance-based similarity between the two focal points is greater than the predefined threshold. In (Lughofer et al. 2011), a new rule merging mechanism is introduced to FLEXFIS (Lughofer 2008), enabling it to merge two fuzzy rules together if the overlapping index between their membership functions is greater than the predefined threshold. eMG (Lemos et al. 2011) combines two fuzzy rules into one if the compatibility between the prototypes of the two rules exceeds the predefined threshold. PANFIS (Pratama et al. 2014b) and GENEFIS (Pratama et al. 2014a) employ the same rule merging mechanism based on the similarity degree between the Gaussian membership functions of their fuzzy rules. The systems will merge two fuzzy rules into one if their similarity exceeds the predefined threshold. GS-EFS (Lughofer et al. 2015; Lughofer and Pratama 2018) uses a joint criterion to examine whether two fuzzy rules need to be merged together or not. In GS-EFS, two fuzzy rules will be merged together if their overlap degree calculated based on the Mahalanobis distance between the two focal points is greater than the predefined threshold or the angle between the hyperplanes defined by their consequent parameters is greater than 90°. EFuMo (Dovžan et al. 2015) merges two fuzzy rules if any one of the following three criteria is satisfied: (1) the normalized Mahalanobis distance between two focal points is smaller than the predefined threshold; (2) the correlation coefficient calculated from the membership degrees is greater than the predefined threshold; (3) the angle between the consequent parameters is below the predefined threshold. SEFS (Ge and Zeng 2018b) employs an L2 distance-based similarity measure calculated from membership functions as the merging criterion. If the similarity of two fuzzy rules exceeds the predefined threshold, SEFS will merge them together. The rule merging mechanism used by LEOA (Ge and Zeng 2018a) is triggered if the associated clusters of two fuzzy rules have a high overlapping level. The overlapping level is determined by the Euclidean distance between the two focal points and the areas of influence covered by the two clusters. EFS-SLAT (Ge and Zeng 2020) decides to merge two fuzzy rules if the firing strengths assigned by the two rules to each other’s focal points both exceed the predefined threshold. Since the fuzzy rules of PALM (Ferdaus et al. 2019) are based on hyperplanes, two fuzzy rules will be merged if the angle and spatial proximity between their hyperplanes both fall below the predefined threshold. In such a case, the fuzzy rule with the smaller support will be merged into the other one. SPATFIS (Samanta et al. 2019) keeps monitoring the similarity index between any two rules, which is calculated from the differences between their centres and spreads. Two rules will be found to be identical if the similarity index exceeds a threshold and will be merged together as one.
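As one concrete illustration of a similarity-based merging test, the mutual firing strength condition described above for EFS-SLAT can be sketched as follows. The spherical Gaussian form of the firing strength, the function names and the default threshold are assumptions made here for illustration; the original system may define these quantities differently.

```python
import numpy as np

def gaussian_firing(x, centre, width):
    """Firing strength of a spherical Gaussian rule at point x (assumed form)."""
    diff = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def should_merge(centre_a, width_a, centre_b, width_b, threshold=0.8):
    """Merge only if each rule fires strongly at the other rule's focal point."""
    a_on_b = gaussian_firing(centre_b, centre_a, width_a)
    b_on_a = gaussian_firing(centre_a, centre_b, width_b)
    return a_on_b > threshold and b_on_a > threshold
```

Requiring both firing strengths to exceed the threshold prevents a very wide rule from absorbing a narrow rule that merely lies inside its area of influence.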

Another form of rule merging is to replace an existing fuzzy rule with a newly initialized rule to avoid possible overlapping. In such cases, some of the parameters of the old rule will be inherited by the new rule. For example, the eTS model (Angelov and Filev 2004) will check the distance between the focal points of the new rule and the nearest old rule after a new fuzzy rule is initialized by the current input sample. If the distance is below a dynamic threshold calculated based on the potentials of the two focal points, the old rule will be replaced by the new one. eClass0 and eClass1 (Angelov and Zhou 2008) replace an existing fuzzy rule with the newly established one if the firing strength of the old fuzzy rule is greater than a hardcoded threshold determined by the so-called “one sigma” condition (Duda et al. 2000). Simpl_eTS (Angelov and Filev 2005) examines the Euclidean distance between the current new focal point and the nearest previously identified focal point. The old fuzzy rule will be replaced if the distance is below the threshold. Simpl_eTS+ (Angelov 2011) replaces the nearest old fuzzy rule with the newly identified fuzzy rule if its firing strength to the focal point of the new rule exceeds a predefined threshold. ALMMo (Angelov et al. 2018) uses a mechanism highly similar to those of Simpl_eTS (Angelov and Filev 2005) and Simpl_eTS+ (Angelov 2011) by checking the local density of the new focal point at the cluster associated with the nearest fuzzy rule. The local density of a data sample is calculated locally within each individual cluster, measuring the fitness of this sample to the local model represented by the cluster. If the local density produced by the cluster exceeds the hardcoded threshold derived from the Chebyshev inequality (Saw et al. 1984), the old fuzzy rule will be replaced by the newly initialized fuzzy rule.

Rule merging is an effective mechanism to keep the system more compact, interpretable and adaptable. However, it is not an essential component since many existing evolving fuzzy systems are not equipped with a rule merging mechanism, and this does not stop them from achieving high-level performance on various benchmark problems and real application scenarios. For example, among the evolving fuzzy models mentioned in Sect. 3.1.1, DENFIS (Kasabov and Song 2002), CEFNS (Bao et al. 2018), McFIS (Subramanian and Suresh 2012), RMCEFS (Rong et al. 2019), SAFIS (Rong et al. 2006), ESAFIS (Rong et al. 2011), SCFNN (Lin et al. 2001), GSETSK (Nguyen et al. 2015), ePFM (Maciel et al. 2017), SOFIS (Gu and Angelov 2018a), InFuR (Blazic and Skrjanc 2020), JECFS (Huang et al. 2021) and SEFIS (Yang et al. 2022a) are not equipped with such mechanism.

3.1.3 Rule pruning

Rule pruning is a component of the structure evolving scheme to remove stale fuzzy rules from the knowledge base. The stale fuzzy rules are no longer valid and contribute little to the system outputs because they fail to describe the current data patterns. Pruning such rules effectively keeps the knowledge base of the evolving fuzzy systems clear and compact. Different evolving fuzzy systems may employ different criteria to identify stale rules. Commonly used criteria include age, utility, population, rule importance, rule contribution, etc. (Ge and Zeng 2020). A brief summary of the rule pruning criteria used by mainstream evolving fuzzy systems is presented in Table 3.

Table 3 Summary of fuzzy rule pruning criteria used by mainstream evolving fuzzy systems

3.1.3.1 Age criterion

eClass0 (Angelov and Zhou 2008) and eClass1 (Angelov and Zhou 2008) employ the age criterion for rule pruning. The age of a fuzzy rule gives the accumulated information about the time instances at which input samples were assigned to this rule. Each time a new input sample is assigned to a rule, the age of that rule becomes smaller; otherwise, the rule grows older. eClass0 (Angelov and Zhou 2008) and eClass1 (Angelov and Zhou 2008) will remove a particular fuzzy rule from the knowledge base if its age is one standard deviation greater than the mean of the ages of all existing rules in the system.
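The "mean plus one standard deviation" pruning test can be sketched as follows. The sketch assumes that the age of a rule is the current time index minus the mean time index of the samples assigned to it, which is one common way of formalizing the description above and is stated here as an assumption; the function names are hypothetical and every rule is assumed to have received at least one sample.

```python
import numpy as np

def rule_ages(current_time, assignment_times):
    """Age of each rule: current time minus the mean arrival time of its samples."""
    return np.array([current_time - np.mean(times) for times in assignment_times])

def stale_rules_by_age(ages):
    """Indices of rules whose age exceeds the mean age by more than one std."""
    ages = np.asarray(ages, dtype=float)
    return np.flatnonzero(ages > ages.mean() + ages.std())
```

A rule that only received samples early in the stream accumulates a large age and is flagged, while rules that keep receiving recent samples stay young.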

3.1.3.2 Utility criterion

Simpl_eTS+ (Angelov 2011), ePFM (Maciel et al. 2017), ALMMo (Angelov et al. 2018) and EFS-SLAT (Ge and Zeng 2020) utilize the utility criterion for stale rule identification. The utility of a fuzzy rule is defined as the average of the normalized firing strengths it has produced. In (Angelov 2011; Maciel et al. 2017; Angelov et al. 2018; Ge and Zeng 2020), a fuzzy rule is identified as stale if its utility is below the pre-set tolerance. LEOA (Ge and Zeng 2018a) uses a variation of the utility criterion for rule pruning, which is defined as the sum of the firing strengths from the time instance at which the fuzzy rule was initialized to the current time instance. Similarly, a fuzzy rule will be recognized as a stale one and removed from the knowledge base if the sum value is below the predefined threshold.
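A utility computation of this kind can be sketched as follows. The sketch assumes that the average is taken over the lifetime of each rule, that is, from the step at which it was created onwards; this averaging window, the array layout and the function names are assumptions introduced for illustration.

```python
import numpy as np

def rule_utilities(lambda_history, birth_steps):
    """Average normalized firing strength of each rule over its lifetime.

    lambda_history : (K, L) array, row k holds the normalized firing
                     strengths of the L rules at step k
    birth_steps    : 0-based step index at which each rule was created
    """
    lambda_history = np.asarray(lambda_history, dtype=float)
    return np.array([lambda_history[birth:, i].mean()
                     for i, birth in enumerate(birth_steps)])

def stale_rules_by_utility(utilities, tolerance=0.1):
    """Indices of rules whose utility is below the tolerance."""
    return np.flatnonzero(np.asarray(utilities) < tolerance)
```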

3.1.3.3 Population criterion

The pruning mechanisms employed by EFuNN (Kasabov 2001), Simpl_eTS (Angelov and Filev 2005), EFuMo (Dovžan et al. 2015) and SOFMLS (Jesús Rubio 2009) are based on the population criterion. EFuNN (Kasabov 2001) removes a fuzzy rule from the knowledge base if this rule fails to receive enough data samples within a user-defined period of time after it was initialized. During operation, Simpl_eTS (Angelov and Filev 2005) continuously monitors the support (namely, population) of each fuzzy rule. A stale rule will be identified and removed from the knowledge base if its support is less than 1% of the overall number of observed samples. EFuMo (Dovžan et al. 2015) prunes a stale rule if the average number of data samples assigned to this rule per instance is lower than a predefined percentage of the mean value of the entire rule base or this rule fails to get a sufficient amount of data samples after a certain period of time. SOFMLS (Jesús Rubio 2009) examines the support of every fuzzy rule in the knowledge base after every predetermined number of instances and will remove a rule if its support is smaller than the predefined threshold.
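The 1% support rule described above for Simpl_eTS reduces to a one-line check, sketched below. The function name and the `min_fraction` argument are hypothetical; only the 1% default comes from the description in the text.

```python
def stale_rules_by_support(supports, total_samples, min_fraction=0.01):
    """Indices of rules whose support is below a fraction of all observed samples."""
    return [i for i, support in enumerate(supports)
            if support < min_fraction * total_samples]
```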

3.1.3.4 Rule importance criterion

SOFNN (Leng et al. 2004, 2005) prunes fuzzy rules based on their importance. It evaluates the importance of every fuzzy rule to system performance by removing it temporarily and calculating the change in the root mean squared error (RMSE) of its predictions. Based on this, SOFNN identifies the rules that only incur minor changes to its prediction accuracy. Such fuzzy rules will be pruned from the knowledge base if the prediction error of SOFNN is below the predefined value without them. SEFIS (Yang et al. 2022a) continuously monitors its prediction error on every input sample and prunes the fuzzy rule with the nearest prototype to the current input if the prediction error drops below the predetermined soft threshold, under the assumption that other fuzzy rules are sufficient for approximation.

3.1.3.5 Rule contribution criterion

The rule pruning mechanisms of SAFIS (Rong et al. 2006), ESAFIS (Rong et al. 2011), McFIS (Subramanian and Suresh 2012), PANFIS (Pratama et al. 2014b), GENEFIS (Pratama et al. 2014a) and SPATFIS (Samanta et al. 2019) are based on the rule contribution criterion. These evolving fuzzy systems use different approaches to evaluate the statistical contributions of the fuzzy rules to system output, but they follow the same procedure for rule pruning. During the learning process, the statistical contribution of every fuzzy rule is continuously monitored, and a rule will be pruned if its contribution falls below a predefined threshold.

A well-designed rule pruning mechanism can effectively enhance the capability of an evolving fuzzy system to self-adapt to the unfamiliar patterns of data streams. It plays a key role in rule base simplification, effectively improving the computation- and memory-efficiency of the system. However, as with rule merging, rule pruning is not an essential component for an evolving fuzzy system to perform well in real application scenarios. For example, the following mainstream evolving fuzzy systems are not equipped with a rule pruning mechanism: eTS (Angelov and Filev 2004), DENFIS (Kasabov and Song 2002), eMG (Lemos et al. 2011), FLEXFIS (Lughofer 2008), SOFIS (Gu and Angelov 2018a), GS-EFS (Lughofer et al. 2015; Lughofer and Pratama 2018), CEFNS (Bao et al. 2018), RMCEFS (Rong et al. 2019), SCFNN (Lin et al. 2001), GSETSK (Nguyen et al. 2015), PALM (Ferdaus et al. 2019), SEFS (Ge and Zeng 2018b), ENFM (Soleimani-B et al. 2010), InFuR (Blazic and Skrjanc 2020) and JECFS (Huang et al. 2021).

3.1.4 Rule splitting

Rule splitting is another component of the structure evolving scheme to assist evolving fuzzy systems to build finer system structure and achieve finer partition of the data space. However, compared with rule merging and rule pruning, rule splitting is a relatively new concept and is rarely used by the existing evolving fuzzy models.

Among the mainstream evolving fuzzy systems mentioned in Table 1, EFuMo (Dovžan et al. 2015) and GS-EFS (Lughofer et al. 2015; Lughofer and Pratama 2018) are the only two equipped with such mechanism. EFuMo (Dovžan et al. 2015) determines whether to split a fuzzy rule or not based on its support and mean relative error, which is calculated based on the weighted average prediction error of this local model over time. EFuMo (Dovžan et al. 2015) splits a fuzzy rule into two if its support is greater than a predefined number and its mean relative error is larger than the threshold. In (Lughofer et al. 2018), a similar but more advanced rule splitting mechanism is introduced to GS-EFS (Lughofer et al. 2015; Lughofer and Pratama 2018). Without using externally controlled thresholds as EFuMo (Dovžan et al. 2015) does, GS-EFS (Lughofer et al. 2018) decides to split the latest updated fuzzy rule if its support and weighted average prediction error both are one or two standard deviation(s) greater than the mean values of all existing fuzzy rules.
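The threshold-free splitting test described above for GS-EFS can be sketched as follows. The sketch only captures the "both statistics exceed the mean by a number of standard deviations" decision; how the rule is actually split is not covered, and the function name and `n_sigma` argument are assumptions introduced here.

```python
import numpy as np

def should_split(rule_index, supports, errors, n_sigma=1.0):
    """Split a rule if both its support and its error are unusually large.

    supports, errors : per-rule support and weighted average prediction error
    n_sigma          : number of standard deviations above the mean (1 or 2)
    """
    supports = np.asarray(supports, dtype=float)
    errors = np.asarray(errors, dtype=float)
    support_limit = supports.mean() + n_sigma * supports.std()
    error_limit = errors.mean() + n_sigma * errors.std()
    return bool(supports[rule_index] > support_limit
                and errors[rule_index] > error_limit)
```

Because both limits are computed from the current rule base, no externally controlled threshold is needed, in contrast to the fixed support and error thresholds of EFuMo.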

3.1.5 Input attribute reduction

Apart from the aforementioned mainstream evolving schemes (namely, generation, merging, pruning and splitting), there are also a few evolving fuzzy systems equipped with input attribute reduction schemes to remove redundant attributes, thereby improving the prediction performance and increasing the computational efficiency. The best-known examples that are equipped with input attribute reduction schemes include Simpl_eTS+ (Angelov 2011), GENEFIS (Pratama et al. 2014a), ALMMo (Angelov et al. 2018) and JECFS (Huang et al. 2021). The first three evolving fuzzy systems prune the input attributes based on their importance to the system outputs. However, Simpl_eTS+ (Angelov 2011) and ALMMo (Angelov et al. 2018) perform online input selection for each individual fuzzy rule separately, such that an input attribute previously removed for a particular rule will still be considered by other rules. In contrast, the online input selection of GENEFIS (Pratama et al. 2014a) is performed on the system-level. Once an input attribute is pruned, the dimensionality of the system inputs is reduced by 1. JECFS (Huang et al. 2021), on the other hand, attempts to learn a more compact, simpler fuzzy rule base from data streams by using very sparse random projection matrices (Li et al. 2006) to compress the dimensionality of the input data space and ease the effect of “the curse of dimensionality”.
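A very sparse random projection matrix in the sense of Li et al. (2006) can be generated as sketched below. The entries take the values +1, 0 and -1 with probabilities 1/(2s), 1 - 1/s and 1/(2s), scaled so that squared distances are preserved in expectation, with s set to the square root of the input dimensionality. How JECFS integrates such a matrix into its learning process is not reproduced here; the function name and arguments are hypothetical.

```python
import numpy as np

def very_sparse_projection_matrix(n_inputs, n_outputs, rng):
    """Random projection matrix with mostly zero entries (s = sqrt(n_inputs))."""
    s = np.sqrt(n_inputs)
    probabilities = [1.0 / (2.0 * s), 1.0 - 1.0 / s, 1.0 / (2.0 * s)]
    signs = rng.choice([1.0, 0.0, -1.0], size=(n_inputs, n_outputs),
                       p=probabilities)
    # sqrt(s) scales the entries; 1/sqrt(n_outputs) normalizes the projection.
    return np.sqrt(s / n_outputs) * signs

# Usage: compress 100-dimensional samples to 10 dimensions.
rng = np.random.default_rng(0)
projection = very_sparse_projection_matrix(100, 10, rng)
compressed = rng.normal(size=(5, 100)) @ projection
```

With 100 inputs, roughly 90% of the entries are zero, so the projection is cheap to apply to each incoming sample.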

3.2 Parameter learning schemes

The premise part in the form of focal points of the fuzzy rules within evolving fuzzy systems is typically learned during the structure evolving process through online clustering of the data streams. As the structure evolving schemes used by different evolving fuzzy systems vary in the ways that fuzzy rules are identified, merged, pruned and split, the premise parts learned from streaming data can also be different, leading to the highly diverse behaviours of evolving fuzzy systems. However, the vast majority of first-order evolving fuzzy systems in the literature, which are built upon first-order fuzzy rules, utilize recursive least squares (RLS)-based techniques for consequent parameter learning. The fuzzily weighted RLS (FWRLS) algorithm (Angelov and Filev 2004) is the most popular one and is used by a wide variety of first-order evolving models as the “gold standard” (Lughofer 2011b, 2021; Angelov and Gu 2018b; Škrjanc et al. 2019). Other consequent parameter updating schemes used by first-order evolving fuzzy systems include weighted RLS (Kasabov and Song 2002), extended Kalman filter (Rong et al. 2006), stable gradient descent (Jesús Rubio and Bouchachia 2017), adaptive maximum correntropy extended Kalman filter (Yang et al. 2022a), etc. In this section, the algorithmic procedure of FWRLS is presented to illustrate the key concept.

The consequent parameters can be learned with the FWRLS algorithm either globally or locally with the objective of minimizing the mean squared error of system outputs:

$${E}_{k}=\frac{1}{k}{\sum }_{i=1}^{k}{\Vert {y}_{i}-{\widehat{y}}_{i}\Vert }^{2}$$
(14)

where \({\widehat{y}}_{i}\) is the system output at the ith time instance; \({y}_{i}\) is the corresponding targeted value.

With the global learning approach, the consequent parameters of all first-order fuzzy rules within the knowledge base are updated together simultaneously using Eqs. (15) and (16) (Angelov and Filev 2004; Angelov and Gu 2018b):

$${{\varvec{\Theta}}}_{k}\leftarrow {{\varvec{\Theta}}}_{k-1}-\frac{{{\varvec{\Theta}}}_{k-1}{\overline{\mathbf{x}} }_{k}{\overline{\mathbf{x}} }_{k}^{T}{{\varvec{\Theta}}}_{k-1}}{1+{\overline{\mathbf{x}} }_{k}^{T}{{\varvec{\Theta}}}_{k-1}{\overline{\mathbf{x}} }_{k}}$$
(15)
$${\mathbf{a}}_{k}\leftarrow {\mathbf{a}}_{k-1}+{{\varvec{\Theta}}}_{k}{\overline{\mathbf{x}} }_{k}\left({y}_{k}-{\overline{\mathbf{x}} }_{k}^{T}{\mathbf{a}}_{k-1}\right)$$
(16)

where \({{\varvec{x}}}_{k}\) is the input sample at the kth time instance; \({y}_{k}\) is the corresponding targeted system output; \({\overline{\mathbf{x}} }_{k}\) is a \(\left(N+1\right)L\times 1\) dimensional vector, \({\overline{\mathbf{x}} }_{k}={\left[{\overline{\lambda }}_{k,1}{\overline{{\varvec{x}}} }_{k}^{T},{\overline{\lambda }}_{k,2}{\overline{{\varvec{x}}} }_{k}^{T},\dots ,{\overline{\lambda }}_{k,L}{\overline{{\varvec{x}}} }_{k}^{T}\right]}^{T}\); \({\overline{{\varvec{x}}} }_{k}={\left[1,{{\varvec{x}}}_{k}^{T}\right]}^{T}\); \({\mathbf{a}}_{k}={\left[{{\varvec{a}}}_{k,1}^{T},{{\varvec{a}}}_{k,2}^{T},\dots ,{{\varvec{a}}}_{k,L}^{T}\right]}^{T}\); \({{\varvec{a}}}_{k,i}\) is the consequent parameter vector of the ith rule \({\mathbf{R}}_{i}\) at the kth time instance; \({{\varvec{\Theta}}}_{k}\) is the \(\left(N+1\right)L\times \left(N+1\right)L\) dimensional covariance matrix calculated globally.
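The global update can be sketched in NumPy as follows for a scalar output. The covariance update is written with the outer product of the extended regressor, which is the form that is dimensionally consistent with a column vector \({\overline{\mathbf{x}} }_{k}\). The function names, the large initial covariance matrix and the way the normalized firing strengths are supplied are assumptions made for illustration.

```python
import numpy as np

def extended_regressor(x, lambdas):
    """Stack lambda_i * [1, x] for all L rules into one (N+1)L vector."""
    x_bar = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return np.concatenate([lam * x_bar for lam in lambdas])

def global_fwrls_step(theta, a, x, lambdas, y):
    """One global FWRLS update following Eqs. (15) and (16)."""
    z = extended_regressor(x, lambdas)
    theta_z = theta @ z
    # Eq. (15): rank-one downdate of the global covariance matrix.
    theta = theta - np.outer(theta_z, z @ theta) / (1.0 + z @ theta_z)
    # Eq. (16): correct all consequent parameters with the global error.
    a = a + (theta @ z) * (y - z @ a)
    return theta, a

# Usage: two rules, one input attribute, large initial covariance (assumed).
theta = 1e6 * np.eye(4)
a = np.zeros(4)
theta, a = global_fwrls_step(theta, a, x=[0.5], lambdas=[0.3, 0.7], y=2.0)
```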

With the local learning approach, the consequent parameters of each individual fuzzy rule are updated in parallel, independently using Eqs. (17) and (18) (Angelov and Filev 2004; Angelov and Gu 2018b):

$${{\varvec{\Theta}}}_{k,i}\leftarrow {{\varvec{\Theta}}}_{k-1,i}-\frac{{\overline{\lambda }}_{k,i}{{\varvec{\Theta}}}_{k-1,i}{\overline{{\varvec{x}}} }_{k}{\overline{{\varvec{x}}} }_{k}^{T}{{\varvec{\Theta}}}_{k-1,i}}{1+{\overline{\lambda }}_{k,i}{\overline{{\varvec{x}}} }_{k}^{T}{{\varvec{\Theta}}}_{k-1,i}{\overline{{\varvec{x}}} }_{k}}$$
(17)
$${{\varvec{a}}}_{k,i}\leftarrow {{\varvec{a}}}_{k-1,i}+{\overline{\lambda }}_{k,i}{{\varvec{\Theta}}}_{k,i}{\overline{{\varvec{x}}} }_{k}\left({y}_{k}-{\overline{{\varvec{x}}} }_{k}^{T}{{\varvec{a}}}_{k-1,i}\right)$$
(18)

where \({{\varvec{\Theta}}}_{k,i}\) is the \(\left(N+1\right)\times \left(N+1\right)\) dimensional covariance matrix associated with \({\mathbf{R}}_{i}\).
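The local update can be sketched in the same way, again for a scalar output and with the covariance update written in outer-product form. Each rule keeps its own small covariance matrix and parameter vector; the function name and the list-based storage are assumptions chosen for readability.

```python
import numpy as np

def local_fwrls_step(thetas, params, x, lambdas, y):
    """One local FWRLS update per rule, following Eqs. (17) and (18).

    thetas  : list of L covariance matrices, each (N+1) x (N+1)
    params  : list of L consequent parameter vectors, each of length N+1
    lambdas : normalized firing strengths of the L rules
    """
    x_bar = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    new_thetas, new_params = [], []
    for theta, a_i, lam in zip(thetas, params, lambdas):
        theta_x = theta @ x_bar
        # Eq. (17): firing-strength-weighted covariance update of rule i.
        theta = theta - lam * np.outer(theta_x, x_bar @ theta) / (
            1.0 + lam * (x_bar @ theta_x))
        # Eq. (18): rule i is corrected by its own local prediction error.
        a_i = a_i + lam * (theta @ x_bar) * (y - x_bar @ a_i)
        new_thetas.append(theta)
        new_params.append(a_i)
    return new_thetas, new_params
```

Since no quantity is shared between rules inside the loop, the per-rule updates can be executed independently, and adding or removing a rule leaves the matrices of the other rules untouched.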

In general, the local learning approach with the FWRLS algorithm is more computationally efficient than the global learning approach because it recursively updates \(L\) covariance matrices of size \(\left(N+1\right)\times \left(N+1\right)\), one per rule, instead of a single \(\left(N+1\right)L\times \left(N+1\right)L\) matrix. In addition, the consequent parameters learned by the local learning approach are less influenced by the structure evolution of the fuzzy model.

3.3 Performance comparison and analysis

In this subsection, numerical results on a number of popular benchmark datasets are presented for performance comparison between different evolving fuzzy systems. However, as the performances of evolving fuzzy systems are usually sensitive to parameter settings, the results are taken directly from the literature (Pratama et al. 2014b; Angelov et al. 2018; Ge and Zeng 2018a, b, 2020; Rong et al. 2018, 2019; Samanta et al. 2019; Gu et al. 2021a) for a fair comparison.

Mackey–Glass chaotic time series prediction problem (Mackey and Glass 1977) is used as the first numerical example, which is one of the most widely used benchmark datasets (Kasabov and Song 2002; Rong et al. 2006, 2011). This time series is created using the Mackey–Glass time-delay differential equation. Performances of different evolving fuzzy systems reported in the literature (Ge and Zeng 2018a, b; Rong et al. 2019; Samanta et al. 2019) are listed in Table 4 in terms of non-dimensional error index (\(NDEI\)), number of fuzzy rules (\(\#(Rule)\)) and training time consumptions (\({t}_{exe}\)).
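For reference, the non-dimensional error index is commonly computed as the root mean squared error divided by the standard deviation of the target series, so that a predictor that always outputs the target mean scores 1. The sketch below follows this common definition; whether each cited work uses the population or the sample standard deviation is not stated here, and the population form is assumed.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ndei(y_true, y_pred):
    """Non-dimensional error index: RMSE normalized by the target std."""
    return rmse(y_true, y_pred) / float(np.std(y_true))
```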

Table 4 Performance comparison between different evolving fuzzy systems on Mackey-Glass time series

The Delta Ailerons dataset from the KEEL-dataset repository (Alcalá-Fdez et al. 2011a; see Footnote 1) is employed as the second example, which is obtained from the task of controlling the ailerons of an F16 aircraft. Performances of different evolving fuzzy systems reported in the literature (Ge and Zeng 2020; Rong et al. 2018, 2019; Gu et al. 2021a) on the Delta Ailerons dataset are listed in Table 5 in terms of root mean square error (\(RMSE\)), \(\#(Rule)\) and \({t}_{exe}\).

Table 5 Performance comparison between different evolving fuzzy systems on Delta Ailerons

Then, the S&P 500 close price prediction problem is used for experimental comparison (Pratama et al. 2014b). The daily close prices of the S&P 500 are collected from the Yahoo! Finance website (see Footnote 2), ranging from 03.01.1950 to 12.03.2009 (approximately 60 years). Performance comparison between different evolving fuzzy systems on this problem is tabulated in Table 6 in terms of \(NDEI\) and \(\#(Rule)\). The reported results are obtained directly from (Pratama et al. 2014b; Angelov et al. 2018; Ge and Zeng 2020).

Table 6 Performance comparison between different evolving fuzzy systems on S&P 500 close price

It can be observed from Tables 4, 5 and 6 that the performances of evolving fuzzy systems vary considerably across problems. For example, eTS and ALMMo produced the most accurate predictions of the S&P 500 close price, but were outperformed by EFS-SLAT on the Mackey-Glass time series. In fact, the prediction performance of a particular evolving fuzzy system is determined by many different factors in relation to the system specification, such as fuzzy rule types, structural evolving schemes, parameter learning schemes, membership function types, etc. An EIS may also behave very differently on different problems depending on their nature and the experimental settings.

4 Evolutionary fuzzy systems

As mentioned in Sect. 2, Zadeh-Mamdani type fuzzy systems (Zadeh 1973; Mamdani and Assilian 1975) were introduced in the 1970s, and Takagi-Sugeno-Kang type fuzzy systems (Takagi and Sugeno 1985; Sugeno and Kang 1988) were introduced in the 1980s. The theoretical basis of evolutionary computation was established around the same period of time (Holland 1975), while the pioneering works on evolutionary fuzzy systems, as a hybridization between fuzzy systems and evolutionary computation, first appeared in the early 1990s (Karr 1991; Valenzuela-Rendon 1991; Thrift 1991; Pham and Karaboga 1991).

Nature-inspired evolutionary algorithms, e.g., genetic algorithms, genetic programming, particle swarm optimization, ant colony optimization, etc., are generally applied to global optimization. Evolutionary fuzzy systems are a class of fuzzy systems, which can be either Zadeh-Mamdani type or Takagi-Sugeno-Kang type, with the knowledge bases learned or tuned by evolutionary algorithms, as depicted in Fig. 3 (Herrera 2008; Fazzolari et al. 2013; Fernandez et al. 2019; Elhag et al. 2019). The design of a fuzzy system can be considered as a search task from available observations for the suitable solutions that can best approximate the problem based on the given performance metric (Fernandez et al. 2019). In evolutionary fuzzy systems, such a search task is performed by population-based evolutionary and bio-inspired algorithms, thanks to their strong ability to search for near-optimal solutions in a wide range of problem spaces. Hence, evolutionary fuzzy systems have demonstrated excellent performance in different application scenarios for handling classification and regression problems, and are currently a popular research area (Fazzolari et al. 2013; Elhag et al. 2019).

Fig. 3 Standard structure of evolutionary fuzzy systems (Herrera 2008; Elhag et al. 2019)

Evolutionary fuzzy systems employ evolutionary algorithms to optimize the model structure and parameters in terms of objective functions based on accuracy, interpretability or a combination of both (Fernandez et al. 2019). In general, evolutionary algorithms can be employed to learn the knowledge base of the fuzzy system (namely, evolutionary learning) or to tune a given knowledge base (namely, evolutionary tuning) (Herrera 2008). Multi-objective evolutionary algorithms can also be utilized to balance the prediction precision and interpretability of the fuzzy models (Fazzolari et al. 2013). According to the different elements of fuzzy models developed by evolutionary algorithms, a taxonomy of mainstream evolutionary fuzzy systems is given in Fig. 4. In the rest of this section, the representative approaches for developing evolutionary fuzzy systems are summarized in accordance with Fig. 4. For a more detailed literature review, interested readers are referred to the survey papers (Herrera 2008; Cordón 2011; Fazzolari et al. 2013; Fernández et al. 2015; Fernandez et al. 2019). Additional materials can be found on the website (see Footnote 3).

Fig. 4 Taxonomy of mainstream evolutionary fuzzy systems (Fazzolari et al. 2013; Fernández et al. 2015; Fernandez et al. 2019)

4.1 Evolutionary learning

The aim of evolutionary learning is to learn the knowledge bases of the fuzzy systems from data with the help of evolutionary algorithms. There are four approaches for evolutionary learning.

4.1.1 Evolutionary rule selection

The main purpose of this approach is to remove useless, redundant, erroneous and/or conflictive fuzzy rules from the original knowledge base of a candidate fuzzy system, resulting in a more compact, optimized knowledge base and greater prediction precision (Ishibuchi et al. 1995, 1997; Cordón and Herrera 2000). This is very similar to the rule pruning mechanism of evolving fuzzy systems, but is implemented by evolutionary algorithms.
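The idea of evolutionary rule selection can be sketched with a small genetic algorithm over binary masks, where each bit states whether the corresponding rule is kept. This is a generic sketch and not the method of any of the cited works: the function name, the truncation selection, the one-point crossover and the default settings are all assumptions, and the fitness function (lower is better) has to be supplied by the user. At least two rules and a population of at least four are assumed.

```python
import random

def evolve_rule_mask(n_rules, fitness, pop_size=20, generations=30,
                     mutation_rate=0.1, seed=0):
    """Search for a binary rule-selection mask that minimizes `fitness`."""
    rng = random.Random(seed)
    # Start from the full rule base plus random subsets of it.
    population = [[1] * n_rules] + [
        [rng.randint(0, 1) for _ in range(n_rules)]
        for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_rules)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return min(population, key=fitness)
```

In practice, the fitness would combine the prediction error of the fuzzy system restricted to the selected rules with a penalty on the number of rules. Because the best half of the population always survives, the returned mask is never worse than the full rule base it started from.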

4.1.2 Evolutionary data base learning

This approach aims to learn the database of the knowledge base, which includes membership functions, fuzzy partition granularity and scaling functions. This can be achieved through two different methods. One is to use a measure to directly evaluate the quality of the generated fuzzy parameters. The other is to evaluate the quality of the entire knowledge base derived from the generated fuzzy parameters instead. Examples of evolutionary database learning can be found in (Park et al. 1994; Cordón et al. 2001b, c), where the second method was implemented.

4.1.3 Evolutionary rule learning

This approach is designed to learn fuzzy rules by utilizing evolutionary algorithms with a predefined database. The earliest example of this approach was presented in Thrift (1991). In (Ishibuchi et al. 1999), individual fuzzy rules are generated by genetic algorithms for classification tasks, where the linguistic terms and membership functions are fixed a priori. The work presented in Rodríguez-Fdez et al. (2016a) utilizes an ad hoc method to estimate the fuzzy partition granularity from data and then uses an evolutionary algorithm to learn the fuzzy rules. This work is further extended in Rodríguez-Fdez et al. (2016b) for big data.

4.1.4 Simultaneous evolutionary learning of knowledge base elements

This approach learns both the fuzzy rules and the data base together from data using evolutionary algorithms (Homaifar and McCormick 1995; Shi et al. 1999). However, this means that the search space will be much larger, and convergence will be slower. A hybrid approach combining the evolutionary learning of the adaptive inference engine and knowledge base elements is proposed in Márquez et al. (2007).

4.2 Evolutionary tuning

Evolutionary tuning aims to improve the performances of existing fuzzy systems by adjusting the parameters of the knowledge bases or inference engines using evolutionary algorithms.

4.2.1 Evolutionary knowledge base tuning

This approach optimizes the knowledge base of a learned fuzzy system by adjusting its membership function parameters through an evolutionary tuning process (Angelov and Guthke 1997; Casillas et al. 2005). Some works (Angelov 1999, 2000; Alcalá et al. 2007a; Alcalá-Fdez et al. 2011b) combine knowledge base tuning with rule selection to achieve higher performance. In the works presented in (Gacto et al. 2009; Alcalá et al. 2007b; Pulkkinen and Koivisto 2008), multi-objective evolutionary algorithms are employed to perform membership function parameter tuning and rule selection simultaneously in order to achieve a trade-off between accuracy and complexity.

4.2.2 Evolutionary adaptive inference engine

The main aim of this approach is to improve the system performance by adjusting the fuzzy inferencing scheme. An evolutionary adaptive inference engine can be implemented in two different ways. One way is to introduce adaptive parameters, which are tuned by evolutionary algorithms, to the inference system such that higher cooperation among the fuzzy rules can be achieved (Alcalá-Fdez et al. 2007). The other way is to introduce adjustable weights to the defuzzifier and use evolutionary algorithms to tune these weights (Kim et al. 2002).

4.3 Objective trade-off

Although the primary goal of introducing evolutionary algorithms to the design of a fuzzy system is to improve its prediction precision, it is also possible to consider different criteria such as interpretability, stability, robustness and computational efficiency during the system design process by utilizing multi-objective evolutionary algorithms.

4.3.1 Performance versus interpretability

As mentioned above, prediction precision/accuracy is the key criterion considered during the system design. Indeed, the similarity between the responses of the real system and the fuzzy model should be as high as possible in real application scenarios (Gacto et al. 2011). The interpretability of the fuzzy model is also very important considering the current trend towards explainable AI. However, interpretability is a more subjective criterion and there is no standard approach to quantify it. Several system complexity-based factors are considered to be relevant to model interpretability (Gacto et al. 2011; Rudzi 2016), for example, the number of input variables, the number of fuzzy rules, the number of membership functions, etc. In addition, there are semantic-based factors to be considered, such as incoherence, distinguishability and rule relevance (Rey et al. 2017). By balancing prediction accuracy and model interpretability, a multi-objective evolutionary fuzzy system can achieve adequately high prediction performance with a clear and compact knowledge base. Examples of such approaches can be found in (Alcalá et al. 2007b; Gacto et al. 2009; Fazzolari et al. 2014; Rudzi 2016; Rey et al. 2017; Gorzałczany and Rudziński 2017).

4.3.2 Performance versus performance

During the design of industrial control systems, a wider variety of criteria beyond prediction precision and model interpretability often need to be considered, which may include robustness, time efficiency and stability (Elhag et al. 2019). To achieve the trade-off between different criteria, multi-objective evolutionary algorithms are usually employed in the system design. One can use multi-objective evolutionary algorithms to learn the knowledge base and/or its components from data to construct the fuzzy system, but a more commonly used approach is to tune the structure and parameters of an existing fuzzy model (Fernández et al. 2015; Fernandez et al. 2019).

4.4 Performance demonstration and analysis

In this subsection, numerical results obtained by ALMMo and particle swarm optimized ALMMo (PSO-ALMMo) on the three benchmark datasets used before are presented as an example. PSO-ALMMo is the evolutionary version of ALMMo. It utilizes a particle swarm optimization algorithm (Eberhart and Kennedy 1995) to simultaneously optimize, based on historical observations, the premise and consequent parameters learned by ALMMo from streaming data, i.e., it performs evolutionary tuning. However, in order to achieve this, PSO-ALMMo requires historical data to be stored in system memory, and its optimization process has to be conducted offline. Numerical results of ALMMo and PSO-ALMMo obtained from Gu et al. (2021a) are tabulated in Table 7 for performance comparison. Note that ALMMo in Gu et al. (2021a) utilizes a Gaussian-type membership function instead of the Cauchy membership function used in the original version (Angelov et al. 2018), causing differences in terms of prediction precision, system complexity and computational efficiency.

Table 7 Performance comparison between ALMMo and PSO-ALMMo

Table 7 shows that PSO-ALMMo is able to produce more accurate predictions than ALMMo thanks to the iterative parameter optimization process. However, as the parameter optimization process can take a great amount of time before converging to a locally optimal solution, depending on the complexity and dimensionality of the search space, this process has to be performed offline. As a result, PSO-ALMMo is limited to offline application scenarios despite its greater precision.
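To make the idea of evolutionary tuning on stored data more tangible, the following is a minimal sketch of a generic global-best particle swarm optimizer. It is not the PSO-ALMMo implementation: the inertia weight, acceleration coefficients, swarm size, the function names and the toy linear model tuned here are all assumptions chosen for illustration. The sketch does show why such tuning is offline, since the loss is re-evaluated over the whole stored history at every iteration.

```python
import random


def pso_minimise(loss, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic global-best PSO; returns (best_position, best_loss)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [loss(p) for p in pos]           # personal best losses
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical stored history: targets generated by y = 2*x + 1.
history = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(20)]


def mse(params):
    """Mean squared error of the linear model y = a*x + b on the history."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in history) / len(history)


initial = [0.0, 0.0]                 # parameters before tuning
tuned, tuned_loss = pso_minimise(mse, dim=2)
```

In an evolutionary fuzzy system the parameter vector would hold premise and consequent parameters instead of the two coefficients used here, and the loss would be the prediction error of the fuzzy model on the stored observations.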

5 Reinforcement learning-based fuzzy systems

Reinforcement learning is a machine learning paradigm for solving decision-making problems in Markovian processes (Sutton and Barto 2018). Reinforcement learning problems are essentially closed-loop problems in the sense that the actions of the learning system (also called agent) influence its later inputs. In a typical reinforcement learning framework, an agent learns to achieve a goal by interacting with the environment, which is defined in the form of a Markov decision process. The agent gets either rewards or penalties for the actions it performs, and its main goal is to maximize the long-term reward (Huang et al. 2020).

Reinforcement learning differs from supervised learning as it does not require input–output training data to be explicitly presented during the training process. Instead, the learning system is expected to figure out the solution yielding the maximum return in the form of rewards as the result of its actions. To begin with, the agent only needs to be equipped with prior knowledge of possible actions and reward policy. Thanks to the capability of discovering optimal solutions through interactions with the environment, reinforcement learning has been extensively researched in recent years and widely used in domains such as gameplay (Silver et al. 2016) and robotics (Gu et al. 2017).
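The interaction loop described above can be sketched with tabular Q-learning on a toy problem. Tabular Q-learning is used here only as a familiar, minimal instance of reinforcement learning; it is not one of the fuzzy methods reviewed in this section. The five-state chain environment, the reward of 1 at the goal, and the learning rate, discount factor and exploration rate are all assumptions made for illustration.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning settings


def step(state, action):
    """Toy environment: reward 1 on reaching the goal, 0 otherwise."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]

for _ in range(500):                        # training episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy choice; ties are broken at random.
        if rng.random() < EPSILON or q[state][0] == q[state][1]:
            a = rng.randrange(2)
        else:
            a = 0 if q[state][0] > q[state][1] else 1
        nxt, reward, done = step(state, ACTIONS[a])
        # The agent never sees labelled input-output pairs, only rewards.
        target = reward if done else reward + GAMMA * max(q[nxt])
        q[state][a] += ALPHA * (target - q[state][a])
        state = nxt

# Greedy action in each non-terminal state after learning.
greedy_policy = [ACTIONS[0 if q[s][0] > q[s][1] else 1]
                 for s in range(N_STATES - 1)]
```

The agent is given only the set of actions and the reward signal, and the action values it accumulates lead it to move towards the goal from every state.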

Reinforcement learning is closely associated with adaptive control and optimal control (Lewis and Vrabie 2012). Pioneering works utilizing reinforcement learning in fuzzy control system design appeared in the early 1990s, including the approximate reasoning-based intelligent control (ARIC) system (Berenji 1992), the generalized approximate reasoning-based intelligent control (GARIC) system (Berenji and Khedkar 1992), the reinforcement neural-network-based fuzzy logic control system (RNN-FLCS) (Lin and George Lee 1994) and the reinforcement fuzzy adaptive learning control network (RFALCON) (Lin and Lin 1996). These control systems utilize two separate sub-models to perform Actor–Critic learning (Konda and Tsitsiklis 2000) and, thus, their system structures are complex. The system structures of ARIC and GARIC are both fixed, and they use reinforcement learning techniques to tune the parameters only. In contrast, RNN-FLCS and RFALCON are capable of self-developing the system structure and parameters, but they still require experts to determine the number and type of membership functions for each input and output variable. In (Jouffe 1998), two fuzzy reinforcement learning methods, fuzzy actor-critic learning (FACL) and fuzzy Q-learning (FQL), are proposed for tuning the consequent parameters of fuzzy controllers. A fuzzy Actor–Critic reinforcement learning network (FACRLN) is proposed in Wang et al. (2007). FACRLN uses a single fuzzy radial basis function neural network to approximate both the action value function of the Actor and the state value function of the Critic simultaneously; thus, the system complexity is largely reduced. In (Juang and Lu 2009), an ant colony optimization-based algorithm combined with FQL is proposed for determining the consequent parameters of fuzzy inference systems. By treating the combined return value of a series of actions as the fitness value to be maximized, a particle swarm reinforcement learning method is presented in Hein et al. (2017) to learn the best policy represented by fuzzy rules. Since the majority of existing fuzzy reinforcement learning methods are implemented on the basis of (fuzzy) neural networks with very limited interpretability, an interpretable reinforcement learning scheme is proposed in Huang et al. (2020), where the learned policy can be expressed as human-intelligible IF-THEN rules and the value function is approximated through the AnYa type fuzzy rule-based system.

6 Comparison between evolving, evolutionary and reinforcement learning-based fuzzy systems

Evolutionary fuzzy systems and evolving fuzzy systems are closely related. In some works, EFSs is short for evolutionary fuzzy systems (Fazzolari et al. 2013; Fernández et al. 2015; Fernandez et al. 2019), and in many other works, EFSs is the abbreviation of evolving fuzzy systems (Škrjanc et al. 2019; Campos Souza 2020; Leite et al. 2020). A few early works on evolutionary fuzzy systems characterize themselves as evolving (Angelov 1999, 2000). Despite the similarity, there are some key differences between the two classes of fuzzy systems.

Evolutionary algorithms play an instrumental role in the design of evolutionary fuzzy systems. The evolutionary learning process is performed by operators that mimic natural evolutionary phenomena such as chromosome crossover, mutation, selection and reproduction, parents and offspring. Since evolutionary algorithms search for the optimal solutions in the problem spaces through an iterative process, evolutionary fuzzy systems can usually achieve strong, robust performance in problems of a complex nature. Nevertheless, the training processes of evolutionary fuzzy systems are limited to offline scenarios due to the requirement of iterative computation. In order to obtain the (nearly) optimal solution, evolutionary fuzzy systems require all the training data to be presented, and it can take a much longer time for evolutionary algorithms to converge depending on the nature of the problem and the scale of training data.

In contrast, evolving fuzzy systems are designed to gradually self-adapt the system structure and parameters to follow the concept shifts and drifts of the underlying patterns of the data streams. The vast majority of evolving fuzzy systems learn from streaming data in a single-pass, non-iterative manner, and they stress the ability to react rapidly to the dynamically changing data patterns in nonstationary environments. This emphasis is appropriate considering the targeted application scenarios and is of great importance to the success of evolving fuzzy systems (Gu et al. 2021a). However, such learning behaviours can sometimes lead to poor global prediction accuracy and the so-called “unlearning effect” (Ge and Zeng 2018a) as the models tend to fit the more recently arrived data well but fit historical data poorly.

On the other hand, reinforcement learning is generally less effective than supervised learning if sufficient training data exist. It assumes that the world is Markovian and requires prior knowledge about the environment and possible actions of the agent. Huge amounts of time and computational resources are usually needed before the reinforcement learning system finds the optimal solution. In addition, it can sometimes be very challenging to define the reward policies for some real-world problems, such as autonomous driving (Kiran et al. 2021). Thus, to date, reinforcement learning-based fuzzy systems are mostly implemented for control problems in the robotics and industrial automation domains where prior knowledge is sufficient to quantify the states of environments and determine the possible actions of agents (Yung and Ye 1999; Lin and Jou 2000; Fathinezhad et al. 2016).

Table 8 presents a brief comparison between the three classes of fuzzy systems discussed from the following six different aspects: (1) learning mode (online/offline); (2) requirement for supervision (input–output pairs); (3) structure and parameter learning (self-developing or pre-fixed by user); (4) optimization of the structure and/or parameters during training (yes/no); (5) computational complexity (high/low); and (6) need for prior knowledge (weak to strong).

Table 8 Comparison between evolving, evolutionary and reinforcement learning-based fuzzy systems

7 Applications of fuzzy systems

Examples of real-world applications of the three classes of fuzzy systems are listed in Table 9, which cover a wide range of areas including agriculture, communication, computing, healthcare, finance, remote sensing, etc. This demonstrates the flexibility and wide applicability of evolving, evolutionary and reinforcement learning-based fuzzy systems as powerful tools for handling real-world problems of different natures. Due to the very large number of research publications containing applications of fuzzy systems, which is itself empirical evidence of the importance of these methods, it is impossible to cover all of them in this overview paper. More detailed reviews of recent applications of these methods to real-world problems can be found in (Škrjanc et al. 2019; Campos Souza 2020; Leite et al. 2020; Fernández et al. 2015; Fernandez et al. 2019).

Table 9 Real-world applications of evolving, evolutionary and reinforcement learning-based fuzzy systems

8 Stock price prediction: a case study

To demonstrate the utility of fuzzy rule-based systems in real-world application scenarios, a case study of predicting the stock price of Walmart Inc. (WMT) is presented. The daily high, low, open and close prices of WMT are acquired from the Yahoo! Finance website, ranging from 01.01.2000 to 01.01.2021 (20 years), 5284 samples in total. In this example, the ALMMo system (Angelov et al. 2018) is employed for predicting the close price one day ahead based on the current four prices, namely,

$${\widehat{x}}_{k+1,close}=f\left({\left[{x}_{k,high},{x}_{k,low},{x}_{k,open},{x}_{k,close}\right]}^{T}\right)$$
(19)

The prediction results, stepwise prediction error (\({e}_{k}={x}_{k+1,close}-{\widehat{x}}_{k+1,close}\)), average squared prediction error over time (\({E}_{k}=\frac{1}{k}\sum_{i=1}^{k}{e}_{i}^{2}\)) and #(Rules) over the online learning process are shown in Fig. 5a–d, respectively.
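The two error measures defined above can be computed in a single pass over the data, which is how they would naturally be tracked during online learning. The sketch below is only illustrative: the function names and the prices are made up, and a simple persistence predictor (tomorrow's close equals today's close) stands in for the function \(f\) of Eq. (19), since ALMMo itself is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Prices = Tuple[float, float, float, float]   # (high, low, open, close)


def online_errors(days: Sequence[Prices],
                  f: Callable[[Prices], float]) -> Tuple[List[float], List[float]]:
    """Return the stepwise errors e_k and the running averages E_k."""
    e, E = [], []
    mean_sq = 0.0
    for k in range(len(days) - 1):
        prediction = f(days[k])              # estimate of the next day's close
        e_k = days[k + 1][3] - prediction    # e_k = x_{k+1,close} - prediction
        n = len(e) + 1
        mean_sq += (e_k ** 2 - mean_sq) / n  # recursive form of (1/k) * sum e_i^2
        e.append(e_k)
        E.append(mean_sq)
    return e, E


def persistence(day: Prices) -> float:
    """Stand-in predictor (not ALMMo): tomorrow's close equals today's close."""
    return day[3]


# Made-up prices, for illustration only.
days = [(10.5, 9.5, 10.0, 10.0), (11.5, 10.0, 10.0, 11.0),
        (11.5, 10.5, 11.0, 11.0), (13.5, 11.0, 11.0, 13.0)]
e, E = online_errors(days, persistence)
```

The recursive update of the running average avoids storing past errors, so the memory cost stays constant however long the data stream is.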

Fig. 5 Prediction result on WMT stock price

Figure 5 shows that ALMMo is able to capture the dynamically changing patterns of the stock price and accurately model this nonstationary data stream (Fig. 5a). Although the uncertain nature of the stock market makes it impossible to correctly predict the stock price at all times (see Fig. 5b), it can be observed from Fig. 5c that the average squared prediction error of ALMMo quickly converges to a very small value close to zero and becomes stable afterwards. This demonstrates that the prediction performance of ALMMo is stable over time thanks to its capability of self-evolving its system structure and parameters in real time to follow the concept drifts and/or shifts in the data streams (Fig. 5d).

To illustrate the interpretability of fuzzy systems, the complete fuzzy rule base of the ALMMo system at the final time instance is tabulated in Table 10. The 10 first-order fuzzy rules listed in this table are the “core” of ALMMo for fuzzy inferencing. Note that all the parameters are self-learned from data directly.

Table 10 First-order fuzzy rules learned from historical WMT stock prices for prediction
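To make the structure of such first-order rules concrete, the following sketch shows how a small rule base of this kind could produce a prediction. It assumes that each rule has a prototype (its premise), a Gaussian-type firing strength with a single width parameter, and a linear consequent, and that the output is the firing-strength-weighted average of the local linear outputs. The function name, the dictionary layout and all parameter values are hypothetical; they are not the rules of Table 10, and the exact firing-strength formula used by ALMMo may differ.

```python
import math


def predict(x, rules):
    """Firing-strength-weighted average of the rules' linear consequents."""
    strengths = []
    for rule in rules:
        dist_sq = sum((xi - pi) ** 2 for xi, pi in zip(x, rule["prototype"]))
        # Assumed Gaussian-type firing strength around the rule prototype.
        strengths.append(math.exp(-dist_sq / (2.0 * rule["width"] ** 2)))
    total = sum(strengths)
    output = 0.0
    for rule, s in zip(rules, strengths):
        bias, *weights = rule["consequent"]
        local = bias + sum(w * xi for w, xi in zip(weights, x))
        output += (s / total) * local        # normalised contribution
    return output


# Two hypothetical rules over the inputs (high, low, open, close).
rules = [
    {"prototype": [50.0, 49.0, 49.5, 49.5], "width": 5.0,
     "consequent": [0.0, 0.0, 0.0, 0.0, 1.0]},   # y = close
    {"prototype": [140.0, 138.0, 139.0, 139.0], "width": 5.0,
     "consequent": [1.0, 0.0, 0.0, 0.0, 1.0]},   # y = close + 1
]
x = [50.0, 49.0, 49.5, 49.5]
y_hat = predict(x, rules)
```

Because the input coincides with the first prototype and lies far from the second, the first rule dominates and the prediction is essentially that rule's local output. A user can therefore trace any prediction back to the few rules that fired most strongly, which is the sense in which the rule base is interpretable.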

The predictive performance of ALMMo is then tested on out-of-sample data collected within the period of 01.01.2021 to 01.01.2022 (1 year), 252 samples in total. In this experiment, the ALMMo system trained on historical data (the previous 20 years) is used for predicting the one-day ahead close price of the WMT stock based on the daily four prices in 2021. However, unlike in the previous example, the structure and parameters of ALMMo are not updated during out-of-sample testing. The overall prediction error of ALMMo on WMT out-of-sample data in terms of \(NDEI\) is reported in Table 11, and is further compared against the following four approaches:

  (1) PSO-ALMMo (the evolutionary version of ALMMo) (Gu et al. 2021a);

  (2) Support vector machine regressor (SVM) (Cristianini and Shawe-Taylor 2000);

  (3) Random forest regressor (RF) (Breiman 2001); and

  (4) Multi-layer perceptron regressor (MLP) (Hastie et al. 2009).

Table 11 Performance comparison on stock price prediction

In this example, PSO-ALMMo follows the same experimental setting as (Gu et al. 2021a); SVM uses the Gaussian kernel and the kernel scale is selected using a heuristic procedure; RF is composed of 40 decision trees and the maximum number of splits of each tree is \(K-1\) (\(K\) is the number of training samples); MLP has two hidden layers with 40 neurons in each and is trained using the resilient backpropagation algorithm. The prediction errors of PSO-ALMMo, SVM, RF and MLP are tabulated in Table 11 for comparison. The prediction results by ALMMo, PSO-ALMMo, SVM, RF and MLP are depicted in Fig. 6a.
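For readers who wish to reproduce the error measure used in this comparison, a minimal sketch of \(NDEI\) is given below. It assumes the usual definition of the non-dimensional error index, namely the root mean squared error divided by the standard deviation of the target values; the use of the population (rather than sample) standard deviation and the example values are assumptions.

```python
import math


def ndei(actual, predicted):
    """RMSE divided by the (population) standard deviation of the targets."""
    n = len(actual)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)
    mean = sum(actual) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in actual) / n)
    return rmse / std


actual = [1.0, 2.0, 3.0, 4.0]            # made-up target values
perfect = ndei(actual, actual)           # a perfect predictor
mean_only = ndei(actual, [2.5] * 4)      # always predicting the target mean
```

Under this definition a perfect predictor scores 0 and a predictor that always outputs the mean of the targets scores 1, so values well below 1 indicate that a model explains most of the variation in the target.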

Fig. 6 Prediction result comparison on four different stocks

In addition, three further stocks, namely, Apple Inc (AAPL), Coca-Cola Consolidated Inc (COKE) and Microsoft Corporation (MSFT), are considered in this example, and the same experiments are repeated to predict the one-day ahead close prices of the three stocks under the same protocol. The prediction performances of the five approaches are also reported in Table 11.

It can be seen from Table 11 that the prediction performances of ALMMo and PSO-ALMMo are better than those of the other three mainstream regressors, namely, SVM, RF and MLP. One can see clearly from Fig. 6 that the performances of SVM, RF and MLP deteriorate significantly when the stock prices exceed the value ranges of the historical data used for training (see Fig. 6b–d), whilst both ALMMo and PSO-ALMMo are able to provide accurate predictions consistently. For example, both SVM and RF failed to make reasonable predictions on MSFT after early 2021 when its stock price exceeded the historical highest point in the period of 2001–2020. The gap between the predictions made by MLP and the actual prices became increasingly large as the stock price of MSFT kept rising. This performance comparison demonstrates the very strong capability of fuzzy systems in handling real-world uncertainties.

Another intrinsic advantage of fuzzy systems over alternative “black box” models, e.g., DNNs and SVMs, is their greater interpretability and transparency. The human-understandable, meaningful rule-based structure and parameters (see Table 10 for example) allow users to interpret the internal reasoning process and make sense of the predictions made by the model. This, in turn, makes the fuzzy rule-based models trustworthy for real-world applications.

9 Challenges and directions for further research and development

This paper has recalled the basic concepts of fuzzy sets and fuzzy systems and, in particular, has introduced the main ideas of evolving, evolutionary and reinforcement learning-based fuzzy systems. It has provided an overview of the mainstream methods with an emphasis on their structure and parameter learning schemes. Then, a critical comparison between the three classes of fuzzy systems from the learning perspective has been presented, enabling a better understanding of the pros and cons of the respective learning strategies. A high-level summary of real-world applications of fuzzy systems in different areas has also been provided.

To summarize, fuzzy systems have been widely recognized as a powerful tool for handling real-world problems with uncertainties by offering outstanding performance and high explainability. This has been an actively researched field in recent decades, and a wide variety of methods have been proposed to automate the design process of fuzzy systems. Many efforts have been devoted to further improving their learning performance, computational efficiency and model interpretability. Nevertheless, there remain a few open issues to be addressed as on-going and future work, including the following:

Explainability and transparency have increasingly become pressing issues in AI and machine learning due to the wider use of AI models in dealing with high-stakes and complex prediction applications in domains such as healthcare, finance and law (Rudin 2019; Angelov et al. 2021). Although the state-of-the-art DNNs can offer greater performance on many challenging real-world problems (involving visual and speech information, in particular), such models are characterized as “black boxes” and can easily fool users (and even the system developers themselves). The lack of explainability and transparency can lead to severe or even fatal consequences (Rudin 2019). Although more complex models are not necessarily more accurate, it is generally recognised in practice that the simpler and more transparent fuzzy systems are incapable of handling high-dimensional, large-scale complex problems (Barredo Arrieta et al. 2020). To achieve greater performance in such application scenarios, the representation learning ability of fuzzy systems needs to be improved by redesigning the model structure, learning mechanism, or a mixture of both, and a trade-off between explainability/transparency and accuracy has to be made (Moral et al. 2021).

Missing data is commonly seen in many real-world applications and can have a significantly adverse impact upon the conclusions drawn from data with missing values. Missing data may occur due to many different factors such as incomplete observations, transfer problems, memory loss, record damages, sensor failures, nonresponses, and so on (Škrjanc et al. 2019; Leite et al. 2020). Although there are useful techniques for dealing with missing data, e.g., imputation (Musil et al. 2002), interpolation (Kokaram et al. 1995) and reverse engineering (Li et al. 2019), unsolved challenges remain, particularly in streaming data processing and multi-modal data modelling.

Class imbalance often occurs in real-world applications where the minorities are of greater interest, for example, in addressing problems such as financial fraud detection, network security, medical diagnosis and mechanical fault detection (Gu et al. 2020b; Naik et al. 2020). Conventional classifiers learned from imbalanced data tend to ignore the minorities because they are outnumbered and play a much weaker role in the overall performance evaluation. Although a number of successful methods (including a few evolutionary fuzzy systems (Sanz et al. 2015)) have been proposed for tackling this issue (López et al. 2013), learning an effective fuzzy rule-based model from imbalanced data streams is still a highly challenging task, especially when dealing with time-dependent or dynamic phenomena (Naik et al. 2018).
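The reason why minorities play a much weaker role in the overall performance evaluation can be seen from a small numerical sketch. It compares plain accuracy with balanced accuracy (the mean of the per-class recalls) for a classifier that always predicts the majority class. The labels are made up, and balanced accuracy is used here only as one common imbalance-aware measure, not as a measure taken from the cited works.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


y_true = [0] * 95 + [1] * 5      # made-up data with a 5% minority class
y_pred = [0] * 100               # classifier that ignores the minority
acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

Here the plain accuracy is 0.95 although no minority sample is detected, whereas the balanced accuracy is only 0.5, which makes the failure on the class of interest visible.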

Curse of dimensionality refers to the various issues arising in high-dimensional problems. Fuzzy systems are known to be less capable of handling high-dimensional data as the systems can build a huge rule base from data, which usually causes significant overfitting effects and turns the systems into “black boxes” (Hagras 2018; Škrjanc et al. 2019). There exist a number of semantics-preserving dimensionality reduction or feature selection techniques, including those that are themselves based on fuzzy set theory, thereby easing the integration of such tools with the core fuzzy system (Jensen and Shen 2009). Evolutionary fuzzy systems are also capable of balancing the system complexity and performance by making a trade-off between different criteria using multi-objective evolutionary algorithms. In addition, learning efficiency may be gained through transformation of crisp rules while retaining model transparency (Chen et al. 2018). However, dimensionality reduction for high-dimensional data streams in online application scenarios is fundamentally far more sophisticated considering the nonstationary nature of streaming data.

Big data is one of the hottest topics in current machine learning research. Single-model evolving fuzzy systems often fail to produce reliable results for large-scale problems, while evolutionary fuzzy systems are usually not applicable to such problems due to the very high computational complexity and additional system memory required. Fuzzy rule-based systems have been proposed (López et al. 2014; Río et al. 2015) that use the MapReduce programming model (Dean and Ghemawat 2008) in an effort to learn and fuse fuzzy rule bases from big data. However, handling large-scale static data is not an easy task, and it is even more challenging to handle large-scale streaming data in real time. In such application scenarios, distributed or hierarchical ensemble learning frameworks would usually be a better option (Su et al. 2015; Gu et al. 2021b), but designing a suitable strategy for fuzzy systems without sacrificing the interpretability is not straightforward.

Small data can also be an obstacle for machine learning algorithms. The vast majority of complex AI models benefit from the greater availability of labelled training data and require a lengthy computation procedure to learn the model. Although fuzzy systems are much less data hungry, building a precise, reliable, robust predictive model from few examples remains an immensely challenging task (Angelov and Gu 2018a, b). To date, there have been only very few fuzzy models that utilize semi-supervised learning techniques to involve unlabelled data during system identification (Gu and Angelov 2018b; Gu et al. 2022; Gu 2022). Even so, with limited data, it is very difficult, if not impossible, for the learned model to cover the entire problem space. To handle this important issue, one of the key development areas in the present fuzzy systems community is to establish fuzzy rule interpolation methods, enabling approximate inference to be performed (Naik et al. 2018; Li et al. 2021; Yang et al. 2022b). Nonetheless, the question of how to construct accurate fuzzy rule-based predictive models with extremely weak supervision is still to be addressed.

Ensemble learning is a powerful machine learning scheme to construct a stronger classifier by merging individual weaker classifiers (Polikar et al. 2001; Polikar 2006). The majority of existing works employ mainstream classifiers of other types, such as SVM (Xing and Liu 2020), MLP (Polikar et al. 2001), decision tree (Chen and Guestrin 2016), etc., and are designed for static data. There have been a few ensemble models that employ fuzzy systems as ensemble components to learn from (big) data streams (Scherer 2011; Soua et al. 2013; Iglesias et al. 2013b; Leite and Škrjanc 2019; Gu et al. 2021b; Lughofer et al. 2021; Lughofer and Pratama 2022), offering both great precision and high interpretability. Although the existing works on ensemble fuzzy models have reported promising results, only very few efforts have been made to design novel ensemble frameworks specifically for better incorporating fuzzy systems (Gu and Angelov 2021). Hence, the potential of fuzzy rule-based systems in ensemble learning has not been fully explored. It would be worth investigating better ensemble strategies that make the best use of the human-interpretable fuzzy features offered by fuzzy systems. Another direction worth further exploration is multi-layered deep ensemble fuzzy systems. It is well known that the power of DNNs comes from the multi-layered distributed representations. However, most of the existing ensemble fuzzy models are based on a flat structure, and only very few works explore the possibility of constructing deep ensemble models with fuzzy systems (Pratama et al. 2020b; Gu 2021).

Finally, hybridization of fuzzy systems and deep learning is a relatively new concept and has been actively researched in recent years. Despite being criticized as “black box” models that are fragile to uncertainties, DNNs have demonstrated impressive performances on various highly challenging image and natural language processing problems. Fuzzy systems can effectively handle uncertainties and offer greater interpretability and model transparency, but cannot reach the same levels of performance achieved by DNNs on these challenging tasks due to their simpler and smaller-scale internal structures. By integrating fuzzy systems with DNNs, it becomes possible to combine the advantages of both approaches. Currently, such hybridized approaches have been developed and applied to image classification problems in areas such as remote sensing (Gu et al. 2022), autonomous driving (Soares et al. 2019a) and human activity recognition (Sargano et al. 2020). These preliminary works have achieved promising performance, but alternative hybridization schemes to better integrate fuzzy systems and DNNs are still worth exploring in order to improve the performance and utility of the hybridized models. Further efforts are needed to implement these hybridized models for natural language processing problems, where DNNs are one of the dominant approaches used by researchers in this area.