Sparse regressions and particle swarm optimization in training high-order Takagi–Sugeno fuzzy systems

This paper proposes a method for training Takagi–Sugeno fuzzy systems using sparse regressions and particle swarm optimization. The fuzzy system is considered with Gaussian fuzzy sets in the antecedents and high-order polynomials in the consequents of the inference rules. The proposed method can be applied in two variants: without or with particle swarm optimization. In the first variant, ordinary least squares, ridge regression, or sparse regressions (forward selection, least angle regression, least absolute shrinkage and selection operator, and elastic net regression) determine the polynomials in a fuzzy system in which the fuzzy sets are known. In the second variant, we have a hybrid method in which particle swarm optimization determines the fuzzy sets, while ordinary least squares, ridge regression, or sparse regressions determine the polynomials. The first variant is simpler to implement but less accurate; the second is more complex but gives better results. A new quality criterion is proposed in which the goal is to make both the validation error and the model density as small as possible. Experiments showed that (a) the use of sparse regression and/or particle swarm optimization can reduce the validation error and (b) the use of sparse regression may simplify the model by zeroing some of the coefficients.


Introduction
Fuzzy logic systems (fuzzy systems, for short), like other universal approximators, are capable of approximating any nonlinear function (mapping) with any degree of accuracy. Fuzzy systems offer a linguistic way of drawing conclusions because they are based on fuzzy IF-THEN rules. The system outputs can be calculated in various ways. The literature mainly considers systems implemented with Mamdani [22] or Takagi-Sugeno (T-S) [33] fuzzy inference. In a Mamdani system, the output of each rule is a fuzzy set, while in a T-S system, the output is a function of the input variables. A T-S system is more computationally efficient since the defuzzification process is based on a weighted average rather than on a center of gravity.
Training a fuzzy system involves selecting the number of rules (structure identification) and determining the parameters of the system. The first task is solved mainly by means of clustering methods [2,26,29], while the second uses various optimization methods such as regressions [10,35,36], evolutionary algorithms [4,6], particle swarm optimization [12,15,21,26], and others [13,43]. These methods are usually used in searching for the parameters of the antecedents and consequents of the fuzzy inference rules.
The main contributions of this paper are the use of sparse regressions in training high-order Takagi-Sugeno fuzzy systems and the proposition of a quality criterion that expresses a compromise between the accuracy of the model and its sparsity. From the reviewed papers given in Sect. 2, it can be seen that currently there are no applications of sparse regression methods to training fuzzy systems. In this article, we propose the use of sparse regressions, which give sparse solutions, meaning that some of the coefficients of the model are exactly zero. Such models are compact and easier to interpret [27]. Moreover, sparse regressions provide regularization, which means that these methods can be used when the number of variables in the linear system exceeds the number of observations. In the proposed method, the premise parameters are determined manually or by a particle swarm optimization (PSO) algorithm, whereas the consequent parameters are determined by a sparse regression. Ordinary least squares (OLS) regression was used to build a fuzzy reference model. The proposed approach has been tested on three examples. This paper is a continuation of the work [38], where the sparsity of fuzzy models was not considered.
The structure of this article is as follows. Section 2 surveys the related work. Section 3 presents the proposed hybrid method for training Takagi-Sugeno fuzzy systems. This section contains the description of the high-order T-S system, training the consequent and antecedent parameters, the performance criteria, and the procedure for designing the fuzzy models. The experimental results and discussion are presented in Sect. 4. Finally, the conclusions are given in Sect. 5.

Related work
This section provides an overview of the literature on the use of PSO to train fuzzy systems. The review is divided into two parts: the first part discusses papers in which PSO is used for training both the antecedents and the consequents, whereas the second part discusses papers in which hybrid methods combining PSO and regressions were applied.

Methods based on PSO
A two-phase swarm intelligence algorithm for zero-order Takagi-Sugeno-Kang (T-S-K) fuzzy systems was presented in [14]. The first phase aims to learn the structure of the fuzzy system and its parameters by ant colony optimization. This phase is used to find a good initial fuzzy rule base for further learning. In the second phase, the algorithm optimizes all of the free parameters in the fuzzy system using particle swarm optimization.
A multi-swarm cooperative particle swarm optimization method was proposed in [24], where the population consists of one master swarm and several slave swarms. The algorithm was used to automatically design the fuzzy identifier and the fuzzy controller in dynamical systems. Takagi-Sugeno fuzzy systems with Gaussian membership functions in the antecedents and linear functions in the consequents of the fuzzy rules were considered.
Khayat et al. [17] proposed a hybrid algorithm to design a fuzzy neural network implementing T-S fuzzy models. The algorithm consists of two phases. First, the fuzzy model is generated by a coarse-tuning phase. In this phase, a fuzzy cluster validity index is used to determine the optimal number of clusters (rules), and fuzzy c-means is used to partition the input space. In the second phase, genetic and PSO algorithms are used to adjust the parameters of the premise and consequent parts.
In [25], a method called knowledge acquisition with a swarm intelligence approach was developed for fuzzy rule evolution. The authors used a PSO algorithm to obtain the antecedents, consequents, and connectives (AND/OR) of the fuzzy rules. A Mamdani-type fuzzy rule-based system was used. The proposed method was tested on two examples: the control of an inverted pendulum, and scheduling in grid computing.
An algorithm that automatically extracts T-S fuzzy models from data using PSO was presented in [42]. The authors proposed a cooperative random learning PSO, where several subswarms search the space and exchange information. In their method, each fuzzy rule has a label that is used to decide whether the rule is included in the inference process or not. The antecedent parameters, the consequent parameters, and the rule labels are encoded in a particle.
In [5], a heterogeneous multi-swarm PSO is proposed to identify the T-S fuzzy system. The T-S system proposed by the authors uses linear regression models in several subspaces to describe a nonlinear system. In the presented algorithm, the antecedent and consequent parameters of the T-S models are encoded in particles and are obtained simultaneously in the training process.
In [23], a hierarchical cluster-based multi-species PSO for building T-S-K fuzzy systems is presented. It is applied to a spatial analysis problem, in which the area under study is divided into several subzones. For each subzone, a zero-order or first-order fuzzy system is extracted from the dataset.
An immune coevolution PSO with multi-strategy for T-S fuzzy systems is proposed in [21]. The population consists of one elite subswarm and several normal subswarms. The parameters identified are the centers and widths of the membership functions (the antecedent parameters) and the coefficients of the local models (the consequent parameters).
In [26], a learning algorithm based on a hierarchical PSO (HPSO) is presented for training the parameters of a T-S fuzzy model. First, an unsupervised fuzzy clustering algorithm is used to partition the data and identify the antecedent parameters of the fuzzy system. Next, a self-adaptive HPSO algorithm is used to obtain the consequent parameters of the fuzzy system.
Jhang et al. [12] proposed a T-S-K-type fuzzy cerebellar model articulation controller (T-FCMAC) for solving various problems. To determine the parameters of the T-FCMAC, a group-based hybrid learning algorithm (GHLA) was used. The GHLA was developed by combining an improved quantum particle swarm optimization algorithm with the Nelder-Mead method. The authors also adopted the fuzzy c-means clustering technique to improve the performance of quantum particle swarm optimization.
A new approach to optimizing Mamdani fuzzy systems without the need for any prior knowledge was proposed in [15]. The membership functions, the scaling factor parameters, and the fuzzy rule conclusions were optimized simultaneously with a mixed-coding PSO algorithm combining a special monitoring function and a self-adaptive threshold.

Hybrid methods based on PSO and regressions
The approach proposed in [19] uses a hybrid learning method for T-S fuzzy systems. This method combines particle swarm optimization and recursive least squares estimation (RLSE) to obtain a fuzzy approximation. The PSO is used to train the antecedent part of the T-S system, whereas the consequent part is trained by the RLSE method. The root-mean-square error is used as the cost function to measure the quality of the approximation. An approach to building a type-2 neural-fuzzy system from a set of input-output data was proposed in [40]. First, a fuzzy clustering method is used to partition the dataset into clusters. Then, a type-2 fuzzy T-S-K rule is derived from each cluster. After the set of initial fuzzy rules is obtained, the parameters are refined using particle swarm optimization and a divide-and-merge-based least squares estimation.
In [41], an approach to function approximation using robust fuzzy regression and particle swarm optimization is proposed. First, a fuzzy regression is used to construct a T-S-K fuzzy model. Next, particle swarm optimization is used to tune the parameters of that fuzzy model. The root-mean-square error is used as the fitness function.
In [20], a self-learning complex neuro-fuzzy system that uses Gaussian complex fuzzy sets was presented. The knowledge base of the system consists of T-S fuzzy rules with complex fuzzy sets in the antecedent part and linear models in the consequent part. The antecedent parameters and the consequent parameters are trained by the PSO algorithm and recursive least squares, respectively.
A method for fuzzy c-regression model clustering was presented in [28]. The proposed approach combines the advantages of two algorithms: clustering and particle swarm optimization. The fuzzy model used in this method is a T-S fuzzy system with local linear models. The consequent parameters of the fuzzy rules are estimated by the orthogonal least squares method.
In [39], a self-constructing radial basis function neural-fuzzy system was proposed, using particle swarm optimization for generating the antecedent parameters, and the least-Wilcoxon norm, instead of the traditional least squares estimation, for the consequent parameters.
A T-S model based on particle swarm optimization and kernel ridge regression was presented in [2]. This algorithm works in two main steps. In the first step, the clustering based on the PSO algorithm separates the input data into clusters and obtains the antecedent parameters. In the second step, the consequent parameters are calculated using a kernel ridge regression. The proposed model was applied to generalized predictive control.
In [32], the adaptive chaos PSO algorithm (ACPSO) for identification of the T-S fuzzy model parameters using weighted recursive least squares was proposed. This approach is a compromise between the adaptive and chaos PSO algorithms. The ACPSO algorithm is used to optimize the parameters of the model; then, the obtained parameters are used to initialize the fuzzy c-regression model.
An identification method for the Takagi-Sugeno fuzzy model was proposed in [35]. First, the fuzzy c-means algorithm is used to determine the optimal number of rules. Next, the initial membership functions and the consequent parameters are obtained by the PSO algorithm. To obtain the final parameters, a fuzzy c-regression model and orthogonal least squares methods are applied.
In [36], a complex fuzzy machine learning approach to function approximation is proposed. The PSO is used to optimize the premise parameters of the fuzzy model, while the recursive least squares estimator is used to find the consequent parameters.
Proposed method for training fuzzy systems

High-order Takagi-Sugeno fuzzy system
A Takagi-Sugeno (T-S) fuzzy system [33] with one input x and one output y is described by r fuzzy inference rules of the form

R_j: IF x is A_j THEN y = w_{mj} x^m + \cdots + w_{1j} x + w_{0j}   (1)

where j = 1, 2, ..., r, A_j(x) is a fuzzy set, m ≥ 0 is the degree of the polynomial, w_kj ∈ R, and k = 0, 1, ..., m. The following definition, given in [38], extends the concept of the T-S system in which zero- or first-order polynomials are used.

Definition 1
The T-S system with the rules (1) is called:
• zero-order if y = w_{0j}, which means that the consequent functions are constants [33],
• first-order if y = w_{1j} x + w_{0j}, which means that the consequent functions are linear [33],
• high-order if m ≥ 2, which means that the consequent functions are nonlinear [38].
In this paper, we use Gaussian membership functions for the input fuzzy sets, which can be unevenly spaced in the universe of discourse x ∈ X = [p_1, p_r] (see Fig. 1). These functions are given by

A_j(x) = \exp\left( -\frac{(x - p_j)^2}{2\sigma_j^2} \right)   (2)

where p_j is the peak of the function A_j, and σ_j > 0 is its width. The peaks p_j and the widths σ_j are written as the vectors p = [p_j] = [p_1, ..., p_r] and σ = [σ_j] = [σ_1, ..., σ_r], respectively.
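The paper's experiments were implemented in MATLAB; purely as an illustration, the Gaussian membership function (2) can be sketched in Python. The parameterization A_j(x) = exp(-(x - p_j)^2 / (2 σ_j^2)) is an assumption, chosen because it reproduces the 0.5 cross-point of the sets gauss(x; 1, 1.274) and gauss(x; 4, 1.274) used in the worked example later in this section:

```python
import math

def gauss(x, p, sigma):
    # Gaussian membership function with peak p and width sigma > 0
    # (assumed form: exp(-(x - p)**2 / (2 * sigma**2)))
    return math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))

# Two adjacent sets from the worked example: peaks 1 and 4, width 1.274.
a1 = gauss(2.5, 1.0, 1.274)   # membership of the midpoint x = 2.5 in A_1
a2 = gauss(2.5, 4.0, 1.274)   # membership of the midpoint x = 2.5 in A_2
```

At the midpoint of the two peaks both memberships come out close to 0.5, i.e., the adjacent sets cross at about 0.5, as assumed in the design procedure described below.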
The output of the T-S system is computed by

y = \frac{\sum_{j=1}^{r} A_j(x) \left( w_{mj} x^m + \cdots + w_{1j} x + w_{0j} \right)}{\sum_{j=1}^{r} A_j(x)}   (3)

Introducing the notion of a fuzzy basis function, formula (3) can be written in a compact form.

Definition 2
The fuzzy basis function (FBF) for the jth rule is the function ξ_j(x) given by [37]

\xi_j(x) = \frac{A_j(x)}{\sum_{l=1}^{r} A_l(x)}   (4)

Employing the definition in (4), the output of the T-S system can be written as

y = \sum_{j=1}^{r} \xi_j(x) \left( w_{mj} x^m + \cdots + w_{1j} x + w_{0j} \right)   (5)

In formula (5), the FBFs are multiplied by x^k; therefore, we define a modified fuzzy basis function.

Definition 3
The modified FBF (MFBF) for the jth rule is the function h_{kj}(x) given by

h_{kj}(x) = \xi_j(x)\, x^k   (6)

where k = 0, 1, ..., m.
Applying (6), we obtain

y = \sum_{j=1}^{r} \sum_{k=0}^{m} w_{kj}\, h_{kj}(x)   (7)

We introduce the following vectors:

h_j(x) = [h_{mj}(x), ..., h_{1j}(x), h_{0j}(x)]   (8)

h(x) = [h_1(x), ..., h_r(x)]   (9)

so that the output can be written compactly as y = h(x) w (10). The MFBFs are the elements of the vector h(x), and w is the vector of p = r(m + 1) model parameters.
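A minimal Python sketch of the FBFs (4), the MFBFs (6), and the output y = h(x)w from (7); the ordering of h(x) follows (8), the Gaussian form of A_j is the same assumption as before, and all numerical values in the usage note are illustrative only:

```python
import math

def gauss(x, p, sigma):
    # assumed Gaussian membership function
    return math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))

def fbf(x, peaks, sigmas):
    # fuzzy basis functions xi_j(x) = A_j(x) / sum_l A_l(x), equation (4)
    a = [gauss(x, p, s) for p, s in zip(peaks, sigmas)]
    total = sum(a)
    return [v / total for v in a]

def h_vector(x, peaks, sigmas, m):
    # row vector h(x) of MFBFs h_kj(x) = xi_j(x) * x**k,
    # ordered [h_mj, ..., h_1j, h_0j] per rule, as in (8)
    xi = fbf(x, peaks, sigmas)
    return [xi[j] * x ** k
            for j in range(len(peaks))
            for k in range(m, -1, -1)]

def ts_output(x, peaks, sigmas, w, m):
    # model output y = h(x) . w, equation (7)
    h = h_vector(x, peaks, sigmas, m)
    return sum(hi * wi for hi, wi in zip(h, w))
```

With r = 2 rules and m = 1, the vector h(x) has p = r(m + 1) = 4 elements; choosing w so that every rule's consequent is y = x makes the model output reduce to x, since the FBFs sum to one.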

Training the consequent parameters
We assume that the observations (x_i, y_i), i = 1, ..., n, are known. These observations are written as the vectors x = [x_i]^T = [x_1, ..., x_n]^T and y = [y_i]^T = [y_1, ..., y_n]^T. In order to apply regression methods, we introduce the regression matrix

X = [h(x_1)^T, ..., h(x_n)^T]^T   (13)

whose ith row is the vector h(x_i) with blocks h_j(x_i) given by (8). The matrix X will be used in determining the consequent parameters of the fuzzy rules by regressions such as ordinary least squares, ridge regression, or sparse regressions. Ordinary least squares, as the simplest method, will be applied further on to build a fuzzy reference model.

Ordinary least squares
In the OLS, the cost function to be minimized is the sum of squared errors

J(w) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

where ŷ_i = h(x_i) w is the estimated output of the system (see equation (10)) for the ith observation. The vector w contains the system parameters to be determined; the number of these parameters is p = dim(w) = r(m + 1). The optimal solution is given by

w = (X^T X)^{-1} X^T y

where y = [y_1, ..., y_n]^T. This is batch least squares because the model parameters are computed directly from all the data contained in X and y.
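The batch OLS computation can be sketched as follows. This is an illustrative Python version (the paper used a custom MATLAB function) applied to the worked example given later in this section (x = [1, 2, 3, 4]^T, y = [6, 5, 7, 10]^T, r = 2, m = 1); it forms the normal equations and solves them by plain Gaussian elimination:

```python
import math

def gauss(x, p, s):
    # assumed Gaussian membership function
    return math.exp(-((x - p) ** 2) / (2.0 * s ** 2))

def design_row(x, peaks, sigmas, m):
    # one row h(x) of the regression matrix (13): xi_j(x) * x**k, k = m..0
    a = [gauss(x, p, s) for p, s in zip(peaks, sigmas)]
    t = sum(a)
    return [a[j] / t * x ** k
            for j in range(len(peaks)) for k in range(m, -1, -1)]

def solve(A, b):
    # Gaussian elimination with partial pivoting for A w = b
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def ols(X, y):
    # batch least squares: w = (X^T X)^(-1) X^T y via the normal equations
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtX, Xty)

xs, ys = [1.0, 2.0, 3.0, 4.0], [6.0, 5.0, 7.0, 10.0]
X = [design_row(x, [1.0, 4.0], [1.274, 1.274], 1) for x in xs]
w = ols(X, ys)
```

Since n = p = 4 here, the model interpolates the data exactly (up to rounding), which matches the remark in the worked example that the amount of data equals the number of regressors.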

Ridge regression
In ridge regression [11], the cost function is the penalized sum of squared errors

J(w) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \lVert w \rVert_2^2

where λ ≥ 0 is a regularization parameter. The fuzzy model weights are given by

w = (X^T X + \lambda I)^{-1} X^T y   (18)

where I is the identity matrix. We apply ridge regression because it can be used for ill-conditioned problems, that is, when the matrix X^T X is close to singular. Ridge regression is a one-pass method and is therefore very fast.
The OLS and ridge regressions have been implemented in MATLAB using a custom function.
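For illustration, the ridge solution (18) can also be sketched in Python; the identity-matrix check and the small n < p demonstration matrix below are made-up examples, not data from the paper:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for A w = b
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def ridge(X, y, lam):
    # ridge weights w = (X^T X + lam * I)^(-1) X^T y, equation (18)
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtX, Xty)

# n < p: two observations, three predictors; X^T X alone is singular,
# yet the lam * I term makes the system solvable.
w_under = ridge([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, 2.0], 0.1)
```

This mirrors the regularizing role of λ discussed above: with λ = 0 the underdetermined system has no unique solution, while any λ > 0 yields a well-posed problem.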

Sparse regressions
The regressions described in this section are sparse, which means that some of the coefficients of the model are exactly zero [27]. These methods lead to reduced models that have a simpler structure and are easier to interpret. The following sparse methods are considered in this paper:
• Forward selection (FS). This is a stepwise regression, i.e., variables are added to the model one by one. The algorithm starts with all coefficients equal to zero, and the next variable is chosen based on a certain criterion; for example, it can be the one with the highest correlation with the current residual vector [27].
• Least angle regression (LAR) [8,27]. The LAR works similarly to the FS procedure, but instead of moving in the direction of one variable, the estimated parameters are updated in a direction that makes equal angles with each of the variables currently in the model. The LAR algorithm is the basis for other sparse methods, such as the least absolute shrinkage and selection operator and elastic net regression.
• Least absolute shrinkage and selection operator (LASSO) [27,34]. This regression has a mechanism that implements both coefficient shrinkage and variable selection. The cost function combines the sum of squared errors with a penalty based on the L1 norm:

J(w) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \lVert w \rVert_1   (19)

where λ is a nonnegative regularization parameter.
• Elastic net (ENET) [27,44]. The ENET regression combines the features of ridge regression and the LASSO. The cost function contains penalty terms related to both the L1 and the L2 norms:

J(w) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \lVert w \rVert_1 + \delta \lVert w \rVert_2^2   (20)

where λ and δ are nonnegative regularization parameters. To find the solution, the LARS-EN algorithm, which is based on the LARS algorithm [8], is used.
The sparse regressions have been implemented in MATLAB using the toolbox SpaSM [27].
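SpaSM computes the LASSO path via the LARS algorithm; as an independent illustration only (this is not the algorithm used in the paper), the same L1-penalized cost (19) can be minimized by cyclic coordinate descent with soft-thresholding. The data below are made up:

```python
def soft(a, t):
    # soft-thresholding operator S(a, t)
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    # cyclic coordinate descent on sum_i (y_i - X_i w)^2 + lam * ||w||_1;
    # stationarity in coordinate j gives w_j = S(rho_j, lam / 2) / z_j
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # residual with the jth coefficient removed
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft(rho, lam / 2.0) / z
    return w

# y depends only on the first column; the second should be zeroed out.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=1.0)
```

For a sufficiently large λ the irrelevant coefficient is set exactly to zero, which is the sparsity mechanism exploited by the proposed method.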

Example of an application
To show the use of the ridge method when the problem is ill-defined (the number of observations is less than the number of predictors, n < p), let us consider the following example of tuning the T-S system for a small amount of data. We have four observations (n = 4) in the form of the vectors x = [1, 2, 3, 4]^T and y = [6, 5, 7, 10]^T. For the input x, we define two fuzzy sets (r = 2) with the membership functions A_1(x) = gauss(x; 1, 1.274) and A_2(x) = gauss(x; 4, 1.274). In the consequent parts of the rules (1), we assume linear polynomials, that is, m = 1. We can see that n = p = 4, which means that the amount of data is equal to the number of regressors.
The regression matrix (13) is built from the FBFs (4) computed for the data x_i and the corresponding MFBFs (6); the numerical matrices are omitted here. Assume now that we have only three observations, for example, x = [1, 3, 4]^T and y = [6, 7, 10]^T. In this case, the regression matrix has n < p. For this matrix, we obtain det(X^T X) = 1.144e−16, which means that X^T X is close to singular and the fuzzy model can be unreliable. Applying the ridge regression (18) yields a solution despite the near-singularity. Applying forward selection, we obtain three solutions in the coefficient path. Similarly, solutions can be obtained from the LAR, LASSO, and ENET regressions.

Training the antecedent parameters
A PSO algorithm is used to train the antecedent parameters. PSO was developed by Kennedy and Eberhart [7,16]. It is based on the social behavior of living organisms that live in large groups. PSO has many advantages: among others, it can escape from local optima, it is easy to implement, and it has few parameters to adjust. PSO can be used for many optimization problems, as in [9,14,18,30,41]. PSO consists of a set of solutions (particles), where each particle represents a point in a multi-dimensional space. A group of particles (a population) forms a swarm. Each particle behaves in a distributed manner, using its own intelligence and the group intelligence of the swarm. The particles move through the search space and exchange information to find the optimal solution. Each particle has a position (x) and a velocity (v). In addition, each particle remembers its best position (pbest) and has access to the best position in the swarm (gbest). The learning process in the PSO method is based on two components:
• the cognition component, which attracts a particle toward its own most promising position,
• the social component, which attracts particles toward the global best position discovered by the swarm.
The best solution is obtained by minimizing the objective function, calculated in this paper as the square root of the mean square error [19,31]. The velocity v_k and position x_k of the kth particle are determined using the following equations [7]:

v_k(l+1) = \chi \left[ v_k(l) + c_1 r_1 \circ (pbest_k - x_k(l)) + c_2 r_2 \circ (gbest - x_k(l)) \right]

x_k(l+1) = x_k(l) + v_k(l+1)

where χ is a constriction factor, here equal to 0.7298, r_1, r_2 are vectors of random numbers uniformly distributed within [0, 1], l is the current iteration number, c_1 is the cognitive coefficient, and c_2 is the social coefficient (c_1 = c_2 = 2.05). In this method, each particle represents the parameters of the Gaussian fuzzy sets and contains the peaks (p_2, ..., p_{r−1}) and the widths (σ_1, ..., σ_r) of all membership functions. We assume that the peaks p_1 and p_r are known; hence, the number of parameters to be chosen by the PSO algorithm is equal to 2r − 2.
To initialize the particles, we first distribute the peaks evenly in the interval (a, b), that is,

p_j = a + (j - 1)d

where j = 1, 2, ..., r, d = (b − a)/(r − 1), and a and b are the minimum and maximum of the input data. Next, we perturb p_2, ..., p_{r−1} randomly within an initialization range of width d_p, using a random number rand uniformly distributed in the interval [0, 1]. The widths σ_j, j = 1, ..., r, are initialized randomly within a range of width d_σ, where 0 < σ_min < σ_max. During the optimization, we limit the peaks p_j to the interval [p_1, p_r] and the widths σ_j of the Gaussian sets to the interval [σ_min, σ_max].
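The constriction-factor PSO described above can be sketched as a generic minimizer. The sketch below uses the stated coefficients (χ = 0.7298, c_1 = c_2 = 2.05) but applies them, for illustration, to a toy one-dimensional quadratic rather than the fuzzy-model objective; the particle count, iteration count, and seed are arbitrary choices:

```python
import random

def pso(f, dim, lo, hi, particles=20, iters=100, seed=1):
    # constriction-factor PSO minimizing f over [lo, hi]^dim
    rnd = random.Random(seed)
    chi, c1, c2 = 0.7298, 2.05, 2.05
    xs = [[rnd.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vs = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in xs]            # personal best positions
    pval = [f(x) for x in xs]
    g = min(range(particles), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]    # global best position and value
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                vs[i][d] = chi * (vs[i][d]
                                  + c1 * r1 * (pbest[i][d] - xs[i][d])
                                  + c2 * r2 * (gbest[d] - xs[i][d]))
                # clamp to the allowed interval, as the peaks and
                # widths are limited during the optimization
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            v = f(xs[i])
            if v < pval[i]:
                pbest[i], pval[i] = xs[i][:], v
                if v < gval:
                    gbest, gval = xs[i][:], v
    return gbest, gval

best, val = pso(lambda z: (z[0] - 3.0) ** 2, dim=1, lo=-10.0, hi=10.0)
```

In the hybrid methods of this paper, each particle would instead encode the vector [p_2, ..., p_{r−1}, σ_1, ..., σ_r], and f would be the validation RMSE of the fuzzy model built from those antecedent parameters.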

Performance criteria
In this paper, we use a validation method in which the dataset is divided into two sets: a training set and a validation set. The training set is the set of observations used for learning, that is, for tuning the parameters (weights) of the fuzzy model. The validation set is also a set of observations, but it is disjoint from the training set. It is used to assess how well the model makes predictions on new data. The performance criterion is calculated on the validation set as the square root of the mean square error

RMSE = \sqrt{ \frac{1}{V} \sum_{k=1}^{V} (y_k - \hat{y}_k)^2 }

where V denotes the number of observations in the validation set, y_k denotes the kth target in the validation set, and ŷ_k denotes the output of the fuzzy model obtained for the kth input x_k in the validation set.
The fuzzy models used in this paper may be sparse, which means they may have some coefficients equal to zero. The following definition introduces the notion of the sparsity of a fuzzy model.

Definition 4
The sparsity of a T-S fuzzy model is defined as

z = \frac{n_z}{r(m + 1)}

where n_z is the number of zero-valued coefficients in the polynomials, r is the number of rules, and m is the degree of the polynomial.
Using the definition of sparsity, we introduce the notion of density of a fuzzy model as follows.

Definition 5
The density of a T-S fuzzy model is defined as one minus the sparsity:

d = 1 - z

In this paper, we propose choosing the best T-S model by minimizing a quality criterion in which the goal is to make both the validation error and the density as small as possible:

q = \alpha \frac{RMSE}{RMSE_{OLS}} + (1 - \alpha)\, d   (39)

where α = 0.5 and RMSE_OLS is the value of RMSE for the OLS regression, which is treated as the reference method. The quality index (39) expresses a compromise between the accuracy of the model and its sparsity.
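The performance measures of this section translate directly into code. Note one assumption: the exact form of the quality index (39) is rendered here as a convex combination of the normalized validation error and the density, which is one plausible reading of the text (flagged in the comment):

```python
import math

def rmse(targets, preds):
    # square root of the mean squared error over the validation set
    v = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / v)

def sparsity(w, r, m):
    # fraction of zero-valued polynomial coefficients, z = nz / (r * (m + 1))
    nz = sum(1 for c in w if c == 0.0)
    return nz / (r * (m + 1))

def density(w, r, m):
    # density d = 1 - z
    return 1.0 - sparsity(w, r, m)

def quality(rmse_val, rmse_ols, dens, alpha=0.5):
    # ASSUMED form of the compromise criterion (39): a convex combination
    # of the RMSE normalized by the OLS reference and the model density
    return alpha * rmse_val / rmse_ols + (1.0 - alpha) * dens

# illustrative coefficient vector: r = 2 rules, m = 2, so 6 coefficients
w = [1.5, 0.0, -2.0, 0.0, 0.0, 3.0]
z = sparsity(w, 2, 2)   # 3 of the 6 coefficients are zero
```

A model that matches the OLS reference error but zeroes half of its coefficients gets q = 0.5 · 1 + 0.5 · 0.5 = 0.75, i.e., it is preferred over the dense reference model (q = 1).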

Procedure for designing fuzzy models
The proposed fuzzy models are designed in two variants: without and with particle swarm optimization.In the first variant, the regressions are used to determine the polynomials in a fuzzy system in which the fuzzy sets are known.
In the second variant, the PSO determines the fuzzy sets, while the regressions determine the polynomials.
The following methods for building fuzzy models are employed in this paper:
• OLS: the fuzzy sets are defined by the user, while the polynomials are determined by the OLS regression.
• RIDGE: the fuzzy sets are defined by the user, while the polynomials are determined by the ridge regression.
• SR: the fuzzy sets are defined by the user, while the polynomials are determined by a sparse regression (SR), e.g., FS, LAR, LASSO, or ENET.
• PSO-OLS: the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by the OLS regression.
• PSO-RIDGE: the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by the ridge regression.
• PSO-SR: the fuzzy sets are determined by the PSO algorithm, while the polynomials are determined by a sparse regression.
The idea of the procedure for designing the fuzzy models is presented in Fig. 2.
In Block 1, we determine the Gaussian fuzzy sets. In the OLS, RIDGE, and SR methods, one proposition is generated in such a way that these sets are distributed evenly in the space X, and the cross-point of two adjacent sets is equal to 0.5. In the PSO-OLS, PSO-RIDGE, and PSO-SR methods, ten propositions are generated by the PSO algorithm. The outputs of Block 1 are the vectors p and σ.
In Block 2, we determine the regression matrix X (13).
In Block 3, we generate the coefficient path for one of the SR methods.
In Block 4, we validate the OLS, RIDGE, and PSO-RIDGE methods. As a result of validating the OLS method, we get the value of RMSE_OLS.
In Block 5, we validate the SR and PSO-SR methods. The validation is done along the coefficient path. For the PSO-SR method, we select from among the ten fuzzy models the one with the smallest validation error RMSE. The quality index q is calculated in such a way that the smallest value is chosen, with the constraint that the RMSE is not greater than RMSE_OLS.
The results of the calculations in Blocks 4 and 5 are the optimal weights (w_opt) of the fuzzy model.

Experimental results and discussion
This section gives examples of the application of the developed methods to the approximation of nonlinear functions. The following parameters were adopted in all experiments: λ = 1e−08 for the ridge regression (18), λ = 1e−10 for the ENET regression (20), and, for the PSO algorithm, 500 iterations and 60 particles. The parameter λ and the number of iterations were selected experimentally. The parameter λ can prevent an ill-conditioned problem from occurring, but if it is too high, it can cause a deterioration of the results. The number of particles was chosen based on [19]. We applied nine fuzzy rules, and the degree of the polynomial was two. The number of observations (x_i, y_i) was n = 50, and they were evenly distributed in the space X. The dataset was divided into a training set of 40 observations and a validation set of 10 observations. The number of observations was chosen based on [3,31]. The widths of the initialization ranges d_p, d_σ and the limits σ_min, σ_max were chosen experimentally according to the range of the input variables.

Experiment 1
We consider the nonlinear function [19] defined for x ∈ [3, 7]. The initialization parameters and limits were as follows: d_p = 3.0, d_σ = 5.0, σ_min = 0.1, and σ_max = 5.0. The experimental results are presented in Table 1. The smallest value of the quality index q, equal to 0.2955, was obtained for the PSO-ENET method. For this method, the validation error is smaller than the error for the reference model. Table 2 compares the parameters of the fuzzy system obtained by the OLS method with those from the PSO-ENET method. It can be seen that the PSO-ENET method zeroed out 48% of the 27 coefficients. Based on this table, the fuzzy rules for the best model can be written out explicitly; in particular, the PSO-ENET fuzzy model has a zero polynomial in the consequent part of rule R_9. In Fig. 3, the approximation of the function is shown along with the approximation error for the PSO-ENET model.

Experiment 2
In this experiment, we consider the nonlinear function [31] defined for x ∈ [−8, 12]. The initialization parameters and limits were chosen to be d_p = 12.0, d_σ = 8.0, σ_min = 0.5, and σ_max = 10.0. The results of the applied methods are presented in Table 3. For the PSO-FS method, the quality index q, equal to 0.4059, was the smallest value obtained over all simulations. The sparsity of this model was 26%. Table 4 presents the parameters of the fuzzy systems obtained using the OLS method and the PSO-FS method.
The fuzzy rules for the best model can be written out in the same manner.

Experiment 3
Here, the nonlinear function is given by [3] and defined for x ∈ [0, 1]. The parameters for initialization and limits were as follows: d_p = 0.3, d_σ = 1.0, σ_min = 0.1, and σ_max = 1.0. The results of these experiments are presented in Table 5. The smallest value of q, equal to 0.2714, was obtained for the PSO-ENET algorithm. The sparsity of the chosen fuzzy model was z = 52%. Table 6 presents the parameters of the fuzzy systems obtained using the OLS and PSO-ENET methods. It is worth noting that the PSO-ENET fuzzy model has a zero polynomial in the consequent part of rule R_6. The approximation of the function using the PSO-ENET method, together with the approximation error, is shown in Fig. 5.

Discussion
Based on the results presented in this section, it can be observed that:
• in the first variant (without PSO), the use of sparse regressions may reduce the validation error RMSE and the quality index q compared to the OLS-based fuzzy reference model;
• in the second variant (with PSO), the experiments did not give results for the PSO-OLS method, but results were obtained for the PSO-RIDGE method;
• comparing the results in the second variant with the PSO-RIDGE method: in the first experiment, a reduction in the error RMSE and the quality index q was obtained for the PSO-LASSO and PSO-ENET methods; in the second and third experiments, the error RMSE for the sparse regression methods was worse than for the PSO-RIDGE method, while the quality index q was better;
• the results for the error RMSE and the quality index q in the second variant are better than in the first variant;
• in each experiment, the model was simplified by reducing the number of polynomial coefficients: a reduction of 48% was obtained in the first experiment, 26% in the second, and 52% in the third.
The obtained results show that the use of sparse regressions can reduce the validation error compared to the reference model and simplify fuzzy models by zeroing some coefficients. It should be emphasized that the developed method applies to T-S models [33], which have polynomials in their consequents, not fuzzy sets, as is the case in Mamdani models [22].

Conclusions
Two methods of training high-order Takagi-Sugeno systems using sparse regressions and particle swarm optimization have been proposed. In the first, the antecedent parameters are set manually; in the second, they are set by a particle swarm optimization algorithm. In both variants, the consequent parameters are determined by a sparse regression. A fuzzy model based on ordinary least squares regression is used as the reference method. To assess the quality of the fuzzy models, a quality criterion that expresses a compromise between the accuracy of the model and its sparsity was proposed. This criterion is based on the square root of the mean square error calculated on the validation set and on the density of the fuzzy model. Compared with the reference method, the conducted experiments showed that: (a) the use of sparse regressions and/or particle swarm optimization can reduce the validation error; (b) the use of sparse regressions may simplify the fuzzy model by setting some of the coefficients to zero. This paper uses a high-order T-S system to approximate functions of one variable. Future work will focus on generalizing the proposed method to fuzzy systems with many input variables. Such systems can be used in various applications, for example, the approximation of functions of more than one variable, the identification of nonlinear models, and the determination of inverse kinematics in robotics. Moreover, some work will be carried out toward the use of other optimization methods.

Fig. 2
Fig. 2 The idea of the procedure for designing the fuzzy models

Figure 4 shows the approximation of the function and the error between the target function and the estimator for the PSO-FS model.

Fig. 3
Fig. 3 Experiment 1: Approximation of the function and the error between the target function y and the estimator ŷ for the PSO-ENET model in Table 2

Fig. 4
Fig. 4 Experiment 2: Approximation to the function and the error between the target function y and the estimator ŷ for the PSO-FS model in Table 4

Fig. 5
Fig. 5 Experiment 3: Approximation to the function and the error between the target function y and the estimator ŷ for the PSO-ENET model in Table 6

Table 1
Performance comparison for Experiment 1

Table 3
Performance comparison for Experiment 2

Table 4
Parameters of fuzzy systems in Experiment 2

Table 5
Performance comparison for Experiment 3

Table 6
Parameters of fuzzy systems in Experiment 3