T2RFIS: type-2 regression-based fuzzy inference system

This article discusses a novel type-2 fuzzy inference system with multiple variables in which no fuzzy rules are explicitly defined. By using a rule-free system, we avoid the serious disadvantage of rule-based systems, which are burdened with the curse of dimensionality. In the proposed system, Gaussian membership functions are used for its inputs, and linearly parameterized system functions are used to obtain its output. To obtain the system parameters, a multi-objective genetic algorithm is applied. In the presented method, the genetic algorithm is combined with a feature selection method and a regularized ridge regression. The objective functions consist of a pair in which one function is defined as the number of active features and the other as the validation error for regression models or the accuracy for classification models. In this way, the models are selected from the Pareto front as a compromise between their quality and simplicity. Compared to the author's previous work on the regression-based fuzzy inference system, a new inference scheme with type-2 fuzzy sets has been proposed, and the quality has been improved compared to the system based on type-1 fuzzy sets. Four experiments involving the approximation of a function, the prediction of fuel consumption, the classification of breast tissue, and the prediction of concrete compressive strength confirmed the efficacy of the presented method.

Although the results for type-2 fuzzy systems are promising, the underlying problem is that the processing times for these systems are higher than for type-1 fuzzy systems. This is mainly because these systems are more complex and have more parameters that require tuning. Therefore, the paper [55] proposes the following directions that scientists could explore (apart from type reduction, they also apply to type-1 fuzzy systems):
-Model optimization: the choice of membership functions, rules, and operations is still a big open question,
-Type reduction: this time-consuming operation should still be improved,
-Defuzzification: despite the existence of many defuzzification methods, there is a need to improve their efficiency,
-Computational complexity: this issue can still be explored because of the curse of dimensionality,
-Hybridization: the search for new learning algorithms is desirable to improve the overall performance.
In addition, the authors point to the need for practical applications in areas such as hardware implementations, medical diagnostics, Big Data, and robotics. The literature review presented below cites some solutions that exist in the literature in the areas mentioned above.

Related work
It should be stressed that most of the solutions proposed in the literature are related to systems based on fuzzy rules. Such systems are limited in their use because of the curse of dimensionality associated with the exponential increase in the number of rules. To optimize fuzzy models, some attempts have been made to develop their structure without a rule base. The paper [60] focuses on the synthesis of a fuzzy logic control system using a new analytical approach without any rule base. Unfortunately, no way to select fuzzy system parameters is provided, especially when using observed data. In the paper [88], the authors present a new type of fuzzy inference system similar to that presented in [60]. Their system does not have a rule base, and the number of its parameters increases linearly. The authors do not provide a tuning method based on the observed data; it is only mentioned that a method similar to backpropagation is used. Another way to reduce the number of fuzzy rules is to create hierarchical systems, as described in [50], which proposes a type-2 fuzzy hierarchical system for high-dimensional data modeling. The problems of type reduction and defuzzification in type-2 fuzzy systems are still the subject of research and have not yet been conclusively solved. This paragraph presents examples of papers devoted to these issues. The proposition of two type reduction algorithms along with a comprehensive study using various programming languages is presented in [10]. In the paper [14], the authors propose an inference and type reduction method for constrained interval type-2 fuzzy sets using the concept of switch indices. The paper [24] presents a technique, named stratic defuzzification, for discretized general type-2 fuzzy sets. This method is based on the transformation of a type-2 fuzzy set into a type-1 fuzzy set. Two algorithms, called binary algorithms, are proposed in [49] to calculate the centroid of interval type-2 fuzzy sets.
A method to calculate the center of gravity of polygonal interval type-2 fuzzy sets is introduced in [57]. The authors' method can be applied on both discrete and continuous domains without the need for discretization. A non-iterative method to obtain the center of centroids for an interval type-2 fuzzy set is presented in [5]. This method is based on the weighted sum of the centroids of the lower and upper membership functions. The study [11] presents three types of sampling-based reduction algorithms for general type-2 fuzzy systems that are extended versions of the algorithms proposed in the literature for interval type-2 fuzzy systems. In the paper [23], the authors present a theoretical approach to the type reduction problem using the Chebyshev inequality. Through their method, they obtain the centroids and the bounds for type-1 and interval type-2 fuzzy numbers. The paper [93] proposes a type reduction method for general type-2 fuzzy systems using an α-plane representation. In this representation, a series of α-planes is applied to decompose a general type-2 fuzzy set. An experimental evaluation of various defuzzification algorithms, namely the Karnik-Mendel procedure, the Nie-Tan method, the q factor method, and the modified q factor method, can be found in [103].
Hybridization of type-2 fuzzy systems with various learning algorithms has been considered in the following sample papers. The paper [98] presents a hybrid approach to build a type-2 neural fuzzy system that incorporates particle swarm optimization and least-squares estimation. The authors of the paper [53] propose a hybrid mechanism for training type-2 fuzzy logic systems that uses a recursive square root filter to tune the type-1 consequent parameters and the steepest descent method to tune the interval type-2 antecedent parameters. In the paper [85], an evolving type-2 Mamdani neural fuzzy system is proposed. The work [15] introduces the extreme learning strategy to develop a fast training algorithm for the interval type-2 Takagi-Sugeno-Kang fuzzy logic systems. The paper [4] proposes a hybrid learning mechanism for type-2 fuzzy systems that uses the recursive orthogonal least-squares algorithm to tune the type-1 consequent parameters and the backpropagation algorithm to tune the type-2 antecedent parameters. The work [79] presents an adaptive neuro-fuzzy inference system (ANFIS) that uses interval Gaussian type-2 fuzzy sets in the antecedent part and Gaussian type-1 fuzzy sets in the consequent part. The structure of the proposed ANFIS2 is very similar to that of the traditional ANFIS, except for an extra layer for type reduction. In the paper [22], the authors propose a design of an interval type-2 fuzzy logic system based on the quantum-behaved particle swarm optimization algorithm. A trapezoidal type-2 fuzzy inference system is proposed in [30]. To optimize this system, a tensor unfolding structure training method is applied. A method of variable selection and sorting to construct the type-2 Takagi-Sugeno-Kang fuzzy inference system is described in [90]. This method selects independent variables using Chi-square statistics. 
The paper [12] presents an application of an interval type-2 fuzzy logic system to forecast the parameters of a permanent magnetic drive. The authors use backpropagation and recursive least-squares algorithms to optimize the fuzzy logic system. In the paper [7], a composite framework is presented that uses deep learning techniques for tuning interval type-2 fuzzy systems. All papers mentioned above use rule-based fuzzy inference systems.

Goals and contributions
According to the directions for the development of type-2 fuzzy systems mentioned above, the goals of the solution proposed in this article are as follows. In the area of model optimization, propose a system that does not use fuzzy inference rules, which allows one to avoid the explosion of their number along with the increase in the number of inputs and fuzzy sets. In the area of defuzzification, eliminate this operation from the fuzzy system. Regarding type reduction, replace this operation with a weighted sum of regression matrices. In terms of computational complexity, propose a system in which the complexity increases linearly or at most quadratically, instead of exponentially as in the systems known in the literature. Finally, in the area of hybridization, use ridge regression and multi-criteria optimization with feature selection to train the proposed system. In view of these goals, the main contributions can be summarized as follows:
-a proposal of a new type-2 fuzzy inference scheme without explicitly defined rules,
-the use of a hybrid method combining regularized regression and global optimization with feature selection to train this system,
-improved performance over the previously published type-1 fuzzy inference system.
Moreover, the proposed method is tested on four different fuzzy modeling problems, which involve the approximation of a one-variable function, the prediction of fuel consumption, the classification of breast tissue, and the prediction of concrete compressive strength. This paper is a continuation of the author's previous work [92], in which the type-1 RFIS model was discussed.

Paper structure
The structure of this paper is as follows. Section 2 provides the description of a type-2 regression-based fuzzy inference system with linearly parameterized system functions. The regression and design matrices used to design the system are introduced in Sect. 3. A training method to calculate the system function parameters and an illustrative example are presented in Sect. 4. Section 5 presents a training method for type-2 fuzzy sets. In Sect. 6, the calculation of the objective functions is described. Section 7 contains the description of the experimental results. Finally, the conclusions are given in Sect. 8.

Problem statement
We consider the problem of training a type-2 regression-based fuzzy inference system (T2RFIS) with m inputs x_1, ..., x_m and one output y (Fig. 1). The problem concerns the determination of type-2 fuzzy sets for the inputs of the system and the parameters of a system function used to obtain the output. To solve this problem, a hybrid training method is proposed, in which the fuzzy sets are determined by a multi-objective genetic algorithm and the system function parameters by a regression method. Because we assume that the system function is linearly parameterized, this method is implemented here by regularized ridge regression. The models generated by the proposed method are simplified by applying a variable selection algorithm.
Type-2 fuzzy inference system

For the system inputs creating the vector x = [x_1, ..., x_m], we define, for each input, p Gaussian fuzzy sets of type-2 (Fig. 2), described by the lower membership function g(x_k; c_j^k, σ_j^k) and the upper membership function g(x_k; c_j^k, σ̄_j^k), where j = 1, 2, ..., p, x_k ∈ [c_1^k, c_p^k], k = 1, 2, ..., m, c_j^k is the center of a fuzzy membership function, σ_j^k is the width of the lower fuzzy membership function, σ̄_j^k is the width of the upper fuzzy membership function, and g(x; c, σ) = exp(−0.5 (x − c)² / σ²). Using the defined fuzzy sets, we introduce a lower fuzzy basis function ξ_j^k and an upper fuzzy basis function ξ̄_j^k. The lower and upper fuzzy basis functions are written as vectors ξ(x) and ξ̄(x). The output of the T2RFIS model is determined as

ŷ = Σ_{s=1..t} b_s d_s(ξ(x)),

where b_s is an unknown system function coefficient, t is the number of system function coefficients, and d_s(ξ(x)) is a linear or nonlinear expression dependent on the lower and upper fuzzy basis functions. The expressions d_s(ξ(x)) form a design matrix D(ξ), as described in the next section. For the proposed T2RFIS model, the calculation scheme of the output ŷ is presented in Fig. 1.
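The membership and basis-function computations above can be sketched in code. This is a minimal illustration, assuming the common normalized form of fuzzy basis functions (memberships divided by their sum); the function names `g` and `fuzzy_basis` are hypothetical, not the paper's notation.

```python
import numpy as np

def g(x, c, sigma):
    # Gaussian membership g(x; c, sigma) = exp(-0.5 * (x - c)^2 / sigma^2)
    return np.exp(-0.5 * (x - c) ** 2 / sigma ** 2)

def fuzzy_basis(x, centers, sig_lower, sig_upper):
    # Lower/upper memberships of one input value in all p type-2 fuzzy sets,
    # normalized so that each vector of basis functions sums to one.
    mu_lo = g(x, centers, sig_lower)
    mu_up = g(x, centers, sig_upper)
    return mu_lo / mu_lo.sum(), mu_up / mu_up.sum()
```

The upper widths are larger than the lower ones, so each input value falls inside the footprint of uncertainty between the two membership curves.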

Regression and design matrices
We assume that we know the pairs of observation data (x_i, y_i), where i = 1, 2, ..., n. These data are written as the matrix X_o and the vector y. Assuming these data, we introduce the following three regression matrices: the lower regression matrix, the upper regression matrix, and the combined regression matrix X. The elements of the lower and upper regression matrices are created from the elements of the vectors of the lower and upper fuzzy basis functions, respectively. The regression matrix X is formed as a weighted arithmetic mean of the lower and upper regression matrices. The elements of the described matrices are determined for all input data x_i. In the next steps, an optional feature selection is performed, resulting in the matrix D_f, for which the parameters of the system function are calculated. In the end, the output of the system is predicted (Fig. 1). Moreover, we introduce the design matrix D. The matrix D, calculated on the basis of the regression matrix X, consists of the expressions d_s(ξ(x)) of the system function used in formula (4). As in the case of the regression matrices, the elements of the design matrix are determined for all input data x_i. In the construction of this matrix, four types of regression functions are applied [83, 92]:
-'linear': the model contains a linear term for each predictor,
-'purequadratic': the model contains linear and squared terms for each predictor,
-'interactions': the model contains a linear term for each predictor and all products of pairs of distinct predictors,
-'quadratic': the model contains linear and squared terms for each predictor and all products of pairs of distinct predictors.
The design matrix has the following sizes in the subsequent examples: n × 3, n × 6, n × 6, and n × 9.
The predictors of the lower and upper regression matrix form the columns of the design matrix.
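The four design-matrix types can be sketched as follows. This is a simplified construction for q generic predictor columns with a leading constant term; the function name `design_matrix` and the exact column ordering are assumptions, not the paper's definition.

```python
import numpy as np
from itertools import combinations

def design_matrix(X, kind):
    # Build a design matrix from a regression matrix X with q predictor columns.
    # 'linear'        : constant + linear terms
    # 'interactions'  : 'linear' + products of pairs of distinct predictors
    # 'purequadratic' : 'linear' + squared terms
    # 'quadratic'     : 'interactions' + squared terms
    n, q = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(q)]
    if kind in ("interactions", "quadratic"):
        cols += [X[:, a] * X[:, b] for a, b in combinations(range(q), 2)]
    if kind in ("purequadratic", "quadratic"):
        cols += [X[:, j] ** 2 for j in range(q)]
    return np.column_stack(cols)
```

For q = 2 predictors this yields 3, 4, 5, and 6 columns, respectively; the number of columns grows at most quadratically in the number of predictors, in line with the complexity goal stated earlier.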

Training system function parameters
Training the system function coefficients consists of calculating the elements of the vector b. For this purpose, linear regression applied to the design matrix D and the vector of output observations y can be used. In this paper, a penalized least-squares method represented by ridge regression [26] is applied. In this method, the cost function J is given by

J = Σ_{i=1..n} (y_i − ŷ_i)² + λ Σ_{s=1..t} b_s²,

where the estimated output for the ith observation is calculated from

ŷ_i = Σ_{s=1..t} b_s d_s(ξ(x_i)),

and λ > 0 is a regularization parameter. For λ = 0, the ridge regression becomes an ordinary least-squares regression. The solution to the problem of minimizing the function J is given by

b = (DᵀD + λI)⁻¹ Dᵀy,

where y = [y_1, ..., y_n]ᵀ and I is the identity matrix. Ridge regression, which has an additional parameter λ, offers the important advantage of being a regularized regression, that is, it can be used for ill-conditioned problems. This can happen when the matrix DᵀD is close to singular. Furthermore, this method is very fast because the vector b is calculated directly from all the data in the matrix D and the vector y.
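The closed-form solution above translates directly into code; the sketch below solves the regularized normal equations rather than forming the inverse explicitly, which is the numerically preferable route.

```python
import numpy as np

def ridge_fit(D, y, lam):
    # Closed-form ridge estimate: solve (D^T D + lam*I) b = D^T y for b.
    t = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(t), D.T @ y)
```

With a tiny λ the fit reduces to ordinary least squares, so exact linear data is recovered almost perfectly.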

Illustrative example
In this example, the goal is to train the T2RFIS model in a regression problem with a very small amount of data [92]. The fuzzy system has one input (m = 1) denoted by x and one output denoted by y. We assume two vectors of observations of the form x = [1, 2, 3, 4]ᵀ and y = [6, 5, 7, 10]ᵀ. For the input x, we define p = 2 fuzzy sets of type-2, whose lower and upper membership functions (1) are shown in Fig. 3. The parameters c and σ of the sets described in Equations (17) and (18) have been arbitrarily chosen. However, in the examples described in Sect. 7 'Experimental results', these parameters are calculated by a genetic algorithm.
For the defined fuzzy sets, the lower regression matrix (6) and the upper regression matrix (7) are built from the lower and upper fuzzy basis functions (8). For the weights w̲ = w̄ = 1 in (8), the regression matrix X is obtained. Assuming that the design matrix (9) is of type 'interactions', the output of the T2RFIS model is described by the function

ŷ = 9.827 ξ_1 + 24.72 ξ_2 − 43.86 ξ_1 ξ_2.

The list of estimated values ŷ_i and the residuals (errors) r_i = y_i − ŷ_i for the T1RFIS and T2RFIS models is presented in Table 1. For the models obtained, the root mean square error, defined as RMSE = sqrt((1/n) Σ_{i=1..n} r_i²), is equal to 0.1432 and 0.0662, respectively. The approximation of the observation data using the T1RFIS and T2RFIS models is shown in Fig. 4.

Multi-objective genetic algorithm
The genetic algorithm (GA) [27,82,91] is a global optimization method inspired by the biological process of evolution. In each generation of GA, individuals are selected from the current population as parents and used to obtain children for the next generation. The population evolves in subsequent generations toward the optimal solution. To create the next generation from the current population, GA uses three main types of rules: selection, crossover, and mutation. In addition to these rules, dominance, rank, and crowding distance are also used in the multi-objective GA (MGA) algorithm [82]. A more detailed description of these terms is given in [92].
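The dominance and Pareto-front concepts used by the multi-objective GA can be illustrated with a small sketch (both objectives minimized); this is a simplified front extraction, not the full NSGA-II machinery with rank and crowding distance.

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective
    # and strictly better in at least one (minimization).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep only the non-dominated objective vectors.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

In the proposed method, the points are pairs (f_1, f_2) of model complexity and validation error, and the final model is picked from this front.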

Structure of an individual
The structure of an individual for the multi-objective GA algorithm is presented in Fig. 5.

Determining objective functions

Feature selection
Feature selection is a process by which a subset of features is selected to build the model. The main rationale for using the feature selection technique is that the data can contain certain features that are redundant or irrelevant, and therefore, can be deleted without losing a significant amount of information. The use of feature selection can improve prediction performance, reduce training time, and avoid the curse of dimensionality. In the proposed approach, feature selection is used to select predictors (columns) from the design matrix. This matrix can contain multiple columns, which can lead to complex models with consequent performance degradation. In the previous work of the author [92], the following feature selection methods are considered [83]: F-test, ReliefF, NCA (neighborhood component analysis), and Lasso (least absolute shrinkage and selection operator). Since a feature selection method is called inside the objective function, it should be fast. Based on the work [92], in which the calculation time of the feature selection algorithms mentioned above is analyzed, it can be concluded that the F-test method turned out to be the fastest. For this reason, it is selected for use in the experiments presented in Sect. 7.
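A rough sketch of F-test-based ranking is given below: each predictor is scored by the F-statistic of a univariate regression of the response on that predictor, and the top-scoring columns are kept. This is a simplified analogue (the binned F-test used in the cited toolbox is omitted), and the names `f_scores` and `select_features` are hypothetical.

```python
import numpy as np

def f_scores(D, y):
    # F-statistic of a simple linear regression of y on each column of D:
    # F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation.
    n = len(y)
    Dc = D - D.mean(axis=0)
    yc = y - y.mean()
    r = (Dc * yc[:, None]).sum(axis=0) / np.sqrt((Dc ** 2).sum(axis=0) * (yc ** 2).sum())
    r2 = r ** 2
    return r2 / (1.0 - r2) * (n - 2)

def select_features(D, y, n_keep):
    # Indices of the n_keep highest-scoring predictors, in column order.
    return np.sort(np.argsort(f_scores(D, y))[::-1][:n_keep])
```

Because the score of each column is computed independently, the ranking is cheap enough to be called inside the objective function at every GA evaluation.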

Objective functions
In the proposed approach, we use multi-criteria optimization; thus, we have to define objective functions that are minimized during optimization. It is proposed to use two objective functions, one responsible for the complexity of the model (f_1) and the other responsible for its accuracy (f_2). The values of these functions are determined as follows:

f_1 = N,  f_2 = RMSE = sqrt((1/V) Σ_{k=1..V} (y_k − ŷ_k)²),

where f_1 is specified by the number of active features (N) in the design matrix, and f_2 is specified by the root of the mean squared validation error (RMSE). The number N denotes the number of predictors of the design matrix after applying a feature selection method. In the formula describing the function f_2, V is the number of observations in the validation set, y_k is the kth validation observation, and ŷ_k is the estimate for the kth validation observation. The value of this estimate is given by formula (15). In addition to the prediction of real values for regression models, we will also consider the prediction of classes for classification models. In this case, the pair of objective functions is as follows:

f_1 = N,  f_2 = n_mis / n_all,

where f_1 denotes, as for regression models, the number of selected features, f_2 is the confusion error, n_mis is the number of misclassified validation records, and n_all is the number of all validation records. Figure 6 shows the calculation scheme of the objective functions for the T2RFIS models. In the first step, an individual is decoded, giving the parameters of the type-2 fuzzy sets, that is, the centers c_j^k, lower widths σ_j^k, and distances d_j^k. Additionally, the parameter q_1 for a feature selection method and the parameter q_2 for the number of features are obtained. On the basis of the lower widths and distances, the upper widths are determined from the relationship σ̄_j^k = σ_j^k + d_j^k.
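For classification models, the accuracy objective reduces to the misclassification fraction; a minimal sketch (the name `confusion_error` is hypothetical):

```python
def confusion_error(y_true, y_pred):
    # f2 for classification: n_mis / n_all over the validation records.
    n_mis = sum(t != p for t, p in zip(y_true, y_pred))
    return n_mis / len(y_true)
```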

Calculation scheme
In the next two steps, the regression matrix X_t and the design matrix D_t of a given type are calculated for the training data. The design matrix is then truncated using a feature selection method, and the parameters of the system function b are obtained by ridge regression with the given value of the parameter λ. Before predicting the output, the regression matrix X_v and the design matrix D_v are determined for the validation data. On their basis, the system output ŷ is predicted, and after that, the objective functions f_1 and f_2 are determined.
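The evaluation of one candidate model can be condensed as below. This is a sketch under the assumption that the training and validation design matrices have already been built from the fuzzy basis functions and that the selected column indices `keep` come from a feature selection step; all names are hypothetical.

```python
import numpy as np

def evaluate_candidate(D_train, y_train, D_val, y_val, keep, lam):
    # Truncate both design matrices to the selected columns, fit the system
    # function by ridge regression on the training part, predict on the
    # validation part, and return the objective pair (f1, f2).
    Dt, Dv = D_train[:, keep], D_val[:, keep]
    b = np.linalg.solve(Dt.T @ Dt + lam * np.eye(len(keep)), Dt.T @ y_train)
    y_hat = Dv @ b
    f1 = len(keep)
    f2 = float(np.sqrt(np.mean((y_val - y_hat) ** 2)))
    return f1, f2
```

The GA then minimizes (f_1, f_2) jointly, trading model size against validation error.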

Experimental results
In this section, the following four experiments utilizing the proposed method are presented:
-approximation of a one-variable function,
-prediction of fuel consumption,
-classification of breast tissue,
-prediction of concrete compressive strength.
In these experiments, the proposed method is compared with the well-known ANFIS model [32] and the RFIS model with type-1 fuzzy sets [92]. The ANFIS model is realized with constant rule consequents (ANFIS 'constant') and with linear ones (ANFIS 'linear'). Moreover, in the first experiment, the result of an approximation with a polynomial model is presented. The T1RFIS and T2RFIS models are applied with the design matrix of four types: 'linear', 'purequadratic', 'interactions', and 'quadratic' as presented in Sect. 3. The number of bins in the F-test method is bounded in the range from 1 to 20. The number of active predictors is limited from one to the number of all features. The MGA algorithm is used to train the T1RFIS and T2RFIS models. Training includes ten trials, for which the results are averaged. The final model is chosen as the model whose result is closest to the average value.

Experiment 1
In this experiment, we consider a one-variable function shown in Fig. 7. The task is to train the T2RFIS approximator of the given function assuming a validation error not greater than the threshold value RMSE_t = 0.075. The number of observation data is 121; they are divided into training and validation sets consisting of 60% and 40% randomly selected observations, respectively. For the input x of the ANFIS, T1RFIS, and T2RFIS models, the number of fuzzy sets is five. In training the T1RFIS and T2RFIS models, the centers are bounded by c_min = −1 and c_max = 1. The distances of the membership functions in the training of the T2RFIS models are bounded by d_min = 0 and d_max = 0.5. For the T2RFIS models, the widths of the membership functions are bounded by σ_min = 0.05 and σ_max = 0.8, and for the T1RFIS models by σ_min = 0.05 and σ_max + d_max = 1.3. The T1RFIS and T2RFIS models are generated with the regularization parameter λ = 1e−5 of the ridge regression. The population size for the MGA algorithm is 5n, where n is the number of variables in the individual. The number of iterations is 100, which gives 500n evaluations of the objective function. Figure 8 shows the validation error of a polynomial approximator depending on the polynomial degree (changing from 1 to 25). As we can see, the smallest validation error is 1.070 for a polynomial degree equal to 18 (Table 2). Thus, the polynomial approximator does not achieve the prescribed accuracy RMSE_t = 0.075 in any case. This approximator has the shortest training time of all the models considered.

ANFIS models
In training the ANFIS models, the number of epochs was 300. Of all the epochs, the one for which the validation error reaches its minimum is selected to obtain the best model. For the two considered cases, that is, ANFIS ('constant') and ANFIS ('linear'), the results are presented in Table 2. In the first case, the minimum RMSE is 0.5192 for the number of epochs equal to 983, while in the second case it is 0.1042 for the number of epochs equal to 1000. As we can see, the threshold value RMSE_t = 0.075 is not reached in either case.

T1RFIS models
The results for the T1RFIS models are presented in Table 2, where it is seen that only the model F-test + 'quadratic' achieves the specified accuracy. The best T1RFIS model, chosen from the Pareto front, is F-test + 'quadratic', for which the RMSE is 0.0685. For this model, 11 of 20 features are selected from the design matrix. The training times for the T1RFIS models are longer than the training times for the ANFIS models. The use of feature selection reduces the number of features in the range of 13% (1 − 13/15) to 45% (1 − 11/20).

T2RFIS models
The best T2RFIS model, chosen from the Pareto front shown in Fig. 9, is described by the membership functions for the input x (Fig. 10) and by the system function used to determine the output. As shown in Table 2, 14 of 20 features of the design matrix are applied in the final model. This model has the longest training time of more than 16 s. The approximation of the function by the obtained T2RFIS model and the ANFIS ('linear') model is shown in Fig. 11. It is seen in Table 2 that the training times of the T2RFIS models are longer than the training times of the T1RFIS models. The reduction in the number of features ranges from 20% (1 − 4/5) to 30% (1 − 7/10).

Experiment 2
In this experiment, the objective is to predict automobile fuel consumption in miles per gallon (MPG) [20, 81]. The data set consists of 392 pairs [x, y] of input-output observations. This set is divided into 196 pairs of training data (odd-indexed samples) and 196 pairs of validation data (even-indexed samples) (Fig. 12). The following six automobile attributes of various makes and models are used as the model inputs: the number of cylinders (x_1), displacement (x_2), horsepower (x_3), weight (x_4), acceleration (x_5), and model year (x_6). The seventh attribute, which is MPG, is used as the model output (y). We assume that the RMSE cannot exceed the maximum value RMSE_t = 2.6. The number of fuzzy sets for the ANFIS, T1RFIS, and T2RFIS models is three. The centers of the membership functions

ANFIS models
The ANFIS models were trained with the number of epochs equal to 200. The results for two cases, namely ANFIS ('constant') and ANFIS ('linear'), are presented in Table 3. In the first case, the minimum RMSE is 10.53 for the number of epochs equal to 48, and in the second case, the minimum RMSE is 88.39 for the number of epochs equal to 29. The threshold RMSE_t = 2.6 is not reached in either case. For both cases, the number of fuzzy rules is 729.

T1RFIS models
The results of the T1RFIS models are presented in Table 3. All T1RFIS models achieve the specified accuracy. The smallest RMSE, equal to 2.517, is obtained for the model F-test + 'interactions'. The training times for the T1RFIS models are shorter than the training times for the ANFIS models. The use of the feature selection method reduces the number of features in the range of 19% (1 − 29/36) to 91% (1 − 15/171).

T2RFIS models
Table 3 presents the results for the T2RFIS models. Similarly to the T1RFIS models, all models achieve the specified accuracy. Among them, the best is the model F-test + 'interactions' with RMSE = 2.462, which is the best result for all considered models (the best RMSE result is marked in bold in Table 3). This model is selected from the Pareto front depicted in Fig. 13. As we can see in Table 3, 43 of 171 features of the design matrix are applied in the final model. The training times of the T2RFIS models are shorter than the training times of the ANFIS models, but longer than those for the T1RFIS models. Figures 14, 15, and 16 show the fuzzy sets for the inputs x_1-x_6. Figure 17 shows the real value of MPG, the predicted value, and the error for the obtained model. The reduction in the number of features ranges from 17% (1 − 15/18) to 75% (1 − 43/171).

Experiment 3
This experiment deals with breast disease class prediction based on electrical impedance measurements of freshly excised tissue samples [20]. The data set contains 106 observations in the form of input-output pairs [x, y]. The observations are randomly divided into two sets, a training set containing 74 records and a validation set containing 32 records. The elements of the input vector x are the following nine variables: impedance (ohm) at zero frequency (x_1), phase angle at 500 kHz (x_2), high-frequency slope of the phase angle (x_3), impedance distance between spectral ends (x_4), area under the spectrum (x_5), area normalized by the impedance distance (x_6), maximum of the spectrum (x_7), distance between the impedance and the real part of the maximum frequency point (x_8), and length of the spectral curve (x_9). The output variable y represents six classes: carcinoma, fibro-adenoma, mastopathy, glandular, connective, and adipose. The number of fuzzy sets for the ANFIS, T1RFIS, and T2RFIS models is two. The population size for the MGA algorithm is n, where n is the number of variables in the individual. The number of iterations is 50, which gives 50n objective function evaluations. Because the considered models generate a real value y, this value must be converted to a class number. This paper uses the solution presented in [92]. First, the value of y is limited to the range [1, 6] and then rounded to an integer from the set {1, 2, ..., 6}. In this way, we get the predicted class for our problem.
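The clip-and-round conversion from the real model output to a class label can be written as a one-liner (the name `to_class` is hypothetical; note that NumPy rounds exact halves to the nearest even integer):

```python
import numpy as np

def to_class(y, lo=1, hi=6):
    # Limit the real output to [lo, hi], then round to the nearest integer class.
    return int(np.rint(np.clip(y, lo, hi)))
```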

ANFIS models
The ANFIS models were trained with the number of epochs equal to 100. The results for the ANFIS with constant consequents ('constant') and with linear consequents ('linear') are presented in Table 4. In the first case, the classification accuracy (ACC) is 53.13% for the number of epochs equal to 21, and in the second case, the ACC is 43.75% for the number of epochs equal to 95. For both cases, the number of fuzzy rules is 512.

T1RFIS models
The class predictions for the T1RFIS models are presented in Table 4. It can be seen that the best accuracy of ACC = 78.13% is achieved by two models, namely F-test + 'interactions' and F-test + 'quadratic'. This result is 25% better than for the ANFIS model with constant consequents and about 34% better than for the same model with linear consequents. As in Experiment 2, the training times for the T1RFIS models are much shorter than the training times for the ANFIS models. The use of the feature selection method made it possible to reduce the number of features in the range of 28% (1 − 13/18) to 97% (1 − 6/189).

T2RFIS models
Table 4 presents the classification results for the T2RFIS models. Among them, the best result, equal to 81.25%, is achieved by the model F-test + 'quadratic' (this is the best result for all considered models). This model is chosen from the Pareto front shown in Fig. 18. As we can see in Table 4, 29 of 189 features of the design matrix are applied in the final model. The training times of the T2RFIS models are shorter than those of the ANFIS models, but longer than those of the T1RFIS models. The reduction in the number of features for the considered models ranges from 28% (1 − 13/18) to 85% (1 − 29/189).

Experiment 4
This experiment concerns the prediction of concrete compressive strength based on eight components used for concrete preparation [20]. The data set consists of 1030 records in the form of input-output pairs [x, y] representing the relationship between the output variable and the input variables. The data set is randomly divided into two sets, a training set containing 721 records and a validation set containing 309 records. The elements of the input vector x are the eight concrete components. The regularization parameter for ridge regression is λ = 1e−03. As in previous experiments, the population size for the MGA algorithm is n, where n is the number of variables in the individual. The number of iterations is 50, which gives 50n objective function evaluations.

ANFIS models
The results for two ANFIS models, namely ANFIS ('constant') and ANFIS ('linear'), are presented in Table 5. These models were trained with a number of epochs equal to 100. For the first model, the minimum RMSE is 8.909 and for the second model, the minimum RMSE is 171.7. For both models, the number of fuzzy inference rules is 256. The table shows a very long training time of 12,580 s for the model with linear consequents.

T1RFIS models
The concrete strength predictions for the T1RFIS models are presented in Table 5. As we can see, the best accuracy of RMSE = 6.814 is achieved by the model F-test + 'quadratic'. This result is better by 2.095 than for the ANFIS model with constant consequents and by 164.9 than for the ANFIS model with linear consequents. As in Experiments 2 and 3, the training times for the T1RFIS models are much shorter than the training times for the ANFIS models. The use of the feature selection method made it possible to reduce the number of features in the range of 47% (1 − 17/32) to 72% (1 − 43/152).
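The reduction percentages quoted here and in the other experiments all follow the same formula, 1 − (selected features)/(total features). A small sketch (the function name is ours, for illustration only):

```python
def feature_reduction(selected, total):
    """Fractional reduction in design-matrix features: 1 - selected/total."""
    return 1.0 - selected / total

# The two bounds reported for the T1RFIS models in this experiment
print(round(100 * feature_reduction(17, 32)))   # → 47 (%)
print(round(100 * feature_reduction(43, 152)))  # → 72 (%)
```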

T2RFIS models
In Table 5, the concrete strength predictions are presented for the T2RFIS models. Among these models, the best result, equal to 6.703, is achieved by the model F-test + 'quadratic' (this is the best result for all considered models). This model is chosen from the Pareto front presented in Fig. 19. As we can see in Table 5, 53 of the 152 predictors of the design matrix are applied in this model. The training times of the T2RFIS models are shorter than those of the ANFIS models, but longer than those of the T1RFIS models. The reduction in the number of features ranges from 31% (1 − 22/32) to 65% (1 − 53/152).

Remarks on computational complexity
An important issue is the complexity of the model; here we can observe that for the T2RFIS models it is the same as for the T1RFIS models. This is because the number of predictors in the regression matrices is in both cases n × pm [92], so the number of predictors in the design matrices is the same (Table 6). It can be seen that for a given number of fuzzy sets, the complexity is linearly related to the number of inputs for models of type 'linear' and 'purequadratic', and quadratically for the models 'interactions' and 'quadratic'. In comparison, the complexity of the ANFIS models, expressed in terms of the number of fuzzy rules, depends exponentially on the number of inputs (Table 6), so in their case the curse of dimensionality arises.
The complexity of the model affects the training time, which can be clearly seen in the results for the experiments presented above. With the exception of the first experiment, where we deal with the approximation of a function of one variable, in the remaining experiments the training times for the ANFIS model are much longer than for the T1RFIS and T2RFIS models. In Experiments 2, 3 and 4 for the ANFIS model, we have 729 (3^6), 512 (2^9) and 256 (2^8) inference rules, respectively. A large number of rules causes the training times of this model to be long. For example, in Experiment 4, the training times for the ANFIS models are 358.9 s and 12,580 s, while for the T1RFIS and T2RFIS models the time ranges from 6.81 s to 27.65 s. Moreover, it can be seen that in this example, as well as in the others, the training time for the ANFIS models with linear consequents is much longer than for the models with constant consequents. This analysis shows the low efficiency of the ANFIS model for more complex (multidimensional) problems, which is not visible in the T1RFIS and T2RFIS models.
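The contrast between polynomial and exponential growth can be reproduced numerically. The sketch below assumes a base feature count of m = (number of inputs) × (fuzzy sets per input) and, for the 'quadratic' model, counts linear terms, pure-quadratic terms, and pairwise interactions (2m + C(m, 2), without an intercept); these assumptions match the predictor counts reported in Experiments 3 and 4 but are our reconstruction, not the formula from [92].

```python
from math import comb

def design_predictors(n_inputs, n_sets, model="quadratic"):
    """Predictor count in a T1RFIS/T2RFIS design matrix, assuming
    m = n_inputs * n_sets base features (polynomial growth in n_inputs)."""
    m = n_inputs * n_sets
    if model == "linear":
        return m
    if model == "quadratic":  # linear + pure-quadratic + pairwise interactions
        return 2 * m + comb(m, 2)
    raise ValueError(model)

def anfis_rules(n_inputs, n_sets):
    """ANFIS rule count: exponential growth, n_sets ** n_inputs."""
    return n_sets ** n_inputs

# Experiment 3 (9 inputs, 2 sets) and Experiment 4 (8 inputs, 2 sets)
print(design_predictors(9, 2), anfis_rules(9, 2))  # → 189 512
print(design_predictors(8, 2), anfis_rules(8, 2))  # → 152 256
```

Even at 8–9 inputs, the rule count already dominates the predictor count; the gap widens rapidly as inputs are added, which is the curse of dimensionality discussed above.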

Conclusions
A novel multi-variable fuzzy inference system with type-2 fuzzy sets has been proposed. This system does not have explicitly defined fuzzy inference rules. It consists of Gaussian fuzzy sets of type-2 defined for the inputs and linearly parameterized system functions for determining the output. The system training is performed using observation data in the form of input/output pairs. The fuzzy sets are determined by a multi-objective genetic algorithm that uses a feature selection method, and the system function parameters are obtained by ridge regression. Calculating the output is simple and fast; it requires only the multiplication of the design matrix and the vector of parameters (ŷ = Db).
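The two numerical steps named here, ridge regression for the parameters and a single matrix-vector product for the output, can be sketched on toy data. The design matrix below is random noise standing in for the fuzzified feature matrix D; it is an illustration of the linear algebra only, not of the fuzzification itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix D (rows: observations, columns: expanded features)
D = rng.standard_normal((50, 8))
b_true = rng.standard_normal(8)
y = D @ b_true + 0.01 * rng.standard_normal(50)

# Ridge regression for the system-function parameters:
# b = (D^T D + lam * I)^(-1) D^T y, with lam as in Experiment 4
lam = 1e-3
b = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)

# Output computation is a single matrix-vector product: y_hat = D b
y_hat = D @ b
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(rmse < 0.1)  # → True (fit recovers the toy parameters)
```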
The experiments carried out confirmed the usefulness of the proposed method. On the basis of these experiments, it can be seen that the proposed method can improve the results obtained by the ANFIS and T1RFIS models. Future work will be devoted to the use of the proposed method for multidimensional data, applications of other algorithms for training fuzzy sets, and the development of this method for use with nonlinearly parameterized system functions.
Funding No funding was received for the conduct of this study.
Data availability Experiments were carried out on publicly available data sets.

Conflict of interest Not applicable.
Consent to participate Not applicable.

Consent for publication Not applicable.
Ethical approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.