# Neuro-fuzzy system with weighted attributes


## Abstract

The paper presents the neuro-fuzzy system with weighted attributes. Its crucial part is the fuzzy rule base composed of fuzzy rules (implications). In each rule the attributes have their own weights. In our system the weights of the attributes are numbers from the interval [0, 1] and they are not global: each fuzzy rule has its own attributes’ weights, thus it exists in its own weighted subspace. The theoretical description is accompanied by results of experiments on real life data sets. They show that the neuro-fuzzy system with weighted attributes can elaborate more precise results than the system that does not apply weights to attributes. Assigning weights to attributes can also discover knowledge about importance of attributes and their relations.

## Keywords

Weights of attributes · Importance of attributes · Weighted dimension space · Subspace clustering · Neuro-fuzzy system

## 1 Introduction

Neuro-fuzzy systems proved to be efficient in many fields of data mining. They combine the ability to handle imprecise data with the ability to modify the parameters of the elaborated models to better fit the data. The more complicated a model is, the more suitable it is to use a fuzzy approach (Zadeh et al. 1973). The fuzzy approach can provide better models, even for non-fuzzy data, than non-fuzzy systems.

The crucial part of a fuzzy system is the fuzzy rule base composed of fuzzy rules (implications). Creation of the fuzzy rule base is a difficult task. This procedure has enormous influence on the quality of results elaborated by the system. The rules can implement the knowledge of experts or can be created automatically from the presented data. The rules of the fuzzy model split the input domain into regions. This procedure can be reversed in order to obtain the rules from presented data: the domain is split into regions and the regions are transformed into premises of the rules. This approach is commonly used. There are three main ways of domain partition: grid split (Jang 1993; Setnes and Babuška 2001), scatter split (clustering) and hierarchical split (Hoffmann and Nelles 2001; Jakubek et al. 2006; Nelles and Isermann 1996; Nelles et al. 2000; Simiński 2008, 2009, 2010). The most common method is scatter split (clustering) (Abonyi et al. 2002; Bauman et al. 1990; Chen et al. 1998; Czogała et al. 2000; Wang et al. 1994). Clustering avoids the curse of dimensionality, which is the main problem of grid partition. The main disadvantage of many clustering algorithms is their inability to discover the number of clusters. In such cases the number of clusters is passed to the algorithm as a parameter.

In high-dimensional data sets not all dimensions (attributes) are always relevant. Some of them can be treated as noise and have minor importance. The reduction of dimensionality may be done for a whole data set (global dimensionality reduction) or individually for each cluster. Global feature transformation (e.g. PCA or SVD) causes problems with the interpretability of elaborated models. Dimension reduction without feature transformation can be achieved by feature selection. The global approach selects the same subset of attributes for all clusters, whereas each cluster may need its own subspace. This is the idea of subspace clustering (Friedman and Meulman 2004; Gan et al. 2006; Kriegel et al. 2009; Müller et al. 2009; Parsons et al. 2004; Sim et al. 2012), where each cluster may be extracted in its own subspace. There are two kinds of subspace clustering: bottom-up and top-down (Parsons et al. 2004). The former approach splits the clustering space with a grid and analyses the density of data examples in each grid cell, extracting the relevant dimensions [e.g. CLIQUE (Agrawal et al. 1998), ENCLUS (Cheng et al. 1999), MAFIA (Goil et al. 1999)]. The latter (top-down) approach starts with full dimensional clusters and tries to throw away the dimensions of minor importance [e.g. PROCLUS (Aggarwal et al. 1999), ORCLUS (Aggarwal et al. 2000), δ-Clusters (Yang et al. 2002), FSC (Gan and Wu 2008; Gan et al. 2006)]. In the algorithms mentioned above an attribute is either valid or invalid in a certain cluster: the weight of the attribute in each cluster is either 0 or 1. In our solution the clustering algorithm assigns values from the interval [0, 1]. The attributes have partial importance in the subspace. This approach creates fuzzy rules in individual weighted subspaces.

The contribution of the paper is the neuro-fuzzy system with weighted attributes.

Throughout the paper the following notation is used: uppercase italics (*A*) denote the cardinality of sets, uppercase bolds \((\mathbf{A})\) matrices, lowercase bolds \((\mathbf{a})\) vectors, and lowercase italics (*a*) scalars and set elements. Table 1 lists the symbols used in the paper.

Table 1 Symbols used in the paper

| Symbol | Meaning |
|---|---|
| \({\mathbb{X}}\) | Set of tuples (data examples), \({\mathbb{X} = \left\{\mathbf{x}_1, \ldots, \mathbf{x}_X\right\}}\) |
| \(\mathbf{x}\) | Vector of tuple’s descriptors (data example), \({\mathbf{x} \in \mathbb{X}}\) |
| \(X\) | Number of tuples, \({X = \|\mathbb{X}\|}\) |
| \(\left[\mathbf{x}, y\right]^\mathrm{T}\) | Data tuple with vector \(\mathbf{x}\) of attributes and decision attribute \(y\) |
| \(x_n\) | Descriptor of a tuple, \(\mathbf{x}=\left[x_1, \ldots, x_N\right]^{\mathrm{T}}\) |
| \(y\) | Decision attribute of the tuple |
| \(y_l\) | Localisation of the fuzzy set in the consequence of the \(l\)th rule |
| \(y_0\) | Defuzzified output of the system |
| \({\mathbb{N}}\) | Set of attributes |
| \(n\) | Attribute, \({n \in \mathbb{N}}\) |
| \(N\) | Number of attributes in a tuple, \({N = \|\mathbb{N}\|}\) |
| \({\mathbb{C}}\) | Set of clusters |
| \(C\) | Number of clusters, \({C = \|\mathbb{C}\|}\) |
| \(c\) | Cluster, \({c \in \mathbb{C}}\) |
| \(\mathbf{U}\) | Partition matrix, \(\mathbf{U} = \{u_{ij}\}\) |
| \(u_{ci}\) | Membership value of the \(i\)th tuple to the \(c\)th cluster |
|  | Distance between a data tuple and a cluster centre |
| \(\mathbf{v}_{i}\) | Cores of fuzzy sets in the premise of the \(i\)th rule |
| \(\mathbf{s}_{i}\) | Fuzziness of sets in the premise of the \(i\)th rule |
| \(\mathbf{z}_{i}\) | Weights of attributes in the \(i\)th rule |
|  | Fuzzification parameter |
| \({\mathbb{L}}\) | Set of rules (rule base) |
| \(l\) | Rule, \({l\in\mathbb{L}}\) |
| \(L\) | Number of rules, \({L = \|\mathbb{L}\|}\) |
| \({\mathfrak{a}, \mathfrak{b}}\) | Fuzzy linguistic terms for premise and consequence |
| \({\mathbb{A}}\) | Fuzzy set representing the premise of the fuzzy rule |
| \({\mathbb{B}}\) | Triangle fuzzy set in the consequence |
| \({\mathbb{B}^{\prime}}\) | Fuzzy set of the rule’s implication |
| \(\rightsquigarrow\) | Fuzzy implication |
| \(\star\) | T-norm |
| \(F\) | Firing strength of a rule |

Table 2 Number of tuples and attributes in the real life data sets

| Data set | Tuples (train) | Tuples (test) | Number of attributes |
|---|---|---|---|
| ‘Concrete’ | 515 | 515 | 8 |
| ‘Methane’ | 499 | 523 | 7 |
| ‘Death’ | 30 | 30 | 15 |
| ‘Breast cancer’ | 97 | 97 | 32 |
| ‘Ozone’ | 924 | 923 | 72 |

The paper is organised as follows: Sect. 2 introduces the new neuro-fuzzy system with parameterized consequences and weighted attributes (architecture—Sect. 2.1, creation of a fuzzy model—Sect. 2.2). Section 3 describes the data sets (Sect. 3.1) and experiments with results (Sect. 3.2). Finally Sect. 4 summarises the paper.

## 2 Fuzzy inference system with parameterized consequences and attributes’ weights

### 2.1 Architecture of the system

The fuzzy rule base is composed of rules *l* in form of fuzzy implications

$$ l: \; \mathbf{x}\ \mathrm{is}\ \mathfrak{a} \rightsquigarrow y\ \mathrm{is}\ \mathfrak{b}, $$(1)

where \(\mathbf{x}\) and *y* are linguistic variables, \({\mathfrak{a}}\) and \({\mathfrak{b}}\) are fuzzy linguistic terms (values). Data tuples are represented by vectors \(\left[\mathbf{x}, y\right]^\mathrm{T}\), where \(\mathbf{x}\) is a vector of descriptors and *y* is the decision attribute of the tuple. Both the descriptors and the decision are real numbers.

In the following text we describe the situation for one rule only; we omit the index of the rule in the formulae so as not to complicate the notation.

The premise of a rule is a fuzzy set \({\mathbb{A}}\) in the *N*-dimensional space. Each fuzzy rule has its own premise and consequence. For each dimension *n* the set \({\mathbb{A}_n}\) is described with the Gaussian membership function

$$ u_{\mathbb{A}_n} \left( x_n \right) = \exp \left( - \frac{\left( x_n - v_n \right)^2}{2 s_n^2} \right), $$(2)

where *v*_{n} is the core location for the *n*th attribute and *s*_{n} is this attribute’s Gaussian bell deviation (fuzziness).
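A minimal numeric sketch of the Gaussian membership function above (the function name and sample values are illustrative, not part of the original system):

```python
import numpy as np

def gaussian_membership(x, v, s):
    """Membership of descriptor x in the Gaussian fuzzy set with
    core location v and fuzziness s (cf. Eq. 2)."""
    return np.exp(-((x - v) ** 2) / (2.0 * s ** 2))

# At the core the membership is 1 and it decays with distance from v.
print(gaussian_membership(0.0, 0.0, 1.0))  # 1.0
print(gaussian_membership(1.0, 0.0, 1.0))  # exp(-0.5) ≈ 0.6065
```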

Because each attribute *n* has its own weight *z*_{n}, we use the weighted T-norm (Rutkowski and Cpałka 2003) to determine the membership of the data example to the fuzzy set \({\mathbb{A}}\) in the rule’s premise:

$$ u_{\mathbb{A}} \left( \mathbf{x} \right) = \mathop{\star}\limits_{n=1}^{N} \left\{ 1 - z_n \left[ 1 - u_{\mathbb{A}_n} \left( x_n \right) \right] \right\}, $$(4)

where \({u_{\mathbb{A}_n}}\) denotes the membership of the *n*th descriptor to the fuzzy set \({\mathbb{A}_n}\) in the premise for the *n*th attribute of a certain rule (the index of which we omit here), as in formulae 2, 3, 4, and \({u_{l \mathbb{A}}}\) stands for the membership of the whole data tuple to the premise of the *l*th rule: it is the *l*th rule’s firing strength (from now on we use the rule’s index *l*). The firing strength *F* of the *l*th rule for data vector (tuple) \(\mathbf{x}\) is

$$ F_l \left( \mathbf{x} \right) = u_{l \mathbb{A}} \left( \mathbf{x} \right). $$(5)

With the product T-norm and the Gaussian sets (2) this gives

$$ F_l \left( \mathbf{x} \right) = \prod_{n=1}^{N} \left\{ 1 - z_{ln} \left[ 1 - \exp \left( - \frac{\left( x_n - v_{ln} \right)^2}{2 s_{ln}^2} \right) \right] \right\}. $$(6)
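A sketch of the weighted product T-norm, assuming Gaussian per-attribute memberships; identifiers are illustrative. Note that a zero weight makes an attribute neutral (its factor becomes 1), while a weight of one uses the full membership:

```python
import numpy as np

def firing_strength(x, v, s, z):
    """Firing strength of one rule: weighted product t-norm over
    the per-attribute Gaussian memberships."""
    u = np.exp(-((x - v) ** 2) / (2.0 * s ** 2))  # memberships u_n
    return np.prod(1.0 - z * (1.0 - u))           # weighted t-norm

x = np.array([0.5, 2.0])
v = np.array([0.0, 0.0])
s = np.array([1.0, 1.0])
# The second attribute is switched off by a zero weight, so only
# the first attribute influences the firing strength.
print(firing_strength(x, v, s, np.array([1.0, 0.0])))
```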

The *l*th rule’s consequence is represented by an isosceles triangle fuzzy set \({\mathbb{B}_l}\) with the base width *w*_{l}; the altitude of the triangle equals 1. The localisation *y*_{l} of the core of the triangle fuzzy set is determined by a linear combination of input attribute values with the attribute weights taken into account:

$$ y_l \left( \mathbf{x} \right) = \sum_{n=0}^{N} p_{ln} z_{ln} x_n, $$(7)

where *z*_{l0} = 1 and *x*_{0} = 1.
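The weighted linear combination for the consequence core can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def consequence_core(x, p, z):
    """Localisation y_l of the core of the triangle consequence set:
    a linear combination of inputs with attribute weights taken into
    account; the bias term uses z_0 = 1 and x_0 = 1 (cf. Eq. 7)."""
    xe = np.concatenate(([1.0], x))  # x_0 = 1
    ze = np.concatenate(([1.0], z))  # z_0 = 1
    return float(np.dot(p, ze * xe))

p = np.array([0.5, 2.0, -1.0])  # p_0 (bias), p_1, p_2
z = np.array([1.0, 0.0])        # second attribute switched off
print(consequence_core(np.array([3.0, 10.0]), p, z))  # 0.5 + 2*3 = 6.5
```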

The answer of the *l*th rule is the fuzzy value of the fuzzy implication

$$ \mathbb{B}^{\prime}_l = \mathbb{A}_l \rightsquigarrow \mathbb{B}_l. $$(8)

The answers of all *L* rules are then aggregated into one fuzzy answer of the system. In order to get the crisp output *y*_{0} the fuzzy set \({\mathbb{B}^{\prime}}\) is defuzzified with the MICOG method (Czogała et al. 2000). This approach removes the non-informative parts of the aggregated fuzzy sets and takes into account only the informative parts (cf. description of Fig. 1). The aggregation and defuzzification may be quite expensive, but it has been proved (Czogała et al. 2000) that the defuzzified system output can be expressed as

$$ y_0 = \frac{\sum_{l=1}^{L} g_l \left( F_l, w_l \right) y_l}{\sum_{l=1}^{L} g_l \left( F_l, w_l \right)}, $$(12)

where function *g* depends on the fuzzy implication; in the system the Reichenbach implication is used, so for the *l*th rule the function *g* is

$$ g_l = \frac{w_l F_l}{2}. $$(13)

The *g* functions for various implications can be found in the original work introducing the ANNBFIS system (Czogała et al. 2000). Some inaccuracies are discussed in Nowicki (2006) and Łęski (2008).
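A sketch of the crisp output computation, assuming g = wF/2 for the Reichenbach implication (an assumption consistent with the MICOG description; names are illustrative):

```python
import numpy as np

def micog_output(F, w, y):
    """Crisp system output (cf. Eq. 12): a weighted average of the
    consequence core locations y_l, with g_l = w_l * F_l / 2 assumed
    for the Reichenbach implication."""
    g = 0.5 * w * F
    return float(np.sum(g * y) / np.sum(g))

F = np.array([0.8, 0.2])   # firing strengths of two rules
w = np.array([1.0, 1.0])   # triangle base widths
y = np.array([0.0, 10.0])  # consequence core locations
print(micog_output(F, w, y))  # 2.0
```

Because the same factor 1/2 appears in the numerator and denominator, it cancels; the output is pulled toward the core of the more strongly fired rule.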

### 2.2 Creation of the fuzzy model

Creation of the fuzzy model (fuzzy rule base) is done in three steps: partition of the input domain (Sect. 2.2.1), extraction of rules’ premises (Sect. 2.2.2) and tuning of the rules (Sect. 2.2.3), which is also responsible for the creation of the rules’ consequences.

#### 2.2.1 Partition of the input domain

For domain partition we use modification (Simiński 2012) of the FCM clustering algorithm (Dunn 1973) where the weights are the values from the interval [0, 1]. Thus each cluster is fuzzy in two ways:

- 1.
Data tuples have fuzzy membership to clusters. The sum of membership of one data tuple to all clusters is 1 (cf. Eq. 16). This is common in the fuzzy clustering paradigm.

- 2.
The cluster itself has fuzzy possession of attributes. This means that the cluster spreads in a fuzzy way upon the dimensions. The sum of the dimension weights of one cluster is 1 (cf. Eq. 17).

The domain is partitioned by minimisation of the criterion function

$$ J = \sum_{c=1}^{C} \sum_{i=1}^{X} u_{ci}^{m} \sum_{n=1}^{N} z_{cn}^{f} \left( x_{in} - v_{cn} \right)^2, $$(14)

where *m* and *f* ≠ 1 (the case of *f* = 1 is discussed with Eq. 20) are parameters, *u*_{ci} stands for the membership of the *i*th data example \(\left(\mathbf{x}_i\right)\) to the *c*th cluster, *z*_{cn} stands for the weight of the *n*th attribute (descriptor) in the *c*th cluster, *x*_{in} is the *n*th descriptor of the *i*th data tuple, and *v*_{cn} is the *n*th attribute of the centre of the *c*th cluster.

The centre of the *c*th cluster is defined as

$$ v_{cn} = \frac{\sum_{i=1}^{X} u_{ci}^{m} x_{in}}{\sum_{i=1}^{X} u_{ci}^{m}}. $$(15)

The clustering is executed with two constraints:

- 1. The sum of membership values to all clusters for each data tuple is one:
$$ \forall {i \in [1,X]} : \; \sum^C_{c = 1} u_{c i} = 1. $$(16)
- 2. The sum of dimension weights *z* for all *N* dimensions in each cluster *c* equals one:
$$ \forall {c \in [1, C]} : \; \sum^N_{n = 1} z_{cn} = 1. $$(17)

The data are clustered by alternating application of formulae 15, 18 and 19.
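The alternating optimisation can be sketched as below. The centre update follows formula 15; since formulae 18 and 19 are not reproduced in the text, the membership and weight updates used here are the standard Lagrangian solutions of objective (14) under constraints (16) and (17) and should be read as an assumption:

```python
import numpy as np

def weighted_fcm(X, C, m=2.0, f=2.0, iters=50, seed=0):
    """Sketch of FCM with attribute weights (assumed update formulae).
    X: (tuples, attributes) data matrix; C: number of clusters."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    u = rng.random((C, T)); u /= u.sum(axis=0)  # constraint (16)
    z = np.full((C, N), 1.0 / N)                # constraint (17)
    for _ in range(iters):
        um = u ** m
        v = um @ X / um.sum(axis=1, keepdims=True)            # centres (15)
        d2 = (X[None, :, :] - v[:, None, :]) ** 2             # squared diffs
        dist = (z[:, None, :] ** f * d2).sum(axis=2) + 1e-12  # weighted dist
        u = dist ** (-1.0 / (m - 1.0)); u /= u.sum(axis=0)
        e = (um[:, :, None] * d2).sum(axis=1) + 1e-12         # per-attribute error
        z = e ** (-1.0 / (f - 1.0)); z /= z.sum(axis=1, keepdims=True)
    return u, z, v
```

Attributes along which a cluster is compact accumulate a small error `e` and therefore receive a large weight, which is the mechanism that carves out the cluster's own weighted subspace.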

A special situation occurs for *f* = 1. The objective function (14) becomes

$$ J = \sum_{c=1}^{C} \sum_{n=1}^{N} z_{cn} \sum_{i=1}^{X} u_{ci}^{m} \left( x_{in} - v_{cn} \right)^2, $$(20)

which is linear in the weights. Its minimisation assigns the whole weight to the attribute *n* of the *l*th rule for which the inner sum is the lowest: this attribute gets *z*_{ln} = 1 and the other attributes of this rule get zero weights (because of the constraint expressed by formula 17).

#### 2.2.2 Extraction of rules

The clustering procedure elaborates memberships and weights gathered in matrices \(\mathbf{U} = \{u_{ij}\}\) and \(\mathbf{Z} = \{z_{ij}\}\) respectively which are then converted into premises’ parameters *v*, *s* and *z*. The number of rules is equal to the number of clusters: *L* = *C*.

The cores *v* of the rules’ premises are calculated with formula 15. The fuzzification parameter *s* is calculated with the formula (Czogała et al. 2000)

$$ s_{cn} = \sqrt{ \frac{\sum_{i=1}^{X} u_{ci}^{m} \left( x_{in} - v_{cn} \right)^2}{\sum_{i=1}^{X} u_{ci}^{m}} }, $$

and the weights *z* of the premises are taken from the matrix \(\mathbf{Z}\). If all *N* attributes have the same weights, their weight is *z* = 1/*N* (cf. Eq. 17), and if the firing strengths of all attributes are the same and equal *F*_{n}, the firing strength of the whole rule is (cf. Eq. 6)

$$ F = \left( 1 - \frac{1 - F_n}{N} \right)^N. $$
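The equal-weights case above can be checked numerically; the helper name is illustrative:

```python
# With equal weights z = 1/N and equal per-attribute memberships F_n,
# the weighted product t-norm reduces to F = (1 - (1 - F_n)/N)**N.
def rule_strength_equal_weights(F_n, N):
    return (1.0 - (1.0 - F_n) / N) ** N

# As N grows this expression tends to exp(-(1 - F_n)), so a rule can
# still fire appreciably even when every single membership is moderate.
print(rule_strength_equal_weights(0.5, 4))
```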

#### 2.2.3 Tuning of rule parameters

The parameters of the premises (*v* and *s* in Eq. 2, *z* in Eq. 4) and the values of the supports *w* of the sets in consequences are tuned with the gradient method. The linear coefficients *p* (Eq. 7) for the calculation of the localisation of the consequence sets are calculated with the pseudoinverse matrix. For tuning the parameters of the model the square error is used:

$$ E = \frac{1}{2} \left( y - y_0 \right)^2, $$

where *y* is the original value and *y*_{0} is the value elaborated by the system (cf. Eq. 12). For a parameter *q* in the *j*th rule the differential has the following form:

$$ \frac{\partial E}{\partial q_j} = \left( y_0 - y \right) \frac{\partial y_0}{\partial q_j}. $$(26)

This formula is used for the *v*, *s* and *z* parameters; for the width *w* of the isosceles triangle in the rule’s consequence an analogous differential of Eq. 12 with respect to *w*_{j} is used. Substituting for *q*_{j} the parameter *v*_{jm} (the core of the *m*th attribute in the *j*th rule), *s*_{jm} (the fuzzification of the *m*th attribute in the *j*th rule), or *z*_{jm} (the weight of the *m*th attribute in the *j*th rule) in Eq. 26, we get the respective tuning formulae (cf. Eq. 6).
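A generic sketch of one gradient-descent step on the square error; the analytic differentials of Eq. 26 are replaced by a numerical approximation for brevity, and `model` is a hypothetical stand-in for the system output of Eq. 12:

```python
import numpy as np

def gradient_step(params, model, x, y, eta=0.01, eps=1e-6):
    """One gradient-descent step on E = (y - y0)^2 / 2, where
    y0 = model(params, x). Central differences approximate dE/dq."""
    grad = np.zeros_like(params)
    for k in range(params.size):
        p_hi, p_lo = params.copy(), params.copy()
        p_hi[k] += eps
        p_lo[k] -= eps
        e_hi = 0.5 * (y - model(p_hi, x)) ** 2
        e_lo = 0.5 * (y - model(p_lo, x)) ** 2
        grad[k] = (e_hi - e_lo) / (2.0 * eps)
    return params - eta * grad

# Toy usage: a one-parameter linear model moving toward target y = 4.
model = lambda p, x: p[0] * x
print(gradient_step(np.array([1.0]), model, 2.0, 4.0))  # -> ~[1.04]
```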

For *f* = 0 (which switches off the attributes’ weights) the proposed system is identical with the ANNBFIS system described by Czogała et al. (2000).

## 3 Experiments

The experiments were conducted on real-life data sets depicting methane concentration, death rate, breast cancer recurrence time, concrete compressive strength and ozone concentration. All real life data sets are normalised (to mean 0 and standard deviation 1). Some parameters of data sets are gathered in Table 2.

### 3.1 Data set description

The ‘Methane’ data set contains the real life measurements of air parameters in a coal mine in Upper Silesia (Poland). The parameters (measured in 10 s intervals) are: AN31—the flow of air in the shaft, AN32—the flow of air in the adjacent shaft, MM32—concentration of methane (CH_{4}), production of coal, the day of week. The 10-min sums of measurements of AN31, AN32, MM32 are added to the tuples as dynamic attributes (Sikora et al. 2005). The task is to predict the concentration of the methane in 10 min. The data is divided into a train set (499 tuples) and test set (523 tuples).

The ‘Death’ data represent the tuples containing information on various factors; the task is to estimate the death rate (Späth 1992). The first attribute (the index) is excluded from the data set. The precise description of the attributes is available with the data set and their names are listed in Table 7, so the description is omitted here. The data can be downloaded from a public repository.

The ‘Breast cancer’ data set represents the data for the breast cancer case (Asuncion and Newman 2007). Each data tuple contains 32 continuous attributes and one predictive attribute (the time to recur). Here again we omit the description of attributes; their names are listed in Table 6. The symbol ‘se’ in the attribute’s name stands for ‘standard error’ and the adjective ‘worst’ means the ‘largest’. The data can be downloaded from a public repository (Frank et al. 2010).

The ‘Concrete’ set is a real life data set describing the parameters of a concrete sample and its strength (Yeh 1998). The attributes are: cement ratio, amount of blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and age; the decision attribute is the concrete compressive strength. The original data set can be downloaded from a public repository (Frank et al. 2010).

The ‘Ozone’ set, a real life data set, describes the level of ozone in the air (Zhang and Fan 2008). The data set includes 2536 tuples with 73 attributes. The original data set can be downloaded from a public repository (Frank et al. 2010). The data set has 687 tuples with missing values; these were deleted from the data set and 1847 full tuples were left. The first attribute (date) was deleted from the tuples. The tuples were numbered starting with 1. The tuples with odd numbers are used as the train set (924 tuples), the even numbered tuples constitute the test data set (923 tuples). All attributes are real numbers. The task is to predict the level of ozone (high 1 or low 0).

### 3.2 Results of experiments

The fuzzy models were created with train sets. The number of rules is always the same as the number of clusters and was assumed a priori as *L* = 5 (for the ‘Ozone’ data set *L* = 3). Finding the optimal number of clusters is a difficult task. Our aim here is not to discuss this problem, but to compare the precision of our system with an existing one; this is why we assume the number of rules a priori.

The experiments were conducted in two paradigms. In the first one—data approximation (DA)—the models are created and tested with the same train data sets. In the other—knowledge generalisation (KG)—the models are created with train data sets and tested with unseen tuples of test data sets.

The precision of the models is measured with the root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{X} \sum_{i=1}^{X} \left[ y \left( \mathbf{x}_i \right) - y_0 \left( \mathbf{x}_i \right) \right]^2 }, $$

where \(y \left( \mathbf{x}_i \right)\) is the expected value for the *i*th tuple, \(y_0 \left( \mathbf{x}_i \right)\) is the value elaborated by the system, and *X* is the number of tuples.
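The error measure is straightforward to compute (the function name is illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error used to measure model precision."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```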

Two main features were tested: (1) the precision of created models and (2) the weights assigned to dimensions (attributes).

### 3.3 Precision of models

Table 3 presents the root mean square error elaborated by our system for various values of the *f* parameter. For *f* = 0 the system elaborates the same results as the ANNBFIS system. The results gathered in Table 3 are also presented as graphs in Fig. 3a–d.

Table 3 Root mean square error elaborated by our system

| *f* | ‘Death’ DA | ‘Death’ KG | ‘Methane’ DA | ‘Methane’ KG | ‘Concrete’ DA | ‘Concrete’ KG | ‘Breast cancer’ DA | ‘Breast cancer’ KG |
|---|---|---|---|---|---|---|---|---|
| 0.0 | 1.4590 | 2.6367 | 0.4210 | 0.4021 | 1.0464 | 1.2609 | 2.4243 | 5.5836 |
| 0.3 | 1.7372 | 2.0286 | 0.3884 | 0.3204 | 0.4404 | 1.1930 | 1.3124 | 3.0173 |
| 0.5 | 1.0276 | 1.5364 | 0.3959 | 0.3051 | 0.4623 | 0.8925 | 2.0098 | 3.7444 |
| 0.8 | 0.2840 | 1.1046 | 0.3999 | 0.2993 | 0.4988 | 0.9203 | 0.7836 | 1.2458 |
| 1.2 | 0.5746 | 0.9959 | 0.4012 | 0.3225 | 0.8744 | 0.9804 | 0.9460 | 0.9996 |
| 1.3 | 0.6983 | 4.2052 | 0.3852 | 0.3147 | 0.6579 | 0.7481 | 0.7990 | 1.0337 |
| 1.4 | 0.5698 | 0.8276 | 0.4381 | 0.3633 | 0.5603 | 0.7725 | 0.6221 | 1.6346 |
| 1.5 | 0.2365 | 1.5162 | 0.3874 | 0.3169 | 0.5327 | 0.7539 | 0.5953 | 1.4051 |
| 1.6 | 0.4799 | 2.0888 | 0.3871 | 0.3528 | 0.7585 | 0.7949 | 0.5359 | 1.7692 |
| 1.7 | 0.0029 | 1.9332 | 0.3923 | 0.3169 | 0.5845 | 0.7240 | 0.6330 | 1.2517 |
| 1.8 | 0.0031 | 1.7067 | 0.3915 | 0.3081 | 0.6582 | 0.7191 | 0.5396 | 1.4417 |
| 1.9 | 0.0034 | 1.5482 | 0.3902 | 0.2963 | 0.6606 | 0.7125 | 0.7560 | 1.2005 |
| 2.0 | 0.0034 | 3.2544 | 0.4168 | 0.3220 | 0.6586 | 0.7289 | 0.7527 | 1.1810 |
| 2.5 | 0.0018 | 3.3457 | 0.5989 | 0.7574 | 0.6625 | 0.7575 | 0.6927 | 1.2029 |
| 3.0 | 0.0014 | 3.2089 | 0.3969 | 0.2998 | 0.9120 | 1.0742 | 0.6879 | 2.4426 |
| 3.5 | 0.0025 | 2.7339 | 0.3757 | 0.8383 | 0.9081 | 1.0679 | 0.4396 | 2.8398 |
| 4.0 | 0.0074 | 2.4294 | 0.3757 | 0.8211 | 0.9045 | 1.0623 | 0.3597 | 3.0749 |
| 4.5 | 0.0122 | 1.9510 | 0.3879 | 0.7762 | 0.6557 | 0.7181 | 0.5227 | 2.4006 |
| 5.0 | 0.0102 | 1.9008 | 0.3918 | 0.2997 | 0.6615 | 0.7530 | 0.5197 | 2.5246 |
| 10.0 | 0.0113 | 1.8498 | 0.4597 | 0.3769 | 0.6683 | 0.7097 | 0.7905 | 2.6919 |
| 20.0 | 0.0127 | 1.9301 | 0.4001 | 0.2978 | 0.6660 | 0.7550 | 0.9721 | 3.0642 |

The experiments reveal that for \(f \in [1.5, 2]\) the RMSE, elaborated by the systems for various data sets, achieves its most advantageous values both for DA and KG. For *f* > 2 the KG error starts to grow, whereas the DA error is kept on more or less the same level. The optimal interval of *f* parameter seems independent from the data sets.

The subsequent figures present the answers elaborated by the system for *f* = 2; the squares denote the expected values. Figures 8 and 9 present in a more detailed way the results for tuples 50–150 and 250–400 respectively. Similarly, Fig. 10 presents the details of Fig. 5 for tuples 400–500.

The figures show that applying attribute weights in the fuzzy rule base results in a more precise prediction. The better prediction can be observed in Fig. 8, where the expected values are better elaborated by our system (the black line) than by the original ANNBFIS system that does not use attribute weights in rules (the gray line).

#### 3.3.1 Weights of attributes

Table 4 Weights of attributes elaborated for the ‘Methane’ data set (cf. Fig. 11)

| Attribute | I | II | III | IV | V |
|---|---|---|---|---|---|
| AN31: flow of air in the shaft | 1.000 | 0.000 | 0.132 | 1.000 | 0.361 |
| AN32: flow of air in the adjacent shaft | 0.009 | 0.000 | 0.087 | 0.002 | 0.102 |
| MM32: concentration of methane | 0.004 | 0.000 | 0.065 | 0.002 | 0.058 |
| Production of coal | 0.028 | 1.000 | 1.000 | 0.011 | 0.885 |
| Sum of AN31 | 0.020 | 0.000 | 0.156 | 0.005 | 1.000 |
| Sum of AN32 | 0.009 | 0.000 | 0.085 | 0.004 | 0.291 |
| Sum of MM32 | 0.005 | 0.000 | 0.093 | 0.002 | 0.059 |

Table 5 Weights of attributes elaborated for the ‘Concrete’ data set (cf. Fig. 12)

| Attribute | I | II | III | IV | V |
|---|---|---|---|---|---|
| Cement ratio | 0.000 | 0.065 | 0.006 | 0.065 | 0.005 |
| Blast furnace slag | 0.000 | 0.050 | 0.019 | 0.535 | 1.000 |
| Fly ash | 1.000 | 0.008 | 0.121 | 0.332 | 0.007 |
| Water | 0.000 | 0.013 | 0.190 | 0.086 | 0.003 |
| Superplasticizer | 0.000 | 0.057 | 0.057 | 0.112 | 0.004 |
| Coarse aggregate | 0.000 | 0.028 | 0.006 | 0.090 | 0.003 |
| Fine aggregate | 0.000 | 1.000 | 1.000 | 0.063 | 0.002 |
| Age | 0.000 | 0.027 | 0.031 | 1.000 | 0.004 |

Table 6 Weights of attributes elaborated for the ‘Breast cancer’ data set (cf. Fig. 13)

| Attribute | I | II | III | IV | V |
|---|---|---|---|---|---|
| Lymph_node | 1.000 | 0.008 | 0.030 | 0.061 | 0.044 |
| Radius_mean | 0.001 | 0.009 | 0.277 | 0.208 | 0.953 |
| Texture_mean | 0.001 | 0.015 | 0.037 | 0.147 | 0.039 |
| Perimeter_mean | 0.001 | 0.009 | 0.293 | 0.194 | 1.000 |
| Area_mean | 0.001 | 0.008 | 0.470 | 0.163 | 0.910 |
| Smoothness_mean | 0.000 | 0.035 | 0.051 | 0.198 | 0.047 |
| Compactness_mean | 0.000 | 0.016 | 0.041 | 0.241 | 0.039 |
| Concavity_mean | 0.000 | 0.021 | 0.061 | 0.108 | 0.059 |
| Concave_points_mean | 0.001 | 0.012 | 0.090 | 0.160 | 0.054 |
| Symmetry_mean | 0.000 | 0.037 | 0.041 | 0.123 | 0.033 |
| Fractal_dimension_mean | 0.001 | 0.022 | 0.037 | 0.431 | 0.046 |
| Radius_se | 0.001 | 0.028 | 0.022 | 0.223 | 0.059 |
| Texture_se | 0.000 | 0.052 | 0.028 | 0.431 | 0.034 |
| Perimeter_se | 0.000 | 0.029 | 0.026 | 0.252 | 0.052 |
| Area_se | 0.000 | 0.019 | 0.064 | 0.300 | 0.076 |
| Smoothness_se | 0.001 | 0.094 | 0.076 | 0.543 | 0.025 |
| Compactness_se | 0.000 | 0.040 | 0.069 | 0.356 | 0.064 |
| Concavity_se | 0.000 | 0.049 | 0.052 | 0.190 | 0.048 |
| Concave_points_se | 0.001 | 0.085 | 0.035 | 0.123 | 0.032 |
| Symmetry_se | 0.000 | 1.000 | 0.051 | 0.175 | 0.066 |
| Fractal_dimension_se | 0.000 | 0.030 | 0.044 | 0.608 | 0.032 |
| Radius_worst | 0.001 | 0.009 | 0.544 | 0.196 | 0.379 |
| Texture_worst | 0.000 | 0.023 | 0.038 | 0.165 | 0.077 |
| Perimeter_worst | 0.001 | 0.010 | 0.336 | 0.204 | 0.197 |
| Area_worst | 0.001 | 0.009 | 1.000 | 0.156 | 0.332 |
| Smoothness_worst | 0.000 | 0.020 | 0.033 | 0.231 | 0.079 |
| Compactness_worst | 0.000 | 0.016 | 0.029 | 0.482 | 0.079 |
| Concavity_worst | 0.001 | 0.018 | 0.021 | 0.174 | 0.104 |
| Concave_points_worst | 0.001 | 0.016 | 0.036 | 0.145 | 0.052 |
| Symmetry_worst | 0.000 | 0.057 | 0.028 | 0.185 | 0.050 |
| Fractal_dimension_worst | 0.001 | 0.014 | 0.021 | 1.000 | 0.106 |
| Tumor_size | 0.003 | 0.016 | 0.023 | 0.051 | 0.044 |

Table 7 Weights of attributes elaborated for the ‘Death’ data set (cf. Fig. 14)

| Attribute | I | II | III | IV | V |
|---|---|---|---|---|---|
| Average annual precipitation | 0.002 | 0.000 | 0.002 | 0.090 | 0.001 |
| Average January temperature | 0.007 | 0.000 | 0.001 | 0.328 | 0.001 |
| Average July temperature | 0.003 | 0.000 | 0.001 | 0.022 | 0.001 |
| Size of the population older than 65 | 0.020 | 0.000 | 0.000 | 0.006 | 0.002 |
| Number of members per household | 0.009 | 0.000 | 0.001 | 0.002 | 0.002 |
| Years of schooling for persons over 22 | 0.008 | 1.000 | 0.001 | 0.189 | 0.001 |
| Households with fully equipped kitchens | 0.001 | 0.000 | 0.001 | 0.012 | 0.001 |
| Population per square mile | 0.002 | 0.055 | 0.001 | 0.005 | 0.001 |
| Size of the nonwhite population | 0.002 | 0.000 | 0.001 | 1.000 | 0.001 |
| Number of office workers | 0.066 | 0.013 | 0.001 | 0.054 | 0.001 |
| Families with an income <3000 | 0.002 | 0.001 | 0.001 | 0.011 | 0.001 |
| Hydrocarbon pollution index | 1.000 | 0.000 | 1.000 | 0.056 | 1.000 |
| Nitric oxide pollution index | 0.269 | 0.000 | 0.440 | 0.023 | 0.110 |
| Sulphur dioxide pollution index | 0.003 | 0.000 | 0.021 | 0.001 | 0.004 |
| Degree of atmospheric moisture | 0.015 | 0.000 | 0.002 | 0.103 | 0.000 |

The attributes’ weights for the ‘Methane’ data set (prediction of methane concentration in a coal mine shaft) gathered in Table 4 and presented in Fig. 11 show a very interesting fact: the actual concentration of methane (the third attribute) turned out to be of minor importance in all the rules, although the task was the 10-min prediction of the concentration of the methane in the shaft. The most important attributes are the flow of air in the mine shaft (the first attribute) and the production of coal (the fourth attribute). This can be explained by the fact that excavation of coal causes tensions and splits in the rock that may release the methane gas. In two rules the most important attribute is the first one, the flow of the air in the shaft in question. In the fifth rule an interesting phenomenon can be observed: the most important attribute is the fifth one, the 10-min sum of the first attribute (flow of the air), whereas the first attribute itself has a lower weight. A similar situation occurs in the case of the second attribute (flow of the air in the adjacent shaft), where the sum of air flow measurements in the adjacent shaft (the sixth attribute) is more important than the air flow itself (the second attribute).

The weights of attributes elaborated for the ‘Concrete’ data set are presented in Table 5 and Fig. 12. The most important attributes (all others have low weights) are: blast furnace slag (the second attribute), ratio of fly ash (the third attribute), fine aggregate (the seventh attribute) and age of concrete (the eighth attribute). In one rule the weights are more varied: the most important attribute is age, but concentration of blast furnace slag and fly ash have also quite high weights.

The weights of attributes elaborated for the ‘Breast cancer’ data set are presented in Table 6 and Fig. 13. In rule I the most important attribute is the first one (lymph nodes), which is in concordance with medical diagnostic procedures. In all the rules the importance of three attributes: radius mean (the second attribute), perimeter mean (the fourth attribute) and area mean (the fifth attribute) are correlated. In rule III there are two triples of attributes of higher importance. The triple of high importance comprises: area worst (the 25th attribute), perimeter worst (the 24th attribute) and radius worst (the 22nd attribute). This triple is accompanied by the triple of slightly lower importance: area mean (the fifth attribute), perimeter mean (the fourth attribute) and radius mean (the second attribute). In one rule the weights are more varied. The important attributes are fractal dimension worst (the 31st attribute), fractal dimension standard error (the 21st attribute), fractal dimension mean (the 11th attribute), smoothness standard error (the 16th attribute) and compactness worst (the 27th attribute).

The weights of attributes elaborated for the ‘Death’ data set are presented in Table 7 and Fig. 14. In rules I, III and V the most important attributes are the hydrocarbon pollution index (the 12th attribute) and the nitric oxide pollution index (the 13th attribute). It is interesting that the pollution index for sulphur dioxide has low weight in all rules. In the second rule the most important attribute describes the years of schooling of persons over 22 (the sixth attribute).

## 4 Summary

The paper describes the novel neuro-fuzzy system with weighted attributes. In this approach the attributes in a fuzzy rule have weights. The weights of attributes are numbers from the interval [0, 1]. The weights of the attributes are not assigned globally; each fuzzy rule has its own weights of attributes. Each rule exists in its own subspace. An attribute can be important in a certain rule, but unimportant in another. This approach is inspired by subspace clustering, but in our system an attribute can have partial weight, which is uncommon in subspace clustering, where an attribute has either full (1) or zero (0) weight in a subspace. Two main conclusions can be drawn.

- 1.
The experiments show that fuzzy models with weighted attributes can elaborate more precise results, both for data approximation and knowledge generalisation for real life data sets in comparison with a neuro-fuzzy system that does not assign weights to attributes.

- 2.
Assigning weights to attributes discovers knowledge on importance of attributes in a problem. Individual weights of attributes in each rule discover the relation between attributes. This may explain why the weights of the same attribute are low in one rule and high in another one.

The experiments show that assigned weights of the attributes are in concordance with experts’ knowledge on the physical or medical mechanisms described by the data sets.

## Acknowledgments

The author is grateful to the anonymous reviewers for their constructive comments that have helped to improve the paper.

## References

- Abonyi J, Babuška R, Szeifert F (2002) Modified Gath–Geva fuzzy clustering for identification of Takagi–Sugeno fuzzy models. IEEE Trans Syst Man Cybern B 32(5):612–621CrossRefGoogle Scholar
- Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. SIGMOD Rec 28(2):61–72. doi: 10.1145/304181.304188 Google Scholar
- Aggarwal CC, Yu PS (2000) Finding generalized projected clusters in high dimensional spaces. In: SIGMOD ’00: Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM, New York, pp 70–81. doi: 10.1145/342009.335383
- Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec 27(2):94–105. doi: 10.1145/276305.276314 Google Scholar
- Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Sciences, Irvine, CAGoogle Scholar
- Bauman E, Dorofeyuk A (1990) Fuzzy identification of nonlinear dynamical systems. In: Proceedings of the international conference on fuzzy logic and neural nets, pp 895–898Google Scholar
- Chen JQ, Xi YG, Zhang ZJ (1998) A clustering algorithm for fuzzy model identification. Fuzzy Sets Syst 98(3):319–329. doi: 10.1016/S0165-0114(96)00384-3 Google Scholar
- Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 84–93. doi: 10.1145/312129.312199
- Czogała E, Leski J (2000) Fuzzy and neuro-fuzzy intelligent systems. Series in fuzziness and soft computing. Physica-Verlag, HeidelbergGoogle Scholar
- Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters. J Cybern 3(3):32–57
- Frank A, Asuncion A (2010) UCI machine learning repository
- Friedman JH, Meulman JJ (2004) Clustering objects on subsets of attributes. J R Statist Soc B 66:815–849
- Gan G, Wu J (2008) A convergence theorem for the fuzzy subspace clustering (FSC) algorithm. Pattern Recogn 41(6):1939–1947. doi: 10.1016/j.patcog.2007.11.011
- Gan G, Wu J, Yang Z (2006) A fuzzy subspace algorithm for clustering high dimensional data. In: Advanced data mining and applications, second international conference, ADMA 2006, Xi’an, China, August 14–16, 2006, proceedings. Lecture notes in computer science, vol 4093. Springer, Berlin, pp 271–278
- Goil S, Nagesh H, Choudhary A (1999) MAFIA: efficient and scalable subspace clustering for very large data sets. Tech rep
- Hoffmann F, Nelles O (2001) Genetic programming for model selection of TSK-fuzzy systems. Inf Sci 136:7–28
- Jakubek S, Keuth N (2006) A local neuro-fuzzy network for high-dimensional models and optimization. In: Engineering applications of artificial intelligence, pp 705–717
- Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–684
- Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data (TKDD) 3(1):1–58. doi: 10.1145/1497577.1497578
- Łęski J (2008) Systemy neuronowo-rozmyte [Neuro-fuzzy systems]. Wydawnictwa Naukowo-Techniczne, Warszawa. ISBN 978-83-204-3229-9
- Łęski J, Czogała E (1999) A new artificial neural network based fuzzy inference system with moving consequents in if-then rules and selected applications. Fuzzy Sets Syst 108(3):289–297. doi: 10.1016/S0165-0114(97)00314-X
- Mamdani EH, Assilian S (1975) An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man-Mach Stud 7(1):1–13
- Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. Proc VLDB Endow 2(1):1270–1281
- Nelles O, Fink A, Babuška R, Setnes M (2000) Comparison of two construction algorithms for Takagi–Sugeno fuzzy models. Int J Appl Math Comput Sci 10(4):835–855
- Nelles O, Isermann R (1996) Basis function networks for interpolation of local linear models. In: Proceedings of the 35th IEEE conference on decision and control, vol 1, pp 470–475
- Nowicki R (2006) Rough-neuro-fuzzy system with MICOG defuzzification. In: 2006 IEEE international conference on fuzzy systems. Vancouver, Canada, pp 1958–1965. doi: 10.1109/FUZZY.2006.1681972
- Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor Newsl 6(1):90–105. doi: 10.1145/1007730.1007731
- Reichenbach H (1935) Wahrscheinlichkeitslogik. Erkenntnis 5:37–43. doi: 10.1007/BF00172280
- Rutkowski L, Cpałka K (2003) Flexible neuro-fuzzy systems. IEEE Trans Neural Netw 14(3):554–574. doi: 10.1109/TNN.2003.811698
- Setnes M, Babuška R (2001) Rule base reduction: some comments on the use of orthogonal transforms. IEEE Trans Syst Man Cybern C Appl Rev 31(2):199–206. doi: 10.1109/5326.941843
- Sikora M, Krzykawski D (2005) Application of data exploration methods in analysis of carbon dioxide emission in hard-coal mine dewatering pump stations. Mech Autom Mining 413(6)
- Sim K, Gopalkrishnan V, Zimek A, Cong G (2012) A survey on enhanced subspace clustering. In: Data mining and knowledge discovery, pp 1–66. doi: 10.1007/s10618-012-0258-x
- Simiński K (2008) Neuro-fuzzy system with hierarchical partition of input domain. Studia Inf 29(4A(80)):43–53
- Simiński K (2009) Patchwork neuro-fuzzy system with hierarchical domain partition. In: Kurzyński M, Woźniak M (eds.) Computer Recognition Systems 3. Advances in intelligent and soft computing, vol 57. Springer, Berlin, pp 11–18. doi: 10.1007/978-3-540-93905-4_2
- Simiński K (2010) Rule weights in neuro-fuzzy system with hierarchical domain partition. Int J Appl Math Comput Sci 20(2):337–347. doi: 10.2478/v10006-010-0025-3
- Simiński K (2012) Clustering in fuzzy subspaces. Theoret Appl Inf 24(4):313–326. doi: 10.2478/v10179-012-0019-y
- Späth H (1992) Mathematical algorithms for linear regression. Academic Press Professional, Inc., San Diego
- Sugeno M, Kang GT (1988) Structure identification of fuzzy model. Fuzzy Sets Syst 28(1):15–33
- Takagi T, Sugeno M (1985) Fuzzy identification of systems and its application to modeling and control. IEEE Trans Syst Man Cybern 15(1):116–132
- Wang L, Langari R (1994) Building Sugeno-type models using fuzzy discretization and orthogonal parameter estimation techniques. In: NAFIPS/IFIS/NASA ’94: Proceedings of the first international joint conference of the North American Fuzzy Information Processing Society biannual conference, the Industrial Fuzzy Control and Intelligent Systems conference, and the NASA Joint Technolo, pp 201–206. doi: 10.1109/IJCF.1994.375098
- Yang J, Wang W, Wang H, Yu P (2002) δ-clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th international conference on data engineering, pp 517–528
- Yeh IC (1998) Modeling of strength of high-performance concrete using artificial neural networks. Cement Concrete Res 28(12):1797–1808. doi: 10.1016/S0008-8846(98)00165-3
- Zadeh LA (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans Syst Man Cybern SMC-3:28–44
- Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.