# Feature selection and replacement by clustering attributes


## Abstract

Feature selection aims to find useful and relevant features from an original feature space to effectively represent and index a given dataset. It is very important for classification and clustering problems, which may become quite difficult to solve when the number of attributes in the given training data is very large, since a very time-consuming search is usually needed to obtain the desired features. In this paper, we select features based on attribute clustering. A distance measure for a pair of attributes based on relative dependency is proposed. An attribute clustering algorithm, called Most Neighbors First, is also presented to cluster the attributes into a fixed number of groups. The representative attributes found in the clusters can be used for classification, so that the whole feature space can be greatly reduced. Besides, if the values of some representative attributes cannot be obtained from the current environment for inference, other attributes in the same clusters can be used to achieve approximate inference results.

## Keywords

Attribute clustering · Feature selection · Representative attribute · Relative dependency

## 1 Introduction

Although a wide variety of expert systems has been built, knowledge acquisition remains a development bottleneck [2, 21]. Building a large-scale expert system involves creating and extending a large knowledge base over the course of many months or years. Shortening the development time is thus the most important factor for the success of an expert system. In the past, machine-learning techniques were successfully developed to ease the knowledge-acquisition bottleneck. Among the proposed approaches, deriving rules from training examples is the most common [9, 14, 15]. Given a set of examples, a learning program tries to induce rules that describe each class.

In some application domains, the number of attributes (or features) in the given training data is very large (e.g., from tens to hundreds). In this case, much computation time is needed to derive classification rules from the data. Besides, the derived rules may contain too many features, and more rules than actually desired may be obtained due to over-specialization. In fact, not all the attributes are indispensable: some redundant, similar or dependent attributes may exist in the given training data. This phenomenon mainly results from attribute dependency; redundant and similar attributes can be thought of as two special cases of dependent attributes. If dependency relationships exist among attributes, the dimensionality of the training data may thus be reduced.

The concept of reduced attribute sets has been used in many places. For example, in rough set theory [16, 17, 18, 19], a reduced set of attributes is called a "reduct". Many possible reducts may exist at the same time. Even if only a reduced set of attributes is used for classification, the indiscernibility relations among objects are still preserved [10]. A minimal reduct, just as its literal meaning shows, is a reduct that cannot be reduced any further; it need not be unique either. The classification of a high-dimensional dataset can be done faster if a minimal reduct is used instead of the original entire set of attributes. Finding a minimal reduct, however, is an NP-hard problem [20, 22]. Besides, a minimal reduct may not exist at all due to noise in the training examples.

Some research on finding approximate reducts was thus proposed. An approximate reduct is a minimal reduct with an acceptable tolerance. It can usually be found in a much shorter time than an exact minimal reduct, and it usually consists of fewer attributes than an exact one. It is thus a good trade-off among accuracy, practicability and execution time. Many approaches for finding approximate reducts have been proposed [3, 7, 26, 27]. For example, Wróblewski [25] used a genetic algorithm to find approximate minimal reducts. Sun and Xiong [23] proposed an approach compatible with incomplete information systems. Al-Radaideh et al. [1] used the discernibility matrix and a weighting strategy to find the minimal reduct in a greedy way. Gao et al. [4] proposed a feature-ranking strategy (similar to attribute weighting) with a sampling process included. Recently, approaches based on soft sets for attribute selection have also been proposed to reduce the execution time [13, 20].

All the approaches mentioned above focus on finding a minimal reduct as fast as possible. However, if there are training examples with missing or unknown values, these approaches may not work correctly. Besides, if only the chosen reduct is used in a learning process, the derived rules cannot contain other attributes and are hard to use when some attribute values in the reduct cannot be obtained in the current environment. In this paper, attribute clustering is thus adopted for feature selection, with the following advantages:

1. Guessing a missing value of an attribute from the other attributes within the same cluster should be more accurate and faster than guessing it from all the attributes.

2. If an object has missing values, its class can still be decided by the other attributes within the same cluster.

3. The proposed approach is flexible in representing rules, since each attribute in a rule can be replaced with other attributes in the same cluster.

The remainder of this paper is organized as follows: Some related concepts including reduct, relative dependency and clustering are reviewed in Sect. 2. The proposed dissimilarity between a pair of attributes is explained in Sect. 3. An attribute clustering algorithm is proposed in Sect. 4. An example is given in Sect. 5 to illustrate the proposed algorithm. The experimental results and some discussions are described in Sect. 6. Conclusions and future work are finally stated in Sect. 7.

## 2 Related work

In this section, some important concepts related to this paper are briefly reviewed. The concept of reducts is first introduced, followed by the concept of relative dependency. Next, two well-known clustering approaches, \(k\)-means and \(k\)-medoids, are described and compared, and the reasons why they are not suitable for clustering attributes are explained. An attribute clustering approach is then proposed to overcome these problems and limitations.

### 2.1 Reducts

Table 1 A simple information system

| Object | Age | Income | Children |
|---|---|---|---|
| \(x_{1}\) | Young | Low | No |
| \(x_{2}\) | Middle | Middle | Yes |
| \(x_{3}\) | Senior | High | Yes |
| \(x_{4}\) | Young | Low | Yes |
| \(x_{5}\) | Senior | Middle | No |

Table 2 A simple decision system

| Object | Age | Income | Children | Buying computers |
|---|---|---|---|---|
| \(x_{1}\) | Young | Low | No | No |
| \(x_{2}\) | Middle | Middle | Yes | No |
| \(x_{3}\) | Senior | High | Yes | Yes |
| \(x_{4}\) | Young | Low | Yes | Yes |
| \(x_{5}\) | Senior | Middle | No | No |

In Table 2, a decision attribute, Buying computers, is added to the original information system (Table 1) to form a decision system. In this example, the attribute subset {Age, Income} is not a reduct, since the two objects \(x_{1}\) and \(x_{4}\) have the same values for the two attributes but belong to different classes. On the contrary, the attribute subset {Age, Children} is a reduct for the decision system. Furthermore, it is a minimal reduct, since neither {Age} alone nor {Children} alone is a reduct. Finding minimal reducts has been proven to be an NP-hard problem. Li et al. [11] proposed the concept of "approximate" reducts to speed up the searching process. An approximate reduct allows some reasonable tolerance degree, but can greatly reduce the computational complexity. Next, the concept of relative dependency is introduced.
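The reduct test used in this example can be sketched as follows. The helper name `is_consistent` and the dictionary encoding of Table 2 are ours; the criterion is the standard rough-set one: a subset preserves the classification only if no two objects agree on the subset but disagree on the decision.

```python
# Sketch: checking whether an attribute subset preserves the classification
# of a decision system (a necessary condition for it to be a reduct).
# The encoding of Table 2 and the helper name are ours, for illustration.

def is_consistent(rows, attrs, decision):
    """Return True if projecting onto attrs never merges two classes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen.setdefault(key, row[decision])
    return True

rows = [
    {"Age": "Young",  "Income": "Low",    "Children": "No",  "Buy": "No"},
    {"Age": "Middle", "Income": "Middle", "Children": "Yes", "Buy": "No"},
    {"Age": "Senior", "Income": "High",   "Children": "Yes", "Buy": "Yes"},
    {"Age": "Young",  "Income": "Low",    "Children": "Yes", "Buy": "Yes"},
    {"Age": "Senior", "Income": "Middle", "Children": "No",  "Buy": "No"},
]

print(is_consistent(rows, ["Age", "Income"], "Buy"))    # → False (x1 vs x4)
print(is_consistent(rows, ["Age", "Children"], "Buy"))  # → True
```

{Age, Children} is minimal because dropping either attribute breaks consistency: `is_consistent(rows, ["Age"], "Buy")` and `is_consistent(rows, ["Children"], "Buy")` both return `False`.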

### 2.2 Relative dependency

The goal of this paper is to cluster attributes so that the process of finding approximate reducts can be improved. To achieve this goal, it is important to develop an evaluation method that can measure the similarity of attributes. This paper extends the concept of relative dependency to compute the similarity between any two attributes and proposes an attribute clustering method. The proposed approach is described in Sect. 3.

### 2.3 The *k*-means and the *k*-medoids clustering approaches

The \(k\)-means and the \(k\)-medoids approaches are two well-known partitioning (or clustering) strategies. They are widely used to cluster data when the number of clusters is given in advance. The \(k\)-means clustering approach [12] consists of two major steps: (1) reassigning objects to clusters and (2) updating the centers of clusters. The first step calculates the distances between each object and the \(k\) centers and reassigns the object to the group with the nearest center. The second step then calculates the new means of the \(k\) groups just updated and uses them as the new centers. These two steps are then iteratively executed until the clusters no longer change.
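The two iterated steps just described can be sketched as a short loop. This is a generic illustration for one-dimensional numeric data, not code from the paper; the initialization here is deterministic (the first \(k\) points), whereas in practice the initial centers are usually chosen at random.

```python
# Minimal sketch of the two-step k-means loop: (1) reassign points to the
# nearest center, (2) recompute each center as its cluster mean, until the
# assignment no longer changes. 1-D data, deterministic initialization.

def k_means(points, k, max_iter=100):
    centers = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(max_iter):
        # Step 1: reassign each point to the group with the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: abs(p - centers[c])) for p in points
        ]
        # Step 2: recompute each center as the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
        if new_assignment == assignment:   # clusters no longer change
            break
        assignment = new_assignment
    return centers, assignment

centers, labels = k_means([1.0, 2.0, 10.0, 11.0], k=2)
print(centers)   # → [1.5, 10.5]
```

Note that the final centers (1.5 and 10.5) are means of their clusters and need not coincide with any input point, which is exactly the property that makes \(k\)-means unsuitable when a representative member is required.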

The complexity of the \(k\)-medoids approach is in general higher than that of \(k\)-means, but the former guarantees that all cluster centers are objects themselves. This property is important for the proposed attribute clustering, since not only must the attributes be clustered, but the representative attribute of each cluster must also be found. On the contrary, the \(k\)-means approach may use non-object points as cluster centers. Note that both the \(k\)-means and the \(k\)-medoids approaches are mainly designed to cluster objects, not attributes. As mentioned above, the goal of this paper is to cluster attributes. An attribute clustering method based on \(k\)-medoids is thus proposed to achieve this purpose. It also uses a better search strategy to find centers in dense regions, instead of the random selection used in \(k\)-medoids. Besides, a method to measure the distances (dissimilarities) among attributes is needed.

## 3 Attribute dissimilarity

In this paper, we partition the attributes into \(k\) clusters according to the dependency between each pair of attributes. Each cluster can then be represented by its representative attribute, so that the whole feature space can be greatly reduced.

For most clustering approaches, the distance between two objects is usually adopted as a measure of their dissimilarity, which is then used to decide whether the objects belong to the same cluster. In this paper, the attributes, instead of the objects, are to be clustered. Conventional distance measures such as the Euclidean or the Manhattan distance are thus not suitable, since the attributes may have different data formats, which are hard to compare. For example, assume there are two attributes, one of which is age and the other gender; it is hard to compare the two via a traditional distance measure. Below, a measure based on the concept of relative dependency, proposed by Han et al. [5], is adopted to achieve this. It can be thought of as a kind of similarity degree.
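As a concrete illustration, the measure can be sketched from the worked values in Sect. 5 (\(\vert \Pi_\mathrm{PR}\vert = 3\), \(\vert \Pi_\mathrm{DM}\vert = 3\), \(\vert \Pi_\mathrm{PR,DM}\vert = 5\), \(\mathrm{Dep} = 0.6\), \(d \approx 1.67\)). The closed form below is our reconstruction from those values, not the paper's verbatim definition.

```python
# Sketch of the relative-dependency distance between two attributes.
# Assumed (reconstructed) forms:  Dep(a, b) = |Pi_b| / |Pi_{a,b}|  and
# d(a, b) = 2 / (Dep(a, b) + Dep(b, a)); treat both as assumptions.

def n_blocks(rows, attrs):
    """|Pi_attrs|: number of equivalence classes induced by the attributes."""
    return len({tuple(r[a] for a in attrs) for r in rows})

def dep(rows, a, b):
    """Relative dependency of attribute a on b (assumed form)."""
    return n_blocks(rows, [b]) / n_blocks(rows, [a, b])

def distance(rows, a, b):
    """Dissimilarity: small when each attribute nearly determines the other."""
    return 2.0 / (dep(rows, a, b) + dep(rows, b, a))

# The PR and DM columns of Table 3 (students x1..x8).
rows = [{"PR": p, "DM": q} for p, q in
        [("A", "A"), ("A", "B"), ("B", "B"), ("B", "C"),
         ("C", "C"), ("B", "C"), ("B", "C"), ("A", "A")]]

print(round(dep(rows, "PR", "DM"), 1))       # → 0.6
print(round(distance(rows, "PR", "DM"), 2))  # → 1.67
```

Two attributes that determine each other perfectly give \(\mathrm{Dep} = 1\) in both directions and hence the minimum distance of 1 under this reconstruction.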

## 4 The proposed algorithm

In this section, an attribute clustering algorithm called Most Neighbors First (MNF) is proposed to cluster the attributes into a fixed number of groups. Assume the number \(k\) of desired clusters is known. Some preprocessing steps such as removal of inconsistent or incomplete tuples and discretization of numerical data are first done. After that, the proposed MNF attribute clustering algorithm is used to partition the feature space into \(k\) clusters and output the \(k\) representative attributes of the clusters.

The proposed clustering algorithm MNF is based on the \(k\)-medoids approach. Unlike the \(k\)-means approach, the proposed algorithm always updates the centers by some existing objects. Besides, it uses a better search strategy to find centers in a dense region, instead of random selection in \(k\)-medoids.

The proposed algorithm MNF consists of two major phases: (1) reassigning the attributes to the clusters and (2) updating the centers of the clusters. In the first phase, the proposed distance measure is used to find the nearest center of each attribute, and the attribute is assigned to the cluster with that center. In the second phase, each cluster \(C_{i}\) uses a searching radius \(r_{i}\) to decide the neighbors of each attribute in \(C_{i}\); the attribute with the most neighbors in a cluster is then chosen as the new center. The proposed algorithm is described in detail below.

**The MNF attribute clustering algorithm:**

Input: An information system \(I=( {U,A\cup \{d\}})\) and the number \(k\) of desired clusters.

Output: \(k\) appropriate attribute clusters with their representative attributes.

*Step 1* Randomly select \(k\) attributes \(\{A_{1}^{c}, A_{2}^{c}, {\ldots }, A_{k}^{c}\}\) as the initial representative attributes (centers) in the \(k\) clusters, where \(A_{t}^{c}\) stands for the representative attribute (center) of the t-th cluster \(C_{t}\), \(A_{t}^{c} \in A\). Denote \(A_{c} = \{A_{1}^{c}, A_{2}^{c}, {\ldots }, A_{k}^{c}\}\subseteq A\) as the initial representative attribute set.

*Step 2* For each non-representative attribute \(A_{i}\in A-A_{c}\), compute the dissimilarity (distance) \(d(A_{i}, A_{t}^{c})\) between attribute \(A_{i}\) and each representative attribute \(A_{t}^{c}\) as:

\[ d(A_i, A_t^c) = \frac{2}{\mathrm{Dep}(A_i, A_t^c) + \mathrm{Dep}(A_t^c, A_i)}, \quad \text{where } \mathrm{Dep}(A_i, A_j) = \frac{\vert \Pi_{A_j}\vert}{\vert \Pi_{\{A_i, A_j\}}\vert} \]

and \(\Pi_{S}\) denotes the partition of the objects induced by the attribute set \(S\).

*Step 3* Allocate all non-center attributes to their nearest centers according to the distances found in Step 2. Collect a center attribute with its allocated attributes as a cluster.

*Step 4* For each cluster \(C_{t}\), calculate the distances between any two different attributes within \(C_{t}\).

*Step 5* Calculate the radius \(r_{t}\) of each cluster \(C_{t}\) as the average distance over all attribute pairs within \(C_{t}\):

\[ r_t = \frac{2}{\vert C_t\vert (\vert C_t\vert - 1)} \sum_{A_{t,i}, A_{t,j} \in C_t,\ i<j} d(A_{t,i}, A_{t,j}) \]

*Step 6* For each attribute \(A_{t,i}\) (including the center \(A_{t}^{c}\)) within a cluster \(C_{t}\), find the set of attributes, called Near(\(A_{t,i}\)), whose distances from \(A_{t,i}\) are within \(r_{t}\). That is:

\[ \mathrm{Near}(A_{t,i}) = \{ A_{t,j} \in C_t \mid d(A_{t,i}, A_{t,j}) \le r_t,\ j \ne i \} \]

*Step 7* For each cluster \(C_{t}\), find the attribute \(A_{t,l}\) with the most attributes in its Near set. Set \(A_{t,l}\) as the new center \(A_{t}^{c}\) of \(C_{t}\).

*Step 8* Repeat Steps 2–7 until the clusters have converged.

*Step 9* Output the final clusters and their centers as the representative attributes.
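The steps above can be sketched end to end in Python. The pairwise distance below is our reconstruction of the relative-dependency measure, inferred from the worked values in Sect. 5 rather than taken verbatim from the paper, and Step 7 ties are broken in favor of the current center; with these choices the sketch reproduces the clusters of the Sect. 5 example.

```python
# Runnable sketch of MNF on the Table 3 data. The distance is an assumed
# reconstruction: d(a, b) = 2 * |Pi_{a,b}| / (|Pi_a| + |Pi_b|).

def n_blocks(rows, attrs):
    """|Pi_attrs|: number of equivalence classes induced by the attributes."""
    return len({tuple(r[a] for a in attrs) for r in rows})

def distance(rows, a, b):
    both = n_blocks(rows, [a, b])
    return 2.0 * both / (n_blocks(rows, [a]) + n_blocks(rows, [b]))

def mnf(rows, attrs, centers, max_iter=100):
    for _ in range(max_iter):
        # Steps 2-3: allocate every non-center attribute to its nearest center.
        clusters = {c: [c] for c in centers}
        for a in attrs:
            if a not in centers:
                nearest = min(centers, key=lambda c: distance(rows, a, c))
                clusters[nearest].append(a)
        # Steps 4-7: radius = mean pairwise distance; the attribute with the
        # most neighbors within the radius becomes the new center.
        new_centers = []
        for c, members in clusters.items():
            if len(members) < 2:
                new_centers.append(c)
                continue
            pairs = [(x, y) for i, x in enumerate(members)
                     for y in members[i + 1:]]
            radius = sum(distance(rows, x, y) for x, y in pairs) / len(pairs)
            def near(a):
                return sum(1 for b in members
                           if b != a and distance(rows, a, b) <= radius)
            # Ties are broken in favor of the current center (our choice).
            best = max(sorted(members, key=lambda m: m != c), key=near)
            new_centers.append(best)
        if set(new_centers) == set(centers):   # Step 8: converged
            break
        centers = new_centers
    return clusters, centers

# Table 3: eight students, eight condition attributes.
names = ["PR", "CA", "DM", "C++", "JAVA", "DB", "DS", "AL"]
table3 = [dict(zip(names, vals)) for vals in [
    ("A","B","A","B","B","A","B","B"), ("A","B","B","C","A","B","C","B"),
    ("B","B","B","A","B","B","A","A"), ("B","C","C","C","C","B","C","C"),
    ("C","C","C","D","C","C","D","C"), ("B","B","C","D","C","D","D","C"),
    ("B","B","C","B","B","A","B","C"), ("A","A","A","A","B","B","A","B"),
]]
clusters, centers = mnf(table3, names, ["DM", "DS"])
print(centers)                 # → ['AL', 'DS']
print(sorted(clusters["AL"]))  # → ['AL', 'CA', 'DM', 'PR']
```

Starting from the centers DM and DS, the sketch converges in two rounds to the clusters {PR, CA, AL, DM} with center AL and {C++, JAVA, DB, DS} with center DS, matching the walkthrough in Sect. 5.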

Table 3 An example for attribute clustering

| Object | PR | CA | DM | C++ | JAVA | DB | DS | AL | ST |
|---|---|---|---|---|---|---|---|---|---|
| \(x_{1}\) | \(A\) | \(B\) | \(A\) | \(B\) | \(B\) | \(A\) | \(B\) | \(B\) | Yes |
| \(x_{2}\) | \(A\) | \(B\) | \(B\) | \(C\) | \(A\) | \(B\) | \(C\) | \(B\) | No |
| \(x_{3}\) | \(B\) | \(B\) | \(B\) | \(A\) | \(B\) | \(B\) | \(A\) | \(A\) | Yes |
| \(x_{4}\) | \(B\) | \(C\) | \(C\) | \(C\) | \(C\) | \(B\) | \(C\) | \(C\) | No |
| \(x_{5}\) | \(C\) | \(C\) | \(C\) | \(D\) | \(C\) | \(C\) | \(D\) | \(C\) | No |
| \(x_{6}\) | \(B\) | \(B\) | \(C\) | \(D\) | \(C\) | \(D\) | \(D\) | \(C\) | No |
| \(x_{7}\) | \(B\) | \(B\) | \(C\) | \(B\) | \(B\) | \(A\) | \(B\) | \(C\) | Yes |
| \(x_{8}\) | \(A\) | \(A\) | \(A\) | \(A\) | \(B\) | \(B\) | \(A\) | \(B\) | Yes |
After Step 9, \(k\) clusters of attributes are formed and \(k\) representative attributes for the feature space are found.

## 5 An example

In this section, a simple example is given to show how the proposed algorithm can be used to cluster attributes. Table 3 shows the scores of eight students. There are eight condition attributes \(A\) = {PR, CA, DM, C++, JAVA, DB, DS, AL}, which respectively stand for the eight subjects Probability, Calculus, Discrete Mathematics, C++, JAVA, Database, Data Structure and Algorithms. The values of the condition attributes are {\(A, B, C, D\)}, which stand for the grade levels of a subject. There is one decision attribute, ST, which stands for Study for Master Degree and has two possible classes, Yes and No. In this example, the number of clusters is set at 2 (i.e. \(k\) = 2). For this dataset, the proposed algorithm proceeds as follows.

*Step 1* \(k\) attributes are randomly selected as the initial centers of the clusters. In this example, \(k\) is set at 2. Assume that the two attributes DM and DS are selected as the initial centers of the two clusters \(C_{1}\) and \(C_{2}\), respectively.

*Step 2* The distances (dissimilarities) between each non-center attribute and each center are calculated. Take the distance between PR and DM as an example. Since \(\vert \Pi _\mathrm{PR}\vert \) = 3, \(\vert \Pi _\mathrm{DM}\vert \) = 3 and \(\vert \Pi _\mathrm{PR, DM}\vert \) = 5, the relative dependency degree \(\mathrm{Dep}(\mathrm{PR, DM})\) is calculated as 0.6, and \(\mathrm{Dep}(\mathrm{DM, PR})\) is 0.6 as well. The distance between the two attributes is thus calculated as \(d(\mathrm{PR, DM}) = 2/(0.6+0.6) \approx 1.67\). The distances between all non-center attributes and the two centers are shown in Table 4.

Table 4 The distances between non-center attributes and the representative centers

| Cluster \(C_{1}\) attribute pair | Distance | Cluster \(C_{2}\) attribute pair | Distance |
|---|---|---|---|
| \(d(\mathrm{PR, DM})\) | 1.67 | \(d(\mathrm{PR, DS})\) | 2.33 |
| \(d(\mathrm{CA, DM})\) | 1.67 | \(d(\mathrm{CA, DS})\) | 2.27 |
| \(d\)(C++, DM) | 2 | \(d\)(C++, DS) | 1 |
| \(d(\mathrm{JAVA, DM})\) | 1.67 | \(d(\mathrm{JAVA, DS})\) | 0.8 |
| \(d(\mathrm{DB, DM})\) | 2 | \(d(\mathrm{DB, DS})\) | 0.8 |
| \(d(\mathrm{AL, DM})\) | 1.33 | \(d(\mathrm{AL, DS})\) | 2 |

*Step 3* All non-center attributes are allocated to their nearest centers. Thus, cluster \(C_{1}\) contains {PR, CA, AL, DM} and cluster \(C_{2}\) contains {C++, JAVA, DB, DS}.

*Step 4* The distances between any two different attributes in the same clusters are calculated. The results are shown in Table 5.

Table 5 The distances between any two attributes within the same clusters

| Cluster \(C_{1}\) attribute pair | Distance | Cluster \(C_{2}\) attribute pair | Distance |
|---|---|---|---|
| \(d(\mathrm{PR, DM})\) | 1.67 | \(d\)(C++, DS) | 1 |
| \(d(\mathrm{PR, AL})\) | 1.33 | \(d\)(C++, DB) | 1.25 |
| \(d(\mathrm{CA, AL})\) | 1.67 | \(d(\mathrm{JAVA, DB})\) | 2 |
| \(d(\mathrm{PR, CA})\) | 1.67 | \(d\)(C++, JAVA) | 1.67 |
| \(d(\mathrm{CA, DM})\) | 1.67 | \(d(\mathrm{JAVA, DS})\) | 1.25 |
| \(d(\mathrm{AL, DM})\) | 1.33 | \(d(\mathrm{DB, DS})\) | 1.25 |

*Step 5* The searching radius of each cluster is calculated. Take the cluster \(C_{1}\) as an example. It includes the four attributes {PR, CA, AL, DM}, and the distances between its attribute pairs are {1.67, 1.67, 1.33, 1.67, 1.67, 1.33}. The radius \(r_{1}\) is then calculated as the average of these distances: \(r_{1} = (1.67+1.67+1.33+1.67+1.67+1.33)/6 \approx 1.56\).

*Step 6* The Near set of each attribute in a cluster is calculated. Take the attribute PR in cluster \(C_{1}\) as an example. Its distances from the other three attributes CA, AL and DM in the same cluster are 1.67, 1.33 and 1.67, respectively. Near(PR) thus includes only the attribute AL, since only AL is within the radius \(r_{1}\) (1.56) found in Step 5. Similarly, the Near sets of the other three attributes in cluster \(C_{1}\) are Near(CA) = ∅, Near(AL) = {PR, DM} and Near(DM) = {AL}.

*Step 7* Since the attribute AL has the most attributes in its Near set for cluster \(C_{1}\), AL replaces the attribute DM as the new center of \(C_{1}\). Similarly, the original center DS of \(C_{2}\) has the most attributes in its Near set; DS thus remains the center of \(C_{2}\).

*Step 8* Steps 2–7 are repeated until the two clusters no longer change. The final clusters can thus be found as follows:

\(C_{1}\) = {PR, CA, AL, DM}, with the center AL.

\(C_{2}\) = {C++, JAVA, DB, DS}, with the center DS.

*Step 9* The final clusters and their centers as the representative attributes are then output. The attributes in the same cluster can thus be considered to possess similar characteristics in classification and can be used as alternative attributes of the representative one.

## 6 Experimental results

Table 6 The characteristics of the WDBC dataset

| Characteristic | Value |
|---|---|
| Number of instances | 569 |
| Number of attributes | 30 |
| Number of classes | 2 |
| Number of missing values | 0 |

As Fig. 1 shows, the average intra-cluster distance decreased as the number of clusters increased, for both discretization methods. Besides, discretization by equal width performed better than discretization by equal frequency.

As Fig. 2 shows, the average intra-cluster similarity increased as the number of clusters increased, for both discretization methods. As before, discretization by equal width performed better than discretization by equal frequency.

As Figs. 4 and 5 show, the differences among the frequencies with which the attributes were selected as centers became smaller as the cluster number \(k\) increased. This phenomenon results from the fact that the attributes in the same cluster become more similar to each other as the number of clusters grows, so the attributes are chosen as centers with a more uniform probability. In this case, some other criteria, such as attribute cost and the ratio of missing values, may be used to aid the selection of representative attributes.

## 7 Conclusions and future work

In this paper, we have used attribute clustering for feature selection. A measure of attribute dissimilarity based on relative dependency has been proposed to calculate the distance between two attributes. An attribute clustering algorithm, called Most Neighbors First, has also been proposed; it finds centers in dense regions instead of using the random selection of \(k\)-medoids. The proposed attribute clustering approach consists of two major phases: reassigning attributes to clusters and updating the centers of clusters. After the attributes are organized into clusters by their similarity degrees, the representative attributes of the clusters can be used for classification, so that the whole feature space can be greatly reduced. Besides, if the values of some representative attributes cannot be obtained from the current environment for inference, other attributes in the same clusters can be used to achieve approximate inference results.

Experimental results show that the average similarity within a cluster increases with the number of clusters. Besides, the discretization method is an important factor in the final results: discretization by equal width performs better than discretization by equal frequency.

Finally, the proposed attribute clustering approach has to know the number of clusters in advance, which limits its applicability. In the future, we will try to develop new approaches for attribute clustering in which the number of clusters need not be known beforehand. We will also attempt to apply the proposed approach to some real application domains.

## References

- 1. Al-Radaideh, Q.A., Sulaiman, M.N., Selamat, M.H., Ibrahim, H.: Approximate reduct computation by rough sets based attribute weighting. In: The IEEE International Conference on Granular Computing, vol. 2, pp. 383–386 (2005)
- 2. Buchanan, B.G., Shortliffe, E.H.: Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Projects. Addison-Wesley, Massachusetts (1984)
- 3. Dong, J.Z., Zhong, N., Ohsuga, S.: Using rough sets with heuristics to feature selection. In: New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. Springer, Berlin (1999)
- 4. Gao, K., Liu, M., Chen, K., Zhou, N., Chen, J.: Sampling-based tasks scheduling in dynamic grid environment. In: The Fifth WSEAS International Conference on Simulation, Modeling and Optimization, pp. 25–30 (2005)
- 5. Han, J.: Feature selection based on rough set and information entropy. In: The IEEE International Conference on Granular Computing, vol. 1, pp. 153–158 (2005)
- 6. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2006)
- 7. Hong, T.P., Wang, T.T., Wang, S.L.: Knowledge acquisition from quantitative data using the rough-set theory. Intell. Data Anal. **4**, 289–304 (2000)
- 8. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Toronto (1990)
- 9. Kodratoff, Y., Michalski, R.S.: Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, San Mateo (1983)
- 10. Komorowski, J., Polkowski, L., Skowron, A.: Rough sets: a tutorial. http://www.let.uu.nl/esslli/Courses/skowron/skowron.ps
- 11. Li, Y., Shiu, S.C.K., Pal, S.K.: Combining feature reduction and case selection in building CBR classifiers. IEEE Trans. Knowl. Data Eng. **18**(3), 415–429 (2006)
- 12. Lloyd, S.P.: Least squares quantization in PCM. Bell Labs, USA (1957)
- 13. Mamat, R., Herawan, T., Deris, M.M.: MAR: maximum attribute relative of soft set for clustering attribute selection. Knowl. Based Syst. **52**, 11–20 (2012)
- 14. Michalski, R.S., Carbonell, J.G., Mitchell, T.M.: Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, Los Altos (1983)
- 15. Michalski, R.S., Carbonell, J.G., Mitchell, T.M.: Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, Los Altos (1983)
- 16. Parmar, D., Wu, T., Blackhurst, J.: MMR: an algorithm for clustering categorical data using rough set theory. Data Knowl. Eng. **63**(3), 879–893 (2007)
- 17. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. **11**(5), 341–356 (1982)
- 18. Pawlak, Z.: Why rough sets? In: The Fifth IEEE International Conference on Fuzzy Systems, vol. 2, pp. 738–743 (1996)
- 19. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. **177**(1), 3–27 (2007)
- 20. Qin, H., Ma, X., Zain, J.M., Herawan, T.: A novel soft set approach in selecting clustering attribute. Knowl. Based Syst. **36**, 139–145 (2012)
- 21. Riley, G.: Expert Systems: Principles and Programming. PWS-Kent, Boston (1989)
- 22. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Handbook of Applications and Advances of the Rough Sets Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1992)
- 23. Sun, H.Q., Xiong, Z.: Finding minimal reducts from incomplete information systems. In: The Second International Conference on Machine Learning and Cybernetics, vol. 1, pp. 350–354 (2003)
- 24. Wolberg, W.H., Street, W.N., Mangasarian, O.L.: UCI machine learning repository. http://www.ics.uci.edu/mlearn/MLRepository.html. University of California, Irvine, Department of Information and Computer Science (1995)
- 25. Wróblewski, J.: Finding minimal reducts using genetic algorithms. In: The Second Annual Joint Conference on Information Sciences, pp. 186–189 (1995)
- 26. Zhang, J., Wang, J., Li, D., He, H., Sun, J.: A new heuristic reduct algorithm based on rough sets theory. In: Lecture Notes in Computer Science, pp. 247–253. Springer, New York (2003)
- 27. Zhang, M., Yao, J.T.: A rough sets based approach to feature selection. In: The IEEE Annual Meeting of the Fuzzy Information Processing Society, pp. 434–439 (2004)