1 Introduction

Attribute reduction (also called feature selection) is a challenging task in areas such as data mining and pattern recognition, and many varied solutions to it have been proposed. Rough set theory (Pawlak 1991), as a mathematical tool for dealing with inconsistent data, has commonly been used to investigate this issue from both theoretical and practical viewpoints (e.g. Skowron and Rauszer 1992; Swiniarski 2001; Kryszkiewicz 2001).

A practical aspect of attribute reduction becomes clearly visible when the data to be processed are large. Removing unimportant attributes can significantly reduce the data size. However, due to memory capacity limitations, the data can be too large to process all at once.

This limitation has led to the development of methods for decomposing data for attribute reduction (e.g. Deng and Huang 2006; Hu et al. 2006; Ye and Wu 2010). Data can be split with respect to objects (horizontal data decomposition) or with respect to attributes (vertical data decomposition). Horizontal decomposition, in contrast to the vertical one, produces data subsets that are complete in terms of the characteristics of objects (i.e. all attributes are stored in each subset). This makes it possible to process each subset separately and to use the same method as for the whole data to obtain partial results.

When applying data decomposition, the crucial problem is to divide the data set into subsets so that they are consistent with the original data. This means that no essential information is lost during the decomposition (e.g. Nguyen et al. 1998; Suraj 1996; Slezak 1999).

Another limitation related to ever-increasing data is the time needed to process large data sets. Data decomposition methods, however, are mainly intended to reduce the space complexity. A decrease in the time complexity is more difficult to achieve since additional operations are needed to merge the results obtained for subtables.

An efficient solution for time-consuming operations can be a hardware implementation of the method to be applied. This can be done using field programmable gate array (FPGA) devices. They are integrated circuits that can be programmed by the user after manufacturing. FPGA devices are used to implement software algorithms in hardware to increase their performance. They can significantly shorten the computation time due to parallelism, and also reduce power consumption and heat generation.

The main limitations of a hardware implementation compared with a software one are the lack of flexibility in data processing and the small memory capacity. Therefore, a method may need considerable adaptation before its implementation to meet the requirements of the FPGA architecture (Grzes et al. 2013).

The goal of this paper is to develop an attribute reduction approach that can address the above-mentioned issues. The approach uses horizontal data decomposition for computing reducts of an information system and of a decision table. Firstly, the universe is split into subsets (called middle subtables) according to a given data size. Then, any two subtables are joined, forming the final subtables. For each such subtable the set of reducts (called subreducts) is computed using any standard attribute reduction method. Finally, all subreduct sets are joined into one set that, after reduction, corresponds to the reduct set computed for the whole data. The space complexity of this approach is determined by the size of the final subtables. The time complexity is equal to or less than that of the discernibility matrix-based attribute reduction method (Skowron and Rauszer 1992) applied to the whole table. The size of subtables can be arbitrarily small (at least two objects) and subreducts are computed independently from one another; thanks to this, the approach is suitable for hardware implementation using an FPGA.

The proposed approach overcomes the problem of memory capacity limitation since the final reduct set can be computed from data that are split into suitably small portions. As the experiments show, for some datasets the computation time can be somewhat shortened. Based on the preliminary tests conducted in this paper and in Grzes et al. (2013), one can conclude that a hardware implementation of the approach can considerably accelerate the attribute reduction process.

The following sections review existing attribute reduction methods (Sect. 2), restate basic notions related to attribute reduction in rough set theory (Sect. 3), propose new definitions that use horizontal data decomposition for computing reducts (Sect. 4), develop and evaluate algorithms for attribute reduction (Sect. 5), conduct experiments (Sect. 6) and discuss the approach (Sect. 7), review related works (Sect. 8), and provide concluding remarks (Sect. 9). The paper ends with an appendix that includes proofs of the propositions and theorems introduced in the paper.

2 Attribute reduction methods

In recent decades, many methods for finding reducts have been proposed in rough set theory. Most of them can be categorized into the following general groups: discernibility matrix-based methods (Skowron and Rauszer 1992), positive region-based methods (Pawlak 1991; Grzymala-Busse 1991) and information entropy-based methods (Ślȩzak 2002). Some of these methods are able to compute all reducts, while others are developed to find one reduct, in particular a minimal one.

Because of the scope of the paper, this section focuses on methods for computing all reducts. Most of them can be categorized into two groups: traversal-based strategies and discernibility matrix-based strategies. The former rely on searching the whole power set of the attribute set and checking subsets for being reducts. The latter compute, for each pair of objects, a subset of attributes that are needed to discern the objects; reducts are constructed based on the attribute subsets that are minimal in the sense of inclusion. Both strategies produce the same reducts, but the discernibility matrix-based approach is less time consuming.

Skowron and Rauszer (1992) introduced the discernibility matrix and its alternative representation in the form of a discernibility function to find all reducts in an information system. The idea of the discernibility matrix/function has been intensively studied by many researchers. Hu and Cercone (1995) proposed a modification of the discernibility matrix for attribute reduction in a decision table. Kryszkiewicz (1998) applied discernibility functions in incomplete information systems and decision tables. Ye and Chen (2002) presented a discernibility matrix for attribute reduction in consistent and inconsistent decision tables. Zhang et al. (2003) studied finding reducts of inconsistent information systems. Degang et al. (2007) used the discernibility matrix for finding reducts in consistent and inconsistent covering decision systems. The discernibility matrix/function has also become a starting point for computing all reducts by adapting different well-known strategies, e.g. exhaustive and genetic search (Bazan et al. 2000).

The main problem to face when developing a method for finding reducts is the computational complexity of the attribute reduction task. Finding all reducts is proven to be an NP-hard problem (Skowron and Rauszer 1992). Much effort has therefore been made to accelerate the attribute reduction process.

Susmaga (1998) constructed an absorbed discernibility list based on the discernibility matrix. The list includes minimal elements of the matrix that are sorted in ascending order according to their cardinalities. Reducts are computed based on the list using the breadth-first search method. Starzyk et al. (1999) sped up computation of all reducts based on the discernibility function thanks to the application of the absorption and expansion laws as well as the concept of strong equivalence. Two attributes that are locally strongly equivalent in the discernibility function are replaced with a single attribute. Leung and Li (2003) used maximal consistent blocks as units to construct a reduced discernibility matrix of an incomplete information system. A maximal consistent block is understood as the maximal collection of objects in which all objects are indiscernible. To decrease the computational complexity of the classical rough set-based method, Li and Zhu (2005) defined an indiscernibility matrix based on the discernibility one. The computations in that approach are simplified because the elements in the indiscernibility matrix are ordered. Yang (2006) proposed modifications of the discernibility matrix by dividing the universe into consistent and inconsistent parts. That approach significantly reduces the time needed for computing reducts. Tsang et al. (2008) generalized the discernibility matrix method and applied it to covering generalized rough sets. To reduce the storage space used by the discernibility matrix methods, Xu et al. (2009) proposed a novel data structure, i.e. an improved frequent pattern tree. Elements unnecessary for computing reducts are not stored in the discernibility matrix. Wang and Ma (2009) developed a feature-forest-based algorithm for finding all reducts in consistent decision tables. To represent the data, the algorithm employs a decision forest that decreases the storage space. Reducts are computed based on the discernibility function in disjunctive normal form. Chen et al. (2012) proposed sample pair selection to construct the simplest form of the discernibility function of a decision table. Thanks to that, only minimal elements of the discernibility matrix are used to compute reducts. Zhang et al. (2013) developed a combinatorial optimization algorithm for finding all reducts of a covering decision system. The proposed attribute reduction is oriented towards obtaining compact rules that are useful in decision making. Thi and Giang (2013) used Sperner systems and the concept of minimal sets of an attribute to find all reducts of a consistent decision table.

Much research has also been devoted to finding a single reduct, especially a minimal one. Although one reduct is sufficient to reduce the attribute set, the problem of finding all reducts is still justified. A deeper analysis of the data can be conducted when all reducts are known. For instance, all reducts are necessary to find stable reducts (Bazan et al. 1994).

As shown in this section, the problem of improving the attribute reduction process by reducing the storage space or time-consuming calculations has been widely investigated. Another direction for making attribute reduction methods more efficient for large datasets is to divide the attribute reduction problem into subproblems. It can be done by applying data decomposition-based attribute reduction approaches.

3 Basic notions

This section restates basic definitions from rough set theory related to attribute reduction.

To store data to be processed, an information system is used.

Definition 1

(Pawlak 1991) (Information system) An information system is a pair \(\hbox {IS}=\left( U,A\right) \), where \(U\) is a non-empty finite set of objects, called the universe, and \(A\) is a non-empty finite set of attributes.

Each attribute \(a\in A\) is treated as a function \(a:U\rightarrow V_a\), where \(V_a\) is the value set of \(a\).

Essential information about data is expressed by an indiscernibility relation.

Definition 2

(Pawlak 1991) (Indiscernibility relation) An indiscernibility relation \(\mathrm{IND}(B)\) generated by \(B\subseteq A\) on \(U\) is defined by

$$\begin{aligned} \mathrm{IND}(B)=\left\{ (x,y)\in U\times U:\forall _{a\in B}a(x)=a(y)\right\} \end{aligned}$$
(1)

The relation is used to define a reduct of the attribute set.

Definition 3

(Pawlak 1991) (Reduct of attribute set) A subset \(B\subseteq A\) is a reduct of \(A\) on \(U\) if and only if

  1. \(\mathrm{IND}(B)=\mathrm{IND}(A)\),

  2. \(\mathop \forall \nolimits _{\begin{array}{c} C\subset B,\\ C\ne \emptyset \end{array}}\mathrm{IND}(C)\ne \mathrm{IND}(B)\).

The set of all reducts of \(A\) on \(U\) is denoted by \(\mathrm{RED}(A)\).

The reduct set of an information system can be computed using a discernibility function.

Definition 4

(Skowron and Rauszer 1992) (Discernibility function) A discernibility function \(f_\mathrm{IS}\) of an information system \(\mathrm{IS}=(U,A)\) is a Boolean function of \(k\) Boolean variables \(a_1^{*},\ldots ,a_k^{*}\) that correspond, respectively, to attributes \(a_1,\ldots ,a_k\in A\) and is defined by

$$\begin{aligned} f_\mathrm{IS}(a_1^{*},\ldots ,a_k^{*})=\bigwedge _{c_{x,y}\ne \emptyset }\mathop \bigvee \limits _{a\in c_{x,y}}a^{*} \end{aligned}$$
(2)

where \((c_{x,y})\) is the discernibility matrix of \(\hbox {IS}\) such that

\(\forall _{x,y\in U} c_{x,y}=\{a\in A:a(x)\ne a(y)\}\).

A prime implicant \(a^{*}_{i_1}\wedge \cdots \wedge a^{*}_{i_k}\) of \(f_\mathrm{IS}\) is equivalent to a reduct \(\{a_{i_1},\ldots ,a_{i_k}\}\) of \(\hbox {IS}\).

A special case of an information system, called decision table, is used to store data with class distribution.

Definition 5

(Pawlak 1991) (Decision table) A decision table is a pair \(\mathrm{DT}=\left( U,A\cup \{d\}\right) \), where \(U\) is a non-empty finite set of objects, called the universe, \(A\) is a non-empty finite set of condition attributes, and \(d\not \in A\) is the decision attribute.

Each attribute \(a\in A\cup \{d\}\) is treated as a function \(a:U\rightarrow V_a\), where \(V_a\) is the value set of \(a\).

For a decision table a relative indiscernibility relation and relative reduct of the attribute set are defined.

Definition 6

(Pawlak 1991; Miao et al. 2009) (Relative indiscernibility relation) A relative indiscernibility relation \(\mathrm{IND}(B,d)\) generated by \(B\subseteq A\) on \(U\) is defined by

$$\begin{aligned} \mathrm{IND}(B,d)=\{(x,y)\in U\times U:(x,y)\in \mathrm{IND}(B)\vee d(x)=d(y)\} \end{aligned}$$
(3)

Definition 7

(Pawlak 1991; Miao et al. 2009) (Relative reduct of attribute set) A subset \(B\subseteq A\) is a relative reduct of \(A\) if and only if

  1. \(\mathrm{IND}(B,d)=\mathrm{IND}(A,d)\),

  2. \(\mathop \forall \nolimits _{\begin{array}{c} C\subset B,\\ C\ne \emptyset \end{array}}\mathrm{IND}(C,d)\ne \mathrm{IND}(B,d)\).

The set of all relative reducts of \(A\) on \(U\) is denoted by \(\mathrm{RED}(A,d)\).

The relative reduct set of a decision table can be computed using a relative discernibility function.

Definition 8

(Skowron and Rauszer 1992) (Relative discernibility function) A relative discernibility function \(f_{\mathrm{DT}}\) of a decision table \(\mathrm{DT}=(U,A\cup \{d\})\) is a Boolean function of \(k\) Boolean variables \(a_1^{*},\ldots ,a_k^{*}\) that correspond, respectively, to attributes \(a_1,\ldots ,a_k\in A\) and is defined by

$$\begin{aligned} f_{\mathrm{DT}}(a_1^{*},\ldots ,a_k^{*})=\bigwedge _{c^d_{x,y}\ne \emptyset }\mathop \bigvee \limits _{a\in c^d_{x,y}}a^{*} \end{aligned}$$
(4)

where \((c^d_{x,y})\) is the relative discernibility matrix of \(\mathrm{DT}\) such that

\(\forall _{x,y\in U} c^d_{x,y}=\{a\in A:a(x)\ne a(y), d(x)\ne d(y)\}\).

A prime implicant \(a^{*}_{i_1}\wedge \cdots \wedge a^{*}_{i_k}\) of \(f_{\mathrm{DT}}\) is equivalent to a relative reduct \(\{a_{i_1},\ldots ,a_{i_k}\}\) of \(\mathrm{DT}\).

To illustrate the basic notions and those proposed in this paper, the following running example is used.

Example 1

Consider the following data table of patients suspected of having flu.

\(U{\setminus } A\) | Temperature | Headache | Weakness | Nausea | Flu
1 | Very high | Yes | Yes | No  | Yes
2 | Normal    | No  | No  | No  | No
3 | High      | No  | No  | No  | No
4 | Normal    | No  | Yes | No  | Yes
5 | Normal    | No  | Yes | No  | No
6 | High      | Yes | No  | Yes | Yes
7 | Very high | No  | No  | No  | No
8 | Normal    | Yes | Yes | Yes | Yes

Treat the data table as the information system \(\mathrm{IS}=(U,A)\), where \(U=\{1,\ldots ,8\}\) and \(A=\{\hbox {temperature},\hbox {headache},\hbox {weakness},\hbox {nausea}, \hbox {flu}\}\). We obtain the following discernibility matrix \((c_{x,y})\) (for simplicity's sake the attribute names are abbreviated to their first letters, and only the part of the matrix above the diagonal is shown since the matrix is symmetric).

$$\begin{aligned} \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} \emptyset &{} \{t,h,w,f\} &{} \{t,h,w,f\} &{} \{t,h\} &{} \{t,h,f\} &{} \{t,w,n\} &{} \{h,w,f\} &{} \{t,n\} \\ &{} \emptyset &{} \{t\} &{} \{w,f\} &{} \{w\} &{} \{t,h,n,f\} &{} \{t\} &{} \{h,w,n,f\} \\ &{} &{} \emptyset &{} \{t,w,f\} &{} \{t,w\} &{} \{h,n,f\} &{} \{t\} &{} A \\ &{} &{} &{} \emptyset &{} \{f\} &{} \{t,h,w,n\} &{} \{t,w,f\} &{} \{h,n\} \\ &{} &{} &{} &{} \emptyset &{} A &{} \{t,w\} &{} \{h,n,f\} \\ &{} &{} &{} &{} &{} \emptyset &{} \{t,h,n,f\} &{} \{t,w\} \\ &{} &{} &{} &{} &{} &{} \emptyset &{} A \\ &{} &{} &{} &{} &{} &{} &{} \emptyset \\ \end{array} \right) \end{aligned}$$

The discernibility function derived from the matrix is \(f_\mathrm{IS}(t^{*}, h^{*}, w^{*}, n^{*}, f^{*})=t^{*}\wedge w^{*}\wedge f^{*}\wedge (h^{*}\vee n^{*})\). Based on its prime implicants \(t^{*}\wedge h^{*}\wedge w^{*} \wedge f^{*}\) and \(t^{*}\wedge w^{*}\wedge n^{*} \wedge f^{*}\) we obtain that the set of reducts of \(\hbox {IS}\) is \(\mathrm{RED}(A)=\{\{t,h,w,f\}, \{t,w,n,f\}\}\).

The above data table can also be considered as the decision table \(\mathrm{DT}=(U,A^{\prime }\cup \{flu\})\), where \(A^{\prime }=\{\hbox {temperature}, \hbox {headache},\hbox {weakness},\hbox {nausea}\}\).

We obtain in an analogous way that the set of relative reducts of \(\mathrm{DT}\) is \(\mathrm{RED}(A^{\prime },flu)=\{\{h,w\},\{t,w,n\}\}\).
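To make the running example concrete, the following minimal Python sketch (a hypothetical illustration, not the authors' C++ implementation) computes the discernibility cells of Example 1 and finds all reducts by exhaustive search, using the fact that reducts are exactly the inclusion-minimal attribute sets that hit every non-empty cell of the discernibility matrix.

```python
from itertools import combinations

# The flu table from Example 1; attribute names abbreviated to their
# first letters, as in the discernibility matrix above.
TABLE = {
    1: {"t": "very high", "h": "yes", "w": "yes", "n": "no",  "f": "yes"},
    2: {"t": "normal",    "h": "no",  "w": "no",  "n": "no",  "f": "no"},
    3: {"t": "high",      "h": "no",  "w": "no",  "n": "no",  "f": "no"},
    4: {"t": "normal",    "h": "no",  "w": "yes", "n": "no",  "f": "yes"},
    5: {"t": "normal",    "h": "no",  "w": "yes", "n": "no",  "f": "no"},
    6: {"t": "high",      "h": "yes", "w": "no",  "n": "yes", "f": "yes"},
    7: {"t": "very high", "h": "no",  "w": "no",  "n": "no",  "f": "no"},
    8: {"t": "normal",    "h": "yes", "w": "yes", "n": "yes", "f": "yes"},
}
ATTRS = ("t", "h", "w", "n", "f")

def cell(x, y):
    """Discernibility matrix cell c_{x,y}: attributes discerning x and y."""
    return frozenset(a for a in ATTRS if TABLE[x][a] != TABLE[y][a])

def reducts(universe, attrs=ATTRS):
    """All reducts of `attrs` on `universe`: inclusion-minimal attribute
    subsets that intersect every non-empty discernibility cell."""
    cells = [c for x, y in combinations(universe, 2) if (c := cell(x, y))]
    hitting = [frozenset(b) for r in range(1, len(attrs) + 1)
               for b in combinations(attrs, r)
               if all(set(b) & c for c in cells)]
    return {b for b in hitting if not any(b2 < b for b2 in hitting)}

print(reducts(sorted(TABLE)))   # {{t,h,w,f}, {t,w,n,f}}; printed order may vary
```

Running the sketch reproduces \(\mathrm{RED}(A)=\{\{t,h,w,f\},\{t,w,n,f\}\}\); the exhaustive search is exponential in \(|A|\) and serves only to illustrate the definitions, not as an efficient method.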

4 Horizontal decomposition for attribute reduction

This section proposes two attribute reduction approaches based on horizontal decomposition. The first one is developed for an information system, and the second one is for a decision table.

4.1 Horizontal decomposition of information system

Firstly, an indiscernibility relation on a subset of the universe is defined.

Definition 9

(Indiscernibility relation on universe subset) An indiscernibility relation \(\mathrm{IND}_X(B)\) generated by \(B\subseteq A\) on \(X\subseteq U\) is defined by

$$\begin{aligned} \mathrm{IND}_X(B)=\mathrm{IND}(B)\cap X\times X \end{aligned}$$
(5)

The relation has the following properties.

Proposition 1

For any information system \(\mathrm{IS}=(U,A)\), any non-empty \(B\subseteq A\) and \(X\subseteq U\) the following hold.

  1.1 \(\mathrm{IND}(B)=\mathrm{IND}_X(B)\) for \(X=U\).

  1.2 \(\mathrm{IND}_X(B)\subseteq \mathrm{IND}(B)\).

  1.3 \(\forall _{x,y\in X}\left( (x,y)\in \mathrm{IND}_X(B)\Leftrightarrow (x,y)\in \mathrm{IND}(B)\right) \).

  1.4 \(\mathrm{IND}_X(B)=\mathrm{IND}_X(A)\Rightarrow \exists _{\begin{array}{c} C\subseteq A,\\ B\subseteq C \end{array}}\mathrm{IND}(C)=\mathrm{IND}(A)\).

  1.5 \(\mathrm{IND}(B)=\mathrm{IND}(A)\Rightarrow \exists _{\begin{array}{c} C\subseteq B,\\ C\ne \emptyset \end{array}}\mathrm{IND}_X(C)=\mathrm{IND}_X(A)\).

A reduct of the attribute set on a subset of the universe is defined as follows.

Definition 10

(Reduct of attribute set on universe subset) A subset \(B\subseteq A\) is a reduct of \(A\) on \(X\subseteq U\) if and only if

  1. \(\mathrm{IND}_X(B)=\mathrm{IND}_X(A)\),

  2. \(\forall _{\begin{array}{c} C\subset B,\\ C\ne \emptyset \end{array}}\mathrm{IND}_X(C)\ne \mathrm{IND}_X(B)\).

The set of all reducts of \(A\) on \(X\subseteq U\) is denoted by \(\mathrm{RED}_X(A)\).

To distinguish a reduct on a universe subset from one on the whole universe, the former is called a subreduct.

To decompose data, a covering of the universe is constructed. Based on each set of the covering, a middle subtable is formed. Each pair of middle subtables is merged into one final subtable. To compute the reduct set of an information system, the subreduct sets of all the final subtables are joined using the following operation.

Definition 11

(Operation \(\,\dot{\cup }\,\)) An operation \(\,\dot{\cup }\,\) on families of sets is defined as follows

  1. \(\fancyscript{S}\,\dot{\cup }\,\emptyset =\emptyset \,\dot{\cup }\,\fancyscript{S}=\fancyscript{S}\);

  2. \(\fancyscript{S}\,\dot{\cup }\,\fancyscript{S}^{\prime }=\{S\cup S^{\prime }: S\in \fancyscript{S},S^{\prime }\in \fancyscript{S}^{\prime }\}\);

  3. \({\dot{\bigcup }}_{i=1}^k \fancyscript{S}_i=\fancyscript{S}_1\,\dot{\cup }\,\fancyscript{S}_2\,\dot{\cup }\,\cdots \,\dot{\cup }\,\fancyscript{S}_k\), where \(k>1\).

The family of attribute subsets created by the above operation includes, in general, not only reducts but also supersets of them. To remove unnecessary sets, the following operation is used. Let \(\hbox {min}(\fancyscript{S})\) be the set of minimal elements of a family \(\fancyscript{S}\) of sets partially ordered by the relation \(\subseteq \).

The above operations have the following properties.

Proposition 2

For any families \(\fancyscript{S},\fancyscript{S}^{\prime }\) of sets the following hold.

  2.1 \(\fancyscript{S}\subseteq \fancyscript{S}\,\dot{\cup }\,\fancyscript{S}\).

  2.2 \(\min (\fancyscript{S}\,\dot{\cup }\,\fancyscript{S})=\min (\fancyscript{S})\).

  2.3 \(\min (\fancyscript{S}\,\dot{\cup }\,\fancyscript{S}^{\prime })=\min (\min (\fancyscript{S})\,\dot{\cup }\,\min (\fancyscript{S}^{\prime }))\).

  2.4 \({\dot{\bigcup }}_{\fancyscript{S}\in \mathbf {S}\cup \mathbf {S}^{\prime }}\fancyscript{S}={\dot{\bigcup }}_{\fancyscript{S}\in \mathbf {S}}\fancyscript{S}\,\dot{\cup }\,{\dot{\bigcup }}_{\fancyscript{S}\in \mathbf {S}^{\prime }}\fancyscript{S},\) where \(\mathbf {S}, \mathbf {S}^{\prime }\) are any families of families of sets.
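As an illustration, the \(\,\dot{\cup }\,\) and \(\min \) operations translate directly into a few lines of Python; in the hypothetical sketch below the names `dot_union` and `minimal` are ours, and the final assertion checks Property 2.3 on a tiny family.

```python
def dot_union(S1, S2):
    """The operation of Definition 11 on families of sets; the empty
    family acts as the neutral element."""
    if not S1:
        return S2
    if not S2:
        return S1
    return {a | b for a in S1 for b in S2}

def minimal(S):
    """min(S): the inclusion-minimal elements of a family S."""
    return {a for a in S if not any(b < a for b in S)}

# Property 2.3: minimizing before joining yields the same minimal sets.
S1 = {frozenset("th"), frozenset("t")}
S2 = {frozenset("w")}
assert minimal(dot_union(S1, S2)) == minimal(dot_union(minimal(S1), minimal(S2)))
```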

These operations are used to define attribute reduction of an information system.

Theorem 1

Let \(\mathrm{IS}=(U,A)\) be an information system, \(\{X_1,X_2,\ldots ,X_k\}\) (\(k>1\)) be a covering of \(U\). The following holds

$$\begin{aligned} \mathrm{RED}(A)=\min \left( \mathop {\dot{\bigcup }}\limits _{1\le i<j \le k}\mathrm{RED}_{X_i\cup X_j}(A)\right) \end{aligned}$$
(6)

The above theorem is true for any total covering of the universe. From the practical viewpoint, its special case, i.e. a partition, is used.

Example 2

Consider a covering \(\{X_1,X_2,X_3,X_4\}\) of \(U\), where \(X_1=\{3,8\},X_2=\{1,7\},X_3=\{4,5\},X_4=\{2,6\}\).

We have the following subreducts: \(\mathrm{RED}_{X_1\cup X_2}(A)=\{\{t,h\},\{t,w\},\{t,f\}\}\), \(\mathrm{RED}_{X_1\cup X_3}(A)=\{\{t,h,f\},\{t,n,f\},\{h,w,f\},\{w,n,f\}\}\), \(\mathrm{RED}_{X_1\cup X_4}(A)=\{\{t,h\},\{t,n\},\{t,f\}\}\), \(\mathrm{RED}_{X_2\cup X_3}(A)=\{\{t,f\},\{h,w,f\}\}\), \(\mathrm{RED}_{X_2\cup X_4}(A)=\{\{t,h\},\{t,w\},\{t,f\}\}\), \(\mathrm{RED}_{X_3\cup X_4}(A)=\{\{w,f\}\}\).

We obtain \(\min ({\dot{\bigcup }}_{1\le i<j \le 4}\mathrm{RED}_{X_i\cup X_j}(A))=\min (\min (\mathrm{RED}_{X_1\cup X_2}(A)\,\dot{\cup }\,\mathrm{RED}_{X_1\cup X_3}(A))\,\dot{\cup }\,\min (\mathrm{RED}_{X_1\cup X_4}(A)\,\dot{\cup }\, \mathrm{RED}_{X_2\cup X_3}(A))\,\dot{\cup }\,\min (\mathrm{RED}_{X_2\cup X_4}(A)\,\dot{\cup }\,\mathrm{RED}_{X_3\cup X_4}(A)))=\min (\{\{t,h,f\},\{t,n,f\}\}\,\dot{\cup }\,\{\{t,f\}\}\,\dot{\cup }\,\{\{t,w,f\}\})=\{\{t,h,w,f\},\{t,n,w,f\}\}=\mathrm{RED}(A)\).
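The computation of Example 2 can be replayed with the helpers sketched earlier (`reducts`, `dot_union`, `minimal`); the function below is a hypothetical illustration of Theorem 1, not the paper's algorithm.

```python
from functools import reduce
from itertools import combinations

def red_by_decomposition(cover):
    """Theorem 1 (Eq. 6): subreducts on all pairwise unions of covering
    blocks, joined with the dot-union and then minimized. Proposition 2.3
    would also allow minimizing after each intermediate join."""
    families = [reducts(sorted(Xi | Xj)) for Xi, Xj in combinations(cover, 2)]
    return minimal(reduce(dot_union, families))

cover = [{3, 8}, {1, 7}, {4, 5}, {2, 6}]      # the covering of Example 2
assert red_by_decomposition(cover) == reducts(sorted(TABLE))
```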

Besides redundant operations that follow from using a covering of the universe instead of a partition of it, some other operations can be repeated unnecessarily when using the above definition. Namely, when we compute reducts on sets \(X\cup Y\) and \(X\cup Z\), we process each pair of objects from \(X\) twice. To exclude these repetitions, attribute reduction is defined in the following way.

Theorem 2

Let \(\mathrm{IS}=(U,A)\) be an information system, \(\{X_1,X_2,\ldots ,X_k\}\) \((k>1)\) be a covering of \(U\). The following holds

$$\begin{aligned}&\mathrm{RED}(A)\nonumber \\&\quad =\min \left( \mathop {\dot{\bigcup }}\limits _{1\le i\le k}\mathrm{RED}_{X_i}(A)\,\dot{\cup }\,\mathop {\dot{\bigcup }}\limits _{1\le i<j \le k}\mathrm{RED}_{X_i\cup X_j}(A,d)\right) \nonumber \\ \end{aligned}$$
(7)

where \(d\) is a virtual decision such that \(\forall _{x,y\in U} (d(x)=d(y)\Leftrightarrow \exists _{1\le i \le k} x,y\in X_i)\).

The use of attribute reduction on the set \(X\cup Y\) with respect to the virtual decision (see Definition 13) guarantees that any object from \(X\) is processed with objects from \(Y\) only, and vice versa.
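For illustration, the virtual decision can be realized as a labeling of objects by the index of their block, assuming the covering is in fact a partition (the case used in practice, as remarked after Theorem 1); the helper below is hypothetical.

```python
def virtual_decision(cover):
    """The virtual decision d of Theorem 2 for a partition: two objects
    get equal decision values iff they lie in the same block."""
    return {x: i for i, block in enumerate(cover) for x in block}
```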

Example 3

Consider the covering of \(U\) from Example 2.

We have the following subreducts: \(\mathrm{RED}_{X_1}(A)=\{\{t\},\{h\},\{w\},\{n\},\{f\}\}\), \(\mathrm{RED}_{X_2}(A)=\{\{h\},\{w\},\{f\}\}\), \(\mathrm{RED}_{X_3}(A)=\{\{f\}\}\), \(\mathrm{RED}_{X_4}(A)=\{\{t\},\{h\},\{n\},\{f\}\}\), \(\mathrm{RED}_{X_1\cup X_2}(A,d)=\{\{t\}\}\), \(\mathrm{RED}_{X_1\cup X_3}(A,d)=\{\{t,h\},\{t,n\},\{h,w\},\{w,n\}\}\), \(\mathrm{RED}_{X_1\cup X_4}(A,d)=\{\{t,h\},\{t,n\},\{t,f\}\}\), \(\mathrm{RED}_{X_2\cup X_3}(A,d)=\{\{t\},\{h,w\}\}\), \(\mathrm{RED}_{X_2\cup X_4}(A,d)=\{\{t\}\}\), \(\mathrm{RED}_{X_3\cup X_4}(A,d)=\{\{w\}\}\).

We obtain \(\min ({\dot{\bigcup }}_{1\le i\le 4}\mathrm{RED}_{X_i}(A)\,\dot{\cup }\,{\dot{\bigcup }}_{1\le i<j\le 4}\mathrm{RED}_{X_i\cup X_j}(A,d))=\min (\min ({\dot{\bigcup }}_{1\le i\le 4}\mathrm{RED}_{X_i}(A))\,\dot{\cup }\,\min ({\dot{\bigcup }}_{1\le i<j\le 4}\mathrm{RED}_{X_i\cup X_j}(A,d)))=\min (\{\{f\}\}\,\dot{\cup }\,\{\{t,h,w\},\{t,n,w\}\})=\{\{t,h,w,f\},\{t,w,n,f\}\}=\mathrm{RED}(A)\).

4.2 Horizontal decomposition of decision table

Firstly, a relative indiscernibility relation on a subset of the universe is defined.

Definition 12

(Relative indiscernibility relation on universe subset) A relative indiscernibility relation \(\mathrm{IND}_X(B,d)\) generated by \(B\subseteq A\) on \(X\subseteq U\) is defined by

$$\begin{aligned} \mathrm{IND}_X(B,d)=\mathrm{IND}(B,d)\cap X\times X \end{aligned}$$
(8)

The relation has the following properties.

Proposition 3

For any decision table \(\mathrm{DT}=(U,A\cup \{d\}),\) non-empty \(B\subseteq A\) and \(X\subseteq U\) the following hold.

  3.1 \(\mathrm{IND}(B,d)=\mathrm{IND}_X(B,d)\) for \(X=U\).

  3.2 \(\mathrm{IND}_X(B,d)\subseteq \mathrm{IND}(B,d)\).

  3.3 \(\forall _{x,y\in X}\left( (x,y)\in \mathrm{IND}_X(B,d)\Leftrightarrow (x,y)\in \mathrm{IND}(B,d)\right) \).

  3.4 \(\mathrm{IND}_X(B,d)=\mathrm{IND}_X(A,d)\Rightarrow \exists _{\begin{array}{c} C\subseteq A,\\ B\subseteq C \end{array}}\mathrm{IND}(C,d)=\mathrm{IND}(A,d)\).

  3.5 \(\mathrm{IND}(B,d)=\mathrm{IND}(A,d)\Rightarrow \exists _{\begin{array}{c} C\subseteq B,\\ C\ne \emptyset \end{array}}\mathrm{IND}_X(C,d)=\mathrm{IND}_X(A,d)\).

A relative reduct of attribute set on a universe subset is defined as follows.

Definition 13

(Relative reduct of attribute set on universe subset) A subset \(B\subseteq A\) is a relative reduct of \(A\) on \(X\subseteq U\) if and only if

  1. \(\mathrm{IND}_X(B,d)=\mathrm{IND}_X(A,d)\),

  2. \(\mathop \forall \nolimits _{\begin{array}{c} C\subset B,\\ C\ne \emptyset \end{array}}\mathrm{IND}_X(C,d)\ne \mathrm{IND}_X(B,d)\).

The set of all relative reducts of \(A\) on \(X\subseteq U\) is denoted by \(\mathrm{RED}_X(A,d)\).

To decompose a decision table, each decision class is divided into subsets (middle subtables), then each pair of subsets of different classes is merged into one set (final subtables). To compute relative reduct sets of a decision table, the subreduct sets of all the final subtables are joined (analogously to the information system case).

Theorem 3

Let \(\mathrm{DT}=(U,A\cup \{d\})\) be a decision table such that \(U=\bigcup _{i=1}^k X_{v_i},\) where \(X_{v_i}\) is a decision class, \(v_i\in V_d\) and \(k>1\) is the number of different classes in \(\mathrm{DT}\). Let \({\fancyscript{X}}_{v_i}\) be a covering of \(X_{v_i}\) \((1\le i\le k)\). The following holds

$$\begin{aligned} \mathrm{RED}(A,d)=\min \left( \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} 1\le i<j\le k,\\ X\in {\fancyscript{X}}_{v_i},Y\in {\fancyscript{X}}_{v_j} \end{array}} \mathrm{RED}_{X\cup Y}(A,d)\right) \end{aligned}$$
(9)

Example 4

Consider coverings \(\{X_1,X_2\}\) and \(\{X_3,X_4\}\) of, respectively, decision classes “no” and “yes”, where \(X_1=\{2,7\},X_2=\{3,5\},X_3=\{1,4\},X_4=\{6,8\}\).

We have the following relative subreducts: \(\mathrm{RED}_{X_1\cup X_3}(A^{\prime },d)=\{\{w\}\}\), \(\mathrm{RED}_{X_1\cup X_4}(A^{\prime },d)=\{\{h\},\{t,w\},\{n\}\}\), \(\mathrm{RED}_{X_2\cup X_3}(A^{\prime },d)=\{\{t\},\{h,w\}\}\), \(\mathrm{RED}_{X_2\cup X_4}(A^{\prime },d)=\{\{h\},\{n\}\}\).

We obtain \(\min ({\dot{\bigcup }}_{X\in \{X_1,X_2\},Y\in \{X_3,X_4\}} \mathrm{RED}_{X\cup Y}(A^{\prime },d))=\min (\min (\mathrm{RED}_{X_1\cup X_3}(A^{\prime },d)\,\dot{\cup }\,\mathrm{RED}_{X_1\cup X_4}(A^{\prime },d))\,\dot{\cup }\,\min (\mathrm{RED}_{X_2\cup X_3}(A^{\prime },d)\,\dot{\cup }\,\mathrm{RED}_{X_2\cup X_4}(A^{\prime },d)))=\min (\{\{h,w\},\{t,w\},\{w,n\}\}\,\dot{\cup }\,\{\{t,h\},\{t,n\},\{h,w\}\})=\{\{h,w\},\{t,w,n\}\}=\mathrm{RED}(A^{\prime },flu)\).
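Example 4 can likewise be replayed in code. The sketch below (hypothetical, reusing `TABLE`, `dot_union` and `minimal` from the earlier sketches) computes relative subreducts from relative discernibility cells; pairs of objects with equal decisions contribute nothing.

```python
from functools import reduce
from itertools import combinations, product

COND = ("t", "h", "w", "n")          # A' = A \ {f}; f(lu) is the decision

def rel_cell(x, y):
    """Relative discernibility cell c^f_{x,y}: empty when decisions agree."""
    if TABLE[x]["f"] == TABLE[y]["f"]:
        return frozenset()
    return frozenset(a for a in COND if TABLE[x][a] != TABLE[y][a])

def rel_reducts(universe):
    """All relative reducts of A' on `universe` (exhaustive search)."""
    cells = [c for x, y in combinations(universe, 2) if (c := rel_cell(x, y))]
    hitting = [frozenset(b) for r in range(1, len(COND) + 1)
               for b in combinations(COND, r)
               if all(set(b) & c for c in cells)]
    return {b for b in hitting if not any(b2 < b for b2 in hitting)}

no_cover, yes_cover = [{2, 7}, {3, 5}], [{1, 4}, {6, 8}]
families = [rel_reducts(sorted(X | Y)) for X, Y in product(no_cover, yes_cover)]
print(minimal(reduce(dot_union, families)))   # {{h,w}, {t,w,n}} = RED(A', flu)
```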

5 Data decomposition-based algorithms for attribute reduction

This section proposes data decomposition algorithms for attribute reduction that are defined based on the theorems from the previous section. The algorithms are evaluated and their features are discussed.

5.1 Attribute reduction algorithms

Attribute reduction of an information system is performed according to the schema presented in Fig. 1. Step I of the schema is performed by a simple algorithm \(Create\_loc\_list\). The data size threshold is defined by the user and its maximum can be determined by the memory capacity limit. Steps II and III are performed by an algorithm \(Compute\_subred\). To compute the subreduct sets of middle subtables, the algorithm is called with default parameters. Step IV is done using an algorithm \(Compute\_red\).

Fig. 1 Attribute reduction of \(\mathrm{IS}=(U,A)\): I Decomposition of the universe into middle subtables. II Construction of the final subtables based on the middle ones. III Computation of subreduct sets of the middle and final subtables. IV Merging subreduct sets into the reduct set

(Algorithm listings for \(Create\_loc\_list\), \(Compute\_subred\) and \(Compute\_red\) are given as figures in the original paper.)
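Since the listings are not reproduced here, the following Python sketch is only a hypothetical reconstruction of the Fig. 1 flow, following Theorem 1 (the Theorem 2 refinement with a virtual decision is omitted for brevity); it reuses `reducts`, `dot_union` and `minimal` from the earlier sketches.

```python
from functools import reduce
from itertools import combinations

def compute_red(universe, max_size):
    # Step I: split the universe into middle subtables of bounded size.
    objs = sorted(universe)
    middle = [set(objs[i:i + max_size]) for i in range(0, len(objs), max_size)]
    # Steps II-III: pair middle subtables into final subtables and
    # compute their subreduct sets.
    families = [reducts(sorted(Xi | Xj)) for Xi, Xj in combinations(middle, 2)]
    # Step IV: merge subreduct sets into the reduct set, minimizing
    # after each join (justified by Proposition 2.3).
    return reduce(lambda S1, S2: minimal(dot_union(S1, S2)), families)

print(compute_red(TABLE, max_size=2))   # the two reducts of Example 1
```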

Figure 2 presents the schema for performing attribute reduction of a decision table. The operations enclosed in the ellipse, i.e. step III, are presented in more detail in Fig. 3. An algorithm \(Create\_set\_loc\_list\) is used to perform steps I, II, and III(i). An algorithm called \(Compute\_rel\_red\) uses \(Compute\_loc\_red\) to perform steps III(ii) and III(iii). It also conducts the remaining steps III(iv) and IV. The final relative reduct set is computed based on the sum of partial sums of relative subreducts, denoted by \(SS_{v_{i},v_j}\) in Figs. 2 and 3.

Fig. 2 Attribute reduction of \(\mathrm{DT}=(U,A\cup \{d\})\): I Decomposition of the universe into one-class subsets. II Construction of two-class subsets based on one-class ones. III Computation of relative subreduct sets of two-class subtables (partial sums of relative subreducts). IV Merging the partial sums of relative subreducts into the relative reduct set

Fig. 3 Attribute reduction of a two-class data subset: i Decomposition of each class of the two-class subtable into middle subtables. ii Construction of the final subtables based on the middle ones. iii Computation of the subreduct sets of the final subtables. iv Merging the relative subreduct sets into the partial sum of relative subreducts (\(SS_{v_{1},v_2}\))

(Algorithm listings for \(Create\_set\_loc\_list\) and \(Compute\_rel\_red\) are given as figures in the original paper.)

5.2 Evaluation of algorithms

The correctness and completeness of the proposed approach can be shown by Theorems 2 and 3.

Definition 14

(Correctness and completeness) A data decomposition-based algorithm for attribute reduction is correct and complete if, for every data table and every permissible decomposition of it, the algorithm returns the set of reducts of the whole data table.

Proposition 4

The \(Compute\_red\) and \(Compute\_rel\_red\) algorithms are correct and complete.

The remainder of this section evaluates the time and space complexity of the algorithms proposed in the previous subsection.

To create lists of localizations (the algorithms \(Create\_loc\_list\) and \(Create\_set\_loc\_list\)), we need to scan the table once; hence the time and space complexity of these algorithms is \(O(n)\), where \(n\) is the number of objects stored in the table. When computing complexities, we do not include the data size defined by the number of attributes, since it does not change for subtables.

The time complexity of the \(Compute\_red\) and \(Compute\_rel\_red\) algorithms depends on the method used to compute subreducts. It will be shown that the time complexity of the \(Compute\_red\) algorithm is equal to or less than that of the discernibility matrix-based attribute reduction method when applied to the whole information system. To this end, one has to show that the computations of reducts performed by \(Compute\_red\) are equivalent to those of the prime implicants of the discernibility function.

Based on the correspondence between the discernibility function of an information system and the set of reducts of this system one can define transformation operations.

Definition 15

(Transformation operations) Let \(a,b\in A\) and \(\mathrm{IS}=(U,A)\) be an information system. Transformation operations are defined by

  1. \(a^{*}\wedge b^{*}\equiv \{\{a,b\}\}\),

  2. \(a^{*}\vee b^{*}\equiv \{\{a\},\{b\}\}\).

The following equivalencies can be derived from the above definition.

Proposition 5

Let \(a,b,c\in A, B_i\subseteq A\) and \(\mathrm{IS}=(U,A)\) be an information system\(,\) where \(1\le i\le k\) and \(k \ge 1\).

The following hold

  5.1 \(a^{*}\vee (b^{*}\wedge c^*)\equiv \{\{a\},\{b,c\}\},\)

  5.2 \(a^{*}\wedge (b^{*}\vee c^*)\equiv \{\{a,b\},\{a,c\}\}=\{\{a\}\}\,\dot{\cup }\,\{\{b\},\{c\}\}\),

  5.3 \(\bigvee _{1\le i\le k}\bigwedge _{a\in B_i}a^*\equiv \{\{a:a\in B_i\}:1\le i\le k\},\)

  5.4 \(\bigwedge _{1\le i\le k}\bigvee _{a\in B_i}a^*\equiv {\dot{\bigcup }}_{1\le i\le k}\{\{a\}:a\in B_i\}\).
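Property 5.4 corresponds exactly to joining singleton families with \(\,\dot{\cup }\,\); a short sketch (the name `cnf_to_families` is hypothetical, and `dot_union` is the earlier helper):

```python
from functools import reduce

def cnf_to_families(Bs):
    """Turn clauses B_1,...,B_k (sets of attributes) into the family of
    sets picking one attribute per clause, as in Property 5.4; the
    result is not yet minimized."""
    return reduce(dot_union, [{frozenset({a}) for a in B} for B in Bs])

print(cnf_to_families([{"a"}, {"b", "c"}]))   # {{a,b}, {a,c}}, cf. 5.2
```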

The reduct set computed for two different objects corresponds to the cell of discernibility matrix defined by the objects.

Proposition 6

Let \(\mathrm{IS}=(U,A)\) be an information system. The following holds

$$\begin{aligned} \forall _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}}\left( \mathrm{RED}_{\{x,y\}}(A)\equiv \mathop \bigvee \limits _{a\in c_{x,y}}a^{*} \right) \end{aligned}$$
(10)

The set constructed using the \(\,\dot{\cup }\,\) operation on the reduct sets computed for all pairs of different objects corresponds to the discernibility function.

Proposition 7

Let \(\mathrm{IS}=(U,A)\) be an information system. The following holds

$$\begin{aligned} \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}}\mathrm{RED}_{\{x,y\}}(A)\equiv \bigwedge _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}} \mathop \bigvee \limits _{a\in c_{x,y}}a^{*} \end{aligned}$$
(11)

Computing the left side of the equivalence (i.e. applying the \({\dot{\bigcup }}\) operation), we obtain an expression directly equivalent to the disjunctive form of the right side. This holds by the following proposition.

Proposition 8

Let \(B_i\subseteq A,\) where \(1\le i\le k\) and \(k>1\). The set \({\dot{\bigcup }}_{1\le i\le k}\{\{a\}:a\in B_i\}\) in explicit form is directly equivalent to the expression \(\bigwedge _{1\le i\le k} \bigvee _{a\in B_i}a^{*}\) in disjunctive form.

Let \(\mathrm{PI}(p)\) be the set of all prime implicants of a Boolean expression \(p\). The following three propositions show the equivalences of subsequent operations.

Proposition 9

Let \(\mathrm{IS}=(U,A)\) be an information system. The following holds

$$\begin{aligned} \min \left( \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}}\mathrm{RED}_{\{x,y\}}(A)\right) \equiv \mathrm{PI}\left( \bigwedge _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}} \mathop \bigvee \limits _{a\in c_{x,y}}a^{*}\right) \end{aligned}$$
(12)

Proposition 10

Let \(\mathrm{IS}=(U,A)\) be an information system and \(\{X_1,X_2,\ldots ,X_k\}\) be a covering of \(U,\) where \(k>1\). The following holds

$$\begin{aligned}&\mathop {\dot{\bigcup }}\limits _{1\le i<j \le k}\min \left( \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}}\mathrm{RED}_{\{x,y\}}(A)\right) \nonumber \\&\quad \equiv \bigwedge _{1\le i<j \le k}\mathrm{PI}\left( \bigwedge _{\begin{array}{c} x,y\in U,\\ x\ne y \end{array}} \mathop \bigvee \limits _{a\in c_{x,y}}a^{*}\right) \end{aligned}$$
(13)

Proposition 11

Let \(\mathrm{IS}=(U,A)\) be an information system\(,\) \(\{X_1,X_2,\ldots ,X_k\}\) be a covering of \(U\) \((k>1)\) and \(\mathrm{IS}_{ij}=(X_i\cup X_j,A)\) be an information subsystem of \(\mathrm{IS}\) such that \(X_i, X_j\subset U\). The following holds

$$\begin{aligned}&\min \left( \mathop {\dot{\bigcup }}\limits _{1\le i,j \le k}\min \left( \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} x,y\in X_i\cup X_j\\ x\ne y \end{array}}\mathrm{RED}_{\{x,y\}}(A)\right) \right) \nonumber \\&\quad \equiv \mathrm{PI}\left( \bigwedge _{1\le i,j \le k}\mathrm{PI}(f_{\mathrm{IS}_{i,j}}( a^*_1,\ldots ,a^*_m))\right) \end{aligned}$$
(14)

It remains to show that the left side of the above equivalence is the reduct set (case 1) and that the right one is the set of prime implicants (case 2). Case (1) holds by the following corollary.

Corollary 1

(of Theorem 1) Let \(\mathrm{IS}=(U,A)\) be an information system and \(\{X_1,X_2,\ldots ,X_k\}\) be a covering of \(U,\) where \(k>1\). The following holds

$$\begin{aligned}&\mathrm{RED}(A)\nonumber \\&\quad =\min \left( \mathop {\dot{\bigcup }}\limits _{1\le i<j \le k}\min \left( \mathop {\dot{\bigcup }}\limits _{\begin{array}{c} x,y\in X_i\cup X_j,\\ x\ne y \end{array}}\mathrm{RED}_{\{x,y\}}(A)\right) \right) \end{aligned}$$
(15)

Case (2) holds by the following propositions.

Proposition 12

Let \(\mathrm{IS}=(U,A)\) be an information system\(,\) \(\{X_1,X_2,\ldots ,X_k\}\) be a covering of \(U\) \((k>1)\) and \(\mathrm{IS}_{ij}=(X_i\cup X_j,A)\) be an information subsystem of \(\mathrm{IS}\) such that \(X_i, X_j\subset U\). The following holds

$$\begin{aligned} f_{\mathrm{IS}}( a^*_1,\ldots ,a^*_m)\Leftrightarrow \bigwedge _{1\le i,j \le k}f_{\mathrm{IS}_{i,j}}( a^*_1,\ldots ,a^*_m) \end{aligned}$$
(16)

Proposition 13

Let \(\mathrm{IS}=(U,A)\) be an information system\(,\) \(\{X_1,X_2,\ldots ,X_k\}\) be a covering of \(U\) \((k>1)\) and \(\mathrm{IS}_{ij}=(X_i\cup X_j,A)\) be an information subsystem of \(\mathrm{IS}\) such that \(X_i, X_j\subset U\). The following holds

$$\begin{aligned}&\!\!\!\mathrm{PI}(f_{\mathrm{IS}}( a^*_1,\ldots ,a^*_m)) \nonumber \\&\quad \Leftrightarrow \mathrm{PI}\left( \bigwedge _{1\le i,j \le k}\mathrm{PI}(f_{\mathrm{IS}_{i,j}}( a^*_1,\ldots ,a^*_m))\right) \end{aligned}$$
(17)

One can analogously show that the time complexity of the \(Compute\_rel\_red\) algorithm is equal to or less than that of the relative discernibility matrix-based attribute reduction method applied to the whole decision table.

The space complexity of both algorithms (i.e. \(Compute\_red\) and \(Compute\_rel\_red\)) depends on the attribute reduction method used to compute subreducts and on the size of the data subset. One can estimate that the space complexity is between \(O(m)\) and \(O(m^2)\), where \(m\) is the size of the data subset. It is assumed here that the data are processed in batch mode, i.e. the whole data are placed in memory. The maximal space is needed when the discernibility matrix-based attribute reduction method is used directly and we have to store the values of \(\frac{m(m-1)}{2}\) matrix cells. One should note that for suitably small subtables the space complexity is determined by the number of possible reducts.

6 Experiments

This section describes experimental research that concerns attribute reduction of information systems using the approach proposed in this paper.

The approach was tested on 15 datasets taken from the UCI Repository (UCI Repository 2014). The discernibility matrix method was employed to compute reducts from decomposed data (datasets were decomposed into 4, 10, and 25 middle subtables) as well as from non-decomposed data for performance comparison. The approach was implemented in C++ and tested using a laptop (Intel Core i5, 2.3 GHz, 4 GB RAM, Windows 7).

Tables 1, 2, and 3 include the basic characteristics of each dataset (name, the number of attributes and objects), information on the number of reducts (for decomposed data, the average and the maximal (in brackets) number of subreducts are given), the time taken to compute reducts (in seconds), and the ratio that expresses an increase or decrease of the time needed for computing reducts from decomposed data. Progress with respect to the non-decomposed data is written in bold. Progress with respect to the previous decomposition is written in italics.

Table 1 Attribute reduction for datasets with fewer than 10 attributes
Table 2 Attribute reduction for datasets with fewer than 20 attributes
Table 3 Attribute reduction for datasets with more than 20 attributes

One can observe that a shortening of the time needed for computing reducts occurs more often for datasets with fewer attributes. The shortening also clearly depends on the number of reducts: if that number is very small (one or two reducts), then the computation time is more likely to be shorter. For datasets with a larger number of reducts, another regularity is observed. Namely, a bigger number of middle subtables translates, in general, into a longer time needed for computing reducts. For some datasets, a small increase in the number of middle subtables can considerably lengthen the computation time. For example, for the Turkiye student evaluation dataset we observe small time increases for 4 and 10 subtables and an over fourfold increase for 20 subtables. Furthermore, more than half an hour is needed to compute the reducts for 25 subtables. The reason for this phenomenon can be sought in a big increase in the number of subreducts (see the maximal number of subreducts for the Australian credit approval, mushroom, and Turkiye student evaluation datasets).

7 Discussion

This section discusses features of the proposed approach.

An important feature is the reduction of space complexity. Namely, a table that does not fit in the memory can be split into smaller data subtables. The size of subtables can be arbitrarily small (i.e. at least two objects). However, the number of final subtables increases quadratically with respect to that of middle subtables (i.e. there are \(\frac{k(k-1)}{2}\) final subtables for \(k\) middle subtables).

In spite of computations on many subtables, the time complexity of the approach does not have to increase compared with direct attribute reduction. As shown, the approach in conjunction with the discernibility matrix-based attribute reduction method is equivalent to direct attribute reduction of the whole table using this method. The difference is that in the former case subtasks (computations of subreducts) of the attribute reduction task are specified. This specification does not theoretically cause an increase in the number of computations. Thanks to this, changing the size of the subtables into which the table is split does not influence the time complexity.

In practice, the number of computations can increase if the reducibility of the discernibility matrices of subtables is suitably smaller than that of the matrix for the whole table. In such a case, we have more operations when computing and joining subreducts. The example below illustrates the problem of reducibility.

Example 5

Let \(X=\{1,2,6\}\). The discernibility matrix of \((X,A)\) consists of the cells \(\{t,h,w,f\}, \{t,w,n\}, \{t,h,n,f\}\) and is not reducible. The part of the discernibility matrix of \((U,A)\) defined by all pairs of objects from \(X\) reduces to \(\emptyset \) due to the cell \(\{t\}\) that is included in another part of the discernibility matrix of \((U,A)\).

Another essential feature of the approach is that computations on subtables are performed independently from one another. Thanks to this, subreducts can be computed using any existing tool for attribute reduction.

The above-mentioned features make the approach suitable for hardware implementation of attribute reduction (using FPGA programmable devices). A hardware implementation can significantly speed up attribute reduction. Namely, thanks to parallelism, basic operations such as the computation of discernibility matrix cells can be done in one cycle. This step was implemented and tested for the purpose of computing the attribute core (Grzes et al. 2013). The memory capacity limitation of an FPGA is not a barrier to large data processing since the data can be split into suitably small portions. However, the division of a data table into many subtables can in practice extend the time needed for joining a huge number of subreducts. Namely, for suitably large datasets the number of reducts of a smaller subtable is usually higher (see Tables 1, 2, 3).

8 Related works

This section compares the proposed approach with other rough set-based approaches that employ horizontal data decomposition for attribute reduction.

In Deng and Huang (2006) it was observed that the discernibility matrix of a decision table used for computing reducts can be represented by its submatrices. The universe is partitioned into subsets, and each submatrix corresponds to one subset or to the sum of two subsets. The conjunction of the discernibility functions derived from the submatrices equals that of the original matrix. The method makes it possible to divide the decision table into arbitrarily small subtables, but it requires the use of a concrete attribute reduction method, i.e. one based on the discernibility matrix.

A decomposition-and-merging reduction method was proposed in Hu et al. (2006). For each of the two subtables formed by partitioning the universe, the reduct set is computed using any attribute reduction method. To obtain the final reduct set, the two reduct sets are merged by applying a method based on the notion of a collision elementary set. This solution decreases the computation time compared with a standard attribute reduction method, but it is developed for two subtables only.

A core-attributes-based method proposed in Ye and Wu (2010) generates minimal reducts. The universe is divided into the equivalence classes of the indiscernibility relation generated by the core attributes. Each subtable includes objects of one equivalence class and is limited to the attributes not belonging to the core. All possible unions of reducts (each reduct coming from a different subtable) are computed. The reducts of the minimal-cardinality unions are expanded by the core attributes, thereby forming the minimal reducts. Thanks to using the core for the construction of subtables, the method reduces the number of both objects and attributes. However, the whole decision table is needed to compute the core, and the solution can only be applied if the core is not empty.

The solution proposed in this paper is most similar to that of Deng and Huang (2006). Namely, the latter is almost a special case of the former that uses the discernibility matrix to compute reducts of a decision system. The difference is in the way the universe is partitioned. Each universe subset in the approach proposed in this paper includes objects from one class only. Thanks to this, the subsets do not need to be scanned separately to compute relative reducts (only the sums of subsets are processed).

9 Conclusions

The problem of attribute reduction still needs new solutions to meet the challenges posed by large data.

This paper has proposed and investigated new definitions of attribute reduction using horizontal data decomposition. Algorithms for computing reducts of an information system as well as decision table have been developed and evaluated. The main benefits of the proposed approach are summarized as follows.

  1. A data table can be split into arbitrarily small subtables (at least two objects).

  2. Reduct sets of subtables are computed independently from one another using any standard attribute reduction method.

  3. Compared with direct attribute reduction methods, the approach decreases the space complexity (it is determined by the size of the subtables obtained during the decomposition) and achieves the same or lower theoretical time complexity (it depends on the method used to compute reducts of subtables).

  4. The time needed for computing reducts of information systems can be shortened when the data table includes fewer attributes (up to 10) or the number of reducts is small (one or two).

Future research includes more extensive experiments (the computation of reducts of decision tables, other methods for computing all reducts) and hardware implementation of the proposed method using FPGA devices.