
1 Introduction

Rough set theory [1], as a powerful mathematical tool for dealing with insufficient, incomplete or vague information, has been widely used in many fields. In rough set theory, the study of attribute reduction has attracted more attention than that of rule acquisition, but in recent years there have been more and more studies on decision rule acquisition. Papers [2, 3] gave discernibility-matrix- or discernibility-function-based methods for acquiring decision rules. In theory, these methods can acquire all minimum rule sets for a given decision system, but they usually incur huge time and space costs, which severely limits their applicability in practice. In addition, paper [4] discussed the problem of producing a set of certain and possible rules from incomplete data sets based on rough sets and gave a corresponding rule learning algorithm. Paper [5] discussed optimal certain rules and optimal association rules, and proposed two quantitative measures, the random certainty factor and the random coverage factor, to explain the relationship between the condition and decision parts of a rule in incomplete decision systems. Paper [6] also discussed rule acquisition in incomplete decision contexts: it presented the notion of an approximate decision rule and then proposed an approach for extracting non-redundant approximate decision rules from an incomplete decision context. However, that method is also based on the discernibility matrix and discernibility function, which makes it difficult to acquire decision rules from large data sets.

The attribute-value block technique is an important tool for analyzing data sets [7, 8]; in essence, it is a granulation method for dealing with data. This paper uses the attribute-value block technique and other related techniques to systematically study realization methods for rule reduction and rule set minimization, and proposes effective algorithms for reducing decision rules and minimizing decision rule sets. These algorithms, together with a related attribute reduction algorithm, constitute an effective solution to the acquisition of minimum rule sets, which are a kind of classifier and can be used for class prediction.

The rest of the paper is organized as follows. In Sect. 2, we review some basic notions related to decision systems. Section 3 introduces the concept of minimum rule sets. Section 4 gives specific algorithms for rule reduction and rule set minimization based on attribute-value blocks. In Sect. 5, experiments are conducted to verify the effectiveness of the proposed methods. Section 6 concludes the paper.

2 Preliminaries

In this section, we review some basic notions, such as attribute-value blocks and decision rule sets, which prepare for acquiring minimum rule sets in the subsequent sections.

2.1 Decision Systems and Relative Reducts

A decision system (DS) can be expressed as the following 4-tuple: \( DS = (U,A = C \cup D,V = \bigcup\limits_{a \in A} {V_{a} } , \, \left\{ {f_{a} } \right\}) \), where U is a finite nonempty set of objects; C and D are the condition attribute set and the decision attribute set, respectively, with \( C \cap D = \emptyset \); \( V_{a} \) is the value domain of attribute a; and \( f_{a} :U \to V_{a} \) is an information function that maps each object in U to a value in \( V_{a} \).

For simplicity, \( (U,A = C \cup D,V = \bigcup\limits_{a \in A} {V_{a} } , \, \left\{ {f_{a} } \right\}) \) is written as \( (U,C \cup D) \) when V and \( f_{a} \) are understood. Without loss of generality, we suppose that D consists of a single attribute d.

For any \( B\, \subseteq \,C \), let \( U/B = \{ [x]_{B} \,|\,x \in U\} \), where \( [x]_{B} = \{ y \in U\,|\,f_{a} (y) = f_{a} (x)\,{\text{for}}\,\,{\text{any}}\,a \in B\} \) is the equivalence class of x under B. For any subset \( X\, \subseteq \,U \), the lower approximation \( \underline{BX} \) and the upper approximation \( \overline{BX} \) of X with respect to B are defined by \( \underline{BX} = \{ x \in U\,|\,[x]_{B} \, \subseteq \,X\} \) and \( \overline{BX} = \{ x \in U\,|\,[x]_{B} \cap X \ne \emptyset \} \). The positive region POS B (X), boundary region BND B (X) and negative region NEG B (X) of X are then defined as \( POS_{B} (X) = \underline{BX} \), \( BND_{B} (X) = \overline{BX} - \underline{BX} \), \( NEG_{B} (X) = U - \overline{BX} \).
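As a concrete illustration of these definitions, the equivalence classes and approximations can be computed on a toy decision table. The sketch below is not from the paper; the attribute names a1, a2 and the data are invented for illustration.

```python
# Hedged sketch: equivalence classes and rough approximations on a toy
# table. Objects are indices; f maps each object to its attribute values.

def equivalence_classes(U, B, f):
    """Partition the objects in U by their values on the attributes in B."""
    classes = {}
    for x in U:
        key = tuple(f[x][a] for a in B)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def lower_upper(U, B, f, X):
    """Lower and upper approximation of X with respect to B."""
    lower, upper = set(), set()
    for block in equivalence_classes(U, B, f):
        if block <= X:
            lower |= block          # [x]_B lies fully inside X
        if block & X:
            upper |= block          # [x]_B overlaps X
    return lower, upper

# Illustrative table with two condition attributes.
f = {
    0: {"a1": 0, "a2": 0}, 1: {"a1": 0, "a2": 0},
    2: {"a1": 0, "a2": 1}, 3: {"a1": 1, "a2": 1},
}
U = set(f)
X = {0, 2}                          # a target concept
low, up = lower_upper(U, ["a1", "a2"], f, X)
print(sorted(low), sorted(up))      # lower = {2}, upper = {0, 1, 2}
```

Here the boundary region is {0, 1}: objects 0 and 1 share an equivalence class, but only object 0 belongs to X.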

Suppose that \( U/D = \{ [x]_{D} \,|\,x \in U\} = \left\{ {D_{ 1} ,D_{ 2} , \ldots ,D_{m} } \right\} \), where m = |U/D|, D i is a decision class, \( i \in \left\{ { 1, 2, \ldots ,m} \right\} \). Then for any \( B\, \subseteq \,C \), the concepts of positive region POS B (D), boundary region BND B (D) and negative region NEG B (D) of a decision system \( (U,C \cup D) \) can be defined as follows:

$$ \begin{aligned} & POS_{B} (D) = POS_{B} (D_{1} ) \cup POS_{B} (D_{2} ) \cup \ldots \cup POS_{B} (D_{m} ), \\ & BND_{B} (D) = BND_{B} (D_{1} ) \cup BND_{B} (D_{2} ) \cup \ldots \cup BND_{B} (D_{m} ), \\ & NEG_{B} (D) = U - (POS_{B} (D) \cup BND_{B} (D)). \\ \end{aligned} $$

With the positive region, the concept of reducts can be defined as follows: given a decision system \( (U,C \cup D) \) and \( B\, \subseteq \,C \), B is a relative reduct of C with respect to D if the following conditions are satisfied: (1) \( POS_{B} \left( D \right) = POS_{C} (D) \), and (2) for any \( a \in B,POS_{{B - \{ a\} }} \left( D \right) \ne POS_{B} (D) \).
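The two reduct conditions can be checked mechanically: equality of positive regions, plus minimality under removal of any single attribute. The sketch below is illustrative only (table, attribute names and decision values are invented), not the paper's attribute reduction algorithm.

```python
# Hedged sketch of the reduct conditions above.

def positive_region(U, B, f, d):
    """POS_B(D): objects whose B-equivalence class is decision-pure."""
    classes = {}
    for x in U:
        classes.setdefault(tuple(f[x][a] for a in B), set()).add(x)
    pos = set()
    for block in classes.values():
        if len({d[x] for x in block}) == 1:   # all decisions agree
            pos |= block
    return pos

def is_reduct(U, B, C, f, d):
    """B is a relative reduct of C w.r.t. D iff conditions (1) and (2) hold."""
    if positive_region(U, B, f, d) != positive_region(U, C, f, d):
        return False                           # condition (1) fails
    return all(                                # condition (2): minimality
        positive_region(U, [a for a in B if a != b], f, d)
        != positive_region(U, B, f, d)
        for b in B
    )

# Toy table in which the decision depends on a2 only.
f = {0: {"a1": 0, "a2": 0}, 1: {"a1": 0, "a2": 1},
     2: {"a1": 1, "a2": 0}, 3: {"a1": 1, "a2": 1}}
d = {0: "no", 1: "yes", 2: "no", 3: "yes"}
U = set(f)
print(is_reduct(U, ["a2"], ["a1", "a2"], f, d))  # True: {a2} is a reduct
```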

2.2 Decision Logic and Attribute-Value Blocks

Decision rules are in fact formulae in decision logic. In rough set theory, a decision logic language depends on a specific information system, and a decision system \( (U,C \cup D) \) can be regarded as composed of two information systems: (U, C) and (U, D). There are therefore two corresponding decision logic languages, and attribute-value blocks act as a bridge between them. For simplicity, let \( IS\left( B \right) = (U,B,V = \bigcup\limits_{a \in B} {V_{a} } , \, \left\{ {f_{a} } \right\}) \) be the information system with respect to B, where \( B\, \subseteq \,C \) or \( B\, \subseteq \,D \). The decision logic language DL(B) is then defined as the system composed of the following formulae [3]:

  1. (a, v) is an atomic formula, where \( a \in B,v \in V_{a} \);

  2. an atomic formula is a formula in DL(B);

  3. if φ is a formula, then ~φ is also a formula in DL(B);

  4. if both φ and ψ are formulae, then φ˅ψ, φ˄ψ, φ → ψ and φ ≡ ψ are all formulae;

  5. only the formulae obtained according to Rules (1) to (4) are formulae in DL(B).

The atomic formula (a, v) is also called an attribute-value pair [7]. If φ is a simple conjunction, consisting only of atomic formulae joined by the connective ∧, then φ is called a basic formula.

For any \( x \in U \), the satisfaction relationship between x and formulae in DL(B) is defined as follows:

  1. \( x\,| = \left( {a,v} \right)\,\,{\text{iff}}\,\,f_{a} \left( x \right) = v; \)

  2. \( x\,| = \,\sim \varphi \,\,{\text{iff}}\,\,{\text{not}}\,\,x| = \varphi ; \)

  3. \( x\,| = \varphi \wedge \psi \,\,{\text{iff}}\,\,x| = \varphi \,\,{\text{and}}\,\,x| = \psi ; \)

  4. \( x\,| = \varphi \vee \psi \,\,{\text{iff}}\,\,x| = \varphi \,\,{\text{or}}\,\,x| = \psi ; \)

  5. \( x\,| = \varphi \to \psi \,\,{\text{iff}}\,\,x| = \,\sim \varphi \vee \psi ; \)

  6. \( x\,| = \varphi \equiv \psi \,\,{\text{iff}}\,\,x\,| = \varphi \to \psi \,\,{\text{and}}\,\,x\,| = \psi \to \varphi . \)

For a formula φ, if \( x\,| = \varphi \), we say that the object x satisfies φ. Let \( [\varphi ] = \{ x \in U\,|\,x| = \varphi \} \), the set of all objects that satisfy φ. A formula φ is built from attribute-value pairs by connectives; accordingly, [φ] is called an attribute-value block and φ the (attribute-value pair) formula of the block. DL(C) and DL(D) are distinct decision logic languages and have no formulae in common. However, through attribute-value blocks, an association between DL(C) and DL(D) can be established. For example, suppose \( \varphi \in DL\left( C \right) \) and \( \psi \in DL\left( D \right) \); φ and ψ belong to different languages, but if \( [\varphi ]\, \subseteq \,[\psi ] \), we obtain the decision rule φ → ψ. Attribute-value blocks therefore play an important role in acquiring decision rules, especially certainty rules.
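The bridge between DL(C) and DL(D) can be sketched directly: compute [φ] and [ψ] and test containment. Data and names below are illustrative, not from the paper.

```python
# Hedged sketch: attribute-value blocks and rule derivation via [phi] <= [psi].
# A basic formula is represented as a dict of attribute-value pairs.

def block(phi, f):
    """[phi]: objects satisfying every (a, v) pair of the conjunction phi."""
    return {x for x, row in f.items()
            if all(row.get(a) == v for a, v in phi.items())}

# Toy table: condition attribute a1, decision attribute d.
f = {0: {"a1": 0, "d": "no"}, 1: {"a1": 0, "d": "no"},
     2: {"a1": 1, "d": "yes"}}
phi = {"a1": 0}          # basic formula in DL(C)
psi = {"d": "no"}        # basic formula in DL(D)

# [phi] <= [psi], so the certain rule (a1, 0) -> (d, no) holds.
print(block(phi, f) <= block(psi, f))  # True
```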

3 Minimum Rule Sets

Suppose that \( \varphi \in DL\left( C \right) \) and \( \psi \in DL(D) \). The implication φ → ψ is said to be a (decision) rule in decision system \( (U,C \cup D) \). If both φ and ψ are basic formulae, then φ → ψ is called a basic decision rule. A decision rule is not necessarily useful unless it satisfies certain indices, which we introduce below.

A decision rule has two important measuring indices, confidence and support, which are defined as \( conf(\varphi \to \psi ) = |[\varphi ] \cap [\psi ]|/|[\varphi ]| \) and \( sup(\varphi \to \psi ) = |[\varphi ] \cap [\psi ]|/|U| \), where conf(φ → ψ) and sup(φ → ψ) are the confidence and support of decision rule φ → ψ, respectively.
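Both measures follow directly from the blocks [φ] and [ψ]; the sketch below uses an invented universe of five objects purely for illustration.

```python
# Hedged sketch of the two rule-quality measures above.

def confidence(phi_block, psi_block):
    """conf(phi -> psi) = |[phi] ∩ [psi]| / |[phi]|."""
    return len(phi_block & psi_block) / len(phi_block)

def support(phi_block, psi_block, U):
    """sup(phi -> psi) = |[phi] ∩ [psi]| / |U|."""
    return len(phi_block & psi_block) / len(U)

U = {0, 1, 2, 3, 4}
phi_block = {0, 1, 2}       # [phi]
psi_block = {0, 1, 3}       # [psi]
print(confidence(phi_block, psi_block))  # 2/3: two of three matches succeed
print(support(phi_block, psi_block, U))  # 0.4: two of five objects support it
```

A certainty rule, in the sense of Sect. 2.2, is exactly one with confidence 1.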

For decision system \( DS = (U,C \cup D) \), if rule φ → ψ is true in \( DL(C \cup D) \), i.e., \( x\,| = \,\varphi \to \psi \) for any \( x \in U \), then rule φ → ψ is said to be consistent in DS, denoted by | =  DS φ → ψ; if there exists at least one object \( x \in U \) such that \( x\,| = \, \varphi \wedge \psi \), then rule φ → ψ is said to be satisfiable in DS. Consistency and satisfiability are the basic properties that a decision rule must satisfy.

For an object \( x \in U \) and a decision rule r: φ → ψ, if x | = r, then rule r is said to cover object x; let \( {\mathbf{coverage}}\left( r \right) = \{ x \in U\,|\,x| = r\} \), the set of all objects covered by rule r. For two rules r 1 and r 2, if \( {\text{coverage}}\left( {r_{ 1} } \right)\, \subseteq \,{\text{coverage}}(r_{ 2} ) \), then r 2 is said to functionally cover r 1, denoted by \( r_{ 1} \le r_{ 2} \). Clearly, in this case rule r 1 is redundant and should be deleted; in other words, rules that are functionally covered by other rules should be removed from rule sets.

In addition, a rule φ → ψ is said to be reduced if \( \left[ \varphi \right]\, \subseteq \,[\psi ] \) no longer holds when any attribute-value pair is removed from φ. The process of obtaining such a rule is known as rule reduction, which is introduced in the next section.

A decision rule set \( \wp \) is said to be minimum if it satisfies the following properties [3]: (1) every rule in \( \wp \) is consistent; (2) every rule in \( \wp \) is satisfiable; (3) every rule in \( \wp \) is reduced; (4) for any two rules \( r_{ 1} ,r_{ 2} \in \wp \), neither \( r_{ 1} \le r_{ 2} \) nor \( r_{ 2} \le r_{ 1} \).

To obtain a minimum rule set from a given data set, three steps are required: attribute reduction, rule reduction and rule set minimization. This paper does not revisit attribute reduction methods; instead, we propose new methods for rule reduction and rule set minimization in the following sections.

4 Methods of Acquiring Decision Rules

4.1 Rule Reduction

Rule reduction removes redundant attributes from a rule, keeping a minimal set of attribute-value pairs such that the rule remains consistent and satisfiable. For convenience of discussion, let r(x) denote the decision rule generated from object x; we introduce the following definitions and properties.

Definition 1.

For decision system \( DS = (U,C \cup D),B = \left\{ {a_{ 1} ,a_{ 2} , \ldots ,a_{m} } \right\}\, \subseteq \,C \) and \( x \in U \), let \( \varvec{pairs}{\mathbf{(}}\varvec{x,B}{\mathbf{)}} = (a_{1} ,f_{{a_{1} }} (x)) \wedge (a_{2} ,f_{{a_{2} }} (x)) \wedge \ldots \wedge (a_{m} ,f_{{a_{m} }} (x)) \) and \( \varvec{block}{\mathbf{(}}\varvec{x,B}{\mathbf{)}} = \left[ {pairs(x,B)} \right] \). The number m is called the length of pairs(x, B) and of block(x, B), denoted by |pairs(x, B)| and |block(x, B)|, respectively.

Property 1.

Suppose \( B_{ 1} ,B_{ 2} \, \subseteq \,C \) with \( B_{ 1} \, \subseteq \,B_{ 2} \), then \( block\left( {x,B_{ 2} } \right)\, \subseteq \,block(x,B_{ 1} ) \).

The proof of Property 1 is straightforward. According to this property, for an attribute subset B, block(x, B) grows as attributes are removed from B, under the prerequisite that block(x, B) does not “exceed” the decision class [x] D to which x belongs. Therefore, judging whether block(x, B) is still contained in [x] D is crucial for rule reduction.

Property 2.

For decision system \( DS = (U,C \cup D) \) and \( B\, \subseteq \,C,block\left( {x,B} \right)\, \subseteq \,[x]_{D} ( = block(x,D)) \) if and only if f d (y) = f d (x) for all \( y \in block(x,B) \).

The proof of Property 2 is also straightforward. This property reduces the problem of judging whether block(x, B) is contained in [x] D to that of judging whether f d (y) = f d (x) for all \( y \in block(x,B) \); the latter is much easier than the former. Thus, we give the following algorithm (Algorithm 1) for reducing a decision rule.
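A minimal sketch of such a rule-reduction procedure, consistent with Properties 1 and 2, might look as follows; the data layout, attribute names and greedy removal order are illustrative assumptions, not the paper's exact pseudocode.

```python
# Hedged sketch of rule reduction: greedily drop attributes from B as long
# as block(x, B) stays inside the decision class [x]_D (Property 2).

def block(x, B, f):
    """block(x, B): objects agreeing with x on every attribute in B."""
    return {y for y, row in f.items()
            if all(row[a] == f[x][a] for a in B)}

def reduce_rule(x, C, f, d):
    """Return a minimal attribute subset B with block(x, B) <= [x]_D."""
    B = list(C)
    for a in list(C):
        trial = [b for b in B if b != a]
        # Property 2: containment in [x]_D holds iff all decisions agree.
        if trial and all(d[y] == d[x] for y in block(x, trial, f)):
            B = trial                # a was redundant; drop it
    return B

# Toy table in which the decision depends on a2 only.
f = {0: {"a1": 0, "a2": 0}, 1: {"a1": 0, "a2": 1},
     2: {"a1": 1, "a2": 0}, 3: {"a1": 1, "a2": 1}}
d = {0: "no", 1: "yes", 2: "no", 3: "yes"}
print(reduce_rule(0, ["a1", "a2"], f, d))  # ['a2']: rule (a2, 0) -> (d, no)
```

Because every removal is validated against [x] D , the reduced rule's confidence remains 1, matching the discussion of Algorithm 1 below.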

The most time-consuming step of this algorithm is computing block(x, B), which takes |U||B| comparisons. Therefore, the complexity of the algorithm is \( O(|U||C|^{2} ) \) in the worst case. Algorithm 1 guarantees at all times that \( block\left( {x,B} \right)\, \subseteq \,[x]_{D} = block(x,D) \), so the confidence of rule r(x) always equals 1.

4.2 Minimum of Decision Rule Sets

Using Algorithm 1, each object in U generates one rule, so after rule reduction there are still |U| rules. Many of these rules are inevitably functionally covered by others, and such covered rules need to be deleted.

For decision system \( (U,C \cup D) \), after using Algorithm 1 to reduce the rule of each object \( x \in U \), all generated rules r(x) constitute a rule set, denoted RS, i.e., \( RS = \{ r\left( x \right)\,|\,x \in U\} \). Obviously, |RS| = |U|. Our purpose in this section is to delete the rules that are functionally covered by other rules; in other words, to minimize RS such that each remaining rule is consistent, satisfiable, reduced, and not covered by any other rule.

Suppose \( V_{d} = \left\{ {v_{ 1} ,v_{ 2} , \ldots ,v_{t} } \right\} \). We use the decision attribute d to partition U into t attribute-value blocks (equivalence classes): \( [(d,v_{1} )],[(d,v_{2} )], \ldots ,[(d,v_{t} )] \). Let \( U_{{v_{i} }} = \left[ {\left( {d,v_{i} } \right)} \right] \); thus \( \bigcup\limits_{{i \in \{ 1,2, \ldots ,t\} }} {U_{{v_{i} }} } = U \) and \( U_{{v_{i} }} \cap U_{{v_{j} }} = \emptyset \) for \( i \ne j,i,j \in \left\{ { 1, 2, \ldots ,t} \right\} \). Accordingly, let \( RS_{{v_{i} }} = \{ r\left( x \right)\,|\,x \in U_{{v_{i} }} \} \) for \( i \in \left\{ { 1, 2, \ldots ,t} \right\} \). Obviously, {\( RS_{{v_{i} }} \) | i ∈ {1,2,…,t}} is a partition of RS. According to Algorithm 1, for any \( r^{\prime} \in RS_{{v_{i} }} \) and \( r^{\prime\prime} \in RS_{{v_{j} }} \) with i ≠ j, neither \( r^{\prime} \le r^{\prime\prime} \) nor \( r^{\prime\prime} \le r^{\prime} \), because \( {\text{coverage}}(r^{\prime})\, \subseteq \,U_{{v_{i} }} \) and \( {\text{coverage}}(r^{\prime\prime})\, \subseteq \,U_{{v_{j} }} \), so \( {\text{coverage}}(r^{\prime}) \cap {\text{coverage}}(r^{\prime\prime}) = \emptyset \). This means that a rule in \( RS_{{v_{i} }} \) does not functionally cover any rule in \( RS_{{v_{j} }} \). Thus, we can minimize each \( RS_{{v_{i} }} \) independently, and the union of the resulting rule subsets is the final minimum rule set.

Now consider a single \( RS_{{v_{i} }} \), where \( i \in \left\{ { 1, 2, \ldots ,t} \right\} \). For \( r\left( x \right) \in RS_{{v_{i} }} \), if there exists \( r(y) \in RS_{{v_{i} }} \), x ≠ y, such that \( r\left( x \right) \le r\left( y \right) \) (r(y) functionally covers r(x)), then r(x) should be removed from \( RS_{{v_{i} }} \); otherwise it should be kept. Let \( RS^{\prime}_{{v_{i} }} \) denote the set of all rules remaining in \( RS_{{v_{i} }} \) after removal. The algorithm for minimizing \( RS_{{v_{i} }} \) (Algorithm 2) is described as follows.
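A minimal sketch of this minimization step, representing each rule only by its coverage set (an illustrative simplification, not the paper's exact pseudocode):

```python
# Hedged sketch of rule set minimization within one RS_{v_i}: drop every
# rule whose coverage set is contained in another remaining rule's coverage.

def minimize(rules):
    """rules: dict rule_id -> coverage set. Remove functionally covered rules."""
    remaining = dict(rules)
    for rid in list(remaining):
        cov = remaining[rid]
        if any(oid != rid and cov <= other
               for oid, other in remaining.items()):
            del remaining[rid]       # rid is functionally covered
    return remaining

# Illustrative coverage sets within one decision class.
RS_v = {
    "r1": {0, 1},        # covered by r2, so it is removed
    "r2": {0, 1, 2},
    "r3": {3},           # incomparable with r2, so it is kept
}
print(sorted(minimize(RS_v)))  # ['r2', 'r3']
```

Note that when two rules have identical coverage, checking only against the *remaining* rules ensures exactly one of them survives.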

In Algorithm 2, judging whether \( x_{j} \in {\text{coverage}}\left( r \right) \) takes at most |C| comparisons. Because all rules in \( RS_{{v_{i} }} \) have already been reduced by Algorithm 1, the actual number of comparisons is usually much smaller than |C|. Therefore, the complexity of Algorithm 2 is \( O(q^{ 2} \cdot|C|) = O(|U_{{v_{i} }} |^{ 2} \cdot|C|) \) in the worst case.

4.3 An Algorithm for Acquiring Minimum Rule Sets

Using the proposed algorithms together with a related attribute reduction algorithm, we can now give an entire algorithm (Algorithm 3) for acquiring a minimum rule set from a given data set. The algorithm is described as follows.
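The overall pipeline can be sketched end to end: reduce the rule of every object, group the rules by decision value, and minimize each group independently. The sketch below is an illustrative assumption of how the pieces compose (attribute reduction is elided by taking R = C); it is not the paper's pseudocode.

```python
# Hedged end-to-end sketch: rule reduction per object, then per-class
# rule set minimization, matching the structure described for Algorithm 3.

def block(x, B, f):
    return {y for y, row in f.items()
            if all(row[a] == f[x][a] for a in B)}

def reduce_rule(x, C, f, d):
    B = list(C)
    for a in list(C):
        trial = [b for b in B if b != a]
        if trial and all(d[y] == d[x] for y in block(x, trial, f)):
            B = trial
    return B

def minimum_rule_set(f, d, C):
    # One reduced rule per object, grouped by decision value v_i.
    groups = {}
    for x in f:
        B = reduce_rule(x, C, f, d)
        rule = (tuple(sorted((a, f[x][a]) for a in B)), d[x])
        groups.setdefault(d[x], {})[rule] = frozenset(block(x, B, f))
    # Independently minimize each RS_{v_i}, then take the union.
    final = {}
    for rules in groups.values():
        for r in list(rules):
            if any(o != r and rules[r] <= c for o, c in rules.items()):
                del rules[r]
        final.update(rules)
    return final

f = {0: {"a1": 0, "a2": 0}, 1: {"a1": 0, "a2": 1},
     2: {"a1": 1, "a2": 0}, 3: {"a1": 1, "a2": 1}}
d = {0: "no", 1: "yes", 2: "no", 3: "yes"}
rules = minimum_rule_set(f, d, ["a1", "a2"])
print(sorted(rules))  # two rules: (a2, 0) -> no and (a2, 1) -> yes
```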

In Algorithm 3, three steps “evaporate” redundant data: Steps 2, 3 and 5. These steps also determine the complexity of the entire algorithm. The decision system \( (U^{\prime},R \cup D) \) generated in Step 2 is completely determined by Step 1, which performs attribute reduction and has a complexity of about \( O(|C|^{2} |U|^{2} ) \). The complexity of Step 3 is \( O(|U^{\prime}|^{2} |C|^{2} ) \) in the worst case. Step 5's complexity is \( O(|U^{\prime}_{{v_{1} }} |^{ 2} \cdot \left| C \right|) + O(|U^{\prime}_{{v_{2} }} |^{ 2} \cdot \left| C \right|) + \ldots + O(|U^{\prime}_{{v_{t} }} |^{ 2} \cdot \left| C \right|) \); because this step can be performed in parallel, it can be even more efficient in a parallel environment. Generally, after attribute reduction the size of a data set decreases greatly, i.e., |U′| << |U|. Therefore, the computation time of Algorithm 3 is mainly determined by Step 1, giving a complexity of \( O(|C|^{2} |U|^{2} ) \) in most cases.

5 Experiment Analysis

This section verifies the effectiveness of the proposed methods through experiments. Four UCI data sets (http://archive.ics.uci.edu/ml/datasets.html) are used in our experiments; they are outlined in Table 1. Missing values were replaced with the most frequently occurring value of the corresponding attribute.

Table 1. Description of the four data sets.

We executed Algorithm 3 on the four data sets to obtain minimum rule sets. Let minRS denote the set of decision rules finally obtained on each data set. The indices that we are interested in, and their meanings, are as follows.

  • Number of rules: |minRS|, i.e., the number of decision rules in minRS

  • Average value of support: \( \frac{1}{|minRS|}\sum\limits_{r \in minRS} {sup(r)} \), with \( {\text{minValue}} = \mathop {\hbox{min} }\limits_{r \in minRS} \,sup(r) \) and \( {\text{maxValue}} = \mathop {\hbox{max} }\limits_{r \in minRS} \,sup(r) \)

  • Average value of confidence: \( \frac{1}{|minRS|}\sum\limits_{r \in minRS} {conf(r)} \)

  • Evaporation ratio: the ratio of removed items (attribute values) to all items (all attribute values)

  • Running time: the running time of Algorithm 3, including attribute reduction, rule reduction and rule set minimization, measured in seconds.

The experimental results on the four data sets are shown in Table 2.

Table 2. Experimental results on the four data sets

From Table 2, it can be seen that the rule sets obtained on the four data sets all have very high evaporation ratios, and each rule in these rule sets has a certain amount of support. In particular, on Mushroom an average of 0.0689 × 8124 ≈ 560 objects support each rule. This shows that these rule sets have relatively strong generalization ability. Furthermore, the running time of Algorithm 3 on each data set is short enough to be acceptable to users. In addition, Algorithm 1 guarantees at all times that \( block\left( {x,B} \right)\, \subseteq \,[x]_{D} = block\left( {x,D} \right) \) for all \( x \in U \), so the confidence of each rule always equals 1; in other words, all the obtained decision rules are deterministic. These results demonstrate that Algorithm 3 is effective and has good practical value.

6 Conclusion

Acquiring decision rules from data sets is an important task in rough set theory. This paper studied the problem from the following three aspects so as to provide an effective granulation method for acquiring minimum rule sets. First, we reviewed decision logic language and the attribute-value block technique. Second, we used the attribute-value block technique to study how to reduce rules and minimize rule sets, and proposed effective algorithms for rule reduction and rule set minimization. Together with a related attribute reduction algorithm, the proposed granulation method thus constitutes an effective solution to the acquisition of minimum rule sets, which are a kind of classifier and can be used for class prediction. Third, we conducted a series of experiments showing that our methods are effective and feasible.