An efficient method for mining association rules based on minimum single constraints

Mining association rules with constraints allows us to concentrate on discovering a useful subset instead of the complete set of association rules. To satisfy the needs of users and to improve the efficiency and effectiveness of the mining task, many constraints and mining algorithms have been proposed. In practice, finding rules that involve specific itemsets is of interest. Thus, this paper considers the problem of mining association rules whose left-hand and right-hand sides contain two given itemsets, respectively. In addition, the rules must satisfy given maximum support and confidence constraints. Applying previous algorithms to this problem has several disadvantages, such as the generation of many redundant candidates, time-consuming constraint checks and repeated reading of the database whenever the constraints change. The paper proposes an equivalence relation that uses the closure of itemsets to partition the solution set into disjoint equivalence classes, together with a new, efficient representation of the rules in each class based on the lattice of closed itemsets and their generators. The paper also develops a new algorithm, called MAR-MINSC, that rapidly mines all constrained rules from the lattice instead of mining them directly from the database. The theoretical results are proven to be reliable. Because MAR-MINSC avoids the drawbacks above, in extensive experiments on many databases it outperforms several existing algorithms for mining association rules with the constraints mentioned.


Introduction
To reduce the burden of storage and execution time and to respond rapidly to user demands, constraint-based data mining has attracted much interest from researchers. Early work designed algorithms to mine data with primitive constraints. A typical example is frequent itemset discovery in a transaction database, where the primitive constraint is a minimum frequency constraint. Association rules are then mined from the frequent itemsets, with the minimum confidence constraint as another primitive constraint. More concretely, let $T = (O, A, R)$ be a binary database, where $O$ is a nonempty set of objects (or transactions), $A$ is the set of attributes (or items) appearing in these objects and $R$ is a binary relation on $O \times A$. The cardinalities of $A$ and $O$ are denoted by $m = |A|$ and $n = |O|$, respectively ($m$ and $n$ are often very large). Let $s_0$ denote the minimum support threshold and $c_0$ the minimum confidence threshold, where $s_0, c_0 \in (0; 1]$. The task is to mine frequent itemsets and association rules from $T$. A basic problem, named $(P_1)$, is that the cardinalities of the frequent itemset class $FS(s_0)$ and the association rule set $ARS(s_0, c_0)$ are exponential in the worst case, i.e., $\max(\#FS(s_0)) = 2^m - 1 = O(2^m)$ and $\max(\#ARS(s_0, c_0)) = 3^m - 2^{m+1} + 1 = O(3^m)$. Therefore, existing algorithms remain limited in mining time and main memory when $T$ is large. Moreover, it is difficult for users to quickly find the small subset of interest among the discovered rules if only support and confidence constraints are available. To solve problem $(P_1)$, many more complicated constraints have been introduced into algorithms so that they generate only the association rules directly related to the user's true needs and reduce the cost of mining.
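The worst-case counts quoted above can be checked by brute force for small m. The following is a minimal sketch, not part of the original method: it enumerates every non-empty itemset over m items and every rule L → S \ L with a non-empty proper left-hand side, and compares the totals with the closed-form expressions. The function name is illustrative.

```python
from itertools import combinations

def count_itemsets_and_rules(m):
    """Count non-empty itemsets over m items and rules L -> S \\ L
    where L is a non-empty proper subset of the itemset S."""
    items = range(m)
    n_itemsets = 0
    n_rules = 0
    for k in range(1, m + 1):
        for s in combinations(items, k):
            n_itemsets += 1
            # each left-hand side of size 1..k-1 yields exactly one rule
            for j in range(1, k):
                n_rules += sum(1 for _ in combinations(s, j))
    return n_itemsets, n_rules

# Compare with 2^m - 1 and 3^m - 2^(m+1) + 1 for small m.
for m in range(1, 8):
    n_itemsets, n_rules = count_itemsets_and_rules(m)
    assert n_itemsets == 2 ** m - 1
    assert n_rules == 3 ** m - 2 ** (m + 1) + 1
```

The rule count follows because an itemset of size k contributes 2^k − 2 rules, and summing C(m, k)(2^k − 2) over k gives 3^m − 2^{m+1} + 1.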
Monotonic and anti-monotonic constraints, denoted by $C_m$ and $C_{am}$ respectively, are considered by Nguyen et al. [25]. They are pushed into an Apriori-like algorithm, named CAP, to reduce the frequent itemset computation. In [7], the problem is restricted to two constraints, the consequent and the minimum improvement. Srikant et al. [30] present the problem of mining association rules that include given items on their two sides. A three-phase algorithm is proposed for mining those rules. First, the constraint is integrated into the Apriori-like candidate generation procedure to find only candidates that contain the selected items. Second, an additional scan of the database is executed to count the support of the subsets of each mined frequent itemset. Finally, an algorithm based on the Apriori principle is applied to generate rules. The concept of convertible constraints is introduced and pushed into the mining process of an FP-growth based algorithm [28]. The authors show that, since frequent itemset mining there is based on the concept of prefix itemsets, it is very easy to integrate convertible constraints into FP-growth-like algorithms. They also state that pushing these constraints into Apriori-like algorithms is not possible. To cope with huge input databases, Bonchi et al. [8] propose data reduction techniques, which have been shown to be quite effective when pushing convertible constraints into a level-wise computation. The authors in [21] design algorithms for discovering association rules with multi-dimensional constraints.
By combining the power of the condensed representation (closed itemsets and generators) of frequent itemsets with the properties of $C_m$ and $C_{am}$ constraints, in [2,3,16,17] we consider several item constraints and propose efficient algorithms to mine constrained frequent itemsets. In detail, the work in [2] mines all frequent itemsets contained in a specific itemset, using an algorithm called MINE_FS_CONS. In [3], the efficient algorithms MFS-CC and MFS-IC for mining frequent itemsets with dualistic constraints are presented. They are built on the explicit structure of the frequent itemset class. The class is split into two sub-classes, and each sub-class is found by applying the efficient representation of itemsets to the suitable generators. In [16,17], we consider the problem of mining frequent itemsets that (i) include a given subset and (ii) contain no items of another specific subset, or that satisfy only condition (i). Mining frequent itemsets that satisfy both (i) and (ii) is quite complicated because there is a tradeoff between these constraints. However, with a suitable approach, the papers propose efficient algorithms, named MFS-Contain-IC and MFS_DoubleCons, for discovering frequent itemsets with the constraints mentioned.
Note that the results above relate directly only to frequent itemsets. In this paper, we are interested in extending the result presented in [16] to association rule mining with many different constraints. The approach based on frequent closed itemsets and their generators is still used, but the problem is much more complicated. We first state our problem in the sub-section below.

Problem statement
Before stating the problem of our study, we present some common concepts and related notations. Given $T = (O, A, R)$, a set $X \subseteq A$ is called an itemset. The support of an itemset $X$, denoted by $supp(X)$, is the ratio of the number of transactions containing $X$ to $n$, the number of transactions in $T$. Let $s_0, s_1$ be the minimum and maximum support thresholds, respectively, where $0 < 1/n \le s_0 \le s_1 \le 1$ and $n = |O|$. A non-empty itemset $X$ is called frequent iff $s_0 \le supp(X) \le s_1$ (if $s_1$ is equal to 1, then the traditional frequent itemset concept is obtained). For any frequent itemset $S$, we take a non-empty, proper subset $L$ of $S$ ($\emptyset \ne L \subset S$) and $R \equiv S \setminus L$. Then, $r : L \to R$ is a rule created by $L$, $R$ (or by $L$, $S$) and its support and confidence are determined by $supp(r) \equiv supp(S)$ and $conf(r) \equiv supp(S)/supp(L)$, respectively. The minimum and maximum confidence thresholds are denoted by $c_0$ and $c_1$, respectively, where $0 < c_0 \le c_1 \le 1$. The rule $r$ is called an association rule in the traditional manner iff $c_0 \le conf(r)$ and $s_0 \le supp(r)$, and the set of all such association rules is denoted by $ARS(s_0, c_0)$.
The present study considers problems that comprise several constraints on support, confidence and sub-itemsets. Such a problem is stated as follows. Given additional item constraints on the two sides of a rule, $L_0, R_0 \subseteq A$, the goal is to discover all association rules $r : L \to R$ whose supports and confidences meet the conditions $s_0 \le supp(r) \le s_1$, $c_0 \le conf(r) \le c_1$, and whose two sides contain the item constraints, $L \supseteq L_0$, $R \supseteq R_0$, called minimum single constraints. The problem can be described formally as follows.
$$ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1) \equiv \{ r : L \to R \mid s_0 \le supp(r) \le s_1,\ c_0 \le conf(r) \le c_1,\ L \supseteq L_0,\ R \supseteq R_0 \}.$$
Regarding the constraints of the problem, note that if $s_1 = c_1 = 1$ and $L_0 = R_0 = \emptyset$, we obtain the problem of mining the association rule set $ARS(s_0, c_0)$ in the traditional sense. Otherwise, the mined rules may be significant in different application domains such as market-basket analysis and network traffic analysis. For instance, supermarket managers may want to increase turnover based on high-value items such as gold and iPads. To this end, one solution is to find an interesting association between these two items. The proposed problem may help them determine whether such an association exists by setting the constraints $L_0 = \{gold\}$ and $R_0 = \{iPad\}$. If at least one rule is found, the association exists. It can then be used to support the aim, for example by displaying the two items close together, which may encourage selling them together, and by applying discount strategies. At the beginning, the confidences of the mined rules may not be high because such exceptional rules have only a few instances. If the mining task is given a high maximum confidence threshold, it may generate a large number of rules, which makes it easy to miss low-confidence rules that are of potential significance. Thus, to identify and monitor them easily, we should use a small maximum confidence threshold. If, after some time, these rules gain higher confidences and become more important, then having foreseen these associations at an early stage may bring higher profits for the supermarket.
In other words, using a maximum confidence threshold is more general than fixing its value at 1. As for the maximum support threshold, when the value of $s_1$ is quite low and that of $c_0$ is very high, $ARS(s_0, s_1, c_0, c_1)$ comprises high-confidence association rules discovered from low-frequency itemsets. This problem is of practical significance. For instance, we may want to detect fairly accurate rules from new, abnormal yet significant phenomena despite their low frequency.
Existing algorithms for mining rules with minimum single constraints might encounter a problem, named $(P_2)$: the generation of many redundant candidate rules and of duplicate solutions that must then be eliminated. The current interest is to find an appropriate approach for mining the constrained association rule set (the rules that satisfy the minimum single constraints) without $(P_2)$.
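The problem statement can be made concrete with a direct, definition-level sketch. It is only an illustration of the constraints, not the algorithm proposed in the paper: the toy database `DB` and the function names are hypothetical, supports are exact fractions, and the enumeration is exponential in the number of items.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

def supp(itemset, db=DB):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in db if itemset <= t), len(db))

def constrained_rules(db, s0, s1, c0, c1, L0, R0):
    """Enumerate rules L -> R with s0 <= supp <= s1, c0 <= conf <= c1,
    L containing L0 and R containing R0.
    Assumes s0 > 0, so supp(L) is never zero."""
    L0, R0 = frozenset(L0), frozenset(R0)
    items = sorted(set().union(*db))
    found = []
    for k in range(2, len(items) + 1):
        for S in map(frozenset, combinations(items, k)):
            s = supp(S, db)
            if not (s0 <= s <= s1):
                continue
            for j in range(1, k):
                for L in map(frozenset, combinations(sorted(S), j)):
                    R = S - L
                    if not (L0 <= L and R0 <= R):
                        continue
                    c = s / supp(L, db)  # conf(r) = supp(S) / supp(L)
                    if c0 <= c <= c1:
                        found.append((L, R, s, c))
    return found

# Rules with "a" on the left and "c" on the right.
rules = constrained_rules(DB, Fraction(2, 5), 1, Fraction(1, 2), 1, {"a"}, {"c"})
```

On this toy data the call returns three rules (a → c, a → bc and ab → c). Lowering the maximum confidence threshold to 2/3 drops a → c, whose confidence is 3/4, which mirrors the use of a small $c_1$ to isolate low-confidence rules.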

Paper contribution
The contributions of the paper are as follows. First, we present an approach based on the lattice [26,34,37] of closed itemsets and their generators to efficiently mine association rules satisfying the minimum single constraints and the maximum support and confidence thresholds mentioned above. For this approach, we propose an equivalence relation on the constrained rule set based on the closure operator [26]. It partitions the set of constrained rules, $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$, into disjoint equivalence rule classes. Thus, each class is discovered independently and the duplication of solutions may be reduced considerably.
Moreover, the partition also reduces the burden of storing the supports and confidences of all rules in the same class, and it is a reliable theoretical basis for developing parallel algorithms in distributed environments. Second, we give necessary and sufficient conditions for the existence of the solution of the problem or of a certain rule class. If the conditions are not satisfied, the mining process does not waste time searching for the solution. This contributes significantly to the efficiency of the approach. Third, a new representation of constrained rules in each class is proposed with the following advantages: (1) it gives a clear view of the structure of the constrained rule set; (2) duplication is completely eliminated; (3) all constrained rules are rapidly extracted without any direct check of the constraints $L \supseteq L_0$ and $R \supseteq R_0$. Finally, based on the proposed theoretical results, we design a new, efficient algorithm, named MAR_MinSC (Mining all Association Rules with Minimum Single Constraints), and related procedures to completely, quickly and distinctly generate all association rules satisfying the given constraints.

Preliminary concepts and notations
Prior to presenting an appropriate approach to discover the rules with minimum single constraints without (P 2 ), let us recall some of the following basic concepts about the lattice of closed itemsets and the task of association rule mining.
Given $T = (O, A, R)$, we consider two Galois connection operators $\lambda : 2^O \to 2^A$ and $\rho : 2^A \to 2^O$ defined as follows: $\forall O' \subseteq O, A' \subseteq A$: $\lambda(O') \equiv \{a \in A \mid (o, a) \in R, \forall o \in O'\}$ and $\rho(A') \equiv \{o \in O \mid (o, a) \in R, \forall a \in A'\}$. From now on, we shall assume that the following conditions are satisfied, $0 < s_0 \le s_1$

Paper organization
The rest of this paper is organized as follows. In Sect. 2, we present some approaches to the problem (ARS_MinSC) and the related works. Section 3 shows a partition and a unique representation of the constrained association rule set based on closed itemsets and their generators. An efficient algorithm, MAR_MinSC, to generate all association rules with minimum single constraints is also proposed in this section. Experimental results are discussed in Sect. 4. Finally, conclusions and future works are presented in Sect. 5.

Approaches
Post-processing approaches To find the association rule set with minimum single constraints $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$, these approaches often perform two phases: (1) the association rule set $ARS(s_0, c_0)$ without the constraints is discovered; (2) procedures for checking and selecting the rules $r : L \to R$ that satisfy the constraints $supp(r) \le s_1$, $conf(r) \le c_1$, $L \supseteq L_0$ and $R \supseteq R_0$ are executed. In phase (1), the rule set $ARS(s_0, c_0)$ can be mined by either of the following two simple methods. The first finds it by definition, i.e., the class of frequent itemsets $FS(s_0)$ with the threshold $s_0$ is mined by a well-known algorithm, such as Apriori [1,23] or dEclat [37]. Then, for every $S \in FS(s_0)$, all rules $r : L \to R \in ARS(s_0, c_0)$, where $\emptyset \ne L \subset S$ and $R \equiv S \setminus L$, are discovered by an algorithm based on the Apriori principle, such as Gen-Rules [26]. The time for finding $ARS(s_0, c_0)$ is often quite long for the following reasons: (i) the phase of finding frequent itemsets may generate too many candidates and/or scan the database many times; (ii) the rule extraction phase often produces many candidates and takes a lot of time to calculate the confidences (since the supports of the left-hand sides of the rules may be undetermined). Let us call this post-processing algorithm PP-MAR-MinSC-1 (Post Processing-Mining Association Rule with Minimum Single Constraints-1). The second method finds $ARS(s_0, c_0)$ based on the lattice FLCG of frequent closed itemsets and the partition of $ARS(s_0, c_0)$ as presented in cotemoh4. Instead of exploiting all frequent itemsets, we only need to extract frequent closed itemsets and partition $ARS(s_0, c_0)$ into equivalence classes. The rules in each class have the same support and confidence, which are calculated only once (see Sect. 3.1.1 for more details). We name the algorithm of the second method PP-MAR-MinSC-2.
PP-MAR-MinSC-2 seems to be more efficient than PP-MAR-MinSC-1 because it is more suitable when the support and confidence thresholds change often.
Post-processing approaches have the advantage of being simple, but they also have several disadvantages. Due to the enormous cardinality of $ARS(s_0, c_0)$, the algorithms take a long time to search, yet only a few or even none of the rules in $ARS(s_0, c_0)$ may belong to $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$ (the cardinality of $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$ is often quite small compared to that of $ARS(s_0, c_0)$). Moreover, after $ARS(s_0, c_0)$ has been found, post-processing algorithms have to check the constraints $L \supseteq L_0$, $R \supseteq R_0$ directly, which might be time-consuming. In addition, when the constraints change according to the demands of online users, recalculating $ARS(s_0, c_0)$ wastes time. If, instead, we mine and store $ARS(s_0, c_0)$ with $s_0 = c_0 = 1/|O|$ at the beginning, then the computational and memory costs will be very high.
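The two-phase post-processing scheme and its main weakness can be sketched as follows. This is a simplified illustration under stated assumptions: phase 1 is a brute-force stand-in for Apriori or dEclat followed by Gen-Rules, the toy database and the function names are hypothetical, and $s_0 > 0$ is assumed so that no support in a denominator is zero.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

def support(itemset, db):
    """Fraction of transactions containing `itemset`."""
    return Fraction(sum(1 for t in db if itemset <= t), len(db))

def mine_all_rules(db, s0, c0):
    """Phase 1: build ARS(s0, c0) by definition, ignoring item constraints."""
    items = sorted(set().union(*db))
    rules = []
    for k in range(2, len(items) + 1):
        for S in map(frozenset, combinations(items, k)):
            s = support(S, db)
            if s < s0:
                continue
            for j in range(1, k):
                for L in map(frozenset, combinations(sorted(S), j)):
                    c = s / support(L, db)
                    if c >= c0:
                        rules.append((L, S - L, s, c))
    return rules

def post_filter(rules, s1, c1, L0, R0):
    """Phase 2: check every mined rule directly against the extra constraints."""
    L0, R0 = frozenset(L0), frozenset(R0)
    return [r for r in rules
            if r[2] <= s1 and r[3] <= c1 and L0 <= r[0] and R0 <= r[1]]

all_rules = mine_all_rules(DB, Fraction(2, 5), Fraction(1, 2))
kept = post_filter(all_rules, 1, 1, {"a"}, {"c"})
```

Even on this tiny example, phase 1 produces 12 rules while only 3 survive the filter, and every one of the 12 has to be checked against $L \supseteq L_0$ and $R \supseteq R_0$. Changing $L_0$ or $R_0$ repeats phase 2 in full, and changing $s_0$ or $c_0$ repeats phase 1.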
Paper approach To avoid the disadvantages of post-processing approaches and to solve the problem $(P_2)$, the paper proposes a new approach based on three key factors. The first is the lattice LCG of closed itemsets, their generators and supports. Using LCG has three advantages: (1) the size of LCG is often very small in comparison with that of $FS(s_0)$; (2) LCG is calculated just once by one of the efficient algorithms such as CHARM-L and MinimalGenerators [36,37], Touch [31] or GenClose [5]; (3) from the lattice LCG, we can quickly derive the lattice of frequent closed itemsets satisfying the constraints, together with the corresponding generators, whenever the constraints appear or change. The second is the equivalence relation based on the closures of the two sides of a rule $r : L' \to R'$ ($L \equiv h(L') \subseteq S \equiv h(L' + R')$). The third is the explicitly unique representation of the rules in the same equivalence class $AR(L, S)$ in terms of the generators and their closures, $(L, G(L))$ and $(S, G(S))$. In each class, this representation gives a clear view of the rule structure and completely eliminates duplication. An important note is that our method does not need to check the generated rules directly against the constraints $L \supseteq L_0$, $R \supseteq R_0$.
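The closure operator, the generators and the closure-based rule classes used by this approach can be illustrated on a toy database. The sketch below assumes the usual definitions from the closed itemset literature: $\rho$ maps an itemset to the transactions containing it, $\lambda$ maps a set of transactions to their common items, $h = \lambda \circ \rho$, and a generator of a closed itemset is a minimal itemset with that closure. The database and the function names are hypothetical, and the generator search is brute force rather than one of the lattice algorithms cited above.

```python
from itertools import combinations

# Hypothetical toy database: object identifier -> set of items.
DB = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "b", "d"}, 4: {"c", "d"}}
ITEMS = frozenset().union(*DB.values())

def rho(itemset):
    """Objects whose transactions contain every item of `itemset`."""
    return frozenset(o for o, t in DB.items() if itemset <= t)

def lam(objects):
    """Items shared by all the given objects (all items if there are none)."""
    common = set(ITEMS)
    for o in objects:
        common &= DB[o]
    return frozenset(common)

def h(itemset):
    """Closure of an itemset: h = lam o rho."""
    return lam(rho(frozenset(itemset)))

def generators(closed):
    """Minimal subsets G of `closed` with h(G) == closed, by increasing size."""
    closed = frozenset(closed)
    found = []
    for k in range(len(closed) + 1):
        for G in map(frozenset, combinations(sorted(closed), k)):
            if h(G) == closed and not any(g < G for g in found):
                found.append(G)
    return found

def rule_class(lhs, rhs):
    """Pair of closures (h(L'), h(L' + R')) identifying the class of L' -> R'."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return h(lhs), h(lhs | rhs)
```

Here `h({"a"})` is `ab`, the closed itemset `abc` has the two generators `ac` and `bc`, and the rules a → c, b → c and ab → c all fall into the class identified by the pair (`ab`, `abc`).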

Related works
To solve the problem $(P_1)$ and improve the efficiency of existing mining algorithms, various constraints have been integrated into the mining process to generate only the association rules of interest. The algorithms are mainly based on either the Apriori principle [1] or FP-growth [18] in combination with the properties of $C_{am}$ and $C_m$ constraints. FP-bonsai [9] uses both $C_{am}$ and $C_m$ to mine frequent patterns. The advantage of FP-bonsai is that it utilizes $C_m$ to support the process of pruning candidate itemsets and the database upon $C_{am}$. It is efficient on dense databases but not on sparse ones. Fold-Growth [29,35] is an improvement of FP-tree using a preprocessing tree structure, named SOTrieIT. The first strength of SOTrieIT is its ability to quickly find frequent 1-itemsets and 2-itemsets with a given support threshold. The second is that it does not have to reconstruct the tree when the support is changed. A primary drawback of the FP-growth based algorithms is that they require a large amount of main memory to hold the original database and the intermediate projected databases. Thus, if the main memory is insufficient, the algorithms cannot be used. Another important limitation of this approach is that it is hard to take full advantage of a combination of different constraints, since each constraint has different properties. For instance, the minimum single constraints above on support, confidence and item subsets include both $C_{am}$ and $C_m$ constraints, whose properties are opposite. Moreover, the approach can be costly because the FP-tree must be reconstructed when mining frequent itemsets and association rules with different constraints. In contrast, ExAMiner [8] is an Apriori-like algorithm. It uses input data reduction techniques to reduce the problem dimensions as well as the search space. It performs well on huge input data.
However, ExAMiner is not suitable for the problem stated in the paper because, when the minimum single constraints change, the process of reducing the input data must restart from the original database, and generating rules may require time-consuming direct checks of the constraints. Moreover, the authors in [20] show that the integration of $C_m$ can lead to a reduction in the pruning of $C_{am}$. Therefore, there is a tradeoff between $C_{am}$ and $C_m$ pruning.
Among other related results, a constraint named the maximum constraint is used in [19] to discover association rules with multiple minimum support thresholds. Each 1-itemset has a minimum support threshold of its own. The authors propose an Apriori-like algorithm for mining large itemsets and rules with this constraint. Lee et al. [21] design an algorithm to mine association rules with multi-dimensional constraints. An example of such a constraint is $\max(S.cost) < 6$ and $200 < \min(S.price)$, where $S$ is an itemset and each item of $S$ has two attributes, cost and price. In [14], the CoGAR framework for mining generalized association rules with constraints is presented. Besides the traditional minimum support and confidence, two new constraints, schema and opportunistic confidence, are considered. The schema constraint is similar to that shown in [2], but the approach to solving the problem is different. An algorithm is proposed to discover generalized rules satisfying both constraints in three phases: (1) the algorithm CI-Miner is used to extract schema-constrained itemsets; (2) the generalized association rules are generated by the Apriori-like rule mining algorithm RuleGen; (3) a post-processing filtering algorithm, named CR-Filter, is designed to select the rules satisfying the opportunistic confidence constraint. The concept of periodic constraints is given in [32,33], together with new algorithms for mining association rules with this constraint. The mining task first abstracts the variable and then eliminates the solutions falling outside the axiom constraints. The authors in [24] consider the problem of discovering multi-level frequent itemsets with the existent constraints that are represented as a Boolean expression in disjunctive normal form. A technique to model the constraints in the context of concept hierarchies is proposed and efficient algorithms are developed for this purpose.
Note that most of the previously proposed algorithms for mining association rules with constraints were designed for their own constraints. Thus, using them to discover rules based on minimum single constraints may be inefficient. In addition, these algorithms can encounter two important shortcomings: one is the generation of many redundant candidates and duplicate solutions that must then be eliminated (the problem $(P_2)$); the other is that the algorithms need to be rerun from the initial database whenever the constraints change. This reduces the mining speed for users.
While the results above do not seem suitable for the stated problem, an approach based on the condensed representation of frequent itemsets might be more efficient. Instead of mining all frequent itemsets, only the condensed ones are extracted. Using condensed frequent itemsets has three primary advantages. First, the condensed representation is easier to store because its cardinality is much smaller than that of the class of all frequent itemsets, especially for dense databases. Second, the condensed itemsets are mined only once from the database, even when the constraints change. Third, they can be used to completely generate all frequent itemsets without accessing the database. There are two types of condensed representation. The first type is maximal frequent itemsets [13,22]. Since their cardinality is very small, they can be discovered quickly. All frequent itemsets can be generated from the maximal ones. However, the generation often produces duplicates. In addition, the generated frequent itemsets can lose information about their supports, so the supports need to be recomputed when mining association rules. The second type is closed frequent itemsets, called maximal ones, and their generators, called minimal ones [10,11,12,27]. Each closed frequent itemset represents a class of frequent itemsets. Thus, together with its generators, it can be used to uniquely determine all frequent itemsets in the same class without losing information about their supports.
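The difference between the two condensed representations can be seen on a small example. The sketch below is illustrative only: the toy database and the function names are hypothetical, supports are absolute counts, and closed and maximal itemsets are found by brute force. It relies on the standard fact that the support of an itemset equals the support of its closure, which is the largest support among its closed supersets.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]

def count(itemset):
    """Number of transactions containing `itemset`."""
    return sum(1 for t in DB if itemset <= t)

def frequent_itemsets(min_count):
    """All non-empty itemsets with count >= min_count, mapped to their counts."""
    items = sorted(set().union(*DB))
    return {X: count(X)
            for k in range(1, len(items) + 1)
            for X in map(frozenset, combinations(items, k))
            if count(X) >= min_count}

def closed_and_maximal(freq):
    """Closed: no proper superset with the same count.
    Maximal: no frequent proper superset at all."""
    closed = {X: c for X, c in freq.items()
              if not any(X < Y and c == freq[Y] for Y in freq)}
    maximal = {X for X in freq if not any(X < Y for Y in freq)}
    return closed, maximal

def support_from_closed(itemset, closed):
    """Recover the count of a frequent itemset from the closed itemsets only."""
    return max(c for C, c in closed.items() if itemset <= C)

freq = frequent_itemsets(1)
closed, maximal = closed_and_maximal(freq)
```

With a minimum count of 1 there are 12 frequent itemsets, 6 closed ones (`ab`, `c`, `d`, `abc`, `abd`, `cd`) and 3 maximal ones (`abc`, `abd`, `cd`). The closed itemsets recover every support, for example 3 for `a` via its closure `ab`, whereas the maximal itemsets alone only show that `a` occurs at least once.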
Of the two types of condensed representation above, the second is probably better and has proven efficient in our previous works. Therefore, in this paper, we propose a new structure and an efficient representation of the constrained association rule set based on closed itemsets and their generators. A corresponding new algorithm, named MAR_MinSC, is also developed for mining association rules satisfying the minimum single constraints and the maximum support and confidence thresholds.

Mining association rules with minimum single constraints
3.1 Partition of association rule set with minimum single constraints

Rough partition
To considerably reduce the duplication of candidates for the solution, we partition the rule set into disjoint classes based on a suitable equivalence relation. Because the closure operator $h$ of LCG has useful properties, we use it to propose the following two equivalence relations on $FS(s_0, s_1)$ and $ARS(s_0, s_1, c_0, c_1)$.
Obviously, these are equivalence relations. For any L ∈ FCS(s 0 , Thus, for all $(L, S) \in NFCS(s_0, s_1, c_0, c_1)$, all rules in the same equivalence class $AR(L, S)$ have the same support $supp(S)$ and confidence $supp(S)/supp(L)$. This considerably reduces the storage needed for the supports of the frequent itemsets and the confidences of the association rules.
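The claim that all rules of a class share one support and one confidence can be checked on a toy database. This sketch assumes that the class of a rule with left-hand side $L'$ and right-hand side $R'$ is identified by the pair of closures $(h(L'), h(L' + R'))$, as in the equivalence relation outlined earlier. The database and the function names are hypothetical, and the rules are enumerated by brute force purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
ITEMS = sorted(set().union(*DB))

def supp(itemset):
    """Fraction of transactions containing `itemset`."""
    return Fraction(sum(1 for t in DB if itemset <= t), len(DB))

def h(itemset):
    """Closure: items common to all transactions containing `itemset`."""
    covering = [t for t in DB if itemset <= t]
    return frozenset(set.intersection(*covering)) if covering else frozenset(ITEMS)

def partition_rules(s0):
    """Group every rule L' -> R' with supp >= s0 by (h(L'), h(L' + R'))."""
    classes = defaultdict(list)
    for k in range(2, len(ITEMS) + 1):
        for S in map(frozenset, combinations(ITEMS, k)):
            if supp(S) < s0:
                continue
            for j in range(1, k):
                for lhs in map(frozenset, combinations(sorted(S), j)):
                    classes[(h(lhs), h(S))].append((lhs, S - lhs))
    return classes

classes = partition_rules(Fraction(1, 4))
for (L, S), rules in classes.items():
    for lhs, rhs in rules:
        # every rule in the class has support supp(S) and confidence supp(S)/supp(L)
        assert supp(lhs | rhs) == supp(S)
        assert supp(lhs | rhs) / supp(lhs) == supp(S) / supp(L)
```

On this data the 24 rules fall into 9 classes, so only 9 support and confidence pairs need to be stored. The class identified by (`ab`, `abc`), for instance, contains the five rules a → c, b → c, a → bc, b → ac and ab → c.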
(c) From (a) and (b), we have the partition of rule set ARS(s 0 , s 1 , c 0 , c 1 ) without the item constraints as follows.
With the aim of overcoming these disadvantages, we need to find necessary conditions on the constraint set and on the pairs $(L, S)$ for $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$ to be non-empty. We then have another representation $AR^+_{\supseteq L_0, \supseteq R_0}(L, S)$ of $AR_{\supseteq L_0, \supseteq R_0}(L, S)$ and obtain a better partition of $ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$.

(a) (Necessary conditions for
and the following necessary conditions are satisfied: Thus, from now on, it is always assumed that $(H_1)$ is satisfied.
(b) (Necessary conditions for $AR_{\supseteq L_0, \supseteq R_0}(L, S) \ne \emptyset$). For each pair $(L, S) \in NFCS(s_0, s_1, c_0, c_1)$ and for any rule $r : L' \to R' \in AR_{\supseteq L_0, \supseteq R_0}(L, S) \ne \emptyset$, the following necessary conditions are satisfied:

Corollary 1 (Necessary and sufficient conditions for the non-emptiness of ARS
(a) If one or more conditions in $(H_1)$ are not satisfied, then
Proof The assertion (a) and the direction "⇒" of (b) are obvious consequences of Proposition 2(a) and (b). The reverse direction "⇐" of (b) is derived from $AR^+_{\supseteq L_0, \supseteq R_0}(L, S) \subseteq AR_{\supseteq L_0, \supseteq R_0}(L, S) \subseteq ARS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$.
From Proposition 2 and Corollary 1, we have the following smooth partition of the constrained rule set ARS ⊇L 0 ,⊇R 0 (s 0 , s 1 , c 0 , c 1 ).

Theorem 1 (Smooth partition of constrained rule set)
Assume that the conditions of $(H_1)$ are satisfied, then we have:
This partition is the theoretical basis for parallel algorithms that independently mine each rule class $AR^+_{\supseteq L_0, \supseteq R_0}(L, S)$ in distributed environments. It is an interesting case of applying suitable equivalence relations from mathematics to computer science, a simple yet efficient application of the "divide and conquer" principle.
The algorithm MFCS_FromLattice($LCG_S$, $C_0$, $C_1$, $s_0$, $s_1$) shown in Fig. 2 finds the frequent closed itemsets $FCS_{C_0 \subseteq C_1}(s_0, s_1)$ satisfying the constraints from the lattice $LCG_S$ (the restricted sub-lattice of LCG with root node $S$). In particular, we have $FCS_{\supseteq S_0^*}(s_0^*, s_1^*)$ = MFCS_FromLattice(LCG, $S_0^*$, $A$, $s_0^*$, $s_1^*$). It is important to note that, on the Hasse diagram of the lattice LCG, if the concepts of positive and negative borders [23], which concern the anti-monotonic and monotonic properties of the support and item constraints, are added to the algorithm, then the sub-lattices whose closed itemsets satisfy the corresponding constraints are generated quickly. For instance, with the monotonic property ($supp(L) \le s_1$ and $L \supseteq C_0$) (M), we illustrate the creation of the negative border in the algorithm as follows. At line 3, moving from top to bottom in the lattice and starting at node $S$, when we reach a node $N$ for which one of the conditions in (M) is violated, we eliminate all sub-branches of $N$ and add $N$ to the negative border of the sub-lattice containing the class $FCS_{C_0 \subseteq C_1}(s_0, s_1)$. More specifically, in Example 1, we have $S = bfg$, $supp(S) = 2/7$, $s_0 = 2/7$, $s_1 = 1$, $c_0 = 3/4$, $c_1 = 1$, and $C_0 = L_0 = e$ in the sub-lattice $LCG_S$. $S$ has two direct sub-nodes $L \in \{bh, fh\}$. For the sub-node $L = bh$, $supp(L) = 4/7 > s_1 = \min(1; (2/7)/(3/4)) = 8/21$, thus we eliminate all sub-branches starting at $L$. For the sub-node $L = fh$, $C_0 \not\subseteq L$; therefore, we eliminate its sub-branches as well. Hence, } has the left-hand and right-hand sides that do not have an explicit representation, and mining them might still generate many redundant candidates.
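The bound $\min(1; (2/7)/(3/4)) = 8/21$ used in the example can be reproduced with exact arithmetic. The sketch below rests on one derivation step that the example leaves implicit: since $conf(r) = supp(S)/supp(L)$, the condition $conf(r) \ge c_0$ is equivalent to $supp(L) \le supp(S)/c_0$. The symmetric lower bound from $conf(r) \le c_1$ is an assumption added here for completeness, and the function name is hypothetical.

```python
from fractions import Fraction

def lhs_support_bounds(supp_S, s0, s1, c0, c1):
    """Admissible range of supp(L) for the left closed itemset L of a class
    whose right closed itemset S has support supp_S.
    Upper bound: supp(L) <= s1 and conf >= c0  <=>  supp(L) <= supp_S / c0.
    Lower bound (assumed by symmetry): supp(L) >= s0 and supp(L) >= supp_S / c1."""
    supp_S = Fraction(supp_S)
    upper = min(Fraction(s1), supp_S / Fraction(c0))
    lower = max(Fraction(s0), supp_S / Fraction(c1))
    return lower, upper

# Values from the example: supp(S) = 2/7, s0 = 2/7, s1 = 1, c0 = 3/4, c1 = 1.
lower, upper = lhs_support_bounds(Fraction(2, 7), Fraction(2, 7), 1, Fraction(3, 4), 1)

# The sub-node bh has support 4/7, which exceeds the upper bound, so it is pruned.
bh_is_pruned = Fraction(4, 7) > upper
```

Because support can only grow when moving to smaller itemsets lower in the lattice, a node that exceeds the upper bound cannot have any admissible descendant, which is why its whole sub-branch can be discarded.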

Distinctly generating all association rules in each equivalence class AR
To completely eliminate the generation of duplicate candidates for the solution, based on each class $(L, S) \in NFCS_{\supseteq L_0, \supseteq R_0}(s_0, s_1, c_0, c_1)$ and the generators $G(L)$, $G(S)$, we propose an explicitly unique representation of the rules in $AR^+_{\supseteq L_0, \supseteq R_0}(L, S)$. We also show how to completely and distinctly generate all rules in each class.
For that, first of all, we need to show a structure and a unique representation for an extended class of frequent itemsets that are restricted by X and contain an item constraint. Thereby, explicitly unique representations and structures of the right-hand side R ∈ FS(S\L ) L ,R * 0 ⊆R * 1 and the left-hand side L ∈ FS C 0 ⊆L C1 of rules r : L → R in each equivalence class AR ⊇L 0 ,⊇R 0 (L , S) are derived.

The explicitly unique representation and the structure of an extended class
To briefly present the results regarding the representation of the rule sides, we first consider a fairly general representation of frequent sub-items of Y that are restricted on X with minimum single constraint.

Proposition 3 (The unique representation of sets in FS
For special values of Y, X and Z 0 in FS(Y ) X,⊇Z 0 , we obtain the structures of FS C 0 ⊆L C1 and FS(S\L ) L ,R * 0 ⊆R * 1 as they are presented in the following section.

Proof
."⇐": and L C 1 = ∅. We have: Based on the representation of R min in FS * (Y ) X,⊇Z 0 and Lemma 1, , and obtain the result as follows. (1) Since L C 1 = ∅ and Proposition 3(c), we have FS * , we derive the following result.
As S\L = ∅ and Proposition 3(c), we obtain FS * (S\L ) L ,R *
The general procedure MFS-ExtendedMinSC (Mining Frequent itemSets-Extended Minimum Single Constraints), shown in Fig. 3, distinctly generates the elements of $FS^*(Y)_{X, \supseteq Z_0}$: When using Remark 2, we add lines 4-10 and 25 to the procedure.
In each equivalence class, two cases, FS *

Structure and unique representation of rules in
From Corollary 2, we have the following corollary.   (1) is an improvement from our previous one (see Theorem 2 (+) in [17]). Since K U,i (that is used to check the condition K i ⊆ K U,i ) is smaller than K U,C 0 ,C 1 ,i and K −,i is larger than K −,C 0 ,C 1 in (+) , checking the condition ( * * ) in (1) is simpler.

Completely and distinctly generating all association rules in ARS
From Theorem 1 and Proposition 3, we obtain the Theorem 2 below.

Experimental results
We implemented the PP-MAR-MinSC-1, PP-MAR-MinSC-2 and MAR-MinSC algorithms in C# on the Windows platform. Experiments were performed on a PC with an i5-2400 CPU (3.10 GHz @ 3.09 GHz) and 3.16 GB of main memory. The source code for the Charm-L, MinimalGenerators and dEclat algorithms [6] was converted to C#. Charm-L and MinimalGenerators were used to mine the lattice of closed itemsets and their generators, and dEclat was used to mine all frequent itemsets.

To evaluate the proposed algorithm, we compare its performance with that of the PP-MAR-MinSC-2 and PP-MAR-MinSC-1 algorithms. PP-MAR-MinSC-1 includes three phases: (1) executing dEclat to mine frequent itemsets; (2) integrating the constraints satisfying the monotone and anti-monotone properties into Gen-Rules [26] (using the Apriori principle [1]) to generate candidate rules;

For the performance test, five benchmark datasets in FIMDR [15] were chosen. Connect, Mushroom, Pumsb and Chess are real and dense datasets, i.e., they produce many long frequent itemsets even for very high support values. T10I4D100K is synthetic and sparse.

Conclusions and future works
The paper proposed a new, efficient algorithm called MAR-MinSC for mining association rules with minimum single constraints. It takes the lattice of closed itemsets and their generators as its input. MAR-MinSC neither generates redundant rules nor checks rules directly against the constraints, which accounts for its high performance. The method, which is based on the lattice and suitable equivalence relations, has implications for both theory and practice.
In theory, the method exhibits the explicit structure and unique representation of the solution set in terms of basic components such as closed itemsets, supports and generators. The representation therefore loses no information. The correctness of the theoretical results was proven.
In practice, the method provides a basis for designing parallel algorithms that efficiently mine the solution set in distributed environments. Moreover, the efficiency of the algorithm is only slightly affected by frequent changes of the constraints in online systems.
In the future, we will use this method to study problems with other extended constraints.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Proposition 6
Proof (a) Assume that there exist two duplicate subsets R 1 ,

Since $B$ and $C$ are finite sets and $\emptyset \neq B \subseteq C$, there exist the minimal sets $R_{\min,S'} \equiv \mathrm{Minimal}(B) \neq \emptyset$ and $R_{\min} \equiv \mathrm{Minimal}(C) \neq \emptyset$. We always have the lowest index $k$ of the sets $R_i$ in $R_{\min,S'} \equiv \mathrm{Minimal}(B)$. Assume that $R_k \notin R_{\min}$. As $R_k \in R_{\min,S'}$, we have $R_k \in C$, so $\exists R_j \in R_{\min}: R_j \subset R_k$, where $R_j \equiv S_j \setminus (X + Z_0)$, $S_j \in G(X + Y)$ and $h(S_j) = h(X + Y)$. Then $S_j \subseteq (X + Z_0) \cup S_j = (X + Z_0) + R_j \subseteq (X + Z_0) + R_k \subseteq X + R' = S' \subseteq X + Y$, so $h(X + Y) = h(S_j) = h(S')$. Hence $S_j \in G(S')$ and $R_j \in B \cap R_{\min}$. Thus $R_j \in R_{\min,S'}$ and $R_j \subset R_k \in R_{\min,S'}$, which is a contradiction because $R_k$ is a minimal set in $B$. Hence, $R_k \in R_{\min} \neq \emptyset$. It follows that if $R_{\min} = \emptyset$, then $FS(Y)_{X,\supseteq Z_0} = \emptyset = FS^*(Y)_{X,\supseteq Z_0}$.

We have $S' = S_k + S'_k$, where $S'_k \equiv S' \setminus S_k$. Since $S' \supseteq X + Z_0$, $S' = X + Z_0 + R_k + R'_k + R^{\sim}_k = X + R'$, where

Thus, $S_j \in G(S')$ and $R_j \in R_{\min,S'}$ but $j < k$, which contradicts the method for selecting the index $k$. Hence, $R' \in FS^*(Y)_{X,\supseteq Z_0}$.

"$\supseteq$": For any $R' \in FS^*(Y)_{X,\supseteq Z_0}$, we have $R' = Z_0 + R_k + R'_k + R^{\sim}_k$, where $R_k \equiv S_k \setminus (X + Z_0) \in R_{\min}$, $S_k \in G(X + Y)$, $h(S_k) = h(X + Y)$, $R' \neq \emptyset$ and $R_k \subseteq (X + Y) \setminus (X + Z_0) = Y \setminus Z_0 \subseteq Y$. Moreover, $R'_k \subseteq R_{U,k} \subseteq Y$ and $R^{\sim}_k \subseteq R_{-,k} \subseteq Y$, so $Z_0 \subseteq R' \subseteq Y$ and $R' \cap X = \emptyset$. On the other hand, since $X + Y \supseteq X + R' \supseteq X + Z_0 + R_k = (X + Z_0) \cup S_k \supseteq S_k$, we have $h(S_k) = h(X + R') = h(X + Y)$. Therefore, $R' \in FS(Y)_{X,\supseteq Z_0}$.
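To make the objects used in this proof concrete, the following brute-force sketch computes the closure h, the generators G, the minimal sets R min, and the restricted class FS(Y) X,⊇Z 0 on a toy database. It is not the MFS-ExtendedMinSC procedure. It assumes the membership conditions that the "⊇" part of the proof verifies (the set contains Z 0, lies inside Y, is nonempty, and its union with X has the same closure as X + Y). The database and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy binary database: transaction id -> set of items.
DB = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"a", "d"},
    4: {"b", "c"},
}
ITEMS = frozenset().union(*DB.values())


def closure(itemset):
    """h(S): the items shared by every transaction that contains S."""
    covering = [frozenset(t) for t in DB.values() if itemset <= t]
    if not covering:  # convention (assumption): unsupported sets close to all items
        return ITEMS
    return frozenset.intersection(*covering)


def subsets(s):
    """All subsets of s, smallest first."""
    items = sorted(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def generators(S):
    """G(S): the minimal subsets of S that have the same closure as S."""
    target = closure(S)
    same = [g for g in subsets(S) if closure(g) == target]
    return {g for g in same if not any(other < g for other in same)}


def r_min(X, Y, Z0):
    """Minimal sets among S_j minus (X + Z0), for generators S_j of X + Y."""
    cands = {g - (X | Z0) for g in generators(X | Y)}
    return {r for r in cands if not any(other < r for other in cands)}


def fs_restricted(X, Y, Z0):
    """Brute-force FS(Y)_{X, contains Z0}; assumes X and Y are disjoint, Z0 inside Y."""
    target = closure(X | Y)
    result = set()
    for extra in subsets(Y - Z0):
        R = Z0 | extra
        if R and closure(X | R) == target:
            result.add(R)
    return result


X, Y, Z0 = frozenset("a"), frozenset("bc"), frozenset("b")
print(generators(X | Y))
print(r_min(X, Y, Z0))
print(fs_restricted(X, Y, Z0))
```

The enumeration is exponential in |Y| and serves only as a reference against which a representation-based generator could be checked on small inputs.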

Remark 4.
From the proof of Proposition 3, we have the following remarks.
· If R min = {∅} and Z 0 = ∅, then R U, 1

For the last two cases, we do not need to check the obvious condition $R' \neq \emptyset$ when generating $R'$.

(b) (The advantage of the condition ( * ) in reducing redundant candidates, with an exponential reduction.) In the process of forming a set $R'$, which originates from $R^*_0 + R_k$, we find growing subsets $R'_k \subseteq R_{U,k}$ and then $R^{\sim}_k \subseteq R_{-,k}$ to supplement $R'$. If the condition ( * ) is violated, then we need neither continue considering the supersets $R''_k$ of $R'_k$ ($R'_k \subset R''_k \subseteq R_{U,k}$; an estimated $2^{|R_{U,k} \setminus R'_k|}$ supersets) nor add any subset $R^{\sim}_k$ of $R_{-,k}$ (an estimated $2^{|R_{-,k}|}$ subsets) to $R'$. That is, $(2^{|R_{U,k} \setminus R'_k|} - 1) \cdot 2^{|R_{-,k}|}$ subsets are eliminated because all of them are redundant candidates for $R'$. We then go on to consider other R k sets (R k ⊂R k ⊆ R U,k ) or other sets $R_k$ of $R_{\min}$. The necessary and sufficient condition ( * ) for distinctly generating the right-hand sides $R'$ of rules therefore eliminates many redundant candidates. It also completely eliminates duplicate solutions, and the check relies only on minimal sets or generators, which are few and small. This contributes substantially to the efficiency of the corresponding algorithms.

(c) (Improving the calculation of the border sets.) For each $k > 1$, to calculate $R_U^{k-1} = R_U^{k-2} \cup R_{k-1}$, $R_{U,k} \equiv R_U^{k-1} \setminus R_k$ and $R_{-,k} \equiv R^* \setminus R_U^k$, where $R^* \equiv Y \setminus Z_0$, we must perform two subtractions and one union on generators that may not be disjoint. To reduce the cost of computing the border sets $R_{U,k}$ and $R_{-,k}$, we note that $R_{U,k} = (R_{U,k-1} + R_{k-1}) \setminus R_k$ and $R_{-,k} = R_{-,k-1} \setminus R_k$, $\forall k \geq 1$. For each $k \geq 1$, there remains only a disjoint union $R_{U,k-1} + R_{k-1}$ on two small sets (so it is faster than the normal union $R_U^{k-1} = R_U^{k-2} \cup R_{k-1}$ on two sets that may not be disjoint and may be large), and two subtractions, one of which, $R_{-,k} = R_{-,k-1} \setminus R_k$, is performed on two sets $R_{-,k-1} \subseteq R^*$ and $R_k \subseteq R_U^k$ whose sizes are smaller than those of the sets in the subtraction $R_{-,k} \equiv R^* \setminus R_U^k$.
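The border-set computation and the pruning estimate in this remark can be illustrated with a short sketch. It assumes the reading R U,k = (R U,k−1 + R k−1) \ R k for the incremental form, together with the starting values R U,0 = ∅, R 0 = ∅ and R −,0 = R *; these starting values, the function names and the sample sets are assumptions made for illustration and are not taken from the paper. The sketch checks that the incremental form agrees with the direct definitions.

```python
def borders_direct(r_min_list, r_star):
    """Border sets from the definitions.

    With U(k) = R_1 | ... | R_k, the k-th pair is
    (U(k-1) minus R_k, R* minus U(k)).
    """
    out = []
    for k, r_k in enumerate(r_min_list):
        union_prev = frozenset().union(*r_min_list[:k])
        union_k = union_prev | r_k
        out.append((union_prev - r_k, r_star - union_k))
    return out


def borders_incremental(r_min_list, r_star):
    """Same border sets, updated from the previous step only."""
    out = []
    upper = frozenset()  # assumed start: R_{U,0} is empty
    lower = r_star       # assumed start: R_{-,0} equals R*
    prev = frozenset()   # assumed start: R_0 is empty
    for r_k in r_min_list:
        # upper and prev are disjoint, so this union is a disjoint union
        upper = (upper | prev) - r_k
        lower = lower - r_k
        out.append((upper, lower))
        prev = r_k
    return out


def pruned_candidates(upper_k, grown, lower_k):
    """Estimated number of candidates skipped when the check fails at `grown`."""
    return (2 ** len(upper_k - grown) - 1) * 2 ** len(lower_k)


r_min_list = [frozenset("ab"), frozenset("bc"), frozenset("d")]
r_star = frozenset("abcde")
print(borders_direct(r_min_list, r_star) == borders_incremental(r_min_list, r_star))
print(pruned_candidates(frozenset("abc"), frozenset("a"), frozenset("e")))
```

The incremental version touches only the previous border sets and the current minimal set, which is the saving the remark describes.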