Towards an enhanced user’s preferences integration into ranking process using dominance approach
Abstract
User preferences play a central role in guiding the data miner, which is why they are integrated into the mining process: coupled with Association Rules Mining (ARM) algorithms, they select only the Association Rules (ARs) that satisfy the user's wishes and expectations. Within this framework, several approaches have been proposed to overcome problems that persist with traditional ARM algorithms, mainly the dimensionality phenomenon engendered by thresholding and the subjective choice of measures. The MDP\(_{\mathrm {REF}}\) algorithm is one of these approaches; it prunes and filters to select the relevant ARs, while RankSortMDP\(_{\mathrm {REF}}\) sorts, ranks, and stores ARs to complete the MDP\(_{\mathrm {REF}}\) mining operation. Experimental results on real databases show the advantages of the MDP\(_{\mathrm {REF}}\) and RankSortMDP\(_{\mathrm {REF}}\) algorithms over the other algorithms.
Keywords
Association rules mining · RankSortMDP\(_{\mathrm {REF}}\) algorithm · Preference rules · Preference mining · User profile mining
1 Introduction
Data mining (DM) has been of growing importance since the 1960s; it is in fact the most important step in the knowledge-discovery process, especially for frequent patterns and ARs, which are the subject matter of this paper. The main concern of the authors is the challenge of the dimensionality phenomenon. Several methods have been developed on the basis of threshold fixing, the use of measures other than Support and Confidence, or other criteria [4, 7, 12]; the objective is to mine interesting data that are quantitatively fewer and qualitatively better than what the traditional techniques could produce. With the same objective, other approaches use dominance, or Pareto-dominance, to classify rules into two categories: dominant and dominated rules.
They then discard the dominated category and keep the dominant one. However, it seems reasonable to question this two-way classification. Is it not possible to have more than two categories? Among the rules of the discarded category, can there not be equivalent rules? Moreover, is there any guarantee that all the relevant information is kept, that no relevant information is lost, or that the category of dominant rules really satisfies the user's expectations?
This paper proposes the MDP\(_{\mathrm {REF}}\) algorithm to process the ARset in such a way as to determine the subset of the most dominant rules responding to the user request. The remaining set is further examined: each single rule is given a statistical value, and it is reasonable to expect rules sharing the same statistical value, called Statistically Equivalent Rules (SER). These SER are kept or discarded according to the user's wishes. The third subset is discarded because it includes dominated rules. The association rules selected by the MDP\(_{\mathrm {REF}}\) algorithm are called MDP\(_{\mathrm {REF}}\) rules: the Most Dominant and Preferential rules. The algorithm thus combines the notions of dominance and preference to mine rules and helps shrink the dimensionality of the results.
This paper comprises six sections including the introduction. The second section points to some works in the literature and defines the concepts used; the third introduces the MDP\(_{\mathrm {REF}}\) algorithm and an evaluation experiment. In the fourth and fifth sections, we clarify our motivation for proposing the RankSortMDP\(_{\mathrm {REF}}\) algorithm and evaluate its performance in terms of accuracy and execution time. The last section concludes the paper and sheds light on the future prospects of our research.
2 Literature review and background
2.1 Literature review
Many computer applications recognize user preferences as essential. Xiaoye Miao [14] considers them in a multidimensional space including language and preference operators, where a set of preference builders is assigned to categorical and numerical domains. Elsewhere, statistical models of user preferences are presented, in which the frequency of an item depends on the user preference and the item's accessibility; the user preference can be modeled as an algebraic function approximating the statistical value of the item's features and the user profile. In [10], preference samples provided by the user are used to establish the order of tuples in the database. These samples are classified into two classes, superior and inferior samples, which contain information about relevant and irrelevant samples, respectively. In [7], the authors propose the ProfMiner algorithm to discover a user profile on the basis of user-provided preferences and wishes. ProfMiner operates on a database containing contextual preference rules; it determines a threshold k to select the contextual preference rules describing the user profile, and the membership of these rules depends on k. However, ProfMiner relies on only two measures, support and confidence, which may not be sufficient to preserve all the relevant information. It is worth noting that the contextual preference rules are determined and extracted by the CPrefMiner algorithm, a qualitative approach based on Bayesian-network preference rules; its main strength is that it produces a compact model of ordered preferences and accurate results as well. In [24], the authors propose processing contextual logs of mobile-device users to find out context-aware preferences.
In the same framework, the PrefMiner algorithm [13] proposes a new solution to mine users' preferences for intelligent mobile-device notification management. PrefMiner automatically determines rules that reflect a user's preferences by studying notifications collected in advance in databases. In [22], the authors present an algorithm based on clustering and filtering user preferences; it adapts to the different habits of users and partitions them into three groups according to their habits and preferences: optimistic, pessimistic, and neutral. This clustering is based on new similarity measures that address the shortcomings of classical methods. In addition, some approaches resort to query rewriting, or merely query enhancement [2], which consists of integrating into the user query some elements of the user profile. This technique is well established in the Information Retrieval domain [8] and very recent in the database domain.
Table 1 ARset and measures

(a) Rules set

Rules | Confidence | Support | Pearl
ar\(_{1}\) | 0.66 | 0.20 | 0.02
ar\(_{2}\) | 0.66 | 0.20 | 0.05
ar\(_{3}\) | 0.66 | 0.20 | 0.02
ar\(_{4}\) | 0.4 | 0.20 | 0.05
ar\(_{5}\) | 0.4 | 0.20 | 0.10
ar\(_{6}\) | 0.33 | 0.20 | 0.02
ar\(_{7}\) | 0.33 | 0.20 | 0.01
ar\(_{8}\) | 0.33 | 0.20 | 0.10
ar\(_{9}\) | 0.33 | 0.10 | 0.03
ar\(_{10}\) | 0.66 | 0.20 | 0.05
ar\(_{11}\) | 0.16 | 0.10 | 0.02
ar\(_{12}\) | 0.50 | 0.10 | 0.02
ar\(_{13}\) | 0.50 | 0.10 | 0.00
ar\(_{14}\) | 0.50 | 0.10 | 0.04

(b) Measures

Measures | Formula
Confidence (\(B\rightarrow H\)) | \(P(H/B)=\frac{P\left( {BH} \right) }{P\left( B \right) }\)
Support (\(B\rightarrow H\)) | \(P\left( {BH} \right) \)
Pearl (\(B\rightarrow H\)) | \(P\left( B \right) \times \left| {P(H/B)-P\left( H \right) } \right| \)
Recall (\(B\rightarrow H\)) | \(P(B/H)=\frac{P\left( {BH} \right) }{P\left( H \right) }\)
Zhang (\(B\rightarrow H\)) | \(\frac{P\left( {BH} \right) -P\left( B \right) P\left( H \right) }{\max \left\{ {P\left( {BH} \right) P\left( {\overline{H}} \right) ,\,P\left( H \right) P\left( {B\overline{H}} \right) } \right\} }\)
Loevinger (\(B\rightarrow H\)) | \(\frac{P(H/B)-P\left( H \right) }{1-P\left( B \right) }\)
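As a hedged illustration (this code is not from the paper), the measures of Table 1b can be computed for a rule B \(\rightarrow \) H from the three base probabilities P(B), P(H), and P(BH):

```python
# Illustrative sketch: the Table 1b interestingness measures of a rule
# B -> H, derived from P(B), P(H) and P(BH).

def measures(p_b, p_h, p_bh):
    """Return the Table 1b measures for a rule B -> H."""
    conf = p_bh / p_b                        # Confidence: P(H|B)
    pearl = p_b * abs(conf - p_h)            # Pearl: P(B) * |P(H|B) - P(H)|
    recall = p_bh / p_h                      # Recall: P(B|H)
    p_b_not_h = p_b - p_bh                   # P(B and not-H)
    zhang = (p_bh - p_b * p_h) / max(p_bh * (1 - p_h), p_h * p_b_not_h)
    loevinger = (conf - p_h) / (1 - p_b)
    return {"confidence": conf, "support": p_bh, "pearl": pearl,
            "recall": recall, "zhang": zhang, "loevinger": loevinger}
```

For instance, with P(B) = 0.3, P(H) = 0.5, and P(BH) = 0.2, the confidence is 0.2/0.3 \(\approx \) 0.66, the same order of magnitude as the rules of Table 1a.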
The common objective of the techniques described above is to minimize the number of generated rules. We notice a causal relationship between the number of generated rules and the number of criteria or interestingness measures imposed on the database: the more measures, the fewer rules.
Unlike the approaches described above, our contribution presents a method that uses the user preference as a further restriction on the mining operation so as to optimize the cardinality of the AR set.
2.2 Background and formalization
2.2.1 Association rules
"Association rules", as a field of research, is a vital concern within the framework of business intelligence. These rules have been extensively studied using different tools and techniques with the ultimate aim of discovering regularities, harmonies, and correlations between items in a database. An association rule usually takes the form B \(\rightarrow \) H, where B and H are disjoint itemsets; B is called the premise and H the conclusion [18]. The strength of an association rule is often measured by its support and confidence [9].
Table 1 presents an illustrative example of an input association rule set (noted "ARset" or the "14-Rule Set") together with the mathematical formulas of some interestingness measures.
2.2.2 Dominance relationship
Definition
(Domination) A point \(x\) in a d-dimensional space (X\(_{1}\), X\(_{2}\), ..., X\(_{d}\)) dominates a point \(x'\) in the same space, denoted \(x \succ x'\), if for every dimension \(k = 1, 2, \ldots , d\) we have \(x_k \ge x'_k\) [23].

A rule ar dominates ar \(^{\prime }\), noted ar \(\succ \) ar \(^{\prime }\), if ar[m] \(\ge \) ar \(^{\prime }\)[m] \(\forall m\in \) M.

If ar \(\succ \) ar \(^{\prime }\) and ar \(^{\prime }\) \(\succ \) ar, then ar[m] = ar \(^{\prime }\)[m] \(\forall m\in \) M; ar and ar \(^{\prime }\) are then Statistically Equivalent, noted ar \(\approx \) ar \(^{\prime }\) [15].
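A minimal sketch of our reading of these definitions: a rule is represented by its vector of measure values, dominance compares the vectors component-wise, and two mutually dominating rules are Statistically Equivalent (the example vectors are taken from Table 1):

```python
# Sketch of the dominance relation over rules, each rule being its
# vector of measure values (confidence, support, pearl).

def dominates(ar, ar_prime):
    """ar dominates ar' iff ar[m] >= ar'[m] for every measure m."""
    return all(a >= b for a, b in zip(ar, ar_prime))

def statistically_equivalent(ar, ar_prime):
    """ar ~ ar' iff each dominates the other, i.e. equal on all measures."""
    return dominates(ar, ar_prime) and dominates(ar_prime, ar)

# ar_2, ar_10 and ar_11 from Table 1:
ar2, ar10, ar11 = (0.66, 0.20, 0.05), (0.66, 0.20, 0.05), (0.16, 0.10, 0.02)
```

Here ar\(_{2}\) and ar\(_{10}\) are statistically equivalent, while both dominate ar\(_{11}\).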
2.2.3 Preference relationship

– MP\(_{1}\): possibility to watch films, interviews...
– MP\(_{2}\): possibility to watch films, record interviews...
– MP\(_{3}\): possibility to watch and download films, interviews...
User preference A preference p on a base relation R \(_\mathrm{b}\) is a triple (\(\sigma \), S, C), where \(\sigma \) is a selection condition involving a set D of items from R \(_\mathrm{b}\), S is a scoring function defined on the cartesian product of the domains of the items of D, S: \(\prod _{t_i \in D}\) dom(t \(_{i}\)) \(\rightarrow \) [0, 1], and C \(\in \) [0, 1].
The meaning of preference p is that each tuple t \(_{i}\) of the relation R \(_\mathrm{b}\) is associated with a score through the function S with confidence C. A tuple t \(_{i}\) is preferred over a tuple t \(_{j}\) if t \(_{i}\) has a higher score than t \(_{j}\).
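The triple p = (\(\sigma \), S, C) can be sketched as follows; the selection condition, scoring function, confidence value, and phone tuples below are invented for illustration only:

```python
# Hypothetical sketch of a preference p = (sigma, S, C): tuples that
# satisfy sigma are scored by S into [0, 1], and t_i is preferred over
# t_j when it receives a higher score.

def prefer(t_i, t_j, sigma, score):
    """True when t_i is preferred over t_j under p = (sigma, S, C)."""
    if not (sigma(t_i) and sigma(t_j)):
        return False                      # p applies only to selected tuples
    return score(t_i) > score(t_j)

# Hypothetical base relation of phones (battery hours, price in euros).
sigma = lambda t: t["price"] <= 300       # selection condition
score = lambda t: t["battery"] / 11.0     # S maps into [0, 1]
confidence = 0.8                          # C in [0, 1]

t1 = {"battery": 9, "price": 250}
t2 = {"battery": 4, "price": 150}
```

Under this p, t1 is preferred over t2 because its battery-based score is higher.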
Some qualitative approaches use score functions to express preferences by associating a score with a tuple of products. Other algorithms, such as CP-net and RankVoting, are automatic learning techniques that mine user preferences in a shorter time compared to the manual construction of a preference model.
Let I be a set of objects in a multidimensional space D = D \(_{1} \otimes \) D \(_{2} \otimes \cdots \otimes \) D \(_\mathrm{d}\). I is either finite or infinite. A preference relationship is a strict partial order on the multidimensional space D, noted \(\diamondsuit \).
Let i \(_{1}\diamondsuit \) i \(_{2}\) express that the user prefers i \(_{1}\) to i \(_{2}\).

– The user prefers MP\(_{3}\) to MP\(_{1}\) \(\Rightarrow \) MP\(_{3} \diamondsuit \) MP\(_{1}\).
– The user prefers MP\(_{3}\) to MP\(_{2}\) \(\Rightarrow \) MP\(_{3} \diamondsuit \) MP\(_{2}\).
Table 2 Mapping of user's preferences

Preferences | Bi-tuple score interval
P\(_{1}\) | ]0, 0.2[
P\(_{2}\) | [0.2, 0.4[
P\(_{3}\) | [0.4, 0.6[
P\(_{4}\) | [0.6, 0.8[
P\(_{5}\) | [0.8, 1[
Table 2 presents a mapping of the preferences provided by the user over transaction bi-tuples (t \(_{i}\), t \(_{j}\)).
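The Table 2 mapping can be sketched as a simple binning of a score into the five preference labels (P\(_{1}\) being open at 0, as in the table):

```python
# Sketch of the Table 2 mapping: a bi-tuple score in ]0, 1[ is binned
# into one of the five preference labels P1..P5.

BINS = [(0.0, 0.2, "P1"), (0.2, 0.4, "P2"), (0.4, 0.6, "P3"),
        (0.6, 0.8, "P4"), (0.8, 1.0, "P5")]

def preference_label(score):
    """Map a score to its Table 2 preference (P1 excludes 0 itself)."""
    for low, high, label in BINS:
        if low <= score < high and score > 0:
            return label
    raise ValueError("score must lie in ]0, 1[")
```

For example, a score of 0.1 falls in P\(_{1}\) and a score of 0.6 in P\(_{4}\).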
Table 3 Rules set with user's preferences

Rules | Confidence | Support | Pearl | Preferences
ar\(_{1}\) | 0.66 | 0.20 | 0.02 | (P\(_{1}\), P\(_{2}\))
ar\(_{2}\) | 0.66 | 0.20 | 0.05 | (P\(_{2}\))
ar\(_{3}\) | 0.66 | 0.20 | 0.02 | (P\(_{2}\))
ar\(_{4}\) | 0.4 | 0.20 | 0.05 | (P\(_{1}\), P\(_{3}\))
ar\(_{5}\) | 0.4 | 0.20 | 0.10 | (P\(_{1}\), P\(_{3}\))
ar\(_{6}\) | 0.33 | 0.20 | 0.02 | (P\(_{1}\), P\(_{3}\))
ar\(_{7}\) | 0.33 | 0.20 | 0.01 | (P\(_{1}\), P\(_{3}\))
ar\(_{8}\) | 0.33 | 0.20 | 0.10 | (P\(_{1}\), P\(_{2}\))
ar\(_{9}\) | 0.33 | 0.10 | 0.03 | (P\(_{1}\), P\(_{3}\))
ar\(_{10}\) | 0.66 | 0.20 | 0.05 | (P\(_{2}\), P\(_{3}\))
ar\(_{11}\) | 0.16 | 0.10 | 0.02 | (P\(_{1}\), P\(_{3}\))
ar\(_{12}\) | 0.50 | 0.10 | 0.02 | (P\(_{1}\), P\(_{3}\))
ar\(_{13}\) | 0.50 | 0.10 | 0.00 | (P\(_{1}\), P\(_{3}\))
ar\(_{14}\) | 0.50 | 0.10 | 0.04 | (P\(_{1}\), P\(_{3}\))
3 MDP\(_{\mathrm {REF}}\) mechanism illustration
3.1 MDP\(_{\mathrm {REF}}\) algorithm
Figure 1 shows a visual representation of the mining process of MDP\(_\mathrm{REF}\) rules. Notice that it consists of three main operations, the last of which is the concern of the MDP\(_\mathrm{REF}\) algorithm.
3.2 MDP\(_{\mathrm {REF}}\) algorithm tasks and its pseudocode
1. Create an imaginary referential rule (ar \(^{T}\)) which has the maximum measurements, so that it dominates all the rules.
2. Calculate the degree of similarity of each rule with the referential rule ar \(^{T}\): \(Deg_{Sim} (AR,AR^{T})\).
3. Determine the dominant real rule ar* having the lowest degree of similarity with ar \(^{T}\).
4. Remove all the rules dominated by ar*.
5. Resort to the user's preferences to determine which rule to keep when two rules are statistically equivalent.
6. Keep both if the decision maker is indifferent; otherwise, keep the one satisfying the most preferences.
7. Drop all rules whose user preferences are already covered by those previously handled.
8. Keep rules covering user preferences other than those already covered by the previously selected rules.

The ARset is thus split into three categories:
– Dominant rules, which are stored.
– Non-dominant rules, which are chased out.
– Statistically Equivalent Rules (SER).
The seventh task discards preferentially redundant and/or overlapping rules; performing task 7 implies performing task 8. The MDP\(_{\mathrm {REF}}\) algorithm's tasks do not include learning user preferences: these are provided prior to processing and therefore have no influence on the algorithm's processing time.
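Tasks 1–4 above, the dominance part of MDP\(_{\mathrm {REF}}\), can be sketched as follows. The Euclidean form of \(Deg_{Sim}\) and the measure vectors are our assumptions, and the preference-handling tasks 5–8 (SER arbitration) are omitted for brevity:

```python
# Condensed sketch of MDP_REF tasks 1-4; Deg_Sim is assumed to be a
# distance-like value (lowest value = closest to the referential rule).
import math

def deg_sim(ar, ar_ref):
    """Distance-like degree of similarity between two measure vectors."""
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(ar, ar_ref)))

def dominant_rules(rules):
    """rules: dict name -> measure vector. Returns (ar*, kept rules)."""
    dims = len(next(iter(rules.values())))
    ar_t = tuple(max(v[i] for v in rules.values())      # task 1: referential
                 for i in range(dims))
    sims = {name: deg_sim(v, ar_t) for name, v in rules.items()}  # task 2
    star = min(sims, key=sims.get)                      # task 3: ar*
    kept = {name: v for name, v in rules.items()        # task 4: drop rules
            if name == star                             # dominated by ar*
            or not all(s >= r for s, r in zip(rules[star], v))}
    return star, kept
```

On a small hypothetical rule set, the rule closest to the referential rule survives together with any rule it does not dominate.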
Table 3 shows a set of ARs to which the MDP\(_{\mathrm {REF}}\) algorithm is applied; the result is the two rules ar \(_{10}\) and ar \(_{05}\), the most dominant and preferential rules (MDP\(_{\mathrm {REF}}\) rules).
Table 4 Characteristics of the ARset (mobile phone)

Data set | #Items | #AR | #Transactions | Avg. MDP\(_{\mathrm {REF}}\)
Mobile phone | 128 | 25,000 | 326 | 14,268
Table 5 Sample of mobile phone brands

ID | Brand | Design | Connectivity | Screen | Battery autonomy (h) | Camera (Mp) | Price (Euro)
I\(_{1}\) | Nokia | Monobloc | wub\(^{3}\) | Tactile | 6–8 | 2–5 | >300
I\(_{2}\) | Samsung | Monobloc | ub | Tactile | 3–5 | 2–5 | 100–200
I\(_{3}\) | Samsung | Monobloc | wub | Tactile | 9–11 | 2–5 | 200–300
I\(_{4}\) | Sony Ericson | Monobloc | wub | Tactile | 9–11 | 10–14 | >300
I\(_{5}\) | Sony Ericson | Monobloc | ub | Tactile | 3–5 | 6–9 | >300
I\(_{6}\) | Samsung | Coulissant | ub | Non tactile | 3–5 | 2–5 | <100
I\(_{7}\) | Samsung | Coulissant | b | Non tactile | 3–5 | 2–5 | 100–200
I\(_{8}\) | LG | Monobloc | ub | Non tactile | 3–5 | 2–5 | <100
I\(_{9}\) | LG | Coulissant | ub | Non tactile | 3–5 | 2–5 | 200–300
I\(_{10}\) | Nokia | Coulissant | ub | Non tactile | 3–5 | 2–5 | 100–200
I\(_{11}\) | Sony Ericson | Monobloc | wub | Non tactile | 9–11 | 2–5 | 100–200
Table 6 MDP\(_{\mathrm {REF}}\) vs all rules and other ARM algorithms

Algorithm (Mobile phone (10.00)) | C, P, R | C, L, Zh | C, P, Zh, L | C, P, R, Zh, L\(^2\)
CprefMiner | 20,000 | 18,500 | 16,000 | 20,750
ProfMiner | 18,250 | 16,250 | 13,500 | 19,000
TBR | 22,500 | 20,750 | 18,750 | 21,750
AR | 25,000 | 25,000 | 25,000 | 25,000
SkyRule | 11,250 | 13,750 | 12,500 | 10,500
MDP\(_{\mathrm {REF}}\) | 12,500 | 15,400 | 16,775 | 12,375
The characteristics of these mobile phones and their attributes are specified in Tables 4 and 5. The ARset involved contains 25,000 rules corresponding to a set of distinct mobile phones, described by 326 transactions over 128 distinct items. These 25,000 rules (which may not qualify as big data) were processed by the MDP\(_{\mathrm {REF}}\) algorithm, and the result is the generation of 14,268 rules, i.e., only \(\approx \)57% of the original number.
As the other algorithms are based on thresholding, we adopt their optimal thresholds solely for purposes of comparison.
1. In comparison with All Rules, TBR, CprefMiner, and ProfMiner, the MDP\(_{\mathrm {REF}}\) algorithm steadily generates fewer rules: it reduces the number of selected association rules by \(\approx \)27% on average, with a reduction rate varying between 12% (lower bound) and 43% (upper bound), regardless of the nature and cardinality of the measures. The number of rules selected by MDP\(_{\mathrm {REF}}\) is significantly reduced: from 25,000 to 12,500 for the measure set {C, P, R}, from 25,000 to 15,400 for {C, L, Zh} (note that these two sets have the same size, three, yet yield different numbers of MDP\(_{\mathrm {REF}}\) rules), from 25,000 to 16,775 for {C, P, Zh, L}, and from 25,000 to 12,375 for {C, P, R, Zh, L}.
2. Compared to the SkyRule algorithm, MDP\(_{\mathrm {REF}}\) behaves differently, as it generates more rules for all interestingness measures. This behavior originates from the fact that MDP\(_{\mathrm {REF}}\) recovers an average of 19% of the association rules groundlessly rejected by SkyRule: it keeps some SER that may cover particular user preferences and carry valuable information. MDP\(_{\mathrm {REF}}\) thus bypasses the information-loss problem that SkyRule suffers from, and it selects the ARs responding to the requests and preferences expressed by the users. For these two reasons, groundless discarding and information loss, MDP\(_{\mathrm {REF}}\) is considered better than SkyRule.
3. The choice of measure sets (M sets), not necessarily their size, affects the number of generated MDP\(_{\mathrm {REF}}\) rules.
4 RanksortMDP\(_{\mathrm {REF}}\) algorithm
4.1 Purpose
4.2 Pseudocode of “RankSortMDP\(_{\mathrm {REF}}\) algorithm”
The RankSortMDP\(_{\mathrm {REF}}\) algorithm was coded in an object-oriented programming language, and all tests were performed on a computer with the following specification: a 1.73 GHz Intel processor, Windows 7, and 2 GB of memory.
The RankSortMDP\(_{\mathrm {REF}}\) algorithm proceeds by stages, as follows.
At stage 1 (k = 0 + 1) (line 6), RankSortMDP\(_{\mathrm {REF}}\) calls the MDP\(_{\mathrm {REF}}\) algorithm to select the first subset of association rules, E\(_{1}\) (line 7), from all the association rules belonging to R \(\ne \) Ø (line 5); in our case (see Table 3), R is the "14-Rule Set". AR\(_{10}\) and AR\(_{05}\) are the first two association rules selected at this stage and placed in E\(_{1}\), the first subset: {AR\(_{10}\), AR\(_{05}\)} \(\in \) E\(_{1}\).
At stage 2 (k = 1 + 1), RankSortMDP\(_{\mathrm {REF}}\) calls MDP\(_{\mathrm {REF}}\) to select the second subset of association rules, E\(_{2}\) = {AR\(_{02}\), AR\(_{08}\)}. E\(_{2}\) succeeds E\(_{1}\): its members are less good, so it is ranked after E\(_{1}\).
Recursively, at each stage k + 1, the proposed algorithm calls MDP\(_{\mathrm {REF}}\) to select the new association rules succeeding those selected and ranked at stage k; the rules of stage k come before those generated at the (k + 1)th stage. Consequently, every predecessor association rule is classified and sorted better than any association rule belonging to a successor set. Furthermore, the MDP\(_{\mathrm {REF}}\) rules ranked at the same stage are ordered by their degree of similarity and the user preferences they cover. The RankSortMDP\(_{\mathrm {REF}}\) algorithm can therefore be considered a sound algorithm.
When the association rule set R becomes empty, RankSortMDP\(_{\mathrm {REF}}\) terminates, having processed all association rules, which are then all ranked and classified. This means that the RankSortMDP\(_{\mathrm {REF}}\) algorithm is complete.
We finally come to the conclusion that the RankSortMDP\(_{\mathrm {REF}}\) algorithm is sound and complete.
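The staged behaviour described above can be sketched as a layered ranking. Plain Pareto non-domination stands in here for the MDP\(_{\mathrm {REF}}\) call (preferences are not modelled), and the measure vectors are illustrative:

```python
# Sketch of RankSortMDP_REF's staged loop: at each stage the current
# non-dominated rules are popped into the next subset E_k until the
# rule set R is empty (the completeness argument above).

def strictly_dominates(ar, ar_prime):
    """Component-wise >= on all measures, and not identical."""
    return all(a >= b for a, b in zip(ar, ar_prime)) and ar != ar_prime

def rank_sort(rules):
    """rules: dict name -> measure vector. Returns the layers [E_1, E_2, ...]."""
    remaining, layers = dict(rules), []
    while remaining:
        layer = [n for n, v in remaining.items()
                 if not any(strictly_dominates(w, v)
                            for w in remaining.values())]
        layers.append(sorted(layer))
        for n in layer:                 # R shrinks, so the loop terminates
            del remaining[n]
    return layers
```

Note that two rules with identical measure vectors (SER) land in the same layer, mirroring the algorithm's treatment of statistically equivalent rules.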
To show the performance of the RankSortMDP\(_{\mathrm {REF}}\) algorithm, we applied it to the ARset (in our case, the "14-Rule Set") shown in Table 3. It processed the said set, dividing it into 7 subsets {E\(_{1}\), ..., E\(_{7}\)}, as summarized in Table 7.
The subset E\(_{1}\), which contains the two rules ar \(_{10}\) and ar \(_{05}\), is generated in the first iteration of RankSortMDP\(_{\mathrm {REF}}\). It is worth noticing that ar \(_{10}\) and ar \(_{05}\) are exactly the rules generated by the MDP\(_{\mathrm {REF}}\) algorithm. We therefore conclude that the first subset E\(_{1}\) generated by RankSortMDP\(_{\mathrm {REF}}\) is also the result of applying MDP\(_{\mathrm {REF}}\) to the entire ARset (14-Rule Set).
E\(_{2}\) is the subset extracted by RankSortMDP\(_{\mathrm {REF}}\) in the second iteration, which operates on the database "ARset\(\backslash \)E\(_{1}\)". The member rules {ar \(_{02}\), ar \(_{08}\)} of E\(_{2}\) are the most dominant and preferential rules in "ARset\(\backslash \)E\(_{1}\)".
At the end of the seventh and final iteration of RankSortMDP\(_{\mathrm {REF}}\), we get E\(_{7}\).
The result we get after the seven iterations is seven subsets in which rules are ranked from top to bottom. Therefore, all the 14 rules are ordered.
Table 7 Output of the RankSortMDP\(_{\mathrm {REF}}\) algorithm

Set of rules | Rules | Preferences | Level
E\(_{1}\) | ar\(_{10}\), ar\(_{05}\) | (P\(_{1}\), P\(_{2}\), P\(_{3}\)) | 1
E\(_{2}\) | ar\(_{02}\), ar\(_{08}\) | (P\(_{1}\), P\(_{2}\)) | 2
E\(_{3}\) | ar\(_{01}\), ar\(_{04}\) | (P\(_{1}\), P\(_{2}\), P\(_{3}\)) | 3
E\(_{4}\) | ar\(_{03}\), ar\(_{09}\) | (P\(_{2}\), P\(_{3}\), P\(_{3}\)) | 4
E\(_{5}\) | ar\(_{13}\), ar\(_{06}\) | (P\(_{1}\), P\(_{3}\)) | 5
E\(_{6}\) | ar\(_{14}\), ar\(_{07}\), ar\(_{12}\) | (P\(_{1}\), P\(_{3}\)) | 6
E\(_{7}\) | ar\(_{11}\) | (P\(_{1}\), P\(_{3}\)) | 7
Table 8 Order response mechanism

User's order u | Response (subsets/rules)
2 | E\(_{1}\)
3 | E\(_{1} \oplus \) E\(_{2}{\backslash }\){ar\(_{08}\)}
4 | E\(_{1} \oplus \) E\(_{2}\)
5 | E\(_{1} \oplus \) E\(_{2} \oplus \) E\(_{3}\backslash \){ar\(_{04}\)}
7 | E\(_{1} \oplus \) E\(_{2} \oplus \) E\(_{3} \oplus \) E\(_{4}\backslash \){ar\(_{09}\)}
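Our reading of Table 8 can be sketched as follows: subsets are concatenated in rank order until the user's order u is met, and the surplus is dropped from the last subset appended (assuming, as the table suggests, that its later-listed members are dropped first):

```python
# Sketch of the order-response mechanism: answer a user's order of u
# rules by concatenating the ranked subsets E_1, E_2, ... and trimming
# the last one appended.

def respond(layers, u):
    """layers: ranked subsets E_1, E_2, ...; returns exactly u rules."""
    answer = []
    for layer in layers:
        need = u - len(answer)
        if need <= 0:
            break
        answer.extend(layer[:need])     # trim the surplus of the last subset
    return answer

# Illustrative layers matching the first rows of Table 7.
E = [["ar10", "ar05"], ["ar02", "ar08"], ["ar01", "ar04"]]
```

For u = 3 this yields E\(_{1} \oplus \) E\(_{2}{\backslash }\){ar\(_{08}\)}, as in Table 8.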
5 Performance of RankSortMDP\(_{\mathrm {REF}}\)
5.1 The previous related algorithms
This section compares the proposed algorithm with related algorithms having the same goals: ranking and sorting association rules.
The first related algorithm is Rank Rules, suggested by the authors of [4] to rank association rules based on the Skyline operator and on the performance of the SkyRules algorithm, which is called at each iteration to determine the undominated association rules. The second is RuleRankCBA [21], which is evolved by Genetic Network Programming, where directed graphs are used as the gene population to compute the fitness function that ranks and sorts the members of the data set. The third is HybridRuleRank [16], which couples Genetic Algorithms with a probabilistic metaheuristic that seeks to optimize and approximate a global solution, known as Simulated Annealing (SA). It is worth recalling that RuleRankCBA arithmetically combines the historical interestingness measures, support and confidence, to create a set of functions that optimize its fitness function and achieve the target objectives. Like RuleRankCBA, HybridRuleRank sorts and ranks association rules according to the support and confidence measures.
In addition, the execution-time and accuracy indicators are used to measure RankSortMDP\(_{\mathrm {REF}}\)'s performance and to accomplish this comparison.
5.2 Execution time of RankSortMDP\(_{\mathrm {REF}}\)
Table 9 Simulation results compared to the previous algorithms

Database | Indicator | RankSortMDP\(_{\mathrm {REF}}\) | Rank Rules | RuleRankCBA [21] | HybridRuleRank [16]
Mobile phone | Accuracy (%) | 87.99 ± 0.33 | 87.98 ± 0.33 | 88.02 ± 0.29 | 89.11 ± 0.39
Mobile phone | Time (s) | 1.97 ± 0.19 | 1.67 ± 0.97 | 50.59 ± 7.10 | 50.60 ± 7.10
Iris | Accuracy (%) | 94.03 ± 1.97 | 94.00 | 94.13 ± 0.87 | 95.22 ± 4.50
Iris | Time (s) | 0.84 ± 0.024 | 1.02 ± 0.03 | 0.41 ± 0.01 | 0.41 ± 0.47
Flare | Accuracy (%) | 82.26 ± 0.38 | 81.09 ± 0.32 | 84.21 ± 0.20 | 84.30 ± 0.62
Flare | Time (s) | 24.75 ± 1.5 | 3.12 ± 0.63 | 75.22 ± 3.55 | 75.30 ± 4.02
Average | Accuracy (%) | 88.09 ± 0.28 | 87.69 ± 0.21 | 88.78 ± 0.45 | 89.54 ± 1.83
Average | Time (s) | 9.18 ± 1.44 | 3.93 ± 0.54 | 42.07 ± 3.55 | 42.10 ± 3.86
Table 10 Characteristics of the data sets

Database | #Items | #AR | #Transactions | Avg. MDP\(_{\mathrm {REF}}\)
Mobile phone | 128 | 25,000 | 326 | 14,268
Flare | 39 | 57,476 | 1389 | 2550
Iris | 119 | 440 | 8124 | 259
We remark that the average execution time decreases up to a given measure cardinality (perhaps an optimal one) and then increases. Hence, we intend to study the properties of the interestingness measures belonging to the measure sets.
5.3 Indicators tools: accuracy and execution time
Table 9 summarizes two statistical indicators, accuracy and execution time, for the three different databases (Mobile phone, Iris, Flare) to which the four related approaches are applied. In this subsection, we compare, in terms of execution time and accuracy, the proposed RankSortMDP\(_{\mathrm {REF}}\) algorithm with Rank Rules [4], RuleRankCBA [21], and HybridRuleRank [16]. To evaluate the proposed approach's performance and efficiency, we execute the aforementioned algorithms on databases of different sizes and attributes (Mobile phone, Iris, Flare), whose characteristics are described in Table 10. To validate the obtained results and conduct a reliable comparison, the k-fold cross-validation technique is used, since it repeatedly processes each data set k times. To obtain the accuracy, the compared algorithms are tested multiple times by running k-fold cross-validation on each data set; the data-set elements are rearranged and re-stratified before each round, and the average accuracy of the multiple tests is kept for each data set (in our case, k = 10).
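The evaluation protocol can be sketched generically as follows; the toy majority-label "classifier" is a placeholder of ours, not any of the compared algorithms:

```python
# Generic sketch of k-fold cross-validation: shuffle the data set,
# split it into k folds, let each fold serve once as the test part,
# and average the k accuracies.
import random

def k_fold_accuracy(labels, k=10, seed=0):
    data = list(labels)
    random.Random(seed).shuffle(data)        # rearranged before the rounds
    folds = [data[i::k] for i in range(k)]   # k roughly equal folds
    accs = []
    for i in range(k):
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        majority = max(set(train), key=train.count)   # placeholder model
        test = folds[i]
        accs.append(sum(x == majority for x in test) / len(test))
    return sum(accs) / k
```

On a toy label set that is 80% one class, the averaged fold accuracy of the majority placeholder comes out at 0.8, since the folds partition the data.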
On the one hand, RankSortMDP\(_{\mathrm {REF}}\) outperforms Rank Rules in terms of accuracy (88.09 vs 87.69%). In terms of execution time, however, the proposed algorithm takes much longer than Rank Rules (9.18 vs 3.93 s). This is because Rank Rules does not process the statistically equivalent rules (SER) reasonably: it may rank two SER at different levels, and hence probably produces a wrong ranking of an SER set.
On the other hand, RankSortMDP\(_{\mathrm {REF}}\) is faster than RuleRankCBA (9.18 vs 42.07 s) and also faster than HybridRuleRank (9.18 vs 42.10 s), since many redundant and repeated functions are estimated and created in RuleRankCBA. Meanwhile, in terms of accuracy, RankSortMDP\(_{\mathrm {REF}}\) and RuleRankCBA perform approximately the same (88.09 vs 88.78%). Finally, comparing RankSortMDP\(_{\mathrm {REF}}\)'s performance to that of HybridRuleRank shows that the latter surpasses the proposed algorithm in terms of accuracy (89.54 vs 88.09%).
Table 10 summarizes the characteristics of the data sets: Database is the database name, #Items is the item count, #AR is the association-rule count, #Transactions is the transaction count, and Avg. MDP\(_{\mathrm {REF}}\) is the average number of association rules selected by the MDP\(_{\mathrm {REF}}\) algorithm from each data set.
6 Conclusion and perspective
The RankSortMDP\(_{\mathrm {REF}}\) algorithm is introduced to supply the user with the requested rules by ranking and sorting all the association rules of the original ARset, which is divided into subsets.
The proposed approach aims to rank and sort association rules and respond to a user's request, based on the MDP\(_{\mathrm {REF}}\) algorithm, which minimizes dimensionality without losing relevant information or ignoring the user's preferences. The experimental evaluation of our approach shows satisfactory results with respect to the target objectives. Further directions include: (1) deepening the semantic analysis of association-rule components; and (2) studying the properties of the interestingness measures belonging to the measure sets.
Perfection never comes at once, and we will make significant endeavors to improve our techniques so as to achieve a higher-quality analysis of data. We are also motivated to make the RankSortMDP\(_{\mathrm {REF}}\) algorithm ever faster, so that it can work on big databases, whose processing necessitates less time-consuming techniques.
References
1. Ait-Mlouk, A., Gharnati, F., Agouti, T.: Multi-agent-based modeling for extracting relevant association rules using a multi-criteria analysis approach. Vietnam J. Comput. Sci. 3(4), 235–245 (2016)
2. Arvanitis, A., Koutrika, G.: PrefDB: supporting preferences as first-class citizens in relational databases. IEEE Trans. Knowl. Data Eng. 26(6), 1430–1446 (2014). doi: 10.1109/TKDE.2013.28
3. Asha, P., Srinivasan, S.: Analysing the associations between infected genes using data mining techniques. Int. J. Data Min. Bioinf. 15(3), 250–271 (2016)
4. Bouker, S., Saidi, R., Ben Yahia, S., Mephu Nguifo, E.: Mining undominated association rules through interestingness measures. Int. J. Artif. Intell. Tools 23(4), 1460011 (2014). doi: 10.1142/S0218213014600112
5. Branke, J., Corrente, S., Greco, S., Słowiński, R., Zielniewicz, P.: Using Choquet integral as preference model in interactive evolutionary multiobjective optimization. Eur. J. Oper. Res. 250(3), 884–901 (2016). doi: 10.1016/j.ejor.2015.10.027
6. Branke, J.: MCDA and multiobjective evolutionary algorithms. In: Multiple Criteria Decision Analysis, pp. 977–1008 (2016)
7. De Amo, S., Saliou Diallo, M., Talibouya Diop, C., Giacometti, A., Li, D., Soulet, A.: Contextual preference mining for user profile construction. Inf. Syst. 49, 182–199 (2015). doi: 10.1016/j.is.2014.11.009
8. Gheorghiu, R., Labrinidis, A., Chrysanthis, P.: Unifying qualitative and quantitative database preferences to enhance query personalization. In: Proceedings of the Second International Workshop on Databases and the Web (ExploreDB '15), pp. 6–8 (2015). doi: 10.1145/2795218.2795223
9. Gupta, G.: Introduction to Data Mining with Case Studies. PHI Learning Pvt. Ltd. (2014)
10. Jiang, B., Pei, J., L, X., Cheung, D., Han, J.: Mining preferences from superior and inferior examples. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 390–398. ACM (2008)
11. Kongchai, P., Kerdprasop, N., Kerdprasop, K.: Dissimilar rule mining and ranking technique for associative classification. In: Proceedings of the International MultiConference of Engineers and Computer Scientists 2013 (IMECS 2013), vol. 1 (2013)
12. Mallik, S., Mukhopadhyay, A., Maulik, U.: RANWAR: rank-based weighted association rule mining from gene expression and methylation data. IEEE Trans. NanoBiosci. 14(1), 59–66 (2015)
13. Mehrotra, A., Hendley, R., Musolesi, M.: PrefMiner. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '16), pp. 1223–1234 (2016). doi: 10.1145/2971648.2971747
14. Miao, X., Gao, Y., Chen, G., Cui, H., Guo, C., Pan, W.: Si2p: a restaurant recommendation system using preference queries over incomplete information. Proc. VLDB Endow. 9(13), 1509–1512 (2016). doi: 10.14778/3007263.3007296
15. Mouhir, M., Gadi, T., Balouki, Y., El Far, M.: A new way to select the valuable association rules. In: 2015 7th International Conference on Knowledge and Smart Technology (KST), pp. 81–86 (2015). doi: 10.1109/KST.2015.7051464
16. Najeeb, M.M., El Sheikh, A., Nababteh, M.: A new rule ranking model for associative classification using a hybrid artificial intelligence technique. In: 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN), pp. 231–235. IEEE (2011)
17. Rolfsnes, T., Moonen, L., Di Alesio, S., Behjati, R., Binkley, D.: Improving change recommendation using aggregated association rules. In: Proceedings of the 13th International Workshop on Mining Software Repositories (MSR '16), pp. 73–84 (2016). doi: 10.1145/2901739.2901756
18. Shmueli, G., Bruce, P.C., Patel, N.R.: Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner. Wiley, Hoboken (2016)
19. Soulet, A., Raïssi, C., Plantevit, M., Cremilleux, B.: Mining dominant patterns in the sky. In: 2011 IEEE 11th International Conference on Data Mining, pp. 655–664 (2011). doi: 10.1109/ICDM.2011.100
20. Ugarte, W., Boizumault, P., Loudni, S., Crémilleux, B., Lepailleur, A.: Mining (soft) skypatterns using constraint programming. In: Advances in Knowledge Discovery and Management, pp. 105–136 (2015)
21. Yang, G., Mabu, S., Shimada, K., Gong, Y., Hirasawa, K.: Ranking association rules for classification based on genetic network programming. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 1917–1918. ACM (2009)
22. Zhang, J., Lin, Y., Lin, M., Liu, J.: An effective collaborative filtering algorithm based on user preference clustering. Appl. Intell. 45(2), 230–240 (2016)
23. Zhang, J., Jiang, X., Ku, W.S., Qin, X.: Efficient parallel skyline evaluation using MapReduce. IEEE Trans. Parallel Distrib. Syst. 27(7), 1996–2009 (2016)
24. Zhu, H., Chen, E., Xiong, H., Yu, K., Cao, H., Tian, J.: Mining mobile user preferences for personalized context-aware recommendation. ACM Trans. Intell. Syst. Technol. 5(4), 1–27 (2014). doi: 10.1145/253251
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.