1 Introduction

Data mining techniques are designed to discover useful knowledge from databases [1, 2]. They play a vital role in many business analytics and predictive applications, where they complement data analysis and predictive techniques. With the rapid growth of information production, large amounts of data are generated and stored in databases, from which knowledge and useful information must be discovered [3]. Data mining techniques such as association rules are applied for this purpose, and their results are presented in a form that is valid for further use.

Fig. 1 The knowledge discovery process

Association rule mining is a technique that allows the user to discover correlations between different objects in databases. The result is presented in the form of an antecedent and a consequent. For example, an association rule extracted from a transactional database is: Mouse \(\wedge \) Keyboard \(\rightarrow \) Computer

This rule indicates that a customer who buys a mouse and a keyboard together also tends to buy a computer. The support of the rule is the proportion of transactions that contain mouse, keyboard, and computer together; the confidence is the proportion of the transactions containing mouse and keyboard that also contain computer.

However, the association rule technique has several drawbacks, such as the large number of discovered rules, redundancy, the production of non-interesting rules [5, 6, 35], and the complexity of the rule extraction process. It is therefore important to propose an approach that helps users choose the interesting rules according to their specific needs. This is the aim of our proposed approach, which integrates multi-criteria analysis (MCA) within the knowledge discovery in databases (KDD) process: we evaluate the extracted rules by selecting the most relevant among the large number extracted, and we provide a multi-agent approach to automate the extraction process.

The remainder of this paper is organized as follows. Section 2 presents an overview of related work and of the concepts used for extracting association rules. Section 3 introduces the multi-criteria analysis methodology. Section 4 describes the proposed approach, and Sect. 5 presents the architecture of the multi-agent system that models it. Section 6 reports an empirical study that illustrates the performance of our approach, followed by a discussion of the results. Section 7 concludes the paper.

2 Related work

In the literature, several approaches have been designed to manage the complexity of rule extraction, but these approaches remain limited and costly in terms of interestingness, redundancy, and the huge number of extracted rules. This section presents an overview of related work on KDD, association rules, extraction algorithms, multi-agent systems, and quality measurements.

2.1 Knowledge discovery in databases

The KDD process consists of five major steps: the objective of extraction, data selection, data transformation, application of data mining techniques, and interpretation of results. These steps are shown in Fig. 1.

Objective of extraction: The first step of the KDD process is to understand the purpose of the extraction in order to choose the appropriate techniques for solving the problem.

Data selection: Significant data samples are selected to reduce the mass of available data and to facilitate the study of the main objective.

Data processing: The necessary data transformations are made between the different data sources using extraction, transformation, and loading (ETL) techniques.

Application of data mining techniques: Two types of models are distinguished: classification models, which organize the data into classes (identifiable sets), and regression models, which determine the dependencies between variables.

Interpretation of results: Finally, the resulting information is analyzed according to the specified objectives.

2.1.1 Association rules concepts

Association rules were first introduced by Agrawal [3] to analyze transactional databases. A rule is a statement of the form \(A\rightarrow B\), where A, B \(\subset \) I and A, B are non-empty sets. The set A is called the antecedent of the rule, the set B is called the consequent of the rule, and I is an itemset.

2.1.2 Quality measurements

To evaluate the rules produced by extraction algorithms, the notions of interestingness and relevance are introduced. Let \(A\rightarrow B\) be a rule, T a set of transactions, and \(t(X)\) the set of transactions of T that contain the itemset X. We define the support, confidence, lift, and conviction as follows:

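Rule support: The support of a rule is the proportion of transactions in the database that contain all the items of the rule, that is, the itemset \(A\cup B\). The support of an itemset X is written \(\text {Supp}(X)\):

$$\begin{aligned} \text {Supp}(A\rightarrow B)=\text {Supp}(A\cup B)=\frac{|t(A\cup B)|}{|T|} \end{aligned}$$
(1)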

Confidence: The confidence measures how frequently the items of B appear in the transactions that contain A. The formal definition is:

$$\begin{aligned} \text {Confidence}(A\rightarrow B)=\text {Supp}(A\cup B)/ \text {Supp}(A) \end{aligned}$$
(2)

Lift: The lift is defined as the ratio of the observed support to the support expected if A and B were independent:

$$\begin{aligned} \text {Lift}(A\rightarrow B)=\frac{\text {Supp}(A\cup B)}{\text {Supp}(A)\,\text {Supp}(B)} \end{aligned}$$
(3)

Conviction: Conviction is another measure, proposed to handle some of the weaknesses of confidence and lift. Unlike lift, it is sensitive to the rule direction:

$$\begin{aligned} \text {Conviction}(A\rightarrow B)=\frac{1-\text {Supp}(B)}{1-\text {Confidence}(A\rightarrow B)} \end{aligned}$$
(4)
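As an illustration of Eqs. (1) to (4), the following minimal Python sketch computes the four measures for the rule Mouse \(\wedge \) Keyboard \(\rightarrow \) Computer. The five transactions and the function names are hypothetical and chosen only for this example; they are not taken from the dataset studied in this paper.

```python
# Hypothetical toy transactions (illustrative only, not the paper's dataset).
TRANSACTIONS = [
    frozenset({"mouse", "keyboard", "computer"}),
    frozenset({"mouse", "keyboard", "computer"}),
    frozenset({"mouse", "keyboard"}),
    frozenset({"computer"}),
    frozenset({"mouse"}),
]


def supp(itemset, transactions):
    """Proportion of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(a, b, transactions):
    """Supp(A u B) / Supp(A); assumes A occurs in at least one transaction."""
    return supp(set(a) | set(b), transactions) / supp(a, transactions)


def lift(a, b, transactions):
    """Observed support of A u B over the support expected under independence."""
    joint = supp(set(a) | set(b), transactions)
    return joint / (supp(a, transactions) * supp(b, transactions))


def conviction(a, b, transactions):
    """(1 - Supp(B)) / (1 - Confidence); infinite when the rule always holds."""
    conf = confidence(a, b, transactions)
    if conf == 1.0:
        return float("inf")
    return (1 - supp(b, transactions)) / (1 - conf)


A, B = {"mouse", "keyboard"}, {"computer"}
print("support   ", supp(A | B, TRANSACTIONS))
print("confidence", confidence(A, B, TRANSACTIONS))
print("lift      ", lift(A, B, TRANSACTIONS))
print("conviction", conviction(A, B, TRANSACTIONS))
```

On this toy data the rule has a support of 0.4, a confidence of about 0.67, a lift of about 1.11, and a conviction of about 1.2.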

To establish an interestingness measure, Bayardo and Agrawal [28] considered that the interesting rules must reside along the initial parameters of support and confidence. Alternatively, Piatetsky-Shapiro [7] proposed a new measure called Rule-Interest, and Hilderman and Hamilton [8] proposed sixteen diversity measures. Carvalho et al. [9] evaluated eleven objective interestingness measures, Huynh et al. [10] proposed a clustering approach to identify clusters, and Gavrilov et al. [11] studied the similarity between the measures. Computation techniques for suitable objective measures were proposed by Xuan-Hiep and Fabrice [12]. Other measurements are given in Table 1.

Table 1 The set of quality measurements

2.1.3 Extraction algorithms of association rules

In the literature, several algorithms have been proposed to extract association rules. Among them, Apriori, proposed by Agrawal, is the key algorithm for extracting frequent itemsets. These algorithms can be classified into three broad categories: frequent itemset algorithms [23], maximal itemset algorithms [3], and closed itemset algorithms [24].

Mining frequent itemsets: Mining frequent itemsets is the basic technique of rule extraction. It was first proposed by Agrawal to analyze the shopping basket problem. In this category, Apriori is the key algorithm for extracting frequent itemsets. It is also the basis of most of the later algorithms for extracting association rules, among them AprioriTID, FP-Growth, Partition, DIC, Eclat, etc.

Mining closed itemsets: An itemset is closed in a data set if no proper superset has the same support count as the itemset itself. The extraction of frequent closed itemsets is based on the closure of the Galois connection [25]. Several algorithms have been proposed in this category, among them Close, Pascal, etc.

Mining maximal itemsets: An itemset is maximal frequent if it is frequent and none of its immediate supersets is frequent. Several algorithms have been designed to mine maximal itemsets, among them MaxMiner [28], Pincer Search [29], MaxEclat [25], etc.
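The three categories above can be contrasted on a small example. The sketch below is a brute-force illustration only (it enumerates every candidate itemset and is not one of the cited algorithms); the transactions, the minimum support count, and the function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data and threshold (assumptions for illustration).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]
MIN_COUNT = 2  # minimum support count


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Brute force: test every non-empty itemset over the observed items."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_count:
                result[candidate] = count
    return result


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support count."""
    return {
        s for s, c in freq.items()
        if not any(s < t and c == ct for t, ct in freq.items())
    }


def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in freq if not any(s < t for t in freq)}


FREQ = frequent_itemsets(TRANSACTIONS, MIN_COUNT)
CLOSED = closed_itemsets(FREQ)
MAXIMAL = maximal_itemsets(FREQ)
print(len(FREQ), "frequent,", len(CLOSED), "closed,", len(MAXIMAL), "maximal")
```

On this toy data, all seven non-empty itemsets are frequent, three of them are closed ({a}, {a, b}, {a, b, c}), and only {a, b, c} is maximal, which shows how the closed and maximal representations compress the set of frequent itemsets.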

3 Research methodologies

In this section, we discuss the methodologies underlying our proposed approach, starting with the choice of the multi-criteria analysis (MCA) method.

3.1 The choice of the MCA method

MCA is presented as an alternative to classical optimization methods, which rely on a single function that aggregates several criteria. The interest of MCA methods is that they can consider criteria of different natures without necessarily converting them into economic criteria or merging them into a single function. The aim is not to search for an optimum, but for a compromise solution that can take various forms: choice, assignment, or classification. In the literature [26, 27], three kinds of problems are encountered: sorting, selection, and arrangement. In our context of a large number of extracted rules, we have to assign the extracted rules to the category of the most relevant ones, which places us in the assignment problem; therefore, the suitable method is ELECTRE TRI.

3.2 ELECTRE TRI method

ELECTRE TRI is a method suited to simplifying and solving complex decision problems of the sorting type. Its principle is to assign each alternative of a set \(A=\{a_1, a_2, \ldots , a_m\}\), on which the decision is based, to one of several predefined categories. We note \(F=\{1, 2, \ldots , n\}\) the set of criteria. Each alternative of A is evaluated on each criterion by a real-valued function; we note \(G=\{g_1, g_2, \ldots , g_n\}\) the set of these functions, where \(g_j(a)\) is the evaluation of alternative a on criterion j [31]. The alternatives are not compared with each other, but with thresholds reflecting the boundaries between h predefined classes, noted \(C=\{C_1, C_2, \ldots , C_h\}\). Each alternative is compared with the borders of each category, which form the set of profiles \(B=\{b_1, b_2, \ldots , b_h\}\); an illustration is given in Fig. 2.

Fig. 2 The illustration of the sort problem

The assignment of alternatives to categories is based on the concept of outranking. An action a of the set A outranks \(b_h\), noted \(a\,S\,b_h\), if a is at least as good as \(b_h\) on a majority of the criteria and is not significantly worse than \(b_h\) on the remaining criteria. ELECTRE TRI proceeds in two consecutive steps [32]:

Step 1: formulation of outranking relation S for the comparison of a to \(b_h\).

Fig. 3 The outranking relations

Computation of the partial concordance indices \(C_j(a,b_h)\): the index expresses the extent to which a outranks \(b_h\) on criterion j, that is, the extent to which a is at least as good as \(b_h\). Here \(q_j(b_h)\) and \(p_j(b_h)\) are the indifference and preference thresholds of criterion j.

$$\begin{aligned} C_j(a,b_h)=\left\{ \begin{array}{ll} 0 & \text {if } g_j(b_h)-g_j(a)\ge p_j(b_h)\\ 1 & \text {if } g_j(b_h)-g_j(a)\le q_j(b_h)\\ \frac{p_j(b_h)+g_j(a)-g_j(b_h)}{p_j(b_h)-q_j(b_h)} & \text {otherwise} \end{array}\right. \end{aligned}$$
(5)

Computation of the global concordance index \(C(a,b_h)\):

$$\begin{aligned} C(a,b_h)=\frac{\sum _{j\in F}K_j C_j(a,b_h)}{\sum _{j\in F}K_j} \end{aligned}$$
(6)

With:

\(K_j\): the weight of criterion j,

\(C_j(a,b_h)\): the partial concordance index of criterion j
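The concordance computation can be sketched in a few lines of Python. This is a minimal illustration that assumes criteria to be maximized and uses the usual ELECTRE TRI form of the partial index (value 1 up to the indifference threshold \(q_j\), value 0 from the preference threshold \(p_j\), linear in between). The evaluations, thresholds, weights, and function names are hypothetical.

```python
def partial_concordance(g_a, g_b, q, p):
    """Partial concordance C_j(a, b_h) for one criterion to be maximized.

    g_a, g_b: evaluations of the alternative and of the profile;
    q, p: indifference and preference thresholds, with q <= p.
    """
    diff = g_b - g_a
    if diff >= p:
        return 0.0
    if diff <= q:
        return 1.0
    return (p + g_a - g_b) / (p - q)


def global_concordance(a, b, q, p, weights):
    """Weighted average of the partial concordance indices."""
    total = sum(weights)
    weighted = sum(
        k * partial_concordance(ga, gb, qj, pj)
        for ga, gb, qj, pj, k in zip(a, b, q, p, weights)
    )
    return weighted / total


# Hypothetical rule and profile evaluated on (support, confidence, lift).
rule = [0.40, 0.80, 1.10]
profile = [0.35, 0.85, 1.50]
q_thresholds = [0.02, 0.02, 0.10]
p_thresholds = [0.05, 0.10, 0.30]
weights = [1.0, 1.0, 1.0]

partials = [
    partial_concordance(ga, gb, qj, pj)
    for ga, gb, qj, pj in zip(rule, profile, q_thresholds, p_thresholds)
]
overall = global_concordance(rule, profile, q_thresholds, p_thresholds, weights)
print(partials, overall)
```

With these values the rule is fully concordant on support, partially concordant on confidence (about 0.625), and not concordant on lift, giving a global concordance of about 0.54.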

Computation of the partial discordance indices \(d_j(a,b_h)\): the index expresses the extent to which criterion j is opposed to the statement that a outranks \(b_h\), where \(v_j(b_h)\) is the veto threshold of criterion j.

$$\begin{aligned} d_j(a,b_h)=\left\{ \begin{array}{ll} 0 & \text {if } g_j(b_h)-g_j(a)\le p_j(b_h)\\ 1 & \text {if } g_j(b_h)-g_j(a)> v_j(b_h)\\ \in [0,1] & \text {otherwise} \end{array}\right. \end{aligned}$$
(7)
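A possible implementation of the partial discordance index is sketched below for criteria to be maximized, in the same orientation as the concordance index of Eq. (5). Equation (7) only states that the intermediate case lies in [0, 1]; the linear interpolation between the preference threshold \(p_j\) and the veto threshold \(v_j\) used here is a common choice and is an assumption of this sketch, as are the numeric values and the function name.

```python
def partial_discordance(g_a, g_b, p, v):
    """Partial discordance d_j(a, b_h) for one criterion to be maximized.

    g_a, g_b: evaluations of the alternative and of the profile;
    p, v: preference and veto thresholds, with p <= v.
    The intermediate case uses linear interpolation (an assumption).
    """
    diff = g_b - g_a
    if diff <= p:
        return 0.0
    if diff > v:
        return 1.0
    return (diff - p) / (v - p)


# Hypothetical values on the lift criterion: p = 0.3, v = 0.5.
no_opposition = partial_discordance(1.40, 1.50, 0.3, 0.5)   # gap 0.1 <= p
partial_opposition = partial_discordance(1.10, 1.50, 0.3, 0.5)  # p < gap 0.4 <= v
veto = partial_discordance(0.90, 1.50, 0.3, 0.5)            # gap 0.6 > v
print(no_opposition, partial_opposition, veto)
```

The three calls illustrate the three branches: no opposition, a partial opposition of about 0.5, and a veto.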

The credibility index \(\sigma (a,b_h)\) is computed from the global concordance index and the partial discordance indices (see Fig. 3):

$$\begin{aligned} \sigma (a,b_h)=C(a,b_h)\prod _{j\in \bar{F}} \frac{1-d_j(a,b_h)}{1-C(a,b_h)} \end{aligned}$$
(8)

With:

\(\bar{F} =\{j\in F:d_j(a,b_h)>C(a,b_h)\}\),

\(C(a,b_h)\): Global concordance index

\(d_j(a,b_h)\): Discordance indices
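The credibility computation can be illustrated as follows. The sketch uses the usual ELECTRE form, in which the global concordance is weakened by every criterion whose discordance exceeds it; the function name and the numeric inputs are hypothetical.

```python
def credibility(global_c, discordances):
    """Credibility of 'a outranks b_h'.

    Starts from the global concordance and multiplies by
    (1 - d_j) / (1 - C) for each criterion whose discordance d_j exceeds C.
    No division by zero occurs: d_j <= 1, so no d_j can exceed C when C == 1.
    """
    sigma = global_c
    for d in discordances:
        if d > global_c:
            sigma *= (1 - d) / (1 - global_c)
    return sigma


# Hypothetical inputs: only the discordance 0.8 exceeds the concordance 0.6.
weakened = credibility(0.6, [0.0, 0.8, 0.5])
# A veto (discordance equal to 1) cancels the outranking entirely.
vetoed = credibility(0.6, [0.0, 1.0, 0.0])
# Without any discordance above C, the credibility equals the concordance.
unchanged = credibility(0.6, [0.0, 0.2, 0.5])
print(weakened, vetoed, unchanged)
```

In the first case the credibility drops from 0.6 to about 0.3; in the second it falls to 0; in the third it stays at 0.6.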

Step 2: assignment procedures.

Two assignment procedures, pessimistic and optimistic, are available to assign a set of actions to different categories.

The pessimistic assignment compares the alternative a successively with \(b_i\) for \(i=h,h-1,\ldots ,0\); with \(b_k\) the first profile such that a outranks \(b_k\) (\(a\,S\,b_k\)), it assigns a to category \(C_{k+1}\) (\(a\rightarrow C_{k+1}\)).

The optimistic assignment compares the alternative a successively with \(b_i\) for \(i=1,\ldots ,h\); with \(b_k\) the first profile such that \(b_k\,P\,a\) or \(b_k\,Q\,a\) is satisfied (P and Q denote strong and weak preference, respectively), it assigns a to category \(C_k\) (\(a\rightarrow C_k\)).
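Both procedures can be sketched from precomputed credibility indices. The sketch below makes several assumptions that are not fixed by the text: the profiles are ordered from the lowest to the highest, the categories are numbered from 1 (lowest) upward, a outranks a profile when the credibility reaches a cutting level \(\lambda \), and a profile is preferred to a when it outranks a without being outranked by a. All numeric values and names are hypothetical.

```python
def pessimistic(sigma_a_b, lam):
    """Pessimistic assignment.

    sigma_a_b[i] is the credibility of 'a outranks b_(i+1)'.
    Profiles are scanned from the highest to the lowest; the first
    outranked profile b_k sends a to category k + 1.
    """
    for i in range(len(sigma_a_b) - 1, -1, -1):
        if sigma_a_b[i] >= lam:
            return i + 2
    return 1  # no profile is outranked: lowest category


def optimistic(sigma_a_b, sigma_b_a, lam):
    """Optimistic assignment.

    sigma_b_a[i] is the credibility of 'b_(i+1) outranks a'.
    Profiles are scanned from the lowest to the highest; the first
    profile b_k preferred to a sends a to category k.
    """
    for i in range(len(sigma_a_b)):
        if sigma_b_a[i] >= lam and sigma_a_b[i] < lam:
            return i + 1
    return len(sigma_a_b) + 1  # no profile is preferred: highest category


LAMBDA = 0.75  # hypothetical cutting level

# Case 1: a outranks b_1, and b_2 is preferred to a.
case1 = (pessimistic([0.9, 0.4], LAMBDA), optimistic([0.9, 0.4], [0.3, 0.8], LAMBDA))
# Case 2: a and b_2 are incomparable, so the two procedures disagree.
case2 = (pessimistic([0.9, 0.5], LAMBDA), optimistic([0.9, 0.5], [0.2, 0.5], LAMBDA))
print(case1, case2)
```

In the first case both procedures place the alternative in category 2. In the second case the alternative is incomparable with \(b_2\): the pessimistic procedure keeps it in category 2, while the optimistic procedure moves it up to category 3.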

3.3 Multi-agent system

A multi-agent system (MAS) is a collection of autonomous agents, which interact with each other or with their environments to achieve one or more objectives.

Data mining and MAS have been used together to build complex systems [34]; combined, they produce automatic data mining systems. In this work, we use a MAS to model several autonomous intelligent agents: the Mining Rules Agent (MR-Agent), the Quality Measurement Agent (QM-Agent), the Decision Support Agent (DS-Agent), the Principal Agent (MCA-Principal Agent), the Control Agent, and the User Interface Agent. Each agent has a specific task, and the combination of all the tasks achieves the main objective of the proposed approach.

4 The Proposed approach

The proposed approach is divided into four modules. The first is the association rule extraction module, which extracts rules from the dataset using one of the efficient extraction algorithms. The second is the decision support module, which chooses the appropriate method according to the main objective and the user specifications. The third is the quality measurement module for association rules [6, 33]. The last one is the main module, which combines the association rules, the quality measurements, and the decision support to evaluate the set of extracted rules. The details of the multi-agent approach are given in Fig. 4.

Fig. 4 The proposed multi-agent modeling

4.1 Module 1: mining association rules

The extraction of association rules proceeds in two steps: the first mines all frequent itemsets, and the second generates the association rules from the frequent itemsets. To test the performance and relevance of our proposed approach, we used the Apriori algorithm on a set of attributes to generate a large number of rules.
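The two steps can be sketched with a compact, unoptimized version of Apriori. This is an illustrative sketch rather than the implementation used in our experiments; the toy transactions, the thresholds, and the function names are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: level-wise mining of all frequent itemsets."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = supp(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and supp(c) >= min_support
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Step 2: derive rules A -> B from each frequent itemset."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = support / frequent[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, support, conf))
    return rules


# Hypothetical toy transactions and thresholds.
TRANSACTIONS = [
    frozenset({"mouse", "keyboard", "computer"}),
    frozenset({"mouse", "keyboard", "computer"}),
    frozenset({"mouse", "keyboard"}),
    frozenset({"computer"}),
    frozenset({"mouse"}),
]
FREQUENT = apriori(TRANSACTIONS, min_support=0.4)
RULES = generate_rules(FREQUENT, min_confidence=0.6)
for a, b, s, c in RULES:
    print(sorted(a), "->", sorted(b), round(s, 2), round(c, 2))
```

The prune step relies on the fact that every subset of a frequent itemset is itself frequent, which is also why the antecedent support is always available when the rules are generated.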

4.2 Module 2: decision support

The decision support is a process that uses a set of information available at a given time to formulate the problem and reach the decision on a specific object. In this module, we study the context and the main objective of the extraction problem to choose an appropriate method to be used in the next step.

4.3 Module 3: quality measurement

The Apriori algorithm and its derivatives provide an elegant solution to the rule extraction problem, but they produce a large number of rules, retaining some rules of no interest while missing interesting ones. Other measures are therefore needed to complement support and confidence. Measures of interest can play a key role in filtering the automatically extracted rules according to criteria adapted to the user's needs. Given the active research on rule interestingness, many measurements can be found in the literature [6] (see Table 1 in the related work for a list of some quality measures).

4.4 Module 4: MCA module

The main module evaluates and selects the extracted rules using the chosen multi-criteria analysis method, with the extracted rules as alternatives and the quality measurements as criteria. This process yields a set of rules chosen according to the decision makers' preferences. If the result is satisfactory, we obtain the final set of relevant rules; if not, we change the thresholds and parameters until appropriate results are obtained.

5 Multi-agent-based modeling

Multi-agent systems are today a new technology for the design and control of complex systems. Such a system is composed of independent software and hardware entities called agents, and it usually has several important features such as parallelism, robustness, and scalability [34].

Given the active research in data mining, the KDD process is becoming more and more complex. To address this problem, our proposed approach relies on several agents that work in cooperation to extract the relevant rules. It is composed of six agents: the MR-Agent, the QM-Agent, the DS-Agent, the MCA-Principal Agent, the user interface agent, and the control agent. The details are given in Fig. 5.

Fig. 5 The architecture of the proposed multi-agent system

The association rule extraction task is performed by negotiation [34] between the control agent, the MR-Agent, and the user interface agent, which work in collaboration to achieve the required goals. The control agent receives the set of quality measurements, the selected method, and the set of rules, then sends them to the principal agent, which performs the evaluation of the association rules. All these agents interact repeatedly to share information and to perform the tasks of the rule extraction process.

5.1 User interface agent

This agent is responsible for the interaction with users or domain experts. It receives from its environment a set of specifications, such as the number of attributes, the minimum support threshold, and other details of the processed dataset, and sends them to the control agent for further use by the other agents.

5.2 Control agent

The main function of this agent is to manage the communication between all the components of the system and to control the data transmission among them. It transfers the user needs from the user interface agent to the MR-Agent, the QM-Agent, and the DS-Agent, and it collects their results to save them in the knowledge base. Once the results are saved in the knowledge base, the control agent transfers them to the principal agent, which begins the evaluation of the extracted rules.

5.3 MR-agent: mining association rules

The MR-Agent receives a dataset and, after receiving the minimum support from the control agent, chooses the appropriate algorithm to extract the association rules. It then sends the result to the knowledge base to be used by the principal agent.

5.4 DS-agent: decision support agent

In the literature, we encounter different methods of multi-criteria analysis. The DS-Agent is responsible for choosing the most suitable method according to the studied problem and the user needs, and for sending it to the knowledge base to be used by the principal agent in the evaluation step.

5.5 QM-agent: the quality measurement agent

The objective of this agent is to select, from the measures presented in the literature (see Table 1 in the related work), those that are appropriate to the user needs, and to save them in the knowledge base for further use.

5.6 MCA-principal agent: multi-criteria analysis agent

The principal agent is the main agent: it facilitates the choice of the relevant rules among those extracted by the MR-Agent, using the measures provided by the QM-Agent. It keeps track of the names and capacities of all the agents registered in the system. Once the principal agent has received the set of extracted rules from the control agent and the set of quality measurements from the QM-Agent, it performs the extraction of the relevant rules. The sequence of operations between the different agents of the system is given in Fig. 6.

Fig. 6 The sequence diagram of the proposed system

This diagram shows the sequence of operations during the execution of the proposed multi-agent system. Moreover, it shows how the distribution of the mining tasks facilitates the extraction of the relevant rules.
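The sequence of operations can be pictured with a small Python sketch. It is only a schematic stand-in for the multi-agent system: plain synchronous method calls replace agent messaging, a dictionary replaces the knowledge base, and every class name, method name, and placeholder return value is an assumption made for this illustration.

```python
class UserInterfaceAgent:
    def collect_specifications(self):
        # Placeholder specifications; a real agent would query the user.
        return {"min_support": 0.33, "criteria": ["support", "confidence", "lift"]}


class MRAgent:
    def mine(self, spec):
        # Placeholder: a real agent would run an extraction algorithm.
        return ["RL1", "RL2", "RL3"]


class QMAgent:
    def select_measures(self, spec):
        return list(spec["criteria"])


class DSAgent:
    def choose_method(self, spec):
        return "ELECTRE TRI"


class PrincipalAgent:
    def evaluate(self, knowledge_base):
        # Placeholder: a real agent would apply the chosen MCA method.
        return {
            "method": knowledge_base["method"],
            "criteria": knowledge_base["measures"],
            "candidates": knowledge_base["rules"],
        }


class ControlAgent:
    def __init__(self):
        self.knowledge_base = {}  # stands in for the knowledge base
        self.log = []             # order in which the agents were contacted

    def run(self, ui, mr, qm, ds, principal):
        spec = ui.collect_specifications()
        self.log.append("user-interface")
        steps = (
            ("rules", mr.mine),
            ("measures", qm.select_measures),
            ("method", ds.choose_method),
        )
        for key, call in steps:
            self.knowledge_base[key] = call(spec)
            self.log.append(key)
        self.log.append("principal")
        return principal.evaluate(self.knowledge_base)


control = ControlAgent()
result = control.run(
    UserInterfaceAgent(), MRAgent(), QMAgent(), DSAgent(), PrincipalAgent()
)
print(control.log)
print(result)
```

Routing every exchange through the control agent keeps the other agents independent of each other, so that an agent can be replaced, for example to use another extraction algorithm, without changing the rest of the system.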

6 Experimental results

Our research is based on a real dataset. After preprocessing and transforming the data, we selected a set of records to facilitate the extraction process, and we focused this study on extracting association rules with the Apriori algorithm. The interesting and useful association rules are then selected using a multi-criteria analysis approach.

In this section, we apply the proposed approach to a real dataset commonly used in the field of KDD [36], which describes the characteristics of a set of customers who filed a credit application, as a case study to illustrate the performance of our approach. According to the decision makers' preferences, we used the thresholds minimum support = 0.33, confidence = 0.75, maximum rule length = 3, and lift = 1 for the extraction; we obtained 27 extracted rules, given in Table 2.

This table presents the twenty-seven extracted rules. Some of them are redundant, such as Rule 20, Rule 22, and Rule 24, and others are not interesting for the user.

As noted in Sect. 4.3, the Apriori algorithm and its derivatives produce a large number of rules, retaining some rules of no interest while missing interesting ones. Other techniques are therefore needed to complement the support and confidence measures. Measures of interest can play a key role in filtering the automatically extracted rules according to criteria adapted to the decision makers' preferences.

The usefulness and relevance of the association rules produced by extraction algorithms are a critical problem. In most cases, real datasets lead to a very large number of association rules, which prevents users from selecting the most relevant ones themselves. In this context, we believe that integrating the MCA approach within the rule extraction process, combined with a multi-agent system, is particularly useful for decision makers who face the large number of rules and the complexity of the extraction process. The next step is therefore the application of our proposed approach to automate the extraction of relevant rules. We used the set of rules previously extracted by the MR-Agent (Table 2) as the actions to be evaluated according to the chosen criteria (support, confidence, lift) given by the QM-Agent, in order to compare the rules using the ELECTRE TRI method.

A decision matrix is used to describe a multi-criteria decision analysis (MCDA) problem. An MCDA problem in which M alternatives (rules) must each be evaluated on N criteria can be described by a decision matrix with N rows and M columns. For our case study, the decision matrix is given in Table 3.

Table 2 Association rules
Table 3 Decision matrix

The next step is to define, according to the decision makers' preferences, a set of profiles with which the extracted association rules can be compared. The thresholds are given in Table 4.

Table 4 Initial profiles and weight defining the category limits

As the assignment is made into three distinct categories, two profiles \(b_1\) and \(b_2\) are defined, where \(b_1\) is the border between the categories Interested and Medium, and \(b_2\) is the border between the categories Medium and Not interested.

The importance of each criterion in the decision making is reflected in predefined thresholds, given in Table 5. These thresholds are defined by the decision makers according to their specific needs.

Table 5 Initial values of indifference, preference and veto thresholds

After the definition of the thresholds, the next step is the computation of the concordance indices, Eqs. (5) and (6), and of the discordance indices \(d_j(a,b_h)\), Eq. (7). The outranking relation S is then formulated by computing the degree of credibility, Eq. (8).

The credibility index is computed from the global concordance index and the partial discordance indices. It gives the degree of credibility of the outranking relation. The \(\lambda \)-cut index, set to its default value, is the parameter that determines the preference situation between an alternative a and a profile \(b_h\). Applying the method to the case study yields a preference relation between the rules and the profiles. The assignment of the association rules by the two procedures, pessimistic and optimistic, is given in Table 6, where RL abbreviates Rule.

Table 6 Assignment procedures

The ELECTRE TRI method provides a solution in the form of an assignment to categories ordered by importance. According to the two assignment procedures, the relevant rules are RL9, RL1, RL14, RL15, RL16, and RL18, and the remainder belong to the second category. For our case study of 27 rules, the most relevant rules are given in Table 7.

According to the extracted relevant rules, the customers who benefited from credit are tenants with a middle grade.

Table 7 The most relevant rules

6.1 Discussion

In this study, applying the multi-criteria analysis approach, specifically the ELECTRE TRI method, to the set of rules previously extracted by the Apriori algorithm or its derivatives yielded six relevant and significant rules after the redundant and non-interesting rules were eliminated. The remaining rules belong to the other categories, which are of less interest than the first category.

As shown in the experimental results, applying the Apriori algorithm or its derivatives to the dataset produced 27 extracted rules, whereas applying the multi-criteria analysis approach to this set of extracted rules retained only six significant and interesting rules. The results obtained are, however, always sensitive to the values of the thresholds \(p_j\), \(q_j\), \(v_j\), and to the decision makers' preferences.

In previous studies [46], the authors applied rule extraction algorithms and introduced some quality measurements. Their results were good, but the large number of extracted rules remained an obstacle for users who need to select the relevant rules according to their specific needs. In contrast, our proposed MCA approach allows the decision makers to select the interesting rules according to their prespecified needs, and it resolves this complex situation by retaining only the significant and useful rules. By combining the multi-criteria analysis approach with a multi-agent system, we reduce the set of extracted rules, eliminate the redundant rules, and, most importantly, automate the rule extraction process.

Finally, our proposed approach has several advantages:

Extract the relevant and useful association rules.

Automate the rule extraction process.

Solve the complex situation of ranking problems.

Help users to choose their own specified rules.

Reduce the large number and redundancy of extracted rules.

7 Conclusion

In this work, we have discussed the problem of the usefulness and relevance of the rules produced by a KDD process: a large number of rules are extracted, and most of them are noisy, redundant, or not interesting. Several methods have been proposed to address this problem; however, they still produce a large quantity of extracted rules. To solve this problem, we proposed an approach that uses a set of extracted rules and quality measurements within a multi-criteria analysis process to recommend the relevant rules. In addition, we integrated a multi-agent system to model and manage our approach, with six agents working in cooperation to handle the complexity of the KDD process.

Moreover, we studied twenty-seven rules according to the chosen criteria to select the relevant class of association rules, and we obtained six relevant rules as the final result. The use of ELECTRE TRI also showed that the decision makers' preferences have a direct influence on the selection of rules.

As future work, a new methodology needs to be developed that combines this approach with other optimization methods and applies it to big data, especially in the field of road safety.