Coalitions’ Weights in a Dispersed System with Pawlak Conflict Model

The article addresses issues related to decision making by an ensemble of classifiers. The classifiers are built based on local tables; the set of local tables is called dispersed knowledge. The paper discusses a novel application of Pawlak's conflict analysis model to examine the relations between classifiers and to create coalitions of classifiers. Each coalition has access to some aggregated knowledge on the basis of which joint decisions are made. Various types of coalitions are formed: strong coalitions consisting of a large number of significant classifiers, and weak coalitions consisting of insignificant classifiers. The new contribution of the paper is a systematic investigation of the weights of coalitions that influence the final decision. Four different methods of calculating the strength of the coalitions have been applied. Each of these methods considers a different aspect of the structure of the coalitions. Generally, it has been experimentally confirmed that, for a method that correctly identifies the relations between base classifiers, the use of coalitions' weights improves the quality of classification. More specifically, it has been statistically confirmed that the best results are generated by the weighting method that is based on the size of the coalitions and the method based on the unambiguity of the decisions.


Introduction
An important problem in today's world is the dispersion of knowledge. Many units, dealing with the same subject and operating in the same field, gather the knowledge to which they have access. This knowledge can be the result of various factors: experience, history, analyzed cases, sensors. A very popular form of storing knowledge is a decision table. However, if the knowledge contained in local decision tables is the result of different stimuli or analyses, the form of the tables can be very different, both in terms of the sets of conditional attributes and the sets of objects in the universes. It is not possible to aggregate such knowledge directly. In this situation, a more sophisticated approach should be used.
In this paper a novel application of Pawlak's analysis model to a dispersed system is considered. This approach was proposed for the first time in Przybyła-Kasperek (2017); in this article it is further developed and analyzed.
For the simultaneous use of dispersed knowledge, a dispersed system with a two-stage aggregation structure was proposed. In the first step, local tables based on which classifiers make similar decisions for the test object are aggregated. In the second step, probability vectors generated based on the aggregated tables are merged. In the paper (Przybyła-Kasperek 2017), Pawlak's conflict model was used to identify relations between base classifiers that are constructed based on local tables. In this way coalitions are generated, for which the aggregated tables are defined.
Pawlak's model was not originally designed for ensembles of classifiers. Therefore, some modifications were proposed in the paper (Przybyła-Kasperek 2017) in order to apply the model to the problem of classification based on dispersed knowledge. Three approaches to adapting the model to a multiple classifier system have been proposed. In the paper (Przybyła-Kasperek 2017), the structure of clusters created using these methods was analyzed. It turned out that the quality of the generated coalitions did not translate into the quality of the classification. The reason for this lies in the final decision-making method, in which each coalition (both strong and weak) has the same influence on the decision.
The novelty that is proposed in this paper is to apply weights of the coalitions in order to diversify the coalitions' influence on the final decision. Four different methods of calculating the weights are analyzed. The coalitions' strength is calculated according to various factors. The first method takes into account the number of classifiers belonging to the coalition. The second method takes the decisiveness of the classifiers into account. The third method is a combination of the above two. The fourth method calculates the strength of the coalition depending on the variability of the vectors that are generated based on the aggregated tables.
In this paper, these four methods have been applied to three approaches of using Pawlak's model in an ensemble of classifiers. It is shown by example that the use of different approaches to determining the weights of coalitions generates completely different results. In this paper, in-depth and systematic experiments are presented that were conducted on fifteen different dispersed sets (three different data sets, each dispersed in five different versions). Three different approaches to using Pawlak's analysis model and four different methods for determining weights were compared. Statistical tests were performed to confirm that the use of coalitions' weights improved the quality of classification. It was found that the best results are generated by the weighting method that is based on the size of the coalitions and the method that is based on the unambiguity of the decisions.
To summarize, the main contributions of this paper are as follows:
• a proposition of four methods for determining the weights of coalitions in a dispersed system using Pawlak's analysis model,
• an examination of these methods for three approaches of applying Pawlak's analysis method in a dispersed system,
• a comparison and statistical analysis of the obtained results.
In this paper, the problem of an ensemble of classifiers is considered. However, the fundamental difference between the approaches known from the literature (Bloch 1996; Shoemaker et al. 2008; Stefanowski 2005; Tang et al. 2006) and the one considered in the paper lies in the knowledge representation to which the classifiers have access. In the paper it is assumed that knowledge is pre-specified and accumulated separately by independent units. Therefore, we cannot expect that the sets of objects or the sets of attributes fulfill some relations (inclusion, equality, or disjointness). Therefore, the approach considered in the paper is more general. In Panov and Džeroski (2007), classifier ensembles with different sets of objects and different sets of attributes are considered. This method is a combination of Bagging and the Random Subspace Method. However, the knowledge that is provided in this model is stored in a single database, and it is not possible to use several local decision tables that were collected separately. In Polikar et al. (2006), a system with incremental learning capability was proposed. This method allows new information to be learned when a new data set is available. The data sets have different sets of attributes. The approach consists in creating several classifiers based on each new set of data. The system does not analyze the relationship between the available data sets and does not check the consistency of the knowledge contained in them.
In the paper an approach is used in which the recognition of relations and the creation of coalitions are very important. In the literature, different approaches to this issue can be found (Lopes et al. 2008; Fatima et al. 2005; Kersten and Lai 2007; Rahwan et al. 2004). In this article Pawlak's model (Pawlak 1984) is used. The model allows advanced analysis of relations between agents and is a simple way to illustrate the basic properties of conflicts. Pawlak's model has been studied by many authors (Deja 2002; Ramanna et al. 2006, 2007; Skowron and Deja 2002; Skowron et al. 2006). In Lang et al. (2017) a probabilistic model of conflict analysis was proposed, which is a combination of the Pawlak model and the three-way decisions approach. Instead of using one threshold value, as is done in this paper, two thresholds were used in order to recognize relations in a conflict situation. In Yao (2019) three levels of conflict (strong conflict, weak conflict, and non-conflict) were considered. The author noticed some inconsistencies in the Pawlak model and removed them. In this study a completely different application of Pawlak's model is considered.
The article is organized as follows. In the second section the three methods of using Pawlak's model for ensemble of classifiers are discussed. The third section describes the general way of operating a dispersed system. In the fourth section, methods of calculating weights are presented. The fifth section contains the example of applying the coalitions' weights in a dispersed system. The sixth section shows the experimental protocol. In the seventh section the results of experiments carried out using some data sets from the UCI repository are presented. The article concludes with a short summary.

Application of Pawlak's Model for an Ensemble of Classifiers
Pawlak's original model was proposed (Pawlak 1984, 2005, 2006) to analyze conflict situations in which the agents (parties participating in the conflict) decide to resolve the dispute peacefully. In the model, each agent expresses his views on some issues, from the set A, by assigning one of three values. For each issue, each agent assigns a value: 1 means the agent is favoring the issue, 0 means the agent is neutral, and −1 means the agent is opposed to the issue. Such information about the conflict situation is stored in the form of an information system S = (U, A), where the universe U is the set of agents, A is the set of issues, and the set of values of each a ∈ A is V_a = {−1, 0, 1}. The opinion of agent ag about issue a is the value a(ag).
In the Pawlak's model, based on such information system, relations between agents are defined and coalitions of agents in friendship (clique) are created.
In the paper (Przybyła-Kasperek 2017), this model was applied to the problem of an ensemble of classifiers. It is assumed that each classifier is interpreted as an agent in Pawlak's model. The classifier ag generates a vector of ranks r_ag(x) = [r_ag,1(x), …, r_ag,c(x)] that reflects the classification of the test object x (c is the number of decision classes). This is implemented as follows. At first, a vector of values ā_ag(x) = [ā_ag,1(x), …, ā_ag,c(x)] is generated based on each local decision table using the modified m_1-Nearest Neighbor algorithm. That is, from each local decision table and each decision class, the m_1 objects that are most similar to the test object are selected. The vector's coordinate for a given decision value is equal to the average similarity of the relevant objects from that decision class. Based on this vector, a vector of ranks r_ag(x) = [r_ag,1(x), …, r_ag,c(x)] for classifier ag is generated. For example, if we have five decision classes and the vector ā_ag(x) = [0.8, 0.4, 0.6, 0.4, 1], then the vector of ranks is equal to r_ag(x) = [2, 4, 3, 4, 1]. This means that the decision with the highest value in the vector ā_ag(x) has rank 1, the second decision in the order has rank 2, and so on; equal values share the same rank.
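The conversion of a value vector into a rank vector can be sketched as follows (a minimal Python illustration of the rule described above; the function name is ours, not the paper's). Ties receive the same rank, consistent with the example [0.8, 0.4, 0.6, 0.4, 1] → [2, 4, 3, 4, 1]:

```python
def rank_vector(values):
    """Convert a vector of average similarities into a vector of ranks.

    The largest value gets rank 1; equal values share the same rank
    (competition ranking), as in [0.8, 0.4, 0.6, 0.4, 1] -> [2, 4, 3, 4, 1].
    """
    return [1 + sum(1 for w in values if w > v) for v in values]
```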
A set of issues A that is considered in the Pawlak's conflict model is a set of decision attribute values. Based on the vectors of ranks, the views of the classifiers on the set of issues A are defined. Two approaches have been proposed to generate an information system.
In both methods, the information system S = (U, A), where U is a set of classifiers and A is a set of decision attribute values, is defined. In the first method the vectors of ranks are converted into the views of the classifiers in the following way:

(1) a(ag) = 1 if r_ag,a(x) = 1, and a(ag) = −1 otherwise.

In the second method it is realized in the following way:

(2) a(ag) = 1 if r_ag,a(x) = 1, a(ag) = 0 if r_ag,a(x) = 2, and a(ag) = −1 otherwise,

where r_ag,a(x) denotes the rank assigned by the classifier ag to the decision value a.
As can be seen, in the first method the classifier is favoring a decision only if it has rank 1; neutrality is not used here. In the second method the classifier is favorable to the decisions with rank 1 and it is neutral to the decisions with rank 2. It may be concluded that the first method of defining an information system is more restrictive.
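The two conversions of rank vectors into classifiers' views can be sketched as below (illustrative Python; the function names are ours):

```python
def views_method1(ranks):
    # First method: favoring (1) only decisions with rank 1,
    # opposed (-1) to every other decision; no neutrality.
    return [1 if r == 1 else -1 for r in ranks]

def views_method2(ranks):
    # Second method: favoring (1) rank-1 decisions, neutral (0)
    # towards rank-2 decisions, opposed (-1) to the rest.
    return [1 if r == 1 else 0 if r == 2 else -1 for r in ranks]
```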
In Pawlak's model, the relations between classifiers are defined based on the information system in one of the two following ways. A function of distance between agents can be used. The function ρ*_B : U × U → [0, 1] for the set of issues B ⊆ A is defined as follows:

(3) ρ*_B(ag_1, ag_2) = ( Σ_{a∈B} |a(ag_1) − a(ag_2)| / 2 ) / card{B}.

Or a conflict function can be used. The function ρ_B : U × U → [0, 1] for the set of issues B ⊆ A is defined as follows:

(4) ρ_B(ag_1, ag_2) = card{δ_B(ag_1, ag_2)} / card{B}, where δ_B(ag_1, ag_2) = {a ∈ B : a(ag_1) ≠ a(ag_2)}.

The main difference between these functions refers to neutral classifiers. We will explain this by example. Let there be given an issue a ∈ A and three classifiers ag_1, ag_2, ag_3 ∈ U. We assume that a(ag_1) = 1, a(ag_2) = 0, a(ag_3) = −1. Then ρ*_{a}(ag_1, ag_2) = 0.5 and ρ*_{a}(ag_1, ag_3) = 1. Thus, the distance between classifiers of which one is neutral is less than the distance between classifiers who are in conflict. For the conflict function, we have ρ_{a}(ag_1, ag_2) = 1 and ρ_{a}(ag_1, ag_3) = 1. Thus, the distance between a neutral classifier and a classifier in conflict is the same. It can be summarized that the conflict function is more restrictive.

In the first method of defining the information system, there is no 0 value, so there are no neutral classifiers. This means that both functions assign the same values for each pair of classifiers. In the second method of defining the information system, neutral classifiers occur, so the functions can have different values. As was explained above, the distance function is less restrictive for neutral agents (it assigns a smaller value for neutral agents) than the conflict function. Based on the value of one of the functions, we define a coalition as a subset of classifiers such that for each pair of classifiers the value of the function is less than 0.5. Finally, three different approaches to generating coalitions were considered:
• Approach 1: the first method of defining an information system is used,
• Approach 2: the second method of defining an information system and the function of distance between agents are used,
• Approach 3: the second method of defining an information system and the conflict function are used.
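A sketch of both functions and of the enumeration of coalitions (maximal groups in which every pair scores below 0.5) may look as follows. The helper names are ours, and the brute-force search over subsets is only practical for the handful of classifiers considered in this paper:

```python
from itertools import combinations

def distance(v1, v2):
    # Distance function (Formula 3): each issue contributes |a(ag1) - a(ag2)| / 2,
    # so a neutral/engaged pair costs 0.5 and an opposed pair costs 1.
    return sum(abs(a - b) / 2 for a, b in zip(v1, v2)) / len(v1)

def conflict(v1, v2):
    # Conflict function (Formula 4): fraction of issues on which the views differ.
    return sum(1 for a, b in zip(v1, v2) if a != b) / len(v1)

def coalitions(views, fn):
    """Maximal subsets of classifiers in which every pair has fn(...) < 0.5.

    `views` maps a classifier name to its view vector; `fn` is one of the
    two functions above.
    """
    agents = list(views)
    def allied_group(group):
        return all(fn(views[a], views[b]) < 0.5
                   for a, b in combinations(group, 2))
    cliques = [set(g) for r in range(1, len(agents) + 1)
               for g in combinations(agents, r) if allied_group(g)]
    # keep only maximal groups
    return [c for c in cliques if not any(c < d for d in cliques)]
```

Whether an isolated classifier counts as a one-element coalition is a simplification made here for the sake of the sketch.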
In the next section, issues related to the use of the created coalitions and the organization of the dispersed system, are described.

Scheme of a Dispersed System's Operation
Methods of creating coalitions that were described in the previous section can be applied to any ensemble of classifiers. In order to perform the experiments, some technical specifications have been adopted in this paper; these are described below. The way of generating vectors of ranks and the method of aggregating local decision tables have been taken from the previous work of the author (Przybyła-Kasperek 2017; Przybyła-Kasperek and Wakulicz-Deja 2016). The general rules of operation of a dispersed system can be described in several steps. In the first step, the coalitions of classifiers are determined using one of the three approaches discussed in the previous section.
In the second step, an aggregated decision table is generated on the basis of the local decision tables from one coalition. The method of approximation aggregation is described in the paper (Przybyła-Kasperek and Wakulicz-Deja 2016). The method is implemented as follows. From each of the local tables and from each decision class, the m_2 objects that are most similar to the test object are selected. Then the objects are merged under certain conditions. If objects come from the same decision class and have consistent values on common attributes, then they are written as one object in the aggregated table.
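The merging condition in this step can be illustrated as follows (a simplified sketch; objects are represented as attribute→value dicts, and the selection of the m_2 nearest objects per class is assumed to have already happened):

```python
def merge_objects(obj1, obj2):
    """Merge two selected objects from the same decision class.

    Returns the combined object when the values agree on all common
    attributes, and None when the objects are inconsistent; attributes
    missing from every contributing table would be written as '?' in
    the aggregated table.
    """
    common = obj1.keys() & obj2.keys()
    if any(obj1[a] != obj2[a] for a in common):
        return None  # inconsistent on a common attribute: keep them separate
    return {**obj1, **obj2}
```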
In the third step, a vector of values is determined on the basis of each aggregated table. The vector's coordinate for a given decision value is equal to the maximum similarity to the test object among the objects of that decision class in the aggregated table.
In the last step, some linear transformations are made on these vectors of values and a global decision is generated. However, this will be discussed in the next section, as the novelty of this paper, the coalitions' weights, is applied at this stage.


Weights for Coalitions
The methods of creating coalitions of classifiers and identifying relations between classifiers that were described in Sect. 2 have already been analyzed in the paper (Przybyła-Kasperek 2017). In that article the structures of the created coalitions were compared in detail. It was found that Approaches 1 and 2 tend to create a smaller number of large coalitions. For data sets with a large number of decision classes, an extreme situation was obtained, in which practically one coalition consisting of all classifiers was generated. This is a very unfavorable situation and means that, in this case, these approaches have lost the ability to identify the relations between the classifiers. Only Approach 3, which uses the second method of defining an information system and the conflict function, generates a different result. A larger number of smaller coalitions was obtained in this case. In addition, in this approach, classifiers are more likely to be simultaneously assigned to several coalitions. Finally, it was concluded that Approach 3 has the greatest ability to identify relations between classifiers.
In the second step of the experiments that were presented in the article (Przybyła-Kasperek 2017), the quality of the classification obtained using these three approaches was analyzed. It was found that the significant differences in the coalitions' structure do not translate into the quality of classification. The reason for this lies in the method that was used to generate the final decisions. Some novelties are proposed in this paper so that the coalitions' structure is taken into account when generating the final result. The method that was used previously and the introduced modifications are described below.
As was mentioned earlier, after determination of the classifiers' coalitions, an aggregated decision table is generated for each coalition C_j. Based on each of the aggregated tables, the vector of values μ_j(x) = [μ_j,1(x), …, μ_j,c(x)], where c is the number of decision classes and x is the classified object, is generated as described in Sect. 3. Each vector coordinate corresponds to one decision value. The final decisions are generated using these vectors.
In the previous papers, a density-based method was used. It was implemented as follows. First, the sum of the vectors Σ_j μ_j(x) was calculated. Then the decision that has the maximum value among the coordinates of the vector Σ_j μ_j(x) is selected. Finally, the DBSCAN algorithm was used in order to determine which decision values have coordinates densely located around the maximum value (they are close enough to the maximum value). In this way, not only the decision which has the maximum support of all classifiers but also the ones for which the support is sufficiently large are determined. However, if a simple sum of vectors is considered, each coalition has the same influence on the final decision. The coalition's structure and its size are not taken into account. In this situation, the process of coalitions' generation and the better identification of relations between classifiers are less important and not fully exploited.
Therefore, in this work, instead of a simple sum of the vectors, a weighted sum will be calculated. A weight is assigned to each coalition, which takes into account the size, the structure and the commitment of the members to the coalition's formation. Four different methods of calculating the weights are considered.
The first method: the size of the coalition. In this method, a weight that takes into account the number of classifiers belonging to the coalition is assigned to each coalition. In all three approaches of applying Pawlak's model, overlapping coalitions are generated, and that is why it is important to take into account the partial involvement of a classifier in the formation of each coalition. For each classifier ag the coefficient m^x_ag, where x is the classified object, is determined. The value of this coefficient is inversely proportional to the number of coalitions to which the classifier belongs:

(5) m^x_ag = 1 / card{j ∈ {1, …, m} : ag ∈ C^x_j},

where C^x_1, …, C^x_m are the coalitions for object x. The weight that is assigned to the coalition C^x_j is equal to Σ_{ag∈C^x_j} m^x_ag. In this way, larger coalitions will have a greater influence on the final decisions.
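The first weighting method can be sketched as below; the membership coefficients and the resulting weights reproduce the values from the worked example in Sect. 5 (the function name is ours):

```python
from collections import Counter

def size_weights(coalitions):
    """Method 1: each classifier contributes m_ag = 1 / (number of
    coalitions it belongs to); a coalition's weight is the sum of the
    contributions of its members."""
    membership = Counter(ag for c in coalitions for ag in c)
    return [sum(1 / membership[ag] for ag in c) for c in coalitions]
```

For the three coalitions of Approach 1 in the example, this gives the weights 4.5, 1.5 and 1.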
The second method: the unambiguity of the decisions made by the classifiers. The quality of the decisions taken by the classifiers belonging to the coalition should also be taken into account when determining the weight of the coalition. By the quality of the decisions we understand the uniqueness of the decisions, in other words the decisiveness of the classifiers. If the classifier ag generates a vector of values ā_ag(x) (defined in Sect. 2) in which the coordinates are very diverse, i.e. one of them is significantly larger than the others, it means that the classifier is certain of the taken decision. The probability of classifying the test object into one of the classes is much higher compared to the probabilities for the other classes. The standard deviation of the vector's coordinates is a measure of the quality of the taken decisions that corresponds to these assumptions. Thus, for each classifier ag the standard deviation of the values of the vector ā_ag(x) is calculated, denoted as sd^x_ag. The weight that is assigned to the coalition C^x_j is equal to

(6) Σ_{ag∈C^x_j} sd^x_ag / Σ_{i=1}^{m} Σ_{ag∈C^x_i} sd^x_ag.

In this way, coalitions that contain more resolute and assertive classifiers will have a greater influence on the final decisions.

The third method: the size of the coalition and the unambiguity of the taken decisions. In this method the two previously discussed approaches have been combined. It means that both the size of the coalition, expressed by the number of classifiers included in it, and the assertiveness of these classifiers are taken into account. For each classifier, the two coefficients defined before (the partial involvement of the classifier in the formation of the coalition and the standard deviation of the generated vector) are multiplied. The weight for the coalition C^x_j is calculated as follows:

(7) Σ_{ag∈C^x_j} m^x_ag · sd^x_ag / Σ_{i=1}^{m} Σ_{ag∈C^x_i} m^x_ag · sd^x_ag.

The fourth method: the unambiguity of the decisions made by the coalition. At first glance this method looks the same as the second method. However, the basic difference is that this time, when calculating the weight of the coalition, the vector that was generated based on the aggregated decision table of the coalition is considered. The value of the weight does not depend on the decisions taken by the coalition's members, but on the aggregated knowledge of the coalition. Analogously to the second method, the standard deviation SD^x_j of the vector μ_j(x) is calculated for each coalition. The weight for the coalition C^x_j is calculated as follows:

(8) SD^x_j / Σ_{i=1}^{m} SD^x_i,

where C^x_1, …, C^x_m are the coalitions for object x.

The method of generating the final decision is as follows. A vector equal to the weighted average of the vectors generated based on the aggregated tables is calculated:

μ(x) = Σ_{j=1}^{m} w^x_j · μ_j(x) / Σ_{j=1}^{m} w^x_j,

where w^x_j is the weight generated for the coalition C^x_j and the test object x according to one of the four methods described above. Then, as before, the decision with the maximum vector coordinate is chosen. Finally, the DBSCAN algorithm is used in order to select the set of decisions that are densely located around the decision with the maximum support of the classifiers.

Table 1  Test object x

      a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12  a13  a14  a15
x     1   1   0   2   1   0   2   2   1   2    2    2    1    2    1

[Table 2: Relevant objects from the decision tables. Table 3: Vectors ā_ag(x) and r_ag(x). Table 4: Information systems.]
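The remaining three weighting methods and the final fusion step can be sketched as follows. The normalisation of the weights so that they sum to one is an assumption on our part, made consistent with the worked example in the next section; all function names are illustrative:

```python
from statistics import pstdev

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def sd_weights(coalitions, sd):
    # Method 2: sum of the members' standard deviations, normalised.
    return normalize([sum(sd[ag] for ag in c) for c in coalitions])

def combined_weights(coalitions, sd, membership):
    # Method 3: membership coefficient (1 / number of coalitions the
    # classifier belongs to) times the standard deviation, summed.
    return normalize([sum(sd[ag] / membership[ag] for ag in c)
                      for c in coalitions])

def coalition_sd_weights(vectors):
    # Method 4: standard deviation of each coalition's aggregated vector.
    return normalize([pstdev(v) for v in vectors])

def weighted_average(vectors, weights):
    # Final fused vector: weighted average of the coalitions' vectors.
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(len(vectors[0]))]
```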

Example of Applying the Coalitions' Weights in a Dispersed System
In this section, an example of the use of the three approaches to creating coalitions of classifiers and the four methods of calculating coalitions' weights will be presented. The example will also illustrate the differences in the results generated by these methods. We assume that we have a set of seven classifiers (interpreted as agents in Pawlak's model) U = {ag_1, ag_2, ag_3, ag_4, ag_5, ag_6, ag_7} and a set of five decision attribute values A = {v_1, …, v_5}. In this example we assume that the conditional attributes in the decision tables are qualitative. For the test object x, the classifier ag generates a vector of ranks r_ag(x) by using the m_1-nearest neighbor algorithm. We assume that m_1 = 1 and the test object is presented in Table 1. The nearest objects from each decision class of the decision tables and the values of the Gower similarity measure s(x, y) (which in the case of qualitative attributes is equivalent to the Hamming measure) are given in Table 2. Based on these similarity values, the vectors of values ā_ag(x) = [ā_ag,1(x), …, ā_ag,c(x)], ag ∈ U, c = card{A}, are generated, and then the rank vectors r_ag(x) = [r_ag,1(x), …, r_ag,c(x)] are determined. Both are given in Table 3. In order to determine the coalitions of classifiers, the information system S = (U, A) is defined. Depending on the approach (Approach 1, Approach 2 or Approach 3), this is carried out in accordance with Formula 1 or Formula 2. Both information systems are presented in Table 4. Based on the information systems, the values of the function of distance between agents (Formula 3) or the values of the conflict function (Formula 4) are determined, depending on the approach. From now on, the example will be solved separately for each approach.

Approach 1
In Approach 1 we use the first method of defining an information system (Formula 1). In this case both functions (Formula 3 and Formula 4) are equivalent. The function values are shown in Table 5. For example, the value ρ_A(ag_1, ag_4) = 3/5, because the values for v_1, v_2 and v_5 of ag_1 and ag_4 are different in the information system. Then the relations between the agents are determined (alliance, conflict and neutrality). A graphical representation of the conflict situation is shown in Fig. 1. In this representation classifiers are circles, and if classifiers are allied then their circles are linked. Coalitions are subsets of vertices such that every two vertices are linked. There are three coalitions: {ag_1, ag_3, ag_5, ag_6, ag_7}, {ag_2, ag_4} and {ag_4, ag_6}. In the next step, an aggregated decision table is generated for each coalition. For this purpose the method of approximation aggregation is used; we assume that m_2 = 1. This means that the one object with the greatest similarity to the test object is selected from each decision class of each local decision table. Then objects from the same decision class that have consistent values on common attributes are written as one object in the aggregated table. The aggregated tables are presented in Table 6. For example, the third object in the aggregated table for the coalition {ag_1, ag_3, ag_5, ag_6, ag_7} was created by combining the objects x_1^1 ∈ U_ag_1, x_1^3 ∈ U_ag_3 and x_1^6 ∈ U_ag_6, because they are from the same decision class. For the attributes a_12 and a_13 we have the value ?, because it was not possible to select objects from the universes U_ag_5 and U_ag_7 for this decision class (see Table 6). When we calculate the similarity values, we ignore the attributes for which we have the value ?.
The vectors [μ_j,1(x), …, μ_j,5(x)] determined for each coalition based on the similarity values, and those obtained after the transformation (so that the sum of the coordinates equals 1), are shown in Table 7. Then the weights for the coalitions are calculated in one of the four ways and the weighted sum of the vectors is calculated.
In the first method, the coefficients m^x_ag are calculated for each classifier (Formula 5). We have m^x_ag_4 = m^x_ag_6 = 1/2, because these classifiers belong to two coalitions. For the other classifiers, the coefficient is equal to 1. Thus, the weights for the first, second and third coalitions are equal to 4.5, 1.5 and 1, respectively. The vectors of the coalitions multiplied by the weights and the weighted average of the vectors are given in Table 8.
In the second method, the standard deviations of the vectors' coordinates that were generated by the classifiers are calculated. These values are given in Table 3. The weights for the first, second and third coalitions, determined in accordance with Formula 6, are equal to 0.595, 0.172 and 0.233, respectively. The vectors of the coalitions multiplied by the weights and the weighted average of the vectors are given in Table 8.
In the third method, the weights for coalitions are determined according to Formula 7. This method is a combination of the two previous methods. The weights are equal to 0.704, 0.145, 0.152 for the first, second and third coalition respectively. The vectors of coalitions that were multiplied by the weights and the weighted average of the vectors are given in Table 8.
In the last method, the standard deviations of the vectors generated by the coalitions are calculated (Table 7). The weights for the first, second and third coalitions, determined in accordance with Formula 8, are equal to 0.295, 0.401 and 0.304, respectively. The vectors of the coalitions multiplied by the weights and the weighted average of the vectors are given in Table 8. It should be noted that each method of calculating weights generates completely different values. In the first, second and third methods, the first coalition {ag_1, ag_3, ag_5, ag_6, ag_7} was recognized as the strongest. However, there are differences in the ranking of the remaining coalitions. The last method of calculating weights recognized the second coalition {ag_2, ag_4} as the strongest.
After calculating the weighted sum of vectors according to one of the methods discussed above (see the last line in Table 8), the decision with the highest coefficient is determined and its group is generated using the DBSCAN algorithm.
In each method, the decision v_1 has the highest vector coefficient. However, comparing the vectors generated by each of the methods, we can see differences that affect the result of the DBSCAN method. For example, for the parameter values ε = 0.019 and minPts = 2 of the DBSCAN algorithm, the third method generates the set {v_1, v_3, v_4, v_5} while all the other methods generate the set {v_1}. For the parameter values ε = 0.01 and minPts = 2 of the DBSCAN algorithm, all the methods determine the set {v_1}.
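The final selection step can be sketched with a minimal one-dimensional DBSCAN over the fused vector's coordinates. The implementation below is a simplification of the full algorithm, and the sample vector in the usage note is invented for illustration, not taken from the paper's tables:

```python
def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN over a list of scalar coefficients.

    Returns clusters as sets of indices; points that are not
    density-reachable from any core point are treated as noise."""
    n = len(values)
    neigh = [{j for j in range(n) if abs(values[i] - values[j]) <= eps}
             for i in range(n)]
    core = {i for i in range(n) if len(neigh[i]) >= min_pts}
    labels = [None] * n
    clusters = []
    for i in core:
        if labels[i] is not None:
            continue
        cluster, frontier = set(), [i]
        while frontier:
            p = frontier.pop()
            if labels[p] is not None:
                continue
            labels[p] = len(clusters)
            cluster.add(p)
            if p in core:  # only core points expand the cluster
                frontier.extend(q for q in neigh[p] if labels[q] is None)
        clusters.append(cluster)
    return clusters

def decision_set(fused, eps, min_pts):
    """Indices of the decisions densely grouped around the maximum coefficient."""
    best = fused.index(max(fused))
    for cluster in dbscan_1d(fused, eps, min_pts):
        if best in cluster:
            return sorted(cluster)
    return [best]  # the maximum is noise: fall back to the single best decision
```

For example, with the hypothetical fused vector [0.30, 0.10, 0.29, 0.28, 0.05], ε = 0.019 and minPts = 2, the indices 0, 2 and 3 form one dense group around the maximum, so the decisions v_1, v_3 and v_4 would be returned together.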
Approach 2

In Approach 2 we use the second method of defining an information system (Formula 2) and the function of distance between agents (Formula 3). The function values are shown in Table 9 and a graphical representation of the conflict situation is presented in Fig. 2. For example, the value of the distance function for ag_1 and ag_4 is

ρ*_A(ag_1, ag_4) = (0.5 + 1 + 0.5 + 0.5 + 0.5) / card{A} = 3/5.

For each coalition an aggregated decision table is generated (see Table 10). The vectors that were determined for each aggregated table are given in Table 11. Then the weights for the coalitions are calculated in one of the four ways and the weighted sums of the vectors are calculated. These values are given in Table 12.

This time the decision v_5 received the greatest support of the coalitions, differently than for Approach 1. Of course, completely different vectors were also obtained for all four methods of determining the coalitions' weights. As a consequence, we get different results. For the parameter values ε = 0.019 and minPts = 2 of the DBSCAN algorithm, the first and the last methods generate the set {v_1, v_5} while the second and the third methods generate the set {v_5}. For the parameter values ε = 0.01 and minPts = 2 of the DBSCAN algorithm, only the first method determines the set {v_1, v_5}, while the rest of the methods generate the set {v_5}.

Approach 3
In Approach 3 we use the second method of defining an information system (Formula 2) and the conflict function (Formula 4). The function values are shown in Table 13 and a graphical representation of the conflict situation is presented in Fig. 3. As can be seen, there are six coalitions: {ag1, ag3}, {ag1, ag5}, {ag2, ag4}, {ag3, ag7}, {ag6, ag7} and {ag4, ag6}. The aggregated decision tables for the coalitions are given in Table 14 and the vectors generated based on the aggregated tables are presented in Table 15. The coalitions' weights and the weighted sums of the vectors are given in Table 16.
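As a rough illustration of how a Pawlak-style conflict function separates classifiers into allied pairs, the sketch below uses the classical per-issue function (0 for agreement, 1 for opposing views, 0.5 when exactly one side is neutral) and calls two agents allied when their average conflict is below 0.5. The agents' views and the 0.5 threshold are assumptions; Formulas 2 and 4 from the paper are not reproduced exactly.

```python
# Hedged sketch of Pawlak-style conflict analysis between agents
# (classifiers). Views are in {-1, 0, 1}; treating two neutral views as
# agreement is an assumption of this sketch, not necessarily Formula 4.
from itertools import combinations

def phi(u, v):
    """Per-issue conflict between two views from {-1, 0, 1}."""
    if u * v > 0:                       # both committed, same side
        return 0.0
    if u * v < 0:                       # strictly opposing views
        return 1.0
    return 0.0 if u == v == 0 else 0.5  # neutral cases

def conflict(view_a, view_b):
    """Average per-issue conflict between two agents."""
    return sum(phi(u, v) for u, v in zip(view_a, view_b)) / len(view_a)

views = {                               # toy views on five issues
    "ag1": ( 1,  1, -1,  0,  1),
    "ag2": (-1, -1,  1,  1, -1),
    "ag3": ( 1,  1, -1,  1,  1),
    "ag4": (-1,  0,  1,  1, -1),
}
allied = [(a, b) for a, b in combinations(views, 2)
          if conflict(views[a], views[b]) < 0.5]
print(allied)   # -> [('ag1', 'ag3'), ('ag2', 'ag4')]
```

Pairs with low mutual conflict, here {ag1, ag3} and {ag2, ag4}, are the candidate coalitions, mirroring the pairwise coalitions obtained in Approach 3.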
The results obtained for Approach 3 differ significantly from the two previously considered approaches. This time the decision v1 received the greatest support for the first method of calculating weights, while for the rest of the methods the decision v5 received the greatest support. We get different results than before. For the parameters ε = 0.019 and minPts = 2 of the DBSCAN algorithm, all the methods of calculating weights generate the set {v1, v5}. For the parameters ε = 0.01 and minPts = 2 of the DBSCAN algorithm, the first method determines the set {v1, v5}, while the rest of the methods generate the set {v5}.
A comparison of the results obtained using the three approaches to creating coalitions and the four methods of calculating coalitions' weights is presented in Table 17. As can be seen, Approach 1 determined the largest coalitions, while Approach 3 generated the smallest ones. The decisions v1 and v5 are preferred by all methods. However, the exact indication depends on the approach and on the method of calculating the weights. Even within one approach, the use of weights can change which decision has the greatest support of the classifiers. In general, Approach 1 indicates v1 as the best decision, while Approaches 2 and 3 indicate v5.

Experimental Protocol
In this section, a description of the experiments that were carried out is included, and the datasets and evaluation measures are presented. Three datasets from the UCI repository were selected for the experiments: Soybean, Vehicle Silhouettes and Landsat Satellite. These sets were randomly dispersed into five different versions, with three, five, seven, nine and eleven local decision tables. The dispersion was not intended to increase system efficiency but to provide dispersed knowledge, which is why the process was not optimized in any way. These sets were also used in previous work (Przybyła-Kasperek and Wakulicz-Deja 2014a, b, 2016), where a detailed description of the data and of the dispersion process can be found.
We use the following designations for dispersed systems:
• DS_Ag3 - dispersed system with 3 tables,
• DS_Ag5 - dispersed system with 5 tables,
• DS_Ag7 - dispersed system with 7 tables,
• DS_Ag9 - dispersed system with 9 tables,
• DS_Ag11 - dispersed system with 11 tables.

In this paper, the quality of classification of a dispersed system with three different approaches to conflict analysis and four different methods of determining the strength of a coalition is examined. For this purpose, the train-and-test methodology was used, since it is suitable for dispersed data, as was explained in detail in Przybyła-Kasperek and Wakulicz-Deja (2016). For the Landsat Satellite data set, validation, training and test sets were used. Because the density-based algorithm that was used generates a set of decisions rather than a single decision, appropriate measures of the quality of classification are used. These are:
• the estimator of classification error, e = (1/n) Σ_{i=1}^{n} I(d(x_i) ∉ d_DS(x_i)), where n is the number of test objects, I(d(x) ∉ d_DS(x)) = 1 when d(x) ∉ d_DS(x) and I(d(x) ∉ d_DS(x)) = 0 when d(x) ∈ d_DS(x); d_DS(x) is the set of global decisions generated by the system for the test object x;
• the estimator of classification ambiguity error, e_ONE = (1/n) Σ_{i=1}^{n} I({d(x_i)} ≠ d_DS(x_i)), where I({d(x)} ≠ d_DS(x)) = 1 when {d(x)} ≠ d_DS(x) and I({d(x)} ≠ d_DS(x)) = 0 when {d(x)} = d_DS(x);
• the average size of the global decision sets, (1/n) Σ_{i=1}^{n} card(d_DS(x_i)).

In the first stage of experiments, optimization of the system's parameters m_1, m_2 and ε was performed. As was mentioned earlier, the parameter m_1 is used in the process
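Under the assumption that the system outputs a set of candidate decisions per test object, the three measures can be sketched as plain functions. The true labels and prediction sets below are toy values, not data from the experiments.

```python
# Hedged sketch of the three evaluation measures for a classifier that
# returns a SET of decisions per test object. Labels and sets are toy
# assumptions for illustration.

def error_rate(truth, predicted_sets):
    """Fraction of test objects whose true decision is not in the set."""
    return sum(t not in s for t, s in zip(truth, predicted_sets)) / len(truth)

def ambiguity_error(truth, predicted_sets):
    """Fraction of objects not classified unambiguously and correctly."""
    return sum(s != {t} for t, s in zip(truth, predicted_sets)) / len(truth)

def avg_set_size(predicted_sets):
    """Average number of decisions in the generated sets."""
    return sum(len(s) for s in predicted_sets) / len(predicted_sets)

truth = ["v1", "v2", "v1", "v3"]
preds = [{"v1"}, {"v1", "v2"}, {"v2"}, {"v3"}]
print(error_rate(truth, preds))       # 0.25 (one miss)
print(ambiguity_error(truth, preds))  # 0.5 (one miss plus one ambiguous set)
print(avg_set_size(preds))            # 1.25
```

Note that the ambiguity error penalizes both misses and correct-but-ambiguous answers, so it is always at least as large as the classification error.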

Experiments and Discussion
In this section, the results of the experiments are presented separately for each of the analyzed datasets. In the last part of the section, the results are compared with regard to the three conflict analysis methods and the four methods of determining the strength of the coalition.

Results with the Soybean Data Set
The results of the experiments with the Soybean data set are presented in Tables 18, 19 and 20. Each table corresponds to one approach used to generate the coalitions: Approach 1, 2 or 3. The following information is given in the tables:
• the version of dispersion (System),
• the method of determining the strength of the coalition (Weights): 1 - the size of the coalition, 2 - the unambiguity of the decisions made by the classifiers, 3 - the size of the coalition and the unambiguity of the decisions taken, 4 - the unambiguity of the decisions made by the coalition.

In Tables 18, 19 and 20 the best results are marked in bold. Comparing the results presented in these tables, the following conclusions can be drawn. Approach 3 to creating coalitions of classifiers generates the best results. The improvement in the quality of classification is significant compared to Approaches 1 and 2. In general, the worst results were obtained using Approach 1. Approach 2 improves the quality of classification compared to Approach 1; however, both are much worse than Approach 3. These conclusions apply to ensembles of more than three classifiers. For three classifiers, the method of conflict analysis and the method of determining the strength of the coalition are irrelevant. This result is easy to justify: three classifiers are simply too few to form any meaningful coalitions. It cannot be concluded that any of the methods of determining the strength of the coalition is better than the others.
The results obtained using the coalitions' strength were compared with the results obtained when the weights were not used. These results are shown in Table 21, with the best results within this table marked in bold. It was noted that the use of coalitions' weights significantly improved the quality of classification, with the most spectacular improvement obtained for Approach 3. Such results are not surprising, as it was already shown in Przybyła-Kasperek (2017) that Approach 3 recognizes the relations between classifiers best and reflects them in the form of coalitions. This is why, in this approach, strengthening the importance of coalitions gives the best results. As was previously stated, for an ensemble of three classifiers the use of weights did not change the quality of classification.

Results with the Vehicle Silhouettes Data Set
The results of the experiments with the Vehicle Silhouettes data set are presented in Tables 22, 23 and 24. Each table corresponds to one approach used to generate the coalitions: Approach 1, 2 or 3. The information in the tables is analogous to the previous tables.
In Tables 22, 23 and 24 the best results are marked in bold. Based on these results, it can be concluded that for the Vehicle Silhouettes data set there were no differences between Approaches 1, 2 and 3 as significant as in the case of the Soybean data set. What can certainly be said is that Approaches 2 and 3 are better than Approach 1. Note that Approach 3 generates only slightly better results than Approach 2. Comparing the methods of determining the strength of the coalition, the best results were most often generated by the method based on the size of the coalition and by the method based on the unambiguity of the decisions made by the classifiers. Table 25 shows the results obtained without the use of coalitions' weights, with the best results within this table marked in bold. Comparing the results presented in Tables 22-25, it can be said that much better results were generated when the strength of the coalitions is taken into account. For a larger number of classifiers we can expect a greater improvement. For three classifiers this improvement is negligible or, as for Approach 3, even worse results were obtained. This situation seems logical and has already been commented on earlier.

Results with the Landsat Satellite Data Set
The results of the experiments with the Landsat Satellite data set are presented in Tables 26, 27 and 28. Each table corresponds to one approach used to generate the coalitions: Approach 1, 2 or 3. The information in the tables is analogous to the previous tables.
In Tables 26, 27 and 28 the best results are marked in bold. For the Landsat Satellite data set, the best results were generated using Approach 3 and the worst using Approach 1. The differences in the quality of classification are noticeable, especially for a greater number of classifiers. Thus, by using the weights, the superior ability of Approach 3 to recognize the relations between the classifiers (already demonstrated in Przybyła-Kasperek 2017) could be translated into classification quality. It should be noted that for Approach 1 the method used to calculate the weights of the coalitions is irrelevant: all four methods produced the same results. This is due to the poor quality of the coalitions built using Approach 1. It cannot be stated that one of the methods of calculating the weights of coalitions is the best. The only thing that can be said is that the method considering the unambiguity of the decisions made by the coalition usually generates the worst results. A comparison with the results obtained without the use of the coalitions' strength was also performed for the Landsat Satellite data set. The results are presented in Table 29, with the best results within this table marked in bold. It was noted that for Approaches 1 and 2 the use of weights does not always improve the quality of classification. However, for a greater number of classifiers the improvement is noticeable. Only for Approach 3 can we always be sure of improved results when using weights, and this improvement is very significant and noticeable.

Comparison of Methods of Coalitions' Generation and Methods of Determining Coalitions' Strength
Based on the results described in the previous sections, it was found that the use of coalitions' strength improved the quality of classification, regardless of the approach to creating coalitions that was used. It is difficult to designate which of the ways of calculating weights is the best. However, the results show that the last method, the unambiguity of the decisions made by the coalition, achieves the worst results. The first method, the size of the coalition, very often produces good results. A statistical analysis was performed to confirm these observations. The results were divided into five groups: four groups for the different methods of calculating weights and one group consisting of the error values obtained without calculating the weights (Fig. 4 compares the classification error e for the four methods of determining coalitions' strength and for the approach without any weights). Friedman's test was performed first. The test confirmed that the differences between the classification errors in these five groups are significant at a level of p = 0.000001. Then, in order to determine the pairs of methods between which statistically significant differences occur, a non-parametric Wilcoxon each-pair test and a parametric t-test for dependent groups were performed. Both tests showed that there is no significant difference between the pair of methods 2 and 3 (the unambiguity of the decisions made by the classifiers, and the size of the coalition together with the unambiguity of the decisions taken) or the pair of methods 4 and 5 (the unambiguity of the decisions made by the coalition, and no weight calculation). For all other pairs, the significance of the differences was confirmed at a level of less than 0.05.
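For intuition, Friedman's test statistic used above can be sketched in pure Python. The error table below is a toy assumption, not the paper's data; in practice a statistics package would also return the p-value from the chi-square distribution with k-1 degrees of freedom.

```python
# Hedged sketch of the Friedman test statistic: rank the methods within
# each dataset (block), then measure how far the rank sums deviate from
# what equal methods would produce. The error table is a toy assumption.

def rank_row(row):
    """Average ranks (1 = smallest), with ties sharing the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of tied positions, 1-based
        for p in range(i, j + 1):
            ranks[order[p]] = avg
        i = j + 1
    return ranks

def friedman_statistic(errors):
    """errors[b][t]: error of treatment t on block (dataset) b."""
    n, k = len(errors), len(errors[0])
    ranks = [rank_row(row) for row in errors]
    col_sums = [sum(r[t] for r in ranks) for t in range(k)]
    return 12 * sum(s * s for s in col_sums) / (n * k * (k + 1)) \
        - 3 * n * (k + 1)

errors = [                      # rows: datasets, columns: 5 methods (toy)
    [0.10, 0.12, 0.11, 0.20, 0.21],
    [0.08, 0.09, 0.10, 0.15, 0.16],
    [0.12, 0.13, 0.12, 0.18, 0.19],
]
print(friedman_statistic(errors))   # a large value means the methods differ
```

A large statistic relative to the chi-square distribution with k-1 degrees of freedom leads to rejecting the hypothesis that all weighting methods perform equally, which is the conclusion reported above.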
A box-and-whisker chart presenting statistics for these five groups was created (Fig. 4). It can clearly be seen that the box for the first method of calculating the weights of the coalitions is located lowest in the graph. The second and the third methods have comparable boxes, while the boxes for the last method of calculating weights and for the approach without weights are the largest and lie at the top of the graph.
The final conclusion is that the first method is the best to use. The second and the third methods take second place, and it does not make sense to use the fourth method. Similar tests were performed to find out which of the approaches to coalition creation is the best. In the previous sections it was stated that Approach 3 is the best and Approach 1 is the worst. In order to justify this statistically, all results were divided into three groups, each representing a different approach. Friedman's test confirmed that the differences between the classification errors for these three approaches are significant at a level of p = 0.000001. Both tests, the non-parametric Wilcoxon each-pair test and the parametric t-test for dependent groups, confirmed that for all pairs of different approaches the differences are statistically significant at a significance level of less than 0.0004.
The box-and-whisker chart presenting statistics for these three groups is shown in Fig. 5. The smallest median of the error values is observed for Approach 3 and the largest for Approach 1. The final conclusion is that the best way of creating coalitions is Approach 3.

Summary
In this paper, three approaches to using Pawlak's conflict model to analyze the relations between classifiers and to create coalitions of classifiers were considered. Four methods of calculating coalitions' weights, designed to take advantage of the generated coalitions, were proposed. The results of experiments on three data sets, each dispersed in five different versions, were presented. The obtained results were compared and the hypotheses were statistically justified. It was found that Approach 3 to coalition creation generates the best results. In terms of the weights of the coalitions, the method based on the size of the coalition is the best. Slightly worse results were generated by the method based on the unambiguity of the decisions made by the classifiers and by the method combining the size of the coalition with the unambiguity of the decisions taken.
In this paper it was shown that the use of coalitions' weights improves the quality of classification as long as the coalitions are correctly identified. Three approaches to using Pawlak's analysis model were applied to determine the coalitions, but only one of them has the ability to generate well-formed coalitions. Further modifications of the application of Pawlak's conflict analysis method to a dispersed system are planned for future work. It is planned to use the approaches proposed in Lang et al. (2017) and Yao (2019) to develop the ideas presented in this paper.