Analysing paradoxes in design decisions: the case of the "multiple-district" paradox

In early design stages, a team of designers may express conflicting preferences on a set of design alternatives, formulating individual rankings that must then be aggregated into a collective one. The scientific literature encompasses a variety of models to perform this aggregation, each with its own strengths and weaknesses. In particular situations, some of these models can lead to paradoxical results, i.e., results contrary to logic and common sense. This article focuses on one of these paradoxes, known as the multiple-district paradox, providing a new methodology aimed at identifying the reasons behind its potential triggering. This methodology can be a valid support for several decision problems. Some examples accompany the description.


Introduction
Several decision-making problems in design concern the formulation of rankings amongst alternative design solutions [1,2]. A very popular problem in the early design stage is that in which m engineering designers (or more simply experts: D1 to Dm) formulate their individual rankings of n design alternatives (or more simply alternatives: O1 to On) [3][4][5][6][7][8][9]. This problem may concern design activities devoted to both incremental and disruptive forms of innovation [54,55].
For the sake of simplicity, this paper will consider complete rankings where: (i) each expert is able to rank all the alternatives of interest, and (ii) each ranking can be decomposed into paired-comparison relationships of strict preference (e.g., O1 ≻ O2 or O1 ≺ O2) and/or indifference (e.g., O1 ~ O2) [10].
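The decomposition of a complete ranking into paired-comparison relationships can be sketched as follows; this is a minimal illustration in Python, and the list-of-groups encoding of ties is our own convention, not notation from the paper.

```python
from itertools import combinations

def pairwise_relations(ranking):
    """Decompose a complete ranking into strict-preference pairs (a, b),
    meaning a ≻ b, and indifference pairs, meaning a ~ b. A ranking is a
    list of groups from best to worst; tied alternatives share a group."""
    prefer, indiff = set(), set()
    # Every alternative in an earlier group is strictly preferred to
    # every alternative in a later group
    for g1, g2 in combinations(ranking, 2):
        prefer.update((a, b) for a in g1 for b in g2)
    # Alternatives within the same group are mutually indifferent
    for g in ranking:
        indiff.update(combinations(g, 2))
    return prefer, indiff
```

For example, the ranking O1 ≻ (O2 ~ O3) decomposes into the strict preferences O1 ≻ O2 and O1 ≻ O3 plus the indifference O2 ~ O3.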
Since designers often have conflicting opinions about the possible design alternatives, their rankings, which form the so-called preference profile, can be characterized by a certain degree of variability or discordance [11][12][13]. The objective of the problem of interest is to aggregate the expert rankings into a collective one, which is supposed to reflect them as much as possible, even in the presence of diverging preferences [14][15][16][17][18][19][20][21][22][23][24][25]. For this reason, the collective ranking is also defined as social, consensus or compromise ranking [2,16,26].
(Correspondence: Fiorenzo Franceschini, fiorenzo.franceschini@polito.it; Domenico A. Maisano, domenico.maisano@polito.it. Politecnico di Torino, DIGEP (Department of Management and Production Engineering), Corso Duca degli Abruzzi 24, 10129 Torino, Italy.)
The scientific literature includes a variety of possible models to perform this aggregation. Different aggregation models often lead to different collective rankings [13,22] and -paraphrasing what was theorized by Arrow -any aggregation model, in specific situations, is by its nature imperfect [23,57].
In general, the choice of the most suitable model may depend on (i) the specific objective(s) of the expert group and/or (ii) the characteristics of the preference profile [27][28][29][30][31][32]. In addition, some aggregation models can occasionally produce paradoxical results that are (at least apparently) logically unreasonable or self-contradictory [33,34].
This paper focuses on a specific paradox, known as the "multiple-district paradox", which can be summarized as follows: although an alternative may be the most preferred one in each of two (or more) sub-groups (districts) of rankings, it is not necessarily the most preferred one when the sub-groups of rankings are merged into a single combined group [35,36]. In other words, this paradox occurs when one alternative wins in every district but loses when the districts are merged [37]. The expression "multiple-district" derives from the Voting Theory context, in which this paradox was originally studied.
This paradox is of potential interest even today, as it can occur in design problems involving distributed teams, whose local decisions should be merged into a single global decision [38]. Some practical examples concerning the Quality and Reliability field can be found in [56,58].
This paper analyses the multiple-district paradox, providing a new "diagnostic" methodology aimed at identifying the reasons behind its potential triggering.
The remainder of this paper is organized into three sections. Section 2 conducts a qualitative analysis of the paradox, highlighting the typical conditions behind its occurrence, such as characteristics of the preference profile and/or the aggregation model. Section 3 illustrates the new diagnostic methodology, which is based on (i) some indicators representing the degree of concordance between the expert rankings and (ii) other indicators representing the consistency between the expert rankings and the collective ranking. The new methodology allows the causes of the paradox to be investigated on a case-by-case basis. Finally, Sect. 4 summarizes the original contributions of this paper and its practical implications, limitations and suggestions for future research. The Appendix provides further details on the indicators used in the analysis.

The multiple-district paradox
This section illustrates the multiple-district paradox, with the support of several examples in the context of product design. The rest of the section is organized in three sub-sections, respectively dedicated to:
1. Exemplifying the occurrence of the paradox and raising some research questions through a preliminary case study;
2. Showing that the paradox may concern different aggregation models, depending on the preference profile of interest;
3. Identifying the typical conditions that favor the occurrence of the paradox.

Case study
Let us consider the interior design of a luxury car. It is assumed that three alternative interior-design concepts (i.e., O1, O2, O3) should be assessed by two sub-groups of experts, with the aim of identifying the best concept in terms of aesthetics. Sub-group A is composed of seventeen engineering-design experts (i.e., eA1 to eA17) from a specific headquarters of a major design company, while sub-group B is composed of fifteen engineering-design experts (i.e., eB1 to eB15) from another headquarters of the same company. The notion of aesthetics is defined from a triple perspective: (i) colour matching; (ii) harmonious design; and (iii) comfort and practicality. Since the aesthetics assessment is intrinsically subjective, each expert is asked to formulate his/her individual ranking of O1, O2 and O3, as summarized in Table 1a and b, for sub-groups A and B respectively.
The team leader decides to aggregate the expert rankings into a collective one, through an aggregation model called Instant-Runoff Voting (IRV), sometimes referred to as Alternative Vote [22,36]. The IRV was originally conceived as part of Voting Theory, for single-seat elections with more than two candidates [37]. Instead of expressing support for only one candidate, voters in IRV elections rank the candidates in order of preference. Ballots are initially counted according to each voter's first choice. If a candidate obtains more than half of the votes based on first choices, that candidate wins. If not, the candidate with the fewest votes is eliminated. The voters who selected the defeated candidate as their first choice then have their votes added to the totals of their next choice. This process continues until a candidate has more than half of the votes. Of course, the application of the IRV model can be extended to other contexts, such as that of product design, where candidates are replaced with alternative design concepts and voters are replaced with design experts.
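The elimination loop just described can be sketched in Python as follows. Ties among the least-supported alternatives are broken arbitrarily in this sketch, and the numeric profile at the bottom is hypothetical (it does not reproduce the data of Table 1), but it triggers the same paradox discussed below.

```python
from collections import Counter

def irv_winner(rankings):
    """Instant-Runoff Voting. `rankings` is a list of rankings, each a
    list of alternatives ordered from most to least preferred."""
    remaining = {a for r in rankings for a in r}
    while True:
        # Count first choices among the alternatives still in the race
        firsts = Counter(next(a for a in r if a in remaining) for r in rankings)
        top, votes = firsts.most_common(1)[0]
        if votes * 2 > len(rankings) or len(remaining) == 1:
            return top
        # Eliminate the alternative with the fewest first-choice votes
        # (ties broken arbitrarily in this sketch)
        loser = min(remaining, key=lambda a: firsts.get(a, 0))
        remaining.discard(loser)

# Hypothetical districts (NOT the data of Table 1) exhibiting the paradox
district_a = 9*[['O3','O2','O1']] + 6*[['O2','O3','O1']] + 5*[['O1','O2','O3']]
district_b = 9*[['O1','O2','O3']] + 7*[['O2','O1','O3']] + 5*[['O3','O2','O1']]
```

Here `irv_winner(district_a)` and `irv_winner(district_b)` both return O2, while `irv_winner(district_a + district_b)` returns O1: the winner of every district loses in the merged group.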
Returning to the case study, the IRV model can be applied separately to the two previous expert sub-groups (districts), obtaining the results below.

Sub-groups A and B
• Applying the IRV rounds to the seventeen rankings of sub-group A (see Table 1a), the most preferred alternative turns out to be O2.
• Likewise, applying the IRV rounds to the fifteen rankings of sub-group B (see Table 1b), the most preferred alternative is again O2.

Combined group
Assuming that, ceteris paribus, the two expert sub-groups A and B are merged into a combined group (A + B) of thirty-two experts (see Table 1c), the IRV can be applied to their merged rankings; the resulting collective ranking for the combined group places O1 in the top position. The above results are paradoxical: considering the two sub-groups A and B separately, the most preferred alternative is O2, while combining the two sub-groups, the most preferred alternative becomes O1. This result is difficult to justify since it is (at least apparently) contradictory and against logic: how could the team leader (or anyone else) accept that, although O2 is the best design concept according to each individual sub-group, O1 becomes the (new) best one when the two sub-groups are combined? Table 2a summarizes the results obtained from the three previous applications of the IRV aggregation model.
The example above raises some research questions, which will be addressed in the remainder of the paper: (1) Is the multiple-district paradox originated by a specific aggregation model, a specific preference profile, or both? (2) Can an operational procedure be developed to quantitatively analyse the reasons behind the occurrence of this paradox?

Changing aggregation model and preference profile
The previous example showed the occurrence of the multiple-district paradox when applying the IRV aggregation model to a certain preference profile. However, what would happen if the aggregation model changed? And what would happen if the preference profile changed? Let us consider two further aggregation models, namely (i) the one proposed by Coombs [39,40] and (ii) the so-called Borda Count model [22,41,42], applying them to each of the same three (sub-)groups of rankings (i.e., A, B and A + B). The following sub-sections illustrate the results obtained through the application of these other aggregation models.

Coombs' aggregation model
This model is very similar to the IRV, except that the alternative eliminated in a certain round is the one ranked last by the largest number of experts, not the one ranked first by the smallest number of experts [36].
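Coombs' rule therefore differs from IRV only in the elimination criterion, which the following sketch makes explicit (ties among the most-rejected alternatives are again broken arbitrarily; the profile used in the check is hypothetical):

```python
from collections import Counter

def coombs_winner(rankings):
    """Coombs' rule: as in IRV, a candidate with a majority of first
    choices wins; otherwise the alternative ranked LAST by the largest
    number of experts is eliminated (ties broken arbitrarily)."""
    remaining = {a for r in rankings for a in r}
    while len(remaining) > 1:
        firsts = Counter(next(a for a in r if a in remaining) for r in rankings)
        top, votes = firsts.most_common(1)[0]
        if votes * 2 > len(rankings):
            return top
        # Count last choices among the remaining alternatives
        lasts = Counter(next(a for a in reversed(r) if a in remaining) for r in rankings)
        loser = max(remaining, key=lambda a: lasts.get(a, 0))
        remaining.discard(loser)
    return remaining.pop()
```

Note that whether a given rule triggers the paradox depends on the preference profile, as the examples in this section show: on some profiles where IRV produces the paradox, Coombs' rule elects the same winner in every district and in the merged group.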
By applying the Coombs' model to the seventeen rankings of sub-group A in Table 1a, the collective ranking for sub-group A reported in Table 2b is obtained.

Sub-group B
Analogously, applying the Coombs' model to the fifteen rankings of sub-group B in Table 1b, the collective ranking for sub-group B reported in Table 2b is obtained.

Combined group
The Coombs' model can then be applied to the combined group (A + B) of thirty-two rankings. The resulting collective ranking coincides with that of sub-group B; therefore, the multiple-district paradox does not occur in this case. Table 2b summarizes the afore-described results.

Borda count model
The Borda Count (BC) model works as follows. For each expert ranking, the first alternative obtains one point, the second two points, and so on [22,41,42]. The cumulative score of an alternative is then calculated by summing the corresponding scores obtained in each ranking. Applying this model to the three (sub-)groups of rankings, A, B and (A + B), in Table 1, the following results are obtained (see also Table 2c).
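The scoring scheme can be sketched as follows; note the paper's convention that a lower cumulative score denotes a more preferred alternative (the profile in the check below is hypothetical):

```python
def borda_counts(rankings):
    """Borda Count in the paper's convention: 1 point for first place,
    2 for second, and so on; a LOWER cumulative score denotes a more
    preferred alternative."""
    scores = {}
    for r in rankings:
        for pos, alt in enumerate(r, start=1):
            scores[alt] = scores.get(alt, 0) + pos
    return scores

def borda_ranking(rankings):
    """Collective ranking: alternatives sorted by increasing Borda Count."""
    bc = borda_counts(rankings)
    return sorted(bc, key=bc.get)
```

For instance, for the three rankings O1 ≻ O2 ≻ O3, O1 ≻ O3 ≻ O2 and O2 ≻ O1 ≻ O3, the counts are BC(O1) = 4, BC(O2) = 6 and BC(O3) = 8, giving the collective ranking O1 ≻ O2 ≻ O3.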

Sub-group A
With reference to the rankings in sub-group A, the so-called Borda Counts related to the three alternatives (i.e., O1, O2 and O3) can be calculated by summing their rank positions over the seventeen rankings. Of course, the degree of preference of the i-th alternative decreases as the corresponding BC_A(O_i) value increases. The resulting collective ranking for sub-group A is reported in Table 2c.

Table 2 Collective rankings obtained by applying the (a) IRV, (b) Coombs' and (c) BC models to the sub-groups (A and B) of experts and the combined group (A + B). The corresponding expert rankings are reported in Table 1.

Sub-group B
With reference to the rankings in sub-group B, the Borda Counts BC_B(O_i) can be calculated analogously; the resulting collective ranking for sub-group B is reported in Table 2c.

Combined group
The Borda Counts related to the alternatives in the rankings of the combined group, BC_{A+B}(O_i), can be calculated in the same way. The collective ranking for the combined group (A + B) is then O3 ≻ O2 ≻ O1, which coincides with that of sub-group A. Again, the paradox observed when applying the IRV model (see Sect. 2.1) does not occur.
It is worth remarking that the BC aggregation model guarantees a sort of "overlapping of effects", which results in the following additive relationship: BC_{A+B}(O_i) = BC_A(O_i) + BC_B(O_i). In addition, the BC aggregation model can be classified as a positional scoring procedure (PSP), since the scores assigned to alternatives are based on their respective positions in the ranking [36,43]. On the other hand, the IRV and Coombs' models are not PSPs, as they do not award position-dependent scores: at each round they use only first-choice (or last-choice) information, treating all other positions alike.
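The additive relationship follows directly from the fact that merging the groups simply concatenates their rankings, so each alternative's rank positions are summed over both groups. A quick check of this property, on a hypothetical mini-profile:

```python
def borda_counts(rankings):
    """1 point for first place, 2 for second, etc. (paper's convention)."""
    scores = {}
    for r in rankings:
        for pos, alt in enumerate(r, start=1):
            scores[alt] = scores.get(alt, 0) + pos
    return scores

def additivity_holds(group_a, group_b):
    """Verify BC_{A+B}(O_i) = BC_A(O_i) + BC_B(O_i) for all alternatives."""
    bc_a, bc_b = borda_counts(group_a), borda_counts(group_b)
    bc_ab = borda_counts(group_a + group_b)
    return all(bc_ab[o] == bc_a.get(o, 0) + bc_b.get(o, 0) for o in bc_ab)
```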
With reference to the preference profile in Table 1, the IRV seems more prone to the multiple-district paradox than the Coombs' or BC models. Even though this observation is not necessarily general, what happens when the preference profile changes?

Further case study
Let us consider a second case study, which is similar to the previous one but characterized by a different repartition of the (new) expert rankings into two (new) sub-groups (A' and B'). Table 3 shows a first sub-group (A') consisting of thirty-four experts (i.e., eA'1 to eA'34) and the corresponding rankings, and a second sub-group (B') consisting of seven experts and the corresponding rankings.
The application of the three aggregation models (IRV, Coombs' and BC) to the new (sub-)groups of rankings (A', B' and A' + B' in Table 3) results in the nine collective rankings in Table 4. Interestingly, the multiple-district paradox occurs when applying the Coombs' model, while it does not occur when applying the IRV or BC models. It can be noticed that when expert rankings are very "polarized", as for sub-group B', the three different aggregation models tend to converge towards the same collective ranking (cf. Table 4).
The previous examples show that the occurrence of paradoxes is not easily predictable. In general, paradoxes may arise from a difficult-to-predict combination between the characteristics of (i) the aggregation model, (ii) the expert rankings and (iii) their repartition into sub-groups.
Predicting a paradox is a very complex issue and still an open problem [44]. However, it has been proven that so-called PSPs (see the definition in Sect. 2.2.2), like the BC model, are "immune" to the multiple-district paradox, due to their structural features [36,43].

Triggering factors of the paradox
Besides providing some examples of occurrence of the multiple-district paradox, the previous sub-sections showed that this paradox can affect different aggregation models, depending on the specific preference profile. Let us go deeper into the issue, trying to identify the main "triggers" of the paradox, as explained in the following points.
1. This paradox can be seen as a manifestation of incoherence in the positioning of top alternatives within collective rankings, namely between (i) the winning alternative of the sub-groups' collective rankings and (ii) the winning alternative of the combined group's collective ranking. Of course, similar manifestations of incoherence could also affect other alternatives in intermediate or bottom positions (especially for rankings characterized by a relatively large number of alternatives) without producing the paradox.
2. The paradox is probably more likely to occur for decision-making problems characterized by a relatively high degree of discordance among the expert rankings, with particular reference to the alternatives in the top positions (cf., for example, the seventeen rankings of sub-group A in Table 1).

Table 4 Collective rankings obtained by applying the IRV, Coombs', and BC models to the sub-groups (A' and B') of experts and the combined group (A' + B'). For Coombs' model, the effect of the multiple-district paradox on the top alternatives is highlighted in bold. The corresponding expert rankings are reported in Table 3.

Methodology
This section proposes a new methodology for "diagnosing" the multiple-district paradox, based on the use of some indicators of concordance and coherence. The description is organized in three sub-sections:
• The first one briefly recalls the aforementioned indicators;
• The second one illustrates the use of these indicators for decision-making problems involving rankings with a relatively limited number of alternatives (such as those previously exemplified);
• The third one shows a step-by-step technique, called the technique of partialized rankings, able to identify the potential triggering reasons of the paradox.

Concordance and coherence indicators
The proposed methodology is based on the use of three indicators:
1. The first one is Kendall's concordance coefficient, W^(m), which expresses the so-called degree of concordance (or agreement) between a set of m rankings in a single number [14,45,46,54]. The range of W^(m) is [0, 1]; it has unit value in the case of perfect agreement (i.e., all rankings coincide), while it is null in the case of total disagreement (i.e., all rankings are completely unrelated). For more detailed information on the construction and meaning of W^(m), the reader is referred to Sect. A.1 (in the Appendix).
2. The second indicator, W_k^(m+1), was recently proposed by the authors to depict the coherence between the expert rankings and the collective ranking resulting from the application of a generic (k-th) aggregation model [47]. This indicator is nothing more than Kendall's concordance coefficient applied to (m + 1) rankings consisting of: (i) the m expert rankings, denoting the preference profile; (ii) the collective ranking obtained by applying the (k-th) aggregation model to the previous expert rankings.
The coherence between the collective ranking and the expert rankings is evaluated in relative terms, comparing W_k^(m+1) with W^(m): W_k^(m+1) ≥ W^(m) denotes coherence (or positive coherence) between the collective ranking and the m rankings, while W_k^(m+1) < W^(m) denotes incoherence (or negative coherence) [47]. The latter situation can occur when a collective ranking is somehow conflicting with the m rankings.
To make the coherence assessment easier, a third synthetic indicator can be used: b_k^(m) = W_k^(m+1) / W^(m). It can be proven that b_k^(m) ≥ 1 denotes (positive) coherence, while b_k^(m) < 1 denotes incoherence. Table 5 reports the three indicators for the decision-making problem in Table 1, when considering the collective rankings resulting from the application of the (a) IRV, (b) Coombs' and (c) BC models respectively (cf. Table 2). Regardless of the aggregation model in use, the preference profile is characterized by a very low degree of concordance among experts, as evidenced by the very low W^(m) values, both for sub-groups A and B and for their combination (A + B).
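Under the assumption that b_k^(m) is the ratio W_k^(m+1)/W^(m), consistent with the coherence threshold b_k^(m) ≥ 1, the indicators can be sketched for tie-free rankings as follows (the tie correction of Eq. A.1 is omitted here):

```python
def kendall_w(rankings):
    """Kendall's concordance coefficient W^(m) for m complete rankings
    of n alternatives WITHOUT ties; each ranking is a list of
    alternatives from best to worst."""
    m, n = len(rankings), len(rankings[0])
    R = {}  # R_i: sum of the rank positions of each alternative
    for r in rankings:
        for pos, alt in enumerate(r, start=1):
            R[alt] = R.get(alt, 0) + pos
    S = sum(Ri**2 for Ri in R.values())
    return (12*S - 3*m**2*n*(n + 1)**2) / (m**2*n*(n**2 - 1))

def coherence_b(expert_rankings, collective_ranking):
    """b_k^(m) = W_k^(m+1) / W^(m): the collective ranking is appended
    as an (m+1)-th ranking; b >= 1 suggests (positive) coherence.
    Assumes W^(m) > 0."""
    w_m = kendall_w(expert_rankings)
    w_m1 = kendall_w(expert_rankings + [collective_ranking])
    return w_m1 / w_m
```

For instance, three identical rankings give W^(m) = 1 (unanimity), while appending a collective ranking that agrees with the majority of a discordant profile yields b_k^(m) > 1.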

Interpretation of the paradox
The coherence of the collective rankings with the corresponding preference profiles can be assessed by comparing the W_k^(m+1) values with the corresponding W^(m) values, i.e., through the b_k^(m) indicator. For the Coombs' and BC models, which do not trigger the paradox, the b_k^(m) values denote positive coherence (see Table 5b, c). On the other hand, the IRV model triggers the paradox, and the corresponding b_k^(m) value denotes negative coherence at the combined-group level (see Table 5a).
Moving our attention to the second application example in Table 3, something similar happens: all the b_k^(m) values related to the IRV and BC models denote positive coherence, while the one related to the Coombs' model denotes negative coherence for the combined group. Again, the multiple-district paradox results in an incoherence between the collective ranking and the expert rankings at the combined-group level (see Table 6).
The indicators W^(m), W_k^(m+1) and b_k^(m) thus respond well to the incoherence that characterizes the multiple-district paradox. However, it cannot be excluded that, for rankings with a larger number of alternatives, these indicators would be less responsive. Section 3.3 exemplifies a situation in which, in the presence of the paradox, the (local) incoherences concerning a small number (e.g., 2 or 3) of alternatives at the top of the rankings can be "masked" by other (local) incoherences concerning the alternatives in the middle and/or at the bottom of the rankings.

In the presence of the paradox, the previous examples show incoherences at the combined-group level but never at the level of single sub-groups. However, it cannot be excluded that the paradox could be triggered by incoherence in one of the sub-groups and not in the combined group.

The technique of partialized rankings
Let us exemplify a new decision-making problem with a plurality of expert rankings of four alternatives (O1 to O4), organized into two sub-groups, A" and B", including 17 and 15 rankings respectively (see Table 7). It can be noticed that these rankings are "compatible" with those in Table 1: eliminating the alternative O4 from each of the rankings in Table 7, the rankings in Table 1 are obtained [24,31]; for example, the ranking by eA7 in Table 7 is turned into the ranking by eA7 (O2 ≻ O3 ≻ O1) in Table 1. It can also be noticed that the alternative O4 is generally placed in the bottom positions of the rankings.
Applying the IRV model to the various (sub-)groups of rankings, the same paradox seen for the example in Table 1 occurs: the winner of sub-groups A" and B" is O2, while that for the combined group (A" + B") is O1. Not surprisingly, the alternative O4 is placed at the bottom of all three collective rankings (see Table 8, Step 3). Applying the indicators of interest to the rankings in Table 7, somewhat unexpected results are obtained (see Table 8):
• The W^(m) values are higher than those related to the example in Table 1, since the concordant positioning of O4 at the bottom of the rankings "masks" the discordance related to the positioning of O1, O2 and O3. Indicators are sensitive to the presence of all alternatives, including the so-called "irrelevant alternatives" [22,23].
• Despite the occurrence of the paradox, W_k^(m+1) ≥ W^(m) and b_k^(m) ≥ 1, denoting positive coherence between the (three) collective rankings and the relevant expert rankings.

Table 8 Results of the application of the step-by-step procedure to the problem in Table 7. We observe that the reasons for the paradox are already visible by excluding only the O4 alternative from the initial complete rankings. The effect of the multiple-district paradox on the top alternatives is highlighted in bold.

In other words, for rankings with a relatively large number of alternatives, the indicators in use can lose effectiveness in identifying the incoherence behind the occurrence of the multiple-district paradox. This weakness can be overcome with a simple contrivance, as illustrated below.
The basic idea is to "partialize" the initial rankings, excluding the alternatives with lower impact on the top positions and recalculating the three indicators of interest. This process can be implemented iteratively, initially considering only the top alternatives (i.e., excluding the remaining ones) and then gradually adding the alternatives in the non-top positions. Precisely:
1. The starting point of the procedure is the collective ranking generated when applying the aggregation model to the combined group, which is conventionally considered as the one that best reflects the global positioning of alternatives. Observing this collective ranking (e.g., O1 ≻ O2 ≻ O3 ≻ O4 for the problem in Table 7), it is possible to discriminate roughly between the two top alternatives (O1 and O2) and the remaining ones (O3 and O4).
2. The first iteration considers the partialized rankings, related to the sub-groups and the combined group, with the two top alternatives only (e.g., O1 and O2 in the problem in Table 7). The indicators W^(m), W_k^(m+1) and b_k^(m) are then calculated and analysed.
3. In the i-th iteration, the procedure is repeated considering the partialized rankings, related to the sub-groups and the combined group, with the first (i + 1) top alternatives. Again, the indicators are calculated and analysed.
4. The procedure is repeated until the (n − 1)-th iteration, which considers the complete rankings with all n alternatives.
5. Analysing the indicators determined in each iteration, it is possible to identify the underlying reasons for the occurrence of the paradox.
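The iterative partialization can be sketched as follows; for illustration we use a tie-free Kendall's W as the indicator, and the data in the usage check are hypothetical:

```python
def kendall_w(rankings):
    """Kendall's W for complete, tie-free rankings (best to worst)."""
    m, n = len(rankings), len(rankings[0])
    R = {}
    for r in rankings:
        for pos, alt in enumerate(r, start=1):
            R[alt] = R.get(alt, 0) + pos
    S = sum(v**2 for v in R.values())
    return (12*S - 3*m**2*n*(n + 1)**2) / (m**2*n*(n**2 - 1))

def partialize(ranking, keep):
    """Restrict a ranking to the alternatives in `keep`, preserving order."""
    return [a for a in ranking if a in keep]

def partialized_w(groups, combined_collective):
    """Iterations of the procedure: for i = 2..n, keep only the top-i
    alternatives of the combined group's collective ranking and
    recompute the concordance indicator for each (sub-)group."""
    steps = []
    for i in range(2, len(combined_collective) + 1):
        keep = set(combined_collective[:i])
        w = {name: kendall_w([partialize(r, keep) for r in rankings])
             for name, rankings in groups.items()}
        steps.append((i, w))
    return steps
```

Tracking how the indicator evolves as alternatives are re-added shows at which step the concordance (and, analogously, the coherence) picture changes.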
Returning to the example in Table 7, let us exemplify the technique of partialized rankings when applying the IRV aggregation model. For the (sub-)groups of complete rankings, the collective rankings in Table 8-Step 3 are obtained. Leaving aside the multiple-district paradox, which concerns only the two top alternatives (O1 and O2), the (collective) ranking that conventionally best reflects the global positioning of the totality of the alternatives based on the expert rankings is the one related to the combined group: O1 ≻ O2 ≻ O3 ≻ O4. Next, both the expert and the collective rankings are "partialized", omitting the non-top alternatives. The first iteration considers the partialized rankings with the two top alternatives only (O1 and O2), omitting the remaining ones (O3 and O4). The application of the IRV to the (sub-)groups of partialized rankings in Table 7 leads to the collective rankings in Table 8-Step 1. While the b_k^(m) values denote positive coherence for sub-groups A", B" and the combined group, the W^(m) values related to sub-group B" and the combined group denote a significant degree of discordance among the corresponding partialized expert rankings.
The second iteration includes the three top alternatives (O1, O2 and O3), leading to the collective rankings in Table 8-Step 2. In this case, the multiple-district paradox occurs and the underlying incoherence is detected by the indicators in use. The procedure can be further iterated considering the complete rankings (see Table 8-Step 3). In this case the paradox occurs but is not detected by the indicators. As noted earlier, the "irrelevant" alternative O4 undermines the effectiveness of the indicators in identifying the incoherence behind the paradox. In other words, the irrelevant alternative O4 attenuates the sensitivity of W^(m) and W_k^(m+1), which both grow, denoting an increase in concordance among experts; in this case, for example, all experts agree in locating the alternative O4 in the last position. This growth of concordance among experts masks the incoherence due to the paradox.

Conclusions
The paper focused on the reasons behind the occurrence of the multiple-district paradox in ranking-aggregation problems. Summarizing, it was found that:
• The occurrence of the paradox is typically associated with a very low degree of concordance among the expert rankings, with particular reference to the alternatives in the top positions.
• The occurrence of the paradox may concern different aggregation models, depending on the specific (i) preference profile and (ii) repartition of the rankings into sub-groups.
• The choice of the method used to aggregate the expert rankings into a collective one may affect the results even more than the preference profile itself.
• Some aggregation models, classified as PSPs, are "immune" to the multiple-district paradox [43].
A methodology based on the use of three indicators was proposed:
• W^(m), which measures the concordance between the expert rankings;
• W_k^(m+1) and b_k^(m), which measure the consistency between the expert rankings and the collective ranking obtained through a certain (k-th) aggregation model.
The proposed methodology makes it possible to highlight the incoherence characterizing the occurrence of the paradox, distinguishing whether it occurs at the level of sub-groups (districts) or of the combined group (multiple districts).
For rankings with a relatively large number of alternatives, the above indicators can lose responsiveness. To overcome this obstacle, a step-by-step procedure based on the progressive "partialization" of rankings was proposed. This procedure is a valid support tool for design problems involving distributed teams with (partly) conflicting opinions [38]. Additionally, the proposed methodology can be used to assess the robustness of the collective ranking obtained through a certain aggregation model [59].
Some limitations of this research are as follows:
• The proposed methodology is based on the application of specific (concordance and consistency) indicators. The choice of other indicators could lead to (at least partially) different outcomes [54].
• Although the multiple-district paradox is especially interesting for design decision-making problems in which the best alternative should be determined, it is only one of the possible paradoxes documented in the scientific literature; e.g., other paradoxes are the so-called no-show, preference-inversion and absolute-majority-loser paradoxes [34].
Regarding the future, we plan to extend the proposed methodology (i) to the use of new concordance and coherence indicators, and (ii) to the investigation of further paradoxes.

Appendix

A.1 The concordance indicator W^(m)

The degree of concordance among m expert rankings can be expressed through the so-called Kendall's coefficient of concordance, which is defined as [14,32,45,46,48][49][50][51]53]:

W^(m) = [12 · (Σ_{i=1..n} R_i²) − 3 · m² · n · (n + 1)²] / [m² · n · (n² − 1) − m · (Σ_{j=1..m} T_j)]   (A.1)

where:
• R_i = Σ_{j=1..m} r_ij is the sum of the rank positions for the i-th object, r_ij being the rank position of the object O_i according to the j-th expert;
• n is the total number of objects;
• m is the total number of rankings;
• T_j = Σ_{i=1..g_j} (t_i³ − t_i), ∀ j = 1, …, m, t_i being the number of objects in the i-th group of ties (a group is a set of tied objects) and g_j the number of groups of ties in the ranking by the j-th expert. If there are no ties in the j-th ranking, then T_j = 0.
Regarding the rank positions of tied objects (r_ij), a convention is adopted whereby they are the average rank positions that each set of tied objects would occupy if a strict dominance relationship could be expressed [52]. This convention guarantees that, for a certain j-th ranking and regardless of the presence of ties, the sum of the objects' rank positions is invariant: Σ_{i=1..n} r_ij = n · (n + 1)/2. In terms of range, W^(m) ∈ [0, 1]: W^(m) = 0 indicates the absence of concordance, while W^(m) = 1 indicates complete concordance (or unanimity). The superscript "(m)" is added by the authors to underline that the coefficient of concordance is applied to the m expert rankings and to distinguish it from another indicator, referred to as W_k^(m+1), which is applied to m + 1 rankings.
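The average-rank convention and the tie correction T_j of Eq. (A.1) can be sketched together as follows; here a ranking is encoded as a list of groups, e.g. [['O1'], ['O2','O3']] for O1 ≻ (O2 ~ O3), an encoding of ours, not the paper's:

```python
def kendall_w_with_ties(rankings):
    """Tie-corrected Kendall's W (Eq. A.1). Each ranking is a list of
    groups from best to worst; tied objects share a group and receive
    the average of the rank positions they jointly occupy."""
    n = sum(len(g) for g in rankings[0])  # total number of objects
    m = len(rankings)                     # total number of rankings
    R = {}                                # R_i: summed rank positions
    T = 0.0                               # accumulated tie correction
    for rk in rankings:
        pos = 1
        for group in rk:
            t = len(group)
            avg = pos + (t - 1) / 2       # average rank of the tied set
            for alt in group:
                R[alt] = R.get(alt, 0) + avg
            T += t**3 - t                 # contributes 0 for singletons
            pos += t
    S = sum(v**2 for v in R.values())
    return (12*S - 3*m**2*n*(n + 1)**2) / (m**2*n*(n**2 - 1) - m*T)
```

As a sanity check, two identical rankings with ties yield W^(m) = 1 (unanimity), while two completely opposite tie-free rankings yield W^(m) = 0.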
The basic idea of W_k^(m+1), recently proposed by the authors, is to analyse the level of coherence between the expert rankings and the collective ranking resulting from the application of the (k-th) aggregation model [47]. The test is based on the construction of an indicator which is nothing more than Kendall's concordance coefficient (see Eq. A.1), applied to the (m + 1) rankings consisting of (i) the m expert rankings involved in an engineering-design decision-making problem, and (ii) the collective ranking obtained by applying a generic (k-th) aggregation model to the previous m rankings. The collective ranking is actually treated as an additional, (m + 1)-th ranking.