Aggregating multiple ordinal rankings in engineering design: the best model according to the Kendall’s coefficient of concordance

Aggregating the preferences of a group of experts is a recurring problem in several fields, including engineering design; in a nutshell, each expert formulates an ordinal ranking of a set of alternatives and the resulting rankings should be aggregated into a collective one. Many aggregation models have been proposed in the literature, each showing strengths and weaknesses, in line with the implications of Arrow's impossibility theorem. Furthermore, the coherence of the collective ranking with respect to the expert rankings may change depending on: (i) the expert rankings themselves and (ii) the aggregation model adopted. This paper assesses this coherence for a variety of aggregation models, through a recent test based on Kendall's coefficient of concordance (W), and studies the characteristics of those models that are most likely to achieve higher coherence. Interestingly, the so-called Borda count model often provides the best coherence, with some exceptions in the case of collective rankings with ties. The description is supported by practical examples.


Introduction
A problem that is common to a number of fields, including engineering design, is that of aggregating multiple ordinal rankings of a set of alternatives into a collective ranking. This problem may concern the early-design stage, in which m experts (or decision-making agents: D 1 to D m ) formulate their individual rankings of n design alternatives (or objects: O 1 to O n ) (Fu et al. 2010;Frey et al. 2009;Hoyle and Chen 2011;Keeney 2009). In the simplest case, these rankings are complete, i.e.: (i) each expert is able to rank all the alternatives of interest, without omitting any of them; (ii) each ranking can be decomposed into paired-comparison relationships of strict preference (e.g., O 1 ≻ O 2 or O 1 ≺ O 2 ) and/or indifference (e.g., O 1 ~ O 2 ).
The objective of the problem is to aggregate the expert ordinal rankings into a collective one, which is supposed to reflect them as much as possible, even in the presence of diverging preferences (Weingart et al. 2005; See and Lewis 2006). For this reason, the collective ranking is often defined as social, consensus or compromise ranking (Cook 2006; Herrera-Viedma et al. 2014; Franceschini et al. 2016). Returning to the context of the early-design stage, design alternatives are often not very well defined and there are doubts about how to prioritize them (Weingart 2005; Kaldate et al. 2006; McComb et al. 2017). Although there is substantial agreement on the design criteria, the selection of design alternatives is generally driven by the different personal experience of designers (Dwarakanath and Wallace 1995). Thus arises the need to aggregate preference rankings of design alternatives that reflect the opinions of individual experts, using appropriate aggregation models (Fishburn 1973b; Franssen 2005; Cook 2006; Hazelrigg 1999; Frey et al. 2010; Katsikopoulos 2009; Ladha et al. 2003; Reich 2010; Nurmi 2012).
Alongside this, a passionate debate on the effects of Arrow's impossibility theorem in engineering design is still going on (Arrow 2012; Reich 2010; Hazelrigg 1996, 1999, 2010; Scott and Antonsson 1999; Franssen 2005; Yeo et al. 2004; McComb et al. 2017). In short, this theorem establishes that no generic aggregation model can provide a collective ranking that always satisfies several desirable properties, also known as fairness criteria, i.e., unrestricted domain, non-dictatorship, independence of irrelevant alternatives (IIA), weak monotonicity, and Pareto efficiency (Arrow 2012; Fishburn 1973a; Nisan et al. 2007; Saari 2011; Saari and Sieberg 2004; Franssen 2005; Jacobs et al. 2014).
For a given set of m expert rankings concerning n alternatives, different aggregation models may obviously lead to different collective rankings (Saari 2011; McComb et al. 2017). Identifying the model that best reflects the m rankings is not easy, also because it may change from case to case. Some researchers showed the effectiveness of specific aggregation models, even though they cannot always satisfy all of Arrow's fairness criteria (Dym, Wood and Scott 2002). Yet, Arrow's theorem does not rule out the possibility of comparing different aggregation models, identifying the best one(s) on the basis of certain tests. For example, several authors attempted to measure the coherence (or consistency) between the expert rankings and the collective one (Chiclana 2002; Maisano 2015, 2017; Franceschini and Garcia-Lapresta 2019). Other authors hypothesized a relationship between the so-called implicit agreement of the expert rankings and Arrow's fairness (McComb et al. 2017). Moreover, Katsikopoulos (2009) expressed the need for greater clarity in the discussion of engineering design methods to support decision making.
In general, the choice of the best aggregation model may depend on: (1) the specific objective(s) of the expert group and/or (2) the rationale of the test used (Dong et al. 2004;Li et al. 2007;Paulus et al. 2011;Cagan and Vogel 2012;Franceschini and Maisano 2019b).
The aim of this article is to compare four relatively popular aggregation models (i.e., the so-called Best of the best, Best two, Best three, and Borda count models), trying to answer the research question: "Which is the model producing the collective ranking that best reflects the expert rankings?". The comparison will be performed by measuring the coherence of the models, through a recent test based on the so-called Kendall's coefficient of concordance (W) (Kendall 1962; Legendre 2010). This test quantitatively evaluates the coherence of the collective ranking provided by any aggregation model, for a specific ranking-aggregation problem.
A previous study (Franceschini and Maisano 2019a) illustrated the test in general terms, regardless of the characteristics of the specific aggregation models. This work significantly extends the previous one, investigating the characteristics of the aggregation models that are most likely to achieve higher coherence, according to the above test. The new investigation generalizes earlier results, including a mathematical optimization of a specific coherence indicator. Thanks to the outcomes of this study, engineering-design management will have extra support for choosing the most "promising" aggregation models.
The remainder of this article is organized into three sections. Section 2 illustrates a case study that will accompany the description of the proposed methodology. Section 3 is divided into two parts: the first part formalizes the concept of coherence of the collective ranking with respect to the expert rankings, recalling the coherence test proposed in (Franceschini and Maisano 2019a); the second part analyzes the test itself thoroughly, showing its close link with the Borda count model. Section 4 provides a discussion of the practical implications and limitations of this research for the engineering-design field, summarizing original contributions and suggestions for future research. Further details are contained in the Appendix section.

Case study
This section contains an application example that will be used to illustrate the proposed methodology. An important hi-tech company (which is kept anonymous for reasons of confidentiality) operates predominantly in the sector of video projectors. Recent advances in imaging technology have led the company to increasingly invest in the development of hand-held projectors, also known as pocket projectors, mobile projectors, pico-projectors or mini beamers (see Fig. 1) (Borisov et al. 2018).

Fig. 1
Example of pocket projector, i.e., small hardware device designed to project content from a smartphone, camera, tablet, notebook or memory device onto a wall or other flat surface
Four design concepts of pocket projectors (O 1 to O 4 , i.e., objects) have been generated by a team of ten engineering designers (i.e., the experts of the problem: D 1 to D 10 ) during the conceptual design phase (see also the description in Fig. 2). The objective is to evaluate the aforementioned design concepts in terms of user friendliness, i.e., a measure of the ease of use of a pocket projector. Some of the factors that can positively influence this attribute are: (i) quick set-up time, (ii) intuitive controls, and (iii) good user interface.
Given the great difficulty in bringing together all the experts and making them interact to reach shared decisions, management leaned towards a different solution: a collective ranking of the four design concepts can be obtained by merging the individual rankings formulated by the ten engineering designers (Table 1 shows these rankings).
Before focusing on the possible aggregation models, let us take a step back and evaluate the experts' degree of concordance (Franceschini and Maisano 2019a). The scientific literature includes an important indicator to evaluate the overall association for more than two rankings, i.e., the so-called Kendall's coefficient of concordance, which is defined as (Kendall and Smith 1939; Kendall 1962; Fishburn 1973b; Legendre 2005, 2010):
$$W^{(m)} = \frac{12\sum_{i=1}^{n} R_i^{2} - 3\,m^{2}\,n\,(n+1)^{2}}{m^{2}\,n\,(n^{2}-1) - m\sum_{j=1}^{m} T_j} \tag{1}$$
where: $R_i = \sum_{j=1}^{m} r_{ij}$ is the sum of the rank positions for the i-th object, $r_{ij}$ being the rank position of the object $O_i$ according to the j-th expert; n is the total number of objects; m is the total number of ordinal rankings; $T_j = \sum_{i=1}^{g_j} (t_i^{3} - t_i)$, $t_i$ being the number of objects in the i-th group of ties (a group is a set of tied objects) and $g_j$ the number of groups of ties in the ranking by the j-th expert. If there are no ties in the j-th ranking, then $T_j = 0$.
Regarding the rank positions ($r_{ij}$) of the tied objects, a convention is adopted whereby they should be the average rank positions that each set of tied objects would occupy if a strict dominance relationship could be expressed (Gibbons and Chakraborti 2010). This convention guarantees that, for a certain j-th ranking and regardless of the presence of ties, the sum of the objects' rank positions is an invariant equal to:
$$\sum_{i=1}^{n} r_{ij} = \frac{n\,(n+1)}{2} \tag{2}$$
In terms of range, $W^{(m)} \in [0, 1]$: $W^{(m)} = 0$ indicates the absence of concordance, while $W^{(m)} = 1$ indicates complete concordance (or unanimity). The superscript "(m)" was added by the authors to underline that the coefficient of concordance is applied to the m expert rankings and to distinguish it from another indicator, referred to as $W^{(m+1)}$, which will be applied to m + 1 rankings.
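The average-rank convention for tied objects can be illustrated with a short Python sketch. The function name `average_ranks`, the tier-based input format and the example ranking are assumptions made here for illustration only; they do not come from the case study.

```python
def average_ranks(tiers):
    """Convert a ranking given as ordered tiers into rank positions.

    `tiers` is a list of lists, most preferred first; objects in the same
    inner list are tied and receive the average of the positions they
    would occupy under strict preference.
    """
    ranks = {}
    position = 1  # first position still to be assigned
    for tier in tiers:
        k = len(tier)
        # The tier occupies positions position .. position + k - 1.
        average = position + (k - 1) / 2
        for obj in tier:
            ranks[obj] = average
        position += k
    return ranks


# Hypothetical ranking: O1 preferred to (O2 tied with O3), preferred to O4.
ranking = [["O1"], ["O2", "O3"], ["O4"]]
ranks = average_ranks(ranking)
n = len(ranks)

print(ranks)
# The sum of the rank positions equals n(n+1)/2 whether or not ties occur.
print(sum(ranks.values()) == n * (n + 1) / 2)
```

The two tied objects share the positions 2 and 3 and therefore both receive 2.5, so the sum of the ranks stays at n(n+1)/2 = 10.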
Returning to the problem in Table 1, which does not include any ranking with ties (i.e., T j = 0 ), the formula in Eq. 1 can be applied, obtaining W (m) = 0.004 = 0.4% . This result denotes a relatively low degree of concordance among experts. Table 2 shows the calculation of the rank positions (r ij ) of the four objects, for each of the ten expert rankings in Table 1.
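As a hedged illustration of how Kendall's coefficient of concordance with the tie correction can be computed, the following Python sketch uses the standard formulation of W (sum of squared rank totals in the numerator, tie terms subtracted in the denominator). The function names and the toy rankings are assumptions introduced here; the data of Table 1 are not reproduced.

```python
from collections import Counter


def tie_correction(ranks):
    """T_j: sum of (t^3 - t) over the groups of tied objects in one ranking."""
    return sum(t**3 - t for t in Counter(ranks).values())


def kendall_w(rank_matrix):
    """Kendall's W for a list of rankings.

    rank_matrix[j][i] is the rank position of object i according to expert j
    (average ranks are expected for tied objects).
    """
    m = len(rank_matrix)
    n = len(rank_matrix[0])
    # R_i: sum of the rank positions of object i over the m experts.
    totals = [sum(row[i] for row in rank_matrix) for i in range(n)]
    ties = sum(tie_correction(row) for row in rank_matrix)
    numerator = 12 * sum(x * x for x in totals) - 3 * m**2 * n * (n + 1) ** 2
    denominator = m**2 * n * (n**2 - 1) - m * ties
    return numerator / denominator


# Hypothetical examples with n = 4 objects.
unanimous = [[1, 2, 3, 4]] * 3
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]
unanimous_with_tie = [[1, 2.5, 2.5, 4]] * 2

print(kendall_w(unanimous))           # complete concordance
print(kendall_w(opposed))             # absence of concordance
print(kendall_w(unanimous_with_tie))  # complete concordance despite the tie
```

The third example shows why the tie term matters: without it, two identical rankings containing a tie would not reach W = 1.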
Inspired by different design strategies, the team of engineering designers decides to consider four popular aggregation models from the scientific literature (Saari 2011; McComb et al. 2017; Franceschini and Maisano 2019a). A brief description of these models follows:
(i) Best of the best model (BoB or standard plurality vote). For each ranking, the most preferred design concept obtains one point. The resulting collective ranking, based on the data in Table 2, is reported in Table 3.
(ii) Best two model (BTW). For each ranking, the two most preferred design concepts obtain one point each. For example, this model is used for municipal elections of City Commissioner in some major U.S. cities (Boyd and Markman 1983).
(iii) Best three model (BTH or vote for three). For each ranking, the three most preferred design concepts obtain one point each (i.e., this is equivalent to neglecting the worst design concept). The resulting collective ranking is reported in Table 3; Table 3(iii) contains the intermediate calculations. Whilst this model is less common than the above models, it is occasionally used for municipal elections in several city councils (Stark 2008).
(iv) Borda count model (BC). For each expert ranking, the first design concept accumulates one point, the second two points, and so on (Borda 1781). According to this model, the cumulative scores of the design concepts are calculated as:
$$BC(O_i) = \sum_{j=1}^{m} r_{ij} = R_i, \quad i = 1, \ldots, n \tag{3}$$
$BC(O_1)$, $BC(O_2)$, $BC(O_3)$ and $BC(O_4)$ being the so-called Borda counts related to the four design concepts. Of course, the degree of preference of an i-th design concept decreases as the corresponding $BC(O_i)$ increases. The collective ranking for this specific case is reported in Table 3. In addition to being used for engineering design (Dym et al. 2002; McComb et al. 2017), the BC model is also used for: (1) political elections in several countries, (2) internal elections in some professional and technical societies (e.g., board of governors in the International Society for Cryobiology, board of directors in the X.Org Foundation, research area committees in the U.S. Wheat and Barley Scab Initiative, etc.), and (3) a variety of other contexts (e.g., world champion of "Public Speaking" contest by Toastmasters International, "RoboCup" autonomous robot soccer competition at the University of Bremen in Germany, ranking of NCAA college teams, etc.) (Emerson 2013).
Reflecting different design strategies, the four aggregation models produce four different collective rankings in this case (see overview in Table 3). Even more surprising is that the best pocket projector design concept (i.e., the object at the top of each collective ranking) is different for each of the four aggregation models. Although this plurality of results may at first glance confuse the reader, it is in some measure justified by the low degree of concordance of the expert rankings (i.e., W (m) = 0.004, as seen before). Additionally, this plurality of results raises the question: "Which is the model producing the collective ranking that best reflects the expert rankings?". To answer this question, a test can be used to measure the coherence between (1) the expert rankings and (2) the collective ranking obtained through each model.
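The four aggregation models can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the assumption that the expert rankings contain no ties, and the five hypothetical expert rankings below are all introduced here and do not correspond to Table 1. Ties in the collective ranking receive average rank positions, following the convention of Sect. 2.

```python
def ranks_from_scores(scores, higher_is_better):
    """Turn object scores into collective rank positions (average ranks for ties)."""
    keys = [-s if higher_is_better else s for s in scores]
    ranks = []
    for key in keys:
        better = sum(1 for other in keys if other < key)
        equal = sum(1 for other in keys if other == key)
        # Tied objects share positions better+1 .. better+equal.
        ranks.append(better + (equal + 1) / 2)
    return ranks


def best_q(rank_matrix, q):
    """One point per expert to each object ranked in the first q positions."""
    n = len(rank_matrix[0])
    scores = [sum(1 for row in rank_matrix if row[i] <= q) for i in range(n)]
    return ranks_from_scores(scores, higher_is_better=True)


def borda(rank_matrix):
    """Borda count: sum of rank positions; a lower total is preferred."""
    n = len(rank_matrix[0])
    scores = [sum(row[i] for row in rank_matrix) for i in range(n)]
    return ranks_from_scores(scores, higher_is_better=False)


# Hypothetical data: 5 experts (rows) ranking 4 objects O1..O4 (columns).
experts = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [4, 1, 2, 3],
    [3, 1, 2, 4],
    [4, 2, 1, 3],
]

print("BoB:", best_q(experts, 1))
print("BTW:", best_q(experts, 2))
print("BTH:", best_q(experts, 3))
print("BC: ", borda(experts))
```

With these hypothetical data, the BoB model puts O1 and O2 jointly on top, whereas the BC model puts O2 alone on top, which mirrors the kind of disagreement between models discussed above.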

Testing and maximizing the coherence
This section is divided into two parts: the first one recalls the concept of coherence and the so-called W (m+1) test, while the second one analytically studies the maximization of the coherence itself.

The W (m+1) test
The basic idea of the W (m+1) test, recently proposed by the authors (Franceschini and Maisano 2019a), is to analyse the level of coherence between the expert rankings and the collective ranking resulting from the application of the (k-th) aggregation model. The test is based on the construction of an indicator, denominated W (m+1) k , which is nothing more than Kendall's concordance coefficient (see Eq. 1), applied to the (m + 1) rankings consisting of:
• The m expert rankings, involved in an engineering-design decision problem;
• The collective ranking obtained by applying the (k-th) aggregation model to the previous m rankings. The collective ranking is actually treated as an additional (m + 1)-th ranking.
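The formula of the indicator $W^{(m+1)}_k$ follows:
$$W^{(m+1)}_k = \frac{12\sum_{i=1}^{n}(R_i + r_i)^{2} - 3\,(m+1)^{2}\,n\,(n+1)^{2}}{(m+1)^{2}\,n\,(n^{2}-1) - (m+1)\left(\sum_{j=1}^{m} T_j + T_{m+1}\right)} \tag{4}$$
where $r_i$ is the rank position of the i-th object in the collective ranking (obviously $r_i \in [1, n]$) and $T_{m+1}$ is the tie term of Eq. 1 computed for the collective ranking. In case of tied objects, the same convention described in Sect. 2 is adopted.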
Going back to the case study, the indicator W (m+1) k can be determined by applying the formula in Eq. (4) to the ten rankings in Table 1 plus the collective ranking resulting from the application of each aggregation model. Table 4 reports the resulting W (m+1) k values; subscript "k" denotes a generic aggregation model, k: BoB, BTW, BTH, BC. For this specific problem, the BC model is the one with the highest coherence ( W (m+1) BC ≈ 2.00%).
In this specific case, the condition W (m+1) k ≥ W (m) holds for each k-th aggregation model, depicting a certain coherence (or positive coherence) between the corresponding collective ranking and the m rankings. An opposite result (i.e., W (m+1) k < W (m) ) would depict incoherence (or negative coherence). Even though the latter situation is in some ways paradoxical, it can occur when a collective ranking is somehow conflicting with the m rankings (Franceschini and Maisano 2019a).
To quantitatively measure the degree of coherence of an aggregation model, the synthetic indicator b (m) k of Eq. (5), which relates W (m+1) k to W (m) , can be used (Franceschini and Maisano 2019a). For a given set of alternative aggregation models, the most coherent can be considered the one that maximizes b (m) k ; in formal terms, the model for which:
$$k^{*} = \arg\max_{k}\; b^{(m)}_k$$
The last column of Table 4 reports the b (m) k values related to the four aggregation models of interest. It is worth noticing that this indicator allows a quick and practical quantitative comparison.
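The mechanics of the W (m+1) test can be sketched in Python as follows. The sketch only compares W (m+1) k with W (m) and does not compute the synthetic indicator b (m) k. The function names, the five hypothetical expert rankings and the three candidate collective rankings are assumptions introduced here for illustration.

```python
from collections import Counter


def kendall_w(rank_matrix):
    """Kendall's W with tie correction; rank_matrix[j][i] = rank of object i by ranking j."""
    m, n = len(rank_matrix), len(rank_matrix[0])
    totals = [sum(row[i] for row in rank_matrix) for i in range(n)]
    ties = sum(
        sum(t**3 - t for t in Counter(row).values()) for row in rank_matrix
    )
    numerator = 12 * sum(x * x for x in totals) - 3 * m**2 * n * (n + 1) ** 2
    return numerator / (m**2 * n * (n**2 - 1) - m * ties)


def w_m_plus_1(rank_matrix, collective):
    """Kendall's W after appending the collective ranking as an (m+1)-th ranking."""
    return kendall_w(list(rank_matrix) + [list(collective)])


# Hypothetical data: 5 experts ranking 4 objects.
experts = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [4, 1, 2, 3],
    [3, 1, 2, 4],
    [4, 2, 1, 3],
]

# Hypothetical candidate collective rankings (average ranks for ties).
candidates = {
    "BC": [3, 1, 2, 4],
    "BoB": [1.5, 1.5, 3, 4],
    "reversed BC": [2, 4, 3, 1],
}

w_m = kendall_w(experts)
print(f"W(m) = {w_m:.3f}")
for name, collective in candidates.items():
    w_k = w_m_plus_1(experts, collective)
    verdict = "coherent" if w_k >= w_m else "incoherent"
    print(f"{name}: W(m+1) = {w_k:.3f} ({verdict})")
```

In this toy problem the Borda ranking raises the concordance the most, the plurality ranking raises it less, and a deliberately reversed ranking lowers it below W (m), which is the incoherence situation described above.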

Maximization of $b^{(m)}_k$
Let us now focus on the synthetic indicator b (m) k . Substituting Eqs. (1) and (4) into Eq. (5) yields the general expression of b (m) k (Eq. 7). This expression is deliberately general, as it contemplates the possibility of:
• ties, i.e., relationships of indifference ("~") and not only of strict preference ("≻" or "≺"), among the objects within the m expert rankings;
• ties among the objects within the collective ranking, or (m + 1)-th ranking.
By grouping some terms, Eq. (7) can be reformulated in a more compact form (Eq. 8). It can be seen that the indicator b (m) k includes four types of contributions:
(i) a term concerning the m rankings (and therefore the $r_{ij}$ and $R_i$ values, also known as experts' preference profile) and the parameters related to the "size" of the problem (i.e., n and m);
(ii) $24\sum_{i=1}^{n} R_i r_i$, concerning a mixture of the experts' preference profile (through the $R_i$ values) and the ranks of the collective ranking (through the $r_i$ values);
(iii) $12\sum_{i=1}^{n} r_i^{2}$, concerning the $r_i$ values of the collective ranking;
(iv) $W^{(m)}(m+1)\,T_{m+1}$, concerning a mixture of the experts' preference profile (through $W^{(m)}(m+1)$) and the $T_{m+1}$ value related to the collective ranking.
Note that the first contribution (i) is not related to the results of the collective ranking. Instead, the remaining three contributions, (ii), (iii) and (iv), are all related to the collective ranking. In line with the research question behind this study, let us try to identify the aggregation model that produces the most coherent collective ranking, through the maximization of b (m) k , operating on these three contributions. Additionally, note that the analytic maximization of Eq. (8) as a function of the $r_i$ values is relatively complex for two reasons: (i) the $r_i$ values are variables defined on a discrete domain; (ii) the $r_i$ values are explicit in some terms (i.e., $24\sum_{i=1}^{n} R_i r_i$ and $12\sum_{i=1}^{n} r_i^{2}$) and implicit in others (i.e., $W^{(m)}(m+1)\,T_{m+1}$, where possible ties in the $r_i$ collective ranks affect the $T_{m+1}$ term). In the following subsections, the major terms of Eq. (8) will be analysed separately, although they are closely related. A more rigorous, though laborious, alternative could be performing numerical maximization through Monte Carlo simulations.

Analysis of $\sum_{i=1}^{n} R_i r_i$
This term can be interpreted as a scalar product between two vectors: R = (R 1 , …, R n ) and r = (r 1 , …, r n ). In general, the scalar product of two vectors (r and R) with predetermined moduli is maximized if these vectors are completely aligned, i.e., when direct proportionality between the relevant components occurs: r i ∝ R i . With reference to the problem of interest, this perfect alignment can hardly be achieved in practice, because the r i components are rank positions ∈ [1, n], with constant sum equal to n⋅(n + 1)/2. Subject to this constraint, it can be demonstrated that the Borda model provides a collective ranking that maximises the term $\sum_{i=1}^{n} R_i r_i$.

Analysis of $\sum_{i=1}^{n} r_i^{2}$
We note that this term corresponds to the squared modulus of the collective-rank vector r = (r 1 , …, r n ). It can be demonstrated that r has the maximum-possible modulus when its components are a permutation of the natural numbers between 1 and n, i.e., in the absence of ties (see the proof in Appendix A.2). Precisely, the maximum-possible value of the term of interest is n⋅(n + 1)⋅(2⋅n + 1)/6 (Gibbons and Chakraborti 2010). If there is a tie, however, this term decreases. The most disadvantageous case would be the one with an overall tie of all the alternatives (i.e., $r_1 = r_2 = \ldots = r_n = \frac{n+1}{2}$), with a consequent value of the term of $\left(\frac{n+1}{2}\right)^{2} \cdot n$; therefore:
$$\frac{n\,(n+1)^{2}}{4} \;\le\; \sum_{i=1}^{n} r_i^{2} \;\le\; \frac{n\,(n+1)\,(2n+1)}{6}$$

Analysis of T m+1
This term is maximized in the case of an overall tie of all the alternatives in the collective ranking, i.e., r 1 = r 2 = … = r n , with the value of $n^{3} - n$ (see Sect. 2). In the case of no ties, it is obviously equal to zero. Therefore, the range of $T_{m+1}$ is:
$$0 \;\le\; T_{m+1} \;\le\; n^{3} - n$$

Close link between b (m) k and the Borda count model
In light of the previous analyses, it is possible to outline two potential situations, as described in the following subsections.

Absence of ties in the collective ranking
In the absence of ties in the collective ranking, the two terms $\sum_{i=1}^{n} r_i^{2}$ and $T_{m+1}$ become constants, i.e., respectively:
$$\sum_{i=1}^{n} r_i^{2} = \frac{n\,(n+1)\,(2n+1)}{6} \quad \text{and} \quad T_{m+1} = 0$$
It can be proven that the BC model is the one maximizing the term $\sum_{i=1}^{n} R_i r_i$ (see demonstration in Appendix A.1); given the constancy of the aforementioned two other terms, the BC model will also maximize b (m) k in this situation. To better focus on this result, it may be appropriate to consider the general expression of W (m) (in Eq. 1) more closely. It can be observed that the R i values (e.g., those reported at the bottom of Table 2) are the same as the Borda scores assigned to the individual objects of interest (i.e., BC(O i ), ∀i = 1, …, n, as exemplified in Eq. 3) (Cook and Seiford 1982). In this situation, the expression of b (m) k in Eq. (7) can be reformulated in terms of the Borda scores; pooling the terms that do not depend on the collective ranking (i.e., all except for the r i terms), the following compact expression can be obtained:
$$b^{(m)}_k = a \cdot \sum_{i=1}^{n}\left[BC(O_i) \cdot r_i\right] + b \tag{12}$$
a and b being two terms that, for a given problem, can be treated as constants, as they depend exclusively on n, m, R i = BC(O i ) (∀i = 1, …, n), and T j (∀j = 1, …, m). Equation (12) highlights the close link between the collective ranks (r i ) and the Borda scores related to the m expert rankings: in the absence of ties in the collective ranking, the BC model maximizes the term $\sum_{i=1}^{n}\left[BC(O_i) \cdot r_i\right]$, and therefore also b (m) k , determining the maximum alignment (or projection) of the two vectors r = (r 1 , …, r n ) and BC = (BC(O 1 ), …, BC(O n )).

Presence of ties in the collective ranking
The presence of ties in the collective ranking can affect the maximisation of b (m) k in a somewhat unpredictable way: although it reduces the term ∑ n i=1 r 2 i , it also increases the term (m + 1)T m+1 (cf. Eq. 8). Thus, the overall effect on b (m) k is not easily predictable and should be considered on a case-by-case basis; this also emerges from the additional asymptotic analysis of b (m) k , contained in Appendix A.4.
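The two opposing effects of ties can be made concrete with a small Python sketch that evaluates the sum of squared collective ranks and the tie term for three hypothetical collective rankings of n = 4 objects. The helper names and the example rankings are assumptions made here for illustration.

```python
from collections import Counter


def sum_of_squares(ranks):
    """Sum of the squared rank positions of a collective ranking."""
    return sum(r * r for r in ranks)


def tie_term(ranks):
    """Sum of (t^3 - t) over the groups of tied objects in a ranking."""
    return sum(t**3 - t for t in Counter(ranks).values())


n = 4
examples = {
    "no ties": [1, 2, 3, 4],
    "one tie": [1.5, 1.5, 3, 4],
    "overall tie": [2.5, 2.5, 2.5, 2.5],
}

for label, ranks in examples.items():
    print(label, sum_of_squares(ranks), tie_term(ranks))

# Bounds discussed in the text, evaluated for n = 4.
print("lower bound of the sum of squares:", n * (n + 1) ** 2 / 4)
print("upper bound of the sum of squares:", n * (n + 1) * (2 * n + 1) / 6)
print("upper bound of the tie term:", n**3 - n)
```

Moving from no ties to an overall tie, the sum of squares falls from 30 to 25 while the tie term grows from 0 to 60, so the two contributions pull in opposite directions.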
Reversing the perspective, in the presence of ties, the maximization of

Discussion
Leaving the mathematical issues, this section focuses on (i) practical implications and limitations of this research for the engineering-design field, and (ii) original contributions and ideas for future research. These topics are covered in the following two sub-sections respectively.

Implications and limitations for engineering design
In early-design stages, initial decisions often have to be made when information is incomplete and many goals are contradictory, leading to situations of conflict between (co-)designers. Managing the conflict that emerges from multi-design interaction is therefore a critical element of collaborative design (Grebici et al. 2006). According to some authors, conflict itself is the process through which ideas are validated and developed: "the engine of design" (Brown 2013). Considering the problem of interest, the engineering-design conflict finds its shape in the (discordant) object rankings, which are formulated by the individual designers; for example, this conflict is quite evident from the m rankings in the case study (see Table 1). The collective ranking represents a way to solve this conflict and the aggregation model therefore represents a sort of conflict-management tool. However, the plurality of aggregation models makes their selection non-trivial: any aggregation model is by definition imperfect and may provide more or less sound results, depending on the specific ranking-aggregation problem (Arrow 2012). In this research, the coherence between the collective ranking and the corresponding m rankings was considered as a selection criterion; in fact, the indicator b (m) k makes it possible to identify the most coherent aggregation model(s) in different practical situations.
In cases where a certain conflict between collective ranking and expert rankings is observed, decision makers can deepen the analysis, identifying those expert rankings that represent the main sources of incoherence. A possible in-depth analysis could be based on the calculation of Spearman's rank correlation coefficient (ρ; see footnote 1) between the collective ranking and each of the expert rankings. Intuitively, the Spearman correlation between two rankings will be high (i.e., tending towards +1) when objects have a similar rank in the two rankings, and low (i.e., tending towards −1) when objects have a dissimilar (or even opposed) rank in the two rankings. Of course, the expert rankings that contribute most to incoherence are those with negative ρ values.
For the purpose of example, let us return to the case study; Table 5 reports the ρ values between the collective ranking related to the application of the BC model (in Table 3) and the corresponding expert rankings (in Table 1). The expert rankings most in contrast with the collective ranking are those by experts e 1 and e 2 (both with ρ values of − 0.632), followed by those by experts e 6 and e 7 (both with ρ values of − 0.316). It could be interesting for the engineering-design management to identify the reasons for the misalignment of these experts (e.g., different view or poor understanding of some design concepts, errors in the ranking formulation, etc.).
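The per-expert diagnosis can be sketched in Python by computing Spearman's ρ as the Pearson correlation of the rank values. The function name `spearman_rho`, the expert labels and the hypothetical rankings below are assumptions for illustration; they are not the data behind Table 5, and p-values are not computed.

```python
from math import sqrt


def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho as the Pearson correlation between two vectors of ranks."""
    n = len(ranks_x)
    mean_x = sum(ranks_x) / n
    mean_y = sum(ranks_y) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks_x, ranks_y))
    spread_x = sum((x - mean_x) ** 2 for x in ranks_x)
    spread_y = sum((y - mean_y) ** 2 for y in ranks_y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical collective ranking and expert rankings of 4 objects.
collective = [3, 1, 2, 4]
experts = {
    "e1": [1, 2, 3, 4],
    "e2": [3, 1, 2, 4],
    "e3": [2, 4, 3, 1],
}

for name, ranks in experts.items():
    rho = spearman_rho(collective, ranks)
    flag = " (in contrast with the collective ranking)" if rho < 0 else ""
    print(f"{name}: rho = {rho:+.3f}{flag}")
```

Experts whose ρ is negative would be the first candidates for a closer look, in the spirit of the analysis described above.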
Among the jungle of possible aggregation models existing in the scientific literature, this research has considered exclusively aggregation models characterised by simplicity and ease of understanding. It is well known that simple and intuitive models are more easily "digested" and implemented by management than obscure and complicated ones. To quote a phrase by Leonardo da Vinci, "Simplicity is the ultimate sophistication". In line with that, our analysis was limited to four simple, intuitive and popular aggregation models, showing that the traditional BC generally provides very coherent results. The authors believe that this is a relevant indication for engineering-design management when selecting the most "promising" aggregation models for a certain ranking-aggregation problem.
The proposed study has several limitations, summarised in the following points:
• Since the present analysis makes extensive use of the W (m+1) test, it "inherits" the limitations associated with it (Franceschini and Maisano 2019a), i.e.: (i) the test does not consider the (possible) uncertainty in expert rankings, and (ii) the test allows only an ex post (i.e., case-by-case) analysis of the impact of aggregation models.
• The study revealed that the BC model often provides the best coherence. Over and above the merits of the BC model, this result is also due to the structural characteristics of the W (m+1) test; in fact, being based on Kendall's coefficient of concordance, this test is somehow related to the BC model (cf. Section 3.3) (Cook and Seiford 1978, 1982). This aspect in some ways limits the proposed coherence analysis: measuring coherence through another indicator would not necessarily lead to the same results. As an example, one could use Cronbach's alpha; a new indicator, similar to b (m) k , could then be defined from $\alpha^{(m)}_{C}$ and $\alpha^{(m+1)}_{Ck}$, i.e., respectively the Cronbach's alpha related to the m expert rankings and to the (m + 1) rankings obtained when adding the collective ranking of the (k-th) aggregation model (cf. Sect. 3.1). However, the choice fell on W for several reasons: (i) it is specific for judgments expressed in the form of rankings and not in other forms, such as on cardinal scales (Hammond et al. 2015); (ii) the W-distributional properties are well known (Kendall 1962; Gibbons and Chakraborti 2010); (iii) it is intuitive and relatively easy to implement (Franceschini and Maisano 2019a).
• The proposed analysis considers only complete expert rankings, where all objects are ranked through strict preference and/or indifference relationships only. Nevertheless, some practical contexts may make it difficult to formulate complete rankings, e.g., problems with many alternatives, where experts can face practical impediments or do not have the concentration to formulate complete rankings. In such cases, experts may prefer to formulate incomplete rankings, which include the most/least relevant objects only and/or deliberately exclude some other objects (Franceschini and Maisano 2019b, 2020).
Table 5 Spearman's ρ correlation coefficients with relevant p-values for the BC-model collective ranking (in Table 3) and each of the ten expert rankings in Table 1
Footnote 1: Spearman's ρ can be seen as a special case of the Pearson correlation coefficient of two sets of variables (e.g., X and Y), where the values of the variables are replaced with rank values (e.g., r X and r Y ) before calculating the coefficient itself (Spearman 1904; Ross 2009; Myers et al. 2010):
$$\rho_{XY} = \frac{\sum_{i=1}^{n}\left[r_X(O_i) - \bar{r}_X\right]\left[r_Y(O_i) - \bar{r}_Y\right]}{\sqrt{\sum_{i=1}^{n}\left[r_X(O_i) - \bar{r}_X\right]^{2} \cdot \sum_{i=1}^{n}\left[r_Y(O_i) - \bar{r}_Y\right]^{2}}}$$
where: X and Y are subscripts referred to two generic experts (e X and e Y ), who have formulated their respective rankings of n objects (O 1 , O 2 , …, O n ); r X (O i ) and r Y (O i ) are the rank values of the i-th object, formulated by e X and e Y respectively; $\bar{r}_X$ and $\bar{r}_Y$ are the mean rank values of the objects, considering the rankings formulated by e X and e Y .

Original contributions and ideas for future research
This paper analysed the coherence of alternative aggregation models, trying to answer the research question: "Which is the model producing the collective ranking that best reflects the expert rankings?". The coherence between the m-expert rankings and the collective ranking (which is obtained through a certain aggregation model) was assessed using the W (m+1) test and the corresponding synthetic indicator b (m) k (Franceschini and Maisano 2019a). It was found that the BC model offers, with some exceptions, the best coherence. Precisely, when no ties appear among the objects of the collective ranking, it was analytically shown that the BC model maximizes both the indicators W (m+1) and b (m) k . Admitting ties, instead, the BC model's collective ranking is not necessarily the best one, although it is generally close to it (cf. Appendix A.3).
The above result confirms the versatility and practicality of the BC model, which, in spite of some inevitable imperfections (Dym et al. 2002; Arrow 2012), remains intuitive, easy to implement, computationally light and coherent in its results.
Regarding the future, we plan to explore in greater depth the gap between (i) the collective ranking resulting from the BC model and (ii) the one maximizing coherence, based on a large number of tests and experimental simulations.
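
Appendix A.1
Rearrangement inequality (Hardy et al. 1952). For real numbers R 1 ≤ ⋯ ≤ R n and r 1 ≤ ⋯ ≤ r n and for every permutation p of {1, …, n}:
$$\sum_{i=1}^{n} R_i\, r_{n-i+1} \;\le\; \sum_{i=1}^{n} R_i\, r_{p(i)} \;\le\; \sum_{i=1}^{n} R_i\, r_i$$
If the numbers are different, meaning that R 1 < ⋯ < R n and r 1 < ⋯ < r n , then the lower bound is attained only for the permutation which reverses the order, i.e., p(i) = n − i + 1 for all i = 1, …, n, and the upper bound is attained only for the identity, i.e., p(i) = i for all i = 1, …, n.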
Proof (by induction). Observe first that the condition
$$R_1 > R_2 \quad \text{and} \quad r_1 > r_2$$
implies
$$(R_1 - R_2)(r_1 - r_2) > 0, \quad \text{i.e.,} \quad R_1 r_1 + R_2 r_2 > R_1 r_2 + R_2 r_1,$$
hence the result is true if n = 2. Assume that it is true for n − 1, let
$$R_1 > \cdots > R_n \quad \text{and} \quad r_1 > \cdots > r_n,$$
and choose a permutation p for which the arrangement $\sum_{i=1}^{n} R_i\, r_{p(i)}$ is maximal. If p(n) were different from n, say p(n) = k, there would exist j < n such that p(j) = n. But R j > R n and r k > r n and, hence, R n ⋅r n + R j ⋅r k > R n ⋅r k + R j ⋅r n , by what has just been proved. Consequently, it would follow that the permutation q coinciding with p, except at j and n, where q(j) = k and q(n) = n, gives rise to a better result. This contradicts the choice of p. Hence p(n) = n, and from the induction hypothesis, p(i) = i ∀ i < n (Hardy et al. 1952).
Since, for a certain sequence of $R_i$ values, the BC model provides a sequence of $r_i$ values that, in addition to complying with the constraint that their sum is constant and equal to $n(n+1)/2$, also maintains the order, it can be inferred that the BC model maximizes the term of interest, $\sum_{i=1}^{n} R_i r_i$. The same consideration also applies if the BC model results in some ties between the objects in the collective ranking.
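The result above can be checked by brute force on a small case. The following is a minimal sketch, not part of the original analysis: it enumerates every permutation $p$ and evaluates $\sum_{i} R_i \, r_{p(i)}$, confirming that the identity gives the maximum and the order-reversing permutation gives the minimum when both sequences are strictly ordered. The function names and the numerical values of `R` and `r` are hypothetical, chosen only for illustration (the `R` values sum to $m \cdot n(n+1)/2$ with $m = 4$ and $n = 5$).

```python
from itertools import permutations


def product_sum(R, r, p):
    """Sum of R[i] * r[p[i]] for a permutation p given as 0-based indices."""
    return sum(R[i] * r[p[i]] for i in range(len(R)))


def extreme_permutations(R, r):
    """Brute-force the permutations that minimise and maximise the product sum."""
    perms = list(permutations(range(len(R))))
    sums = [product_sum(R, r, p) for p in perms]
    lowest, highest = min(sums), max(sums)
    argmin = [p for p, s in zip(perms, sums) if s == lowest]
    argmax = [p for p, s in zip(perms, sums) if s == highest]
    return argmin, argmax


# Hypothetical data: both sequences strictly decreasing, as in the proof.
R = [19, 14, 11, 9, 7]  # rank sums of n = 5 objects over m = 4 experts (illustrative)
r = [5, 4, 3, 2, 1]     # collective ranks sorted in the same order as R

argmin, argmax = extreme_permutations(R, r)
print("maximising permutation(s):", argmax)
print("minimising permutation(s):", argmin)
```

Exhaustive enumeration is only feasible for small $n$ (here $5! = 120$ permutations), which is sufficient for a sanity check of the inequality.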

A.2. Ties in the BC collective ranking
The BC model determines a collective ranking (i.e., a set of $r_i$ values) that is sorted similarly to the $R_i$ values: if $R_k$ is greater than (or exactly equal to) $R_j$, then $r_k$ is also greater than (or exactly equal to) $r_j$.
Considering the $R_i$ values of $n$ objects, we assume that two of these values, respectively in positions $j$ and $j + 1$, coincide. The BC model (cf. Sect. 2) will then produce a corresponding collective ranking, containing a tie between the two corresponding objects in positions $j$ and $j + 1$.
In this case, the sum of the component-by-component products between the two rankings is:
$$\sum_{i=1}^{n} R_i r_i = \sum_{i \neq j,\, j+1} R_i r_i + R_j r_j + R_{j+1} r_{j+1}. \tag{18}$$
Focusing on the last two terms on the right-hand side of Eq. 18 and imposing (1) $R_j = R_{j+1} = R$, which reflects the initial hypothesis of a tie between the two $R_i$ values, and (2) $r_j = r_{j+1} = [j + (j + 1)]/2 = j + \tfrac{1}{2}$, which reflects the average rank of the tied objects in the collective ranking, Eq. (18) results in:
$$\sum_{i=1}^{n} R_i r_i = \sum_{i \neq j,\, j+1} R_i r_i + R\,(2j + 1). \tag{19}$$
Additionally, it can be demonstrated that the sum of the squares of the $r_i$ values is:
$$\sum_{i=1}^{n} r_i^2 = \sum_{i \neq j,\, j+1} r_i^2 + 2\left(j + \tfrac{1}{2}\right)^2 = \sum_{i \neq j,\, j+1} r_i^2 + 2j^2 + 2j + \tfrac{1}{2}. \tag{20}$$
If we eliminate the tie between $r_j = j + \tfrac{1}{2}$ and $r_{j+1} = j + \tfrac{1}{2}$, replacing the respective values with other values compatible with the condition $\sum_{i=1}^{n} r_i = n(n+1)/2$, we will obtain $r_j = j$ and $r_{j+1} = j + 1$ (or, vice versa, $r_j = j + 1$ and $r_{j+1} = j$). In this case, the sum of the squares becomes
$$\sum_{i=1}^{n} r_i^2 = \sum_{i \neq j,\, j+1} r_i^2 + j^2 + (j + 1)^2 = \sum_{i \neq j,\, j+1} r_i^2 + 2j^2 + 2j + 1,$$
which is greater than that in Eq. 20, due to the last term: i.e., 1 instead of ½. It can therefore be deduced that the term $\sum_{i=1}^{n} r_i^2$ is maximized in the absence of ties in the collective ranking.
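The effect of breaking a two-object tie can also be verified numerically. The sketch below uses hypothetical rank sums (not taken from the paper's tables) in which the objects in positions $j = 2$ and $j + 1 = 3$ share the same $R$ value; it compares the tied collective ranking (average rank $j + \tfrac{1}{2}$) with the tie-free one. Exact fractions are used so that the difference of ½ is not blurred by floating-point rounding. Variable and function names are illustrative assumptions.

```python
from fractions import Fraction


def cross_and_squares(R, r):
    """Return (sum of R_i * r_i, sum of r_i^2) for rank sums R and collective ranks r."""
    cross = sum(Ri * ri for Ri, ri in zip(R, r))
    squares = sum(ri * ri for ri in r)
    return cross, squares


# Hypothetical rank sums for n = 5 objects; positions 2 and 3 coincide (R = 12).
R = [6, 12, 12, 14, 16]

# Collective ranking with the tie: both tied objects get the average rank 2.5.
r_tied = [Fraction(1), Fraction(5, 2), Fraction(5, 2), Fraction(4), Fraction(5)]
# Collective ranking after breaking the tie (ranks 2 and 3).
r_untied = [Fraction(1), Fraction(2), Fraction(3), Fraction(4), Fraction(5)]

cross_tied, squares_tied = cross_and_squares(R, r_tied)
cross_untied, squares_untied = cross_and_squares(R, r_untied)

print("sum R_i r_i  (tied, untied):", cross_tied, cross_untied)
print("sum r_i^2    (tied, untied):", squares_tied, squares_untied)
print("gain in sum r_i^2 from breaking the tie:", squares_untied - squares_tied)
```

In this example the product sum is unaffected by the tie, since $R\,(j + \tfrac{1}{2}) + R\,(j + \tfrac{1}{2}) = R\,j + R\,(j + 1)$, while the sum of squares grows by exactly ½ once the tie is removed.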
Obviously, the above considerations can be extended to rankings with ties between more than two objects and/or rankings with different groups of ties.

A.3. Additional example
Table 6 exemplifies a fictitious problem consisting of m = 5 expert rankings, concerning n = 5 alternatives ($O_1$ to $O_5$); for the sake of simplicity, the above rankings do not include ties. Table 7 reports the collective rankings obtained through the four different aggregation models in Sect. 2 (i.e., BoB, BTW, BTH and BC) and several related indicators:
• $W^{(m)} = 74.4\%$, which depicts a relatively high degree of concordance among the expert rankings;
• The $W^{(m+1)}_k$ values, specifying the contributions of the three terms discussed in Sect. 3.2.
The presence of ties for some collective rankings does not guarantee the achievement of the maximum value for the indicator $b^{(m)}_k$ through the BC model (Sect. 3.3.2). It can be seen that the $W^{(m+1)}_k$ values are very close to each other, as are the respective $b^{(m)}_k$ values. Consistent with what is illustrated in Sects. A.1 and A.2, the BC model, which results in a collective ranking without ties (see Table 7 (iv)), maximizes both the first two terms, $\sum_{i=1}^{n} R_i r_i$ and $\sum_{i=1}^{n} r_i^2$, in the numerator of Eq. 8, but minimizes the third term, $T_{m+1}$, in the denominator of Eq. 8.
As mentioned, the aggregation model that maximizes $W^{(m+1)}_k$ (and therefore also $b^{(m)}_k$) is not the BC model, as it was in the case study in Sect. 2, but the BTW model ($b^{(m)}_{BoB} = 1.0485$), followed by the BTH model ($b^{(m)}_{BoB} = 1.0479$), as it appears in the last column of Table 7. For this specific problem, the collective ranking obtained through the BC model is only the third one in terms of coherence. This apparently negative result is at least partially mitigated, considering that:
• The indicator values of the four models are extremely close to each other, in particular those of the first three (which differ by a few thousandths of a unit), reflecting collective rankings that are very close to each other (see the second column of Table 7).
• The term m, which is one of the two "size" terms of the specific problem (together with n), is relatively low in this case. It can be proven that the gap between $b^{(m)}_{BC}$ and $b^{(m)}_{*} = b^{(m)}_{BoB}$ would tend to narrow as m increases.
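To make this kind of comparison concrete, the sketch below computes Kendall's W on the m expert rankings plus one candidate collective ranking, i.e., a value playing the role of $W^{(m+1)}_k$. It assumes the standard tie-corrected form of Kendall's W, $W = \left[12 \sum_i R_i^2 - 3 m^2 n (n+1)^2\right] / \left[m^2 n (n^2 - 1) - m \sum_j T_j\right]$ with $T_j = \sum_g (t_g^3 - t_g)$; whether this coincides term by term with Eq. 8 of the paper is an assumption. The expert rankings and candidate collective rankings are hypothetical (they are not those of Tables 6 and 7), and the indicator $b^{(m)}_k$ is deliberately not computed, since its definition is not restated in this appendix.

```python
from collections import Counter


def tie_correction(ranking):
    """T = sum over tie groups of (t^3 - t), where t is the size of each group."""
    return sum(t ** 3 - t for t in Counter(ranking).values())


def kendall_w(rankings):
    """Kendall's W with the standard correction for ties (assumed form)."""
    m, n = len(rankings), len(rankings[0])
    R = [sum(column) for column in zip(*rankings)]  # rank sums per object
    numerator = 12 * sum(Ri ** 2 for Ri in R) - 3 * m ** 2 * n * (n + 1) ** 2
    ties = sum(tie_correction(ranking) for ranking in rankings)
    denominator = m ** 2 * n * (n ** 2 - 1) - m * ties
    return numerator / denominator


# Hypothetical data: m = 3 expert rankings of n = 4 objects (rank values per object).
experts = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]

# Hypothetical candidate collective rankings to be compared.
candidates = {
    "sorted like the rank sums": [1, 2, 3, 4],
    "first two objects swapped": [2, 1, 3, 4],
    "first two objects tied": [1.5, 1.5, 3, 4],
}

w_m = kendall_w(experts)
w_m1 = {name: kendall_w(experts + [r]) for name, r in candidates.items()}

print(f"W(m) = {w_m:.4f}")
for name, value in w_m1.items():
    print(f"W(m+1), {name}: {value:.4f}")
```

In this small example the tie-free ranking sorted like the rank sums yields the highest concordance among the three candidates; this illustrates the mechanism only and says nothing about the specific figures reported in Table 7.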

A.4. Asymptotic analysis of $b^{(m)}_k$
This section quantitatively analyses the contributions of the three terms of $b^{(m)}_k$, which depend on the aggregation model. Precisely, an asymptotic analysis of $b^{(m)}_k$ is performed as the n and m parameters grow. Although we are aware that these parameters will never grow too much in realistic decision-making problems (e.g., reaching at most the order of magnitude of tens), the proposed analysis can nevertheless provide useful indications.
Starting from Eq. (7), $b^{(m)}_k$ can be expressed as a fraction (Eq. 23) whose numerator and denominator are sums of several addenda. The asymptotic order of magnitude of these addenda can then be determined using the Bachmann-Landau asymptotic notation, with respect to n and m (Knuth 1976), where the operator "O()" (to be read as "big-O") describes the asymptotic order of magnitude of a generic function. Since each $R_i$ is the sum of m rank values in $[1, n]$, $R_i \in O(n \cdot m)$; likewise, each $r_i \in [1, n]$, so that $r_i \in O(n)$.
The numerator consists of four addenda: the first two (Eqs. (24) and (25)), one of which is $-3(m+1)^2\, n\,(n+1)^2$, are $O(n^3 \cdot m^2)$; the third, which contains the term $\sum_{i=1}^{n} R_i r_i$, is $O(n^3 \cdot m)$; the fourth, which contains the term $\sum_{i=1}^{n} r_i^2$, is $O(n^3)$.
Considering the n parameter alone, we note that all four addenda in the numerator are asymptotically of the same order of magnitude, $O(n^3)$, so none will tend to predominate over the others as n increases.
On the other hand, considering only the m parameter, we note that the fourth addend in the numerator, which contains the term $\sum_{i=1}^{n} r_i^2$, will tend to be asymptotically negligible compared to the third addend, which contains the term $\sum_{i=1}^{n} R_i r_i$. It can therefore be argued that, as m increases, the contribution of the term $\sum_{i=1}^{n} r_i^2$ tends to be negligible with respect to that of the term $\sum_{i=1}^{n} R_i r_i$, i.e., using the Bachmann-Landau notation (Knuth 1976):
$$\sum_{i=1}^{n} r_i^2 \in o\!\left(\sum_{i=1}^{n} R_i r_i\right), \tag{28}$$
where the operator "o()" (to be read as "little-o") describes the asymptotic negligibility of the first function compared to the second one (Knuth 1976). We further note that, although Eq. (28) holds, both terms $\sum_{i=1}^{n} r_i^2$ and $\sum_{i=1}^{n} R_i r_i$, in the numerator of the fraction in Eq. (23), will tend to be asymptotically negligible compared to the first two addenda (cf. Eqs. (24) and (25)), which are both $O(m^2)$.
The denominator, in turn, consists of three addenda, the third of which contains the term $T_{m+1}$. We specify that the $T_j$ values and $T_{m+1}$ are $O(n^3)$ (see also the considerations about $T_j$ and $T_{m+1}$ in Sects. 2 and 3.2). Again, we note that the three addenda are asymptotically of the same order of magnitude with respect to parameter n. With respect to m, on the other hand, it emerges that the third addend (containing the term $T_{m+1}$) is asymptotically negligible compared to the other two.
In conclusion, this asymptotic analysis has not revealed any clear trend, other than a certain predominance of the contribution of the term $\sum_{i=1}^{n} R_i r_i$ over those of $\sum_{i=1}^{n} r_i^2$ and $T_{m+1}$, as m grows. Therefore, aggregation models that maximise $\sum_{i=1}^{n} R_i r_i$, such as the BC model, will have a better chance of maximising the $b^{(m)}_k$ indicator.
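The growing predominance of $\sum_{i} R_i r_i$ over $\sum_{i} r_i^2$ as m increases can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: expert rankings are drawn uniformly at random (hypothetical data), and the collective ranking is obtained by ranking the rank sums $R_i$ with average ranks for ties, used here as a stand-in for the BC model's property of being sorted like the $R_i$ values. Function names, the seed and the chosen values of n and m are arbitrary.

```python
import random


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def term_ratio(n, m, rng):
    """Ratio sum(r_i^2) / sum(R_i * r_i) for m random expert rankings of n objects."""
    R = [0] * n
    for _ in range(m):
        ranking = list(range(1, n + 1))
        rng.shuffle(ranking)  # one random tie-free expert ranking
        R = [Ri + x for Ri, x in zip(R, ranking)]
    r = average_ranks(R)  # collective ranking sorted like the rank sums
    cross = sum(Ri * ri for Ri, ri in zip(R, r))
    squares = sum(ri * ri for ri in r)
    return squares / cross


rng = random.Random(0)  # fixed seed, arbitrary choice for repeatability
ratios = {m: term_ratio(10, m, rng) for m in (5, 20, 100)}
for m, ratio in ratios.items():
    print(f"m = {m:3d}: sum(r^2) / sum(R*r) = {ratio:.4f}")
```

Because the two sequences are sorted alike, Chebyshev's sum inequality gives $\sum_i R_i r_i \ge m\,n\,(n+1)^2/4$, so the ratio is bounded above by $2(2n+1)/[3m(n+1)] < 4/(3m)$ and shrinks roughly like $1/m$, consistent with the little-o relation discussed above.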