Pooling Rankings to Obtain a Set of Scores for a Composite Indicator of Erasmus + Mobility Effects

In this paper, we study how to assign weights to a set of evaluations obtained at the end of an international mobility experience in order to aggregate them into a composite indicator. The mobility experience was evaluated by three categories of actors: the participant; the school or company sending the participant; and the school or company hosting the participant. We estimated the weights starting from the assessors' mutual evaluations of the beneficiaries of the mobility experiences. In particular, the aim of the paper was to compare two strategies for estimating the weights: (1) a weighted function of the univariate rank distribution of frequencies; and (2) the normalised elements of the first eigenvector of the dominance matrix computed by mediating the actors' dominance matrices derived from the rankings of mobility beneficiaries. Variants of the two strategies were also introduced. Even though each strategy had different assumptions, the analyses produced several important findings. First, the optimum weighting model depends on the loss function used to evaluate the quality of the results. In particular, a between-ranking variability function favours both univariate and unweighted multivariate models, while a bias-based function favours weighted multivariate models. Second, in both univariate and multivariate analyses, the application of the rank-order-centroid and rank-reciprocal rules gives more accurate results than both linear and exponential rules.


Introduction
ROI-MOB (Measuring return on investment from EU VET mobility) is a European project that aims to represent the final outcomes of an Erasmus + mobility experience using a single, complex indicator (Fabbris and Boetti 2019). ROI-MOB focuses on the Vocational Education and Training (VET) experience, which consists of an internship undertaken by a student or apprentice at a workplace in another country. Schools, companies, or other intermediate bodies organize, in agreement with the hosting bodies, the internship of individuals or small groups of participants. The indicator measures the quality of this complex phenomenon jointly experienced by the young participant and all other bodies involved in the process. The indicator is a composite index of elementary indicators, each capturing the viewpoint of an assessor, aggregated using an appropriate 'importance' weighting. In particular, the surveys collected data from four groups of assessors: (i) the participants directly involved in VET international mobility, either residing or hosted in one of four countries (Germany, Italy, Portugal, and Spain); (ii) the schools and training centres facilitating the participation of students or apprentices, either by sending them abroad or hosting them; (iii) the companies and public bodies that sent or hosted the participants; and (iv) representatives of the national or European institutions promoting VET international mobility. For the sake of simplicity, only the data from the first three groups of assessors are processed in this paper.
This paper aims to estimate the importance weights to attach to the final evaluations from three groups of assessors surveyed after a set of mobility experiences that occurred in 2017 and 2018 in the four focal countries. For each assessor, the final evaluation is assumed to be an overall evaluation measure of the outcomes of the pertinent VET mobility experience. To estimate the weights, we applied a set of procedures to the rankings of beneficiaries from the VET international mobility scheme, as expressed by the assessors, and defined an optimum way to transform rankings into scores. The procedures rely on the hypothesis that the mobility experience is more important for those categories of assessors who receive greater benefits and, therefore, the evaluation of the experience of these categories must weigh more in the composite indicator representing the overall evaluation.
Technically, the ROI-MOB indicator is a combination of the final evaluations independently given by the mobility actors, weighted according to the recognised 'benefit' each actor received from the mobility experience. All actors are assumed to be equally informed of the process. This precludes the argument that the weights attached to the experience assessments should be proportional to the level of information possessed by the assessor, which is unknown. However, we do know the level of benefit the assessors obtain through the process.
In the following, we deal just with the aim of estimating the weights for the indicator construction, which we will call scores so as not to confuse them with the weights that will be used in their estimation. The rest of the paper is organised as follows. Section 2 describes in detail the available data, the analytical models used in the estimation of weights, and the criteria for model selection. Section 3 describes the results, and these are then discussed in Sect. 4.

The Data
The data came from three surveys conducted between March and August 2018 in four European countries (Germany, Italy, Portugal, and Spain). The questionnaires were sent to samples of participants, schools, and companies whose email addresses were provided by the project partners. All participants, companies, and schools involved in a VET mobility experience organised by one of the project partners in the previous two years were included in the sample. Each sample aimed to represent a national context. After two pilot rounds of data collection, the questionnaires, written in the four national languages of the project (Italian, German, Portuguese, and Spanish), plus English, were administered to over 5,000 potential respondents through a Computer Assisted Web-based Interviewing (CAWI) system. Participation was good, with a response rate of 31% (1,545 questionnaires).
The national representativeness of the samples does not imply European representativeness of the overall sample. However, the presence in the sample of both Mediterranean and Central European countries allows for broad cross-national inference. Moreover, the four focal countries account for a substantial proportion of the international students and trainees mobilised by the Erasmus + Programme (Fabbris and Boetti 2019).
The basic questions posed in the surveys to elicit the ranking of beneficiaries differed slightly according to the mobility actor. The question posed only to participants was:

• 'Finally, which are the two categories that get the highest benefits from Erasmus + mobility? (Please, click the first and the second category of possible recipients)'.

The option was given to specify a first and a second selection out of five possible beneficiaries: students/apprentices, schools and training centres, companies (both sending and hosting), the labour market, and the European Union as an institution. The beneficiaries were thus ranked on a three-level ordinal scale. The question posed only to the schools and companies was:

• 'Finally, which are the categories of possible recipients that get the highest benefits and the ones that get the lowest ones from Erasmus + mobility? Please, order the categories from 1 (highest) to 5 (lowest benefits)'.

The same categories presented to the participants were used here, and the beneficiaries were ranked on a five-level ordinal scale.
The collected data contained nonresponses. In some cases, people did not respond to the question at all; in other cases, people indicated just the first position or just a few initial positions. All given rankings, even those that were incomplete, were processed.
It is worth highlighting that the three categories of assessors evaluated themselves as possible recipients of mobility benefits. Thus, when a respondent evaluated the relative positions of the mobility actors, they were expected to perform a sequence of mental comparisons, first pinpointing the top beneficiary, then selecting a second best among the residual categories, then a third, and so on. All the possible beneficiaries were well known to the assessors as actors of the represented mobility process.

Models and Methods
In the following, we deal with evaluation systems in which each unit of a sample of n independent assessors ranks A′ alternatives from a set of A total alternatives, A′ ≤ A, from the first to the last possible place. Our aim is to find an optimum strategy to aggregate the n rankings and transform them into scores to assign to the A′ alternatives. Without loss of generality, we assume that, as occurred in our surveys, all alternatives are ranked, i.e. A′ = A.
Let us consider the information contained in the rankings given by a group of assessors. To estimate the scores, the information can be used at different levels:

• We can use just the top position of each distribution. The score estimates are then proportional to the frequencies of the top positions obtained by the A alternatives. This estimation method uses basic information and does not require particular statistical techniques. In the following, this procedure is called 'first position'.

• We can deduce scores from the averages of the frequency distributions. This estimation method uses the full frequency distribution as if each ranking were an empirical occurrence of a random variable with mean (A + 1)/2 and variance (A² − 1)/12. Henceforth, we will call this method the 'univariate' procedure.

• We can model the relations between the A(A − 1)/2 distinct pairs of alternatives by comparing their ranking positions and defining the level of 'dominance' of alternative a over alternative b. This weighting procedure is therefore called 'bivariate'. This estimation method requires the construction of a 'dominance matrix', which reflects the pairwise relationships between alternatives. The way such a matrix is defined and how it can be processed for weight estimation purposes is described later in this section.
The univariate approach uses the (absolute or relative) frequency distribution of the rankings given by the n respondents. For the a-th alternative, the absolute frequency distribution is the vector f_a = (f_a1, ..., f_ah, ..., f_aA), whose elements add up to n. To give an example, f_a1 is the number of respondents who put the a-th alternative in the first position. The score for alternative a, Y_a, is obtained as a linear combination of the frequencies at all rank positions of that alternative:

Y_a = (1/n) Σ_{i=1}^{A} W_i f_ai ,  (1)

where f_ai is the frequency of assessors who placed alternative a in position i (i = 1, ..., A), and W_i is a weight assigned to rank i, such that Σ_{i=1}^{A} W_i = 1. For the sake of simplicity, the score is assumed to be normalised, so as to add up to one across all alternatives: Σ_{a=1}^{A} Y_a = 1 and 0 ≤ Y_a ≤ 1.

For score estimation purposes, the bivariate approach refers to the analysis of dominances, tournaments, or round-robins (Ahn and Park 2008). This analysis requires a three-step procedure:

1) Estimation of a dominance matrix P = (p_ab) (a ≠ b = 1, ..., A; p_aa = 0) from the rankings expressed by the respondents. The dominance relations can be recorded in a 'generalised' tournament matrix, whose entries vary between zero and one, with extremes included (Moon and Pullman 1970; Tanino 1988). P is a square skew-symmetric matrix of order A with p_ab = 1 − p_ba (a ≠ b = 1, ..., A) and p_aa = 0 (a = 1, ..., A). The entry p_ab can be interpreted as a measure of the dominance of a over b: the closer this measure is to one, its maximum, the larger the score assigned to a with respect to b.
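As a minimal illustration of the univariate scoring step, the following Python sketch turns a handful of complete rankings into normalised scores. All names (the function, the toy rankings, the rank-sum weights) are ours, for illustration only:

```python
# Minimal sketch of the univariate scoring step: Y_a proportional to
# sum_i W_i * f_ai. Each ranking is a list of alternative labels ordered
# from first to last position. All names are illustrative.

def score_univariate(rankings, alternatives, weights):
    A = len(alternatives)
    # f[a][i]: number of assessors placing alternative a at position i (0-based)
    f = {a: [0] * A for a in alternatives}
    for ranking in rankings:
        for i, a in enumerate(ranking):
            f[a][i] += 1
    raw = {a: sum(w * c for w, c in zip(weights, f[a])) for a in alternatives}
    total = sum(raw.values())
    return {a: raw[a] / total for a in alternatives}

alts = ["participants", "schools", "companies", "labour market", "EU"]
rankings = [
    ["participants", "schools", "companies", "labour market", "EU"],
    ["participants", "companies", "schools", "EU", "labour market"],
    ["schools", "participants", "companies", "labour market", "EU"],
]
W = [5/15, 4/15, 3/15, 2/15, 1/15]  # rank-sum weights for A = 5
scores = score_univariate(rankings, alts, W)
print({a: round(s, 3) for a, s in scores.items()})
```

With decreasing weights that sum to one, the resulting scores also sum to one across alternatives, as assumed in the text.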
2) Computation of the right eigenvector w corresponding to the first eigenvalue λ_max of the matrix P:

P w = λ_max w .  (2)

The a-th entry of this eigenvector is proportional to the score of the a-th alternative.

3) Normalisation of the values of w = (w_a) (a = 1, ..., A), accomplished by dividing each element by the sum of the A elements of the eigenvector, Σ_{a=1}^{A} w_a, so that the normalised values w*_a add up to one across the A alternatives (Σ_{a=1}^{A} w*_a = 1). In our case, the normalised values estimate how much the actors benefitted from international mobility on a 0-1 scale, so that the total benefit of the A = 5 possible beneficiaries is 100%.
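Steps 2 and 3 can be sketched with plain power iteration, which recovers the leading eigenvector for a small generalised tournament matrix with positive off-diagonal entries. The 3 × 3 matrix below is invented for illustration:

```python
# Sketch of steps 2-3: leading right eigenvector of a small generalised
# tournament matrix via plain power iteration (no external libraries).
# The 3x3 matrix P below is invented for illustration.

def leading_eigenvector(P, iters=500):
    """Power iteration; returns the eigenvector normalised to sum to one."""
    A = len(P)
    w = [1.0 / A] * A
    for _ in range(iters):
        w_new = [sum(P[a][b] * w[b] for b in range(A)) for a in range(A)]
        s = sum(w_new)
        w = [x / s for x in w_new]
    return w

# Off-diagonal entries satisfy p_ab + p_ba = 1; diagonal entries are zero.
P = [[0.0, 0.7, 0.8],
     [0.3, 0.0, 0.6],
     [0.2, 0.4, 0.0]]
w_star = leading_eigenvector(P)
print([round(x, 3) for x in w_star])  # benefit shares summing to one
```

Normalising at every iteration makes the returned vector directly interpretable as the benefit shares of step 3.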
Therefore, for the bivariate analysis, we estimate the dominance relations through the following formula:

p*_ab = (1/m) Σ_{i=1}^{A−1} W_i n(a > b | a = i) ,

where n(a > b | a = i) denotes the number of times alternative a is at rank i and alternative b follows in the sequence, say from (i + 1) to A; m is the total number of comparisons between a and b; and W_i is a weight associated with the number of times a dominates b when alternative a is at rank i (i = 1, ..., A − 1).
For p*_ab to be an element of the P matrix, it requires a further refinement that enforces p_ab + p_ba = 1:

p_ab = p*_ab / (p*_ab + p*_ba) .  (3)

In defining the weights, in both the univariate and the bivariate approach, we will refer to various scoring functions. Let us start with the plurality and the Borda rules (Borda 1784; Brams and Fishburn 2002). The plurality rule is the simplest way of weighting, since it assumes that W_1 = 1 and W_h = 0 for h = 2, ..., A. This rule is equivalent to considering only the first position in the univariate approach, and only the comparisons involving the first ranking position for each alternative in the bivariate approach. In these cases, the scores assigned to the A alternatives are given by the vector Ŷ = (Ŷ_1, ..., Ŷ_a, ..., Ŷ_A), in which the score estimate of alternative a is given simply by the relative frequency of the first positions assigned to alternative a by the n assessors.
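The dominance estimate and its refinement can be sketched as follows, assuming complete rankings and one weight per rank position; the ratio refinement shown is one standard way to enforce p_ab + p_ba = 1, and the function names and toy data are ours:

```python
# Sketch of the dominance estimate and its refinement into the entries of P,
# assuming complete rankings (first-to-last lists of labels) and one weight
# W_i per rank position. Function names and toy data are illustrative.

def dominance_matrix(rankings, alternatives, weights):
    idx = {a: k for k, a in enumerate(alternatives)}
    A = len(alternatives)
    raw = [[0.0] * A for _ in range(A)]  # weighted counts of a ranked before b
    for ranking in rankings:
        for i, a in enumerate(ranking):      # a at position i (0-based)
            for b in ranking[i + 1:]:        # every b ranked after a
                raw[idx[a]][idx[b]] += weights[i]
    # Refinement: scale each pair so that p_ab + p_ba = 1 and p_aa = 0.
    P = [[0.0] * A for _ in range(A)]
    for a in range(A):
        for b in range(A):
            if a != b and raw[a][b] + raw[b][a] > 0:
                P[a][b] = raw[a][b] / (raw[a][b] + raw[b][a])
    return P

alts = ["participants", "schools", "companies"]
rankings = [["participants", "schools", "companies"],
            ["participants", "companies", "schools"],
            ["schools", "participants", "companies"]]
W = [1/2, 1/3, 1/6]  # an illustrative decreasing weight per position
P = dominance_matrix(rankings, alts, W)
for row in P:
    print([round(x, 2) for x in row])
```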
The application of the Borda rule implies different weights, W_i (i = 1, ..., A), according to the position of alternative a in the ranking. For a univariate analysis, the rule assigns a weight of A to the first position, A − 1 to the second position, and so on, until a weight of one is given to the last position. The standardised form of the weight W_i is thus a linear function of the distance of alternative a from the first position of the ranking:

W_i^RS = 2(A + 1 − i) / (A² + A) ,  (4)

which is decreasing in i, the position of the alternative in the ranking. The weight has a maximum of 2/(A + 1), corresponding to the first position, and a minimum of 2/(A² + A), corresponding to the bottom position. The weights obtained according to Formula (4) are based on the idea that the rank order should be reflected directly in the weights, and are known in the literature as rank sum (RS) weights (W_i = W_i^RS: Stillwell et al. 1981). Another way of converting rankings into scores is the rank reciprocal (RR) rule, in which weights are proportional to the reciprocals of the rank order of alternative i (Barron 1992; Stillwell et al. 1981):

W_i^RR = (1/i) / Σ_{j=1}^{A} (1/j) .  (5)

Variants of the rank reciprocal rule are the so-called rank order centroid (ROC), in which weights are proportional to the sum of the rank reciprocals from position i to the bottom position of the ranking:

W_i^ROC = (1/A) Σ_{j=i}^{A} (1/j) ,  (6)

and the sum-reciprocal (SR) rule proposed in Danielson and Ekenberg (2014), in which weights are intermediate between the linear and the reciprocal ones:

W_i^SR = (1/i + (A + 1 − i)/A) / Σ_{j=1}^{A} (1/j + (A + 1 − j)/A) .  (7)

Both the RS and ROC approaches have sound statistical bases, both relying on distributional properties and being subject to random error (Jia et al. 1998). When numerical weights are estimated through functions, instead of being directly observed, they are also called surrogate weights (Danielson and Ekenberg 2017).
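The four surrogate-weight rules above can be computed in a few lines; the sketch below, with A = 5 as in our application, follows the formula shapes just described (function names are ours):

```python
# The four surrogate-weight rules sketched for a ranking with A positions;
# formula shapes follow the RS, RR, ROC, and SR descriptions above.

def rank_sum(A):
    """RS: weights linear in the distance from the first position."""
    raw = [A + 1 - i for i in range(1, A + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def rank_reciprocal(A):
    """RR: weights proportional to 1/i."""
    raw = [1 / i for i in range(1, A + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def rank_order_centroid(A):
    """ROC: weight i is the mean of the reciprocals from position i to A."""
    return [sum(1 / j for j in range(i, A + 1)) / A for i in range(1, A + 1)]

def sum_reciprocal(A):
    """SR: a blend of the reciprocal and the linear profiles."""
    raw = [1 / i + (A + 1 - i) / A for i in range(1, A + 1)]
    s = sum(raw)
    return [r / s for r in raw]

A = 5
for name, rule in [("RS", rank_sum), ("RR", rank_reciprocal),
                   ("ROC", rank_order_centroid), ("SR", sum_reciprocal)]:
    print(name, [round(w, 3) for w in rule(A)])
```

All four rules return decreasing weight profiles that sum to one, differing only in how steeply they discount lower rank positions.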
The literature on how independent rankings can be merged into one or more summarizing objects of the same ordinal (ranking) 2 or metric (score) nature makes reference to the feasible consensus, compromise theory in social choices, or visual representations of the rankings in a one-dimensional normalised scale (Honda et al. 1981;Jensen 1986;Hudry and Monjardet 2010). A metric solution requires the introduction of the notion of distance: the consensus scores are those minimizing the remoteness from the basic rankings, and the remoteness depends on the imposed metric. In what follows, to account for the possible dependence between the solutions and the adopted metrics, we applied various metric rules.
Therefore, we conjectured that the further we go down the ranking, the more the distance from the first position should reduce the weight of the alternative, more than linearly. Formula (4) then changes as follows:

W_i^RE = (A + 1 − i)^δ / Σ_{j=1}^{A} (A + 1 − j)^δ ,  (8)

where δ is the power to which the distances from the first position are raised. Ahn and Park (2008) call this rule 'rank exponent' (RE) weighting. It is easy to see that if δ = 0, all weights are equal (and equal to 0.2 for A = 5), and if δ = 1, Formula (8) coincides with Formula (4), which corresponds to the linearly scaled weights. For δ = 2, the frequencies close to the top position weigh more than linearly and diverge from linearity as we approach A, the bottom of the ranking.

To weight a comparison between alternatives a and b, the rules vary according to the assumptions. Weights can be computed simply as a function of the distance of alternative a from the first position: if alternative a is in the first position, we assign the largest weight to the comparison between a and j, where j can be any other alternative (j = b, ..., A); if a is in the second position, the weight will be one less than previously; and so on until the next-to-last position, when the weight will be one, while for a = A the weight will be zero. The standardised form of the weight W_i for a bivariate analysis is a linear function of the distance of the first alternative of the pair from the top position in the ranking:

W_i = 2(A − i) / (A(A − 1)) ,  (9)

with a maximum of 2/A for i = 1 and a minimum of 2/(A(A − 1)) for i = A − 1. It is easily seen that Formula (9) is the same as Formula (4) when i varies between 1 and (A − 1). Formulae (5) to (7) can also be adapted to weight pair comparisons of alternatives by letting i vary between 1 and (A − 1) and substituting A with (A − 1) in the formulae.
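A small sketch of the rank-exponent rule for A = 5 shows how δ = 0, 1, 2 move from equal, to linear, to top-accentuated weights (function name is ours):

```python
# The rank-exponent (RE) rule sketched for A = 5: weights proportional to
# (A + 1 - i) ** delta, so delta = 0 gives equal weights, delta = 1 the
# linear rank-sum profile, and delta = 2 a profile accentuated near the top.

def rank_exponent(A, delta):
    raw = [(A + 1 - i) ** delta for i in range(1, A + 1)]
    s = sum(raw)
    return [r / s for r in raw]

for delta in (0, 1, 2):
    print(delta, [round(w, 3) for w in rank_exponent(5, delta)])
```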
Other possible rules refer either to how many ranks separate the two alternatives or to which alternatives occur between the two at stake. For instance, if we compare two alternatives whose positions are a = 1 and b = 4, should it matter whether another alternative c is in the second or the fifth position? From Arrow's theorem (Arrow 1951), we know it does, because the rest of the world may in some way influence, even significantly, the relationship between the two compared alternatives. The degree to which it matters, and in which direction, is unknown. In what follows, we model the between-alternative relations according to various hypotheses on the distance between their ranks. We ignore the possible dependence of the magnitude of a dominance relationship on the position of the remaining alternatives. As a matter of fact, we rely on the local independence of irrelevant alternatives (Young 1995), namely the independence between the two compared alternatives and the other A − 2 alternatives not involved in the comparison, an assumption that is reasonable in a case, such as ours, in which the alternatives are social entities that interact functionally with each other but are likely independent in their judgements of the mobility process.
Analogously to the univariate case, we conjectured that the further we go down the ranking, the more the distance from the first position should affect the weight of the alternative more than linearly. So, the distance rule can be raised to a power:

W_i = (A − i)^δ / Σ_{j=1}^{A−1} (A − j)^δ .

Also, if alternatives a and b are far apart in a ranking, we could ask ourselves: should we weight a comparison between a and b equally if, for instance, a is first and b is second, as when a is first and b is in any other non-contiguous position? Indeed, Bradley and Terry (1952) and Kemeny (1959) suggested accounting for how far apart one alternative is from another in a comparison. This question is not addressed in this paper.

Criteria for Choosing the Best Approach
The literature on how to transform a set of available rankings into a set of scores (as defined in Sect. 2.2) is sparse, and there is no agreed-upon solution for comparing these or similar approaches. Therefore, we propose two criteria that may help in selecting the best among the proposed methods:

a) Variability of the rankings given by assessors around their aggregate ranking. This criterion assumes that the aggregate ranking is what the assessors evaluated and that a ranking delivered by an assessor is a trial of the ranking we expected to measure. Therefore, for each group of assessors g (g = 1, ..., G), we have n_g rankings whose variability is proportional to the distance between each individual ranking and the group ranking:

D_mg = Σ_{k=1}^{n_g} Σ_{a=1}^{A} |R_mgka − R_mg0a|^ρ ,  (10)

where |⋅| denotes the absolute distance of the argument; R_mg0a is the mean rank of the a-th alternative obtained with method m (m = 1, ..., M) in group g; and R_mgka is the rank of the same alternative in the ranking delivered by assessor k (k = 1, ..., n_g) belonging to group g. The difference R_mgka − R_mg0a measures how many ranks alternative a (a = 1, ..., A) has to move to be in the same position as in the group ranking; ρ is the power to which the distances between rankings are raised; and all the other symbols have the same meanings as in Sect. 2.2. It can be easily understood that ρ = 1 gives the between-rank absolute distance, and ρ = 2 gives the Euclidean distance. The former distance is known to smooth the differences between the compared rankings, while the latter gives much more relevance to large deviations. By construction, D_mg is always positive. It can be standardised to compare the ranking variability from different studies. A possible standardisation is the ratio between the empirical value and its maximum, which is obtained when one half of the assessors give rankings in the reverse order of the other half. It can be noticed that both maxima are a constant function of A, so they can be used to standardise both single rankings and the aggregate one.

The possibility of introducing more refined hypotheses about the distance between alternatives in a comparison, and about the orderings of the alternatives other than those we compare, has been the focus of many studies (see David 1987; de Vries 1988; Cook and Kress 1990; Adler, Friedman and Sinuany-Stern 2002; Wang et al. 2007; Llamazares and Peña 2013). The literature on these hypotheses conveys peculiarities that depend on the disciplinary domain, the scope of the analysis, the type and number of the alternatives to be ordered or scored (competing animals, sports teams and competitions, social or business decisions, judgements, etc.), the number of observed rankings, and the methodological drawbacks, thus leaving many open questions on more refined hypotheses.
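Criterion (a) can be sketched as follows, assuming each ranking is stored as a map from alternative to position; the names and toy data are ours:

```python
# Sketch of criterion (a): distance between each delivered ranking and a
# reference (group) ranking, with rank differences raised to a power rho
# (rho = 1 absolute distance, rho = 2 Euclidean-type). Toy data, our names.

def ranking_variability(delivered, reference, rho=1):
    """Sum over assessors and alternatives of |R_ka - R_0a| ** rho."""
    return sum(abs(r_k[a] - reference[a]) ** rho
               for r_k in delivered for a in reference)

# Positions (1 = first) of three alternatives, per assessor.
reference = {"participants": 1, "schools": 2, "companies": 3}
delivered = [
    {"participants": 1, "schools": 2, "companies": 3},
    {"participants": 3, "schools": 2, "companies": 1},
    {"participants": 1, "schools": 3, "companies": 2},
]
print(ranking_variability(delivered, reference, rho=1))  # 6
print(ranking_variability(delivered, reference, rho=2))  # 10
```

As the toy data show, the quadratic power penalises the reversed ranking much more heavily than the absolute distance does.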
For a group of assessors, the standardised distance,

D*_mg = D_mg / max(D_mg) ,  (11)

can be conceived as the proportion of within-ranking positions that disagree with the mean ranking. For instance, a standardised distance of 0.3 means that, as a whole, 30% of the positions in the analysed rankings differ from the reference positions.
The method with the minimum standardised distance has a ranking that best represents the rankings delivered by the n assessors, i.e. the one around which the rankings vary the least.

b) Absolute distance between the final scores generated by a model and the aggregate scores mediated over all models. We assume the aggregate score is that to which all models should conform. In our case, the reference score is the average of the scores estimated under all models. Therefore, the absolute distance between each model's scores and their mean is an estimate of the bias of the models:

B_mg = Σ_{a=1}^{A} |Ŷ_mga − Y̿_ga| ,  (12)

where Ŷ_mga denotes the estimate of the score for alternative a from group g with method m, and Y̿_ga the average score for alternative a in group g across the compared methods.
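Criterion (b) can be sketched in the same style, with invented score values for three hypothetical models (all names and numbers are ours):

```python
# Sketch of criterion (b): bias of each model's scores as the absolute
# deviation from the across-model average score. Score values are invented.

def model_bias(model_scores, mean_scores):
    return sum(abs(model_scores[a] - mean_scores[a]) for a in mean_scores)

scores_by_model = {
    "Model 2": {"participants": 0.42, "schools": 0.20, "companies": 0.18},
    "Model 3": {"participants": 0.36, "schools": 0.22, "companies": 0.20},
    "Model 4": {"participants": 0.37, "schools": 0.21, "companies": 0.19},
}
alts = ["participants", "schools", "companies"]
mean = {a: sum(m[a] for m in scores_by_model.values()) / len(scores_by_model)
        for a in alts}
for name, m in scores_by_model.items():
    print(name, round(model_bias(m, mean), 4))
```

The model whose scores lie closest to the across-model mean obtains the lowest bias.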

An Application to Mobility Beneficiaries
The models presented in Sect. 2.2 were applied to the rankings delivered by three groups of assessors: (a) the participants in international mobility; (b) the schools and training centres that either sent or hosted participants; and (c) the companies that either sent or hosted participants. In this application, the alternatives (henceforth, the actors) are the five possible beneficiaries of VET international mobility: participants; schools and training centres; companies (both sending and hosting); the labour market; and the European Union (EU) as an institution. To merge the scores of the assessing groups, the responses obtained from the three groups of assessors were mediated through an (unweighted) arithmetic mean, since, for the purpose of representing the mobility process, the responses given by the three groups are equally informative.
The models are ordered according to computational difficulty:

1. First, we analysed the distribution of the main beneficiaries of international mobility occurring in the first positions of the assessors' rankings. The frequency gained by actor a (a = 1, ..., A) was the estimate of its benefit score. This model is named Model 1. It could be computed for each group of assessors and, in a mediated form, for all groups of assessors.

2. Then, we computed the linear combination of the frequencies at all rank positions for each mobility actor according to Formula (4), in its standardised form. Formula (4) applies constant weights to the ranking positions. This model, which can be considered a weighted synthesis of each frequency distribution, is named Model 2, and was computed for each group of assessors. Estimates for δ = 1 and δ = 2 and for the ROC, RR, and SR rules were computed.

3. We also applied the bivariate analysis to estimate the benefit scores. This application was based on the estimates of the dominance degree of a generic actor a over another generic actor b for all possible pairs of actors. This allowed the construction of a dominance matrix (Formula 3), of which we computed the right eigenvector associated with the first positive eigenvalue (Formula 2). The standardised elements of the eigenvector estimated the benefit scores of the mobility actors. This is presented as Model 3.

4. Finally, we applied the same procedure to pair comparisons between actors weighted according to Formula (9). The standardised scores derived from the first eigenvector of this new matrix are presented as Model 4. Weights were powered both with δ = 1 and δ = 2, and the ROC, RR, and SR rules were also applied.
The main results are presented in Table 1, in which the estimates of the benefit scores for the actors of international mobility are compared according to the generating model and the group of assessors. Table 1 shows that the estimates differed significantly according to the applied model. The overall mean scores of the alternatives by assessor group, and the average of the mean scores, are presented at the bottom of Table 1. The results show the following:

• The difference between the estimates using just the first positions and those from the methods based on the full distribution of evaluations is notable: the average of the participants' scores computed over all assessors was almost 80%, while the average over all other methods was 36.9% and the median was 36.7%. This suggests that the estimates based on just the first position of the rankings should be considered a shortcut, as they are not influenced by the relations between beneficiaries. Since the estimation procedure based on just the first positions is an outlier among the applied models, it will not be considered further here.

• Among the other computational methods, the Model 2 rules were computationally the least troublesome. The difference between the most extreme score profiles, that obtained by adopting the quadratic distance of alternatives and that obtained by applying the absolute distance with the same RS rule, is relevant: the quadratic strategy produced scores that were, averaging the benefit perceived by all assessors for participants as beneficiaries, more than 10 percentage points larger (42.4% instead of 31.6%) than those obtained with the absolute weighting rule. The scores assigned to the other beneficiaries were reduced proportionally to the score assigned to the first beneficiary but were consistent with each other. The score assigned to the top beneficiary according to the ROC rule follows immediately that of the RS rule with quadratic weighting (40.9% on average), then the RR (36.7%) and the SR (34.1%) scores.
These results are consistent with the literature.

• The weighting rules seemed less important when scores were estimated through the analysis of dominances. The quadratic procedure yielded results similar to the linearly weighted analysis of dominances: the mean difference was lower than 0.6% whichever assessing group was considered. Furthermore, the differences between assessors using either linear or quadratic powers for weight estimation were less important than those obtained in the univariate analysis. The application of any weighting rule to the analysis of dominances (Model 4) led to score profiles that did not diverge from each other. In fact, the first-beneficiary scores maintained the ordering highlighted in the univariate analysis (Model 2): the largest value was obtained by the quadratic weighting (37.0% on average), followed by the ROC (36.9%), RR (36.8%), SR (36.7%), and RS (36.7%) rules.

• We ascertained just three cases of equal scores between alternatives: once when mediating the scores obtained by the three groups of assessors with Model 3, and twice when applying linear weights to Model 4. This may mean that even a flattening rule applied together with the analysis of dominances tends to assign to the alternatives scores that differ from each other. This result is in line with the literature (e.g. Landau 1951; Hemelrijk, Wantia and Gygax 2005).

• Whatever the weighting rule, there was a clear hierarchy that placed participants at the top of the ranking with an endorsement of approximately 37%, then schools and companies at approximately 19% each, and finally the labour market and the EU as an institution at approximately 12% each.
Table 2 presents the results of applying the two statistical criteria suitable for evaluating the capability of the models to give adequate estimates. The criteria were a measure of ranking variability, Formula (11), and a measure of bias, Formula (12).
The distance measures in Table 2 refer to the variability of the delivered rankings around the final ranking, computed for each assessing group and model. The mean absolute distances were computed from the values marked with an asterisk at the bottom of Table 1. Moreover, for practical purposes, only complete rankings were analysed. The bias was computed as the mean absolute deviation from a mean score, computed for each assessing group and for the assessors altogether. The Table 2 estimates allow us to state the following:

• The estimates of the distance between the collected rankings and the reference ranking did not differentiate among models, because the reference ranking of the single models was about the same for all models. Thus, the measures of distance for Models 2 and 3 were the same for every group of assessors, but only for the RR rule. Small differences, instead, were found between Models 3 and 4, regardless of the weighting rule. The relative invariance of the distances (no difference exceeded 2.5%) was caused by both the near-linearity of the assessed alternatives and the shortness of the ranking.

• Models 3 and 4, which were based on the analysis of dominances, produced less biased estimates of the scores than Model 2, which was based on just the univariate analysis of frequencies.

• Model 3, which was based on an equal-weight system, performed better in terms of variability than Model 4, for which we introduced various articulated weighting systems, but its bias was higher than that of any of the weighting systems in Model 4.

• All weighting rules applied to the dominance analysis showed the same level of variability. The bias, instead, slightly differed among the applied rules: the ROC and RR rules showed the lowest bias, while the SR and the RS with linear weighting rules showed an 8% higher bias. The highest bias (almost twice that of the ROC rule) was found for the RS rule with quadratic weighting.

Discussion and Conclusion
In this paper, we presented a set of rules to score a common profile starting from large sets of rankings. Both rankings and scores concerned five categories of possible recipients of benefits from VET international mobility. For score estimation, we applied various weighting procedures that used at least three levels of the information present in the collected rankings: (a) just the top recipient as stated by assessors; (b) the full distribution of frequencies derived from the rankings delivered by the assessors; and (c) the between-alternative dominance relations that can be deduced from the assessed rankings.
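Information level (b), scoring from the univariate rank-frequency distribution, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: `rank_frequencies` tabulates how often each alternative occupies each rank position, and `scores_from_frequencies` combines those counts with a vector of position weights.

```python
import numpy as np

def rank_frequencies(rankings, n_alt):
    """Count how often each alternative (column) occupies each rank
    position (row). Each ranking lists alternative indices from best
    (position 0) to worst; complete rankings are assumed."""
    freq = np.zeros((n_alt, n_alt))
    for ranking in rankings:
        for pos, alt in enumerate(ranking):
            freq[pos, alt] += 1
    return freq

def scores_from_frequencies(freq, weights):
    """Score each alternative as the position-weighted share of assessors
    placing it at each rank (weights ordered from top position down)."""
    return (np.asarray(weights) @ freq) / freq.sum(axis=0)
```

With normalised weights and complete rankings, the resulting scores sum to one, so they can be read directly as the components of a composite indicator.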
We showed that the more information we used in the scoring procedure, the more stable and accurate were the estimates. The estimates based on just the first position were shown to be an outlier with respect to the other estimation procedures, and this method would only be recommended for an initial estimation of scores, not for a definitive estimate. The estimation procedure based on the analysis of univariate frequency distributions, which is intermediate in terms of quantity of processed information, gave estimates that were also intermediate regarding both variability and bias. Models that use the bivariate relations between alternatives on top of the frequency distribution of assessments were the best-performing models. This result is consistent with the literature (see, among others, Poisbleau, Jenouvrier and Fritz 2006), where it has been highlighted that the between-alternative dominance relations take into account both the identity of the ranked alternatives and the bivariate interactions between alternatives present in the assessors' minds.
Footnote 7: It is worth noting that the comparison between Models 3 and 4 would have led to the same conclusion had the score estimates been standardised with their sampling error. As Poisbleau, Jenouvrier, and Fritz (2006) also show, the sampling error of estimates based on pair comparisons is a quadratic function of n, the sample size of assessors. In fact, any ranking delivered by an assessor brings about A(A − 1)/2 distinct pair comparisons, if no ties are allowed, and this makes the sampling errors of the estimated values of the matrix very low with respect to the estimates themselves.
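Information level (c), the dominance-based models, can be sketched as follows, under the approach described for this study: each complete ranking of A alternatives yields A(A − 1)/2 pair comparisons, the per-assessor dominance matrices are averaged, and scores are taken as the normalised first eigenvector of the resulting matrix (here computed by power iteration; function names are illustrative).

```python
import numpy as np

def dominance_matrix(rankings, n_alt):
    """Average dominance matrix: D[i, j] = share of assessors ranking
    alternative i above alternative j. Each complete ranking of A
    alternatives contributes A*(A-1)/2 distinct pair comparisons."""
    D = np.zeros((n_alt, n_alt))
    for ranking in rankings:
        pos = {alt: p for p, alt in enumerate(ranking)}
        for i in range(n_alt):
            for j in range(n_alt):
                if i != j and pos[i] < pos[j]:
                    D[i, j] += 1
    return D / len(rankings)

def principal_eigenvector_scores(D, iters=1000):
    """Scores as the normalised first (Perron) eigenvector of D,
    obtained by power iteration."""
    v = np.ones(D.shape[0]) / D.shape[0]
    for _ in range(iters):
        v = D @ v
        v /= v.sum()
    return v
```

Because the dominance entries pool many pair comparisons per assessor, their sampling variability is small relative to the entries themselves, which is the point made in footnote 7.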
Regarding the optimality of the weighting rules, the comparison between the EW, RS, RR, ROC, SR, and RE rules within the processed informative models showed conflicting results: in both univariate and bivariate analyses, the variability criterion highlighted that rules whose configuration is steeper (RR and ROC) perform somewhat worse than or as well as linear or near-linear (RS and SR) rules, while the bias criterion showed the opposite, with RR and ROC slightly better than the linear and quasi-linear ones. The quadratic weighting intensifies the steepness of the configuration so much that both the variability and bias criteria highlight its inadequacy as a behavioural model. On the contrary, the equal-weight rule provides a less adequate fit to the basic rankings than any position-based rule. Our data clearly show that the first position should be weighted somewhat more than under a simple linear weighting.
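The rules compared here have standard closed forms in the surrogate-weights literature (e.g. Barron and Barrett 1996). The sketch below follows the common parameterisations; the exact SR and RE formulations used in the paper may differ, so this is illustrative rather than a reproduction of the paper's definitions.

```python
import numpy as np

def surrogate_weights(A, rule="ROC", delta=2.0):
    """Surrogate weights for rank positions r = 1..A (1 = top),
    normalised to sum to one."""
    r = np.arange(1, A + 1)
    if rule == "EW":        # equal weights
        w = np.ones(A)
    elif rule == "RS":      # rank sum: linear decrease
        w = (A + 1 - r).astype(float)
    elif rule == "RR":      # rank reciprocal: steep at the top
        w = 1.0 / r
    elif rule == "ROC":     # rank-order centroid: (1/A) * sum_{j>=r} 1/j
        w = np.array([np.sum(1.0 / np.arange(i, A + 1)) for i in r]) / A
    elif rule == "SR":      # sum reciprocal: blend of RR and RS profiles
        w = 1.0 / r + (A + 1 - r) / A
    elif rule == "RE":      # rank exponent; delta > 1 steepens the profile
        w = (A + 1 - r).astype(float) ** delta
    else:
        raise ValueError(f"unknown rule: {rule}")
    return w / w.sum()
```

For A = 5 alternatives, as in this study, ROC and RR assign noticeably more weight to the top position than the linear RS rule, which is exactly the "steepness" contrast discussed above.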
Also, weighting through the SR rule, which is intermediate between the linear and the RR rule, is as adequate as RR in the bivariate analysis and somewhat more adequate than RR in the univariate analysis if we refer to the variability criterion, but is less adequate if the bias criterion is considered. If we adopt the bias criterion, the best rule is RR, while if the variability criterion is adopted, there is no difference between the ROC, RS, and SR rules. In other words, although the between-rule differences are small, which rule appears best depends on the chosen evaluation criterion.
Our results are in line with the mainstream literature. In fact, Barron and Barrett (1996) compared the RS, RR, ROC, and EW rules and stated that the ROC appears to perform better than the others. Sarabando and Dias (2009), applying a variety of rules, found that the ROC rule is the best if just the first alternative or the top-ranked alternatives are retained. Jia et al. (1998) compared the EW, RS, and ROC rules, as well as direct rating of alternatives, in cases where the basic rankings are affected by assessment error. They found that direct rating is better if the error is small, while the RR and ROC rules are better if the error is large, but the differences between linear and steeper rules are small. In addition, Jia et al. (1998) found that the accuracy of the rules is sensitive to the number of alternatives and to the sign and level of correlation among basic rankings. In particular, the authors found that if the rankings' fluctuation is high and the number of alternatives is five, the RR rule gives more accurate results than the ROC, and the latter more accurate results than RS. However, the between-rule differences are small. Furthermore, they showed that if the correlation among the rankings is positive and high, as it is in our case, the between-rule differences are even smaller.
So, which weighting method should one use? From our results, the application of a dominance analysis technique gives more accurate estimates than the univariate analyses alone. The accuracy of the choice of weighting rule depends on the steepness of the "true" weights. In summary, EW is one extreme of the spectrum, in which weighting is irrelevant because we either purposively ignore the elicited rankings or consider the basic data purely random. At the other extreme, the RR, ROC, and RE (with δ > 1) rules imply steep weight functions, treating the first alternative as clearly more important than the others. In between, the RS and SR rules provide flatter weight functions. Tversky et al. (1988) and Fischer and Hawkins (1993) found that in behavioural analyses of people's preference-governed choices, weight functions are often quite steep. This seems true in our case, because in all assessors' minds, participants are those who benefit the most from international mobility, while the EU as an institution and the labour market benefit just indirectly from the scheme and are, consequently, positioned by all assessors at the bottom of the ranking.
In conclusion, our results support the value of surveys on social preferences for elucidating hidden social hierarchies and value orders, and for pinpointing the social categories that are particularly associated with a phenomenon (i.e. those that are often observed at the top of a ranking, versus others that occur further down the rankings and are only vaguely associated with the phenomenon). In cases where ranks are transformed into scores, these results imply that a more-than-linear weight should be assigned to the top position. That is why the RR and ROC rules, applied together with a dominance analysis, seem to fit our mobility data best.
Author contributions All authors contributed to the study conception and design. Data collection and analysis were performed by Manuela Scioni. The authors share responsibility for the whole paper, though LF wrote Sects. 1, 2 and 4, and MS wrote Sect. 3. All authors read and approved the final manuscript.
Funding Open access funding provided by Università degli Studi di Padova within the CRUI-CARE Agreement. This study was pursued as a part of the project titled "ROI-MOB, Measuring return on Investment from EU VET mobility" coordinated by L. Boetti. Project no. 2016-1-IT01-KA202-005396. This project has been funded with support from the European Commission. This publication [communication] reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.