1 Introduction

Analyzing consumer satisfaction by listening to consumers' needs is fundamental in both the private and public sectors, since it is a useful tool for improving the efficacy of management and policy actions [5]; it becomes critical when consumers cannot switch to other providers, or decline or reduce the service [8].

Satisfaction data are usually collected through opinion survey questionnaires in which each respondent is asked to evaluate a set of attributes independently of one another and rate them accordingly, so that no direct comparison is made between attributes. The widespread adoption of ordered categorical response scales is due to two factors: they are easy to manage and not time-consuming [1, 12, 16, 17]. Despite these practical advantages, this evaluation method requires specific expertise that some respondents may lack, which gives rise to two main response biases: agreement response style (the tendency to agree with every statement irrespective of its content) and non-differentiation (the tendency not to differentiate between statements irrespective of their content) [1, 12].

A valid alternative is offered by elicitation methods based on ranking, according to which each respondent is asked to sort a set of alternatives with respect to some criterion (e.g. liking, satisfaction, agreement, importance). Such elicitation methods provide high-quality and informative data, high test–retest and cross-sectional reliability and high discriminant and correlation validity [11, 17, 20]; nevertheless, their widespread adoption is limited by the greater availability of suitable statistical methods for handling ordered categorical response data.

When dealing with ranking data, two typical issues of interest are (1) assessing the diversity between two subjects expressing their preferences over the same set of alternatives and (2) comparing two or more groups of subjects (e.g. men and women, young and elderly, people from different countries) in order to investigate concordance, agreement or homogeneity among them, identify the significant factors affecting subject preferences and, possibly, cluster the subjects [15].

Different proximity and distance measures have been proposed in the literature over the years for assessing the similarity between two rankings. A widespread approach in ranking data analysis uses similarity measures based on correlation coefficients and derives distance measures via linear transformations. Among these methods, Kendall’s \(\tau \) rank correlation coefficient [9, 10], Pearson correlation coefficient and Spearman’s \(\rho \) are the most widely used. Other typical examples of distances are Kendall, Spearman and Cayley’s distances. See Mallows [14], Critchlow et al. [2] and Diaconis [4] for more details; moreover, an extensive listing of such measures classified according to the main field of application or scientific area of interest is available in Deza and Deza [3].

A different approach to measuring distances between rankings stems from multidimensional geometry, according to which each ranking is a vector in a multidimensional Euclidean space and the distance between two rankings is measured via the pairwise angular distance. Following the multidimensional geometry approach, in this paper the rankings are analyzed and interpreted using a simple distance metric based on cosine similarity, that is, the normalized pairwise angular distance metric [21].

It is noteworthy that, as recently pointed out by Kumar and Vassilvitskii [13], the traditional similarity and distance measures between rankings do not account for two crucial concepts: (1) some alternatives could be more important than others, so that swapping equally important alternatives should be penalized less than swapping alternatives of unequal importance; (2) swapping alternatives at the top of the ranking could be more relevant than swapping alternatives at the tail of the ranking. In order to overcome these two criticisms, Kumar and Vassilvitskii [13] defined element weights and position weights for handling the first and second criticism, respectively.

The aim of this paper is to propose a variant of the normalized pairwise angular distance metric [21, 23] suitable for position weighted rankings that penalizes inversions in the top of the ranking more than inversions in the tail of the ranking. The proposed weighted distance metric will be adopted for testing preference heterogeneity across groups of subjects via an inferential procedure based on the index of segregation power \(I_{SP}\) introduced by Gadrich et al. [6] for numerical and categorical variables and further specialized by Vanacore et al. [23] to complete rankings.

The performance of the testing procedure based on \(I_{SP}\) with the weighted normalized angular distance metric is investigated via a Monte Carlo simulation study under different scenarios, differing in group size, number of ranked alternatives and system of hypotheses. The results of the Monte Carlo simulation study are compared against those obtained for the normalized angular distance metric discussed in Vanacore and Pellegrino [22].

The paper is organized as follows: the weighted version of the normalized pairwise angular distance metric together with the testing procedure are introduced in Sect. 2; the design and the main results of the Monte Carlo simulation study are described in Sect. 3; finally, conclusions are summarized in Sect. 4.

2 Weighted normalized pairwise angular distance metric

Let M subjects rank a finite set \(\varvec{A} = \{A_{1}, A_{2}, \dots , A_{n}\}\) of n alternatives. Each ranking can be represented as an antisymmetric square alternative-to-alternative matrix \((a_{lm})_{n \times n}\) of order n. The generic element \(a_{lm}\) in cell (l, m) represents the preference relation between the ranked alternatives \(A_{l}\) and \(A_{m}\) and thus takes the following values:

$$\begin{aligned} a_{lm}= {\left\{ \begin{array}{ll} -1 &{} \text {if the alternative}~ A_{l}~ \text {precedes the alternative}~ A_{m} \\ 0 &{} \text {if}~ A_{l}~ \text {and}~ A_{m}~ \text {are indistinguishable alternatives} \\ 1 &{} \text {if the alternative}~ A_{l}~ \text {follows the alternative}~ A_{m} \end{array}\right. } \end{aligned}$$

A suitable distance measure between two rankings comes from multidimensional geometry, according to which each ranking, and thus each matrix \((a_{lm})_{n \times n}\), can be considered as a vector in a multidimensional Euclidean space. A measure of dissimilarity between two rankings is the normalized pairwise angular distance metric [21]. Specifically, given two such vectors \(\mathbf{a}\) and \(\mathbf{b}\), the normalized pairwise angular distance metric \(L_{\mathbf{ab }}\) is expressed as follows:

$$\begin{aligned} L_{\mathbf{ab }}= \frac{1}{\pi } \cdot \arccos \Biggl [ \frac{\sum _{l=1}^{n-1} \sum _{m=l+1}^n a_{lm} b_{lm}}{\sqrt{\sum _{l=1}^{n-1} \sum _{m=l+1}^n a_{lm}^2} \sqrt{\sum _{l=1}^{n-1} \sum _{m=l+1}^n b_{lm}^2}} \Biggr ]=\frac{\hat{\theta }_{\mathbf{a,b }}}{\pi }. \end{aligned}$$
(1)

where \(\pi \) is the straight angle against which the angular distance \(\hat{\theta }_{\mathbf{a,b }}\) is normalized.
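As an illustration, Eq. 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the implementation used in the paper: each ranking is encoded as a list `ranks` in which `ranks[l]` is the position given to alternative \(A_{l+1}\) (1 = most preferred, equal values = indistinguishable alternatives), and the function names `preference_signs` and `angular_distance` are hypothetical.

```python
import math
from itertools import combinations


def preference_signs(ranks):
    """Strictly upper-triangular entries a_lm (l < m) of the
    alternative-to-alternative matrix, in row-major order."""
    signs = []
    for l, m in combinations(range(len(ranks)), 2):
        if ranks[l] < ranks[m]:
            signs.append(-1)  # A_l precedes A_m
        elif ranks[l] > ranks[m]:
            signs.append(1)   # A_l follows A_m
        else:
            signs.append(0)   # indistinguishable alternatives
    return signs


def angular_distance(ranks_a, ranks_b):
    """Normalized pairwise angular distance of Eq. 1, in [0, 1]."""
    a = preference_signs(ranks_a)
    b = preference_signs(ranks_b)
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    if sq_a == 0 or sq_b == 0:
        # Assumption: the angle is undefined when a ranking ties everything.
        raise ValueError("distance undefined for an all-ties ranking")
    cosine = dot / math.sqrt(sq_a * sq_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine))) / math.pi


identical = angular_distance([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_ = angular_distance([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
one_swap = angular_distance([1, 2, 3, 4, 5], [2, 1, 3, 4, 5])
```

With this encoding, identical rankings are at distance 0 and fully reversed rankings at distance 1; a single adjacent swap among five alternatives changes one of the ten pairwise entries, so the cosine is 8/10.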

Position weighted distances are built to enhance sensitivity to the position of each alternative, that is, swaps at the top of the ranking are costlier than those at the tail of the ranking. The weighted variant of the normalized pairwise angular distance metric can be formulated as follows:

$$\begin{aligned} L_{\mathbf{ab }_{w}} = \frac{1}{\pi } \cdot \arccos \Biggl [ \frac{\sum _{l=1}^{n-1} \; \sum _{m=l+1}^n a_{{lm}_{w}} \; b_{{lm}_{w}}}{\sqrt{\sum _{l=1}^{n-1} \sum _{m=l+1}^n a_{{lm}_{w}}^2} \sqrt{\sum _{l=1}^{n-1} \sum _{m=l+1}^n b_{{lm}_{w}}^2}} \Biggr ]=\frac{\hat{\theta }_{\mathbf{a }_{w},\mathbf{b }_{w}}}{\pi } \end{aligned}$$
(2)

where \(a_{{lm}_{w}}\) (with \(l=1, \dots , n-1\); \(m=l+1, \dots , n\)) is the position weighted version of the generic element \(a_{lm}\) of the antisymmetric square alternative-to-alternative matrix \((a_{lm})_{n \times n}\). Specifically, given the vector \(w = (w_{1}, w_{2}, \dots , w_{n})\) of non-increasing weights for the ranking positions, the generic elements \(a_{{lm}_{w}}\) and \(b_{{lm}_{w}}\) are given by:

$$\begin{aligned} a_{{lm}_{w}}=a_{lm} \cdot w_{l}; \qquad b_{{lm}_{w}}=b_{lm} \cdot w_{l} \end{aligned}$$
(3)

It is worth pointing out that the square alternative-to-alternative matrix \((a_{lm})_{n \times n}\) is antisymmetric so that, in order to reduce the computational burden, Eqs. 1 and 2 are applied only to the elements of the strictly upper triangular matrix. If needed, the full weighted square alternative-to-alternative matrix can be easily obtained by filling in the empty cells with their antisymmetric values.

The position weights, modeling the cost of swapping two alternatives according to their position in the ranking, can be arbitrarily defined, subject to the constraint \(\sum _{l} w_{l}=1\). The position weighting scheme adopted here, derived from the radical weighting scheme used in agreement studies [7], is proportional to the distance between positions and is formulated as follows:

$$\begin{aligned} w_{l}=1-\sqrt{\frac{l-1}{n-1}}. \end{aligned}$$
(4)
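Eqs. 2–4 can be illustrated with the short Python sketch below. It rests on assumptions that go beyond the text: rankings are encoded as lists in which `ranks[l]` is the position of alternative \(A_{l+1}\), Eq. 3 is taken literally so that the weight \(w_{l}\) multiplies every entry in row l of the matrix, and the function names are hypothetical. Note that the weights of Eq. 4 as written do not sum to 1; since the cosine in Eq. 2 is unchanged when all weights are multiplied by the same positive constant, rescaling them to unit sum would not alter the distance.

```python
import math
from itertools import combinations


def radical_weights(n):
    """Position weights of Eq. 4: w_l = 1 - sqrt((l - 1) / (n - 1)), l = 1..n.
    Requires n >= 2."""
    return [1.0 - math.sqrt((l - 1) / (n - 1)) for l in range(1, n + 1)]


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def weighted_angular_distance(ranks_a, ranks_b, weights):
    """Weighted normalized pairwise angular distance (Eq. 2), with the
    weighted entries a_lm * w_l and b_lm * w_l of Eq. 3 (l < m only)."""
    dot = sq_a = sq_b = 0.0
    for l, m in combinations(range(len(ranks_a)), 2):
        a = sign(ranks_a[l] - ranks_a[m]) * weights[l]
        b = sign(ranks_b[l] - ranks_b[m]) * weights[l]
        dot += a * b
        sq_a += a * a
        sq_b += b * b
    cosine = dot / math.sqrt(sq_a * sq_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine))) / math.pi


reference = [1, 2, 3, 4, 5]   # A_1 first, ..., A_5 last
top_swap = [2, 1, 3, 4, 5]    # first two alternatives exchanged
tail_swap = [1, 2, 3, 5, 4]   # last two alternatives exchanged

w = radical_weights(5)
d_top = weighted_angular_distance(reference, top_swap, w)
d_tail = weighted_angular_distance(reference, tail_swap, w)

# Unit weights reduce Eq. 2 to the unweighted metric of Eq. 1.
ones = [1.0] * 5
u_top = weighted_angular_distance(reference, top_swap, ones)
u_tail = weighted_angular_distance(reference, tail_swap, ones)
```

Under these assumptions, `d_top` exceeds `d_tail`, whereas the unit-weight distances `u_top` and `u_tail` coincide, which is the qualitative behaviour the position weights are meant to produce.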

In order to compare several groups of subjects for testing ranking heterogeneity, the total variation among the rankings provided by all subjects (i.e. K groups of \(\text {M}\) subjects) \(\hat{V}_{\text {TOT}}\), formulated as proposed in Vanacore et al. [23], is split, according to Rao’s apportionment of diversity [19], into within-group \(\hat{V}_{\text {WG}}\) and between-group \(\hat{V}_{\text {BG}}\) components. The inter-group ranking heterogeneity is measured through the index of segregation power \(I_{SP}\) [6], defined as the ratio of the normalized between-group variation (\(\hat{V}_{\text {BG}}\)) to the normalized total variation (\(\hat{V}_{\text {TOT}}\)):

$$\begin{aligned} I_{SP}=\frac{\hat{V}_{\text {BG}}/df_{\text {BG}}}{\hat{V}_{\text {TOT}}/df_{\text {TOT}}}. \end{aligned}$$
(5)

Denoting by \(L_{i_{h}j_{k}}\) the distance between the rankings provided by the \(i^{th}\) subject belonging to group h and the \(j^{th}\) subject belonging to group k, the total ranking variation and its components are formulated as shown in Table 1. Readers can refer to [22] for more computational details.

Table 1 Estimates of total ranking variation and its components

It is worth noting that although \(\hat{V}_{\text {BG}}/\hat{V}_{\text {TOT}}\) is always less than 1, the \(I_{SP}\) value could exceed 1 since the ratio \(df_{\text {TOT}}/df_{\text {BG}}=(\text {M}K-1)/(\text {M}K-\text {M}) \ge 1\) with \(\text {M} \ge 2\).
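The remark above can be checked numerically with a minimal Python sketch of Eq. 5. The variation components are passed in as plain numbers, because the estimators of Table 1 are not reproduced here; the degrees of freedom follow the ratio stated in the text, \(df_{\text {TOT}}=\text {M}K-1\) and \(df_{\text {BG}}=\text {M}K-\text {M}\). The function name and the input values are hypothetical.

```python
def segregation_power(v_bg, v_tot, n_groups, group_size):
    """Index of segregation power I_SP (Eq. 5).

    v_bg, v_tot: between-group and total variation (assumed to be computed
    elsewhere, e.g. with the estimators of Table 1).
    Degrees of freedom as stated in the text: df_TOT = MK - 1, df_BG = MK - M.
    """
    df_tot = group_size * n_groups - 1
    df_bg = group_size * n_groups - group_size
    return (v_bg / df_bg) / (v_tot / df_tot)


# Illustrative (made-up) variations for K = 3 groups of M = 10 subjects:
# the ratio V_BG / V_TOT is 0.8, below 1, yet the index exceeds 1.
i_sp = segregation_power(v_bg=0.8, v_tot=1.0, n_groups=3, group_size=10)
```

Here \(df_{\text {TOT}}/df_{\text {BG}} = 29/20 = 1.45\), so a variation ratio of 0.8 yields an index of 1.16.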

3 Monte Carlo simulation

The Monte Carlo simulation study aims at investigating: (1) the performance of the weighted normalized angular distance metric in discriminating dissimilarity between rankings; (2) the impact of the position weights on the performance of the procedure for testing preference heterogeneity. The results are compared against those obtained for the normalized angular distance metric without position weights. The algorithm for the Monte Carlo simulations has been implemented using Mathematica (Version 11.0, Wolfram Research, Inc., Champaign, IL, USA).

3.1 Discriminating performance of the weighted normalized angular distance metric

Let us consider two subjects i and j expressing their preferences about \(n= 5, 6, 7, 8, 9, 10\) alternatives; the \(n! \times n!\) normalized pairwise angular distances \(L_{ij}\) and \(L_{ij_{w}}\) are represented in the box-plots of Fig. 1. It is evident that the introduction of position weights produces a wider range of distance values, allowing better discrimination of the dissimilarity between rankings.

Fig. 1 Box-plots of the weighted (in orange) and unweighted (in blue) normalized pairwise angular distance metrics for n alternatives

The discriminating performance of \(L_{ij_{w}}\) has been investigated and compared against that of \(L_{ij}\), focusing on rankings differing from each other by a single swap. Specifically, two factors have been considered: swap position with two levels (i.e. at the top and at the tail of the ranking) and swap dimension with four levels (i.e. between adjacent, 1-step apart, 2-steps apart and 3-steps apart positions), for a total of \(2 \times 4 = 8 \) simulated scenarios for \(n=5,6,7,8,9,10\) alternatives. The distance values \(L_{ij}\) and \(L_{ij_{w}}\) are plotted in Fig. 2 for all the investigated scenarios.

The plotted curves show that both \(L_{ij}\) and \(L_{ij_{w}}\) decrease with the number of ranked alternatives and increase with the swap dimension; however, it is worth noting that only the weighted normalized angular distance metric takes into account the position of the swap. Indeed, for any swap dimension, \(L_{ij_{w}}\) is about three times higher if the swap is at the top of the ranking than if it is at the tail; conversely, \(L_{ij}\) does not depend on swap position, taking the same value whether the swap is at the top or at the tail of the ranking.

Fig. 2 Weighted (in orange) and unweighted (in blue) normalized pairwise angular distance metrics for 1 swap of adjacent, 1-step apart, 2-steps apart and 3-steps apart positions at the top (solid lines) or at the tail (dashed lines) of the ranking of n alternatives

3.2 Statistical power of the inter-group preference heterogeneity testing procedure

Let us consider M subjects expressing their preferences about n alternatives. In order to simulate different levels of inter-group heterogeneity, preferences have been generated using the distance-based model developed by Diaconis [4], which assumes that the probability of each ranking depends on its distance from a chosen modal ranking, to which most rankings are expected to be close.

Different samples of rankings have been simulated by varying either the modal ranking or the dispersion parameter \(\lambda _{k}\) for the \(k^{th}\) group of subjects; specifically, \(\lambda _{k}\) controls the probability of each ranking and higher differences among \(\lambda _{k}\) produce higher heterogeneity among the rankings provided by different groups of subjects.
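A distance-based sampler of this kind can be sketched in Python as follows. The sketch makes assumptions that the text does not state: the probability of a ranking is taken to be proportional to \(\exp (-\lambda _{k} d)\), the distance d from the modal ranking is taken to be the Kendall distance (number of discordant pairs), and all n! rankings are enumerated, which is practical only for small n. The function names are hypothetical.

```python
import math
import random
from itertools import combinations, permutations


def kendall_distance(ranks_a, ranks_b):
    """Number of pairs of alternatives ordered differently by the two rankings."""
    return sum(
        1
        for l, m in combinations(range(len(ranks_a)), 2)
        if (ranks_a[l] - ranks_a[m]) * (ranks_b[l] - ranks_b[m]) < 0
    )


def sample_rankings(modal, lam, size, rng):
    """Draw `size` complete rankings with P(r) proportional to
    exp(-lam * d(r, modal)); assumed form of the distance-based model."""
    candidates = list(permutations(sorted(modal)))
    weights = [math.exp(-lam * kendall_distance(c, modal)) for c in candidates]
    return rng.choices(candidates, weights=weights, k=size)


rng = random.Random(2024)      # fixed seed, for reproducibility only
modal = (1, 2, 3, 4, 5)        # modal[l] = position of alternative A_{l+1}
concentrated = sample_rankings(modal, lam=50.0, size=100, rng=rng)
dispersed = sample_rankings(modal, lam=0.0, size=2000, rng=rng)
```

Under these assumptions, a large dispersion parameter concentrates the sampled rankings on the modal ranking, while a value of 0 makes all rankings equally likely.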

The case of \(K = 3\) groups of \(\text {M}\) subjects has been considered and 108 scenarios have been simulated, differing in group size, number of alternatives to rank and system of hypotheses. The group sizes simulated for this study are \(\text {M} = 10, 20, 50\) subjects and the numbers of alternatives to rank are \(n =5, 10\); these values have been selected in order to represent a range of values that might be seen in practice in a survey [18].

First, the groups have been simulated with the same dispersion parameter \(\lambda _{k}\), chosen in the range \([1 \div 50]\), but with different modal rankings, differing by one swap at the top of the ranking (i.e. first simulation setting). Specifically, 7 different \(\lambda _{k}\) values have been considered, that is \(\lambda _{k}=1, 3, 5, 10, 20, 35, 50\), so as to represent as many alternative hypotheses of growing levels of inter-group ranking heterogeneity.

Then, the groups have been simulated each with a different dispersion parameter \(\lambda _{k}\) chosen in the range \([1 \div 30]\) but all having the same modal ranking (i.e. second simulation setting). Specifically, 11 combinations of different values of \(\lambda _{k}\) have been considered so as to represent as many alternative hypotheses H\(_{1}\) (see Table 2) of growing levels of inter-group ranking heterogeneity, obtained by choosing combinations of \(\lambda _{k}\) values characterized by increasing values of the maximum difference in ranking dispersion (\(\varDelta \lambda _{k}\)).

For each scenario, \(R=2000\) data sets with K groups of \(\text {M}\) rankings have been generated, the distances between rankings have been measured via the weighted and unweighted normalized pairwise angular distance metrics [21] and the \(I_{SP}\) has been assessed.

The simulation procedure has been developed through the following steps:

  1. set a modal ranking and the dispersion parameters \(\lambda _{k}\) for each of the K groups of size M (\(\text {M}=10, 20, 50\));

  2. sample the rankings provided by the M subjects of the K groups over n alternatives according to the framework of distance-based models;

  3. compute the weighted normalized distance \(L_{ij_{w}}\) for each pair of subjects (Eq. 2);

  4. assess the total ranking variation and its within and between components (see Table 1);

  5. compute the \(I_{SP}\) index (Eq. 5);

  6. repeat steps 1–5 R times;

  7. for a significance level \(\alpha \), define the critical value \(I_{cr}\) as the (\(1-\alpha \)) percentile of the empirical sampling distribution of \(I_{SP}\) built under the assumption of homogeneity;

  8. compute the statistical power for each alternative hypothesis as:

    $$\begin{aligned} 1-\beta = \frac{1}{R} \sum \limits _{r=1}^R I[I_{SP_{r}} > I_{cr} | \text {H}_{1}]; \end{aligned}$$
    (6)

  9. compute the unweighted normalized angular distance metric \(L_{ij}\) for each pair of subjects (Eq. 1) with the outputs of step 2;

  10. repeat steps 4–8 adopting the unweighted distance values \(L_{ij}\).
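Steps 7 and 8 of the procedure above can be sketched in Python as follows. The sketch assumes that the R values of \(I_{SP}\) simulated under homogeneity and under an alternative hypothesis are already available as lists; the percentile convention (smallest simulated value with at least a share \(1-\alpha \) of the null values at or below it) is one of several possible choices and is an assumption, as are the function names.

```python
import math


def critical_value(null_stats, alpha=0.05):
    """(1 - alpha) empirical percentile of the I_SP values simulated
    under homogeneity (step 7). Percentile convention is an assumption."""
    ordered = sorted(null_stats)
    k = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[max(k, 0)]


def empirical_power(alt_stats, i_cr):
    """Share of I_SP values simulated under H1 that exceed I_cr (Eq. 6)."""
    return sum(1 for s in alt_stats if s > i_cr) / len(alt_stats)


# Made-up I_SP values, for illustration only.
null_values = [i / 100 for i in range(100)]
i_cr = critical_value(null_values, alpha=0.05)
power = empirical_power([0.5, 2.0, 3.0, 0.1], i_cr=1.0)
```

By construction, at most a share \(\alpha \) of the values simulated under homogeneity exceed the critical value returned by `critical_value`.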

Table 2 Values of \(\lambda _{k}\) identifying the nine alternative hypotheses H\(_{1}\) of the second simulation setting

For \(\alpha =0.05\), the values of \(I_{cr}\) for the null hypothesis H\(_{0}\): \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\) are reported in Table 3, whereas the values of statistical power for the two testing procedures for each tested hypothesis H\(_{1}\) are reported in Tables 4 and 5, respectively.

Table 3 Critical values \(I_{cr}\) of the indicator \(I_{SP}\) for the significance level \(\alpha =0.05\), computed with and without position weights considering \(K =3\) groups of M subjects ranking n alternatives

Fig. 3 Statistical power curves when testing alternative hypotheses H\(_{1}\) with increasing dispersion parameter \(\lambda _{k}\) against the null hypothesis H\(_{0}\): \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\), for \(n=5\) and \(n=10\) alternatives, \(\text {M}=10\) (short-dashed lines), \(\text {M}=20\) (long-dashed lines) and \(\text {M}=50\) (solid lines) subjects, with (in orange) and without (in blue) position weights

Table 4 Statistical power when testing alternative hypotheses with increasing dispersion parameter \(\lambda _{k}\) for different n and M values with (on the left) and without (on the right) position weights against the null hypothesis H\(_{0}\): \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\)

Fig. 4 Statistical power curves when testing alternative hypotheses H\(_{1}\) with increasing difference in ranking dispersion (i.e. \(\varDelta \lambda _{k}\)), for \(n=5\) and \(n=10\) alternatives, \(\text {M}=10\) (short-dashed lines), \(\text {M}=20\) (long-dashed lines) and \(\text {M}=50\) (solid lines) subjects, with (in orange) and without (in blue) position weights

Table 5 Statistical power when testing alternative hypotheses with increasing difference in ranking dispersion (\(\varDelta \lambda _{k}\)) against the null hypothesis H\(_{0}\): \(\lambda _{1}=\lambda _{2}=\lambda _{3}=1\) for different values of n and M with (on the left) and without (on the right) position weights

The results highlight that the adoption of position weights for assessing the dissimilarity between rankings increases the statistical power of the testing procedure for inter-group heterogeneity. The improvement in power is more evident when the modal ranking changes across groups (i.e. first simulation setting) than when the groups have the same modal ranking but different dispersion parameters (i.e. second simulation setting).

In the first simulation setting (see Fig. 3 and Table 4), for \(n=5\) a small difference in power rates is observed between weighted and unweighted distances; in both cases the statistical power is higher than 80% with small and homogeneous groups of subjects (i.e. starting from a \(\lambda _{k}\) value equal to 3 for \(10 \le \text {M} \le 20\)). The difference in power rates between weighted and unweighted distances increases with the number of alternatives. For \(n=10\), when the position weights are included the statistical power is higher than 80% with homogeneous groups of subjects (i.e. starting from \(\lambda _{k}=3\) for \(\text {M}=50\) or \(\lambda _{k}=5\) for \(10 \le \text {M} \le 20\)), whereas, when the weights are not included, the statistical power reaches 80% only with more heterogeneous groups of subjects (i.e. \(\lambda _{k} \ge 10\)).

The results obtained under the second simulation setting (see Fig. 4 and Table 5) reveal that in the case of \(n=5\) alternatives the increase in statistical power due to the inclusion of position weights is more evident with groups of \(\text {M} > 10\) subjects; whereas in the case of \(n=10\) alternatives the increase in statistical power is evident for any group size.

For \(n=5\) alternatives, the adoption of position weights makes the testing procedure adequately powered in detecting inter-group ranking heterogeneity even for small group size (i.e. \(\text {M}=10\)) with \(\varDelta \lambda _{k} \approx 10\). In scenarios with a fairly large set of alternatives, like \(n=10\), the statistical power of the testing procedure slightly worsens and with small groups of \(\text {M}=10\) subjects it reaches 80% when \(\varDelta \lambda _{k} \approx 17\). Nevertheless, the statistical power can be improved by increasing the group dimension, reaching 80% with groups of \(\text {M}=50\) subjects with \(\varDelta \lambda _{k} \approx 4\) for \(n=5\) or with \(\varDelta \lambda _{k} \approx 6\) for \(n=10\).

4 Conclusions

In this paper a position weighted variant of the normalized pairwise angular distance metric has been proposed.

The performance of the proposed metric in assessing ranking dissimilarity and inter-group heterogeneity has been investigated by an extensive Monte Carlo simulation study. The critical values of the \(I_{SP}\) index under the null hypothesis of inter-group ranking homogeneity have been obtained for different scenarios and the statistical power has been computed for increasing levels of heterogeneity by varying either the modal ranking or the dispersion parameter.

The simulation results have been compared against those obtained for the unweighted distance metric. \(L_{ij}\) and \(L_{ij_{w}}\) decrease with the number of alternatives to rank and are both positively related to swap dimension; only \(L_{ij_{w}}\) accounts for the position of the swap, taking on higher values if the swaps are at the top of the ranking. Whatever the adopted metric, the statistical power of the testing procedure improves with increasing group size and decreasing number of alternatives to rank, and it reaches 80% even with as few as \(\text {M}=10\) subjects ranking \(n=5\) alternatives. The inclusion of position weights improves the performance of the inter-group heterogeneity testing procedure; the difference in statistical power is more evident when the groups have different modal rankings and the same dispersion parameter than when the groups have the same modal ranking but differ in the dispersion parameter.