Abstract
Distances on permutations are convenient tools for analyzing and modeling rank data. They measure the closeness between two rankings and can reveal the main structure and features of the data. In this paper, some statistical properties of the Lee distance are studied. Asymptotic results for the random variable induced by the Lee distance are derived and used to compare the Distance-based probability model and the Marginals model for complete rankings. Three rank datasets are analyzed to illustrate the presented models.
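As a minimal illustration (a sketch only, assuming the standard Lee metric on \({\mathbf {S}}_{N}\), \(d_{L}(\pi ,\sigma )=\sum _{i=1}^{N}\min \left\{ |\pi (i)-\sigma (i)|,\,N-|\pi (i)-\sigma (i)|\right\} \)), the distance between two rankings can be computed as follows.

# Sketch: Lee distance between two rankings of N items, assuming
# d_L(pi, sigma) = sum_i min(|pi(i) - sigma(i)|, N - |pi(i) - sigma(i)|).
def lee_distance(pi, sigma):
    n = len(pi)
    return sum(min(abs(p - s), n - abs(p - s)) for p, s in zip(pi, sigma))

# Examples with N = 4:
print(lee_distance((1, 2, 3, 4), (1, 2, 3, 4)))  # 0 (identical rankings)
print(lee_distance((2, 1, 4, 3), (1, 2, 3, 4)))  # 4 (each item contributes 1)
print(lee_distance((4, 3, 2, 1), (1, 2, 3, 4)))  # 4 (outer terms wrap: min(3, 1) = 1)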

Acknowledgements
The work of the first author was supported by the Support Program of Bulgarian Academy of Sciences for Young Researchers under Grant 17-95/2017. The work of the second author was supported by the National Science Fund of Bulgaria under Grant DH02-13.
Appendix
In order to prove Theorem 3, consider the random variables \(D_{N,k}=d_{L}\left( \pi ,e_{N}\right) \), where \(k=1,2,\ldots ,N\) and \(\pi \) is randomly selected from \({\mathbf {S}}_{N,k}=\left\{ \sigma \in {\mathbf {S}}_{N}: \sigma (N)=k\right\} \), i.e. \(\pi \sim Uniform({\mathbf {S}}_{N,k})\). Then, for fixed k,
where \(\sigma \in {\mathbf {S}}_{N-1}\) and for \(i,j=1,2,\ldots ,N-1\),
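As a small numerical illustration of \(D_{N,k}\) (a sketch only, taking \(e_{N}\) to be the identity ranking \((1,2,\ldots ,N)\) and assuming the Lee metric \(d_{L}(\pi ,e_{N})=\sum _{i=1}^{N}\min \left\{ |\pi (i)-i|,\,N-|\pi (i)-i|\right\} \)), its exact distribution can be obtained by direct enumeration of \({\mathbf {S}}_{N,k}\) for small N.

from itertools import permutations
from collections import Counter

def lee_distance_to_identity(pi):
    # d_L(pi, e_N) with e_N = (1, 2, ..., N), assuming the standard Lee metric.
    n = len(pi)
    return sum(min(abs(p - i), n - abs(p - i)) for i, p in enumerate(pi, start=1))

def distribution_D_Nk(n, k):
    # Exact distribution of D_{N,k} = d_L(pi, e_N) for pi uniform on
    # S_{N,k} = { sigma in S_N : sigma(N) = k }.
    counts = Counter(
        lee_distance_to_identity(sigma)
        for sigma in permutations(range(1, n + 1))
        if sigma[-1] == k
    )
    total = sum(counts.values())  # equals (N - 1)!
    return {d: c / total for d, c in sorted(counts.items())}

print(distribution_D_Nk(4, 2))  # {4: 0.5, 6: 0.333..., 8: 0.166...} over the 3! = 6 permutations in S_{4,2}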
Lemma 1
Let \( \tilde{D}_{N-1}\left( \sigma \right) =\sum \nolimits _{i=1}^{N-1}\tilde{c}_{N}(\sigma (i),i)\), where \(\sigma \sim Uniform({\mathbf {S}}_{N-1})\) and \(\tilde{c}_{N}(\cdot ,\cdot )\) is as in (21). Then the distribution of \(\tilde{D}_{N-1}\) is asymptotically normal and the mean and variance of \(\tilde{D}_{N-1}\) are
where
Proof
From (6) of Theorem 1 and formulas (21) and (10), it follows that
for \(i,j=1,2,\ldots ,N\). Simplifying expression (23) gives
When N is even, the variance of \(\tilde{D}_{N-1}\) can be calculated by
where, by convention, \(\sum _{j=l_{1}}^{l_{2}}=0\) if \(l_{1}>l_{2}\). Since the computations for \(Q_{1}\), \(Q_{2}\), \(Q_{3}\) and \(Q_{4}\) are similar, only the steps for \(Q_{1}\) are presented here.
where
for \( B_{N}(k)=\frac{c_{N}(k,N)}{\left( N-1\right) ^{2}}-\frac{N}{\left( N-1\right) ^{2}}\left[ \frac{N+1}{2}\right] \left[ \frac{N}{2}\right] =\frac{4(N-k)-N^{3}}{4\left( N-1\right) ^{2}}\) and \(\sum _{i=l_{1}}^{l_{2}}=0\), if \(l_{1}>l_{2}\). The calculation of \(Q_{1}\) is completed by repeatedly using the formula
for appropriate values of a and n.
The quantities \(Q_{2}\), \(Q_{3}\) and \(Q_{4}\) can be decomposed and calculated in a similar fashion as shown for \(Q_{1}\). The final result for the variance of \(\tilde{D}_{N-1}\), when N is even, is
The variance \({\mathbf {Var}} \left( \tilde{D}_{N-1}\right) \), when N is odd, can be obtained by decomposing it into four double sums and applying formula (25), as in the case when N is even.
From (24) and (2), it follows that
By using (22),
where the \(O\left( \frac{1}{N}\right) \) term vanishes as \(N \rightarrow \infty \). Therefore,
i.e. the condition (8) of Theorem 1 is fulfilled and the distribution of \(\tilde{D}_{N-1}\) is asymptotically normal. \(\square \)
Proof of Theorem 3
From (14), (19) and (15), it follows that
where \(g_{N}(\cdot )\) and \(\tilde{g}_{N-1}(\cdot )\) are the moment generating functions of \(D_{L}(\pi )\) and \(D_{i,j}(\sigma )\), respectively, for \(\pi \sim Uniform({\mathbf {S}}_{N})\) and \(\sigma \sim Uniform({\mathbf {S}}_{i,j})\). Since \(D_{i,j}\) depends on i and j only through \(c_{N}(i,j)\), the random variables \(D_{i,j}\) and \(D_{N,k}\) are identically distributed for \({k=N-c_{N}(i,j)}\). From Theorem 2 and Lemma 1, \(g_{N}(\cdot )\) and \(\tilde{g}_{N-1}(\cdot )\) can be approximated, so
where \(\mu ={\mathbf {E}}\left( D_{i,j}\right) -{\mathbf {E}}(D_{L})\) and \(\nu ^{2}={\mathbf {Var}}\left( D_{i,j}\right) -{\mathbf {Var}}(D_{L})\).
According to Lemma 1,
The values of \(\mu \) and \(\nu ^{2}\) are obtained by combining the results above with formulas (10) and (12). \(\square \)
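The identification used above, namely that \(D_{i,j}\) and \(D_{N,k}\) are identically distributed when \(k=N-c_{N}(i,j)\), can be checked by direct enumeration for small N. The sketch below assumes \({\mathbf {S}}_{i,j}=\left\{ \sigma \in {\mathbf {S}}_{N}: \sigma (i)=j\right\} \) and the Lee scores \(c_{N}(i,j)=\min \left\{ |i-j|,\,N-|i-j|\right\} \); both are assumptions made for illustration.

from itertools import permutations
from collections import Counter

def lee_to_identity(pi):
    # d_L(pi, e_N) with e_N = (1, 2, ..., N), assuming the standard Lee metric.
    n = len(pi)
    return sum(min(abs(p - i), n - abs(p - i)) for i, p in enumerate(pi, start=1))

def dist_on_fixed_position(n, i, j):
    # Distribution of d_L(pi, e_N) over S_{i,j} = { sigma in S_N : sigma(i) = j }.
    return Counter(
        lee_to_identity(sigma)
        for sigma in permutations(range(1, n + 1))
        if sigma[i - 1] == j
    )

def lee_score(n, i, j):
    # Assumed definition of c_N(i, j).
    return min(abs(i - j), n - abs(i - j))

n = 5
for i in range(1, n + 1):
    for j in range(1, n + 1):
        k = n - lee_score(n, i, j)
        assert dist_on_fixed_position(n, i, j) == dist_on_fixed_position(n, n, k)
print("D_{i,j} and D_{N,k} coincide for k = N - c_N(i,j), N =", n)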