Advertisement

Mathematical Programming

, Volume 127, Issue 1, pp 203–244 | Cite as

Statistical ranking and combinatorial Hodge theory

  • Xiaoye Jiang
  • Lek-Heng LimEmail author
  • Yuan Yao
  • Yinyu Ye
Open Access
Full Length Paper Series B

Abstract

We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We shall study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the l 2-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained—if this is large, then it indicates that the data does not have a good global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provides information on whether inconsistency in the ranking data arises locally or globally. When applied to statistical ranking problems, Hodge decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or if the data is inherently inconsistent and thus could not have any reasonable global ranking; in the latter case it provides information on the nature of the inconsistencies. An obvious advantage over the NP-hardness of Kemeny optimization is that HodgeRank may be easily computed via a linear least squares regression. We also discuss connections with well-known ordinal ranking techniques such as Kemeny optimization and Borda count from social choice theory.

Keywords

Statistical ranking Rank aggregation Combinatorial Hodge theory Discrete exterior calculus Combinatorial Laplacian Hodge Laplacian Graph Helmholtzian HodgeRank Kemeny optimization Borda count 

Mathematics Subject Classification (2000)

68T05 58A14 90C05 90C27 91B12 91B14 

Notes

Acknowledgments

The authors thank the anonymous referees for their useful suggestions. The authors also gratefully acknowledge helpful discussions with Gunnar Carlsson, Leo Guibas, and Steve Smale.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

  1. 1.
    Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proceeding ACM Symposium Theory Computing (STOC ’605), vol. 37, pp. 684–693 (2005)Google Scholar
  2. 2.
    Arrow K.J.: A difficulty in the concept of social welfare. J. Polit. Econ. 58(4), 328–346 (1950)CrossRefGoogle Scholar
  3. 3.
    Arrow K.J.: General economic equilibrium: purpose, analytic techniques, collective choice. Nobel Memorial Lecture, 12 December 1972. In: Lindbeck, A. (ed.) Nobel Lectures: Economic Sciences 1969–1980, pp. 109–131. World Scientific, Singapore (1992)Google Scholar
  4. 4.
    Barber B.M., Lehavy R., McNichols M., Trueman B.: Can investors profit from the prophets? Security analyst recommendations and stock returns. J. Finance 56(2), 531–563 (2001)CrossRefGoogle Scholar
  5. 5.
    Bartholdi, L., Schick, T., Smale, N., Smale, S., Baker, A.W.: Hodge theory on metric spaces. preprint. http://www.arxiv.org/abs/0912.0284 (2009)
  6. 6.
    Bell, R.M., Koren, Y.: Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In: Proceedings IEEE International Conference Data Mining (ICDM ’607), vol. 7, pp. 43–52. (2007)Google Scholar
  7. 7.
    Bradley R., Terry M.: The rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39, 324–345 (1952)zbMATHMathSciNetGoogle Scholar
  8. 8.
    Brin S., Page L.: The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Syst. 30(1–7), 107–117 (1998)CrossRefGoogle Scholar
  9. 9.
    Choi, S.-C.: Iterative methods for singular linear equations and least-squares problems. Ph.D. thesis, Stanford University, Stanford, CA. http://www.stanford.edu/group/SOL/dissertations/sou-cheng-choi-thesis.pdf (2006)
  10. 10.
    Choi, S.-C., Paige, C.C., Saunders, M.A.: minres -qlp: a Krylov subspace method for indefinite or singular symmetric systems, preprint. http://www.arxiv.org/abs/1003.4042 (2010)
  11. 11.
    Chung, F.: Spectral Graph Theory. In: CBMS Regional Conference Series in Mathematics, vol. 92. AMS, Providence, RI (1997)Google Scholar
  12. 12.
    Cortes, C., Mohri, M., Rastogi, A.: Magnitude-preserving ranking algorithms. In: Proceeding International Conference Machine Learning (ICML ’607), vol. 24, pp. 169–176 (2007)Google Scholar
  13. 13.
    David H.A.: The Method of Paired Comparisons, 2nd edn. Griffin’s Statistical Monographs and Courses, vol. 41. Oxford University Press, New York, NY (1988)Google Scholar
  14. 14.
    de Borda J.-C.: Mémoire sur les élections au scrutin. Histoire de’Académie Royale des Sci. 102, 657–665 (1781)Google Scholar
  15. 15.
    de Condorcet A.-N.: Éssai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris (1785)Google Scholar
  16. 16.
    Diaconis P.: A generalization of spectral analysis with application to ranked data. Ann. Statist. 17(3), 949–979 (1989)zbMATHCrossRefMathSciNetGoogle Scholar
  17. 17.
    Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings International Conference World Wide Web (WWW ’01), vol. 10, pp. 613–622 (2001)Google Scholar
  18. 18.
    Freund Y., Iyer R., Shapire R., Singer Y.: An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4(6), 933–969 (2004)zbMATHCrossRefMathSciNetGoogle Scholar
  19. 19.
    Friedman J.: Computing Betti numbers via combinatorial Laplacians. Algorithmica 21(4), 331–346 (1998)zbMATHCrossRefMathSciNetGoogle Scholar
  20. 20.
    Hastie T., Tibshirani R.: Classification by pairwise coupling. Ann. Statis. 26(2), 451–471 (1998)zbMATHCrossRefMathSciNetGoogle Scholar
  21. 21.
    Herbrich, R., Graepel, T., Obermayer, K.: Large margin rank boundaries for ordinal regression. In: Advances in Large Margin Classifiers, pp. 115–132. MIT Press, Cambridge, MA (2000)Google Scholar
  22. 22.
    Hochbaum D.: The separation, and separation-deviation methodology for group decision making and aggregate ranking. In: Hasenbein, J.J. (ed.) Tutorials in Operations Research. vol. 7, pp. 116–141. INFORMS, Hanover, MD (2010)Google Scholar
  23. 23.
    Hochbaum D., Levin A.: Methodologies and algorithms for group-rankings decision. Manag. Sci. 52(9), 1394–1408 (2006)CrossRefGoogle Scholar
  24. 24.
    Kahle, M.: Topology of random clique complexes. Discrete Math. (to appear, 2008)Google Scholar
  25. 25.
    Kemeny J.G.: Mathematics without numbers. Daedalus 88, 571–591 (1959)Google Scholar
  26. 26.
    Kemeny J.G., Snell L.J.: Preference ranking: an axiomatic approach. In: Kemeny, J.G., Snell, L.J. (eds) Mathematical Models in the Social Sciences, pp. 9–23. MIT Press, Cambridge (1973)Google Scholar
  27. 27.
    Kendall M.G., Gibbons J.D.: Rank Correlation Methods. 5th ed. Oxford University Press, Oxford (1990)zbMATHGoogle Scholar
  28. 28.
    Kendall M.G., Smith B.B.: On the method of paired comparisons. Biometrika 31(3–4), 324–345 (1940)zbMATHMathSciNetGoogle Scholar
  29. 29.
    Kleinberg J.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)zbMATHCrossRefMathSciNetGoogle Scholar
  30. 30.
    Ma, M.: A matrix approach to asset pricing in foreign exchange market. preprint. http://www.ssrn.com/abstract=921755 (2008)
  31. 31.
    Miller G.A.: The magical number seven, plus or minus two some limits on our capacity for processing information. Psychol. Rev. 63(2), 81–97 (1956)CrossRefGoogle Scholar
  32. 32.
    Morton J., Pachter L., Shiu A., Sturmfels B., Wienand O.: Convex rank tests and semigraphoids. SIAM J. Discrete Math. 23(3), 1117–1134 (2009)zbMATHCrossRefMathSciNetGoogle Scholar
  33. 33.
    Mosteller, F.: Remarks on the method of paired comparisons I, II, III. Psychometrika 16(1), pp. 3–9, no. 2, pp. 203–218 (1951)Google Scholar
  34. 34.
    Noether G.E.: Remarks about a paired comparison model. Psychometrika 25, 357–367 (1960)zbMATHCrossRefMathSciNetGoogle Scholar
  35. 35.
    Paige C.C., Saunders M.A.: lsqr: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software 8(1), 43–71 (1982)zbMATHCrossRefMathSciNetGoogle Scholar
  36. 36.
    Saari D.G., Merlin V.R.: A geometric examination of Kemeny’s rule. Soc. Choice Welf. 17(3), 403–438 (2000)zbMATHCrossRefMathSciNetGoogle Scholar
  37. 37.
    Saaty T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psych. 15(3), 234–281 (1977)zbMATHCrossRefMathSciNetGoogle Scholar
  38. 38.
    Sen A.K.: The impossibility of a paretian liberal. J. Polit. Econ. 78(1), 152–157 (1970)CrossRefGoogle Scholar
  39. 39.
    Sen A.K.: The possibility of social choice, Nobel Lecture, 8 Dec 1998. In: Persson, T. (ed.) Nobel Lectures: Economic Sciences 1996–2000, pp. 178–215. World Scientific, Singapore (2003)Google Scholar
  40. 40.
    Tahbaz-Salehi, A., Jadbabaie, A.: Distributed coverage verification in sensor networks without location information. In: Proceeding IEEE Conference Decision Control, vol. 47. (2008), to appearGoogle Scholar
  41. 41.
    Thurstone L.L.: The method of paired comparisons for social values. J. Abnorm. Soc. Psychol. 21, 384–400 (1927)CrossRefGoogle Scholar
  42. 42.
    Xu, Y., Dyer, J.S., Owen, A.B.: Empirical stationary correlations for semi-supervised learning on graphs. preprint (2009)Google Scholar
  43. 43.
    Young H.P.: Condorcet’s theory of voting. Am. Polit. Sci. Rev. 82, 1231–1244 (1988)CrossRefGoogle Scholar
  44. 44.
    Young H.P., Levenglick A.: A consistent extension of Condorcet’s election principle. SIAM J. Appl. Math. 35(2), 285–300 (1978)zbMATHCrossRefMathSciNetGoogle Scholar
  45. 45.
    Zhang J.: Binary choice, subset choice, random utility, and ranking: a unified perspective using the permutahedron. J. Math. Psych. 48(2), 107–134 (2004)zbMATHCrossRefMathSciNetGoogle Scholar

Copyright information

© The Author(s) 2010

Open AccessThis is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Xiaoye Jiang
    • 1
  • Lek-Heng Lim
    • 2
    Email author
  • Yuan Yao
    • 3
  • Yinyu Ye
    • 4
  1. 1.Institute for Computational and Mathematical EngineeringStanford UniversityStanfordUSA
  2. 2.Department of MathematicsUniversity of CaliforniaBerkeleyUSA
  3. 3.School of Mathematical Sciences, LMAM and Key Lab of Machine Perception (MOE)Peking UniversityBeijingPeople’s Republic of China
  4. 4.Department of Management Science and EngineeringStanford UniversityStanfordUSA

Personalised recommendations