Clustering Rankings in the Fourier Domain

  • Stéphan Clémençon
  • Romaric Gaudel
  • Jérémie Jakubowicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)

Abstract

This paper introduces a novel approach to clustering rank data on a set of possibly large cardinality n ∈ ℕ*, relying on the Fourier representation of functions defined on the symmetric group \(\mathfrak{S}_n\). In the present setup, which covers a wide variety of practical situations, rank data are viewed as distributions on \(\mathfrak{S}_n\). Cluster analysis aims at segmenting data into homogeneous subgroups that are, ideally, very dissimilar from one another in a certain sense. Considering dissimilarity measures/distances between distributions on the non-commutative group \(\mathfrak{S}_n\) coordinate-wise, for instance by viewing the set of distributions as embedded in \([0,1]^{n!}\), hardly yields interpretable results and raises obvious computational issues; by contrast, evaluating the closeness of groups of permutations in the Fourier domain may be much easier. Indeed, in a wide variety of situations, a few well-chosen Fourier (matrix) coefficients may suffice to approximate efficiently two distributions on \(\mathfrak{S}_n\), as well as their degree of dissimilarity, while describing global properties in an interpretable fashion. Following in the footsteps of recent advances in automatic feature selection in the context of unsupervised learning, we propose to cast the task of clustering rankings as the optimization of a criterion that can be expressed simply in the Fourier domain. The effectiveness of the proposed method is illustrated by numerical experiments based on artificial and real data.
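
As an illustration of what a low-order Fourier coefficient of a distribution on \(\mathfrak{S}_n\) looks like, the sketch below computes the coefficient attached to the permutation representation, which is the n × n matrix of first-order marginals (the proportion of rankings assigning item i to rank j). It then compares two groups of rankings through the Frobenius distance between these matrices. This is a minimal sketch under stated assumptions, not the criterion or algorithm of the paper: the function names, the toy data, and the choice of the Frobenius distance are hypothetical and serve only to show how an n × n summary can stand in for a vector of length n!.

```python
from itertools import permutations
from math import sqrt


def first_order_marginals(rankings, n):
    """Return the n x n matrix M with M[i][j] = share of rankings giving item i rank j.

    M is the Fourier coefficient of the empirical distribution at the
    permutation representation: the average of the permutation matrices.
    """
    m = [[0.0] * n for _ in range(n)]
    weight = 1.0 / len(rankings)
    for sigma in rankings:  # sigma[i] is the 0-based rank given to item i
        for item, rank in enumerate(sigma):
            m[item][rank] += weight
    return m


def frobenius_distance(a, b):
    """Frobenius distance between two matrices given as lists of rows."""
    return sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))


# Hypothetical toy data on n = 4 items: one group close to the identity
# ranking, the other close to the reversed ranking.
group_a = [(0, 1, 2, 3), (0, 1, 3, 2), (1, 0, 2, 3)]
group_b = [(3, 2, 1, 0), (2, 3, 1, 0), (3, 2, 0, 1)]

marg_a = first_order_marginals(group_a, 4)
marg_b = first_order_marginals(group_b, 4)

# Uniform distribution on the symmetric group, for reference.
uniform = first_order_marginals(list(permutations(range(4))), 4)

print("distance between groups:", frobenius_distance(marg_a, marg_b))
print("distance of group A to uniform:", frobenius_distance(marg_a, uniform))
```

Each matrix has 16 entries, whereas a full description of a distribution on \(\mathfrak{S}_4\) needs 24 values; the gap widens rapidly with n, since n² grows far more slowly than n!.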

Keywords

clustering, rank data, non-commutative harmonic analysis, feature selection

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stéphan Clémençon (1)
  • Romaric Gaudel (1)
  • Jérémie Jakubowicz (1)
  1. LTCI, Telecom Paristech (TSI), UMR Institut Telecom/CNRS No. 5141, France
