Bandit Online Optimization over the Permutahedron

  • Nir Ailon
  • Kohei Hatano
  • Eiji Takimoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8776)


The permutahedron is the convex polytope whose vertex set consists of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector \(s_t\), a player chooses a vertex \(\pi_t\) of the permutahedron and suffers an observed instantaneous loss of \(\sum_{i=1}^n\pi_t(i) s_t(i)\).
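The game above can be made concrete with a short Python sketch. It is an illustration only, not the authors' algorithm: the function names are hypothetical, and the hindsight comparator assumes the usual notion of regret against the best fixed vertex, which the abstract does not spell out.

```python
def instantaneous_loss(pi, s):
    """Loss sum_i pi(i) * s(i) for a vertex pi (a permutation of 1..n) and weights s."""
    return sum(p * w for p, w in zip(pi, s))


def best_fixed_vertex(weight_vectors):
    """Vertex minimising the cumulative loss in hindsight (assumed comparator).

    By the rearrangement inequality, the smallest value of pi goes to the
    coordinate with the largest cumulative weight.
    """
    n = len(weight_vectors[0])
    totals = [sum(s[i] for s in weight_vectors) for i in range(n)]
    order = sorted(range(n), key=lambda i: -totals[i])
    pi = [0] * n
    for rank, i in enumerate(order, start=1):
        pi[i] = rank
    return pi


def regret(played, weight_vectors):
    """Player's cumulative loss minus that of the best fixed vertex."""
    incurred = sum(instantaneous_loss(pi, s) for pi, s in zip(played, weight_vectors))
    best = best_fixed_vertex(weight_vectors)
    return incurred - sum(instantaneous_loss(best, s) for s in weight_vectors)
```

In the bandit setting the player only ever sees the scalar returned by `instantaneous_loss`, never the vector `s` itself; `regret` is a quantity for analysis, not something the player can compute during play.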

We study the problem in two regimes. In the first regime, \(s_t\) is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2009) guarantees a regret of \(O(n\sqrt{T \log n})\) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of \(O(n^{10})\), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret \(O(n^{3/2}\sqrt{T})\) but with more realistic time complexity \(O(n^3)\) per step. The technical contribution is a bound on the variance of the ‘pseudo loss’ of the Plackett-Luce noisy sorting process, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.
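For readers unfamiliar with the Plackett-Luce model, the following Python sketch draws a random ordering from it by repeated weighted selection. It shows only the sampling process in its textbook form; the score vector, the function names, and the convention that the first-drawn item receives position 1 are assumptions made here, and the sketch does not reproduce the paper's loss estimator or its \(O(n^3)\) per-step procedure.

```python
import math
import random


def sample_plackett_luce(scores, rng):
    """Draw an ordering of items 0..n-1 from the Plackett-Luce model.

    At each stage an item is picked from those remaining with probability
    proportional to exp(score). Subtracting the maximum keeps exp() stable.
    """
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        m = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - m) for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pick))
    return order


def order_to_vertex(order):
    """Turn an ordering into a permutahedron vertex: pi[item] = position (1-based)."""
    pi = [0] * len(order)
    for position, item in enumerate(order, start=1):
        pi[item] = position
    return pi
```

As written, a draw costs \(O(n^2)\) time. An equivalent way to sample, following the random utility view in Yellott (1977), is to add independent Gumbel noise to the scores and sort, which is where the name "noisy sorting" comes from.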

In the second regime, \(s_t\) is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of \(O(n\sqrt{T})\), but for a more restricted space of possible loss vectors.
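A mirror-descent style method keeps a point inside the permutahedron rather than at a vertex, so it helps to see what "inside" means. The sketch below is a membership test based on the standard majorization characterization of the polytope; it is not the projection or decomposition technique of the paper, and the function name and tolerance are hypothetical.

```python
def in_permutahedron(x, tol=1e-9):
    """Check whether x lies in the convex hull of all permutations of (1, ..., n).

    Majorization criterion: for every k < n the k largest coordinates of x sum
    to at most n + (n-1) + ... + (n-k+1), and all coordinates sum to n(n+1)/2.
    """
    n = len(x)
    xs = sorted(x, reverse=True)
    prefix = 0.0
    for k in range(1, n + 1):
        prefix += xs[k - 1]
        bound = k * (2 * n - k + 1) / 2  # sum of the k largest values in 1..n
        if k < n and prefix > bound + tol:
            return False
    return abs(prefix - n * (n + 1) / 2) <= tol
```

For example, the centroid (2, 2, 2) passes the test for n = 3, while (3, 3, 0) fails because its two largest coordinates sum to 6, above the bound of 5.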


Relative Entropy; Random Utility Model; Online Optimization; Projection Step; Bregman Divergence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Ailon, N.: Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes. In: AISTATS (2014)
  2. Ailon, N., Hatano, K., Takimoto, E.: Bandit online optimization over the permutahedron (technical report). arXiv:1312.1530 (2013)
  3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)
  4. Beggs, S., Cardell, S., Hausman, J.: Assessing the potential demand for electric cars. Journal of Econometrics 17(1), 1–19 (1981)
  5. Bubeck, S.: Introduction to Online Optimization (2011),
  6. Bubeck, S., Cesa-Bianchi, N., Kakade, S.M.: Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. In: Proceedings of 25th Annual Conference on Learning Theory (COLT 2012), pp. 41.1–41.14 (2012)
  7. Cesa-Bianchi, N., Lugosi, G.: Combinatorial bandits. J. Comput. Syst. Sci. 78(5), 1404–1422 (2012)
  8. Dani, V., Hayes, T.P., Kakade, S.: The price of bandit information for online optimization. In: NIPS (2007)
  9. Hazan, E.: The convex optimization approach to regret minimization. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, ch. 10, pp. 287–304. MIT Press (2011)
  10. Hazan, E., Karnin, Z.S., Meka, R.: Volumetric spanners and their applications to machine learning. CoRR, abs/1312.6214 (2013)
  11. Helmbold, D.P., Warmuth, M.K.: Learning Permutations with Exponential Weights. Journal of Machine Learning Research 10, 1705–1736 (2009)
  12. Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J. ACM 51(4), 671–697 (2004)
  13. Marden, J.I.: Analyzing and Modeling Rank Data. Chapman & Hall (1995)
  14. Suehiro, D., Hatano, K., Kijima, S., Takimoto, E., Nagano, K.: Online Prediction under Submodular Constraints. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568, pp. 260–274. Springer, Heidelberg (2012)
  15. Valiant, L.G.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)
  16. Yasutake, S., Hatano, K., Kijima, S., Takimoto, E., Takeda, M.: Online Linear Optimization over Permutations. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 534–543. Springer, Heidelberg (2011)
  17. Yellott, J.: The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology 15, 109–144 (1977)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nir Ailon (1)
  • Kohei Hatano (2)
  • Eiji Takimoto (2)
  1. Department of Computer Science, Technion, Israel
  2. Department of Informatics, Kyushu University, Japan
