Bandit Online Optimization over the Permutahedron

  • Conference paper
Algorithmic Learning Theory (ALT 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8776)

Abstract

The permutahedron is the convex polytope whose vertex set consists of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector \(s_t\), and a player chooses a vertex \(\pi_t\) of the permutahedron and suffers an observed instantaneous loss of \(\sum_{i=1}^n\pi_t(i) s_t(i)\).
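To make the loss concrete, the following minimal Python sketch (not taken from the paper) evaluates \(\sum_{i=1}^n\pi(i) s(i)\) on a tiny instance and finds the best vertex for a single, fully known weight vector. The names `loss` and `best_vertex` are illustrative, and the sorting rule relies on the rearrangement inequality rather than on anything stated in the abstract.

```python
from itertools import permutations


def loss(pi, s):
    """Instantaneous loss sum_i pi(i) * s(i) for a vertex pi and weights s."""
    return sum(p * w for p, w in zip(pi, s))


def best_vertex(s):
    """Vertex minimizing the loss for a known s.

    By the rearrangement inequality, the largest weight should receive
    the smallest rank, the second largest the second smallest, and so on.
    """
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i], reverse=True)
    pi = [0] * n
    for rank, i in enumerate(order, start=1):
        pi[i] = rank
    return tuple(pi)


# Tiny instance: compare the sorting rule with brute force over all n! vertices.
s = (0.5, -1.0, 2.0)
vertices = list(permutations(range(1, len(s) + 1)))
brute_force = min(vertices, key=lambda pi: loss(pi, s))
print(best_vertex(s), brute_force, loss(brute_force, s))
```

In the bandit game the player never sees \(s_t\), only the scalar loss, so this computation is available only in hindsight; it is the kind of comparator against which regret is typically measured.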

We study the problem in two regimes. In the first regime, \(s_t\) is a point in the polytope dual to the permutahedron. The CombBand algorithm of Cesa-Bianchi et al. (2009) guarantees a regret of \(O(n\sqrt{T \log n})\) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible, but only in an impractical running time of \(O(n^{10})\), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm with a slightly worse regret of \(O(n^{3/2}\sqrt{T})\) but a more realistic time complexity of \(O(n^3)\) per step. The technical contribution is a bound on the variance of the ‘pseudo loss’ of the Plackett-Luce noisy sorting process, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.
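As a rough illustration of what a Plackett-Luce noisy sorting step can look like, the sketch below samples a ranking by adding independent Gumbel noise to per-item scores and sorting. This is the standard Gumbel-perturbation view of the Plackett-Luce model (compare the double exponential distribution in the Yellott reference); the abstract does not say how the paper parameterizes the process, so the score vector and the function name `plackett_luce_sample` are assumptions made for illustration only.

```python
import math
import random


def plackett_luce_sample(scores, rng):
    """Draw one ranking (list of item indices, best first).

    Each item's score is perturbed by i.i.d. standard Gumbel noise and the
    items are sorted by perturbed score, descending. Under this scheme item i
    is ranked first with probability exp(scores[i]) / sum_j exp(scores[j]).
    """
    noisy = []
    for i, w in enumerate(scores):
        u = rng.random()
        while u == 0.0:  # rng.random() is in [0, 1); avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        noisy.append((w + gumbel, i))
    noisy.sort(reverse=True)
    return [i for _, i in noisy]


rng = random.Random(0)
ranking = plackett_luce_sample([2.0, 0.0, -1.0], rng)
print(ranking)
```

Sampling this way costs one sort per step, which hints at why a noisy-sorting approach can be much cheaper than computing or approximating a permanent.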

In the second regime, \(s_t\) is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of \(O(n\sqrt{T})\), but for a more restricted space of possible loss vectors.
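Mirror-descent style methods work with points inside the polytope rather than only with its vertices. The sketch below is not the paper's projection or decomposition technique; it only illustrates what membership in the permutahedron means, using the classical majorization characterization (a point lies in the convex hull of all permutations of (1,…, n) exactly when it is majorized by (n,…, 1)). The function name `in_permutahedron` and the tolerance parameter are illustrative choices.

```python
def in_permutahedron(x, tol=1e-9):
    """Check whether x lies in the convex hull of all permutations of (1..n).

    Majorization test: after sorting x in descending order, every prefix sum
    must be at most the matching prefix sum of (n, n-1, ..., 1), and the
    total sums must agree.
    """
    n = len(x)
    xs = sorted(x, reverse=True)
    prefix_x = 0.0
    prefix_target = 0.0
    for k in range(n):
        prefix_x += xs[k]
        prefix_target += n - k
        if prefix_x > prefix_target + tol:
            return False
    return abs(prefix_x - prefix_target) <= tol


print(in_permutahedron([2.0, 2.0, 2.0]))  # centroid of the n = 3 permutahedron
print(in_permutahedron([3.0, 3.0, 0.0]))  # right sum, but top two entries too large
```

A point that passes this test can be written as a convex combination of vertices, which is what a decomposition step would have to produce before the player can pick a permutation to play.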

References

  1. Ailon, N.: Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes. In: AISTATS (2014)

  2. Ailon, N., Hatano, K., Takimoto, E.: Bandit online optimization over the permutahedron (technical report). arXiv:1312.1530 (2013)

  3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)

  4. Beggs, S., Cardell, S., Hausman, J.: Assessing the potential demand for electric cars. Journal of Econometrics 17(1), 1–19 (1981)

  5. Bubeck, S.: Introduction to Online Optimization (2011), http://www.princeton.edu/~bubeck/BubeckLectureNotes.pdf

  6. Bubeck, S., Cesa-Bianchi, N., Kakade, S.M.: Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. In: Proceedings of 25th Annual Conference on Learning Theory (COLT 2012), pp. 41.1–41.14 (2012)

  7. Cesa-Bianchi, N., Lugosi, G.: Combinatorial bandits. J. Comput. Syst. Sci. 78(5), 1404–1422 (2012)

  8. Dani, V., Hayes, T.P., Kakade, S.: The price of bandit information for online optimization. In: NIPS (2007)

  9. Hazan, E.: The convex optimization approach to regret minimization. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, ch. 10, pp. 287–304. MIT Press (2011)

  10. Hazan, E., Karnin, Z.S., Meka, R.: Volumetric spanners and their applications to machine learning. CoRR, abs/1312.6214 (2013)

  11. Helmbold, D.P., Warmuth, M.K.: Learning Permutations with Exponential Weights. Journal of Machine Learning Research 10, 1705–1736 (2009)

  12. Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J. ACM 51(4), 671–697 (2004)

  13. Marden, J.I.: Analyzing and Modeling Rank Data. Chapman & Hall (1995)

  14. Suehiro, D., Hatano, K., Kijima, S., Takimoto, E., Nagano, K.: Online Prediction under Submodular Constraints. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568, pp. 260–274. Springer, Heidelberg (2012)

  15. Valiant, L.G.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)

  16. Yasutake, S., Hatano, K., Kijima, S., Takimoto, E., Takeda, M.: Online Linear Optimization over Permutations. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 534–543. Springer, Heidelberg (2011)

  17. Yellott, J.: The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology 15, 109–144 (1977)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ailon, N., Hatano, K., Takimoto, E. (2014). Bandit Online Optimization over the Permutahedron. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2014. Lecture Notes in Computer Science, vol 8776. Springer, Cham. https://doi.org/10.1007/978-3-319-11662-4_16

  • DOI: https://doi.org/10.1007/978-3-319-11662-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11661-7

  • Online ISBN: 978-3-319-11662-4

  • eBook Packages: Computer Science, Computer Science (R0)
