Bandit Online Optimization over the Permutahedron

Ailon, Nir; Hatano, Kohei; Takimoto, Eiji

doi:10.1007/978-3-319-11662-4_16


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8776)

Included in the following conference series:

International Conference on Algorithmic Learning Theory


Abstract

The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector s_t, and a player chooses a vertex π_t of the permutahedron and suffers an observed instantaneous loss of \(\sum_{i=1}^n\pi_t(i) s_t(i)\).
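
To make the game concrete, here is a minimal Python sketch of the loss defined above. It is an illustration only, not the authors' algorithm: the function names are hypothetical, the player is a uniform-random placeholder, and the brute-force comparator enumerates all n! vertices, so it is only usable for very small n. Regret is taken here in the usual sense of cumulative loss minus the loss of the best fixed vertex in hindsight.

```python
import itertools
import random


def instantaneous_loss(pi, s):
    """Loss sum_i pi(i) * s(i), with pi given as a tuple of ranks 1..n."""
    return sum(p * w for p, w in zip(pi, s))


def random_vertex(n, rng):
    """Placeholder player: a uniformly random vertex of the permutahedron."""
    pi = list(range(1, n + 1))
    rng.shuffle(pi)
    return tuple(pi)


def best_fixed_loss(n, loss_vectors):
    """Cumulative loss of the best single vertex in hindsight (brute force)."""
    return min(
        sum(instantaneous_loss(pi, s) for s in loss_vectors)
        for pi in itertools.permutations(range(1, n + 1))
    )


def regret_of_random_player(n, loss_vectors, seed=0):
    """Cumulative loss of the placeholder player minus the best fixed vertex."""
    rng = random.Random(seed)
    total = 0.0
    for s in loss_vectors:
        pi = random_vertex(n, rng)
        # In the bandit setting the player would observe only this number, not s.
        total += instantaneous_loss(pi, s)
    return total - best_fixed_loss(n, loss_vectors)
```

Because the comparator is the minimum over all vertices, the value returned by `regret_of_random_player` is never negative.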

We study the problem in two regimes. In the first regime, s_t is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2009) guarantees a regret of \(O(n\sqrt{T \log n})\) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n¹⁰), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret \(O(n^{3/2}\sqrt{T})\) but with more realistic time complexity O(n³) per step. The technical contribution is a bound on the variance of the Plackett-Luce noisy sorting process’s ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.

In the second regime, s_t is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of \(O(n\sqrt{T})\), but for a more restricted space of possible loss vectors.
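
Mirror-descent methods such as OSMD maintain a point inside the polytope rather than at a vertex, so it helps to know what the interior looks like. The sketch below is not the paper's projection or decomposition technique. It only tests membership, using the standard characterization of the permutahedron: the coordinates sum to n(n+1)/2, and for every k the k smallest coordinates sum to at least k(k+1)/2. The function name and tolerance parameter are hypothetical.

```python
def in_permutahedron(x, tol=1e-9):
    """Return True if x lies in the permutahedron of dimension len(x).

    Checks that every prefix sum of the sorted coordinates is at least
    k(k+1)/2 and that the total equals n(n+1)/2, up to tolerance tol.
    """
    n = len(x)
    prefix = 0.0
    for k, value in enumerate(sorted(x), start=1):
        prefix += value
        if prefix < k * (k + 1) / 2 - tol:
            return False
    return abs(prefix - n * (n + 1) / 2) <= tol
```

Checking only the sorted prefixes suffices because, for each k, the k smallest coordinates give the smallest sum over all subsets of size k.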


References

Ailon, N.: Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes. In: AISTATS (2014)
Ailon, N., Hatano, K., Takimoto, E.: Bandit online optimization over the permutahedron (technical report). arXiv:1312.1530 (2013)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)
Beggs, S., Cardell, S., Hausman, J.: Assessing the potential demand for electric cars. Journal of Econometrics 17(1), 1–19 (1981)
Bubeck, S.: Introduction to Online Optimization (2011), http://www.princeton.edu/~bubeck/BubeckLectureNotes.pdf
Bubeck, S., Cesa-Bianchi, N., Kakade, S.M.: Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. In: Proceedings of 25th Annual Conference on Learning Theory (COLT 2012), pp. 41.1–41.14 (2012)
Cesa-Bianchi, N., Lugosi, G.: Combinatorial bandits. J. Comput. Syst. Sci. 78(5), 1404–1422 (2012)
Dani, V., Hayes, T.P., Kakade, S.: The price of bandit information for online optimization. In: NIPS (2007)
Hazan, E.: The convex optimization approach to regret minimization. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, ch. 10, pp. 287–304. MIT Press (2011)
Hazan, E., Karnin, Z.S., Meka, R.: Volumetric spanners and their applications to machine learning. CoRR, abs/1312.6214 (2013)
Helmbold, D.P., Warmuth, M.K.: Learning Permutations with Exponential Weights. Journal of Machine Learning Research 10, 1705–1736 (2009)
Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. J. ACM 51(4), 671–697 (2004)
Marden, J.I.: Analyzing and Modeling Rank Data. Chapman & Hall (1995)
Suehiro, D., Hatano, K., Kijima, S., Takimoto, E., Nagano, K.: Online Prediction under Submodular Constraints. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568, pp. 260–274. Springer, Heidelberg (2012)
Valiant, L.G.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)
Yasutake, S., Hatano, K., Kijima, S., Takimoto, E., Takeda, M.: Online Linear Optimization over Permutations. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 534–543. Springer, Heidelberg (2011)
Yellott, J.: The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology 15, 109–144 (1977)


Author information

Authors and Affiliations

Department of Computer Science, Technion, Israel
Nir Ailon
Department of Informatics, Kyushu University, Japan
Kohei Hatano & Eiji Takimoto


Editor information

Editors and Affiliations

Montanuniversitaet Leoben, 8700, Leoben, Austria
Peter Auer
Department of Philosophy, King’s College, WC2R 2LS, London, UK
Alexander Clark
Division of Computer Science, Hokkaido University, N-14, W-9, 060-0814, Sapporo, Japan
Thomas Zeugmann
Department of Computer Science, University of Regina, S4S 0A2, Regina, SK, Canada
Sandra Zilles


About this paper

Cite this paper

Ailon, N., Hatano, K., Takimoto, E. (2014). Bandit Online Optimization over the Permutahedron. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2014. Lecture Notes in Computer Science, vol 8776. Springer, Cham. https://doi.org/10.1007/978-3-319-11662-4_16


DOI: https://doi.org/10.1007/978-3-319-11662-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11661-7
Online ISBN: 978-3-319-11662-4
eBook Packages: Computer Science, Computer Science (R0)
