Making Online Decisions with Bounded Memory

* Final gross prices may vary according to local VAT.

Get Access


We study the online decision problem in which there are T steps to play and n actions to choose. For this problem, several algorithms achieve an optimal regret of \(O(\sqrt{T \ln n})\) , but they all require about T n states, which one may not be able to afford when n and T are very large. We are interested in such large scale problems, and we would like to understand what an online algorithm can achieve with only a bounded number of states. We provide two algorithms, both with m n − 1 states, for a parameter m, which achieve regret of O(m + (T/m)ln (mn)) and \(O(n \sqrt{m} +T/\sqrt{m})\) , respectively. We also show that any online algorithm with m n − 1 states must suffer a regret of Ω(T/m), which is close to what our algorithms achieve.