Secure Best Arm Identification in Multi-armed Bandits
The stochastic multi-armed bandit is a classical decision-making model in which an agent repeatedly chooses an action (pulls a bandit arm) and the environment responds with a stochastic outcome (reward) drawn from an unknown distribution associated with the chosen action. A popular objective for the agent is identifying the arm with the maximum expected reward, known as the best-arm identification problem. We address the inherent privacy concerns that arise in best-arm identification when the data and computations are outsourced to an honest-but-curious cloud.
Our main contribution is a distributed protocol that computes the best arm while guaranteeing that (i) no single cloud node can simultaneously learn information about the rewards and about the ranking of the arms, and (ii) no information about the rewards or the ranking can be learned by analyzing the messages exchanged between the cloud nodes. Together, these two properties ensure that the protocol has no single point of security failure. As a building block, we rely on the partially homomorphic property of the well-known Paillier cryptosystem. We prove the correctness of our protocol and present proof-of-concept experiments suggesting its practical feasibility.
Keywords: Multi-armed bandits · Best arm identification · Privacy · Distributed computation · Paillier cryptosystem
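The additive homomorphism that the abstract refers to means that multiplying two Paillier ciphertexts yields an encryption of the sum of the plaintexts, so a cloud node can aggregate encrypted rewards without decrypting them. The following is a minimal toy sketch of this property, written from the textbook definition of the cryptosystem; it uses tiny, insecure primes purely for illustration (real deployments use ~2048-bit moduli and a vetted library), and all names here are our own, not from the paper.

```python
import math
import random

def L(x, n):
    # Paillier's L function: L(x) = (x - 1) / n, defined on x ≡ 1 (mod n)
    return (x - 1) // n

def keygen(p, q):
    # Toy key generation from two small primes (insecure, illustration only)
    n = p * q
    g = n + 1                      # standard simple choice of generator
    lam = math.lcm(p - 1, q - 1)   # Carmichael's lambda(n)
    mu = pow(L(pow(g, lam, n * n), n), -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    # E(m) = g^m * r^n mod n^2, with random r coprime to n
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    # D(c) = L(c^lambda mod n^2) * mu mod n
    n, _ = pub
    lam, mu = priv
    return (L(pow(c, lam, n * n), n) * mu) % n

pub, priv = keygen(17, 19)         # n = 323; plaintexts must stay below n
c1 = encrypt(pub, 42)
c2 = encrypt(pub, 101)
c_sum = (c1 * c2) % (pub[0] ** 2)  # homomorphic addition: E(a)*E(b) = E(a+b)
print(decrypt(pub, priv, c_sum))   # prints 143, i.e. 42 + 101
```

Because the party holding only ciphertexts can compute `c_sum` without the private key, sums of rewards can be accumulated in the cloud while decryption (and hence learning the actual values) stays with the key holder.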