
Machine Learning, Volume 108, Issue 4, pp 687–715

Corruption-tolerant bandit learning

  • Sayash Kapoor
  • Kumar Kshitij Patel
  • Purushottam Kar

Abstract

We present algorithms for solving multi-armed and linear-contextual bandit tasks in the face of adversarial corruptions in the arm responses. Traditional algorithms for these problems assume that nothing but mild, e.g., i.i.d. sub-Gaussian, noise disrupts an otherwise clean estimate of the utility of an arm. This assumption, and the approaches built on it, can fail catastrophically if an observant adversary corrupts even a small fraction of the responses generated when arms are pulled. To rectify this, we propose algorithms that use recent advances in robust statistical estimation to perform arm selection in polynomial time. Our algorithms are easy to implement and vastly outperform several existing UCB- and EXP-style algorithms for stochastic and adversarial multi-armed and linear-contextual bandit problems in a wide variety of experimental settings. Our algorithms enjoy minimax-optimal regret bounds and can tolerate an adversary that is allowed to corrupt up to a universally constant fraction of the arms pulled by the algorithm.
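The core idea sketched in the abstract, replacing the empirical mean inside a UCB-style index with a robust location estimator so that a small fraction of corrupted responses cannot hijack arm selection, can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's actual algorithm: it uses a simple trimmed mean as the robust estimator, and the names `robust_ucb`, `trimmed_mean`, `pull`, and all parameter choices are hypothetical, introduced only for illustration.

```python
import math
import statistics

def trimmed_mean(samples, trim_frac=0.1):
    """Mean after discarding the smallest and largest trim_frac of samples,
    a classical robust location estimator (in the Tukey/Huber tradition)."""
    s = sorted(samples)
    k = int(len(s) * trim_frac)
    core = s[k:len(s) - k] or s  # fall back to all samples if trimming empties the list
    return statistics.fmean(core)

def robust_ucb(n_arms, horizon, pull, trim_frac=0.1, c=2.0):
    """UCB-style arm selection with the empirical mean replaced by a
    trimmed mean; pull(arm) returns a (possibly corrupted) reward."""
    rewards = [[] for _ in range(n_arms)]
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # index = robust mean estimate + exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: trimmed_mean(rewards[a], trim_frac)
                + math.sqrt(c * math.log(t) / len(rewards[a])),
            )
        rewards[arm].append(pull(arm))
    return rewards
```

Because a few extreme outliers are discarded before averaging, an adversary that corrupts a bounded fraction of responses cannot arbitrarily inflate or deflate an arm's index, which is the failure mode of the plain empirical mean that the abstract describes.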

Keywords

Robust learning · Online learning · Bandit algorithms

Notes

Acknowledgements

The authors would like to thank the reviewers and editors for pointing out several relevant works, as well as helping improve the presentation of the paper. S.K. is supported by the National Talent Search Scheme under the National Council of Education, Research and Training (Ref. No. 41/X/2013-NTS). K.K.P. thanks Honda Motor India Pvt. Ltd. for an award under the 2017 Y-E-S Award program. P.K. is supported by the Deep Singh and Daljeet Kaur Faculty Fellowship and the Research-I foundation at IIT Kanpur, and thanks Microsoft Research India and Tower Research for research grants.


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Indian Institute of Technology Kanpur, Kanpur, India
