Response Prediction for Low-Regret Agents

  • Saeed Alaei
  • Ashwinkumar BadanidiyuruEmail author
  • Mohammad Mahdian
  • Sadra Yazdanbod
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11920)


Companies like Google and Microsoft run billions of auctions every day to sell advertising opportunities. Any change to the rules of these auctions can have a tremendous effect on the revenue of the company and the welfare of the advertisers and the users. Therefore, any change requires careful evaluation of its potential impacts. Currently, such impacts are often evaluated by running simulations or small controlled experiments. This, however, misses the important factor that the advertisers respond to changes. Our goal is to build a theoretical framework for predicting the actions of an agent (the advertiser) that is optimizing her actions in an uncertain environment. We model this problem using a variant of the multi-armed bandit setting where playing an arm is costly. The cost of each arm changes over time and is publicly observable. The value of playing an arm is drawn stochastically from a static distribution and is observed by the agent and not by us. We, however, observe the actions of the agent. Our main result is that assuming the agent is playing a strategy with a regret of at most f(T) within the first T rounds, we can learn to play the multi-armed bandits game (without observing the rewards) in such a way that the regret of our selected actions is at most \(O(k^4(f(T)+1)\log (T))\), where k is the number of arms.


Ad auctions Advertiser response prediction Multi-armed bandit Low regret 


  1. 1.
    Athey, S., Nekipelov, D.: A structural model of sponsored search advertising auctions. In: Sixth Ad Auctions Workshop (2010)Google Scholar
  2. 2.
    Athey, S., Haile, P.A.: Nonparametric approaches to auctions. In: Handbook of Econometrics. Elsevier (2007)Google Scholar
  3. 3.
    Backstrom, L., Kleinberg, J.: Network bucket testing. In: Proceedings of the 20th International Conference on World Wide Web, pp. 615–624. ACM (2011)Google Scholar
  4. 4.
    Braverman, M., Mao, J., Schneider, J., Weinberg, M.: Selling to a no-regret buyer. In: Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 523–538 (2018)Google Scholar
  5. 5.
    Broder, A., Gabrilovich, E., Josifovski, V., Mavromatis, G., Smola, A.: Bid generation for advanced match in sponsored search. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 515–524 (2011)Google Scholar
  6. 6.
    Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR abs/1204.5721 (2012)CrossRefGoogle Scholar
  7. 7.
    Campo, S., Guerre, E., Perrigne, I., Vuong, Q.: Semiparametric Estimation of First-price Auctions with Risk Averse Bidders. Working papers, Centre de Recherche en Economie et Statistique (2003)Google Scholar
  8. 8.
    Cary, M., et al.: Greedy bidding strategies for keyword auctions. In: Proceedings of the 8th ACM Conference on Electronic Commerce, EC 2007, pp. 262–271. ACM, New York (2007).
  9. 9.
    Chawla, S., Hartline, J., Nekipelov, D.: Mechanism design for data science. In: Proceedings of the Fifteenth ACM Conference on Economics and Computation, pp. 711–712. ACM (2014)Google Scholar
  10. 10.
    Feldman, M., Kalai, A., Tennenholtz, M.: Playing games without observing payoffs. In: ICS, pp. 106–110 (2010)Google Scholar
  11. 11.
    Guerre, E., Perrigne, I., Vuong, Q.: Optimal nonparametric estimation of first-price auctions. Econometrica 68(3), 525–574 (2000)MathSciNetCrossRefGoogle Scholar
  12. 12.
    Guerre, E., Perrigne, I., Vuong, Q.: Nonparametric identification of risk aversion in first-price auctions under exclusion restrictions. Econometrica 77(4), 1193–1227 (2009)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Jofre-Bonet, M., Pesendorfer, M.: Estimation of a dynamic auction game. Econometrica 71(5), 1443–1489 (2003)MathSciNetCrossRefGoogle Scholar
  14. 14.
    Kuleshov, V., Schrijvers, O.: Inverse game theory: learning utilities in succinct games. In: Markakis, E., Schäfer, G. (eds.) WINE 2015. LNCS, vol. 9470, pp. 413–427. Springer, Heidelberg (2015). Scholar
  15. 15.
    Lang, K.J., Andersen, R.: Finding dense and isolated submarkets in a sponsored search spending graph. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 613–622. ACM (2007)Google Scholar
  16. 16.
    Mizuta, H., Steiglitz, K.: Agent-based simulation of dynamic online auctions. In: 2000 Winter Simulation Conference Proceedings, vol. 2, pp. 1772–1777. IEEE (2000)Google Scholar
  17. 17.
    Nekipelov, D., Syrgkanis, V., Tardos, E.: Econometrics for learning agents. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation, pp. 1–18. ACM (2015)Google Scholar
  18. 18.
    Pin, F., Key, P.: Stochastic variability in sponsored search auctions: observations and models. In: Proceedings of the 12th ACM Conference on Electronic Commerce, pp. 61–70 (2011)Google Scholar
  19. 19.
    Tang, D., Agarwal, A., O’Brien, D., Meyer, M.: Overlapping experiment infrastructure: more, better, faster experimentation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 17–26. ACM (2010)Google Scholar
  20. 20.
    Varian, H.R.: Position auctions. Int. J. Ind. Organ. 25(6), 1163–1178 (2007)CrossRefGoogle Scholar
  21. 21.
    Xu, H., Gao, B., Yang, D., Liu, T.Y.: Predicting advertiser bidding behaviors in sponsored search by rationality modeling. In: Proceedings of the 22nd International Conference on World Wide Web, May 2013Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Saeed Alaei
    • 1
  • Ashwinkumar Badanidiyuru
    • 1
    Email author
  • Mohammad Mahdian
    • 1
  • Sadra Yazdanbod
    • 1
  1. 1.GoogleMountain ViewUSA

Personalised recommendations