Online Learning

  • Guillermo Gallego
  • Huseyin Topaloglu
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 279)


In the models studied so far, we have assumed that the demand model and its parameters are known. In practice, demand models must be estimated before dynamic pricing, assortment optimization, and revenue management can be done effectively. In some instances, enough data are available over a long period of time to calibrate different demand models, perform model selection, and update parameter estimates. At the other extreme, we may be pricing products for which we have little or no information, so demand learning must be done on the fly. This is particularly true for online retailing of new products. In this chapter, we address the problem of online demand learning. We study the regret of a learning-and-earning policy, that is, its expected loss in revenue relative to an optimal clairvoyant policy that knows the expected demand function, and we measure how this regret grows as the length of the sales horizon increases. We present only the strongest available results. In Sect. 10.2, we consider the case of ample capacity, whereas in Sect. 10.3, we consider the case of constrained capacity.
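To make the regret criterion concrete, the following is a minimal sketch under assumptions that are not taken from the chapter: a hypothetical linear expected demand a - b * price, Gaussian demand noise, ample capacity, and a simple explore-then-commit policy that tests each price on a small grid and then commits to the one with the highest observed revenue. The function names and parameter values are illustrative only. Regret is accumulated as the gap between the clairvoyant's expected revenue and the expected revenue of the price actually charged in each period.

```python
import random


def expected_revenue(price, a=10.0, b=1.0):
    """Expected per-period revenue under an assumed linear demand a - b * price."""
    return price * max(a - b * price, 0.0)


def explore_then_commit_regret(horizon, prices, explore_rounds,
                               a=10.0, b=1.0, noise=1.0, seed=0):
    """Expected revenue loss of explore-then-commit versus a clairvoyant seller."""
    rng = random.Random(seed)
    # The clairvoyant knows the demand function and charges a / (2b) every period.
    best_revenue = expected_revenue(a / (2 * b), a, b)
    observed = [0.0] * len(prices)  # total observed revenue per tested price
    regret = 0.0
    t = 0
    # Exploration phase: cycle through the price grid, observing noisy demand.
    for _ in range(explore_rounds):
        for i, price in enumerate(prices):
            if t >= horizon:
                break
            demand = max(a - b * price, 0.0) + rng.gauss(0.0, noise)
            observed[i] += price * demand
            regret += best_revenue - expected_revenue(price, a, b)
            t += 1
    # Commit phase: charge the empirically best price for the remaining periods.
    chosen = max(range(len(prices)), key=lambda i: observed[i])
    regret += (horizon - t) * (best_revenue - expected_revenue(prices[chosen], a, b))
    return regret


if __name__ == "__main__":
    grid = [2.0, 4.0, 6.0, 8.0]  # hypothetical price grid
    for horizon in (100, 1000, 10000):
        print(horizon, explore_then_commit_regret(horizon, grid, explore_rounds=5))
```

Because the grid in this sketch does not contain the clairvoyant price, the committed price loses a fixed amount every period, so regret grows linearly in the horizon. Policies that keep regret sublinear must refine their price estimates as the horizon lengthens.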



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Guillermo Gallego (1)
  • Huseyin Topaloglu (2)
  1. Clearwater Bay, Hong Kong
  2. ORIE, Cornell University, New York, USA
