
Measuring Policy Performance in Online Pricing with Offline Data: Worst-case Perspective and Bayesian Perspective

Journal of Systems Science and Systems Engineering

Abstract

Problems of online pricing with offline data, like other problems of online decision making with offline data, aim to design and evaluate online pricing policies in the presence of a certain amount of existing offline data. To evaluate pricing policies when offline data are available, the decision maker can position herself either at the time point when the offline data have already been observed and are viewed as deterministic, or at the time point when the offline data have not yet been generated and are viewed as stochastic. We present a framework to discuss how and why these two positions are relevant to online policy evaluation, from a worst-case perspective and from a Bayesian perspective. We then use a simple online pricing setting with offline data to illustrate the construction of optimal policies under the two approaches and to discuss their differences, in particular whether the search for the optimal policy can be decomposed into independent subproblems that are optimized separately, and whether a deterministic optimal policy exists.
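To make the two positions concrete, the following is a minimal numerical sketch, not the paper's model: the linear demand form, the certainty-equivalent pricing policy, the three-point ambiguity set, and the uniform prior are all illustrative assumptions. It evaluates the same policy twice, once with the offline observations held fixed and regret taken over a worst-case demand parameter, and once with the parameter and the offline observations drawn jointly from a prior and regret averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def demand(price, theta, noise_sd=1.0):
    """Linear demand a - b*price plus Gaussian noise; theta = (a, b)."""
    a, b = theta
    return a - b * price + rng.normal(0.0, noise_sd)

def run_policy(theta, offline_prices, offline_demands, horizon=50):
    """Certainty-equivalent pricing: each period, fit (a, b) by least squares
    on all data seen so far (offline + online) and charge a_hat / (2 b_hat)."""
    prices = list(offline_prices)
    demands = list(offline_demands)
    revenue = 0.0
    for _ in range(horizon):
        X = np.column_stack([np.ones(len(prices)), -np.asarray(prices)])
        a_hat, b_hat = np.linalg.lstsq(X, np.asarray(demands), rcond=None)[0]
        p = float(np.clip(a_hat / (2.0 * max(b_hat, 1e-3)), 0.1, 10.0))
        d = demand(p, theta)
        revenue += p * d
        prices.append(p)
        demands.append(d)
    return revenue

def optimal_revenue(theta, horizon=50):
    """Expected revenue of the clairvoyant price p* = a / (2b)."""
    a, b = theta
    return horizon * a ** 2 / (4.0 * b)

offline_prices = [2.0, 3.0, 4.0]

# Position 1: the offline data are already observed and treated as deterministic;
# the policy is judged by its regret under the worst parameter in a hypothetical
# ambiguity set of demand models (a single noisy run per candidate, for brevity).
offline_demands_fixed = [7.9, 7.1, 5.8]
theta_candidates = [(10.0, 1.0), (12.0, 1.5), (8.0, 0.8)]
worst_case_regret = max(
    optimal_revenue(th) - run_policy(th, offline_prices, offline_demands_fixed)
    for th in theta_candidates
)

# Position 2: the offline data are not yet generated and treated as stochastic;
# the parameter is drawn from a prior, the offline observations are simulated
# under it, and regret is averaged over both sources of randomness.
bayes_regrets = []
for _ in range(200):
    th = (rng.uniform(8.0, 12.0), rng.uniform(0.8, 1.5))
    offline_demands = [demand(p, th) for p in offline_prices]
    bayes_regrets.append(optimal_revenue(th) - run_policy(th, offline_prices, offline_demands))

print(f"worst-case regret (fixed offline data): {worst_case_regret:.1f}")
print(f"Bayesian regret (random offline data) : {np.mean(bayes_regrets):.1f}")
```

In the first evaluation only the online demand noise is random; in the second, the offline observations themselves are part of the randomness being averaged over, which is exactly the difference between the two positions described above.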


Data Availability

The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/YueWangMathbio/OPOD.


Acknowledgments

The authors would like to thank the anonymous referees for providing helpful comments that improved the quality of this paper.

Author information

Corresponding author

Correspondence to Yue Wang.

Additional information

Yue Wang has been a postdoctoral fellow at the Department of Computational Medicine, University of California, Los Angeles since 2021. From 2018 to 2021, Dr. Wang was a postdoctoral researcher at the Institut des Hautes Études Scientifiques in France. Dr. Wang received a Ph.D. in applied mathematics from the University of Washington in 2018 and a B.Sc. in mathematics from Peking University in 2013. Dr. Wang applies a range of mathematical tools, including modeling, simulation, algorithms, statistical analysis, and theoretical analysis with discrete mathematics, differential equations, and stochastic processes, to biology, e.g., population dynamics, gene regulation, and developmental biology. Dr. Wang also applies probability, stochastic processes, and discrete mathematics to other subjects, such as reinforcement learning, causal inference, statistical physics, biochemistry, dynamical systems, and law.

Zeyu Zheng has been an assistant professor at the Department of Industrial Engineering and Operations Research, University of California, Berkeley since 2018. Dr. Zheng received a Ph.D. in operations research from Stanford University in 2018, an M.S. in economics from Stanford University in 2016, and a Bachelor's degree in mathematics from Peking University in 2012. He has done research in Monte Carlo simulation theory and simulation optimization. He is also interested in non-stationary stochastic modeling.


About this article


Cite this article

Wang, Y., Zheng, Z. Measuring Policy Performance in Online Pricing with Offline Data: Worst-case Perspective and Bayesian Perspective. J. Syst. Sci. Syst. Eng. 32, 352–371 (2023). https://doi.org/10.1007/s11518-023-5557-9

