Abstract
Problems of online pricing with offline data, like other problems of online decision making with offline data, aim at designing and evaluating online pricing policies in the presence of a certain amount of existing offline data. To evaluate pricing policies when offline data are available, the decision maker can position herself either at the time point when the offline data have already been observed and can be viewed as deterministic, or at the time point when the offline data have not yet been generated and must be viewed as stochastic. We develop a framework to discuss how and why these two positions are relevant to online policy evaluation, from both a worst-case perspective and a Bayesian perspective. We then use a simple online pricing setting with offline data to illustrate the construction of optimal policies under the two approaches and discuss their differences, in particular whether the search for an optimal policy can be decomposed into independent subproblems that are optimized separately, and whether a deterministic optimal policy exists.
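The distinction between the two evaluation perspectives can be illustrated with a toy sketch (not the paper's model): a linear demand curve with an unknown parameter pair, where worst-case evaluation takes the maximum regret over a candidate set and Bayesian evaluation averages regret under a prior. All names, parameter values, and the candidate set below are illustrative assumptions.

```python
# Toy linear demand: expected demand = a - b * price for unknown (a, b).
# Expected revenue r(p; a, b) = p * (a - b * p); the optimal price is a / (2b).

def revenue(p, a, b):
    return p * (a - b * p)

def regret(p, a, b):
    # Regret of charging price p when the true parameters are (a, b).
    p_star = a / (2 * b)
    return revenue(p_star, a, b) - revenue(p, a, b)

# Hypothetical uncertainty set of demand parameters (a, b).
candidates = [(10.0, 1.0), (12.0, 1.5), (8.0, 0.8)]

def worst_case_regret(p):
    # Worst-case evaluation: maximize regret over the candidate set.
    return max(regret(p, a, b) for (a, b) in candidates)

def bayesian_regret(p, prior):
    # Bayesian evaluation: average regret under a prior on (a, b).
    return sum(w * regret(p, a, b) for w, (a, b) in zip(prior, candidates))

uniform_prior = [1 / 3, 1 / 3, 1 / 3]
p = 4.5  # a fixed candidate price
# Averaging can never exceed the maximum, so Bayesian regret
# is bounded above by worst-case regret for any prior.
assert bayesian_regret(p, uniform_prior) <= worst_case_regret(p)
```

A policy that looks good under one criterion need not under the other: the worst-case value is driven by a single unfavorable parameter, while the Bayesian value reflects the whole prior.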
Data Availability
The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/YueWangMathbio/OPOD.
Acknowledgments
The authors would like to thank the anonymous referees for providing helpful comments that improved the quality of this paper.
Author information
Yue Wang has been a postdoctoral fellow at the Department of Computational Medicine, University of California, Los Angeles, since 2021. During 2018–2021, Dr. Wang was a postdoctoral researcher at the Institut des Hautes Études Scientifiques in France. Dr. Wang received a Ph.D. in applied mathematics from the University of Washington in 2018 and a B.Sc. in mathematics from Peking University in 2013. Dr. Wang applies a variety of mathematical tools, such as modeling, simulation, algorithms, statistical analysis, and theoretical analysis with discrete mathematics, differential equations, and stochastic processes, to biology, e.g., population dynamics, gene regulation, and developmental biology. Dr. Wang also applies probability, stochastic processes, and discrete mathematics to other subjects, such as reinforcement learning, causal inference, statistical physics, biochemistry, dynamical systems, and law.
Zeyu Zheng has been an assistant professor at the Department of Industrial Engineering and Operations Research, University of California, Berkeley, since 2018. Dr. Zheng received a Ph.D. in operations research from Stanford University in 2018, an M.S. in economics from Stanford University in 2016, and a bachelor's degree in mathematics from Peking University in 2012. He has done research in Monte Carlo simulation theory and simulation optimization. He is also interested in non-stationary stochastic modeling.
Cite this article
Wang, Y., Zheng, Z. Measuring Policy Performance in Online Pricing with Offline Data: Worst-case Perspective and Bayesian Perspective. J. Syst. Sci. Syst. Eng. 32, 352–371 (2023). https://doi.org/10.1007/s11518-023-5557-9