Abstract
Dynamic assortment optimization with demand learning is a fundamental task in data-driven revenue management that requires a combination of techniques from operations research, optimization, and machine learning. In this chapter, we give an overview of research on data-driven dynamic assortment optimization when the underlying demand is governed by probabilistic choice models beyond the classical multinomial logit (MNL) choice model, thereby overcoming several limitations and drawbacks of the MNL model. We conclude with open questions for future research.
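To make the setting concrete, the classical MNL model mentioned above assigns each product a preference weight and normalizes over the offered assortment plus a no-purchase option. The following is a minimal sketch of the standard MNL choice probabilities and the resulting expected per-customer revenue; the weights `v` and prices `r` below are illustrative values, not data from the chapter.

```python
def mnl_choice_probabilities(assortment, v):
    """Choice probabilities under the multinomial logit (MNL) model.

    v[i] is the preference weight of product i (the exponential of its
    mean utility); the no-purchase option has weight 1 by convention.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom  # no-purchase probability
    return probs

def expected_revenue(assortment, v, r):
    """Expected per-customer revenue of offering `assortment`."""
    probs = mnl_choice_probabilities(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)

# Illustrative instance: three products with weights v and prices r.
v = {1: 1.0, 2: 0.5, 3: 0.25}
r = {1: 4.0, 2: 6.0, 3: 8.0}
print(expected_revenue({1, 2}, v, r))  # 4*(1/2.5) + 6*(0.5/2.5) = 2.8
```

The dynamic problem studied in this literature repeatedly offers assortments, observes the realized choices, and updates estimates of the unknown weights; the models surveyed in the chapter replace this MNL structure with richer choice models.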
Notes
1. In terms of either the worst-case regret or the Bayes regret. We adopt the worst-case regret formulation here, as it is the more popular of the two.
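For reference, the two regret notions mentioned in the note above are standardly defined as follows; this is a sketch in generic notation (not the chapter's own), where \(\theta\) is the unknown demand parameter, \(S_t\) is the assortment offered at time \(t\), \(R(S,\theta)\) is its expected revenue, and \(R^\ast(\theta)=\max_S R(S,\theta)\).

```latex
% Worst-case (minimax) regret of a policy \pi over horizon T:
\mathrm{Regret}^{\mathrm{wc}}_T(\pi) \;=\; \sup_{\theta \in \Theta}\;
  \mathbb{E}_\theta\!\left[\sum_{t=1}^{T} \bigl(R^\ast(\theta) - R(S_t,\theta)\bigr)\right].

% Bayes regret additionally averages over a prior Q on \theta:
\mathrm{Regret}^{\mathrm{Bayes}}_T(\pi) \;=\;
  \mathbb{E}_{\theta \sim Q}\,\mathbb{E}_\theta\!\left[\sum_{t=1}^{T} \bigl(R^\ast(\theta) - R(S_t,\theta)\bigr)\right].
```

The worst-case formulation guards against the least favorable parameter, while the Bayes formulation is natural for posterior-sampling (Thompson sampling) analyses.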
References
Abbasi-Yadkori, Y., Pál, D., & Szepesvári, C. (2011). Improved algorithms for linear stochastic bandits. In Proceedings of the 25th Conference on Advances in Neural Information Processing Systems (NeurIPS) (pp. 2312–2320).
Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2017). Thompson sampling for the MNL-bandit. In Proceedings of the 30th Conference on Learning Theory (COLT) (pp. 76–78). PMLR.
Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2019). MNL-bandit: A dynamic learning approach to assortment selection. Operations Research, 67(5), 1453–1485.
Andrieu, C., De Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1), 5–43.
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov), 397–422.
Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th Annual Foundations of Computer Science (FOCS) (pp. 322–331). New York: IEEE.
Bubeck, S., & Cesa-Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1), 1–122.
Chen, W., Wang, Y., & Yuan, Y. (2013). Combinatorial multi-armed bandit: General framework and applications. In Proceedings of the 30th International Conference on Machine Learning (ICML) (pp. 151–159).
Chen, W., Hu, W., Li, F., Li, J., Liu, Y., & Lu, P. (2016). Combinatorial multi-armed bandit with general reward functions. In Proceedings of the 30th Conference on Advances in Neural Information Processing Systems (NeurIPS).
Chen, X., & Wang, Y. (2018). A note on a tight lower bound for capacitated MNL-bandit assortment selection models. Operations Research Letters, 46(5), 534–537.
Chen, X., Wang, Y., & Zhou, Y. (2018). An optimal policy for dynamic assortment planning under uncapacitated multinomial logit models. Mathematics of Operations Research (in press). arXiv preprint arXiv:1805.04785.
Chen, X., Wang, Y., & Zhou, Y. (2020). Dynamic assortment optimization with changing contextual information. Journal of Machine Learning Research, 21(216), 1–44.
Chen, X., Shi, C., Wang, Y., & Zhou, Y. (2021). Dynamic assortment planning under nested logit models. Production and Operations Management, 30(1), 85–102.
Cheung, W. C., & Simchi-Levi, D. (2017). Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Available at SSRN 3075658.
Chu, W., Li, L., Reyzin, L., & Schapire, R. (2011). Contextual bandits with linear payoff functions. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (pp. 208–214). JMLR Workshop and Conference Proceedings.
Daganzo, C. (2014). Multinomial probit: The theory and its application to demand forecasting. Amsterdam: Elsevier.
Davis, J., Gallego, G., & Topaloglu, H. (2013). Assortment planning under the multinomial logit model with totally unimodular constraint structures. Work in Progress.
Davis, J. M., Gallego, G., & Topaloglu, H. (2014). Assortment optimization under variants of the nested logit model. Operations Research, 62(2), 250–273.
Feldman, J. B., & Topaloglu, H. (2017). Revenue management under the Markov chain choice model. Operations Research, 65(5), 1322–1342.
Filippi, S., Cappe, O., Garivier, A., & Szepesvári, C. (2010). Parametric bandits: The generalized linear case. In Proceedings of the 24th Conference on Advances in Neural Information Processing Systems (NeurIPS) (pp. 586–594).
Jagabathula, S., Mitrofanov, D., & Vulcano, G. (2020a). Personalized retail promotions through a DAG-based representation of customer preferences. Available at SSRN 3258700.
Jagabathula, S., Subramanian, L., & Venkataraman, A. (2020b). A conditional gradient approach for nonparametric estimation of mixing distributions. Management Science, 66(8), 3635–3656.
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.
Li, G., Rusmevichientong, P., & Topaloglu, H. (2015). The d-level nested logit model: Assortment and price optimization problems. Operations Research, 63(2), 325–342.
Li, L., Lu, Y., & Zhou, D. (2017). Provably optimal algorithms for generalized linear contextual bandits. In Proceedings of the 34th International Conference on Machine Learning (ICML) (pp. 2071–2080). PMLR.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (pp. 105–142).
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), 447–470.
Megiddo, N. (1978). Combinatorial optimization with rational objective functions. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC).
Miao, S., & Chao, X. (2019). Fast algorithms for online personalized assortment optimization in a big data regime. Available at SSRN 3432574.
Oh, M.-h., & Iyengar, G. (2019). Thompson sampling for multinomial logit contextual bandits. In Proceedings of the 33rd Conference on Advances in Neural Information Processing Systems (NeurIPS) (pp. 3145–3155).
Rusmevichientong, P., & Tsitsiklis, J. N. (2010). Linearly parameterized bandits. Mathematics of Operations Research, 35(2), 395–411.
Rusmevichientong, P., Shen, Z. J. M., & Shmoys, D. B. (2010). Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research, 58(6), 1666–1680.
Russo, D., & Van Roy, B. (2014). Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4), 1221–1243.
Sauré, D., & Zeevi, A. (2013). Optimal dynamic assortment planning with demand learning. Manufacturing and Service Operations Management, 15(3), 387–404.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.
Train, K. E. (2008). EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling, 1(1), 40–69.
Train, K. E. (2009). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Acknowledgements
We would like to thank the editors for their invitation and helpful guidelines on the writing of this chapter. We would also like to thank Sentao Miao for his suggestions that greatly helped the writing of Sect. 10.4.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wang, Y., Zhou, Y. (2022). Dynamic Assortment Optimization: Beyond MNL Model. In: Chen, X., Jasin, S., Shi, C. (eds) The Elements of Joint Learning and Optimization in Operations Management. Springer Series in Supply Chain Management, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-031-01926-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01925-8
Online ISBN: 978-3-031-01926-5
eBook Packages: Business and Management (R0)