Abstract
The airline industry strives to maximize the revenue obtained from the sale of tickets on every flight. This is referred to as revenue management, and it forms a crucial aspect of airline logistics. Ticket pricing, seat or discount allocation, and overbooking are some of the important aspects of a revenue management problem. Though ticket pricing is usually heavily influenced by factors beyond an airline's control, a significant amount of control can be exercised over seat allocation and overbooking. A realistic model for a single leg of a flight should consider multiple fare classes, overbooking of the flight, concurrent demand arrivals of passengers from the different fare classes, and class-dependent, random cancellations. Accommodating all these factors in one optimization model is a challenging task because it results in a very large-scale stochastic optimization problem. Almost all papers in the existing literature either accommodate only a subset of these factors or use a discrete approximation in order to make the model tractable. We consider all these factors and cast the single-leg problem as a semi-Markov Decision Problem (SMDP) under the average-reward optimizing criterion over an infinite time horizon. We solve it using a stochastic optimization technique called Reinforcement Learning. Not only can Reinforcement Learning scale up to a huge state space, but, being simulation-based, it can also handle complex modeling assumptions such as those mentioned above. The state space of the numerical test problem scenarios considered here is non-denumerable, its countable part being of the order of 10^9. Our solution procedure involves a multi-step extension of the SMART algorithm, which is based on the one-step Bellman equation. Numerical results presented here show that our approach is able to outperform a heuristic, namely the nested version of the EMSR heuristic, which is widely used in the airline industry. We also present a detailed study of the sensitivity of some modeling parameters via a full factorial experiment.
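To make the approach concrete, the following is a minimal sketch of a one-step SMART-style average-reward update on a toy two-state SMDP. The toy dynamics, the learning-rate and exploration schedules, and all numerical constants are illustrative assumptions, not the paper's settings; the paper itself uses a multi-step extension of SMART applied to the full seat-allocation model.

```python
import random

# Toy SMDP: two states, two actions; each transition yields a lump-sum
# reward and takes a random sojourn time. Dynamics are illustrative only.
def step(state, action, rng):
    next_state = rng.choice([0, 1])
    reward = (1.0 + action) if next_state == state else 0.5 * action
    sojourn = rng.uniform(0.5, 1.5)          # transition time tau
    return next_state, reward, sojourn

def smart(iterations=20000, seed=0):
    rng = random.Random(seed)
    R = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # relative action values
    rho = 0.0                                # average reward rate estimate
    total_r, total_t = 0.0, 0.0
    state = 0
    for k in range(1, iterations + 1):
        alpha = 100.0 / (1000.0 + k)         # decaying learning rate
        eps = 0.5 * (0.999 ** k)             # decaying exploration probability
        greedy = max((0, 1), key=lambda a: R[(state, a)])
        action = rng.choice([0, 1]) if rng.random() < eps else greedy
        nxt, r, tau = step(state, action, rng)
        # One-step SMART update: reward, minus average reward accrued over
        # the sojourn time, plus the best relative value of the next state.
        target = r - rho * tau + max(R[(nxt, a)] for a in (0, 1))
        R[(state, action)] += alpha * (target - R[(state, action)])
        if action == greedy:                 # update rho on greedy steps only
            total_r += r
            total_t += tau
            rho = total_r / total_t
        state = nxt
    return R, rho
```

The key difference from discrete-time average-reward Q-learning is the `rho * tau` term, which charges the average reward rate against the random sojourn time of each transition, as the one-step Bellman equation for SMDPs requires.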
References
Abounadi, J. (1998) Stochastic approximation for non-expansive maps: application to Q-learning algorithms. PhD thesis, MIT, Cambridge, MA.
Bandla, N. (1998) Airline yield management using a reinforcement learning approach. Unpublished Master's thesis, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL.
Bellman, R. (1954) The theory of dynamic programming. Bulletin of the American Mathematical Society, 60, 503–516.
Belobaba, P.P. (1989) Application of a probabilistic decision model to airline seat inventory control. Operations Research, 37, 183–197.
Bertsekas, D.P. (1995) Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, Belmont, MA.
Bertsekas, D. and Tsitsiklis, J. (1996) Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
Brumelle, S.L. and McGill, J.I. (1993) Airline seat allocation with multiple nested fare classes. Operations Research, 41, 127–137.
Chatwin, R.E. (1998) Multiperiod airline overbooking with a single fare class. Operations Research, 46(6), 805–819.
Curry, R.E. (1990) Optimal airline seat allocation with fare classes nested by origins and destinations. Transportation Science, 24, 193–204.
Darken, C., Chang, J. and Moody, J. (1992) Learning rate schedules for faster stochastic gradient search, in Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop, White, D.A. and Sofge, D.A. (eds), IEEE Press, Piscataway, NJ.
Das, T.K., Gosavi, A., Mahadevan, S. and Marchalleck, N. (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4), 560–574.
Davis, P. (1994) Airline ties profitability yield to management. SIAM News, 27(5).
Glover, F., Glover, R., Lorenzo, J. and McMillan, C. (1982) The passenger-mix problem in the scheduled airlines. Interfaces, 12, 73–79.
Gosavi, A. (1999) An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results. PhD thesis, University of South Florida, Tampa, FL.
Higle, J.L. and Sen, S. (1991) Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16, 650–669.
Howard, R. (1960) Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA.
Howard, R. (1971) Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes, John Wiley and Sons, New York, NY, p. 976 onwards.
Littlewood, K. (1972) Forecasting and control of passenger bookings, in Proceedings of the 12th AGIFORS Symposium, Nathanya, Israel, pp. 95–117.
Martinez, R. and Sanchez, M. (1970) Automatic booking level control, in Proceedings of the 10th AGIFORS Symposium, Terrigal, Australia, pp. 1–20.
McGill, J.I. and Van Ryzin, G.J. (1999) Revenue management: research overview and prospects. Transportation Science, 33(2), 233–256.
Puterman, M.L. (1994) Markov Decision Processes, Wiley Interscience, New York, NY.
Robbins, H. and Monro, S. (1951) A stochastic approximation method. Annals of Mathematical Statistics, 22, 400–407.
Robinson, L.W. (1995) Optimal and approximate control policies for airline booking with sequential nonmonotonic fare classes. Operations Research, 43, 252–263.
Shapiro, A. (2000) Stochastic programming by Monte Carlo methods. Preprint, Georgia Institute of Technology, Atlanta, GA.
Smith, B.C., Leimkuhler, J.F. and Darrow, R.M. (1992) Yield management at American Airlines. Interfaces, 22, 8–31.
Subramanian, J., Stidham, Jr., S. and Lautenbacher, C.J. (1999) Airline yield management with overbooking, cancellations and no-shows. Transportation Science, 33(2), 147–167.
Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R. and Barto, A.G. (1998) Reinforcement Learning, The MIT Press, Cambridge, MA.
Talluri, K.T. and Van Ryzin, G.J. (1999) Bid-price controls for network revenue management. Management Science, 44, 1577–1593.
Thomson, H.R. (1961) Statistical problems in airline reservation control. Operational Research Quarterly, 12, 167–185.
Van Ryzin, G.J. and McGill, J.I. (2000) Revenue management without forecasting or optimization: an adaptive algorithm for determining seat protection levels. Management Science, 46(6), 760–775.
Watkins, C.J. (1989) Learning from delayed rewards. PhD thesis, King's College, Cambridge, UK.
Wheeler, R. and Narendra, K. (1986) Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31(6), 373–376.
Wollmer, R.D. (1992) An airline seat management model for a single leg route when lower fare classes book first. Operations Research, 40, 26–37.
Cite this article
Gosavi, A., Bandla, N. & Das, T.K. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions 34, 729–742 (2002). https://doi.org/10.1023/A:1015583703449