Abstract
The airline industry strives to maximize the revenue obtained from the sale of tickets on every flight. This is referred to as revenue management, and it forms a crucial aspect of airline logistics. Ticket pricing, seat or discount allocation, and overbooking are some of the important aspects of a revenue management problem. Though ticket pricing is usually heavily influenced by factors beyond an airline's control, a significant amount of control can be exercised over seat allocation and overbooking. A realistic model for a single leg of a flight should consider multiple fare classes, overbooking of the flight, concurrent demand arrivals of passengers from the different fare classes, and class-dependent, random cancellations. Accommodating all these factors in one optimization model is a challenging task because it results in a very large-scale stochastic optimization problem. Almost all papers in the existing literature either accommodate only a subset of these factors or use a discrete approximation in order to make the model tractable. We consider all these factors and cast the single-leg problem as a semi-Markov Decision Problem (SMDP) under the average-reward optimizing criterion over an infinite time horizon. We solve it using a stochastic optimization technique called Reinforcement Learning. Not only can Reinforcement Learning scale up to a huge state space, but, being simulation-based, it can also handle complex modeling assumptions such as those mentioned above. The state space of the numerical test problem scenarios considered here is non-denumerable, its countable part being of the order of 10^9. Our solution procedure involves a multi-step extension of the SMART algorithm, which is based on the one-step Bellman equation. Numerical results presented here show that our approach is able to outperform a heuristic, namely the nested version of the EMSR heuristic, which is widely used in the airline industry. We also present a detailed study of the sensitivity of some modeling parameters via a full factorial experiment.
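To make the approach concrete, the following is a minimal sketch of a one-step SMART-style average-reward update on a toy two-state SMDP. The toy dynamics, the learning-rate and exploration schedules, and all numerical constants are illustrative assumptions, not the paper's settings; the paper itself uses a multi-step extension of SMART applied to the full seat-allocation model.

```python
import random

# Toy SMDP: two states, two actions; each transition yields a lump-sum
# reward and takes a random sojourn time. Dynamics are illustrative only.
def step(state, action, rng):
    next_state = rng.choice([0, 1])
    reward = (1.0 + action) if next_state == state else 0.5 * action
    sojourn = rng.uniform(0.5, 1.5)          # transition time tau
    return next_state, reward, sojourn

def smart(iterations=20000, seed=0):
    rng = random.Random(seed)
    R = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # relative action values
    rho = 0.0                                # average reward rate estimate
    total_r, total_t = 0.0, 0.0
    state = 0
    for k in range(1, iterations + 1):
        alpha = 100.0 / (1000.0 + k)         # decaying learning rate
        eps = 0.5 * (0.999 ** k)             # decaying exploration probability
        greedy = max((0, 1), key=lambda a: R[(state, a)])
        action = rng.choice([0, 1]) if rng.random() < eps else greedy
        nxt, r, tau = step(state, action, rng)
        # One-step SMART update: reward, minus average reward accrued over
        # the sojourn time, plus the best relative value of the next state.
        target = r - rho * tau + max(R[(nxt, a)] for a in (0, 1))
        R[(state, action)] += alpha * (target - R[(state, action)])
        if action == greedy:                 # update rho on greedy steps only
            total_r += r
            total_t += tau
            rho = total_r / total_t
        state = nxt
    return R, rho
```

The key difference from discrete-time average-reward Q-learning is the `rho * tau` term, which charges the average reward rate against the random sojourn time of each transition, as the one-step Bellman equation for SMDPs requires.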
References
Abounadi, J. (1998) Stochastic approximation for non-expansive maps: application to Q-learning algorithms. PhD thesis, MIT, Cambridge, MA.
Bandla, N. (1998) Airline yield management using a reinforcement learning approach. Unpublished Master's thesis, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL.
Bellman, R. (1954) The theory of dynamic programming. Bulletin of the American Mathematical Society, 60, 503–516.
Belobaba, P.P. (1989) Application of a probabilistic decision model to airline seat inventory control. Operations Research, 37, 183–197.
Bertsekas, D.P. (1995) Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, Belmont, MA.
Bertsekas, D. and Tsitsiklis, J. (1996) Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
Brumelle, S.L. and McGill, J.I. (1993) Airline seat allocation with multiple nested fare classes. Operations Research, 41, 127–137.
Chatwin, R.E. (1998) Multiperiod airline overbooking with a single fare class. Operations Research, 46(6), 805–819.
Curry, R.E. (1990) Optimal airline seat allocation with fare classes nested by origins and destinations. Transportation Science, 24, 193–204.
Darken, C., Chang, J. and Moody, J. (1992) Learning rate schedules for faster stochastic gradient search, in Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop, White, D.A. and Sofge, D.A. (eds), IEEE Press, Piscataway, NJ.
Das, T.K., Gosavi, A., Mahadevan, S. and Marchalleck, N. (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4), 560–574.
Davis, P. (1994) Airline ties profitability yield to management. SIAM News, 27(5).
Glover, F., Glover, R., Lorenzo, J. and McMillan, C. (1982) The passenger-mix problem in the scheduled airlines. Interfaces, 12, 73–79.
Gosavi, A. (1999) An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results. PhD thesis, University of South Florida, Tampa, FL.
Higle, J.L. and Sen, S. (1991) Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16, 650–669.
Howard, R. (1960) Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA.
Howard, R. (1971) Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes, John Wiley and Sons, New York, NY, p. 976 onwards.
Littlewood, K. (1972) Forecasting and control of passenger bookings, in Proceedings of the 12th AGIFORS Symposium, Nathanya, Israel, pp. 95–117.
Martinez, R. and Sanchez, M. (1970) Automatic booking level control, in Proceedings of the 10th AGIFORS Symposium, Terrigal, Australia, pp. 1–20.
McGill, J.I. and Van Ryzin, G.J. (1999) Revenue management: research overview and prospects. Transportation Science, 33(2), 233–256.
Puterman, M.L. (1994) Markov Decision Processes, Wiley Interscience, New York, NY.
Robbins, H. and Monro, S. (1951) A stochastic approximation method. Annals of Mathematical Statistics, 22, 400–407.
Robinson, L.W. (1995) Optimal and approximate control policies for airline booking with sequential nonmonotonic fare classes. Operations Research, 43, 252–263.
Shapiro, A. (2000) Stochastic programming by Monte Carlo methods. Preprint, Georgia Institute of Technology, Atlanta, GA.
Smith, B.C., Leimkuhler, J.F. and Darrow, R.M. (1992) Yield management at American Airlines. Interfaces, 22, 8–31.
Subramanian, J., Stidham, Jr., S. and Lautenbacher, C.J. (1999) Airline yield management with overbooking, cancellations and no-shows. Transportation Science, 33(2), 147–167.
Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R. and Barto, A.G. (1998) Reinforcement Learning, The MIT Press, Cambridge, MA.
Talluri, K.T. and Van Ryzin, G.J. (1999) Bid-price controls for network revenue management. Management Science, 44, 1577–1593.
Thomson, H.R. (1961) Statistical problems in airline reservation control. Operational Research Quarterly, 12, 167–185.
Van Ryzin, G.J. and McGill, J.I. (2000) Revenue management without forecasting or optimization: an adaptive algorithm for determining seat protection levels. Management Science, 46(6), 760–775.
Watkins, C.J. (1989) Learning from delayed rewards. PhD thesis, King's College, Cambridge, UK.
Wheeler, R. and Narendra, K. (1986) Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31(6), 373–376.
Wollmer, R.D. (1992) An airline seat management model for a single leg route when lower fare classes book first. Operations Research, 40, 26–37.
Cite this article
Gosavi, A., Bandla, N. & Das, T.K. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions 34, 729–742 (2002). https://doi.org/10.1023/A:1015583703449