
A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking

Published in: IIE Transactions

Abstract

The airline industry strives to maximize the revenue obtained from the sale of tickets on every flight. This is referred to as revenue management, and it forms a crucial aspect of airline logistics. Ticket pricing, seat or discount allocation, and overbooking are some of the important aspects of a revenue management problem. Though ticket pricing is usually heavily influenced by factors beyond the control of an airline company, a significant amount of control can be exercised over the seat allocation and overbooking aspects. A realistic model for a single leg of a flight should consider multiple fare classes, overbooking of the flight, concurrent demand arrivals of passengers from the different fare classes, and class-dependent, random cancellations. Accommodating all these factors in one optimization model is a challenging task because it makes the model a very large-scale stochastic optimization problem. Almost all papers in the existing literature either accommodate only a subset of these factors or use a discrete approximation in order to make the model tractable. We consider all these factors and cast the single-leg problem as a semi-Markov Decision Problem (SMDP) under the average-reward optimizing criterion over an infinite time horizon. We solve it using a stochastic optimization technique called Reinforcement Learning. Not only is Reinforcement Learning able to scale up to a huge state-space, but, because it is simulation-based, it can also handle complex modeling assumptions such as the ones mentioned above. The state-space of the numerical test problem scenarios considered here is non-denumerable, its countable part being of the order of 10^9. Our solution procedure involves a multi-step extension of the SMART algorithm, which is based on the one-step Bellman equation. Numerical results presented here show that our approach is able to outperform a heuristic widely used in the airline industry, namely the nested version of the EMSR heuristic. We also present a detailed study of the sensitivity of the solution to some of the modeling parameters via a full factorial experiment.
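
To make the flavor of the solution procedure concrete, the sketch below shows a SMART-style average-reward update of the kind the abstract refers to, applied to a toy accept/reject seat-allocation simulation. This is a minimal illustration under assumed inputs, not the authors' implementation: the fares, capacity, demand probabilities, exploration schedule, and simulator are invented for the example, the update shown is the one-step version rather than the paper's multi-step extension, and cancellations, overbooking penalties, and concurrent arrivals are omitted.

```python
# Hypothetical sketch of a one-step SMART-style average-reward update for an
# accept/reject booking decision on a single leg. All numbers are illustrative.
import random
from collections import defaultdict

FARES = [100.0, 175.0, 250.0]       # assumed fares for three classes (illustrative)
CAPACITY = 10                        # toy cabin size so the tabular Q-table stays small
ACTIONS = (0, 1)                     # 0 = reject the request, 1 = accept it

Q = defaultdict(float)                               # Q[(state, action)]
total_reward, total_time, rho = 0.0, 1e-9, 0.0       # running average-reward estimate


def greedy_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def smart_update(state, action, reward, sojourn, next_state, alpha, greedy):
    """One SMART-style update based on the one-step Bellman equation for SMDPs."""
    global total_reward, total_time, rho
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward - rho * sojourn + best_next      # reward rate rho times sojourn time
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    if greedy:                                       # update rho on non-exploratory steps only
        total_reward += reward
        total_time += sojourn
        rho = total_reward / total_time


for episode in range(2000):
    sold = 0
    for step in range(40):
        cls = random.choices((0, 1, 2), weights=(0.5, 0.3, 0.2))[0]   # class of the request
        state = (sold, cls)
        epsilon = 0.2 * 0.999 ** episode             # assumed decaying exploration schedule
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy_action(state)
        sojourn = random.expovariate(1.0)            # random time until the next request
        reward = FARES[cls] if (action == 1 and sold < CAPACITY) else 0.0
        if reward > 0:
            sold += 1
        next_cls = random.choices((0, 1, 2), weights=(0.5, 0.3, 0.2))[0]
        smart_update(state, action, reward, sojourn, (sold, next_cls),
                     alpha=0.01, greedy=(action == greedy_action(state)))

print("estimated average reward per unit time:", round(rho, 2))
```

In the full model described in the paper, the state would also track bookings by fare class and the time remaining until departure, terminal rewards at departure would penalize bumped passengers created by overbooking, and the learned policy would be benchmarked against nested EMSR protection levels.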

Cite this article

Gosavi, A., Bandla, N. & Das, T.K. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions 34, 729–742 (2002). https://doi.org/10.1023/A:1015583703449
