Annals of Operations Research, Volume 143, Issue 1, pp 59–75

Learning dynamic prices in electronic retail markets with customer segmentation



In this paper, we use reinforcement learning (RL) techniques to determine dynamic prices in an electronic monopolistic retail market. The market that we consider consists of two natural segments of customers, captives and shoppers. Captives are mature, loyal buyers, whereas shoppers are more price sensitive and are attracted by sales promotions and volume discounts. The seller is the learning agent in the system and uses RL to learn from the environment. Under reasonable assumptions about the arrival process of customers, the inventory replenishment policy, and the replenishment lead time distribution, the system becomes a Markov decision process, which enables the use of a wide spectrum of learning algorithms. We use the Q-learning algorithm to arrive at dynamic prices that optimize the seller’s performance metric (either long-term discounted profit or long-run average profit per unit time). Our model and methodology can also be used to compute the optimal reorder quantity and reorder point for the seller's inventory policy, and to compute the optimal volume discounts to be offered to the shoppers.
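As a rough illustration of the approach described above, the following is a minimal sketch of tabular Q-learning applied to a toy pricing problem. It is not the paper's model: the price levels, the inventory capacity, the purchase probabilities for captives and shoppers, the replenish-on-stock-out rule, and the names `step`, `q_learning`, and `greedy_prices` are all assumptions made up for this example. Only the update rule itself is the standard discounted Q-learning update.

```python
import random

# All constants below are hypothetical values chosen for illustration.
PRICES = [8.0, 10.0, 12.0]   # candidate prices (the seller's actions)
MAX_INV = 5                  # inventory capacity (states are 0..MAX_INV)
UNIT_COST = 5.0              # purchase cost per replenished unit
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # step size, discount, exploration


def step(inventory, price, rng):
    """Toy one-period simulator (an assumption, not the paper's model).

    Returns (next_inventory, reward).
    """
    if inventory == 0:
        # Stock-out: refill to capacity, pay for the units, sell nothing.
        return MAX_INV, -UNIT_COST * MAX_INV
    # A captive buys with fixed probability regardless of price; a shopper
    # buys with a probability that falls as the price rises.
    captive_buys = rng.random() < 0.3
    shopper_buys = rng.random() < (14.0 - price) / 10.0
    sold = min(inventory, int(captive_buys) + int(shopper_buys))
    return inventory - sold, price * sold


def q_learning(episodes=2000, horizon=50, seed=0):
    """Learn a table q[inventory][price_index] with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0] * len(PRICES) for _ in range(MAX_INV + 1)]
    for _ in range(episodes):
        s = MAX_INV
        for _ in range(horizon):
            if rng.random() < EPSILON:
                a = rng.randrange(len(PRICES))                     # explore
            else:
                a = max(range(len(PRICES)), key=lambda i: q[s][i])  # exploit
            s2, r = step(s, PRICES[a], rng)
            # Standard Q-learning update for discounted reward.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_prices(q):
    """Price with the highest learned value for each inventory level."""
    return [PRICES[max(range(len(PRICES)), key=lambda i: q[s][i])]
            for s in range(MAX_INV + 1)]
```

Calling `greedy_prices(q_learning())` yields one price per inventory level under these toy assumptions. The sketch covers only the discounted-profit criterion; the average-profit criterion mentioned in the abstract would need a different update rule.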


Keywords: Electronic retail market; Dynamic pricing; Customer segmentation; Captives; Shoppers; Volume discounts; Inventory replenishment; Markov decision process; Reinforcement learning; Q-learning





Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Electronic Enterprises Laboratory, Computer Science and Automation, Indian Institute of Science, India
  2. General Motors India Science Labs, Bangalore, India
