Machine Learning, Volume 92, Issue 1, pp 5–39

A reinforcement learning approach to autonomous decision-making in smart electricity markets

  • Markus Peters
  • Wolfgang Ketter
  • Maytal Saar-Tsechansky
  • John Collins


The vision of a Smart Electric Grid relies critically on substantial advances in intelligent decentralized control mechanisms. We propose a novel class of autonomous broker agents for retail electricity trading that can operate in a wide range of Smart Electricity Markets, and that are capable of deriving long-term, profit-maximizing policies. Our brokers use Reinforcement Learning with function approximation; they can accommodate arbitrary economic signals from their environments, and they learn efficiently over the large state spaces resulting from these signals. We show how feature selection and regularization can be leveraged to automatically optimize brokers for particular market conditions, and demonstrate the performance of our design in extensive experiments using real-world energy market data.
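To make the combination of Reinforcement Learning, function approximation, and regularization more concrete, the following is a minimal sketch and not the authors' broker implementation. It assumes a linear action-value function, a SARSA-style temporal-difference update, and an L1 shrinkage (soft-thresholding) step as one possible form of regularization. The function names, the feature vectors, and the parameters `alpha`, `gamma`, and `l1` are all hypothetical and chosen for illustration only.

```python
def q_value(weights, features):
    """Linear value estimate: dot product of weights and state-action features."""
    return sum(w * x for w, x in zip(weights, features))


def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sarsa_update(weights, feats, reward, next_feats,
                 alpha=0.1, gamma=0.9, l1=0.0):
    """One SARSA step on a linear Q-function, then an L1 shrinkage step.

    feats / next_feats are the (hypothetical) feature vectors of the current
    and the next state-action pair; l1 = 0.0 disables regularization.
    """
    td_error = (reward
                + gamma * q_value(weights, next_feats)
                - q_value(weights, feats))
    updated = [w + alpha * td_error * x for w, x in zip(weights, feats)]
    return [soft_threshold(w, alpha * l1) for w in updated]


# Example: only the first feature is active, so only its weight moves.
weights = sarsa_update([0.0, 0.0], [1.0, 0.0], reward=1.0,
                       next_feats=[0.0, 0.0])
```

In this sketch, weights of features that never carry signal stay at exactly zero under the shrinkage step, which is one simple way regularization can act as implicit feature selection.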


Energy brokers, Feature selection, Reinforcement learning, Smart electricity grid, Trading agents



We would like to thank three anonymous Machine Learning reviewers and three anonymous ECML-PKDD 2012 reviewers for their insightful comments on this work. The extensive exploration of alternative feature selection and regularization techniques we present here, and the subsequent enhanced performance of the agent, were inspired, among other things, by their remarks.



Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Markus Peters (1)
  • Wolfgang Ketter (1)
  • Maytal Saar-Tsechansky (2)
  • John Collins (3)
  1. Rotterdam School of Management, Erasmus University, Rotterdam, The Netherlands
  2. McCombs School of Business, University of Texas at Austin, Austin, USA
  3. Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
