Autonomous HVAC Control: A Reinforcement Learning Approach

  • Enda Barrett
  • Stephen Linder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9286)


Recent high-profile releases of autonomous learning thermostats by companies such as Nest Labs and Honeywell have brought to the fore the possibility of ever greater numbers of intelligent devices permeating our homes and working environments. However, the specific learning approaches and methodologies used by these devices have never been made public; little is known about how they operate, learn about their environments, or model their users. This paper proposes a suitable learning architecture for such an intelligent thermostat, in the hope that it will support further investigation by the research community. Our architecture combines a number of learning methods, each of which contributes to a complete autonomous thermostat capable of controlling an HVAC system. A novel state-action space formalism is proposed to enable a reinforcement learning agent to control the HVAC system by optimising both occupant comfort and energy costs. Our results show that the learning thermostat achieves cost savings of 10% over a programmable thermostat while maintaining high occupant comfort standards.
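The abstract describes a reinforcement learning agent whose reward balances energy cost against occupant comfort. Since the paper's exact state-action formalism is not reproduced here, the following is only a minimal tabular Q-learning sketch of that idea; the temperature discretisation, comfort setpoint, cost weights, and toy thermal model are all illustrative assumptions, not the authors' design.

```python
import random

# Hypothetical discretisation (assumed, not from the paper).
TEMPS = range(15, 26)            # indoor temperature buckets, deg C
ACTIONS = ["heat_on", "heat_off"]
COMFORT_SETPOINT = 21            # assumed occupant preference
ENERGY_COST = 1.0                # assumed cost of one step of heating

def reward(temp, occupied, action):
    """Trade energy cost against comfort; weights are made up for illustration."""
    cost = ENERGY_COST if action == "heat_on" else 0.0
    discomfort = abs(temp - COMFORT_SETPOINT) if occupied else 0.0
    return -(cost + discomfort)

def step(temp, action):
    """Toy thermal model: heating raises temperature one bucket, else it decays."""
    if action == "heat_on":
        return min(temp + 1, max(TEMPS))
    return max(temp - 1, min(TEMPS))

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard epsilon-greedy tabular Q-learning over (temp, occupancy) states."""
    rng = random.Random(seed)
    q = {(t, o, a): 0.0 for t in TEMPS for o in (True, False) for a in ACTIONS}
    for _ in range(episodes):
        temp, occupied = rng.choice(list(TEMPS)), rng.random() < 0.5
        for _ in range(24):  # one day of hourly control decisions
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(temp, occupied, x)])
            nxt = step(temp, a)
            r = reward(nxt, occupied, a)
            best_next = max(q[(nxt, occupied, x)] for x in ACTIONS)
            q[(temp, occupied, a)] += alpha * (r + gamma * best_next
                                               - q[(temp, occupied, a)])
            temp = nxt
    return q

q = train()
# In a cold, occupied room the learned policy should choose to heat;
# in an unoccupied room it should save energy and leave heating off.
```

The point of the sketch is the reward shape: discomfort is only penalised when the space is occupied, so the same update rule learns both comfort-seeking and energy-saving behaviour from one scalar signal.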


Keywords: HVAC control · Reinforcement learning · Bayesian learning





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Schneider Electric, Galway, Ireland
  2. Schneider Electric, Andover, USA
