Reinforcement Learning Methodologies for Controlling Occupant Comfort in Buildings

Data-driven Analytics for Sustainable Buildings and Cities

Abstract

Classical building control systems are increasingly challenged by the growing complexity of contemporary built environments and energy systems. As a result, reinforcement learning (RL) is becoming more prominent and applicable in building control. This chapter therefore presents a comprehensive review of RL techniques applied to control systems for occupant comfort in indoor built environments. The empirical applications of RL-based control systems are presented according to their comfort objectives (thermal comfort, indoor air quality, and lighting), along with other objectives, which invariably include energy consumption. The classes of RL algorithms and the implementation details, namely how the value functions are represented and how the policies are improved, are also illustrated. The chapter shows that few works have explored RL for controlling occupant comfort, especially for indoor air quality and lighting. Relatively few of the reviewed works incorporate occupancy patterns and/or occupant feedback into the control loop. Moreover, the chapter identifies a gap regarding the performance of cooperative multiagent RL (MARL) implementations. Based on these findings, current challenges and further opportunities are discussed. We expect to clarify the feasible theory and functions of RL for building control systems, which would promote their widespread application in built environments.
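To make the abstract's reference to value-function representation and policy improvement more concrete, the following is a minimal sketch of tabular Q-learning on a toy thermostat problem. It is not taken from the chapter: the one-degree room model, the set-point, the energy weight, and all function and variable names are hypothetical choices made for illustration only.

```python
import random

# All constants below are illustrative assumptions, not values from the chapter.
TEMPS = list(range(16, 27))   # discretised indoor temperature states, deg C
ACTIONS = (-1, 0, 1)          # cool, hold, heat (temperature change per step)
SETPOINT = 21                 # hypothetical comfort target, deg C
ENERGY_WEIGHT = 0.1           # hypothetical comfort/energy trade-off weight


def step(temp, action):
    """Toy deterministic room: the action shifts the temperature by one degree."""
    next_temp = min(max(temp + action, TEMPS[0]), TEMPS[-1])
    # Reward penalises discomfort (distance to set-point) and energy use.
    reward = -abs(next_temp - SETPOINT) - ENERGY_WEIGHT * abs(action)
    return next_temp, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # Value function represented as a lookup table over (state, action) pairs.
    q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
    for _ in range(episodes):
        temp = rng.choice(TEMPS)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(temp, a)])  # exploit
            next_temp, reward = step(temp, action)
            best_next = max(q[(next_temp, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward the bootstrapped target.
            q[(temp, action)] += alpha * (reward + gamma * best_next - q[(temp, action)])
            temp = next_temp
    return q


def greedy_policy(q):
    """Policy improvement: pick the highest-valued action in each state."""
    return {t: max(ACTIONS, key=lambda a: q[(t, a)]) for t in TEMPS}


q_table = train()
policy = greedy_policy(q_table)
```

In this sketch the learned policy heats below the set-point, cools above it, and holds at the set-point. A lookup table is only practical for small discretised state spaces like this one; continuous or high-dimensional building states generally call for function approximation instead.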

Author information

Correspondence to Mengjie Han.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Han, M., May, R., Zhang, X. (2021). Reinforcement Learning Methodologies for Controlling Occupant Comfort in Buildings. In: Zhang, X. (eds) Data-driven Analytics for Sustainable Buildings and Cities. Sustainable Development Goals Series. Springer, Singapore. https://doi.org/10.1007/978-981-16-2778-1_9