
Explaining Deep Reinforcement Learning-Based Methods for Control of Building HVAC Systems

  • Conference paper
  • Explainable Artificial Intelligence (xAI 2023)

Abstract

Deep reinforcement learning (DRL) has emerged as a powerful tool for controlling complex systems by combining deep neural networks with reinforcement learning techniques. However, due to the black-box nature of these algorithms, the resulting control policies can be difficult to understand from a human perspective. This limitation is particularly relevant in real-world scenarios, where an understanding of the controller is required for reliability and safety reasons. In this paper, we investigate the application of DRL methods for controlling the heating, ventilation and air-conditioning (HVAC) system of a building, and we propose an Explainable Artificial Intelligence (XAI) approach to provide interpretability for these models. This is accomplished by combining different XAI methods, including surrogate models, Shapley values, and counterfactual examples. We report the results of the DRL-based controller in terms of energy consumption and thermal comfort, and use this XAI layer to provide insight into the underlying control strategy.
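The abstract names Shapley values as one of the XAI components combined in the proposed layer. As an illustration only (not the authors' implementation, which the paper describes in its method section), the exact Shapley value of a feature can be computed by enumerating all coalitions of the remaining features; the `model` function and feature names below are hypothetical stand-ins for a controller's prediction over observation features:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, features):
    """Exact Shapley values by enumerating all feature coalitions.

    `model` maps a frozenset of feature names to a scalar prediction.
    Brute force: O(2^n) model evaluations, so only viable for a
    handful of features; practical tools (e.g. SHAP) approximate this.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = model(frozenset(subset) | {f})
                without_f = model(frozenset(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Hypothetical additive "controller output" over two observation features:
contrib = {"outdoor_temp": 1.5, "indoor_temp": -0.5}
model = lambda coalition: sum(contrib[f] for f in coalition)
print(shapley_values(model, ["outdoor_temp", "indoor_temp"]))
```

For an additive model like this one, each feature's Shapley value recovers its individual contribution, which is a useful sanity check on any Shapley implementation.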



Acknowledgements

This work has been partially funded by the European Union – NextGenerationEU (IA4TES project, MIA.2021.M04.0008), Junta de Andalucía (D3S project, 30.BG.29.03.01 – P21.00247) and the Spanish Ministry of Science (SPEEDY, TED2021.130454B.I00). A. Manjavacas is also funded by FEDER/Junta de Andalucía (IFMIF-DONES project, SE21_UGR_IFMIF-DONES).

Author information

Correspondence to Javier Jiménez-Raboso.


Appendix


1.1 A. Observation space description

Table 3. Observation space used for training the DRL agent. It comprises 20 variables, each normalized to the range [0, 1] using a min-max strategy.
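The min-max strategy mentioned in the caption can be sketched as follows; the example bounds for an observation variable are hypothetical, as the actual per-variable ranges are given in Table 3:

```python
def min_max_normalize(value, lo, hi):
    """Scale value from its variable-specific range [lo, hi] into [0, 1],
    clipping observations that fall outside the expected bounds."""
    if hi == lo:
        return 0.0  # degenerate range: avoid division by zero
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

# Hypothetical bounds for one observation variable (indoor air temperature, °C):
print(min_max_normalize(22.5, 15.0, 30.0))  # → 0.5
```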

1.2 B. PPO hyperparameters

The trained PPO model uses the default architecture of the ActorCriticPolicy class of Stable-Baselines3. The policy and value networks share the same structure: a feature extractor followed by two fully connected hidden layers of 64 units each, with tanh activations.
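The shape of these networks can be sketched as a plain NumPy forward pass. This is a structural illustration only: the random weights, the 20-dimensional input, and the 2-dimensional output head are placeholder assumptions, not the trained model's parameters or action space.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, sizes):
    """Forward pass through an MLP matching the described shape:
    hidden layers use tanh, the final head is linear.
    Weights are random placeholders, not trained parameters."""
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        b = np.zeros(n_out)
        x = x @ W + b
        if i < len(sizes) - 2:  # no activation on the final (linear) head
            x = np.tanh(x)
    return x

obs = rng.random(20)                        # 20 normalized observations (Table 3)
action = mlp_forward(obs, [20, 64, 64, 2])  # hypothetical 2-dimensional action head
```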

The remaining hyperparameters are listed in Table 4.

Table 4. Hyperparameters for training PPO model


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jiménez-Raboso, J., Manjavacas, A., Campoy-Nieves, A., Molina-Solana, M., Gómez-Romero, J. (2023). Explaining Deep Reinforcement Learning-Based Methods for Control of Building HVAC Systems. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44067-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44066-3

  • Online ISBN: 978-3-031-44067-0

  • eBook Packages: Computer Science, Computer Science (R0)
