Abstract
Deep reinforcement learning (DRL) has emerged as a powerful tool for controlling complex systems by combining deep neural networks with reinforcement learning techniques. However, due to the black-box nature of these algorithms, the resulting control policies can be difficult to understand from a human perspective. This limitation is particularly relevant in real-world scenarios, where an understanding of the controller is required for reliability and safety reasons. In this paper, we investigate the application of DRL methods for controlling the heating, ventilation and air-conditioning (HVAC) system of a building, and we propose an Explainable Artificial Intelligence (XAI) approach to provide interpretability to these models. This is accomplished by combining several XAI methods, including surrogate models, Shapley values, and counterfactual examples. We report the results of the DRL-based controller in terms of energy consumption and thermal comfort, and use this XAI layer to provide insight into and explanations of the underlying control strategy.
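To illustrate the Shapley-value component of such an XAI layer, the sketch below computes exact Shapley values for a toy linear "policy" over three features. All feature names, coefficients, and values here are hypothetical stand-ins for the paper's actual controller inputs, chosen only to show the coalition-weighting mechanics:

```python
from itertools import combinations
from math import factorial

# Toy stand-in for the DRL controller's output as a function of three
# hypothetical features (names and coefficients are illustrative only).
def policy(out_temp, in_temp, occupancy):
    return 0.5 * out_temp - 1.2 * in_temp + 2.0 * occupancy

baseline = {"out_temp": 10.0, "in_temp": 21.0, "occupancy": 0.0}
instance = {"out_temp": -2.0, "in_temp": 19.0, "occupancy": 1.0}

def eval_coalition(S):
    # Features in coalition S take the instance value; the rest keep
    # the baseline value.
    args = {f: (instance[f] if f in S else baseline[f]) for f in baseline}
    return policy(**args)

features = list(baseline)
n = len(features)
shap = {}
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    # Sum the weighted marginal contribution of f over all coalitions
    # of the remaining features.
    for r in range(n):
        for S in combinations(others, r):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += w * (eval_coalition(set(S) | {f}) - eval_coalition(S))
    shap[f] = phi
```

For a linear model, each Shapley value reduces to coefficient times (instance minus baseline), and the attributions sum exactly to the difference between the policy output at the instance and at the baseline, which is a useful sanity check for any Shapley implementation.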
Acknowledgements
This work has been partially funded by the European Union – NextGenerationEU (IA4TES project, MIA.2021.M04.0008), Junta de Andalucía (D3S project, 30.BG.29.03.01 – P21-00247) and the Spanish Ministry of Science (SPEEDY, TED2021-130454B-I00). A. Manjavacas is also funded by FEDER/Junta de Andalucía (IFMIF-DONES project, SE21_UGR_IFMIF-DONES).
Appendix
A. Observation space description
B. PPO hyperparameters
The trained PPO model uses the default architecture of the ActorCriticPolicy class of Stable-Baselines3. The same architecture is used for both the policy and value networks: a feature extractor followed by two fully connected hidden layers with 64 units each, using tanh activations.
The remaining hyperparameters are listed in Table 4.
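The architecture described above can be sketched shape-for-shape as follows. This is a plain NumPy illustration with randomly initialised weights and hypothetical observation and action dimensions, not the trained Stable-Baselines3 model itself:

```python
import numpy as np

def init_mlp(sizes, rng):
    # One (W, b) pair per fully connected layer.
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    # tanh activation after every hidden layer, linear output layer
    # (matching the default ActorCriticPolicy activation function).
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

rng = np.random.default_rng(42)
obs_dim, act_dim = 20, 2  # hypothetical dimensions for illustration

# Separate policy and value networks, each with two 64-unit hidden layers.
policy_net = init_mlp([obs_dim, 64, 64, act_dim], rng)
value_net = init_mlp([obs_dim, 64, 64, 1], rng)

obs = rng.standard_normal(obs_dim)
action_mean = forward(obs, policy_net)  # shape (act_dim,)
state_value = forward(obs, value_net)   # shape (1,)
```

In Stable-Baselines3, this structure corresponds to passing `policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))` to the PPO constructor, which is also what the library applies by default for MLP policies.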
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiménez-Raboso, J., Manjavacas, A., Campoy-Nieves, A., Molina-Solana, M., Gómez-Romero, J. (2023). Explaining Deep Reinforcement Learning-Based Methods for Control of Building HVAC Systems. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0