1 Introduction

Climate change is one of the biggest challenges we face today, requiring both immediate and long-term action. It affects all regions of the world, posing a threat to the global economy, harming the environment, and bringing worrying health implications. These growing threats require international action to mitigate their negative effects. Initiatives such as the European Climate Action outline the recorded negative effects of climate change and list urban infrastructure as the key to effectively working towards the goals set forth by the European Union (EU), which is currently responsible for 71% of global gas emissions and thus has a vital role to play [60]. To achieve this, the European Climate Action initiative aims to both decrease EU greenhouse gas emissions and improve energy efficiency by reducing the amount of primary energy consumed. It also aims to find solutions that are sustainable from an environmental as well as an economic standpoint.

Within smart buildings, the automation of existing residential as well as commercial buildings (built prior to modern low- or zero-energy buildings) plays a significant role, as such buildings make up the majority of energy consumption. The EU has pointed to the development of efficient building energy management systems as key to achieving the identified objectives because buildings account for 40% of energy consumption and 36% of total CO\(_2\) emissions within the EU [3, 72]. The majority of energy in those buildings is consumed by Heating, Ventilation, and Air Conditioning (HVAC) systems, which have a strong impact on household comfort as well as on the environment [29].

Increasing affordability as well as rising temperatures have meant that HVAC systems are increasingly being used to improve comfort and thus quality of everyday life. At the same time, such systems can quickly consume a considerable amount of energy. Particularly for systems with limited intelligent behavior, energy efficiency is not emphasized, and simple matters quickly add up to energy waste—such as a household forgetting to turn off an air conditioner before going to work or systems not adapting when the weather changes by, for instance, turning off when not needed.

In traditional (non-smart) buildings, users (residents) are responsible for monitoring and controlling available devices. However, contemporary smart buildings are increasingly equipped with Internet of Things (IoT) devices and objects such as sensors, actuators, connected air conditioners, and heaters. In such buildings, unlike traditional buildings, IoT devices collaborate to automatically adjust temperature and optimize the use of HVAC systems, for instance, by forecasting the indoor temperature and generating plans for tuning HVAC devices to optimize energy consumption.

Previous studies have shown that Machine Learning (ML) algorithms can be used to model most of the systems in smart buildings. In particular, ML can be used to model current HVAC systems [23] to improve energy efficiency and reduce consumption in such buildings. ML is a sub-field of Artificial Intelligence (AI) that combines a set of mathematical algorithms to give systems the ability to learn automatically and improve from experience without being explicitly programmed [4]. Nowadays, ML is widely used in many fields, including health care, public transportation, and smart city systems [2, 16, 45]. ML is divided into several categories based on the learning method, such as supervised, semi-supervised, unsupervised, and reinforcement learning. In this paper we use supervised learning, which is divided into two main branches, classification [7, 28, 62] and regression [1, 27], depending on the problem that needs to be solved. In our case we use regression to forecast the indoor temperature.

In this paper, we describe an experiment that compares 36 offline ML algorithms used for forecasting the indoor temperature for three consecutive hours in a smart building. A real dataset was collected from the CiTIUS research center and the closest weather station; the sensor measurements belong to different winter periods with different weather conditions, as reported in Table 1. All algorithms were evaluated based on their accuracy, performance, and robustness to weather changes. The main aim of this study is to find the most suitable ML algorithm in terms of performance and robustness that can be integrated into building management systems (BMS) to improve building energy efficiency, specifically to tune HVAC system parameters while taking user comfort levels and the reduction of energy consumption into account. We concluded that increasing the forecasting time does not decrease the accuracy of the best model. Moreover, we found that the difference between the obtained results for three consecutive forecasting hours is insignificant (around 0.01) for both R-coefficient and RMSE; this means that the increase of the horizon does not rapidly affect the accuracy of extraTrees.

The remainder of this paper is organized as follows. In Sect. 2, we review existing studies that forecast a building’s indoor temperature using different ML algorithms. In Sect. 3, we describe the dataset and the ML algorithms used in the experiments. Section 4 presents and discusses the results. Finally, Sect. 5 draws the conclusions and outlines future work.

2 Related research

Previous studies have determined that HVAC systems have the highest energy demand in a building. Therefore, managing HVAC systems in current buildings should be addressed to improve energy efficiency by improving energy plans. In particular, developing an ML model that considers the surrounding factors is necessary to configure the best HVAC system parameters. Those parameters have a relevant impact on both energy consumption and user comfort [23]. The Artificial Neural Network (ANN) is an ML model that is widely used for indoor temperature forecasting. Nivine et al. [8] proposed a new approach to forecast the indoor temperature up to 4 h ahead based on an ANN that considers the outdoor parameters. Further, Kwok et al. [44] modulated the cooling load in a smart building by incorporating a Neural Network (NN) into an intelligent model that allows forecasting and examining the energy demand of the building as well as determining the critical factors that impact energy consumption. The study revealed that building occupancy is a significant factor in forecasting the cooling load of the HVAC system. In [54], the authors studied the impact of both users’ activities and their behaviors on potential energy saving in smart buildings. The authors classified the user as the most important factor and divided the user impact on energy demands into three main subsystems: HVAC, light, and plug load systems.

Moreover, Varick et al. [25] used real-time data to study building occupancy and its influence on energy saving. They proposed an occupancy model that could be successfully integrated into the HVAC system in the building through Markov Chains. The study revealed that this model could annually save 42% of consumed energy. Zhao [62] argued that external factors also have a significant influence on a building’s energy performance through reviewing various energy forecast methods implemented into ML algorithms and studying the engineering and statistical techniques utilized to predict a building’s energy consumption [72].

In [48], a new model was developed based on Support Vector Regression (SVR) to predict the hourly cooling load inside office buildings. The model’s hyper-parameters were tuned to get the best temperature forecast. The study compared the developed model with the classical multi-layer perceptron neural network (MLP) and showed that the SVR outperforms the MLP in both accuracy and mean squared error (MSE). In addition, Dong [21] examined the feasibility of forecasting building energy consumption by applying SVR and determined the impact of different SVR parameters on the prediction accuracy. The study shows that SVR obtained the highest accuracy compared with other relevant research approaches using genetic programming and neural networks. Previous studies also addressed external weather conditions and their influence on indoor temperature through the autoregressive model with exogenous inputs (ARX) and the autoregressive moving average model with exogenous inputs (ARMAX). A suitable structure was selected for both models to obtain the best prediction accuracy. Because of their dynamic structure, these models can serve as a flexible controller, which makes it possible to increase the user’s comfort level inside the building and to improve the energy efficiency of HVAC systems [58]. The outcomes showed that the ARX model achieved the best forecasting accuracy.

Sülo et al. [67] developed a deep learning model to predict the energy consumption of each building located on the City University of New York (CUNY) campuses. Each of those buildings has different energy expenditures. The optimal conditions and the future energy usage of those buildings were investigated using long short-term memory (LSTM) neural network models in order to determine the loss of energy. The experiments were conducted using time series data collected from several CUNY campuses. Furthermore, Xu et al. [18] used an LSTM deep learning model to forecast the indoor temperature 5 and 30 min in advance. The LSTM model outperformed the three standard ML models it was compared to: Back Propagation Neural Network (BPNN), Support Vector Machine (SVM), and Decision Tree (DT). In [42], Jin et al. used deep learning to forecast the optimal indoor temperature with the aim of adjusting the air conditioner automatically without any user intervention.

Abdullatif et al. [10] proposed a cooling load forecasting model for buildings, utilizing the generalized regression neural network (GRNN) taking into consideration the building orientational characteristics and occupancy in order to optimize the thermal energy storage of the HVAC.

Catalina [14] developed polynomial regression models based on neural networks to predict the monthly heating demand for residential buildings, considering the residential constructional structure. Catalina used 270 different scenarios to validate the developed models to find the best approach. Several other recent investigations proposed models using different ML algorithms for forecasting a building’s energy consumption [24, 53, 61, 72]. In these studies, various external factors were considered, such as building structure, orientation, insulation, and environmental variables. The statistical results showed that these factors have a significant influence on indoor temperature prediction and energy consumption in a building. Kangji et al. [47] developed a GA-ANFIS model to predict the indoor building temperature. This approach obtained the optimal configuration of subtractive clusters, using a genetic algorithm (GA) to optimize the fuzzy if-then rule base. The adaptive network-based fuzzy inference system (ANFIS) adjusted the premise and consequent parameters to match the training data. The results showed that GA-ANFIS obtained higher prediction accuracy than neural networks.

Recently, Rodríguez-Mier et al. [59] used FRULER-GFS (fuzzy rule learning through evolution for regression-genetic fuzzy system) to develop a rule-based model for forecasting indoor temperature. The knowledge bases learned by FRULER include Takagi-Sugeno-Kang fuzzy rules that correctly predict the temperature dynamics measured by several different predictors obtained from both inside and outside the building. The experiment results demonstrated that FRULER-GFS had the best accuracy rate compared with ElasticNet and random forest regressors [59].

Further, Doukas et al. [22] developed an integrated decision support system based on rule sets. Their study aimed at improving the energy management system of a building. Their system allowed central control over energy consumption in the building, which made it exceptionally flexible. Furthermore, they created a reliable energy profile using expert knowledge in the system. The HVAC control optimization (On/Off) provided the system with the capability to recognize and discard any wrong decision. The study confirmed that expert experience has a notable impact on improving the building energy management system.

A review of previous studies on improving the energy efficiency of HVAC systems in smart buildings shows that none has compared a large set of ML algorithms for predicting the indoor temperature of buildings. This study provides a baseline for future studies on forecasting indoor temperatures in smart buildings using ML algorithms. All models developed have been trained using the same settings for different weather conditions to check the robustness and the performance of these algorithms.

3 Experiments

3.1 Experimental setup

As part of the European OPERE project [26], which aims at improving the energy management system of the Universidade de Santiago de Compostela (USC), the USC has deployed sensors in 45 university buildings. In this paper, we conducted experiments considering one of those smart buildings, called Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), using a medium-sized sensor network. The network collects and reports sensor readings as illustrated in Table 1. It produces 667 signals every 10 s.

The dataset we used in the experiments comprised both the sensor measurements linked to the CiTIUS HVAC system and weather data collected from the closest Meteogalicia weather station. The CiTIUS building has two operating modes: winter and summer. The dataset patterns were retrieved every 10 min during two different time periods: from October 1, 2015, to March 31, 2016 (26,321 patterns), and from November 1, 2016, to January 31, 2017 (13,083 patterns). Both periods correspond to the HVAC winter working mode, which has the highest energy demand. It must be noted that the second period corresponds to an unusually dry winter season in Galicia. Thus, the weather conditions in the two periods are sufficiently different.

Each dataset pattern comprises 10 features: seven are provided by CiTIUS and the remaining three by the Meteogalicia weather station. Each variable indicates a measurable phenomenon that can reduce the energy demand for heating and cooling the building; these features are described in Table 1.

Table 1 Pattern features, where (*) represents features from CiTIUS, and (+) symbolizes features from Meteogalicia
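As an illustration of how patterns sampled every 10 min can be paired with targets one, two, and three hours ahead, the following minimal Python sketch shifts the indoor temperature series by 6, 12, and 18 steps. This is only an assumption about how the forecasting horizons could be constructed; the function name `make_horizon_targets` and the constant `STEPS_PER_HOUR` are hypothetical, and the sketch assumes a gap-free series.

```python
# Hypothetical sketch: build 1 h, 2 h and 3 h ahead targets from a 10-minute series.
# Assumes the series has no missing time steps (an assumption, not stated in the paper).

STEPS_PER_HOUR = 6  # one pattern every 10 min


def make_horizon_targets(indoor_temp, horizons_hours=(1, 2, 3)):
    """Pair each time step t with the indoor temperature h hours later.

    Returns a list of (t, targets) tuples, where targets[k] is the indoor
    temperature horizons_hours[k] hours after step t. Steps too close to the
    end of the series, for which some target does not exist, are dropped.
    """
    max_shift = max(horizons_hours) * STEPS_PER_HOUR
    samples = []
    for t in range(len(indoor_temp) - max_shift):
        targets = [indoor_temp[t + h * STEPS_PER_HOUR] for h in horizons_hours]
        samples.append((t, targets))
    return samples
```

With a series of 20 readings, only the first two steps have all three targets available, which shows how the longest horizon shortens the usable dataset.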

3.2 Machine learning algorithms

In this paper, we compared 36 batch learning algorithms belonging to 20 different families (as listed in Table 2) [27]. All algorithms were selected based on the recommendation of the study conducted by Sirsat et al. [63]. The main purpose of the experiment was to identify which of those algorithms is the most accurate for forecasting the indoor temperature of the studied building. The majority of the algorithms were selected from the Classification and Regression Training (caret) package in the statistical computing language R.

The experiments for each algorithm were repeated 10 times using different seeds generated randomly. The data partitions were generated randomly in such a way that 70%, 15%, and 15% of the patterns were used for training, validating, and testing the models, respectively. For each algorithm, the hyper-parameters were tuned using the values reported in Table 3. The selected final values for the hyper-parameter are those that maximize the average performance over the validation sets.
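The partitioning and selection procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: the function names `split_indices` and `select_hyperparameter` are hypothetical, and the validation score is assumed to be a quantity to maximize (such as the R-coefficient).

```python
import random


def split_indices(n_patterns, seed, train_frac=0.70, val_frac=0.15):
    """Randomly split pattern indices into train/validation/test parts.

    The test part receives whatever remains after the train and validation
    parts, i.e. roughly 15% with the default fractions.
    """
    rng = random.Random(seed)  # a different seed gives a different partition
    indices = list(range(n_patterns))
    rng.shuffle(indices)
    n_train = round(train_frac * n_patterns)
    n_val = round(val_frac * n_patterns)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def select_hyperparameter(candidates, validation_scores):
    """Return the candidate with the highest mean validation score.

    validation_scores[c] holds one score per repetition for candidate c.
    """
    def mean(values):
        return sum(values) / len(values)

    return max(candidates, key=lambda c: mean(validation_scores[c]))
```

Calling `split_indices` once per seed, for 10 different seeds, would reproduce the repeated random partitioning; the candidate values themselves would come from Table 3.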

Furthermore, we implemented three more popular methods using other platforms: support vector regression (SVR), implemented in C++ using the LibSVM library, and the Generalized Regression Neural Network (GRNN) and Extreme Learning Machine (ELM) with Gaussian kernels, both implemented in MATLAB. Moreover, we trained the regressors using the hyper-parameter values reported in Table 3 and stated in the R package documentation.

Table 2 Regressors considered in this work, grouped by families
Table 3 List of the regressors, with their tunable hyper-parameters (tried values and packages)
Table 4 Friedman rank of the RMSE (left) and R-coefficient (right)

We then evaluated the tested algorithms’ performance using the Pearson correlation coefficient (R-coefficient), which takes values in \([-1, +1]\) and is shown in Eq. 1, and the Root Mean Squared Error (RMSE), shown in Eq. 2.

$$\begin{aligned} \rho ({\hat{Y}},Y)= \frac{1}{N-1} \sum _{i=1}^{N}{\left( \frac{{\hat{Y}}_i-\mu _{{\hat{Y}}}}{\sigma _{{\hat{Y}}}}\right) \left( \frac{Y_i-\mu _Y}{\sigma _Y}\right) } \end{aligned}$$
(1)

where \(\mu _{{\hat{Y}}}\) and \(\sigma _{{\hat{Y}}}\) are the mean and standard deviation of the predicted temperature \({\hat{Y}}\), while \(\mu _{Y}\) and \(\sigma _{Y}\) are the mean and the standard deviation of the real temperature Y, and N is the number of test patterns.

$$\begin{aligned} RMSE= \sqrt{\frac{1}{N} \sum _{i=1}^{N}({\hat{Y}}_i - Y_i)^2} \end{aligned}$$
(2)

The final performance metrics of each regressor were computed by averaging both the RMSE and the R-coefficient over the 10 repetitions.
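For concreteness, Eqs. 1 and 2 and the averaging over repetitions can be written as a short Python sketch. It assumes that the standard deviations in Eq. 1 are sample standard deviations (denominator \(N-1\)), which is consistent with the \(1/(N-1)\) factor but is not stated explicitly; the function names are hypothetical.

```python
import math


def pearson_r(y_pred, y_true):
    """R-coefficient of Eq. 1, using sample standard deviations (assumption)."""
    n = len(y_true)
    mu_p = sum(y_pred) / n
    mu_t = sum(y_true) / n
    sd_p = math.sqrt(sum((p - mu_p) ** 2 for p in y_pred) / (n - 1))
    sd_t = math.sqrt(sum((t - mu_t) ** 2 for t in y_true) / (n - 1))
    total = sum(((p - mu_p) / sd_p) * ((t - mu_t) / sd_t)
                for p, t in zip(y_pred, y_true))
    return total / (n - 1)


def rmse(y_pred, y_true):
    """Root Mean Squared Error of Eq. 2."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def average_over_repetitions(values):
    """Mean of one metric over the repetitions (10 in the experiments)."""
    return sum(values) / len(values)
```

A perfectly linear increasing relation between predicted and real temperatures gives an R-coefficient of 1, while identical predictions give an RMSE of 0.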

4 Results and discussion

Satisfying users by achieving and maintaining their comfort levels and optimizing energy consumption inside buildings should be core aspects when realizing smart buildings. This requires developing accurate and reliable HVAC systems that are automatically adaptable to different weather conditions. Towards achieving this goal, we compared 36 ML algorithms, over a real data set, to predict the indoor temperature in the CiTIUS office. The results can be utilized to generate energy plans that tune the HVAC system parameters and consequently both increase user satisfaction and optimize energy consumption. We plan to address those aspects in our future work.

In the performed experiments, we calculated the Friedman ranks [35] for both RMSE and R-coefficient for all regressors (see Table 4). The Friedman test is a non-parametric statistical test. Similar to the parametric repeated measures ANOVA, it compares three or more matched or paired groups. It ranks the values in each matched row in ascending order, where each row is ranked individually, and then sums the ranks in each column [34]. This test determined the position of each algorithm on average over all the horizons. For each horizon, the regressors are sorted from best to worst (i.e., by increasing RMSE or by decreasing R-coefficient), and the Friedman rank of each regressor is its average position over the horizons. Figure 1 illustrates the Friedman rank for both RMSE and R-coefficient in ascending order (i.e., by decreasing performance). The best results were achieved by two regressors that belong to the random forest family (ExtraTrees and RF) in both performance measurements. Generally, both panels are quite similar, with small changes in some regressor positions. Table 4 summarizes the Friedman ranks of both the RMSE and the R-coefficient for each regressor, and it shows that the positions change only slightly over the three horizons. The changes affect the algorithms between the 24th and 28th positions, that is, between the Bayesian regularized neural network (Brnn) and the generalized boosting model (Gbm).
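The ranking step can be illustrated with a minimal Python sketch. It computes only the average rank per regressor (the Friedman rank used here), not the Friedman test statistic, and it assigns tied values the average of their positions, which is a common convention assumed here. The function names and the example scores are hypothetical.

```python
def rank_row(values, higher_is_better=False):
    """Return 1-based ranks for one row, rank 1 being the best value.

    Tied values share the average of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def friedman_ranks(scores, higher_is_better=False):
    """Average rank of each regressor over the rows (horizons).

    scores[h][r] is the score of regressor r on horizon h. Use
    higher_is_better=False for RMSE and True for the R-coefficient.
    """
    per_row = [rank_row(row, higher_is_better) for row in scores]
    return [sum(column) / len(per_row) for column in zip(*per_row)]


# Hypothetical RMSE values for three regressors over three horizons.
example_rmse = [
    [0.05, 0.07, 0.06],
    [0.06, 0.08, 0.07],
    [0.07, 0.06, 0.08],
]
example_ranks = friedman_ranks(example_rmse)
```

In this made-up example the first regressor is ranked first on two horizons and second on one, so its Friedman rank is 4/3, the lowest (best) of the three.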

Figure 2 shows the average R-coefficient of the 20 best regressors over the three prediction horizons, in decreasing order. The highest R-coefficient is achieved by extremely randomized regression trees (ExtraTrees), with an R-coefficient of 0.97 and the lowest average RMSE (0.058), as reported in Table 4, followed by Rf, Cubist, BstTree, and AvNNet. The figure also shows that all the algorithms that appear in the top 10 list belong to the random forest family, and the accuracy obtained by Qrf is quite similar to that of the Bayesian model (Brnn) and Support Vector Regression (Svr). On the other hand, NNLS, Lasso, and BstLm are at the bottom of the top 20 list, still with good performance in terms of R-coefficient (around 0.94 over all horizons).

Fig. 1 Friedman rank of R-coefficient (upper panel) and RMSE (lower panel) for the 20 best regressors

Fig. 2 Average values of R-coefficient over the data sets of the 20 best regressors to forecast three consecutive hours

These results (plotted in Fig. 2) are quite similar to the Friedman rank of the R-coefficient shown in Fig. 1. BstTree and AvNNet swap places, coming in 4th and 5th position, respectively. Moreover, the Bag and Grnn algorithms swap positions, becoming 12th and 13th, respectively. Regarding the last three positions, NNLS has improved its position, whereas the Earth and Bagearth regressors drop out of the top 20 and are replaced by Lasso and BstLm in the 19th and 20th positions.

Table 5 The best R-coefficient and RMSE are achieved by extraTrees for the forecasting horizon

The outcomes of this comparative experiment are as follows: the extraTrees algorithm achieved the highest accuracy for the three prediction horizons in terms of Friedman rank, average RMSE, and R-coefficient (Table 5). ExtraTrees is less sensitive to noise and outlier values than ANN models, which means that extraTrees is more robust. Moreover, the difference between the obtained results for three consecutive forecasting hours is quite small (around 0.01) for both R-coefficient and RMSE; this means that increasing the horizon does not rapidly affect the extraTrees accuracy. Other regressors with good performance are random forest, cubist, gradient boosting of regression trees (bstTree), average neural network committee (avNNet), and kernel ELM (elm-kernel).

There is high agreement between average values and Friedman ranks in the results. This comparison might be useful for indoor temperature prediction in any smart building, as it facilitates building an ML forecast model to improve energy efficiency, reduce energy consumption, and manage a building’s assets.

Threats to validity A potential threat is that our results may not be valid for all HVAC systems. As we have not made any particular assumptions, and as the HVAC system does not have any unique features, we believe that our results can be generalized to most other HVAC systems. However, further research is needed to confirm this. Our study may have been internally biased by the settings of the experiments because the data was collected during winter periods in two different years with different weather conditions. Testing all algorithms using data collected during summer periods may produce different results; however, based on previous studies, ExtraTrees is expected to obtain the best results in those scenarios as well [63]. Moreover, the algorithm hyper-parameter values were tuned over the default settings shown in Table 3, and the results are quite good. However, a search for the optimal values of those parameters, which affect the learning process, may yield a slight improvement in the accuracy of the algorithms. The experiments were repeated 10 times with different random partitions, and the mean was calculated, to reduce the influence of any single partition or execution error on the results.

5 Conclusions and future work

In this paper, we compared a set of 36 ML algorithms that belong to 20 different families to forecast the indoor temperature for three consecutive hours using real data collected every 10 min from both a smart building and a weather station. This comparison showed that the ExtraTrees algorithm performs best in terms of both the R-coefficient (0.97) and RMSE (0.058); it also ranks the highest according to the Friedman test. Other algorithms that performed well are the random forest, averaged neural network (AvNNet), cubist, gradient boosted machines with regression trees, extreme learning machine with Gaussian kernels, and support vector machine for regression. The outcomes of this study show that extraTrees is more robust to outliers and data noise, while algorithms such as ANN are highly sensitive to data noise. Furthermore, increasing the forecasting time does not decrease the accuracy of the best model. We found that the difference between the obtained results for three consecutive forecasting hours is insignificant (around 0.01) for both R-coefficient and RMSE; this means that increasing the horizon does not rapidly affect the accuracy of extraTrees. Finally, it is possible to use a standard ML algorithm to forecast the indoor temperature with reasonable accuracy based on weather and sensor data linked to the smart building.

However, more research effort is needed in the future to optimize the HVAC parameters based on the prediction of the indoor temperature. Researchers need to consider the following: (1) integrating an incremental training and online learning approach to improve the accuracy and the robustness of the identified model; (2) collecting real-time user feedback during the deployment phase (interactive learning) on new data behavior, which will help in improving model efficiency; (3) extending the forecast horizon to longer time periods (days ahead) while considering user satisfaction (comfort level) and energy consumption; (4) integrating the winning model (ExtraTrees) with building management systems and predicting in real time; (5) validating the results in other buildings using other sensor data; and (6) addressing possible noise or missing data linked to sensor failure scenarios at run time.