Binary versus Multiclass Deep Learning Modelling in Energy Disaggregation

This paper compares two deep-learning architectures for use in energy disaggregation and Non-Intrusive Load Monitoring (NILM). NILM breaks down the aggregated energy consumption into the consumption of individual appliances, thus detecting device operation. Specifically, the "One versus All" approach, where one deep neural network is trained per appliance, is compared to the "Multi-Output" approach, where a single network has one output node per appliance. Evaluation is performed on a state-of-the-art baseline system using standard performance measures and a set of publicly available datasets from the REDD database.


Introduction
Due to global warming, average temperatures are rising, and several techniques for energy reduction have been proposed in order to reduce the total energy consumption. However, making use of those techniques requires accurate and fine-grained monitoring of electrical energy consumption [1], since the energy consumption of most households is monitored via monthly aggregated measurements, which cannot provide real-time feedback. Moreover, according to [2], the largest improvements in terms of energy savings can be made when monitoring energy consumption at the device level. The term Non-Intrusive Load Monitoring (NILM) describes the estimation of the power consumption of individual appliances based on a single measurement at the inlet of a household or building [3]. In contrast, the term Intrusive Load Monitoring (ILM) is used when multiple sensors are deployed, usually one per device. Compared to NILM, ILM has the drawback of higher cost for wiring and data acquisition, making it unsuitable for monitoring households where appliances can change. NILM, on the other hand, aims to find the inverse of the aggregation function through a disaggregation algorithm using only the aggregated power consumption as input, which makes it a highly underdetermined problem and thus impossible to solve analytically [4].
Several NILM methodologies based on deep neural networks have been proposed in the literature, e.g. Convolutional Neural Networks (CNNs) [5], Recurrent Neural Networks (RNNs) [6] and Long Short-Term Memory (LSTM) networks [7]. Additionally, combinations of machine learning algorithms for information fusion [8] and modelling of temporal dynamics [9] have been proposed, especially for low sampling frequencies [10]. These models operate either according to the "One versus All" approach, where one Deep Neural Network (DNN) is trained per appliance, or the "Multi-Output" approach, where the number of output nodes equals the number of appliances. As it is not clear which architecture leads to better performance, a comparison of the two architectures is needed.
The remainder of this paper is organized as follows: In Sect. 6.2 the two NILM systems based on DNNs are presented. In Sect. 6.3 the experimental setup is described and in Sect. 6.4 the evaluation results are presented. Finally, the paper is concluded in Sect. 6.5.

Proposed Architecture
NILM energy disaggregation can be formulated as the task of determining the power consumption at the device level from the measurements of a single sensor, within a time window (frame or epoch). Specifically, for a set of M − 1 known devices, each consuming power $p_m$ with $1 \leq m \leq M-1$, the aggregated power $p_{agg}$ measured by the sensor will be [11]:

$$p_{agg} = \sum_{m=1}^{M-1} p_m + g \qquad (6.1)$$

where $g = p_M$ is a 'ghost' power consumption, usually consumed by one or more unknown devices. In NILM the goal is to find estimates $\hat{p}_m, \hat{g}$ of the power consumption of each device m using an estimation method $f^{-1}$ with minimal estimation error [11], i.e.

$$\hat{P} = f^{-1}\big(g(p_{agg})\big) \qquad (6.2)$$

where $p_{agg}$ is the aggregated power consumption, $p_m$ the power consumption of the m-th device with $p_g = p_M$ being the 'ghost' power consumption, $\hat{P} = \{\hat{p}_m, \hat{p}_g\}$ the estimates of the per-device power consumptions, $f^{-1}$ an estimation method and $g(\cdot)$ a function transforming a time window of the aggregated power consumption into a multidimensional feature vector $F \in \mathbb{R}^N$. The block diagram of the NILM architecture adopted in the present evaluation is illustrated in Fig. 6.1 and consists of three stages, namely pre-processing, feature extraction and appliance detection.
In detail, the aggregated power consumption signal acquired from a smart meter is initially pre-processed, i.e. passed through a median filter [12], and then blocked into time frames. After pre-processing, feature vectors F of dimensionality N, one for each frame, are calculated. In the appliance detection stage, the feature vectors are processed by a regression algorithm using a set of pre-trained appliance models to estimate the power consumption of each device. The output of the regression algorithm estimates the corresponding device consumption, and a set of thresholds $T_m$ with $1 \leq m \leq M$, where $T_g = T_M$, one for each device including the ghost device (m = M), is used to decide whether a device is switched on or off. In the present evaluation the estimation method is implemented using two different deep-learning architectures, as shown in Fig. 6.2.
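The per-device threshold decision described above can be sketched as follows; this is a minimal illustration, where the function name and the 25 W threshold values are hypothetical and not taken from the paper:

```python
def on_off_states(p_est, thresholds):
    """Compare each estimated device power to its threshold T_m."""
    return [p > t for p, t in zip(p_est, thresholds)]

# Estimated consumptions for three devices and (hypothetical) 25 W thresholds
print(on_off_states([150.0, 5.0, 60.0], [25.0, 25.0, 25.0]))  # [True, False, True]
```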
As can be seen in Fig. 6.2, the two architectures differ only in their number of output nodes: architecture (a) uses a single output node and one DNN per device, whereas architecture (b) uses one output node per device and a single DNN for all devices.
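The structural difference between the two layouts can be sketched with a plain numpy forward pass through randomly initialized fully connected networks; the layer sizes follow the optimized configurations reported later (3 hidden layers of 32 sigmoid nodes for (a), 2 for (b)), while N = 10 inputs and M = 5 appliances are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layer_sizes, rng):
    """Forward pass through a fully connected network with random weights."""
    h = x
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        h = sigmoid(h @ W)
    return h

rng = np.random.default_rng(0)
N, M = 10, 5                     # feature dimension and appliance count (illustrative)
x = rng.standard_normal(N)       # one feature vector F

# (a) "One versus All": M separate DNNs, each with a single output node
one_vs_all = [forward(x, [N, 32, 32, 32, 1], rng) for _ in range(M)]
print([y.shape for y in one_vs_all])   # M outputs of shape (1,)

# (b) "Multi-Output": a single DNN with M output nodes
multi_output = forward(x, [N, 32, 32, M], rng)
print(multi_output.shape)              # (5,)
```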

Experimental Setup
The NILM architecture presented in Sect. 6.2 was evaluated using several publicly available datasets and a deep neural network for regression.

a. Datasets
To evaluate performance, five different datasets of the REDD [13] database were used. The REDD database was chosen as it contains power consumption measurements per device as well as the aggregated consumption. The REDD-5 dataset was excluded, as its measurement duration is significantly shorter than that of the other datasets in the REDD database [14]. The evaluated datasets and their characteristics are tabulated in Table 6.1, with the number of appliances denoted in the column #App. In the same column, the number in brackets is the number of appliances after excluding devices with power consumption below 25 W, which were added to the power of the ghost device, similarly to the experimental setup followed in [15]. The next three columns in Table 6.1 list the sampling period T_s, the duration T of the aggregated signal used and the appliance type for each evaluated dataset. The appliance type categorization is based on their operation, as described in [11].

b. Pre-processing and Parameterization
During pre-processing, the aggregated signal was processed by a median filter of 5 samples, as proposed in [12], and then blocked into frames of 10 samples with an overlap of 50% (i.e. 5 samples) between successive frames. Raw samples were used at the input stage of the DNN, i.e. F_1,...,F_N are the raw samples of each frame. Furthermore, the number of hidden layers for each architecture was optimized using a bootstrap training dataset, resulting in an architecture with 3 hidden layers of 32 sigmoid nodes for (a) and 2 hidden layers of 32 sigmoid nodes for (b).
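The pre-processing steps above (5-sample median filter, 10-sample frames with 50% overlap) can be sketched as follows; this is a numpy-only sketch with a synthetic 25-sample signal, since the paper does not publish its implementation:

```python
import numpy as np

def median_filter(x, kernel=5):
    """Median filter via a sliding window; edges are padded by repetition."""
    pad = kernel // 2
    xp = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(xp, kernel)
    return np.median(windows, axis=1)

def frame_block(x, frame_len=10, overlap=5):
    """Split a signal into frames of frame_len samples with the given overlap."""
    step = frame_len - overlap
    n_frames = 1 + (len(x) - frame_len) // step
    return np.stack([x[i * step : i * step + frame_len] for i in range(n_frames)])

# Synthetic aggregated signal: idle, a 120 W device, then a 620 W overlap period
agg = np.concatenate([np.zeros(5), np.full(10, 120.0), np.full(7, 620.0), np.zeros(3)])
filtered = median_filter(agg, kernel=5)
frames = frame_block(filtered, frame_len=10, overlap=5)
print(frames.shape)   # (4, 10): one row of 10 raw samples per frame
```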

Experimental Results
The NILM architecture presented in Sect. 6.2 was evaluated according to the experimental setup described in Sect. 6.3. The performance was evaluated in terms of appliance power estimation accuracy ($E_{ACC}$), as proposed in [13] and defined in Eq. 6.3:

$$E_{ACC} = 1 - \frac{\sum_{t=1}^{T} \sum_{m=1}^{M} \left| \hat{p}_m^{(t)} - p_m^{(t)} \right|}{2 \sum_{t=1}^{T} \sum_{m=1}^{M} p_m^{(t)}} \qquad (6.3)$$

where $\hat{p}_m^{(t)}$ is the estimated power of device m in frame t, T is the number of frames and M is the number of disaggregated devices.
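The estimation accuracy metric E_ACC of [13] can be computed as in the following sketch; the two-device, two-frame example values are hypothetical:

```python
import numpy as np

def estimation_accuracy(p_true, p_est):
    """Appliance power estimation accuracy E_ACC as proposed in [13].

    p_true, p_est: arrays of shape (T, M) -- frames x appliances, in watts.
    """
    num = np.abs(p_est - p_true).sum()
    den = 2.0 * p_true.sum()
    return 1.0 - num / den

# Hypothetical ground-truth and estimated consumptions for 2 frames x 2 devices
p_true = np.array([[100.0, 0.0], [100.0, 50.0]])
p_est  = np.array([[ 90.0, 0.0], [110.0, 40.0]])
print(estimation_accuracy(p_true, p_est))  # 1 - 30/500 = 0.94
```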
To compare the two architectures the publicly available REDD database is used [13]. The results are tabulated in Table 6.2.
As can be seen in Table 6.2, the performance on datasets with a smaller number of appliances (e.g. REDD-2/6) is significantly higher than on datasets with a larger number of appliances (e.g. REDD-1/3/4). Furthermore, the "One versus All" approach slightly outperforms the "Multi-Output" approach, performing 0.36% better on average. However, it has to be mentioned that the "One versus All" approach requires training M deep neural networks, resulting in significantly higher training times.

Conclusion
In this paper two different deep-learning architectures for non-intrusive load monitoring were compared. Specifically, the "One versus All" approach, using one deep neural network per appliance, was compared to the "Multi-Output" approach, using one deep neural network with the same number of output nodes as appliances. It was shown that both architectures achieve similar performance, with average accuracies of 76.6% for the "One versus All" approach and 76.3% for the "Multi-Output" approach, respectively. However, in terms of training time it must be considered that the "One versus All" approach requires training M deep neural networks, resulting in significantly higher training times.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.