1 Introduction

Smart Grids introduce state-of-the-art information and communication technologies in energy grids to facilitate communication between grid participants, e.g., to enable widespread integration of distributed renewable energy sources, and to collect detailed data on the current grid status. To gain insight into the status of the distribution grid, smart meters are deployed, including in private households. This has led to privacy concerns, as the metered consumption data can potentially be used to infer absence and presence, appliance use, or even the lifestyle of the household members [15]. What data items can be inferred and to which accuracy depends not only on the temporal resolution of the available meter data [9], but also on the level of aggregation of meter data over various households (also termed “spatial aggregation”).

The aggregated data will be more or less useful for other stakeholders in the energy grid, such as the distribution system operator (DSO), depending on the extent of spatial aggregation and the intended use case. For use cases such as usage prognosis, network planning and settlement, the total, aggregated consumption of a set of N smart meters can be useful for moderate values of N [15].

Many methods have been developed to privately compute such an aggregate value, based on cryptographic methods [35, 10, 11, 13, 14], masking [1, 17] or both [12, 16, 19, 20]. While the aggregate value contains less (private) information than the individual readings, it can still contain private information; there is no guarantee that the resulting aggregate value ensures privacy. This holds even more for a daily profile of aggregate values. Given a daily profile, spatially aggregated over N smart meters, the goal of this paper is to make this aggregated time profile privacy preserving in practice.

A privacy model for smart metering aggregation has already been developed in [2]. The privacy model is defined by a cryptographic game. The adversary can choose two smart metering scenarios that should be indistinguishable. The challenger perturbs the data of an arbitrary one of these two scenarios and gives the information about both the spatial aggregate at each time point and the temporal aggregate of each smart meter’s profile to the adversary. Privacy is high if the ability of the adversary to correctly determine which of the two scenarios was given back to him is only slightly better than random guessing. However, for reasonable precision of the aggregate the advantage of the adversary over random guessing is high. Thus, the approach is either not accurate enough or not private enough.

Differential privacy offers another possible solution by perturbing aggregate values in such a way that the aggregate value can be proven to be differentially private. Differential privacy is the current state-of-the-art definition of privacy [7, 8], with appealing properties such as closure under post-processing or the estimation of the privacy loss of a composition of queries (here, a single aggregate is considered as a single query). Mechanisms exist that turn a query result into an \(\epsilon \)-differentially private one, usually by perturbation of the query result.

Differential privacy is designed for huge databases, where the effect of the differential privacy mechanism is small enough that the result can still be utilized. A similar trade-off between accuracy and privacy as for [2] exists. The leakage \(\epsilon \) is a parameter that indicates the strength of the privacy guarantee. For the Laplacian mechanism, the scale of the added noise is inversely proportional to the privacy parameter \(\epsilon \). Therefore, while a small \(\epsilon \) is desirable for privacy, it is undesirable for accuracy. In the case of smart metering, the question is for which size of the aggregation group the differential privacy mechanism does not destroy the utility of the aggregate signal.

Criticism exists [6], arguing that the choice of parameters like the privacy parameter \(\epsilon \) is not clear in all cases. Differential privacy is typically applied to static data, i.e. a single query is evaluated. In this paper, daily load profiles are considered. An important theorem states that for a composition of T independent queries the privacy parameters add up; i.e., setting the privacy parameter of a single query to \(\epsilon =0.5\), the privacy parameter of \(T=288\) independent single queries (corresponding to a day profile with a time interval of 5 min) is as large as \(\epsilon =144\). However, measurements of time series are not independent. The question is how much the dependency of smart meter time-series can decrease the composition effect.

The second parameter that must be set for differential privacy is the sensitivity, which is, roughly speaking, a global bound on the effect of a single entry. Differential privacy is typically applied to counting data, where the sensitivity is known to be 1. While differential privacy methods have been applied to time-series of counting data [19], smart metering data are not counting data, and a global maximum value is not known. This work examines real smart metering data. Real data have the potential of containing wrong measurements with extremely large values; these large values could destroy the utility of the differentially private aggregate load profile.

In this paper, the effect of differential privacy on the spatially aggregated smart metering daily profile is studied. It is assumed that an aggregation protocol compatible with differential privacy exists. This assumption is reasonable, since differential privacy has already been successfully combined with privacy preserving protocols [1, 5, 19, 20].

The contribution is the first application of differential privacy to a real smart meter dataset. It explores (1) different formulations for time-series, (2) arising problems, e.g. in choosing the sensitivity, and (3) a better trade-off between utility and privacy of the result.

2 Preliminaries

Differential privacy is a rather formal and general topic. In this section, the necessary definitions are given, for the sake of clarity already specialized to the case of real-valued time-series.

2.1 Problem statement

We assume a smart metering system consisting of N smart meters (with index i identifying a single smart meter) which send their measured consumption values \(x_{i,t}\in \mathbb {R}\) at regular time intervals \(\Delta \cdot t\), where \(t=1,\ldots ,T\) and \(\Delta \in \mathbb {R}\), to the aggregator. The whole dataset of measured values is therefore

$$\begin{aligned} \mathcal {D}=(x_{i,t})_{i=1,\ldots ,N; \ t=1,\ldots ,T} \in \mathbb {R}^N \times \mathbb {R}^T. \end{aligned}$$

The aggregator is interested in the aggregate time profile

$$\begin{aligned} f=f(\mathcal {D}) =(f_1,\ldots ,f_T)\in \mathbb {R}^T, \end{aligned}$$

where at each point in time the spatial aggregate is computed by

$$\begin{aligned} f_t = \sum _{i=1}^{N}x_{i,t}. \end{aligned}$$

This paper deals with the problem how the aggregate time profile \(f=(f_t)_{t=1,\ldots ,T}\) can be turned into an \(\epsilon \)-differentially private time profile

$$\begin{aligned} Y=Y(\mathcal {D})=(Y_t)_{t=1,\ldots ,T}. \end{aligned}$$

Note that the consumption values \(x_{i,t}\) are real values, and a time profile is therefore modeled as a vector in \(\mathbb {R}^T\). In contrast, differentially private algorithms have been studied for single counting values \(x_i\in \{0,1\}\) or counting vectors \(x_i\in \{0,1\}^T\). To the best of our knowledge, no study of differential privacy on a real-world dataset with \(x_i \in \mathbb {R}^T\) exists.

Since, as will be shown next, Y is a perturbed version of f, the usability of Y will be smaller than that of f. In this paper, the decrease in usability is assessed for real-world smart meter daily profiles both by visualization and by the relative error \(\mathrm {err}_t\) in percent, where the common denominator is chosen as the “amplitude” of the exact aggregate profile

$$\begin{aligned} \mathrm {err}_t := \frac{100 \cdot \left| Y_t-f_t\right| }{\max \nolimits _{t'}f_{t'}-\min \nolimits _{t'}f_{t'}}. \end{aligned}$$
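As a minimal sketch (Python with numpy; the function name `relative_error` is illustrative, not from the paper), this error measure can be computed as:

```python
import numpy as np

def relative_error(Y, f):
    """Relative error per time point, in percent of the amplitude of f."""
    amplitude = f.max() - f.min()  # max_t' f_t' - min_t' f_t'
    return 100.0 * np.abs(Y - f) / amplitude

# Toy example: exact aggregate f and a perturbed version Y
f = np.array([10.0, 20.0, 30.0])
Y = np.array([11.0, 18.0, 30.0])
print(relative_error(Y, f))  # -> [ 5. 10.  0.]
```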

2.2 Laplace mechanism for differential privacy

The usual way of presenting differential privacy starts with the definition of differential privacy, which is cumbersome at first glance. For the sake of understandability, we start with the description of the Laplace mechanism, which is the method used in this paper to achieve differential privacy.

The Laplace mechanism is one of the main mechanisms to achieve differential privacy. It works by perturbing the output through adding noise from a Laplace distribution. The Laplace distribution is a probability distribution with the following density function

$$\begin{aligned} \mathrm {Lap}_\lambda (x) = \frac{1}{2\lambda }\exp \left( -\frac{\vert x \vert }{\lambda }\right) . \end{aligned}$$

The parameter \(\lambda \) describes the amount of noise that is added: the higher \(\lambda \), the bigger the noise. It is chosen as a function of the desired privacy parameter \(\epsilon \) and the sensitivity S(f) of the aggregation function f

$$\begin{aligned} \lambda = \frac{S(f)}{\epsilon }. \end{aligned}$$

The sensitivity S(f) and the privacy parameter \(\epsilon \) will be described in Sect. 2.3 below.
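For illustration, the centralized form of this mechanism can be sketched as follows (a sketch only, not the distributed protocol used later in the paper; the function name and the sample values are illustrative):

```python
import numpy as np

def laplace_mechanism(f_value, sensitivity, epsilon, rng):
    """Perturb a query result with Laplace noise of scale lambda = S(f)/epsilon."""
    lam = sensitivity / epsilon
    return f_value + rng.laplace(loc=0.0, scale=lam, size=np.shape(f_value))

rng = np.random.default_rng(0)
f = np.array([100.0, 120.0, 110.0])  # exact aggregate at three time points
Y = laplace_mechanism(f, sensitivity=5.0, epsilon=1.0, rng=rng)
```

A smaller \(\epsilon \) enlarges the scale \(\lambda \) and hence the perturbation, which is the accuracy/privacy trade-off discussed above.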

Before that, it is shown how the Laplacian noise is added in a smart metering system. In the typical differential privacy setting the data are owned by a single actor. This is not the case for the smart metering setup, where a single smart meter only owns its own data, and it is not desired to reveal the data to either the aggregator or another smart meter. So two problems arise: (1) who privately adds the data, and (2) who adds the Laplacian noise. Several methods for private spatial aggregation have already been combined with differential privacy methods [1, 5, 19, 20]. Therefore, the existence of a method that privately adds up the data can safely be assumed without specifying it further.

The second problem (2) can be solved by not adding the Laplacian noise as a whole. Instead, each smart meter individually adds noise from a gamma distribution. This can be done in such a way that the addition of these individual noise values corresponds to the addition of a single noise value from a Laplace distribution. The mathematical reason is a theorem stating that the Laplace distribution can be divided into several individual distributions.

Theorem 1

(Divisibility of the Laplace distribution) For all \(N\ge 1\)

$$\begin{aligned} \mathrm {Lap}_\lambda = \sum _{i=1}^{N}\left( \mathrm {G}^1_{1/N,\lambda }-\mathrm {G}^2_{1/N,\lambda } \right) =: \sum _{i=1}^{N} G_\lambda \end{aligned}$$

where \(\mathrm {G}^1\) and \(\mathrm {G}^2\) are two i.i.d. gamma-distributed random variables with identical shape parameter 1/N and scale parameter \(\lambda \).

Exploiting the divisibility property, each smart meter adds gamma noise to its measurement \(x_{i,t}\), independently of the others and independently of the other time points, i.e.

$$\begin{aligned} Y_{i,t} = x_{i,t}+\left( \mathrm {G}^1_{1/N,\lambda }-\mathrm {G}^2_{1/N,\lambda } \right) = x_{i,t}+G_\lambda . \end{aligned}$$

These noisy values \(Y_{i,t}\) are summed up in a private manner. Due to (8), these noisy values add up to

$$\begin{aligned} Y_t := \sum _{i=1}^{N}Y_{i,t} = \sum _{i=1}^{N}x_{i,t}+\mathrm {Lap}_\lambda = f_t+\mathrm {Lap}_\lambda . \end{aligned}$$

The computation of the random variable \(Y_t\) is sketched in Fig. 1. Note again that the summation is assumed to be done in a private manner.

Fig. 1

Distributed computation of the differentially private aggregate consumption profile Y
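The divisibility of Theorem 1 and the scheme of Fig. 1 can be checked empirically with a small simulation (a sketch assuming numpy; variable names are illustrative). By the theorem, the N gamma-difference contributions should sum to noise with the Laplace mean 0 and variance \(2\lambda ^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, trials = 50, 2.0, 200_000

# Each of the N meters adds G1 - G2 with G ~ Gamma(shape=1/N, scale=lam);
# by Theorem 1 the N contributions sum to a single Laplace(lam) noise value.
g1 = rng.gamma(shape=1.0 / N, scale=lam, size=(trials, N))
g2 = rng.gamma(shape=1.0 / N, scale=lam, size=(trials, N))
total_noise = (g1 - g2).sum(axis=1)

# Laplace(lam) has mean 0 and variance 2*lam^2 (= 8 here, up to sampling error)
print(total_noise.mean(), total_noise.var())
```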

2.3 Differential privacy

Like the privacy measure of [2], differential privacy is defined via an indistinguishability property. The result of the query should be perturbed in such a way that, by examining the result, one cannot distinguish whether a single person's entry is contained or not. This is stated more formally in the following definition.

Definition 1

(\(\epsilon \)-differential privacy) Two datasets \(\mathcal {D}\) and \(\tilde{\mathcal {D}}\) are neighboring, if they differ just in the entries of a single person/household, i.e. in one row. A query Y is \(\epsilon \)-differentially private, if for all possible outcomes y and all neighboring datasets \(\mathcal {D}\) and \(\tilde{\mathcal {D}}\)

$$\begin{aligned} \Pr [Y(\mathcal {D}) = y] \le e^\epsilon \Pr [Y(\tilde{\mathcal {D}}) = y] \end{aligned}$$

where \(Y(\mathcal {D})\) denotes the query applied to dataset \(\mathcal {D}\). The privacy parameter \(\epsilon \) is also called leakage.

In the situation of this paper, a dataset consists of n load profiles, i.e. \(\mathcal {D},\tilde{\mathcal {D}}\in \mathbb {R}^n \times \mathbb {R}^T\). From the definition and the name it is clear that a small leakage \(\epsilon \) near zero is desirable. The privacy parameter \(\epsilon \) is viewed as a parameter that needs to be specified in advance. From the definition, a choice of \(\epsilon \le 1\) seems reasonable. For the sake of simplicity, in the experiments the default selection is \(\epsilon =1\).

Now it is shown how the sensitivity must be chosen in order to make the Laplace mechanism differentially private.

Vector Sensitivity Here, the values of a time-series are regarded as a vector, which is the most straightforward and, in this paper, the standard way to define the sensitivity.

Definition 2

(Sensitivity) The \(L_1\)-sensitivity of a function \(f: \mathbb {R}^n \times \mathbb {R}^T \rightarrow \mathbb {R}^T\) is the maximum, over all neighboring datasets \(\mathcal {D}\) and \(\tilde{\mathcal {D}}\) in \(\mathbb {R}^n \times \mathbb {R}^T\), of the change of f:

$$\begin{aligned} S_1(f) := \max \limits _{\mathcal {D},\tilde{\mathcal {D}}} \Vert {f(\mathcal {D})-f(\tilde{\mathcal {D}})}\Vert _1. \end{aligned}$$

Since here the datasets can differ in an arbitrary smart meter and the function f is just a sum over the smart meter values per time point t, this is the same as

$$\begin{aligned} S_1(f) = \max \limits _{i} \sum \limits _{t=1}^T\vert x_{i,t}\vert . \end{aligned}$$
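A direct computation of this sensitivity from a data matrix with rows indexing meters and columns indexing time points could look as follows (a sketch; the function name is illustrative):

```python
import numpy as np

def l1_sensitivity(X):
    """L1-sensitivity of the aggregate profile: largest L1 norm of any single profile."""
    return np.abs(X).sum(axis=1).max()  # max over meters i of sum_t |x_{i,t}|

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.5, 0.5]])
print(l1_sensitivity(X))  # -> 6.0 (first profile: 1 + 2 + 3)
```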

Theorem 2

(Differential privacy of the Laplacian mechanism) With the choice (12) for the sensitivity, the Laplacian mechanism (10) with \(\lambda \) chosen by (7) is \(\epsilon \)-differentially private.

Pointwise Sensitivity One could also consider each time point individually, independently of the others, and perturb each of the T time points by T separate applications of the Laplace mechanism. This method will later be denoted as single, in contrast to the vector version above. The sensitivity then becomes the global maximum of the values

$$\begin{aligned} S_{\mathrm {pointwise}}(f) := \max \limits _{t,\mathcal {D},\tilde{\mathcal {D}}} \vert {f_t(\mathcal {D})-f_t(\tilde{\mathcal {D}})}\vert = \max \limits _{t,i}\vert x_{i,t}\vert . \end{aligned}$$

Due to the composition theorem, the leakages for each of the T single perturbed queries add up. Therefore, for a query at a single time point t the leakage is chosen as \(\epsilon _t=\epsilon /T\) such that \(\sum _t \epsilon _t=\epsilon \). Differential privacy then follows from the basic theorem and (14).

Theorem 3

(Differential privacy of the Laplacian mechanism) With the choice (14) for the sensitivity, the Laplacian mechanism (10) with \(\lambda \) chosen by

$$\begin{aligned} \lambda = \frac{T \cdot S_{\mathrm {pointwise}}(f)}{\epsilon } \end{aligned}$$

is \(\epsilon \)-differentially private.
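For comparison, a sketch of the pointwise variant, including the factor T in the noise scale (names are illustrative; the data matrix convention is as above):

```python
import numpy as np

def pointwise_scale(X, epsilon):
    """Noise scale for T independent per-time-point queries (leakage epsilon/T each)."""
    T = X.shape[1]
    s_pointwise = np.abs(X).max()     # max over t and i of |x_{i,t}|
    return T * s_pointwise / epsilon  # lambda = T * S_pointwise / epsilon

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.5, 0.5]])
print(pointwise_scale(X, epsilon=1.0))  # -> 12.0 (T=3 times the maximum 4.0)
```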

This second view is expected to have a worse performance due to the independence assumption leading to the summation of the leakages. The effect of this assumption is evaluated in this paper. If the difference is small, one could think about using unequal privacy parameters \(\epsilon _t\) at different time points, for example \(\epsilon _t\) could be chosen smaller during night times where fewer activities take place in a household.

Note that in the literature the typical case where differential privacy has been applied is counting data with \(x_{i,t}\in \{0,1\}\). There, the sensitivity \(S_{\mathrm {pointwise}}(f)\) is clearly 1. Here, however, the estimation of the maximum can be critically influenced by a single outlier. The effect of outliers will be studied in Sect. 3.2.1.

Post-processing A differentially private result of a query remains differentially private after post-processing [8]. This property is very important, since a data analyst can compute any function of the output of a differentially private algorithm without diminishing its privacy properties. This property will be exploited in Sect. 3.3 by smoothing the differentially private aggregate load profile in order to increase its utility. The following theorem is stated less generally than in [8]: for aggregation of load profiles and for an arbitrary deterministic mapping instead of an arbitrary randomized mapping. It ensures that a smoothed differentially private signal is still differentially private.

Theorem 4

(Post-processing) Let \(Y:\mathbb {R}^n \times \mathbb {R}^T \rightarrow \mathbb {R}^T\) be an \(\epsilon \)-differentially private query and \(g:\mathbb {R}^T \rightarrow \mathbb {R}^T\) be an arbitrary deterministic mapping. Then

$$\begin{aligned} g \circ Y: \mathbb {R}^n \times \mathbb {R}^T \rightarrow \mathbb {R}^T \ : \mathcal {D}\mapsto g(Y(\mathcal {D})) \end{aligned}$$

is also \(\epsilon \)-differentially private.

3 Experiments

The main goal of this paper is to assess the utility of differential privacy for smart metering load profiles. That means that the utility of Y for approximating f is studied for real-world smart metering data and assessed by visualization and by the relative error (5). How the noisy aggregate time profile Y is computed (for example, who determines the sensitivity) is ignored; the starting point for the experiments is Eq. (10).

Note that the input data are a time series of real-valued measurements. While the utility of differential privacy has been studied for counting data \(x_{it}\in \{0,1\}\) before, to the best of our knowledge this has not been done for real datasets with \(x_{it}\in \mathbb {R}\), especially not for smart metering load profiles.

The overall procedure can be described as follows and is also illustrated in Fig. 2.

  • Calculate the exact aggregate f from (3).

  • Choose \(\epsilon \) (here \(\epsilon =1\)).

  • Determine the sensitivity S using (13) or (14).

    • Determine either the exact or a robust maximum.

  • Calculate \(\lambda \) from (7) and (15), respectively.

  • Each smart meter adds noise using (9).

  • Calculate the aggregate signal Y using (10).

  • Smooth the aggregate signal.

  • Compare Y with the exact aggregate f.
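The steps above can be condensed into a single sketch (Python; `dp_aggregate` and all parameter defaults are illustrative and anticipate the robust 95th-percentile maximum of Sect. 3.2.1 and the smoothing span of Sect. 3.3):

```python
import numpy as np

def dp_aggregate(X, epsilon=1.0, robust_q=95, span=20, rng=None):
    """Sketch of the full procedure: robust sensitivity, distributed gamma
    noise, (simulated) private summation, and running-average smoothing."""
    if rng is None:
        rng = np.random.default_rng()
    N, T = X.shape
    f = X.sum(axis=0)                                   # exact aggregate
    S = np.percentile(np.abs(X).sum(axis=1), robust_q)  # robust vector sensitivity
    lam = S / epsilon                                   # Laplace scale
    # per-meter gamma noise; sums to Laplace(lam) per time point (Theorem 1)
    noise = (rng.gamma(1.0 / N, lam, size=(N, T))
             - rng.gamma(1.0 / N, lam, size=(N, T)))
    Y = (X + noise).sum(axis=0)                         # noisy private aggregate
    pad = span // 2
    y_padded = np.pad(Y, pad, mode='edge')              # augment borders
    Y_smooth = np.convolve(y_padded, np.ones(span) / span, mode='same')[pad:pad + T]
    err = 100 * np.abs(Y_smooth - f) / (f.max() - f.min())
    return Y_smooth, err
```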

Fig. 2

Scheme of overall procedure: first, the aggregate signal is calculated (left panel). Then Laplacian noise is added for differential privacy (middle panel). To increase the utility the differentially private signal is smoothed (right panel)

The whole analysis was performed 20 times. Since the results are very similar for different trials, for the sake of clarity a single result is presented. Only for the comparison of the robust maximum with the exact maximum is the average error presented, averaged over both time and the 20 trials (the spread over the 20 trials is negligibly small there).

3.1 Smart metering datasets

In this work, a real smart metering dataset with data from the Modellregion KöstendorfFootnote 1 is studied. Measurements of 40 households for a period of one year with 5 min time intervals are available. Since the number of 40 households is much too small to demonstrate reasonable utility, different daily profiles of the same household are treated as if they stemmed from different households. Ignoring the dependency on the household therefore results in a total of \(N=14{,}052\) daily profiles which are to be aggregated. This approach is reasonable, since the focus of this paper lies on the study of the effect of the Laplacian noise and not, e.g., on a privacy attack.

3.2 Differential privacy results

Differential privacy works by adding Laplacian noise to the target signal. The amount of noise depends on two parameters. The first parameter is the privacy budget \(\epsilon \). To the best of our knowledge, no recommendation for how to set \(\epsilon \) exists. Differential privacy is a theoretically appealing definition which is, on the other hand, hard to grasp intuitively. In particular, it is not clear how \(\epsilon \) affects e.g. the identifiability of an individual in a database. In this paper, for the development of the method \(\epsilon \) is set to 1. Afterwards, \(\epsilon \) is considered a free parameter that is varied; its influence on the utility is studied in Sect. 3.5.

3.2.1 Determination of the sensitivity

The second parameter influencing the noise is the sensitivity S(f) of the function that should be evaluated. In the usual case of counting data, the sensitivity is known to be \(S(f)=1\). However, here the data are real numbers and the sensitivity must be determined. In the normal differential privacy setting, the data curator has full control over the data and can therefore calculate S(f). In a private smart grid setting, each smart meter only owns its load profile, so there is no single entity that owns all data. A practical way would need to be found to privately determine S(f), e.g. based on (expensive) secure comparison protocols. Even if the sensitivity were determined privately, Fig. 3 shows that a single wrong value could completely destroy the utility of the query, and such a bad case would not be easily detectable. In practice, it seems reasonable that a good estimate of the upper bound is already known.

Fig. 3

Influence of robust sensitivity estimation

Fig. 4

Resulting differentially private load profiles after the addition of Laplacian noise

Therefore, in this work this topic is left open, and the data present are used to determine the sensitivity S(f). Even then, the best way to determine S(f) is not evident. Using Eq. (13) directly with an exact maximum, a huge value was obtained for S(f). Inspecting the data more closely, it was found that one household showed extremely high and therefore implausible values for certain periods of time. In order not to destroy the whole analysis by possible errors, S(f) was computed in a robust way as the 95th percentile of the 1-norms of all daily load profiles. As can be seen in the left panel of Fig. 3, the effect on the relative error is extremely large: the robust version (Robust) decreases the relative error by an order of magnitude compared to the exact but unstable version (Max). Therefore, the robust version was chosen for further examinations.
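The difference between the exact and the robust estimate can be illustrated on synthetic data (a sketch; the uniform profiles and the inflation factor are purely illustrative, not the Köstendorf data):

```python
import numpy as np

rng = np.random.default_rng(2)
profiles = rng.uniform(0.0, 1.0, size=(14_052, 288))  # illustrative daily load profiles
profiles[0] *= 1000.0                                 # one implausibly large profile

norms = np.abs(profiles).sum(axis=1)   # 1-norms of all daily load profiles
S_exact = norms.max()                  # exact maximum, dominated by the outlier
S_robust = np.percentile(norms, 95)    # robust 95th-percentile estimate
print(S_exact / S_robust)              # the exact maximum is inflated by orders of magnitude
```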

3.2.2 Laplacian noise scenarios

A time-series \(x= (x_{1},\ldots ,x_{T})\) can be seen as (i) a set of single, independent values or (ii) a vector in a high-dimensional space (with a 5 min time interval, a daily curve consists of 288 values). Because consecutive values are clearly not independent, the vector version (called LapVec) is expected to yield better results than the method considering the values at different times as independent (called LapSingle).

Both methods are investigated experimentally: on the one hand, to determine the possible gain of the vector formulation; on the other hand, because the method assuming independence leads to a simpler interpretation, since the privacy budgets simply add up due to the composition property of differential privacy.

Figure 4 shows the original curve, the sum of all 14,052 load profile curves (solid red), which is called the target profile. The dash-dotted black line is the target profile with single-point Laplacian noise added, and the dashed blue line has vector Laplacian noise added. It can be seen that the vector Laplacian noise stays nearer to the target curve.

The small superiority of the vector version over the single version can be evaluated by looking at the cumulative distribution of the relative error values for all 288 time points of the curves (Fig. 5). This figure also shows that while approximately half of the values have a relative error of 5% or less, the highest relative error is of the order of 45% (small circles). For this reason, both methods offer rather limited utility in approximating the target curve at all points in time for the given sample size of 14,052.

Fig. 5

Influence of type of Laplacian noise

3.3 Postprocessing: smoothing

The approximation of the aggregate with both Laplacian methods does not seem satisfactory (Figs. 4, 5). The differentially private curves deviate significantly from the exact aggregate. Looking at the curves, it is obvious that smoothing could improve the utility. However, one could suspect that a smoothing operation destroys the differential privacy property. This is not the case: differential privacy is preserved due to the post-processing Theorem 4, which states that differential privacy is not diminished by a mapping applied to the output. Therefore, we smoothed the curve for better utility.

In order to avoid border effects, the daily signals were augmented with values half the filter length at both sides. For filtering, several smoothing methods from Matlab (running average, loess, lowess and its robust versions, Savitzky–Golay) were tried. The running average was chosen for further analysis: although it is the simplest method, it offered equal performance. For each method, the filter length was chosen that leads to the minimum average relative error; the optimal span for the running average was about 20. Note that in practice the relative error cannot be calculated, since the exact aggregate profile f is not known. Therefore, the filter length cannot be chosen this way in practice.
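The padding-plus-running-average step can be sketched as follows (a minimal Python version of the chosen filter; the edge padding of half the filter length mimics the augmentation described above):

```python
import numpy as np

def smooth(y, span=20):
    """Running average with edge padding of half the filter length on both sides."""
    pad = span // 2
    y_padded = np.pad(y, pad, mode='edge')  # augment borders with edge values
    kernel = np.ones(span) / span
    return np.convolve(y_padded, kernel, mode='same')[pad:pad + len(y)]

y = np.array([0.0, 10.0, 0.0, 10.0, 0.0, 10.0])
print(smooth(y, span=2))  # same length as y, fluctuations damped
```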

As expected [18], smoothing significantly improves the result. The approximations in Fig. 4 are further away from the aggregate curve than the approximation in Fig. 6. The beneficial effect can be seen even better in Fig. 5: not only does the median error decrease by a factor of about 2, but, more importantly, the maximum error decreases from 45 to 12%.

Fig. 6

Resulting differentially private load profiles after additional smoothing

3.4 Discussion of smoothing and privacy

Differential privacy has the important property that it is immune to post-processing. Post-processing includes smoothing, which is why smoothing filters may be applied. However, from the filtering perspective there seems to be a contradiction: first, (Laplacian) noise is added for privacy reasons, then a moving average filter is applied to reduce the effect of that noise. One could think that by reducing the noise the filter also destroys the privacy property.

For illustrative purposes, we explicitly show here that smoothing does not destroy the differential privacy property. Unfortunately, privacy cannot be directly confirmed experimentally, because this would require an extremely exact estimation of probability densities in a high-dimensional (288 dimensions) space. Instead, the analysis shows that differential privacy is not destroyed for a single but arbitrary time point t; a simple moving average filter with span 3 is used (the change to an arbitrary span is straightforward).

The smoothed curve at time t is

$$\begin{aligned} Y_t^{\mathrm {sm}}:=\frac{1}{3}\left( Y_{t-1}+Y_{t}+Y_{t+1}\right) . \end{aligned}$$

Since the added Laplacian noise has zero mean, the expected value of \(Y_t^{\mathrm {sm}}\) is the time-averaged target curve

$$\begin{aligned} \mu _t^{\mathrm {sm}}:=E[Y_t^{\mathrm {sm}}]=\frac{1}{3}\left( f_{t-1}+f_{t}+f_{t+1}\right) . \end{aligned}$$

Since the Laplacian noise at each time point is created independently of other time points, the usage of Eqs. (17), (18) and (6) yields

$$\begin{aligned} \Pr (Y_t^{\mathrm {sm}} = y)= & {} \Pr (Y_{t-1}=y_{-1}, Y_{t} = y_0, Y_{t+1} = 3y-y_{-1}-y_0)\\= & {} \left( \frac{1}{2\lambda }\right) ^3 e^{-\frac{\vert y_{-1}-\mu _{t-1}^{\mathrm {sm}} \vert }{\lambda } } e^{-\frac{\vert y_0-\mu _{t}^{\mathrm {sm}} \vert }{\lambda } } e^{-\frac{\vert 3y-y_{-1}-y_0-\mu _{t+1}^{\mathrm {sm}} \vert }{\lambda } }. \end{aligned}$$

For simplicity of argumentation we assume that all three y-values exceed their expected values \(\mu \). Therefore the absolute value function has no effect, the y-terms cancel out and, introducing

$$\begin{aligned} M_t^{\mathrm {sm}}=\mu _{t-1}^{\mathrm {sm}}+\mu _{t}^{\mathrm {sm}}+\mu _{t+1}^{\mathrm {sm}}, \end{aligned}$$

one obtains

$$\begin{aligned} \Pr (Y_t^{\mathrm {sm}} = y) =\left( \frac{1}{2\lambda }\right) ^3e^{(-3y+ M_t^{\mathrm {sm}})/\lambda }. \end{aligned}$$

Now, two neighboring datasets \(\mathcal {D}\) and \(\tilde{\mathcal {D}}\) are considered. W.l.o.g., they differ in the last profile, which is only present in dataset \(\mathcal {D}\); for all \(i\le N-1\) the profiles coincide, \(x_{i}=\tilde{x}_{i}\). Now the differential privacy condition can be proved directly: starting from Eq. (20), substituting back Eqs. (19) and (18) and then using the neighboring condition as formulated above, one gets

$$\begin{aligned} \frac{\Pr (Y_t^{\mathrm {sm}}(\mathcal {D}) = y)}{\Pr (Y_t^{\mathrm {sm}}(\tilde{\mathcal {D}}) = y)}= & {} e^{(M_t^{\mathrm {sm}}(\mathcal {D})-M_t^{\mathrm {sm}}(\tilde{\mathcal {D}}))/\lambda }\\= & {} e^{\left( \frac{1}{3}x_{N,t-2}+\frac{2}{3}x_{N,t-1}+x_{N,t}+\frac{2}{3}x_{N,t+1}+\frac{1}{3}x_{N,t+2}\right) /\lambda }\\\le & {} e^{\left( \sum \limits _{t'=1}^T x_{N,t'}\right) /\lambda }. \end{aligned}$$

Ignoring possible border effects due to smoothing (i.e. taking \(t\in \{3,\ldots ,T-2\}\)), using the definition of \(\lambda \) from Eq. (7) and the sensitivity (13) then directly leads to the \(\epsilon \)-differential privacy property

$$\begin{aligned} \frac{\Pr (Y_t^{\mathrm {sm}}(\mathcal {D}) = y)}{\Pr (Y_t^{\mathrm {sm}}(\tilde{\mathcal {D}}) = y)} \le e^{\epsilon }. \end{aligned}$$

Thus, differential privacy is preserved for a single time point even after smoothing with a moving average filter.
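The key inequality of this derivation, namely that the smoothing-weighted combination of one household's values is bounded by its 1-norm (the vector sensitivity), can be checked numerically for nonnegative consumption values (a sketch with purely illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 5.0, size=288)  # one household's (nonnegative) daily profile

# Weights of x_{N,t-2},...,x_{N,t+2} in M^sm(D) - M^sm(D~) for a span-3 moving average
weights = np.array([1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3])

# The weighted combination never exceeds the L1 norm, since all weights are <= 1
for t in range(2, 286):
    assert weights @ x[t - 2:t + 3] <= x.sum()
print("bound holds at every interior time point")
```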

3.5 Dependency on the privacy parameter

In practice, it is important how the utility changes when the desired privacy restriction, i.e. \(\epsilon \), changes. In all experiments so far, the differential privacy budget parameter \(\epsilon \) was set to 1. Ignoring smoothing, the noise level corresponds to the standard deviation \(\sigma \) of the Laplace distribution, which is known to be \(\sigma = \sqrt{2\lambda }\). Since \(\lambda \) is inversely proportional to \(\epsilon \), increasing privacy by halving \(\epsilon \) would result in a \(\sqrt{2}\) times larger error. Therefore, knowing the error for \(\epsilon =1\), the error for another \(\epsilon \) could theoretically be calculated by

$$\begin{aligned} \hat{\mathrm {err}}(\epsilon ) = \frac{1}{\sqrt{\epsilon }}\cdot \mathrm {err}(1). \end{aligned}$$

Maybe due to the smoothing operation following the addition of Laplacian noise, this relation is only approximately valid. As can be seen in Fig. 7 the measured relative error (blue curves with pluses) is larger than the theoretical one (red curve with o) for small \(\epsilon \). Note that the measured error is here robustly estimated as the median relative error over 30 trials and all time points.

Fig. 7

Dependency of the error on \(\epsilon \)

3.6 Dependency on the number of households

To successfully use differential privacy methods it is crucial to have reasonable utility. Put the other way round, one can ask how the error increases when the sample size decreases.

The result is shown in Fig. 8 (blue line with pluses). At the current state, differential privacy is very likely not suited for small neighborhoods with sizes in the hundreds.

Fig. 8

Dependency of the error on the size of the aggregation group

Again one can compare the result with a theoretical extrapolation. Ignoring the effect of smoothing, the noise is independent of the sample size N. This can directly be seen from (15), so the numerator of the relative error terms (5) does not depend on N. However, the values \(f_t\), and hence the amplitude in the denominator of (5), are proportional to N due to (3). Therefore, one can expect that the relative error decreases with 1/N, i.e.

$$\begin{aligned} \hat{\mathrm {err}}(N) = \frac{14{,}052}{N} \cdot \mathrm {err}(14{,}052). \end{aligned}$$

This is roughly the case: the error extrapolated from \(N=14{,}052\) (red curve with o) is rather close to the measured one (Fig. 8). Although the relative error is off by a factor of 2 at a sample size of 500, this is not bad considering that the extrapolation from 14,052 down to 500 spans roughly a factor of 28. Again, the measured error is robustly estimated as the median relative error over 30 trials and all time points. For each trial, a subsample of the respective size was drawn with replacement from the total of 14,052 load curves.

4 Conclusion and outlook

In this paper, differential privacy is applied to real smart metering consumption data for the first time. More specifically, differential privacy is applied to the aggregate of time-series consisting of daily load profiles of real smart metering consumption data.

The paper focuses on the assessment of the practical utility that can be reached. The main finding is that even after some improvements of the basic method the aggregation group size must be of the order of thousands of smart meters in order to have reasonable utility. The dependence of the utility on various parameters is thoroughly investigated. Smoothing significantly improved the utility without destroying differential privacy.

The practical application of differential privacy reveals several open points. A practical way of privately determining the sensitivity still needs to be found; this could be done in a straightforward manner using secure comparison protocols, but these are known to be expensive. The filter length of the smoothing operation was chosen based on knowledge of the exact aggregate; an alternative, private way to set this parameter needs to be established. In order to be applicable for smaller aggregation group sizes, the utility of differential privacy needs to be improved further. An approach similar to that of [19], but using a wavelet instead of a Fourier transformation, could be promising.