The sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism [6]. Differential privacy ensures that it is difficult for an attacker, who observes the query output, to distinguish between two input databases that are sufficiently “close” to each other, e.g. differ in one row. Pleak tells the user how to sample noise to achieve differential privacy, and how this affects the correctness of the output. Pleak provides two methods – global and local – to quantify sensitivity of a task in a SQL workflow or of an entire SQL workflow. These methods can be applied to queries that output aggregations (e.g. count, sum, min, max).
Global sensitivity analysis [5] takes as input a database schema and a query, and computes the theoretical bounds for sensitivity, which are suitable for any instance of the database. This shows how the output changes if we add (remove) a row to (from) some input table. The analysis output is a matrix that shows the sensitivity w.r.t. each input table separately. It supports only COUNT queries.
Sometimes, the global sensitivity may be very large or even infinite. Local sensitivity analysis is an alternative approach, which requires as input not only a schema and a query, but also a particular instance of the underlying database, and it tells how the output changes with the change from the given input. Using the database instance improves the amount of noise needed to ensure differential privacy w.r.t. the number of rows. Moreover, it supports COUNT, SUM, MIN, MAX aggregations, and allows to capture more interesting distances between input tables, such as change in a particular attribute of some row. In Pleak, we have investigated a particular type of local sensitivity, called derivative sensitivity [4], which is in first place adapted to continuous functions, and is closely related to function derivative. Pleak uses derivative sensitivity to quantify the required amount of noise as described in [4].
An example of derivative sensitivity analysis output is shown in Fig. 5a. It tells that the derivative sensitivity w.r.t. the Ship table is 4, and that a differential privacy level of \(\varepsilon =1\) can be achieved using smoothness parameter \(\beta =0.05\). To this end, we would have to add an amount of (Laplacian) noise such that the relative error of the output is \(74\%\). More precisely, if the correct output is y, the noised answer will be between 0.26y and 1.74y with probability \(80\%\). A tutorial on sensitivity analyzer can be found at https://pleak.io/wiki/sql-derivative-sensitivity-analyser. More examples can be found in the full version of this paper [9].