sql-derivative-sensitivity-analyser_advanced

Here we present some additional functionality of the SQL combined sensitivity analyzer. It is not essential, but it supports more advanced use cases. We do not reproduce the formal semantics here, but give some examples instead.

The norm of a table T is defined either in the text file `T.nrm` (if the analyzer is run from the command line), or in the text input window **table norm** that opens after clicking on a data object (if the analysis is run from the PLEAK web application).

The first two lines tell which rows (indexed starting from 0) and which columns (identified by corresponding attribute name) are treated as sensitive. We estimate change in the output when only the sensitive entries may change, and the non-sensitive entries remain the same.

    rows: 0 3 7 ;
    cols: latitude longitude ;

Here we assume that the columns `latitude` and `longitude` are sensitive in the rows indexed by 0, 3, and 7. It is possible to define more sophisticated sensitive components. For this, the norm description is extended by a sequence of variable assignments that describe how the norm is computed. The supported operations are scaling and l_{p}-norms, which can be composed in an arbitrary way.

    rows: i_1 i_2 ... i_n ;
    cols: attr_1 attr_2 ... attr_n ;
    var_1 = op_1 var_11 ... var_1m;
    ....
    var_n = op_n var_n1 ... var_nm;
    return op var_n;

As an example, let us consider the following norm definition.

    rows: 0 3 7 ;
    cols: latitude longitude ;
    u = lp 2.0 latitude longitude;
    z = scaleNorm 0.2 u;
    return linf z;

The line `u = lp 2.0 latitude longitude;` combines latitude and longitude to define the Euclidean distance (i.e. the l_{2}-norm). We may scale the distances, and 0.2 in the line `z = scaleNorm 0.2 u;` means that we conceal changes in location of up to 1 / 0.2 = 5 units. Finally, `return linf z;` shows how the distance between the tables is computed from the distances between their rows. Here `linf` means that we take the maximum row distance (i.e. the l_{∞}-norm), so DP conceals the change even if all sensitive rows change by a unit.
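The computation described by this example norm can be sketched in Python. This is only an illustration of the arithmetic, not the analyzer's implementation: the function names `lp_norm` and `table_distance`, and the representation of a table as a list of dictionaries, are assumptions made for this sketch.

```python
import math

def lp_norm(values, p):
    """l_p-norm of a sequence; p = math.inf gives the largest absolute value."""
    if math.isinf(p):
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def table_distance(t1, t2, rows, cols, p=2.0, scale=0.2):
    """Distance between two tables that have the same number of rows.

    Hypothetical helper mirroring the norm definition
        u = lp 2.0 latitude longitude; z = scaleNorm 0.2 u; return linf z;
    Only the cells in the listed rows and columns contribute.
    """
    row_distances = []
    for i in rows:
        diffs = [t1[i][c] - t2[i][c] for c in cols]
        u = lp_norm(diffs, p)      # u = lp 2.0 latitude longitude
        z = scale * u              # z = scaleNorm 0.2 u
        row_distances.append(z)
    return lp_norm(row_distances, math.inf)  # return linf z

# Two tables with 8 rows each (so that row index 7 exists).
t1 = [{"latitude": 0.0, "longitude": 0.0} for _ in range(8)]
t2 = [dict(row) for row in t1]
t2[3] = {"latitude": 3.0, "longitude": 4.0}   # sensitive row moved by 5 units
t2[1] = {"latitude": 9.0, "longitude": 9.0}   # non-sensitive row, ignored

distance = table_distance(t1, t2, rows=[0, 3, 7], cols=["latitude", "longitude"])
```

In this sketch a move of 5 units in a sensitive row yields a table distance of 1, which matches the reading that scaling by 0.2 conceals location changes of up to 5 units.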

In the previous section, we considered differential privacy w.r.t. changes in some particular cells of the data tables. The number of rows was considered immutable. To achieve a more traditional differential privacy, which considers the addition or deletion of a row as a unit change, we need to define the cost of such an operation, expressed by the line `G: 1.0 ;`. It is possible to combine these two distances.

    rows: all ;
    cols: latitude longitude ;
    G: 1.0 ;

Intuitively, this means that both types of changes are allowed. In this example, differential privacy conceals the fact that a row has been added or removed, as well as a change of the latitude or longitude by a unit. More precisely, we define the distance between two tables as a *table edit distance* (analogous to string edit distance) that uses the following operations:

- the cost of row insertion/deletion (defined by the line `G:`).
- the cost of cell modification (defined by the line `cols:` and the possible extension).

Table edit distance is defined as the minimal cost of operations required to transform one table into the other.
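To make the analogy with string edit distance concrete, here is a minimal Python sketch of such a distance. It rests on several assumptions that are not stated in this page: rows are compared in order (like characters in a string), the cost of modifying a row is the Euclidean distance over the sensitive columns, and the costs of the individual operations are summed. The names `cell_cost` and `table_edit_distance` are hypothetical, and the formal semantics used by the analyzer may differ.

```python
import math

def cell_cost(r1, r2, cols):
    """Assumed modification cost: Euclidean distance over the sensitive columns."""
    return math.sqrt(sum((r1[c] - r2[c]) ** 2 for c in cols))

def table_edit_distance(t1, t2, cols, g=1.0):
    """Minimal total cost of turning t1 into t2 (dynamic programming sketch).

    g plays the role of the value given in the line `G:`.
    d[i][j] is the cost of transforming the first i rows of t1
    into the first j rows of t2.
    """
    n, m = len(t1), len(t2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * g                      # delete i rows
    for j in range(1, m + 1):
        d[0][j] = j * g                      # insert j rows
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + g,             # delete row i of t1
                d[i][j - 1] + g,             # insert row j of t2
                d[i - 1][j - 1] + cell_cost(t1[i - 1], t2[j - 1], cols),  # modify cells
            )
    return d[n][m]

cols = ["latitude", "longitude"]
base = [{"latitude": 0.0, "longitude": 0.0}, {"latitude": 1.0, "longitude": 1.0}]
added = base + [{"latitude": 5.0, "longitude": 5.0}]
moved = [base[0], {"latitude": 1.0, "longitude": 1.5}]
far = [base[0], {"latitude": 100.0, "longitude": 100.0}]

d_added = table_edit_distance(base, added, cols, g=1.0)
d_moved = table_edit_distance(base, moved, cols, g=1.0)
d_far = table_edit_distance(base, far, cols, g=1.0)
```

Under these assumptions, adding one row costs `G`, a small cell change costs its Euclidean size, and a very large cell change never costs more than `2 * G`, because deleting the old row and inserting the new one is then the cheaper sequence of operations.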

sql-derivative-sensitivity-analyser_advanced.txt · Last modified: 2018/11/26 11:45 by alisa