Advanced settings

Here we present some additional functionality of the SQL combined sensitivity analyser, which is not essential but allows one to do more. We do not reproduce the formal semantics here; instead, we give some examples.

Defining norms for data tables

A table T's norm is defined either in the text file T.nrm (if the analyser is run from the command line) or in the text input window table norm that opens after clicking on a data object (if the analysis is run from the PLEAK web application).

The first two lines tell which rows (indexed starting from 0) and which columns (identified by the corresponding attribute name) are treated as sensitive. We estimate the change in the output when only the sensitive entries may change, while the non-sensitive entries remain the same.

rows: 0 3 7 ;
cols: latitude longitude ;

Here we assume that the columns latitude and longitude are sensitive in the rows indexed by 0, 3, and 7. It is possible to define more sophisticated sensitive components. For this, the norm description is extended by a sequence of variable assignments denoting how the norm is computed. The supported operations are scaling and lp-norms, which can be composed in an arbitrary way.

rows: i_1 i_2 ... i_n ;
cols: attr_1 attr_2 ... attr_n ;

var_1 = op_1 var_11 ... var_1m;
....
var_n = op_n var_n1 ... var_nm;
return op var_n;

As an example, let us consider the following norm definition.

rows: 0 3 7 ;
cols: latitude longitude ;

u = lp 2.0 latitude longitude;
z = scaleNorm 0.2 u;
return linf z;

The line u = lp 2.0 latitude longitude; combines latitude and longitude to define the Euclidean distance (i.e. the l2-norm). We may scale the distances: the factor 0.2 in the line z = scaleNorm 0.2 u; means that we conceal changes in location of up to 1 / 0.2 = 5 units. Finally, return linf z; shows how the distance between the tables is computed from the distances between their rows; linf means that we take the maximum row distance (i.e. the l∞-norm), so DP conceals the change even if all sensitive rows change by a unit.
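For intuition, here is a small worked computation under this norm; the concrete changes (3 and 4 units) are our own illustration. Suppose that in row 3 the latitude changes by 3 units and the longitude by 4 units, while rows 0 and 7 stay unchanged:

u = ||(3, 4)||_2 = sqrt(3^2 + 4^2) = 5
z = 0.2 * 5 = 1
linf: max(0, 1, 0) = 1

The two tables are thus at distance 1, which is exactly the unit change that differential privacy conceals.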

Local and combined sensitivity

In the previous section, we considered differential privacy w.r.t. changes in some particular cells of the data tables; the number of rows was considered immutable. Instead of (or in addition to) tracking changes in certain cells, we may also use the traditional definition of DP, which considers the addition or removal of a row as a unit change. For this, we need to define the cost of adding or removing a row, given by a variable G. This definition assumes that all rows are sensitive, so we need to write rows: all ; to make it work properly.

rows: all ;
cols: none ;
G: 1 ;

It is possible to combine these two distances into one.

rows: all ;
cols: latitude longitude ;
G: 1.0 ;

Intuitively, this means that both types of changes are allowed. In this example, differential privacy conceals the fact that a row has been added or removed, as well as the fact that the latitude or longitude has been changed by a unit. More precisely, we define the distance between two tables as a table edit distance (analogous to string edit distance) that uses the following operations:

  • the cost of row insertion/deletion (defined by the line G:).
  • the cost of cell modification (defined by the line cols: and the possible extension).

Table edit distance is defined as the minimal cost of operations required to transform one table into the other.
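As a worked illustration (the concrete numbers are ours, not part of the analyser's semantics), suppose that table B is obtained from table A by inserting one row and shifting the latitude of one existing row by 2 units. With G: 1.0 and, say, the scaled Euclidean row norm from the earlier example (factor 0.2), the cost of this transformation is

1.0 (row insertion) + 0.2 * ||(2, 0)||_2 (cell modification) = 1.4

and the table edit distance is the minimum such cost over all ways of transforming A into B.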

Table constraints

Clicking on the sub-button Table constraints opens a tab where the user can define known bounds on the data, which may reduce the noise or even turn an infeasible result into a feasible one. These settings are essential for GROUP BY queries, as they define the total number of possible groups. For the combined sensitivity analyser, they can be used to define sets and/or ranges of values.

table_1.attr_1 range lower_bound upper_bound ;
...
table_n.attr_n set value_1 ... value_n ;
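For instance, a constraint file could look as follows (the table and attribute names ship and port below are hypothetical, chosen only for illustration):

ship.latitude range 0.0 90.0 ;
port.name set tallinn helsinki stockholm ;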

This tab has more uses in the guessing advantage analysis, which allows more types of constraints. The keyword exact states that the attacker already knows the exact value. The keywords total, set, and range do not specify any probability distribution on the data, and the analyser assumes the worst-case distribution by default (i.e. the one for which the advantage is the largest). The keywords totalUnif, setUnif, and rangeUnif in addition specify that the distribution is uniform. To provide even more distribution details, setPrior allows one to define a probability for each element, and rangePrior allows one to split a range into n blocks, assigning a different weight to each block (inside each block, the elements are distributed uniformly).

table_1.attr_1 exact;               -- the attacker knows the exact value

table_2.attr_2 total n;             -- there are n possible values
table_3.attr_3 set v1 ... vn;       -- the values are {v1 ... vn}
table_4.attr_4 range lb ub;         -- the values come from the range [lb,ub)

table_5.attr_5 totalUnif n;         -- there are n uniformly distributed values
table_6.attr_6 setUnif v1 ... vn;   -- uniformly distributed in {v1 ... vn}
table_7.attr_7 rangeUnif lb ub;     -- uniformly distributed in [lb,ub)

-- the value vk comes with probability pk
table_8.attr_8 setPrior (v1, p1) ... (vn, pn);

-- the range [v(k-1),vk) comes with probability pk
-- the values within [v(k-1),vk) are distributed uniformly
table_9.attr_9 rangePrior v0 (v1, p1) ... (vn, pn);
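
As a concrete illustration (the table name person, the attribute age, and the probabilities are hypothetical, chosen only for this sketch), a prior stating that an age falls into [0,18) with probability 0.2, into [18,65) with probability 0.7, and into [65,100) with probability 0.1 would be written as:

person.age rangePrior 0 (18, 0.2) (65, 0.7) (100, 0.1);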
                                     