In this section, we discuss how to create and use a particular model that is available as one of the demo models of PLEAK.
First of all, we need to create the base model in the BPMN editor. Follow the instructions of pleak-frontend to create a new .bpmn model. It should have the following components:
* a task Estimate the first arrival;
* a data object ship. Set its type to Collection;
* a data object arrival.
The model should look like this.
Let us open the model in the SQL combined sensitivity editor. If it is opened in some other editor, click Change analyser and select the combined sensitivity editor.
Click on the data object ship.
First, let us describe the table schema. Write the following into the window tab Table schema.
create table ship (
    ship_id INT8 primary key,
    name TEXT,
    latitude INT8,
    longitude INT8,
    cargo INT8,
    max_speed INT8
);
The keyword id is reserved, as it will be used to establish an ordering on table rows when the data is loaded from the table into the PostgreSQL database. Hence, we use the name ship_id instead.
Let us now fill Table data with some sample data.
ship_id | name  | latitude | longitude | cargo | max_speed
--------+-------+----------+-----------+-------+----------
0       | sokk  |      100 |       120 |    60 |        30
1       | alfa  |      270 |       290 |    40 |        40
2       | beta  |     -180 |       280 |   100 |        30
3       | gamma |      160 |      -150 |   120 |        60
4       | delta |     -140 |      -140 |    80 |        50
5       | milka |      180 |      -170 |   110 |        30
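For reference, the same sample data written as a plain SQL insert (PLEAK itself reads the data from the Table data tab, so this snippet is only an illustration):

-- the six sample ships from the table above
INSERT INTO ship VALUES
    (0, 'sokk',   100,  120,  60, 30),
    (1, 'alfa',   270,  290,  40, 40),
    (2, 'beta',  -180,  280, 100, 30),
    (3, 'gamma',  160, -150, 120, 60),
    (4, 'delta', -140, -140,  80, 50),
    (5, 'milka',  180, -170, 110, 30);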
We need to define a norm for this table. For example, we may find that the ship location and its cargo are sensitive. Write the following into the window tab Table norm.

rows: all;
cols: latitude longitude cargo;
Now we need to think whether we want to define the norm more precisely. Currently, we treat a change of one latitude or longitude unit the same as a change of one cargo unit. What are these units? Looking at our data table, we see that cargo is measured in tens, so it is logical to conceal its precision within 10 units. For latitude and longitude, we actually want to hide the location in terms of Euclidean distance. Let us conceal the location within 5 units of precision. Add the following lines to Table norm.
// a longer description of the norm
u1 = lp 2.0 latitude longitude;
u2 = scaleNorm 0.2 u1;
w1 = lp 1.0 cargo;
w2 = scaleNorm 0.1 w1;
z = lp 1.0 u2 w2;
return linf z;
In the lines u2 = scaleNorm 0.2 u1; and w2 = scaleNorm 0.1 w1;, the scalings are defined as 0.2 = 1/5 and 0.1 = 1/10 respectively. Then, z = lp 1.0 u2 w2; combines them into an l1-norm, which is just the sum of these two norms. Intuitively, differential privacy conceals the fact that the location of some ship has changed by 5 units, or that its cargo has changed by 10 units (but not both at once). Finally, return linf z; shows how the norm is aggregated over table rows, i.e. over different ships. Here linf denotes the l∞-norm, i.e. the maximum over all rows, which means that differential privacy conceals a change of this magnitude even if every row changes.
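As a quick sanity check (with made-up numbers, not part of the demo), suppose the location of one ship changes from (100, 120) to (103, 124) while its cargo stays the same. Then u1 = (3² + 4²)^0.5 = 5, u2 = 0.2 · 5 = 1, w2 = 0, and z = 1 + 0 = 1, so this change sits exactly at the boundary that the norm is designed to conceal.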
Click on the task Estimate the first arrival. We need to insert a query here. Let the query return the earliest time at which some ship arrives at the port located at the point (0,0). We assume that each ship starts moving at its maximum speed.
create or replace function min_time()
returns TABLE ( cnt INT8 ) as
$$
    SELECT MIN ((ship.latitude ^ 2 + ship.longitude ^ 2) ^ 0.5 / ship.max_speed)
    FROM ship
$$ language SQL;
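As a sanity check, the query body can also be run directly against the sample table in an ordinary PostgreSQL client; this standalone query is our own illustration and not part of the PLEAK model.

-- distance from (latitude, longitude) to the port at (0,0), divided by the speed
SELECT MIN (sqrt(ship.latitude ^ 2 + ship.longitude ^ 2) / ship.max_speed) AS min_time
FROM ship;

On the sample data this gives about 3.66: the minimum is reached by gamma, which is (160² + 150²)^0.5 ≈ 219.3 units away from the port and moves at speed 60.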
We are now ready to run the analysis. Click the blue button Analyze. Let us first set ε = 1 and β = 0.1. Click the green button Run Analysis. The most interesting value in the output that we see is the relative error. This can be interpreted as an upper bound on the relative distance of the noisy output from the actual output, which holds with probability 80%. There is unfortunately no strict upper bound on the additive noise, and it can potentially be infinite, though with negligible probability. Hence, we can only give a probabilistic upper bound on the noise, which is in our case hard-coded to 80%.
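To make this concrete (the number below is made up, as the exact output depends on the analyzer version): a reported relative error of 0.25 would mean that, with probability 80%, the noisy output differs from the true output by at most 25%, i.e. for the true value 3.66 computed above it would lie roughly between 2.7 and 4.6.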
We can now play around with the model and see how the error can be reduced.
* Try to increase the scalings in the lines u2 = scaleNorm 0.2 u1; and w2 = scaleNorm 0.1 w1; to 1.0, or even try larger values (a variant is sketched after this list). The error decreases, as we now consider smaller changes in the input (which means that we lose in security).
* Instead of rows: all;, try some particular row, rows: 0; or rows: 1;. It can be seen that ships with higher speed have larger sensitivity and hence add more noise, since changing their locations even a little may affect the arrival time more significantly.
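For instance, the first suggestion could look as follows (our own variant of the norm above, with both scalings set to 1.0):

// variant norm: conceal changes of 1 unit instead of 5 and 10
u1 = lp 2.0 latitude longitude;
u2 = scaleNorm 1.0 u1;
w1 = lp 1.0 cargo;
w2 = scaleNorm 1.0 w1;
z = lp 1.0 u2 w2;
return linf z;

Since only changes of 1 unit (instead of 5 or 10) are concealed now, the sensitivity, and hence the added noise, goes down.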
Our example query has no WHERE clause. Let us see what happens if we allow only those ships that have sufficiently much cargo on them, e.g. WHERE ship.cargo > 50; the full modified query is given below. We see that the error grows a lot. This is because cargo is a sensitive variable, and changing it even a little may cause a ship to be discarded from the set, which affects the final result significantly.
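Spelled out in full, the modified query would be:

create or replace function min_time()
returns TABLE ( cnt INT8 ) as
$$
    SELECT MIN ((ship.latitude ^ 2 + ship.longitude ^ 2) ^ 0.5 / ship.max_speed)
    FROM ship
    WHERE ship.cargo > 50
$$ language SQL;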
Our model is also ready for guessing advantage analysis. Open the model in GA analyzer mode (e.g. by clicking the button Change analyzer). We see that the table schemas, the data, and the query are the same as they were in the combined sensitivity analyzer.
Click the blue button Analyze. We need to specify what the attacker already knows and what he is trying to guess.
From the data table, we can already infer possible values of ship locations. Let both latitude and longitude be bounded by the range (-300, 300). We insert the following code into the window that opens after clicking the Attacker settings button.

ship.latitude range -300 300;
ship.longitude range -300 300;
If the attacker guesses the location precisely, this is bad. However, it can be bad even if he guesses the location only approximately. Let us assume that we want to prevent guessing within 5 units of precision. We insert the following code into the window that opens after clicking the Sensitive attributes button.
leak ship.latitude approx 5;
ship.longitude approx 5;
cost 100
In this example, the leakage cost is set to 100. The cost does not affect the noise level in any way; its only purpose is to give an alternative interpretation to the guessing advantage. It can always be set to the default value 100, so that the average cost equals the advantage.
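For instance, if the analyzer reports a guessing advantage of 0.25, then the expected leakage cost is 0.25 · 100 = 25, i.e. the advantage read as a percentage.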
Once again, we can now play around with the model and see how the error can be reduced.