In this section, we discuss how to create and use a particular model that is available as one of the demo models of PLEAK.
First of all, we need to create the base model in the BPMN editor. Follow the instructions of pleak-frontend to create a new .bpmn
model. It should have the following components.
- A task named `Estimate the first arrival`.
- A data object named `ship`. Set its type to Collection.
- A data object named `arrival`.
The model should look like this.
Let us open the model in the SQL combined sensitivity editor. If it opens in some other editor, click Change analyser and select the combined sensitivity editor.
Click on the data object `ship`.
First, let us describe the table schema. Write the following into the window tab Table schema.
create table ship ( ship_id INT8 primary key, name TEXT, latitude INT8, longitude INT8, cargo INT8, max_speed INT8 );
The keyword `id` is reserved, as it will be used to establish an ordering on table rows when the data is transferred from the table to the PSQL database. Hence, we use the name `ship_id`.
Let us now fill Table data with some sample data.
ship_id | name | latitude | longitude | cargo | max_speed |
---|---|---|---|---|---|
0 | sokk | 100 | 120 | 60 | 30 |
1 | alfa | 270 | 290 | 40 | 40 |
2 | beta | -180 | 280 | 100 | 30 |
3 | gamma | 160 | -150 | 120 | 60 |
4 | delta | -140 | -140 | 80 | 50 |
5 | milka | 180 | -170 | 110 | 30 |
We need to define a norm for this table. For example, we may find that the ship location and its cargo are sensitive.
rows: all; cols: latitude longitude cargo;
Now we need to decide whether we want to define the norm more precisely. Currently, a unit change in `latitude` or `longitude` is treated the same as a unit change in `cargo`. What are the units? Looking at our data table, we see that `cargo` is measured in tens, so it is reasonable to conceal it within 10 units of precision. For `latitude` and `longitude`, we actually want to hide the location in terms of Euclidean distance. Let us conceal the location within 5 miles of precision. Add the following lines to Table norm.
u1 = lp 2.0 latitude longitude; u2 = scaleNorm 0.2 u1; w1 = lp 1.0 cargo; w2 = scaleNorm 0.1 w1; z = lp 1.0 u2 w2; return linf z;
In the lines `u2 = scaleNorm 0.2 u1;` and `w2 = scaleNorm 0.1 w1;` the scalings are defined as `0.2 = 1 / 5` and `0.1 = 1 / 10` respectively. Then, `z = lp 1.0 u2 w2;` combines them into an l1-norm, which is just the sum of these two norms. Intuitively, differential privacy conceals the fact that the location of some ship has changed by 5 units, or that its cargo has changed by 10 units (but not both at once). Finally, `return linf z;` shows how the norm is aggregated over table rows, i.e. different ships. Here `linf` denotes the l∞-norm, which is the maximum change over all rows, so differential privacy conceals the change even if every row changes.
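To make the norm concrete, here is a minimal Python sketch that evaluates it by hand for a proposed change to the table. It only mirrors the arithmetic described above; the function names `row_norm` and `table_norm` are made up for this illustration and are not part of PLEAK.

```python
import math

def row_norm(d_lat, d_lon, d_cargo):
    """Size of a change to one ship row (hypothetical helper, not a PLEAK API)."""
    u2 = 0.2 * math.hypot(d_lat, d_lon)  # l2-norm of the location change, scaled by 1/5
    w2 = 0.1 * abs(d_cargo)              # l1-norm of the cargo change, scaled by 1/10
    return u2 + w2                       # z: l1-combination, i.e. the sum

def table_norm(row_changes):
    """linf aggregation: the largest per-row change."""
    return max(row_norm(*change) for change in row_changes)

# A ship moved by 5 units, a ship whose cargo changed by 10 units,
# and a ship where both happened at once.
print(row_norm(3, 4, 0))
print(row_norm(0, 0, 10))
print(row_norm(3, 4, 10))

# Two rows change by one unit each and a third stays the same.
print(table_norm([(3, 4, 0), (0, 0, 10), (0, 0, 0)]))
```

The first two changes have size one, whereas changing both the location and the cargo of the same ship has size two. Changing several rows by one unit each still gives a table-level change of size one, because `linf` takes the maximum over rows.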
Click on the task `Estimate the first arrival`. Here we need to insert a query and the schema of the table that results from executing that query. Let the query return the earliest time when some ship arrives at the port located at the point (0,0). We assume that each ship starts moving at its maximum speed.
The output table schema defines a table that contains just a single column.
create table min_time(cnt INT8);
The output table query describes how the arrival time is computed from ship location and its speed.
SELECT MIN ((ship.latitude ^ 2 + ship.longitude ^ 2) ^ 0.5 / ship.max_speed) AS cnt FROM ship ;
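As a sanity check, the exact (noise-free) answer of this query on the sample data can be reproduced outside PLEAK. The following Python sketch recomputes it from the rows of Table data; the helper names `arrival_time` and `first_arrival` are chosen for this illustration only.

```python
import math

# Rows of the sample Table data:
# (ship_id, name, latitude, longitude, cargo, max_speed)
SHIPS = [
    (0, "sokk", 100, 120, 60, 30),
    (1, "alfa", 270, 290, 40, 40),
    (2, "beta", -180, 280, 100, 30),
    (3, "gamma", 160, -150, 120, 60),
    (4, "delta", -140, -140, 80, 50),
    (5, "milka", 180, -170, 110, 30),
]

def arrival_time(latitude, longitude, max_speed):
    """Euclidean distance to the port at (0,0) divided by the maximum speed."""
    return math.sqrt(latitude ** 2 + longitude ** 2) / max_speed

def first_arrival(ships):
    """Equivalent of SELECT MIN(...) over all rows of the ship table."""
    return min(arrival_time(lat, lon, speed) for _, _, lat, lon, _, speed in ships)

print(round(first_arrival(SHIPS), 3))
```

On this data the earliest arrival belongs to the ship gamma, at roughly 3.66 time units. This is the actual output against which the relative error reported by the analysis should be read.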
We are now ready to run the analysis. Click the blue button Analyze. Let us first set ε = 1 and β = 0.1, and set the slider “Confidence level of estimated noise” to 90%. Click the green button Run Analysis. The most interesting value in the output that we see is the relative error. This can be interpreted as an upper bound on the relative distance of the noisy output from the actual output, which holds with probability 90%. There is unfortunately no strict upper bound on the additive noise, and it can potentially be infinite, though with negligible probability. Hence, we can only give a probabilistic upper bound on the noise.
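To see why an unbounded noise distribution still admits a probabilistic bound, consider the following Python sketch. It assumes Laplace noise purely for illustration; this section does not state which noise distribution PLEAK samples from, so the numbers below say nothing about the actual tool output. The function names are hypothetical.

```python
import math
import random

def laplace_bound(scale, confidence):
    """Value t with P(|X| <= t) = confidence for X ~ Laplace(0, scale)."""
    return scale * math.log(1.0 / (1.0 - confidence))

def sample_laplace(scale, rng):
    """The difference of two i.i.d. exponential variables is Laplace distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def empirical_coverage(scale, confidence, n=20000, seed=0):
    """Fraction of sampled noise values that stay within the bound."""
    rng = random.Random(seed)
    bound = laplace_bound(scale, confidence)
    hits = sum(abs(sample_laplace(scale, rng)) <= bound for _ in range(n))
    return hits / n

print(laplace_bound(1.0, 0.9))       # bound that holds with probability 90%
print(empirical_coverage(1.0, 0.9))  # should be close to 0.9
```

Individual samples can exceed any fixed value, but only about one in ten falls outside the 90% bound. Raising the confidence level widens the bound, which is the trade-off controlled by the slider.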
We can now play around with the model and see how the error can be reduced.
- Set the scalings in Table norm to `1.0`, or even try larger values. The error decreases, as we now consider smaller changes in the input (which means that we lose in security).
- Instead of `rows: all ;`, try some particular row, `rows: 0 ;` or `rows: 1 ;`. It can be seen that ships with higher speed have larger sensitivity and hence add more noise, since changing their locations even a little may affect the arrival time more significantly.
Our example query has no `WHERE` clause. Let us see what happens if we allow only those ships that carry sufficiently much cargo, e.g. `WHERE ship.cargo > 50`. We see that the error has grown considerably. This is because cargo is a sensitive variable, and changing it even a little may cause the ship to be discarded from the set, which affects the final result significantly.
Our model is also ready for guessing advantage analysis. Open the model in GA analyzer mode (e.g. by clicking the button Change analyzer). We see that the table schemas, the data, and the query are the same as they were in the combined sensitivity analyzer. However, there is no longer a tab for the table norm.
From the data table, we can already infer possible values of ship locations. Let both `latitude` and `longitude` be bounded by the range (-300,300). We insert the following code into the tab Table constraints, which becomes visible after clicking the table `ship`.
latitude range -300 300; longitude range -300 300;
Click the blue button Guessing Advantage analysis. We need to specify what the attacker already knows and what he is trying to guess.
If the attacker guesses the location precisely, this is bad. However, it can be bad even if he guesses the location precisely enough. Let us assume that we want to avoid guessing within 5 units of precision. The attacker goal is stated in the form of an SQL query with some additional special syntax for approximation. We insert the following code into the window that opens after clicking the Attacker goal button.
SELECT ship.longitude approx 5 AND ship.latitude approx 5 FROM ship;
In this example, the attacker wins if he guesses both the latitude and the longitude within 5 units of precision. Hence, the set of successful guesses looks like a 10-unit square centered around the actual location. Intuitively, we would like it to be a circle. We can state the attacker's goal as approximating the location w.r.t. Euclidean distance, i.e. the l_2-norm.
SELECT (ship.longitude, ship.latitude) approxWrtLp(2) 5 FROM ship;
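The difference between the two attacker goals can be checked with a small Python sketch. It is an illustration only: the function names are made up, and treating a guess exactly on the boundary as successful (`<=`) is an assumption, since this section does not say how PLEAK handles that case.

```python
def within_box(guess, actual, precision):
    """Success if every coordinate is within `precision` (the square region)."""
    return all(abs(g - a) <= precision for g, a in zip(guess, actual))

def within_lp(guess, actual, precision, p=2):
    """Success if the l_p distance is within `precision` (a circle for p = 2)."""
    dist = sum(abs(g - a) ** p for g, a in zip(guess, actual)) ** (1.0 / p)
    return dist <= precision

actual = (120, 100)  # (longitude, latitude) of the ship with ship_id 0
corner = (124, 104)  # off by 4 units in each coordinate
close = (123, 103)   # off by 3 units in each coordinate

print(within_box(corner, actual, 5), within_lp(corner, actual, 5))
print(within_box(close, actual, 5), within_lp(close, actual, 5))
```

The guess `corner` lies inside the square but outside the circle, since its Euclidean distance from the actual location is about 5.66. The guess `close` succeeds under both goals. The Euclidean goal is therefore the stricter one for the attacker near the corners of the square.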
It is possible that not all ships are private, and the attacker is only interested in some of them. We can add a filter to the attacker's goal. Let us select only those ships that carry sufficiently much cargo.
SELECT (ship.longitude, ship.latitude) approxWrtLp(2) 5 FROM ship WHERE cargo >= 50;
We can play around with the model and see how the error depends on different parameters.