The SQL guessing advantage analyser has a user-facing frontend application that allows users to extend models by attaching SQL scripts to their elements. SQL queries are added to tasks, and SQL database information is added to data objects. SQL information is attached by adding specific labels to the XML code of the model. The editor uses the SQL guessing advantage analysis tool to perform an analysis on the extended model, then combines and presents the results. The editor and the analysis tool have separate codebases, but both are required to use the full functionality of the analyser. Communication between the two components is handled by a backend REST service.
We interpret differential privacy in terms of a more standard security measure: the attacker's guessing advantage. It is defined as the difference between the posterior (after observing the output) and prior (before observing the output) probabilities of the attacker guessing the input. The analyser reports the amount of noise that needs to be added to achieve a desired upper bound on the advantage.
As in the combined sensitivity analyser, the data objects of a model have schemas and data tables, but there are now no explicit table norms. The distance measure for differential privacy is determined in a different way.
Clicking the Analyze button opens a menu entitled Analysis settings in the sidebar on the right side of the page. The slider that appears there sets the desired upper bound on the attacker’s advantage, which ranges between 0% and 100%.
The user has to specify the particular subset of attributes that the attacker is trying to guess, within a given precision range. To characterize the attacker more precisely, the user also defines the attacker's prior knowledge. Two extra buttons are now available for defining bounds on the attributes used.
This input starts with the keyword LEAK. It defines a set of sensitive components that the attacker is trying to guess. For each sensitive attribute, the guess can be either exact (discrete attributes) or approx r (approximated by r > 0 units). The guesses can be combined into an expression using the AND and OR operations, describing the case where leakage is considered successful. The expression can be followed by a sequence of statements of the form FROM table WHERE condition, which describe which rows of the considered tables are treated as sensitive. The statements can in turn be followed by a single line containing the keyword cost and a number that defines the cost of leaking that combination of attributes. By default, the cost is set to 100. The delimiter ; finishes the description of the sensitive components.
LEAK ship.latitude approx 5 AND ship.longitude approx 5 FROM ship WHERE cargo > 0 cost 100;
In this example, the attacker wins iff he guesses both attributes latitude and longitude of some row of the table ship within 5-unit precision. The definition of “unit” depends on the data table, e.g. if the location is given in miles, then a unit is also a mile. Only the locations of ships that carry some cargo are treated as sensitive.
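The structure of a LEAK statement described above can be illustrated with a small helper that assembles one from its parts. This is only a sketch: the names `Guess` and `leak_statement` are hypothetical and not part of the analyser, and the single-line layout simply mirrors the example shown above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Guess:
    """One sensitive attribute and the precision the attacker must reach."""
    attribute: str                  # e.g. "ship.latitude"
    radius: Optional[float] = None  # None renders "exact", a number renders "approx r"

    def render(self) -> str:
        if self.radius is None:
            return f"{self.attribute} exact"
        if self.radius <= 0:
            raise ValueError("approx requires r > 0")
        return f"{self.attribute} approx {self.radius:g}"


def leak_statement(
    guesses: Sequence[Guess],
    operator: str = "AND",
    rows: Sequence[Tuple[str, str]] = (),
    cost: float = 100,
) -> str:
    """Assemble a LEAK statement (hypothetical helper, not the analyser's API)."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not guesses:
        raise ValueError("at least one guess is required")
    parts = ["LEAK", f" {operator} ".join(g.render() for g in guesses)]
    # Each (table, condition) pair selects the rows treated as sensitive.
    parts += [f"FROM {table} WHERE {condition}" for table, condition in rows]
    # The cost line and the closing delimiter end the description.
    parts.append(f"cost {cost:g};")
    return " ".join(parts)


statement = leak_statement(
    [Guess("ship.latitude", 5), Guess("ship.longitude", 5)],
    rows=[("ship", "cargo > 0")],
)
print(statement)
```

Passing `operator="OR"` to the same helper yields the variant in which guessing either attribute counts as a successful leak.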
If we want to express that the attacker wins if he guesses either latitude or longitude, we replace the AND operation with OR.
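The difference between the AND and OR variants can be made concrete with a small check of when a guess counts as successful. This is a sketch under assumptions: the function names are hypothetical, and "within r units" is taken here to mean an absolute difference of at most r, which the text above does not spell out.

```python
from typing import Dict


def within(true_value: float, guessed_value: float, r: float) -> bool:
    """True if the guess is at most r units away (assumed meaning of 'approx r')."""
    return abs(true_value - guessed_value) <= r


def leak_succeeds(
    true_row: Dict[str, float],
    guessed_row: Dict[str, float],
    r: float,
    operator: str = "AND",
) -> bool:
    """Combine the per-attribute guesses with AND or OR."""
    hits = [within(true_row[name], guessed_row[name], r) for name in true_row]
    if operator == "AND":
        return all(hits)
    if operator == "OR":
        return any(hits)
    raise ValueError("operator must be AND or OR")


true_row = {"latitude": 100.0, "longitude": 200.0}
guessed_row = {"latitude": 103.0, "longitude": 210.0}  # latitude close, longitude off

print(leak_succeeds(true_row, guessed_row, 5, "AND"))  # False: longitude misses by 10
print(leak_succeeds(true_row, guessed_row, 5, "OR"))   # True: latitude is within 5
```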
This input defines the prior knowledge of the attacker by setting pre-known bounds on attributes, defined either as range a b, or as total a (the latter is used only for discrete data).
ship.latitude range 0 300; ship.longitude range 0 300;
In this example, the attacker knows that both latitude and longitude range between 0 and 300.
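To see how such bounds relate to a prior guessing probability, the sketch below computes the chance of a blind guess landing within r units of the true value. It assumes the attacker believes each attribute is uniformly distributed over its range and that the attributes are independent; these are illustrative assumptions, and the analyser may derive the prior differently. The function name is hypothetical.

```python
def prior_approx(a: float, b: float, r: float) -> float:
    """Prior probability of guessing within r units of a value uniform on [a, b].

    The best blind guess covers an interval of width 2r inside the range.
    """
    if b <= a:
        raise ValueError("range must satisfy a < b")
    if r <= 0:
        raise ValueError("approx requires r > 0")
    return min(1.0, 2 * r / (b - a))


# Bounds and precision taken from the examples above: range 0 300, approx 5.
p_latitude = prior_approx(0, 300, 5)
p_longitude = prior_approx(0, 300, 5)

# AND: both guesses must succeed (independence assumed).
p_and = p_latitude * p_longitude
# OR: at least one guess must succeed.
p_or = 1 - (1 - p_latitude) * (1 - p_longitude)

print(f"single attribute: {p_latitude:.4f}")
print(f"AND: {p_and:.6f}, OR: {p_or:.4f}")
```

Under these assumptions, tighter ranges raise the prior, which leaves less room for the attacker's advantage.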
Click the Run analysis button to run the analysis. The analyser internally converts these values to a suitable ε for differential privacy, and computes the noise required to achieve the bound on the attacker’s advantage. The results (entitled Analysis results) appear in the sidebar as well. A result is given for each of the input tables, and it consists of the following components.
To see more precise values of the prior and posterior guessing probabilities, click View more. This can be useful for choosing an appropriate value on the guessing advantage slider. For example, if the prior guessing probability is already 75%, then any value above 25% makes no sense, since it would mean that the attacker is allowed to learn everything.
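The 75% example can be checked with a few lines of arithmetic. The sketch below also shows one generic way an advantage bound relates to ε: if every pair of candidate inputs is ε-indistinguishable, the posterior odds are at most e^ε times the prior odds. This is a standard consequence of differential privacy and is not claimed to be the analyser's actual conversion, which also depends on its distance measure. Both function names are hypothetical.

```python
import math


def max_meaningful_advantage(prior: float) -> float:
    """Largest advantage that still bounds anything: posterior cannot exceed 1."""
    if not 0 <= prior <= 1:
        raise ValueError("prior must be a probability")
    return 1.0 - prior


def epsilon_for_advantage(prior: float, advantage: float) -> float:
    """ε such that e^ε * prior odds equals the odds of (prior + advantage).

    Assumes all candidate inputs are pairwise ε-indistinguishable.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    if advantage < 0:
        raise ValueError("advantage must be non-negative")
    posterior = prior + advantage
    if posterior >= 1:
        return math.inf  # no finite ε is needed: the bound is vacuous
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return math.log(posterior_odds / prior_odds)


print(max_meaningful_advantage(0.75))     # 0.25
print(epsilon_for_advantage(0.75, 0.25))  # inf: the attacker may learn everything
print(epsilon_for_advantage(0.5, 0.25))   # ln 3: posterior odds 3 vs prior odds 1
```

With a prior of 75%, any slider value of 25% or more corresponds to an unbounded ε in this sketch, which matches the observation that such a setting imposes no real restriction.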