SQL combined sensitivity analyser has a a user-facing frontend application that allows to extend models by attaching SQL scripts to its elements. SQL queries are added to tasks and SQL database information added to data objects. SQL information is attached by adding specific labels into the XML code of the model. The editor uses SQL combined sensitivity analysis tool to perform an analyze on the extended model to combine and present the results. Editor and analysis tool have separate codebases, but they are both required to use the full functionality of the analyser. Communication between the two components is arranged by backend REST service.
Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyser computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε.
SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/folders lists of frontend - link “Open in SQL CS editor”. Different capabilities of the analyser are also demostrated on a particular example model.
Clicking on tasks or data objects opens a menu on the right side of the page (in sidebar). You can add SQL scripts (in a form of stored procedures) to tasks.
Clicking on a task opens two window tabs on the right side of the page (in sidebar): Output table schema and Output table query.
The tab Output table schema expects schema of the table that comes out as the result of computation in the given task. The schema description comes in SQL language in form of a CREATE TABLE statement.
CREATE TABLE t ( var_1 type_1, ... var_n type_n );
The analyzer supports all data types that the underlying PSQL engine recognizes. However, only those starting with “int” and “float” are interpreted as numerical. All the other data types are treated as strings.
The tab Output table query expects an SQL SELECT-statement as an input. The output of this statement should be compatible with the schema described in Output table schema. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the last task (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.
SELECT table_i1.attr_1 AS group_1, ... table_ik.attr_k AS group_k, query_expression FROM table_1 AS alias_1, ... table_n AS alias_n WHERE condition_1 AND ... condition_n GROUP BY table_i1.attr_1, ... table_ik.attr_k
All the intermediate tasks (the ones not followed by a end event) may as well contain SELECT-statements without aggregations.
Clicking on a data object opens several window tabs on the right side of the page (in sidebar) for entering table schema, contents, and norm information.
The window tab Table schema contains a schema description in SQL language in form of a CREATE TABLE statement, similarly to Output table schema of a task object.
CREATE TABLE t ( var_1 type_1, ... var_n type_n );
The window tab Table data initially contains an empty table object where rows are labeled with numbers and columns with letters. More rows and columns can be added or deleted by right clicking on the table and choosing appropriate action from the menu.
The labels are immutable. Hence, the actual table column headings should be inserted into the first editable row. Use the same attribute names that are defined in the table schema.
The goal of differential privacy is to conceal the fact that a table has been modified by a unit change, e.g. adding or removing one row. The window tab Table norm tells what this unit is.
The simplest way is to tell which rows and which columns are sensitive, so DP will take care of changes in these particular cells.
rows: i_1 i_2 ... i_n ; cols: attr_1 attr_2 ... attr_n ;
While columns are defined by the names of corresponding attributes, the rows are indexed by the order they are present in the data table, starting from 0. If all rows or columns are sensitive, the keyword
all can be used instead of listing all of them. If no rows or columns are sensitive, use the keyword
float data, the default distance is defined as the numeric difference in value of the corresponding cells. For
bool data, the distance between two cells is 1 iff they contain different data. The distance between two tables is the sum of differences of all their cells.
It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at analyser advanced settings.
Instead of (or in addition to) tracking changes in certain cells, we may also use a more common definition of DP, considering addition and removal of rows. For this, we need to define the cost of adding or removing a row, defined by a variable G. It assumes that all rows are sensitive, so we need to mark
rows: all ; to make it work properly.
rows: all ; cols: none ; G: 1 ;
Clicking on Analyze button opens a menu entitled Analysis settings on the right side of the page (in sidebar). There you can adjust “Privacy level ε” and “Smoothness level β” parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much.
Clicking on a sub-button “Attacker Settings” opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible. These settings are essential for GROUP BY queries, defining the total number of possible groups.
table_1.attr_1 range lower_bound upper_bound ; ... table_n.attr_n set value_1 ... value_n ;
Click on Run analysis button to run analysis. The results (entitled Analysis results) appear in the sidebar as well. The result is given for each of the input tables, and it consists of the following components.