User Tools

Site Tools


sql-derivative-sensitivity-analyser

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
sql-derivative-sensitivity-analyser [2019/05/31 17:38]
alisa [Quick guide]
sql-derivative-sensitivity-analyser [2019/10/01 17:22] (current)
alisa [Description]
Line 5: Line 5:
 ===== Description ===== ===== Description =====
  
-Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyser ​computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε.+Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyzer ​computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε.
 ===== Quick guide ===== ===== Quick guide =====
  
-SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/​folders lists of frontend - link "Open in SQL CS editor"​. Different capabilities of the analyser ​are also demostrated ​on a particular [[sql-derivative-sensitivity-analyser_demo|example model]]+SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/​folders lists of frontend - link "Open in SQL CS editor"​. Different capabilities of the analyzer ​are also demonstrated ​on a particular [[sql-derivative-sensitivity-analyser_demo|example model]].
- +
-=== Adding information to model elements for analysis === +
- +
-Clicking on tasks or data objects opens a menu on the right side of the page (in sidebar). You can add SQL scripts (in a form of stored procedures) to tasks.+
  
 === Adding SQL scripts === === Adding SQL scripts ===
Line 18: Line 14:
 Clicking on a task opens two window tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**. Clicking on a task opens two window tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**.
  
-The tab **Output table schema** expects ​the schema of the table that comes out as the result of computation in the given task.+The tab **Output table schema** expects schema of the table that comes out as the result of computation in the given task. The schema description comes in SQL language in form of a CREATE TABLE statement.
  
 +<​code>​
 +CREATE TABLE t (
 +  var_1 type_1,
 +  ...
 +  var_n type_n
 +);
 +</​code>​
  
-The tab **Output table query** expects an SQL SELECT-statement as an input. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.+The analyzer supports all data types that the underlying PSQL engine recognizes. However, only those starting with "​int"​ and "​float"​ are interpreted as numerical. All the other data types are treated as strings. 
 + 
 +The tab **Output table query** expects an SQL SELECT-statement as an input. The output of this statement should be compatible with the schema described in **Output table schema**. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.
  
 <​code>​ <​code>​
 SELECT SELECT
-    table_i1.attr_1 AS group_1, 
-    ... 
-    table_ik.attr_k AS group_k, 
     query_expression     query_expression
 FROM FROM
Line 34: Line 36:
     table_n AS alias_n     table_n AS alias_n
 WHERE WHERE
-    ​condition_1 AND +    ​condition 
-    ... +;
-    condition_n +
-GROUP BY +
-    table_i1.attr_1,​ +
-    ... +
-    table_ik.attr_k+
 </​code>​ </​code>​
  
Line 51: Line 48:
 == Table schema == == Table schema ==
  
-The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement.+The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement, similarly to **Output table schema** of a task object.
  
 <​code>​ <​code>​
Line 60: Line 57:
 ); );
 </​code>​ </​code>​
-Currently supported data types are ''​int4'',​ ''​int8'',​ ''​float4'',​ ''​float8'',​ ''​text'',​ ''​bool''​. 
  
 == Table data == == Table data ==
Line 69: Line 65:
  
 ^    ^ A ^ B     ^ C        ^ ^    ^ A ^ B     ^ C        ^
-^ 1  | attr_1attr_2 ​ ​| ​attr_3|+^ 1  | var_1var_2  ​| ​var_3|
 ^ 2  | x11 | x21  | x31      | ^ 2  | x11 | x21  | x31      |
 ^ 3  | x12 | x22  | x32      | ^ 3  | x12 | x22  | x32      |
Line 91: Line 87:
 It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]. It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]].
  
-Instead of (or in addition to) tracking changes in certain cells, we may also use a more common definition of DP, considering addition and removal of rows. For this, we need to define the cost of adding or removing a row, defined by a variable G. It assumes that all rows are sensitive, so we need to mark ''​rows:​ all ;''​ to make it work properly.+=== Running analysis ===
  
-<​code>​ +Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. For simplicity, set β radio button to //off//, as it does not affect privacy and is needed for noise optimization.
-rows: all ; +
-cols: none ; +
-G: 1 ; +
-</code>+
  
-=== Running analysis ===+The slider //​Confidence level of estimated noise//, which ranges between 0% and 100%, fixes the probability with which the noise stays below reported bound. The larger is confidence, the larger is the noise. For 100% confidence, the noise would be infinite, as the noise distribution is unbounded.
  
-Clicking on //​Analyze// ​button ​opens a menu entitled ​//Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy ​level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting ​in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much.+Sub-button //Table constraints// allows to add information that may be helpful for getting better noise level. The details are given in [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]].
  
-Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible.+{{cs_settings.png}}
  
-<​code>​ +Click on //Run analysis// button to run analysisThe results (entitled //Analysis results//) appear in the sidebar as wellThe result is given for each of the input tables, and it consists of the following components.
-table_1.attr_1 range lower_bound upper_bound ; +
-..+
-table_n.attr_n range lower_bound upper_bound ; +
-</​code>​+
  
-Click on //Run analysis// button to run analysisThe results (entitled //Analysis results//) appear ​in the sidebar as wellThe result is given for each of the input tables, and it consists of the following components.+  * **beta-smooth sensitivity** is the β-smooth sensitivity of the query w.r.t. the distance defined ​in the table norm. 
 +  * **actual outputs y** are the true outputs ​of the querywithout noise. 
 +  * **p%-noise magnitude a** is the additive noise magnitude, i.e. the noise stays below this quantity with probability p%. 
 +  * **p%-realtive error |a|/|y|** is the quotient of the additive noise and the query output. If there are several outputs, ​it is the quotient of corresponding vector norms.
  
-  * **Combined sensitivity** ​is derivative sensitivity of the query w.r.t. the distance defined in the table norm+Clicking //View more// provides more information. It tells which value of β was finally used (which can be important if β was optimized automatically),​ what is the underlying ​norm, and how much noise we would get using Laplace ​noise distribution (by default, the noise level is given for Cauchy distribution).
-  * **Additive noise** is the noise magnitudeequal to the required scaling of the noise added to the actual output. Since the noise comes from an unbounded ​distribution, we cannot give an upper bound for it, but with high probability ​(in our caseca 80%) it stays below the noise magnitude. +
-  * **Query output** ​is the actual output, without adding any noise. +
-  * **Relative error (additive noise / query output)** is the quotient of the additive noise and the query output. It shows how far the differentially private result gets from the actual result.+
 ===== Source code ===== ===== Source code =====
  
 The source code of SQL combined sensitivity editor is available at [[https://​github.com/​pleak-tools/​pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://​github.com/​pleak-tools/​pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]]. The source code of SQL combined sensitivity editor is available at [[https://​github.com/​pleak-tools/​pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://​github.com/​pleak-tools/​pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]].
sql-derivative-sensitivity-analyser.1559313526.txt.gz · Last modified: 2019/10/01 13:53 (external edit)