User Tools

Site Tools


sql-derivative-sensitivity-analyser

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
sql-derivative-sensitivity-analyser [2018/11/27 10:00]
alisa [Description]
sql-derivative-sensitivity-analyser [2019/10/01 17:22] (current)
alisa [Description]
Line 5: Line 5:
 ===== Description ===== ===== Description =====
  
-Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyser ​computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε.+Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyzer ​computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε.
 ===== Quick guide ===== ===== Quick guide =====
  
-SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/​folders lists of frontend - link "Open in SQL CS editor"​. Different capabilities of the analyser ​are also demostrated ​on a particular [[sql-derivative-sensitivity-analyser_demo|example model]].+SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/​folders lists of frontend - link "Open in SQL CS editor"​. Different capabilities of the analyzer ​are also demonstrated ​on a particular [[sql-derivative-sensitivity-analyser_demo|example model]].
  
-=== Adding ​information to model elements for analysis ​===+=== Adding ​SQL scripts ​===
  
-Clicking on tasks or data objects ​opens a menu on the right side of the page (in sidebar). You can add SQL scripts (in a form of stored procedures) to tasks.+Clicking on a task opens two window tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**.
  
-=== Adding ​SQL scripts ===+The tab **Output table schema** expects schema of the table that comes out as the result of computation in the given task. The schema description comes in SQL language in form of a CREATE TABLE statement.
  
-Clicking on a task opens a window tab **Query input** on the right side of the page (in sidebar) for entering an SQL script. Currently, the script should be in form of a stored procedure. 
 <​code>​ <​code>​
-CREATE ​OR REPLACE FUNCTION f() RETURNS ​TABLE (+CREATE TABLE (
   var_1 type_1,   var_1 type_1,
   ...   ...
-  ​var_2 type_2 +  ​var_n type_n 
-as $$ +)
 +</​code>​
  
 +The analyzer supports all data types that the underlying PSQL engine recognizes. However, only those starting with "​int"​ and "​float"​ are interpreted as numerical. All the other data types are treated as strings.
 +
 +The tab **Output table query** expects an SQL SELECT-statement as an input. The output of this statement should be compatible with the schema described in **Output table schema**. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.
 +
 +<​code>​
 SELECT SELECT
     query_expression     query_expression
Line 31: Line 36:
     table_n AS alias_n     table_n AS alias_n
 WHERE WHERE
-    ​condition_1 AND +    ​condition 
-    ... +;
-    condition_n +
-$$ language SQL;+
 </​code>​ </​code>​
  
-The goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow. Hence, ​the **last task** (the one followed by a end event) ​should return exactly one numerical value, i.e the query_expression of that task should be an aggregation. Currently supported aggregations are COUNT, SUM, MIN, MAX. Conversely, all the **intermediate tasks** should not contain aggregations, as they are not supported yet.+All the **intermediate tasks** (the ones not followed by a end event) ​may as well contain ​SELECT-statements without ​aggregations.
  
 === Adding table data === === Adding table data ===
Line 45: Line 48:
 == Table schema == == Table schema ==
  
-The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement.+The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement, similarly to **Output table schema** of a task object.
  
 <​code>​ <​code>​
Line 54: Line 57:
 ); );
 </​code>​ </​code>​
-Currently supported data types are ''​int4'',​ ''​int8'',​ ''​float4'',​ ''​float8'',​ ''​text'',​ ''​bool''​. 
  
 == Table data == == Table data ==
Line 63: Line 65:
  
 ^    ^ A ^ B     ^ C        ^ ^    ^ A ^ B     ^ C        ^
-^ 1  | attr_1attr_2 ​ ​| ​attr_3|+^ 1  | var_1var_2  ​| ​var_3|
 ^ 2  | x11 | x21  | x31      | ^ 2  | x11 | x21  | x31      |
 ^ 3  | x12 | x22  | x32      | ^ 3  | x12 | x22  | x32      |
Line 84: Line 86:
  
 It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]. It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]].
- 
-Instead of (or in addition to) tracking changes in certain cells, we may also use a more common definition of DP, considering addition and removal of rows. For this, we need to define the cost of adding or removing a row, defined by a variable G. It assumes that all rows are sensitive, so we need to mark ''​rows:​ all ;''​ to make it work properly. 
- 
-<​code>​ 
-rows: all ; 
-cols: none ; 
-G: 1 ; 
-</​code>​ 
  
 === Running analysis === === Running analysis ===
  
-Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. ​The variable ​β is an optimization parameter that does not affect privacy ​at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much.+Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. ​For simplicity, set β radio button to //off//, as it does not affect privacy ​and is needed for noise optimization.
  
-Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce ​the noise or even turn an infeasible result into feasible.+The slider //​Confidence level of estimated noise//, which ranges between 0% and 100%, fixes the probability with which the noise stays below reported bound. The larger is confidence, the larger is the noise. For 100% confidence, the noise would be infinite, as the noise distribution is unbounded.
  
-<​code>​ +Sub-button //Table constraints//​ allows to add information that may be helpful for getting better noise levelThe details are given in [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]
-table_1.attr_1 range lower_bound upper_bound ; + 
-... +{{cs_settings.png}}
-table_n.attr_n range lower_bound upper_bound ; +
-</​code>​+
  
 Click on //Run analysis// button to run analysis. The results (entitled //Analysis results//) appear in the sidebar as well. The result is given for each of the input tables, and it consists of the following components. Click on //Run analysis// button to run analysis. The results (entitled //Analysis results//) appear in the sidebar as well. The result is given for each of the input tables, and it consists of the following components.
  
-  * **Combined ​sensitivity** is derivative ​sensitivity of the query w.r.t. the distance defined in the table norm. +  * **beta-smooth ​sensitivity** is the β-smooth ​sensitivity of the query w.r.t. the distance defined in the table norm. 
-  * **Additive noise** is the noise magnitude, equal to the required scaling ​of the noise added to the actual output. Since the noise comes from an unbounded distributionwe cannot give an upper bound for it, but with high probability (in our case, ca 80%) it stays below the noise magnitude+  * **actual outputs y** are the true outputs ​of the querywithout ​noise. 
-  * **Query output** is the actual outputwithout adding any noise. +  * **p%-noise magnitude a** is the additive noise magnitudei.e. the noise stays below this quantity with probability p%
-  * **Relative ​error (additive noise query output)** is the quotient of the additive noise and the query output. It shows how far the differentially private result gets from the actual result.+  * **p%-realtive ​error |a|/|y|** is the quotient of the additive noise and the query output. If there are several outputs, it is the quotient of corresponding vector norms. 
 + 
 +Clicking //View more// provides more information. It tells which value of β was finally used (which can be important if β was optimized automatically),​ what is the underlying norm, and how much noise we would get using Laplace noise distribution (by default, ​the noise level is given for Cauchy distribution).
 ===== Source code ===== ===== Source code =====
  
 The source code of SQL combined sensitivity editor is available at [[https://​github.com/​pleak-tools/​pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://​github.com/​pleak-tools/​pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]]. The source code of SQL combined sensitivity editor is available at [[https://​github.com/​pleak-tools/​pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://​github.com/​pleak-tools/​pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]].
sql-derivative-sensitivity-analyser.1543305627.txt.gz · Last modified: 2019/10/01 13:53 (external edit)