User Tools

Site Tools


sql-derivative-sensitivity-analyser

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
sql-derivative-sensitivity-analyser [2019/05/31 17:38]
alisa [Quick guide]
sql-derivative-sensitivity-analyser [2019/05/31 17:56]
alisa [Quick guide]
Line 18: Line 18:
 Clicking on a task opens two window tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**. Clicking on a task opens two window tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**.
  
-The tab **Output table schema** expects ​the schema of the table that comes out as the result of computation in the given task.+The tab **Output table schema** expects schema of the table that comes out as the result of computation in the given task. The schema description comes in SQL language in form of a CREATE TABLE statement.
  
 +<​code>​
 +CREATE TABLE t (
 +  var_1 type_1,
 +  ...
 +  var_n type_n
 +);
 +</​code>​
 +
 +The analyzer supports all data types that the underlying PSQL engine recognizes. However, only those starting with "​int"​ and "​float"​ are interpreted as numerical. All the other data types are treated as strings.
  
-The tab **Output table query** expects an SQL SELECT-statement as an input. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.+The tab **Output table query** expects an SQL SELECT-statement as an input. The output of this statement should be compatible with the schema described in **Output table schema**. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.
  
 <​code>​ <​code>​
Line 51: Line 60:
 == Table schema == == Table schema ==
  
-The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement.+The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement, similarly to **Output table schema** of a task object.
  
 <​code>​ <​code>​
Line 60: Line 69:
 ); );
 </​code>​ </​code>​
-Currently supported data types are ''​int4'',​ ''​int8'',​ ''​float4'',​ ''​float8'',​ ''​text'',​ ''​bool''​. 
  
 == Table data == == Table data ==
Line 103: Line 111:
 Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much. Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much.
  
-Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible.+Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible. These settings are essential for GROUP BY queries, defining the total number of possible groups.
  
 <​code>​ <​code>​
 table_1.attr_1 range lower_bound upper_bound ; table_1.attr_1 range lower_bound upper_bound ;
 ... ...
-table_n.attr_n ​range lower_bound upper_bound ​;+table_n.attr_n ​set value_1 ... value_n ​;
 </​code>​ </​code>​
  
sql-derivative-sensitivity-analyser.txt · Last modified: 2019/10/01 17:22 by alisa