User Tools

Site Tools


sql-derivative-sensitivity-analyser

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
sql-derivative-sensitivity-analyser [2018/11/27 10:00]
alisa [Description]
sql-derivative-sensitivity-analyser [2019/05/31 17:56]
alisa [Quick guide]
Line 16: Line 16:
 === Adding SQL scripts === === Adding SQL scripts ===
  
-Clicking on a task opens window ​tab **Query input** ​on the right side of the page (in sidebar) ​for entering an SQL scriptCurrently, ​the script should be in form of a stored procedure.+Clicking on a task opens two window ​tabs on the right side of the page (in sidebar): **Output table schema** and **Output table query**. 
 + 
 +The tab **Output table schema** expects schema of the table that comes out as the result of computation in the given task. The schema description comes in SQL language ​in form of a CREATE TABLE statement. 
 <​code>​ <​code>​
-CREATE ​OR REPLACE FUNCTION f() RETURNS ​TABLE (+CREATE TABLE (
   var_1 type_1,   var_1 type_1,
   ...   ...
-  ​var_2 type_2 +  ​var_n type_n 
-as $$ +)
 +</​code>​
  
 +The analyzer supports all data types that the underlying PSQL engine recognizes. However, only those starting with "​int"​ and "​float"​ are interpreted as numerical. All the other data types are treated as strings.
 +
 +The tab **Output table query** expects an SQL SELECT-statement as an input. The output of this statement should be compatible with the schema described in **Output table schema**. As the goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow, the **last task** (the one followed by a end event) should return a numerical value, i.e the output of that task should be an aggregation. Grouping by a table attribute (but not a complex expression) is allowed. The analyzer expects that the group columns come first, and the aggregation is the last one. Currently supported aggregations are COUNT, SUM, MIN, MAX, and only one aggregation per task is allowed.
 +
 +<​code>​
 SELECT SELECT
 +    table_i1.attr_1 AS group_1,
 +    ...
 +    table_ik.attr_k AS group_k,
     query_expression     query_expression
 FROM FROM
Line 34: Line 46:
     ...     ...
     condition_n     condition_n
-$$ language SQL;+GROUP BY 
 +    table_i1.attr_1,​ 
 +    ... 
 +    table_ik.attr_k
 </​code>​ </​code>​
  
-The goal of CS analyzer is to ensure differential privacy by adding noise to the final output of an SQL workflow. Hence, ​the **last task** (the one followed by a end event) ​should return exactly one numerical value, i.e the query_expression of that task should be an aggregation. Currently supported aggregations are COUNT, SUM, MIN, MAX. Conversely, all the **intermediate tasks** should not contain aggregations, as they are not supported yet.+All the **intermediate tasks** (the ones not followed by a end event) ​may as well contain ​SELECT-statements without ​aggregations.
  
 === Adding table data === === Adding table data ===
Line 45: Line 60:
 == Table schema == == Table schema ==
  
-The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement.+The window tab **Table schema** contains a schema description in SQL language in form of a CREATE TABLE statement, similarly to **Output table schema** of a task object.
  
 <​code>​ <​code>​
Line 54: Line 69:
 ); );
 </​code>​ </​code>​
-Currently supported data types are ''​int4'',​ ''​int8'',​ ''​float4'',​ ''​float8'',​ ''​text'',​ ''​bool''​. 
  
 == Table data == == Table data ==
Line 97: Line 111:
 Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much. Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "​Privacy level ε" and "​Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much.
  
-Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible.+Clicking on a sub-button "​Attacker Settings"​ opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible. These settings are essential for GROUP BY queries, defining the total number of possible groups.
  
 <​code>​ <​code>​
 table_1.attr_1 range lower_bound upper_bound ; table_1.attr_1 range lower_bound upper_bound ;
 ... ...
-table_n.attr_n ​range lower_bound upper_bound ​;+table_n.attr_n ​set value_1 ... value_n ​;
 </​code>​ </​code>​
  
sql-derivative-sensitivity-analyser.txt · Last modified: 2019/10/01 17:22 by alisa