sql-derivative-sensitivity-analyser

This shows you the differences between two versions of the page.

Both sides previous revision Previous revision Next revision | Previous revision | ||

sql-derivative-sensitivity-analyser [2019/05/31 17:56] alisa [Quick guide] |
sql-derivative-sensitivity-analyser [2019/10/01 17:22] (current) alisa [Description] |
||
---|---|---|---|

Line 5: | Line 5: | ||

===== Description ===== | ===== Description ===== | ||

- | Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyser computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε. | + | Sensitivity of a function is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism. The analyzer computes sensitivity of an SQL workflow and reports the amount of noise that we need to tolerate to achieve ε-differential privacy for desired ε. |

===== Quick guide ===== | ===== Quick guide ===== | ||

- | SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/folders lists of frontend - link "Open in SQL CS editor". Different capabilities of the analyser are also demostrated on a particular [[sql-derivative-sensitivity-analyser_demo|example model]]. | + | SQL combined sensitivity analyzer is accessible through Actions menu (with burger-menu icon) under each model in own and shared models/folders lists of frontend - link "Open in SQL CS editor". Different capabilities of the analyzer are also demonstrated on a particular [[sql-derivative-sensitivity-analyser_demo|example model]]. |

- | | + | |

- | === Adding information to model elements for analysis === | + | |

- | | + | |

- | Clicking on tasks or data objects opens a menu on the right side of the page (in sidebar). You can add SQL scripts (in a form of stored procedures) to tasks. | + | |

=== Adding SQL scripts === | === Adding SQL scripts === | ||

Line 34: | Line 30: | ||

<code> | <code> | ||

SELECT | SELECT | ||

- | table_i1.attr_1 AS group_1, | ||

- | ... | ||

- | table_ik.attr_k AS group_k, | ||

query_expression | query_expression | ||

FROM | FROM | ||

Line 43: | Line 36: | ||

table_n AS alias_n | table_n AS alias_n | ||

WHERE | WHERE | ||

- | condition_1 AND | + | condition |

- | ... | + | ; |

- | condition_n | + | |

- | GROUP BY | + | |

- | table_i1.attr_1, | + | |

- | ... | + | |

- | table_ik.attr_k | + | |

</code> | </code> | ||

Line 77: | Line 65: | ||

^ ^ A ^ B ^ C ^ | ^ ^ A ^ B ^ C ^ | ||

- | ^ 1 | attr_1| attr_2 | attr_3| | + | ^ 1 | var_1| var_2 | var_3| |

^ 2 | x11 | x21 | x31 | | ^ 2 | x11 | x21 | x31 | | ||

^ 3 | x12 | x22 | x32 | | ^ 3 | x12 | x22 | x32 | | ||

Line 99: | Line 87: | ||

It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]. | It is possible to define more complicated distances for tables, e.g. scaling them, so that one attribute is considered more sensitive than the other one. The details can be found at [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]. | ||

- | Instead of (or in addition to) tracking changes in certain cells, we may also use a more common definition of DP, considering addition and removal of rows. For this, we need to define the cost of adding or removing a row, defined by a variable G. It assumes that all rows are sensitive, so we need to mark ''rows: all ;'' to make it work properly. | + | === Running analysis === |

- | <code> | + | Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "Privacy level ε" and "Smoothness level β" parameters. Simply put, smaller ε means more privacy. For simplicity, set β radio button to //off//, as it does not affect privacy and is needed for noise optimization. |

- | rows: all ; | + | |

- | cols: none ; | + | |

- | G: 1 ; | + | |

- | </code> | + | |

- | === Running analysis === | + | The slider //Confidence level of estimated noise//, which ranges between 0% and 100%, fixes the probability with which the noise stays below reported bound. The larger is confidence, the larger is the noise. For 100% confidence, the noise would be infinite, as the noise distribution is unbounded. |

- | Clicking on //Analyze// button opens a menu entitled //Analysis settings// on the right side of the page (in sidebar). There you can adjust "Privacy level ε" and "Smoothness level β" parameters. Simply put, smaller ε means more privacy. The variable β is an optimization parameter that does not affect privacy at all, but is related to the noise lever. In general, smaller β means less noise, but not all values of β will work for a particular query, resulting in an error. The user may choose not to fix β and let the analyzer find a suitable β itself (which depends on ε). The analysis takes more time, but the obtained β can be reused afterwards, and still be reasonable if the query or the data does not change too much. | + | Sub-button //Table constraints// allows to add information that may be helpful for getting better noise level. The details are given in [[sql-derivative-sensitivity-analyser_advanced|analyser advanced settings]]. |

- | Clicking on a sub-button "Attacker Settings" opens a window tab where the user can define known bounds on data, which may reduce the noise or even turn an infeasible result into feasible. These settings are essential for GROUP BY queries, defining the total number of possible groups. | + | {{cs_settings.png}} |

- | <code> | + | Click on //Run analysis// button to run analysis. The results (entitled //Analysis results//) appear in the sidebar as well. The result is given for each of the input tables, and it consists of the following components. |

- | table_1.attr_1 range lower_bound upper_bound ; | + | |

- | ... | + | |

- | table_n.attr_n set value_1 ... value_n ; | + | |

- | </code> | + | |

- | Click on //Run analysis// button to run analysis. The results (entitled //Analysis results//) appear in the sidebar as well. The result is given for each of the input tables, and it consists of the following components. | + | * **beta-smooth sensitivity** is the β-smooth sensitivity of the query w.r.t. the distance defined in the table norm. |

+ | * **actual outputs y** are the true outputs of the query, without noise. | ||

+ | * **p%-noise magnitude a** is the additive noise magnitude, i.e. the noise stays below this quantity with probability p%. | ||

+ | * **p%-realtive error |a|/|y|** is the quotient of the additive noise and the query output. If there are several outputs, it is the quotient of corresponding vector norms. | ||

- | * **Combined sensitivity** is derivative sensitivity of the query w.r.t. the distance defined in the table norm. | + | Clicking //View more// provides more information. It tells which value of β was finally used (which can be important if β was optimized automatically), what is the underlying norm, and how much noise we would get using Laplace noise distribution (by default, the noise level is given for Cauchy distribution). |

- | * **Additive noise** is the noise magnitude, equal to the required scaling of the noise added to the actual output. Since the noise comes from an unbounded distribution, we cannot give an upper bound for it, but with high probability (in our case, ca 80%) it stays below the noise magnitude. | + | |

- | * **Query output** is the actual output, without adding any noise. | + | |

- | * **Relative error (additive noise / query output)** is the quotient of the additive noise and the query output. It shows how far the differentially private result gets from the actual result. | + | |

===== Source code ===== | ===== Source code ===== | ||

The source code of SQL combined sensitivity editor is available at [[https://github.com/pleak-tools/pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://github.com/pleak-tools/pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]]. | The source code of SQL combined sensitivity editor is available at [[https://github.com/pleak-tools/pleak-combined-sensitivity-editor|pleak-sql-combined-sensitivity-editor]] and the source code of SQL combined sensitivity analysis tools at [[https://github.com/pleak-tools/pleak-sql-analysis|pleak-sql-analysis]] repositories. Installation details can be found at [[sql-derivative-sensitivity-analyser_install|analyser installation guide]]. |

sql-derivative-sensitivity-analyser.1559314618.txt.gz · Last modified: 2019/10/01 13:53 (external edit)