TY - JOUR
T1 - PreFair
T2 - 49th International Conference on Very Large Data Bases, VLDB 2023
AU - Pujol, David
AU - Gilad, Amir
AU - Machanavajjhala, Ashwin
N1 - Publisher Copyright:
© 2023, VLDB Endowment. All rights reserved.
PY - 2023
Y1 - 2023
N2 - When a database is protected by Differential Privacy (DP), its us-ability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
AB - When a database is protected by Differential Privacy (DP), its us-ability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
UR - http://www.scopus.com/inward/record.url?scp=85152888592&partnerID=8YFLogxK
U2 - 10.14778/3583140.3583168
DO - 10.14778/3583140.3583168
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.conferencearticle???
AN - SCOPUS:85152888592
SN - 2150-8097
VL - 16
SP - 1573
EP - 1586
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 6
Y2 - 28 August 2023 through 1 September 2023
ER -