TY - JOUR
T1 - Validation Assessment of Privacy-Preserving Synthetic Electronic Health Record Data
T2 - Comparison of Original Versus Synthetic Data on Real-World COVID-19 Vaccine Effectiveness
AU - Wang, Echo
AU - Mott, Katrina
AU - Zhang, Hongtao
AU - Gazit, Sivan
AU - Chodick, Gabriel
AU - Burcu, Mehmet
N1 - Publisher Copyright: © 2024 John Wiley & Sons Ltd.
PY - 2024/10
Y1 - 2024/10
N2 - Purpose: To assess the validity of privacy-preserving synthetic data by comparing results from synthetic versus original EHR data analysis. Methods: A published retrospective cohort study on real-world effectiveness of COVID-19 vaccines by Maccabi Healthcare Services in Israel was replicated using synthetic data generated from the same source, and the results were compared between synthetic and original datasets. The endpoints included COVID-19 infection, symptomatic COVID-19 infection, and hospitalization due to infection, and were also assessed in several demographic and clinical subgroups. In comparing synthetic versus original data estimates, several metrics were utilized: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of results. Results: The distributions of demographic and clinical characteristics demonstrated very small differences (< 0.01 SMD). In the comparison of vaccine effectiveness, assessed as relative risk reduction, between synthetic and original data, there was 100% decision agreement, 100% estimate agreement, and a high level of confidence interval overlap (88.7%–99.7%) in all five replicates across all subgroups. Similar findings were achieved in the assessment of vaccine effectiveness against symptomatic COVID-19 infection. In the comparison of hazard ratios for COVID-19-related hospitalization and odds ratios for symptomatic COVID-19 infection, the Wald tests suggested no significant difference between respective effect estimates in all five replicates for all patient subgroups, but there were disagreements in estimate and decision metrics in some subgroups and replicates. Conclusions: Overall, comparison of synthetic versus original real-world data demonstrated good validity and reliability. Transparency on the process to generate high-fidelity synthetic data and assurances of patient privacy are warranted.
AB - Purpose: To assess the validity of privacy-preserving synthetic data by comparing results from synthetic versus original EHR data analysis. Methods: A published retrospective cohort study on real-world effectiveness of COVID-19 vaccines by Maccabi Healthcare Services in Israel was replicated using synthetic data generated from the same source, and the results were compared between synthetic and original datasets. The endpoints included COVID-19 infection, symptomatic COVID-19 infection, and hospitalization due to infection, and were also assessed in several demographic and clinical subgroups. In comparing synthetic versus original data estimates, several metrics were utilized: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of results. Results: The distributions of demographic and clinical characteristics demonstrated very small differences (< 0.01 SMD). In the comparison of vaccine effectiveness, assessed as relative risk reduction, between synthetic and original data, there was 100% decision agreement, 100% estimate agreement, and a high level of confidence interval overlap (88.7%–99.7%) in all five replicates across all subgroups. Similar findings were achieved in the assessment of vaccine effectiveness against symptomatic COVID-19 infection. In the comparison of hazard ratios for COVID-19-related hospitalization and odds ratios for symptomatic COVID-19 infection, the Wald tests suggested no significant difference between respective effect estimates in all five replicates for all patient subgroups, but there were disagreements in estimate and decision metrics in some subgroups and replicates. Conclusions: Overall, comparison of synthetic versus original real-world data demonstrated good validity and reliability. Transparency on the process to generate high-fidelity synthetic data and assurances of patient privacy are warranted.
KW - data access
KW - data validity/quality
KW - patient privacy
KW - real-world data
KW - real-world evidence
KW - replicability
KW - synthetic data
UR - https://www.scopus.com/pages/publications/85205822520
U2 - 10.1002/pds.70019
DO - 10.1002/pds.70019
M3 - Article
C2 - 39375947
AN - SCOPUS:85205822520
SN - 1053-8569
VL - 33
JO - Pharmacoepidemiology and Drug Safety
JF - Pharmacoepidemiology and Drug Safety
IS - 10
M1 - e70019
ER -