Explanations for Data Repair through Shapley Values

Daniel Deutch, Nave Frost, Amir Gilad, Oren Sheffer

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Data repair, i.e., the identification and fix of errors in the data, is a central component of the Data Science cycle. As such, significant research effort has been devoted to automate the repair process. Yet it still requires significant manual labor by the Data Scientists, tweaking and optimizing repair modules (up to 80% of their time, according to surveys). To this end, we propose in this paper a novel framework for explaining the results of any data repair module. Explanations involve identifying the table cells and database constraints having the strongest influence on the process. Influence, in turn, is quantified through the game-theoretic notion of Shapley values, commonly used for explaining Machine Learning classifier results. The main technical challenge is that exact computation of Shapley values incurs exponential time. We consequently devise and optimize novel approximation algorithms, and analyze them both theoretically and empirically. Our results show the efficiency of our approach when compared to the alternative of adapting existing Shapley value computation techniques to the data repair settings.

Original languageEnglish
Title of host publicationCIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages362-371
Number of pages10
ISBN (Electronic)9781450384469
DOIs
StatePublished - 26 Oct 2021
Event30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 1 Nov 20215 Nov 2021

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Conference

Conference30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/TerritoryAustralia
CityVirtual, Online
Period1/11/215/11/21

Keywords

  • data repair
  • denial constraints
  • explainability
  • shapley value

Fingerprint

Dive into the research topics of 'Explanations for Data Repair through Shapley Values'. Together they form a unique fingerprint.

Cite this