Query-oriented data cleaning with oracles

Moria Bergman, Tova Milo, Slava Novgorodov, Wang Chiew Tan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To overcome the limitations of existing data cleaning techniques, we present QOCO, a novel query-oriented system for cleaning data with oracles. Under this framework, incorrect (resp. missing) tuples are removed from (added to) the result of a query through edits that are applied to the underlying database, where the edits are derived by interacting with domain experts which we model as oracle crowds. We show that the problem of determining minimal interactions with oracle crowds to derive database edits for removing (adding) incorrect (missing) tuples to the result of a query is NPhard in general and present heuristic algorithms that interact with oracle crowds. Finally, we implement our algorithms in our prototype system QOCO and show that it is effective and efficient through a comprehensive suite of experiments.

Original languageEnglish
Title of host publicationSIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages1199-1214
Number of pages16
ISBN (Electronic)9781450327589
DOIs
StatePublished - 27 May 2015
EventACM SIGMOD International Conference on Management of Data, SIGMOD 2015 - Melbourne, Australia
Duration: 31 May 20154 Jun 2015

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
Volume2015-May
ISSN (Print)0730-8078

Conference

ConferenceACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Country/TerritoryAustralia
CityMelbourne
Period31/05/154/06/15

Fingerprint

Dive into the research topics of 'Query-oriented data cleaning with oracles'. Together they form a unique fingerprint.

Cite this