TY - GEN
T1 - Query-oriented data cleaning with oracles
AU - Bergman, Moria
AU - Milo, Tova
AU - Novgorodov, Slava
AU - Tan, Wang Chiew
N1 - Publisher Copyright:
Copyright © 2015 ACM.
PY - 2015/5/27
Y1 - 2015/5/27
N2 - As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To overcome the limitations of existing data cleaning techniques, we present QOCO, a novel query-oriented system for cleaning data with oracles. Under this framework, incorrect (resp. missing) tuples are removed from (added to) the result of a query through edits that are applied to the underlying database, where the edits are derived by interacting with domain experts which we model as oracle crowds. We show that the problem of determining minimal interactions with oracle crowds to derive database edits for removing (adding) incorrect (missing) tuples to the result of a query is NPhard in general and present heuristic algorithms that interact with oracle crowds. Finally, we implement our algorithms in our prototype system QOCO and show that it is effective and efficient through a comprehensive suite of experiments.
AB - As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To overcome the limitations of existing data cleaning techniques, we present QOCO, a novel query-oriented system for cleaning data with oracles. Under this framework, incorrect (resp. missing) tuples are removed from (added to) the result of a query through edits that are applied to the underlying database, where the edits are derived by interacting with domain experts which we model as oracle crowds. We show that the problem of determining minimal interactions with oracle crowds to derive database edits for removing (adding) incorrect (missing) tuples to the result of a query is NPhard in general and present heuristic algorithms that interact with oracle crowds. Finally, we implement our algorithms in our prototype system QOCO and show that it is effective and efficient through a comprehensive suite of experiments.
UR - http://www.scopus.com/inward/record.url?scp=84957565711&partnerID=8YFLogxK
U2 - 10.1145/2723372.2737786
DO - 10.1145/2723372.2737786
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:84957565711
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1199
EP - 1214
BT - SIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Y2 - 31 May 2015 through 4 June 2015
ER -