Reverse-engineering conjunctive queries from provenance examples

Daniel Deutch, Amir Gilad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


The provenance of a query result details relevant parts of the input data as well as the computation leading to each output tuple. Multiple lines of work have studied the tracking and presentation of provenance, showing its effectiveness in explaining and justifying query results. The willingness of application owners to share provenance information for these purposes may, however, be hindered by the resulting exposure of the underlying query logic, which may be proprietary and confidential. We therefore formalize and study the following problem: when a (small) subset of the query results along with their provenance is given, what information is revealed about the underlying query? Our model is based on the provenance semiring framework and applies to many previously proposed provenance models. We analyze two flavors of the problem: (1) how many queries may be consistent with a given provenance example? and (2) what is the complexity of inferring a consistent query, or one that is a “best fit”? Our theoretical analysis shows that there may be many consistent queries (for some models, even infinitely many in the presence of self-joins), yet we provide practically efficient algorithms to find such (best-fit) queries. We experimentally show that the algorithms are generally successful in correctly reverse engineering queries, even when given only a few output examples and their provenance.

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2019
Subtitle of host publication: 22nd International Conference on Extending Database Technology, Proceedings
Editors: Zoi Kaoudi, Melanie Herschel, Carsten Binnig, Berthold Reinwald, Helena Galhardas, Irini Fundulaki
Number of pages: 12
ISBN (Electronic): 9783893180813
State: Published - 2019
Event: 22nd International Conference on Extending Database Technology, EDBT 2019 - Lisbon, Portugal
Duration: 26 Mar 2019 → 29 Mar 2019

Publication series

Name: Advances in Database Technology - EDBT
ISSN (Electronic): 2367-2005


Conference: 22nd International Conference on Extending Database Technology, EDBT 2019


Funders (with funder number where given)
Horizon 2020 Framework Programme: 804302
H2020 European Research Council
European Research Council
Horizon 2020

