Efficient provenance tracking for datalog using top-k queries

Daniel Deutch, Amir Gilad*, Yuval Moskovitch

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, defined here as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm that computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
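To make the notions of derivation tree and top-k ranking concrete, the following is a minimal sketch and not the algorithm proposed in the article. It naively enumerates the derivation trees of a single fact for a two-rule transitive-closure program and keeps the k lowest-weight ones. The edge facts, the rule names `r1` and `r2`, the numeric weights, the additive weight aggregation, and the depth bound are all assumptions made for illustration. The article's algorithm instead computes a compact representation of the top-k trees in polynomial data complexity, and its selection by tree patterns is not modeled here.

```python
import heapq

# Hypothetical database facts edge(X, Y) with illustrative weights
# (lower weight = preferred). These values are assumptions.
EDGES = {
    ("a", "b"): 1,
    ("b", "c"): 1,
    ("a", "c"): 5,
    ("b", "d"): 2,
    ("d", "c"): 1,
}

# Hypothetical weights for the two rules of the program:
#   r1: path(X, Y) :- edge(X, Y).
#   r2: path(X, Y) :- edge(X, Z), path(Z, Y).
RULE_WEIGHT = {"r1": 1, "r2": 2}


def derivations(x, y, depth=4):
    """Yield (weight, tree) for each derivation tree of path(x, y).

    The weight of a tree is the sum of the weights of the rules and
    facts it uses. The depth bound keeps the enumeration finite even
    if the edge relation contains cycles.
    """
    if depth == 0:
        return
    # Rule r1: a single edge derives the path directly.
    if (x, y) in EDGES:
        yield RULE_WEIGHT["r1"] + EDGES[(x, y)], ("r1", f"edge({x},{y})")
    # Rule r2: follow one edge out of x, then derive the remaining path.
    for (u, z), w in EDGES.items():
        if u == x:
            for sub_weight, sub_tree in derivations(z, y, depth - 1):
                tree = ("r2", f"edge({x},{z})", sub_tree)
                yield RULE_WEIGHT["r2"] + w + sub_weight, tree


def top_k(x, y, k):
    """Return the k lowest-weight derivation trees of path(x, y)."""
    return heapq.nsmallest(k, derivations(x, y), key=lambda pair: pair[0])


if __name__ == "__main__":
    for weight, tree in top_k("a", "c", 2):
        print(weight, tree)
```

Calling `top_k("a", "c", 2)` ranks the derivation through `b` ahead of the direct edge, because the direct edge carries a higher assumed weight. The enumeration is exponential in general, which illustrates why materializing full provenance does not scale and a compact top-k representation is attractive.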

Original language: English
Pages (from-to): 245-269
Number of pages: 25
Journal: VLDB Journal
Issue number: 2
State: Published - 1 Apr 2018


Funders / Funder number
Blavatnik Interdisciplinary Cyber Research Center
Israel Science Foundation: 1636/13, 978/17


    • Datalog
    • Provenance
    • Top-K

