Approximated summarization of data provenance

Eleanor Ainy, Pierre Bourhis, Susan B. Davidson, Daniel Deutch, Tova Milo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

32 Scopus citations


Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand how the resulting information was derived. Data provenance has proven helpful in this respect; however, maintaining and presenting the full and exact provenance information may be infeasible due to its size and complexity. We therefore introduce the notion of approximated summarized provenance, which provides a compact representation of the provenance at the possible cost of information loss. Based on this notion, we present a novel provenance summarization algorithm that, guided by the semantics of the underlying data and the intended use of the provenance, outputs a summary of the input provenance. Experiments measure the conciseness and accuracy of the resulting provenance summaries, as well as the improvement in provenance usage time.
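The abstract does not spell out the algorithm, so the following is only a rough illustration of the trade-off it describes (compactness versus information loss), not the authors' method. It assumes that provenance is written as a polynomial over annotations in the style of provenance semirings, that a summary is obtained by mapping several annotations to one shared summary annotation, and that accuracy is measured by comparing evaluations of the original and the summary under all 0/1 valuations. The names `summarize`, `mean_error`, and `U_A`, and the rule that a summary annotation is 1 whenever any annotation mapped to it is 1, are hypothetical choices made for this sketch.

```python
from collections import Counter
from itertools import product

# A provenance polynomial: each monomial is a sorted tuple of annotation
# names, mapped to its integer coefficient.

def summarize(poly, mapping):
    """Rename annotations via `mapping`; monomials that become identical
    are merged and their coefficients added (the summary is more compact)."""
    out = Counter()
    for monomial, coeff in poly.items():
        image = tuple(sorted(mapping.get(a, a) for a in monomial))
        out[image] += coeff
    return out

def evaluate(poly, valuation):
    """Evaluate the polynomial in the counting semiring (natural numbers, +, *)."""
    total = 0
    for monomial, coeff in poly.items():
        term = coeff
        for a in monomial:
            term *= valuation[a]
        total += term
    return total

def mean_error(poly, mapping):
    """Average absolute difference between the original and the summary,
    over every 0/1 valuation of the original annotations."""
    names = sorted({a for monomial in poly for a in monomial})
    summary = summarize(poly, mapping)
    errors = []
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        sv = dict(v)  # unmapped annotations keep their own values
        for a, s in mapping.items():
            if a in v:
                # Hypothetical rule: a summary annotation is 1 if any member is 1.
                sv[s] = max(sv.get(s, 0), v[a])
        errors.append(abs(evaluate(poly, v) - evaluate(summary, sv)))
    return sum(errors) / len(errors)

# Toy provenance: u1*r1 + u2*r1 + u3*r2 (e.g. users u* contributing via r*).
poly = Counter({("r1", "u1"): 1, ("r1", "u2"): 1, ("r2", "u3"): 1})
mapping = {"u1": "U_A", "u2": "U_A"}  # group u1 and u2 into one summary annotation
summary = summarize(poly, mapping)
print(len(poly), len(summary))    # 3 2
print(mean_error(poly, mapping))  # 0.25
```

Grouping `u1` and `u2` shrinks the polynomial from three monomials to two, at the price of an error whenever exactly one of them is 1 and `r1` is 1 (8 of the 32 valuations). A summarization algorithm in this spirit would search over such mappings, trading size against an error measure chosen for the intended use of the provenance.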

Original language: English
Title of host publication: CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Number of pages: 10
ISBN (Electronic): 9781450337946
State: Published - 17 Oct 2015
Event: 24th ACM International Conference on Information and Knowledge Management, CIKM 2015 - Melbourne, Australia
Duration: 19 Oct 2015 - 23 Oct 2015

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings


Conference: 24th ACM International Conference on Information and Knowledge Management, CIKM 2015


  • Crowd-sourcing applications
  • Provenance
  • Provisioning

