TY - JOUR
T1 - Putting Lipstick on Pig
T2 - Enabling database-style workflow provenance
AU - Amsterdamer, Yael
AU - Davidson, Susan B.
AU - Deutch, Daniel
AU - Milo, Tova
AU - Stoyanovich, Julia
AU - Tannen, Val
PY - 2011/12
Y1 - 2011/12
N2 - Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing one to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
AB - Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing one to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
UR - http://www.scopus.com/inward/record.url?scp=84863479950&partnerID=8YFLogxK
U2 - 10.14778/2095686.2095693
DO - 10.14778/2095686.2095693
M3 - Article
AN - SCOPUS:84863479950
SN - 2150-8097
VL - 5
SP - 346
EP - 357
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 4
ER -