Verifying equivalence of spark programs

Shelly Grossman*, Sara Cohen, Shachar Itzhaky, Noam Rinetzky, Mooly Sagiv

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Apache Spark is a popular framework for writing large scale data processing applications. Our long term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDF s). In this paper, we present a novel SMT-based technique for verifying the equivalence of Spark programs. We model Spark as a programming language whose semantics imitates Relational Algebra queries (with aggregations) over bags (multisets) and allows for UDFs expressible in Presburger Arithmetics. We prove that the problem of checking equivalence is undecidable even for programs which use a single aggregation operator. Thus, we present sound techniques for verifying the equivalence of interesting classes of Spark programs, and show that it is complete under certain restrictions. We implemented our technique, and applied it to a few small, but intricate, test cases.

Original languageEnglish
Title of host publicationComputer Aided Verification - 29th International Conference, CAV 2017, Proceedings
EditorsViktor Kuncak, Rupak Majumdar
PublisherSpringer Verlag
Pages282-300
Number of pages19
ISBN (Print)9783319633893
DOIs
StatePublished - 2017
Event29th International Conference on Computer Aided Verification, CAV 2017 - Heidelberg, Germany
Duration: 24 Jul 201728 Jul 2017

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10427 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference29th International Conference on Computer Aided Verification, CAV 2017
Country/TerritoryGermany
CityHeidelberg
Period24/07/1728/07/17

Fingerprint

Dive into the research topics of 'Verifying equivalence of spark programs'. Together they form a unique fingerprint.

Cite this