Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning

Ori Bar El, Tova Milo, Amit Somech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

51 Scopus citations

Abstract

Exploratory Data Analysis (EDA) is an essential yet highly demanding task. To get a head start before exploring a new dataset, data scientists often prefer to view existing EDA notebooks - illustrative, curated exploratory sessions, on the same dataset, that were created by fellow data scientists who shared them online. Unfortunately, such notebooks are not always available (e.g., if the dataset is new or confidential). To address this, we present ATENA, a system that takes an input dataset and auto-generates a compelling exploratory session, presented in an EDA notebook. We shape EDA into a control problem, and devise a novel Deep Reinforcement Learning (DRL) architecture to effectively optimize the notebook generation. Though ATENA uses a limited set of EDA operations, our experiments show that it generates useful EDA notebooks, allowing users to gain actual insights.
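To make the "EDA as a control problem" framing concrete, here is a minimal sketch of the idea: the state is the current display of the data, an action is a single EDA operation, and a reward scores the resulting display. Everything in it is an assumption made for illustration and is not taken from ATENA itself: the toy rows, the three-operation action set (`filter`, `group`, `back`), the novelty-based placeholder reward, and the names `ToyEDAEnv` and `rollout`. A uniformly random policy stands in for a trained DRL agent.

```python
import random
from collections import Counter

# Hypothetical toy dataset; the rows and column names are illustrative only.
ROWS = [
    {"airline": "AA", "delay": 5},
    {"airline": "AA", "delay": 40},
    {"airline": "DL", "delay": 0},
    {"airline": "DL", "delay": 55},
    {"airline": "UA", "delay": 30},
]

# Assumed, deliberately small action set (not taken from the paper).
ACTIONS = [("group", "airline"), ("filter", "delay", 30), ("back",)]


class ToyEDAEnv:
    """State = current display; action = one EDA operation; reward = novelty."""

    def __init__(self, rows, max_steps=4):
        self.rows = rows
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # A display is (indices of visible rows, group-by column or None).
        start = (tuple(range(len(self.rows))), None)
        self.stack = [start]   # display history, used by "back"
        self.seen = {start}    # displays already shown in this session
        self.steps = 0
        return start

    def step(self, action):
        visible, group_col = self.stack[-1]
        if action[0] == "filter":
            _, col, threshold = action
            visible = tuple(i for i in visible if self.rows[i][col] >= threshold)
            self.stack.append((visible, group_col))
        elif action[0] == "group":
            self.stack.append((visible, action[1]))
        elif action[0] == "back" and len(self.stack) > 1:
            self.stack.pop()
        display = self.stack[-1]
        # Placeholder reward: 1 for a new, non-empty display, else 0.
        reward = 1.0 if display not in self.seen and display[0] else 0.0
        self.seen.add(display)
        self.steps += 1
        return display, reward, self.steps >= self.max_steps

    def render(self, display):
        visible, group_col = display
        if group_col is None:
            return [self.rows[i] for i in visible]
        return dict(Counter(self.rows[i][group_col] for i in visible))


def rollout(env, policy):
    """Run one episode and return the session as (action, rendered display) pairs."""
    state, done, session = env.reset(), False, []
    while not done:
        action = policy(state)
        state, _, done = env.step(action)
        session.append((action, env.render(state)))
    return session


if __name__ == "__main__":
    rng = random.Random(0)
    # A uniformly random policy stands in for a trained agent.
    for action, view in rollout(ToyEDAEnv(ROWS), lambda s: rng.choice(ACTIONS)):
        print(action, view)
```

The list returned by `rollout` plays the role of a generated notebook: each entry pairs an operation with the display it produced. In a learned setting, the random policy would be replaced by an agent trained to maximize the cumulative reward over the session.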

Original language: English
Title of host publication: SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1527-1537
Number of pages: 11
ISBN (Electronic): 9781450367356
State: Published - 14 Jun 2020
Event: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020 - Portland, United States
Duration: 14 Jun 2020 → 19 Jun 2020

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Country/Territory: United States
City: Portland
Period: 14/06/20 → 19/06/20

Funding

Funders
Israel Innovation Authority
MDM
US-Israel Science Foundation
Israel Science Foundation

Keywords

• EDA
• EDA notebooks
• auto EDA
• auto generated
• autogenerated
• data exploration
• interactive data analysis
• notebooks
