TY - GEN
T1 - ATENA-PRO
T2 - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
AU - Lipman, Tavor
AU - Milo, Tova
AU - Somech, Amit
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/6/4
Y1 - 2023/6/4
AB - One of the most common, helpful practices of data scientists, when starting the exploration of a given dataset, is to examine existing data exploration notebooks prepared by other data analysts or scientists. These notebooks contain curated sessions of contextually related query operations that together demonstrate interesting hypotheses and conjectures on the data. Unfortunately, relevant notebooks of this kind, prepared on the same dataset and in light of the same analysis task, are often nonexistent or unavailable. In this work, we describe ATENA-PRO, a framework for auto-generating such relevant, personalized exploratory sessions. Using a novel specification language, users first describe their desired output notebook. Our language contains dedicated constructs for contextually connecting future output queries. These specifications are then used as input for a Deep Reinforcement Learning (DRL) engine, which auto-generates the personalized notebook. Our DRL engine relies on an existing, general-purpose DRL framework for data exploration. However, augmenting the generic framework with user specifications requires overcoming a difficult sparsity challenge, as only a small portion of the possible sessions may be compliant with the specifications. Inspired by solutions for constrained reinforcement learning, we devise a compound, flexible reward scheme as well as a specification-aware neural network architecture. Our experimental evaluation shows that the combination of these components allows ATENA-PRO to consistently generate interesting, personalized exploration sessions for various analysis tasks and datasets.
KW - AI for data analytics
KW - automated data exploration
UR - http://www.scopus.com/inward/record.url?scp=85162871848&partnerID=8YFLogxK
U2 - 10.1145/3555041.3589727
DO - 10.1145/3555041.3589727
M3 - Conference contribution
AN - SCOPUS:85162871848
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 167
EP - 170
BT - SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 18 June 2023 through 23 June 2023
ER -