TY - GEN
T1 - Interactive Coreset Selection for Tabular Data
T2 - 2025 Workshop on Human-In-the-Loop Data Analytics, HILDA 2025, Co-located with SIGMOD 2025
AU - Hadar, Aviv
AU - Milo, Tova
AU - Razmadze, Kathy
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/7/8
Y1 - 2025/7/8
N2 - We present a human-centric extension of CoreTab, a novel coreset algorithm for tabular data recently accepted to VLDB 2025. A coreset is a compact, representative subset of a dataset that approximates full-data training performance while greatly reducing computational cost. While CoreTab already achieves state-of-the-art accuracy and efficiency, this demonstration focuses on its interactive components that let users audit, adjust, and guide data selection, addressing real-world concerns such as fairness and representativeness. CoreTab is the first coreset method with built-in explainability, offering a decision-tree-based view of which data regions were included or excluded. It also introduces the first human-in-the-loop interface for coreset refinement. We highlight design insights and open challenges for building transparent and responsible data sampling workflows.
AB - We present a human-centric extension of CoreTab, a novel coreset algorithm for tabular data recently accepted to VLDB 2025. A coreset is a compact, representative subset of a dataset that approximates full-data training performance while greatly reducing computational cost. While CoreTab already achieves state-of-the-art accuracy and efficiency, this demonstration focuses on its interactive components that let users audit, adjust, and guide data selection, addressing real-world concerns such as fairness and representativeness. CoreTab is the first coreset method with built-in explainability, offering a decision-tree-based view of which data regions were included or excluded. It also introduces the first human-in-the-loop interface for coreset refinement. We highlight design insights and open challenges for building transparent and responsible data sampling workflows.
UR - https://www.scopus.com/pages/publications/105012254434
U2 - 10.1145/3736733.3736735
DO - 10.1145/3736733.3736735
M3 - Conference contribution
AN - SCOPUS:105012254434
T3 - HILDA 2025 - Workshop on Human-In-the-Loop Data Analytics, Co-located with SIGMOD 2025
BT - HILDA 2025 - Workshop on Human-In-the-Loop Data Analytics, Co-located with SIGMOD 2025
PB - Association for Computing Machinery, Inc
Y2 - 22 June 2025 through 27 June 2025
ER -