Interactive Coreset Selection for Tabular Data: Fairness-Aware, Explainable, and User-Guided

  • Aviv Hadar
  • Tova Milo
  • Kathy Razmadze*

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We present a human-centric extension of CoreTab, a novel coreset algorithm for tabular data recently accepted to VLDB 2025. A coreset is a compact, representative subset of a dataset that approximates full-data training performance while greatly reducing computational cost. While CoreTab already achieves state-of-the-art accuracy and efficiency, this demonstration focuses on its interactive components, which let users audit, adjust, and guide data selection, addressing real-world concerns such as fairness and representativeness. CoreTab is the first coreset method with built-in explainability, offering a decision-tree-based view of which data regions were included or excluded. It also introduces the first human-in-the-loop interface for coreset refinement. We highlight design insights and open challenges for building transparent and responsible data sampling workflows.
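To make the auditing idea concrete, here is a minimal sketch of one kind of representativeness check a user might run on any selected subset: comparing each group's share in the subset with its share in the full data. This is not CoreTab's method or interface. The function names, the `group` attribute, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def group_shares(rows, attribute):
    """Return the fraction of rows that fall under each value of `attribute`."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def representation_gaps(full_data, subset, attribute):
    """Share of each group in the subset minus its share in the full data.

    A positive gap means the group is over-represented in the subset;
    a negative gap means it is under-represented.
    """
    full = group_shares(full_data, attribute)
    sub = group_shares(subset, attribute)
    return {value: sub.get(value, 0.0) - share for value, share in full.items()}


# Hypothetical toy data: 6 rows from group "A" and 4 rows from group "B".
full_data = [{"group": "A", "label": 1}] * 6 + [{"group": "B", "label": 0}] * 4

# Hypothetical subset: 4 rows from "A" and 1 row from "B".
subset = full_data[:4] + full_data[6:7]

gaps = representation_gaps(full_data, subset, "group")
for value, gap in sorted(gaps.items()):
    print(f"{value}: {gap:+.2f}")
```

In this toy case group "A" makes up 60% of the full data but 80% of the subset, so a user auditing the selection would see that "B" is under-represented and could ask for more rows from that group.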

Original language: English
Title of host publication: HILDA 2025 - Workshop on Human-In-the-Loop Data Analytics, Co-located with SIGMOD 2025
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400719592
DOIs
State: Published - 8 Jul 2025
Event: 2025 Workshop on Human-In-the-Loop Data Analytics, HILDA 2025, Co-located with SIGMOD 2025 - Berlin, Germany
Duration: 22 Jun 2025 to 27 Jun 2025

Publication series

Name: HILDA 2025 - Workshop on Human-In-the-Loop Data Analytics, Co-located with SIGMOD 2025

Conference

Conference: 2025 Workshop on Human-In-the-Loop Data Analytics, HILDA 2025, Co-located with SIGMOD 2025
Country/Territory: Germany
City: Berlin
Period: 22/06/25 to 27/06/25

Funding

Funder: Israel Science Foundation
Funder number: 2707/22
