Data analytics often makes sense of large datasets by generalization: aggregating detailed data to a more general context. A misleading generalization can be drawn from a dataset by cherry-picking a level of aggregation that obscures substantial subgroups that contradict it. Our goal is to detect and explain cherry-picked generalizations by refining the corresponding aggregate queries. We demonstrate OREO, a system that computes a support score for a given statement to quantify the quality of the generalization, that is, whether the aggregated result accurately reflects the data. To help users interpret the resulting score, the system also identifies significant counterexamples and alternative statements that better represent the data at hand. We will demonstrate the utility of OREO for investigating generalizations by interacting with VLDB’22 participants, who will use the OREO interface for statement validation and explanation.
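To make the idea of a support score concrete, the following is a minimal sketch and not OREO's actual scoring method, which the abstract does not specify. It assumes a toy dataset of (region, product, rating) rows, a statement of the form "product A has a higher average rating than product B", and a hypothetical score defined as the fraction of rows that fall in region-level subgroups where the statement still holds. All names and data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (region, product, rating). Not taken from the paper.
ROWS = [
    ("north", "A", 9.0), ("north", "A", 9.0), ("north", "B", 5.0),
    ("south", "A", 4.0), ("south", "B", 6.0),
    ("east", "A", 3.0), ("east", "B", 5.0),
]

def avg_by_product(rows):
    """Aggregate query: average rating per product."""
    groups = defaultdict(list)
    for _, product, rating in rows:
        groups[product].append(rating)
    return {product: mean(values) for product, values in groups.items()}

def statement_holds(rows):
    """Statement under test: the average rating of A exceeds that of B."""
    avgs = avg_by_product(rows)
    return "A" in avgs and "B" in avgs and avgs["A"] > avgs["B"]

def refine_by_region(rows):
    """Refine the aggregate query by splitting the data per region."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return dict(parts)

def support_score(rows):
    """Assumed score: share of rows in subgroups where the statement holds.

    Also returns the subgroups that contradict the statement.
    """
    parts = refine_by_region(rows)
    holds = {region: statement_holds(sub) for region, sub in parts.items()}
    supported_rows = sum(len(parts[r]) for r, ok in holds.items() if ok)
    counterexamples = sorted(r for r, ok in holds.items() if not ok)
    return supported_rows / len(rows), counterexamples

if __name__ == "__main__":
    print("holds overall:", statement_holds(ROWS))
    score, counterexamples = support_score(ROWS)
    print("support:", round(score, 3), "counterexamples:", counterexamples)
```

In this toy data the statement holds on the full dataset, yet two of the three regions contradict it, so the assumed score is low and those regions surface as counterexamples. A row-weighted fraction is only one possible choice; weighting subgroups equally or by effect size would give different scores.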
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
State: Published - 2022
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sep 2022 → 9 Sep 2022