TY - GEN
T1 - Quantifying the Loss of Acyclic Join Dependencies
AU - Kenig, Batya
AU - Weinberger, Nir
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/6/18
Y1 - 2023/6/18
N2 - Acyclic schemes posses known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-Theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-Divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a random database model, and establish a high probability bound on the percentage of redundant tuples, which coincides with the lower bound for large databases.
AB - Acyclic schemes posses known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-Theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-Divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a random database model, and establish a high probability bound on the percentage of redundant tuples, which coincides with the lower bound for large databases.
KW - acyclic schemas
KW - decomposition
KW - normalization
KW - universal relation
UR - http://www.scopus.com/inward/record.url?scp=85164259846&partnerID=8YFLogxK
U2 - 10.1145/3584372.3588658
DO - 10.1145/3584372.3588658
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85164259846
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 329
EP - 338
BT - PODS 2023 - Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PB - Association for Computing Machinery
T2 - 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023
Y2 - 18 June 2023 through 23 June 2023
ER -