TY - JOUR

T1 - Bregman divergence bounds and universality properties of the logarithmic loss

AU - Painsky, Amichai

AU - Wornell, Gregory W.

N1 - Publisher Copyright:
© 1963-2012 IEEE.

PY - 2020/3

Y1 - 2020/3

N2 - A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such, the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering, and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known f-divergence results.

AB - A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such, the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering, and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known f-divergence results.

KW - Bregman divergences

KW - Kullback-Leibler (KL) divergence

KW - Pinsker inequality

KW - logarithmic loss

UR - http://www.scopus.com/inward/record.url?scp=85081063560&partnerID=8YFLogxK

U2 - 10.1109/TIT.2019.2958705

DO - 10.1109/TIT.2019.2958705

M3 - Article

AN - SCOPUS:85081063560

SN - 0018-9448

VL - 66

SP - 1658

EP - 1673

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

IS - 3

M1 - 8930624

ER -