TY - JOUR

T1 - Mismatched codebooks and the role of entropy coding in lossy data compression

AU - Kontoyiannis, Ioannis

AU - Zamir, Ram

N1 - Funding Information:
Manuscript received March 2, 2003; revised January 2, 2006. The work of I. Kontoyiannis was supported in part by the National Science Foundation under Grant CCR-0073378, and by USDA-IFAFS under Grant 00-52100-9615. The work of R. Zamir was supported in part by the US-Israel Bi-National Science Foundation under Grant 1998-309. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Yokohama, Japan, June/July 2003.

PY - 2006/5

Y1 - 2006/5

N2 - We introduce a universal quantization scheme based on random coding, and we analyze its performance. This scheme consists of a source-independent random codebook (typically mismatched to the source distribution), followed by optimal entropy coding that is matched to the quantized codeword distribution. A single-letter formula is derived for the rate achieved by this scheme at a given distortion, in the limit of large codebook dimension. The rate reduction due to entropy coding is quantified, and it is shown that it can be arbitrarily large. In the special case of "almost uniform" codebooks (e.g., an independent and identically distributed (i.i.d.) Gaussian codebook with large variance) and difference distortion measures, a novel connection is drawn between the compression achieved by the present scheme and the performance of "universal" entropy-coded dithered lattice quantizers. This connection generalizes the "half-a-bit" bound on the redundancy of dithered lattice quantizers. Moreover, it demonstrates a strong notion of universality where a single "almost uniform" codebook is near optimal for any source and any difference distortion measure. The proofs are based on the fact that the limiting empirical distribution of the first matching codeword in a random codebook can be precisely identified. This is done using elaborate large-deviations techniques that allow the derivation of a new "almost sure" version of the conditional limit theorem.

AB - We introduce a universal quantization scheme based on random coding, and we analyze its performance. This scheme consists of a source-independent random codebook (typically mismatched to the source distribution), followed by optimal entropy coding that is matched to the quantized codeword distribution. A single-letter formula is derived for the rate achieved by this scheme at a given distortion, in the limit of large codebook dimension. The rate reduction due to entropy coding is quantified, and it is shown that it can be arbitrarily large. In the special case of "almost uniform" codebooks (e.g., an independent and identically distributed (i.i.d.) Gaussian codebook with large variance) and difference distortion measures, a novel connection is drawn between the compression achieved by the present scheme and the performance of "universal" entropy-coded dithered lattice quantizers. This connection generalizes the "half-a-bit" bound on the redundancy of dithered lattice quantizers. Moreover, it demonstrates a strong notion of universality where a single "almost uniform" codebook is near optimal for any source and any difference distortion measure. The proofs are based on the fact that the limiting empirical distribution of the first matching codeword in a random codebook can be precisely identified. This is done using elaborate large deviations techniques, that allow the derivation of a new "almost sure" version of the conditional limit theorem.

KW - Data compression

KW - Large deviations

KW - Mismatch

KW - Pattern matching

KW - Random coding

KW - Rate-distortion theory

KW - Robustness

KW - Universal Gaussian codebook

KW - Universal quantization

UR - http://www.scopus.com/inward/record.url?scp=33646036701&partnerID=8YFLogxK

U2 - 10.1109/TIT.2006.872845

DO - 10.1109/TIT.2006.872845

M3 - Article

AN - SCOPUS:33646036701

SN - 0018-9448

VL - 52

SP - 1922

EP - 1938

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

IS - 5

ER -