Mismatched codebooks and the role of entropy coding in lossy data compression

Ioannis Kontoyiannis*, Ram Zamir

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


We introduce a universal quantization scheme based on random coding, and we analyze its performance. This scheme consists of a source-independent random codebook (typically mismatched to the source distribution), followed by optimal entropy coding that is matched to the quantized codeword distribution. A single-letter formula is derived for the rate achieved by this scheme at a given distortion, in the limit of large codebook dimension. The rate reduction due to entropy coding is quantified, and it is shown that it can be arbitrarily large. In the special case of "almost uniform" codebooks (e.g., an independent and identically distributed (i.i.d.) Gaussian codebook with large variance) and difference distortion measures, a novel connection is drawn between the compression achieved by the present scheme and the performance of "universal" entropy-coded dithered lattice quantizers. This connection generalizes the "half-a-bit" bound on the redundancy of dithered lattice quantizers. Moreover, it demonstrates a strong notion of universality where a single "almost uniform" codebook is near optimal for any source and any difference distortion measure. The proofs are based on the fact that the limiting empirical distribution of the first matching codeword in a random codebook can be precisely identified. This is done using elaborate large deviations techniques that allow the derivation of a new "almost sure" version of the conditional limit theorem.
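The two stages described in the abstract (first-match quantization against a source-independent random codebook, then entropy coding of the selected codeword) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' construction: the function names, the squared-error distortion, the unit-variance Gaussian source, the codebook standard deviation of 3, the block dimension of 2, and the use of a plug-in entropy estimate as a stand-in for an ideal entropy coder are all illustrative choices made here.

```python
import math
import random
from collections import Counter


def mean_squared_error(x, y):
    """Per-sample squared-error distortion between two equal-length blocks."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def first_match(source_block, codebook, max_distortion):
    """Return the index of the first codeword within max_distortion, or None."""
    for index, codeword in enumerate(codebook):
        if mean_squared_error(source_block, codeword) <= max_distortion:
            return index
    return None


def empirical_entropy_bits(symbols):
    """Plug-in (empirical) entropy of a sequence of hashable symbols, in bits."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable

    # Hypothetical toy parameters; the paper's results concern large dimension.
    dimension, codebook_size, num_blocks = 2, 2000, 5000
    max_distortion = 0.5

    # Source-independent codebook: i.i.d. Gaussian entries with large variance,
    # deliberately mismatched to the unit-variance source below.
    codebook = [
        [rng.gauss(0.0, 3.0) for _ in range(dimension)]
        for _ in range(codebook_size)
    ]

    chosen = []
    for _ in range(num_blocks):
        block = [rng.gauss(0.0, 1.0) for _ in range(dimension)]
        index = first_match(block, codebook, max_distortion)
        if index is not None:  # blocks with no match are skipped in this sketch
            chosen.append(index)

    # Rate of sending the index with a fixed-length code, per source sample.
    fixed_rate = math.log2(codebook_size) / dimension
    # Estimated rate after entropy coding the selected index, per source sample.
    coded_rate = empirical_entropy_bits(chosen) / dimension

    print(f"matched blocks:           {len(chosen)} of {num_blocks}")
    print(f"fixed-length index rate:  {fixed_rate:.3f} bits/sample")
    print(f"empirical entropy rate:   {coded_rate:.3f} bits/sample")
```

The entropy estimate can never exceed the fixed-length rate, so the gap printed at the end gives a rough sense of the saving from entropy coding in this toy setting. The plug-in estimate is biased downward for finite samples, and a dimension of 2 is far from the large-dimension limit analyzed in the paper, so the numbers should be read as qualitative only.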

Original language: English
Pages (from-to): 1922-1938
Number of pages: 17
Journal: IEEE Transactions on Information Theory
Issue number: 5
State: Published - May 2006


  • Data compression
  • Large deviations
  • Mismatch
  • Pattern matching
  • Random coding
  • Rate-distortion theory
  • Robustness
  • Universal Gaussian codebook
  • Universal quantization

