Large Alphabet Source Coding Using Independent Component Analysis

Research output: Contribution to journal › Article › peer-review

Abstract

Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications, such as compression of natural language text, speech, and images. The classic perception underlying most commonly used methods is that a source is best described over an alphabet that is at least as large as the observed alphabet. In this paper, we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into 'as statistically independent as possible' components. This decomposition allows us to apply entropy encoding to each component separately, while benefiting from their reduced alphabet size. We show that in many cases, such a decomposition results in a sum of marginal entropies that is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of binary independent component analysis (BICA), is applicable to a variety of large alphabet source coding setups. These include classical lossless compression, universal compression, and high-dimensional vector quantization. In each of these setups, our suggested approach outperforms most commonly used methods. Moreover, our proposed framework is significantly easier to implement in most of these cases.
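The key quantity in the abstract can be illustrated with a minimal sketch (not the paper's algorithm): decompose each symbol of a large alphabet into binary components, entropy-code each component separately, and compare the sum of marginal entropies against the source entropy. The toy source and the fixed bit-plane decomposition below are illustrative assumptions; the paper's BICA-based method instead searches for the decomposition that minimizes this gap.

```python
import math
from collections import Counter

def entropy(counts):
    """Empirical Shannon entropy (bits) from a Counter of observations."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy source over a 4-symbol alphabet {0, 1, 2, 3} (hypothetical sample).
symbols = [0, 0, 1, 2, 3, 3, 3, 1, 0, 2, 3, 3]

# Entropy of the source over its full alphabet.
H_source = entropy(Counter(symbols))

# Decompose each symbol into two bits and treat each bit plane as a
# separate component with a reduced (binary) alphabet.
bit0 = Counter(s & 1 for s in symbols)
bit1 = Counter((s >> 1) & 1 for s in symbols)
H_marginals = entropy(bit0) + entropy(bit1)

# Coding the components independently costs the sum of marginal entropies,
# which is at least H_source; the excess is the price of dependence
# between the components (zero iff they are statistically independent).
assert H_marginals >= H_source - 1e-12
print(f"H(source) = {H_source:.4f} bits, sum of marginals = {H_marginals:.4f} bits")
```

For this particular sample the gap is small (roughly 0.07 bits per symbol), which is the situation the paper argues is common: a good decomposition makes component-wise entropy coding nearly as efficient as coding over the full alphabet, at a fraction of the alphabet size.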

Original language: English
Article number: 7983009
Pages (from-to): 6514-6529
Number of pages: 16
Journal: IEEE Transactions on Information Theory
Volume: 63
Issue number: 10
DOIs
State: Published - Oct 2017

Keywords

  • Data Compression
  • Entropy Coding
  • Independent Component Analysis
  • Source Coding
