TY - JOUR

T1 - Large Alphabet Source Coding Using Independent Component Analysis

AU - Painsky, Amichai

AU - Rosset, Saharon

AU - Feder, Meir

N1 - Publisher Copyright: © 2017 IEEE.

PY - 2017/10

Y1 - 2017/10

N2 - Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications, such as compression of natural language text, speech, and images. The classic perception of most commonly used methods is that a source is best described over an alphabet, which is at least as large as the observed alphabet. In this paper, we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into 'as statistically independent as possible' components. This decomposition allows us to apply entropy encoding to each component separately, while benefiting from their reduced alphabet size. We show that in many cases, such decomposition results in a sum of marginal entropies which is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of the binary independent component analysis, is applicable for a variety of large alphabet source coding setups. This includes the classical lossless compression, universal compression, and high-dimensional vector quantization. In each of these setups, our suggested approach outperforms most commonly used methods. Moreover, our proposed framework is significantly easier to implement in most of these cases.

KW - Data Compression

KW - Entropy Coding

KW - Independent Component Analysis

KW - Source Coding

UR - http://www.scopus.com/inward/record.url?scp=85028872067&partnerID=8YFLogxK

U2 - 10.1109/TIT.2017.2728017

DO - 10.1109/TIT.2017.2728017

M3 - Article

AN - SCOPUS:85028872067

VL - 63

SP - 6514

EP - 6529

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 10

M1 - 7983009

ER -