TY - JOUR
T1 - Large Alphabet Source Coding Using Independent Component Analysis
AU - Painsky, Amichai
AU - Rosset, Saharon
AU - Feder, Meir
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/10
Y1 - 2017/10
N2 - Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications, such as compression of natural language text, speech, and images. The classic perception of most commonly used methods is that a source is best described over an alphabet that is at least as large as the observed alphabet. In this paper, we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into 'as statistically independent as possible' components. This decomposition allows us to apply entropy encoding to each component separately, while benefiting from their reduced alphabet size. We show that in many cases, such a decomposition results in a sum of marginal entropies that is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of binary independent component analysis, is applicable to a variety of large alphabet source coding setups. These include classical lossless compression, universal compression, and high-dimensional vector quantization. In each of these setups, our suggested approach outperforms most commonly used methods. Moreover, our proposed framework is significantly easier to implement in most of these cases.
KW - Data Compression
KW - Entropy Coding
KW - Independent Component Analysis
KW - Source Coding
UR - http://www.scopus.com/inward/record.url?scp=85028872067&partnerID=8YFLogxK
U2 - 10.1109/TIT.2017.2728017
DO - 10.1109/TIT.2017.2728017
M3 - Article
AN - SCOPUS:85028872067
SN - 0018-9448
VL - 63
SP - 6514
EP - 6529
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 10
M1 - 7983009
ER -