TY - GEN
T1 - A Simple and Efficient Approach for Adaptive Entropy Coding over Large Alphabets
AU - Painsky, Amichai
AU - Rosset, Saharon
AU - Feder, Meir
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12/15
Y1 - 2016/12/15
N2 - Encoding a sequence of independent symbols over a large alphabet size is a challenging problem with applications in many fields. The most widely used adaptive entropy coding techniques (namely, arithmetic and Huffman coding) are known to achieve an average codeword length which may be significantly greater than the empirical entropy of the sequence, as the alphabet size increases. In this work we introduce an efficient and easy-to-implement method for large alphabet adaptive encoding. We propose a conceptual framework in which a sequence of symbols, over a large alphabet size, is decomposed into multiple 'almost independent' sequences over a smaller alphabet. Then each of these sequences is encoded separately. This way, we allow encoding of small alphabet sequences, at the cost of the 'remaining dependence' among the sequences. We demonstrate the advantages of our suggested scheme through a series of theorems and experiments, showing it reduces both the average codeword length and the compression runtime in many large alphabet setups.
KW - Factorial codes
KW - Independent Component Analysis
KW - Large alphabet source coding
UR - http://www.scopus.com/inward/record.url?scp=85010046845&partnerID=8YFLogxK
U2 - 10.1109/DCC.2016.59
DO - 10.1109/DCC.2016.59
M3 - Conference contribution
AN - SCOPUS:85010046845
T3 - Data Compression Conference Proceedings
SP - 369
EP - 378
BT - Proceedings - DCC 2016
A2 - Marcellin, Michael W.
A2 - Bilgin, Ali
A2 - Serra-Sagrista, Joan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 Data Compression Conference, DCC 2016
Y2 - 29 March 2016 through 1 April 2016
ER -