TY - GEN
T1 - Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis
AU - Painsky, Amichai
AU - Rosset, Saharon
AU - Feder, Meir
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/7/2
Y1 - 2015/7/2
N2 - Many applications of universal compression involve sources such as text, speech and images, whose alphabet is extremely large. In this work we propose a conceptual framework in which a large-alphabet memoryless source is decomposed into multiple 'as independent as possible' sources whose alphabets are much smaller. This slightly increases the average codeword length, since the compressed symbols are no longer perfectly independent, but at the same time it significantly reduces the overhead redundancy that results from the large alphabet of the observed source. Our proposed algorithm, based on a generalization of Binary Independent Component Analysis, is shown to efficiently find the ideal trade-off so that the overall compressed size is minimal. We demonstrate our framework on memoryless draws from a variety of natural languages and show that the redundancy we achieve is remarkably smaller than that of most commonly used methods.
AB - Many applications of universal compression involve sources such as text, speech and images, whose alphabet is extremely large. In this work we propose a conceptual framework in which a large-alphabet memoryless source is decomposed into multiple 'as independent as possible' sources whose alphabets are much smaller. This slightly increases the average codeword length, since the compressed symbols are no longer perfectly independent, but at the same time it significantly reduces the overhead redundancy that results from the large alphabet of the observed source. Our proposed algorithm, based on a generalization of Binary Independent Component Analysis, is shown to efficiently find the ideal trade-off so that the overall compressed size is minimal. We demonstrate our framework on memoryless draws from a variety of natural languages and show that the redundancy we achieve is remarkably smaller than that of most commonly used methods.
KW - ICA
KW - Large Alphabet Source Coding
KW - Universal Compression
UR - http://www.scopus.com/inward/record.url?scp=84938947471&partnerID=8YFLogxK
U2 - 10.1109/DCC.2015.48
DO - 10.1109/DCC.2015.48
M3 - Conference contribution
AN - SCOPUS:84938947471
T3 - Data Compression Conference Proceedings
SP - 213
EP - 222
BT - Proceedings - DCC 2015
A2 - Bilgin, Ali
A2 - Marcellin, Michael W.
A2 - Serra-Sagrista, Joan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 Data Compression Conference, DCC 2015
Y2 - 7 April 2015 through 9 April 2015
ER -