Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations


Many applications of universal compression involve sources such as text, speech, and images, whose alphabets are extremely large. In this work we propose a conceptual framework in which a large-alphabet memoryless source is decomposed into multiple 'as independent as possible' sources over much smaller alphabets. This slightly increases the average codeword length, since the compressed symbols are no longer perfectly independent, but at the same time significantly reduces the overhead redundancy caused by the large alphabet of the observed source. Our proposed algorithm, based on a generalization of Binary Independent Component Analysis, is shown to efficiently find the ideal trade-off so that the overall compression size is minimal. We demonstrate our framework on memoryless draws from a variety of natural languages and show that the redundancy we achieve is remarkably smaller than that of the most commonly used methods.
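The trade-off the abstract describes can be illustrated numerically: labelling each symbol of a large alphabet with a binary vector and coding the bit components independently costs the sum of the components' marginal entropies, which exceeds the joint entropy H(X) unless the bits are truly independent. The sketch below, a toy example with an invented 8-symbol distribution and a fixed 3-bit labelling (the paper's generalized binary ICA instead searches over labellings to shrink this gap), computes both quantities:

```python
import math
from itertools import product

# Toy memoryless source over an 8-symbol alphabet
# (probabilities are illustrative, not taken from the paper).
p = [0.30, 0.20, 0.15, 0.12, 0.10, 0.06, 0.04, 0.03]

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

# One candidate 3-bit labelling of the 8 symbols.
labels = list(product([0, 1], repeat=3))

# Marginal distribution of each bit component B_i.
bit_marginals = []
for i in range(3):
    p1 = sum(p[s] for s, lab in enumerate(labels) if lab[i] == 1)
    bit_marginals.append([1 - p1, p1])

H_joint = entropy(p)                             # ideal cost per symbol
H_sum = sum(entropy(m) for m in bit_marginals)   # cost if bits are coded independently

print(f"H(X)             = {H_joint:.4f} bits")
print(f"sum_i H(B_i)     = {H_sum:.4f} bits")
print(f"independence gap = {H_sum - H_joint:.4f} bits")
assert H_sum >= H_joint - 1e-9  # subadditivity of entropy
```

The "independence gap" is the extra average codeword length incurred by coding the components separately; the paper's framework accepts a small gap in exchange for the much smaller per-component redundancy of universal coding over tiny alphabets.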

Original language: English
Title of host publication: Proceedings - DCC 2015
Subtitle of host publication: 2015 Data Compression Conference
Editors: Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 9781479984305
State: Published - 2 Jul 2015
Event: 2015 Data Compression Conference, DCC 2015 - Snowbird, United States
Duration: 7 Apr 2015 - 9 Apr 2015

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314


Conference: 2015 Data Compression Conference, DCC 2015
Country/Territory: United States


  • ICA
  • Large Alphabet Source Coding
  • Universal Compression


