Lossless Compression of Random Forests

Amichai Painsky*, Saharon Rosset

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble’s trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.
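The abstract describes the core idea at a high level: summarize each tree with a probabilistic model, then cluster those models under a Bregman divergence to obtain a small codebook of representatives. The following is a minimal, hypothetical sketch of that idea in Python, not the authors' implementation: each tree is reduced to the empirical distribution of the features it splits on, and those distributions are clustered with KL divergence (one instance of a Bregman divergence) in a k-means-style loop. The helper names (tree_feature_distribution, bregman_kmeans) and all parameter choices are illustrative assumptions.

```python
# Hypothetical sketch only: probabilistic summaries of forest trees clustered
# under KL divergence (a Bregman divergence). Not the paper's actual scheme.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def tree_feature_distribution(tree, n_features, eps=1e-9):
    """Empirical distribution over the split features of one decision tree."""
    feats = tree.tree_.feature
    feats = feats[feats >= 0]                      # drop leaf markers (-2)
    counts = np.bincount(feats, minlength=n_features).astype(float) + eps
    return counts / counts.sum()

def kl(p, q):
    """KL divergence: the Bregman divergence generated by negative entropy."""
    return float(np.sum(p * np.log(p / q)))

def bregman_kmeans(dists, k, n_iter=50, seed=0):
    """Cluster probability vectors with KL divergence; Bregman centroid = mean."""
    rng = np.random.default_rng(seed)
    centroids = dists[rng.choice(len(dists), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.array([np.argmin([kl(d, c) for c in centroids]) for d in dists])
        for j in range(k):
            members = dists[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
dists = np.array([tree_feature_distribution(t, X.shape[1]) for t in rf.estimators_])
labels, codebook = bregman_kmeans(dists, k=5)
print("codebook size:", len(codebook), "trees per cluster:", np.bincount(labels))
```

In the lossless setting described in the paper, such a codebook would serve the entropy coder as a set of reference models rather than replace the trees; here it is shown only to make the "model clustering via Bregman divergence" step concrete.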

Original language: English
Pages (from-to): 494-506
Number of pages: 13
Journal: Journal of Computer Science and Technology
Volume: 34
Issue number: 2
State: Published - 1 Mar 2019

Funding

Israel Science Foundation: 1487/12
Israeli Ministry of Immigration

Keywords

• entropy coding
• lossless compression
• lossy compression
• random forest
