Active Learning via Predictive Normalized Maximum Likelihood Minimization

Shachar Shayovitz*, Meir Feder

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Machine learning systems require massive amounts of labeled training data in order to achieve high accuracy rates. Active learning uses feedback to label the most informative data points and significantly reduces the training set size. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to a specific task, and a general unified framework is lacking. In this work, the individual setting is considered and an active learning criterion is proposed. Motivated by universal source coding, the proposed criterion attempts to find data points which minimize the Predictive Normalized Maximum Likelihood (pNML) regret on an unlabeled test set. It is shown that for binary classification and linear regression, the resulting criterion coincides with well-known active learning criteria and thus represents a unified information-theoretic active learning approach for general hypothesis classes. Finally, it is shown using real data that the proposed criterion performs better than other active learning criteria in terms of sample complexity.
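To make the abstract's central quantity concrete, the following is a minimal illustrative sketch of computing the pNML regret for a single test point with a logistic-regression hypothesis class. For each hypothetical label y, the model is refit on the training set augmented with (x, y), and the regret is the log of the resulting normalization factor. The function name `pnml_regret` and the use of scikit-learn's `LogisticRegression` are illustrative assumptions, not the paper's implementation; the paper's active learning rule, which selects points to minimize this regret over an unlabeled test set, is not reproduced here.

```python
# Illustrative sketch only: pNML regret for one test point, using a
# logistic-regression hypothesis class (not the paper's implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pnml_regret(X_train, y_train, x, labels=(0, 1)):
    """pNML regret log sum_y p_{theta_hat(x,y)}(y | x) for a test point x.

    For each hypothetical label y, refit the model on the training set
    augmented with (x, y), then evaluate that refit model's probability
    of y at x. The log of the summed probabilities is the regret.
    """
    probs = []
    for y in labels:
        X_aug = np.vstack([X_train, x[None, :]])   # add candidate point
        y_aug = np.append(y_train, y)              # with hypothetical label y
        model = LogisticRegression().fit(X_aug, y_aug)
        # probability the "genie" model (trained knowing y) assigns to y at x
        p = model.predict_proba(x[None, :])[0, list(model.classes_).index(y)]
        probs.append(p)
    return float(np.log(sum(probs)))  # larger regret = less confident point
```

A larger regret indicates a point on which the hypothesis class is less confident, which is what the proposed criterion exploits when ranking candidate queries.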

Original language: English
Pages (from-to): 5799-5810
Number of pages: 12
Journal: IEEE Transactions on Information Theory
Volume: 70
Issue number: 8
State: Published - 2024

Keywords

  • Minimax learning
  • active learning
  • individual sequences
  • universal prediction
