Active Learning via Predictive Normalized Maximum Likelihood Minimization

Shachar Shayovitz, Meir Feder

Research output: Contribution to journal › Article › peer-review

Abstract

Machine learning systems require massive amounts of labeled training data in order to achieve high accuracy rates. Active learning uses feedback to label the most informative data points and significantly reduce the training set size. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to a specific task, and a general unified framework is lacking. In this work, the individual setting is considered and an active learning criterion is proposed. Motivated by universal source coding, the proposed criterion attempts to find data points which minimize the Predictive Normalized Maximum Likelihood (pNML) regret on an unlabeled test set. It is shown that for binary classification and linear regression, the resulting criterion coincides with well-known active learning criteria and thus represents a unified information-theoretic active learning approach for general hypothesis classes. Finally, it is shown using real data that the proposed criterion performs better than other active learning criteria in terms of sample complexity.

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Transactions on Information Theory
State: Accepted/In press - 2024

Keywords

  • Active Learning
  • Entropy
  • Individual Sequences
  • Labeling
  • Minimax Learning
  • Noise
  • Supervised learning
  • Task analysis
  • Training
  • Uncertainty
  • Universal Prediction
