TY - GEN
T1 - Active Learning for Individual Data via Minimal Stochastic Complexity
AU - Shayovitz, Shachar
AU - Feder, Meir
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Modern machine learning systems require massive amounts of labeled training data in order to achieve high accuracy rates. Active learning uses feedback to label the most informative data points and significantly reduces the labeling effort. Many heuristics for selecting data points have been developed in recent years which are usually tailored to a specific task and a general unified framework is lacking. In this work, the individual data setting is considered and an active learning criterion is proposed. In this setting the features and labels, both in the training and the test, are specific individual, deterministic quantities. Motivated by connections between source coding and minimax learning, the proposed criterion attempts to find data points which minimize the average Predictive Normalized Maximum Likelihood (pNML) on the unlabeled test set. It is shown using a real data set that the proposed criterion performs better than other active learning criteria.
UR - http://www.scopus.com/inward/record.url?scp=85142610473&partnerID=8YFLogxK
U2 - 10.1109/Allerton49937.2022.9929357
DO - 10.1109/Allerton49937.2022.9929357
M3 - Conference contribution
AN - SCOPUS:85142610473
T3 - 2022 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022
BT - 2022 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022
Y2 - 27 September 2022 through 30 September 2022
ER -