TY - GEN

T1 - Online learning with feedback graphs

T2 - 28th Conference on Learning Theory, COLT 2015

AU - Alon, Noga

AU - Cesa-Bianchi, Nicolò

AU - Dekel, Ofer

AU - Koren, Tomer

N1 - Publisher Copyright:
© 2015 N. Alon, N. Cesa-Bianchi, O. Dekel & T. Koren.

PY - 2015

Y1 - 2015

N2 - We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde{\Theta}(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde{\Theta}(\delta^{1/3} T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.

AB - We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde{\Theta}(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde{\Theta}(\delta^{1/3} T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.

UR - http://www.scopus.com/inward/record.url?scp=84984693577&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84984693577

VL - 40

T3 - Proceedings of Machine Learning Research

SP - 23

EP - 35

BT - Proceedings of The 28th Conference on Learning Theory

A2 - Grünwald, Peter

A2 - Hazan, Elad

A2 - Kale, Satyen

PB - PMLR

Y2 - 2 July 2015 through 6 July 2015

ER -