Abstract
In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou [SZ20]. Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni [Liv23]. We show that, in the L2 Lipschitz-bounded setting and under strong convexity, every learner with an excess error ε has CMI bounded below by Ω(1/ε²) and Ω(1/ε), respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.
Original language | English
---|---
Pages (from-to) | 2035–2068
Number of pages | 34
Journal | Proceedings of Machine Learning Research
Volume | 235
State | Published - 2024
Event | 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, 21–27 Jul 2024
Funding
Funders | Funder number
---|---
Natural Sciences and Engineering Research Council of Canada |