Abstract
Stochastic Gradient Descent (SGD) is one of the most popular optimization methods in machine learning and has been studied extensively since the early 1950s. However, our understanding of this fundamental algorithm is still lacking in certain aspects. We point to a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations T, and in particular, that the gap is still unaddressed even in the one-dimensional case. For the latter, we provide evidence that the correct rate is Θ(1/√T) and conjecture that the same applies in any (constant) dimension.
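As a rough illustration (not part of the record above), the sketch below runs projected SGD with step size 1/√t on the one-dimensional convex objective f(x) = |x| with noisy subgradients, and estimates the expected suboptimality of the last iterate for comparison against the conjectured Θ(1/√T) rate. The objective, noise model, projection interval, and step-size schedule are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's setup): projected SGD with
# step size 1/sqrt(t) on the 1-D convex objective f(x) = |x|, using noisy
# subgradients. We estimate E[f(x_T)] - f(x*) for the *last* iterate and
# print it next to 1/sqrt(T) for comparison.

rng = np.random.default_rng(0)

def last_iterate_suboptimality(T, trials=2000, noise_std=1.0):
    gaps = np.empty(trials)
    for i in range(trials):
        x = 1.0  # initial point
        for t in range(1, T + 1):
            g = np.sign(x) + noise_std * rng.standard_normal()  # stochastic subgradient
            x -= g / np.sqrt(t)                                 # step size 1/sqrt(t)
            x = np.clip(x, -1.0, 1.0)                           # project onto [-1, 1]
        gaps[i] = abs(x)  # f(x_T) - f(x*) = |x_T| since x* = 0
    return gaps.mean()

for T in (100, 400, 1600):
    print(f"T={T:5d}  E[f(x_T)]-f* ≈ {last_iterate_suboptimality(T):.4f}"
          f"  1/sqrt(T)={1 / np.sqrt(T):.4f}")
```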
| Original language | English |
| --- | --- |
| Pages (from-to) | 3847-3851 |
| Number of pages | 5 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 125 |
| State | Published - 2020 |
| Event | 33rd Conference on Learning Theory, COLT 2020 - Virtual, Online, Austria |
| Duration | 9 Jul 2020 → 12 Jul 2020 |