TY - JOUR
T1 - ASAP: Architecture Search, Anneal and Prune
T2 - 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020
AU - Noy, Asaf
AU - Doveh, Sivan
AU - Nayman, Niv
AU - Friedman, Itamar
AU - Ridnik, Tal
AU - Giryes, Raja
AU - Zamir, Nadav
AU - Zelnik-Manor, Lihi
N1 - Publisher Copyright:
Copyright © 2020 by the author(s)
PY - 2020
Y1 - 2020
AB - Automatic methods for Neural Architecture Search (NAS) have been shown to produce state-of-the-art network models. Yet, their main drawback is the computational complexity of the search process. Since early methods optimized over a discrete search space, thousands of GPU-days were required for convergence. A recent approach constructs a differentiable search space that enables gradient-based optimization, reducing the search time to a few days. While successful, it still includes some discontinuous steps, e.g., pruning many weak connections at once. In this paper, we propose a differentiable search space that allows annealing of the architecture weights while gradually pruning inferior operations. In this way, the search converges to a single output network in a continuous manner. Experiments on several vision datasets demonstrate the effectiveness of our method with respect to search cost and the accuracy of the resulting model. Specifically, with 0.2 GPU search days we achieve an error rate of 1.68% on CIFAR-10.
UR - http://www.scopus.com/inward/record.url?scp=85161924550&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85161924550
SN - 2640-3498
VL - 108
SP - 493
EP - 503
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 26 August 2020 through 28 August 2020
ER -
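
For readers who want a concrete picture of the annealing-and-pruning idea summarized in the abstract, the following is a minimal illustrative sketch in Python, not the authors' implementation: it anneals a softmax temperature over a vector of architecture weights and gradually prunes operations whose annealed probability falls below a threshold, so the search narrows to a single operation. All names and values here (T0, decay, threshold, the stand-in weight update) are hypothetical placeholders.

import numpy as np

def softmax(x, temperature):
    # Temperature-scaled softmax; lower temperature -> sharper distribution.
    z = x / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
alpha = rng.normal(size=5)   # architecture weights for 5 candidate ops
alive = list(range(5))       # indices of ops not yet pruned
T0, decay, threshold = 1.0, 0.95, 0.05  # hypothetical schedule parameters

T = T0
for step in range(200):
    probs = softmax(alpha[alive], T)
    # In the real search, gradients of the task loss would update alpha here;
    # a random perturbation stands in for that update in this sketch.
    alpha[alive] += 0.01 * rng.normal(size=len(alive))
    # Prune ops whose probability fell below the threshold, keeping at
    # least one op so the connection stays intact.
    keep = [i for i, p in zip(alive, probs) if p >= threshold]
    alive = keep if keep else [alive[int(np.argmax(probs))]]
    T *= decay  # anneal: sharpen the distribution each step
    if len(alive) == 1:
        break

print("selected op:", alive[0], "after", step + 1, "steps, T =", round(T, 4))

Because operations are dropped one by one as the temperature falls, rather than many at once after the search ends, the selection remains continuous in the sense described in the abstract.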