TY - JOUR
T1 - SpeedNet: Learning the Speediness in Videos
AU - Benaim, Sagie
AU - Ephrat, Ariel
AU - Lang, Oran
AU - Mosseri, Inbar
AU - Freeman, William T.
AU - Rubinstein, Michael
AU - Irani, Michal
AU - Dekel, Tali
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
AB - We wish to automatically predict the 'speediness' of moving objects in videos: whether they move faster, at, or slower than their 'natural' speed. The core component in our approach is SpeedNet, a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring any manual annotations. We show how this single, binary classification network can be used to detect arbitrary rates of speediness of objects. We demonstrate prediction results by SpeedNet on a wide range of videos containing complex natural motions, and examine the visual cues it utilizes for making those predictions. Importantly, we show that through predicting the speed of videos, the model learns a powerful and meaningful space-time representation that goes beyond simple motion cues. We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval. Furthermore, we also apply SpeedNet for generating time-varying, adaptive video speedups, which can allow viewers to watch videos faster, but with less of the jittery, unnatural motions typical of videos that are sped up uniformly.
UR - http://www.scopus.com/inward/record.url?scp=85094739078&partnerID=8YFLogxK
U2 - 10.1109/CVPR42600.2020.00994
DO - 10.1109/CVPR42600.2020.00994
M3 - Conference article
AN - SCOPUS:85094739078
SP - 9919
EP - 9928
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SN - 1063-6919
M1 - 9156879
Y2 - 14 June 2020 through 19 June 2020
ER -