TY - CONF
T1 - Stitch it in Time: GAN-Based Facial Editing of Real Videos
T2 - SIGGRAPH Asia 2022 - Computer Graphics and Interactive Techniques Conference Asia (SA 2022)
AU - Tzaban, Rotem
AU - Mokady, Ron
AU - Gal, Rinon
AU - Bermano, Amit
AU - Cohen-Or, Daniel
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/11/29
Y1 - 2022/11/29
AB - The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. Applying StyleGAN editing to real videos introduces two main challenges: (i) StyleGAN operates over aligned crops. When editing videos, these crops need to be pasted back into the frame, resulting in spatial inconsistencies. (ii) Videos introduce a fundamental barrier to overcome: temporal coherency. To address the first challenge, we propose a novel stitching-tuning procedure. The generator is carefully tuned to overcome the spatial artifacts at crop borders, resulting in smooth transitions even when difficult backgrounds are involved. Turning to temporal coherence, we propose that this challenge is largely artificial. The source video is already temporally coherent, and deviations arise in part due to careless treatment of individual components in the editing pipeline. We leverage the natural alignment of StyleGAN and the tendency of neural networks to learn low-frequency functions, and demonstrate that they provide a strongly consistent prior. These components are combined in an end-to-end framework for semantic editing of facial videos. We compare our pipeline to the current state of the art and demonstrate significant improvements. Our method produces meaningful manipulations and maintains greater spatial and temporal consistency, even on challenging talking-head videos that current methods struggle with. Our code and videos are available at https://stitch-time.github.io/.
KW - Image Synthesis
UR - http://www.scopus.com/inward/record.url?scp=85143978616&partnerID=8YFLogxK
DO - 10.1145/3550469.3555382
M3 - Conference contribution
AN - SCOPUS:85143978616
T3 - Proceedings - SIGGRAPH Asia 2022 Conference Papers
BT - Proceedings - SIGGRAPH Asia 2022 Conference Papers
A2 - Spencer, Stephen N.
PB - Association for Computing Machinery, Inc
Y2 - 6 December 2022 through 9 December 2022
ER -