TY - JOUR
T1 - Deep Video-Based Performance Cloning
AU - Aberman, K.
AU - Shi, M.
AU - Liao, J.
AU - Lischinski, D.
AU - Chen, B.
AU - Cohen-Or, D.
N1 - Publisher Copyright:
© 2019 The Author(s) Computer Graphics Forum © 2019 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.
PY - 2019/5
Y1 - 2019/5
AB - We present a new video-based performance cloning technique. After training a deep generative network using a reference video capturing the appearance and dynamics of a target actor, we are able to generate videos where this actor reenacts other performances. All of the training data and the driving performances are provided as ordinary video segments, without motion capture or depth information. Our generative model is realized as a deep neural network with two branches, both of which train the same space-time conditional generator using shared weights. One branch, responsible for learning to generate the appearance of the target actor in various poses, uses paired training data self-generated from the reference video. The second branch uses unpaired data to improve generation of temporally coherent video renditions of unseen pose sequences. Through data augmentation, our network is able to synthesize images of the target actor in poses never captured by the reference video. We demonstrate a variety of promising results, where our method is able to generate temporally coherent videos for challenging scenarios in which the reference and driving videos consist of very different dance performances.
KW - Neural networks
KW - Computing methodologies → Image-based rendering
UR - http://www.scopus.com/inward/record.url?scp=85066973925&partnerID=8YFLogxK
U2 - 10.1111/cgf.13632
DO - 10.1111/cgf.13632
M3 - Article
AN - SCOPUS:85066973925
SN - 0167-7055
VL - 38
SP - 219
EP - 233
JO - Computer Graphics Forum
JF - Computer Graphics Forum
IS - 2
ER -