TY - CONF
T1 - Emerging disentanglement in auto-encoder based unsupervised image content transfer
AU - Press, Ori
AU - Galanti, Tomer
AU - Benaim, Sagie
AU - Wolf, Lior
N1 - Publisher Copyright:
© 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved.
PY - 2019
Y1 - 2019
AB - We study the problem of learning to map, in an unsupervised way, between domains A and B, such that the samples b ∈ B contain all the information that exists in samples a ∈ A and some additional information. For example, ignoring occlusions, B can be people with glasses, A people without, and the glasses would be the added information. When mapping a sample a from the first domain to the other domain, the missing information is replicated from an independent reference sample b ∈ B. Thus, in the above example, we can create, for every person without glasses, a version with the glasses observed in any face image. Our solution employs a single two-pathway encoder and a single decoder for both domains. The common part of the two domains and the separate part are encoded as two vectors, and the separate part is fixed at zero for domain A. The loss terms are minimal and involve reconstruction losses for the two domains and a domain confusion term. Our analysis shows that under mild assumptions, this architecture, which is much simpler than the guided-translation methods in the literature, is enough to ensure disentanglement between the two domains. We present convincing results in a few visual domains, such as mapping no-glasses to glasses and adding facial hair based on a reference image.
UR - http://www.scopus.com/inward/record.url?scp=85083953396&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85083953396
T2 - 7th International Conference on Learning Representations, ICLR 2019
Y2 - 6 May 2019 through 9 May 2019
ER -
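
Note: the abstract above describes the method concretely enough for a rough illustration. The following is a minimal PyTorch sketch, not the authors' code, of the described architecture: a single two-pathway encoder producing a common code and a separate code, a single decoder shared by both domains, the separate code fixed at zero for domain A, and reconstruction losses for the two domains. All class names, layer choices, and dimensions are illustrative assumptions, and the abstract's domain confusion term is omitted.

import torch
import torch.nn as nn


class TwoPathwayEncoder(nn.Module):
    # Single encoder with two output pathways: a "common" code shared by both
    # domains and a "separate" code for the domain-B-only information.
    def __init__(self, in_dim=3 * 64 * 64, common_dim=128, separate_dim=32):
        super().__init__()
        self.common = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, common_dim))
        self.separate = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, separate_dim))

    def forward(self, x):
        return self.common(x), self.separate(x)


class SharedDecoder(nn.Module):
    # Single decoder used for both domains; it consumes the concatenation of
    # the common and separate codes.
    def __init__(self, out_dim=3 * 64 * 64, common_dim=128, separate_dim=32):
        super().__init__()
        self.net = nn.Linear(common_dim + separate_dim, out_dim)

    def forward(self, common, separate):
        return self.net(torch.cat([common, separate], dim=1))


def reconstruction_loss(enc, dec, a, b):
    # Domain A is reconstructed with its separate code fixed at zero;
    # domain B is reconstructed from both of its codes.
    c_a, _ = enc(a)
    c_b, s_b = enc(b)
    zero_sep = torch.zeros(c_a.size(0), s_b.size(1), device=c_a.device)
    l1 = nn.L1Loss()
    return l1(dec(c_a, zero_sep), a.flatten(1)) + l1(dec(c_b, s_b), b.flatten(1))


if __name__ == "__main__":
    enc, dec = TwoPathwayEncoder(), SharedDecoder()
    a, b = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
    loss = reconstruction_loss(enc, dec, a, b)
    # Guided translation per the abstract: keep a's common code and borrow the
    # separate code (e.g. the glasses) from an independent reference sample b.
    translated_a = dec(enc(a)[0], enc(b)[1])

Under this sketch, mapping a sample a to domain B with reference b amounts to decoding a's common code together with b's separate code, which matches the abstract's description of replicating the missing information from an independent reference sample.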