TY - GEN
T1 - Spice·E: Structural Priors in 3D Diffusion using Cross-Entity Attention
T2 - SIGGRAPH 2024 Conference Papers
AU - Sella, Etai
AU - Fiebelman, Gal
AU - Atia, Noam
AU - Averbuch-Elor, Hadar
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/7/13
Y1 - 2024/7/13
N2 - We are witnessing rapid progress in automatically generating and manipulating 3D assets due to the availability of pretrained text-to-image diffusion models. However, time-consuming optimization procedures are required for synthesizing each sample, hindering their potential for democratizing 3D content creation. Conversely, 3D diffusion models now train on million-scale 3D datasets, yielding high-quality text-conditional 3D samples within seconds. In this work, we present Spice·E, a neural network that adds structural guidance to 3D diffusion models, extending their usage beyond text-conditional generation. At its core, our framework introduces a cross-entity attention mechanism that allows multiple entities (in particular, paired input and guidance 3D shapes) to interact via their internal representations within the denoising network. We utilize this mechanism to learn task-specific structural priors in 3D diffusion models from auxiliary guidance shapes. We show that our approach supports a variety of applications, including 3D stylization, semantic shape editing, and text-conditional abstraction-to-3D, which transforms primitive-based abstractions into highly expressive shapes. Extensive experiments demonstrate that Spice·E achieves state-of-the-art performance on these tasks while often being considerably faster than alternative methods. Importantly, this is accomplished without tailoring our approach to any specific task. We will release our code and trained models.
KW - 3D Generative AI
KW - 3D Textual Editing
KW - Conditional Generation
KW - Diffusion Models
UR - http://www.scopus.com/inward/record.url?scp=85199855960&partnerID=8YFLogxK
U2 - 10.1145/3641519.3657461
DO - 10.1145/3641519.3657461
M3 - Conference contribution
AN - SCOPUS:85199855960
T3 - Proceedings - SIGGRAPH 2024 Conference Papers
BT - Proceedings - SIGGRAPH 2024 Conference Papers
A2 - Spencer, Stephen N.
PB - Association for Computing Machinery, Inc
Y2 - 28 July 2024 through 1 August 2024
ER -