TY - JOUR
T1 - Joint embeddings of shapes and images via CNN image purification
AU - Li, Yangyan
AU - Su, Hao
AU - Qi, Charles Ruizhongtai
AU - Fish, Noa
AU - Cohen-Or, Daniel
AU - Guibas, Leonidas J.
N1 - Publisher Copyright:
Copyright 2015 ACM.
PY - 2015/11
Y1 - 2015/11
AB - Both 3D models and 2D images contain a wealth of information about everyday objects in our environment. However, it is difficult to semantically link together these two media forms, even when they feature identical or very similar objects. We propose a joint embedding space populated by both 3D shapes and 2D images of objects, where the distances between embedded entities reflect similarity between the underlying objects. This joint embedding space facilitates comparison between entities of either form and allows for cross-modality retrieval. We construct the embedding space using a 3D shape similarity measure, as 3D shapes are purer and more complete than their appearance in images, leading to more robust distance metrics. We then employ a Convolutional Neural Network (CNN) to "purify" images by muting distracting factors. The CNN is trained to map an image to a point in the embedding space, so that it lies close to a point attributed to a 3D model of an object similar to the one depicted in the image. This purifying capability of the CNN is accomplished with the help of a large amount of training data consisting of images synthesized from 3D shapes. Our joint embedding enables cross-view image retrieval, image-based shape retrieval, and shape-based image retrieval. We evaluate our method on these retrieval tasks, show that it consistently outperforms state-of-the-art methods, and demonstrate the usefulness of the joint embedding in a number of additional applications.
KW - 3D shapes
KW - Deep learning
KW - Embedding
UR - http://www.scopus.com/inward/record.url?scp=84995745188&partnerID=8YFLogxK
U2 - 10.1145/2816795.2818071
DO - 10.1145/2816795.2818071
M3 - Article
AN - SCOPUS:84995745188
SN - 0730-0301
VL - 34
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
IS - 6
M1 - 234
ER -