TY - CONF
T1 - Describing Sets of Images with Textual-PCA
AU - Hupert, Oded
AU - Schwartz, Idan
AU - Wolf, Lior
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set. Our procedure is analogous to Principle Component Analysis, in which the role of projection vectors is replaced with generated phrases. First, a centroid phrase that has the largest average semantic similarity to the images in the set is generated, where both the computation of the similarity and the generation are based on pretrained vision-language models. Then, the phrase that generates the highest variation among the similarity scores is generated, using the same models. The next phrase maximizes the variance subject to being orthogonal, in the latent space, to the highest-variance phrase, and the process continues. Our experiments show that our method is able to convincingly capture the essence of image sets and describe the individual elements in a semantically meaningful way within the context of the entire set. Our code is available at: https://github.com/OdedH/textual-pca.
AB - We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set. Our procedure is analogous to Principle Component Analysis, in which the role of projection vectors is replaced with generated phrases. First, a centroid phrase that has the largest average semantic similarity to the images in the set is generated, where both the computation of the similarity and the generation are based on pretrained vision-language models. Then, the phrase that generates the highest variation among the similarity scores is generated, using the same models. The next phrase maximizes the variance subject to being orthogonal, in the latent space, to the highest-variance phrase, and the process continues. Our experiments show that our method is able to convincingly capture the essence of image sets and describe the individual elements in a semantically meaningful way within the context of the entire set. Our code is available at: https://github.com/OdedH/textual-pca.
UR - http://www.scopus.com/inward/record.url?scp=85149897836&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.findings-emnlp.279
DO - 10.18653/v1/2022.findings-emnlp.279
M3 - ???researchoutput.researchoutputtypes.contributiontoconference.paper???
AN - SCOPUS:85149897836
SP - 3840
EP - 3850
T2 - 2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Y2 - 7 December 2022 through 11 December 2022
ER -