TY - JOUR
T1 - A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
AU - Kojima, Noriyuki
AU - Averbuch-Elor, Hadar
AU - Artzi, Yoav
N1 - Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conductive to generalization. We propose a framework to jointly study task performance and phrase grounding, and propose three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on ground phrasing annotations, and analyze the dynamics it creates. Code and data are available at https://github.com/lil-lab/phrase_grounding.
AB - Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conductive to generalization. We propose a framework to jointly study task performance and phrase grounding, and propose three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on ground phrasing annotations, and analyze the dynamics it creates. Code and data are available at https://github.com/lil-lab/phrase_grounding.
UR - http://www.scopus.com/inward/record.url?scp=85212349073&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:85212349073
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -