What is learned in visually grounded neural syntax acquisition

Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush, Yoav Artzi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Visual features are a promising signal for bootstrapping textual models. However, black-box learning models make it difficult to isolate the specific contribution of the visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield its strong performance. Contrary to what the model is capable of expressing, we find that significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness, rather than more complex syntactic reasoning, plays the main role in the model's predictions.
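
To illustrate the kind of simplified model the analysis points to, here is a minimal sketch (not the authors' code) of a parser that builds a binary tree from a lexical concreteness signal alone, with a rightward tie-break standing in for a head-initial bias. The TOY_CONCRETENESS lexicon, the neutral default for unknown tokens, and the averaging rule for merged spans are illustrative assumptions, not values or formulas from the paper.

```python
# A minimal sketch, not the authors' implementation: greedily build a binary
# constituency tree using only per-token concreteness scores plus a rightward
# tie-break (a stand-in for a head-initial bias). All values are invented
# for illustration.

TOY_CONCRETENESS = {
    "cat": 0.9, "mat": 0.8, "sat": 0.3,
    "the": 0.1, "on": 0.1, "a": 0.1,
}

def concreteness(token: str) -> float:
    # Unknown tokens get a neutral default (an assumption of this sketch).
    return TOY_CONCRETENESS.get(token, 0.5)

def greedy_parse(tokens):
    """Repeatedly merge the adjacent pair of spans with the highest combined
    concreteness; `>=` prefers the rightmost best pair."""
    nodes = [(tok, concreteness(tok)) for tok in tokens]
    while len(nodes) > 1:
        best_i, best_score = 0, float("-inf")
        for i in range(len(nodes) - 1):
            score = nodes[i][1] + nodes[i + 1][1]
            if score >= best_score:
                best_i, best_score = i, score
        (lt, ls), (rt, rs) = nodes[best_i], nodes[best_i + 1]
        nodes[best_i:best_i + 2] = [((lt, rt), (ls + rs) / 2)]
    return nodes[0][0]

if __name__ == "__main__":
    print(greedy_parse("the cat sat on the mat".split()))
    # With the toy scores above this prints:
    # ('the', ((('cat', 'sat'), 'on'), ('the', 'mat')))
```

Even such a purely lexical procedure, with no visual input at all, can produce tree structures of the kind the paper compares against VG-NSL's output.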

Original language: English
Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2615-2635
Number of pages: 21
ISBN (Electronic): 9781952148255
State: Published - 2020
Externally published: Yes
Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: 5 Jul 2020 - 10 Jul 2020

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 5/07/20 - 10/07/20

Funding

Funders and funder numbers:
Google
National Science Foundation: CRII-1656998, 1901030, 1656998
Directorate for Computer and Information Science and Engineering: 1901030
