Detector-Free Weakly Supervised Grounding by Separation

Assaf Arbelle*, Sivan Doveh*, Amit Alfassy*, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

14 Scopus citations

Abstract

Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume an existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing 'text to image-regions' associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Our GbS shows an 8.5% accuracy improvement over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a complementary improvement (above 7%) over the detector-based approaches for WSG.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1781-1792
Number of pages12
ISBN (Electronic)9781665428125
DOIs
StatePublished - 2021
Event18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 11 Oct 202117 Oct 2021

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499

Conference

Conference18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/TerritoryCanada
CityVirtual, Online
Period11/10/2117/10/21

Funding

FundersFunder number
ERC-StG757497
Defense Advanced Research Projects AgencyFA8750-19-C-1001

    Fingerprint

    Dive into the research topics of 'Detector-Free Weakly Supervised Grounding by Separation'. Together they form a unique fingerprint.

    Cite this