ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concrete samples that provide the strongest signal for learning in a noisy dataset. In this work, we propose a new metric, Image Caption Concreteness (ICC), that evaluates caption text without an image reference to measure its concreteness and relevancy for use in multimodal learning. Our unsupervised approach leverages strong foundation models for measuring visual-semantic information loss in multimodal representations. We demonstrate that this strongly correlates with human evaluation of concreteness in both single-word and caption-level texts. Moreover, we show that curation using ICC complements existing approaches: It succeeds in selecting the highest quality samples from multimodal web-scale datasets to allow for efficient training in resource-constrained settings.

Original languageEnglish
Title of host publication62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
EditorsLun-Wei Ku, Andre Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages11048-11064
Number of pages17
ISBN (Electronic)9798891760998
StatePublished - 2024
EventFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 202416 Aug 2024

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

ConferenceFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityHybrid, Bangkok
Period11/08/2416/08/24

Fingerprint

Dive into the research topics of 'ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation'. Together they form a unique fingerprint.

Cite this