WAFFLE: Multimodal Floorplan Understanding in the Wild

Keren Ganon, Morris Alper, Rachel Mikulinsky, Hadar Averbuch-Elor

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Buildings are a central feature of human culture and require significant work to design, build, and maintain. As such, the fundamental element defining their structure - the floorplan - has increasingly become an object of computational analysis. Existing works on automatic floorplan understanding are extremely limited in scope, often focusing on a single semantic category and region (e.g. apartments from a single country). This contrasts with the wide variety of shapes and sizes of real-world buildings which reflect their diverse purposes. In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE serves as a challenging benchmark for prior computational methods, while enabling progress on new floorplan understanding tasks. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.

Original language: English
Title of host publication: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1488-1497
Number of pages: 10
ISBN (Electronic): 9798331510831
State: Published - 2025
Event: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States
Duration: 28 Feb 2025 → 4 Mar 2025

Publication series

Name: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025

Conference

Conference: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Country/Territory: United States
City: Tucson
Period: 28/02/25 → 4/03/25
