TY - GEN
T1 - WAFFLE
T2 - 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
AU - Ganon, Keren
AU - Alper, Morris
AU - Mikulinsky, Rachel
AU - Averbuch-Elor, Hadar
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Buildings are a central feature of human culture and require significant work to design, build, and maintain. As such, the floorplan, the fundamental element defining their structure, has increasingly become an object of computational analysis. Existing works on automatic floorplan understanding are extremely limited in scope, often focusing on a single semantic category and region (e.g., apartments from a single country). This contrasts with the wide variety of shapes and sizes of real-world buildings, which reflect their diverse purposes. In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE serves as a challenging benchmark for prior computational methods, while enabling progress on new floorplan understanding tasks. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.
UR - http://www.scopus.com/inward/record.url?scp=105003626197&partnerID=8YFLogxK
U2 - 10.1109/WACV61041.2025.00152
DO - 10.1109/WACV61041.2025.00152
M3 - Conference contribution
AN - SCOPUS:105003626197
T3 - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
SP - 1488
EP - 1497
BT - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 February 2025 through 4 March 2025
ER -