TY - GEN
T1 - SAI3D: Segment any Instance in 3D Scenes
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Yin, Yingda
AU - Liu, Yuzheng
AU - Xiao, Yang
AU - Cohen-Or, Daniel
AU - Huang, Jingwei
AU - Chen, Baoquan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Advancements in 3D instance segmentation have traditionally been tethered to the availability of annotated datasets, limiting their application to a narrow spectrum of object categories. Recent efforts have sought to harness vision-language models like CLIP for open-set semantic reasoning, yet these methods struggle to distinguish between objects of the same categories and rely on specific prompts that are not universally applicable. In this paper, we introduce SAI3D, a novel zero-shot 3D instance segmentation approach that synergistically leverages geometric priors and semantic cues derived from Segment Anything Model (SAM). Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations that are consistent with the multi-view SAM masks. Moreover, we design a hierarchical region-growing algorithm with a dynamic thresholding mechanism, which largely improves the robustness of fine-grained 3D scene parsing. Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach. Notably, SAI3D outperforms existing open-vocabulary baselines and even surpasses fully-supervised methods in class-agnostic segmentation on ScanNet++. Our project page is at https://yd-yin.github.io/SAI3D.
AB - Advancements in 3D instance segmentation have traditionally been tethered to the availability of annotated datasets, limiting their application to a narrow spectrum of object categories. Recent efforts have sought to harness vision-language models like CLIP for open-set semantic reasoning, yet these methods struggle to distinguish between objects of the same categories and rely on specific prompts that are not universally applicable. In this paper, we introduce SAI3D, a novel zero-shot 3D instance segmentation approach that synergistically leverages geometric priors and semantic cues derived from Segment Anything Model (SAM). Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations that are consistent with the multi-view SAM masks. Moreover, we design a hierarchical region-growing algorithm with a dynamic thresholding mechanism, which largely improves the robustness of fine-grained 3D scene parsing. Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach. Notably, SAI3D outperforms existing open-vocabulary baselines and even surpasses fully-supervised methods in class-agnostic segmentation on ScanNet++. Our project page is at https://yd-yin.github.io/SAI3D.
KW - 3D instance segmentation
KW - Segment Anything Model
KW - open-vocabulary
UR - http://www.scopus.com/inward/record.url?scp=85201258312&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.00317
DO - 10.1109/CVPR52733.2024.00317
M3 - Conference contribution
AN - SCOPUS:85201258312
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 3292
EP - 3302
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -