Abstract
Multi-modal models require aligned, shared embedding spaces. However, common CLIP-based approaches need large numbers of samples and do not natively support 3D or tabular data, both of which are crucial in the medical domain. To address these issues, we revisit CLIP-style alignment by training a domain-specific 3D foundation model as an image encoder and demonstrate that modality alignment is feasible with only 62 MRI scans. Our approach is enabled by a simple embedding accumulation strategy required for training in 3D, which scales the number of negative pairs across batches in order to stabilize training. We perform a thorough evaluation of various design choices, including the choice of backbone and loss functions, and evaluate the proposed methodology on zero-shot classification and image-retrieval tasks. While zero-shot image retrieval remains challenging, zero-shot classification results demonstrate that the proposed approach can meaningfully align the representations of 3D MRI with tabular data. Code and model checkpoints are available.
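The abstract does not spell out how the embedding accumulation works, so the following is only a minimal sketch of one plausible reading: embeddings from several small batches are pooled before a symmetric CLIP-style contrastive loss is computed, so that each matching pair is contrasted against more negatives than a single small 3D batch would provide. The function names (`clip_loss`, `accumulate`), the temperature value, and the toy 4-dimensional embeddings are all illustrative assumptions, not the authors' implementation, and the sketch covers only the loss computation, not gradient handling.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_loss(image_emb, table_emb, temperature=0.07):
    """Symmetric InfoNCE loss; index i in both lists is a matching pair.

    The temperature of 0.07 is an assumed default, not taken from the paper.
    """
    img = [normalize(v) for v in image_emb]
    tab = [normalize(v) for v in table_emb]
    n = len(img)
    # logits[i][j]: scaled similarity between image i and tabular record j.
    logits = [
        [sum(a * b for a, b in zip(img[i], tab[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows):
        # The correct class for row i is column i (the matching pair).
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-table and table-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))


def accumulate(micro_batches):
    """Pool (image, tabular) embeddings from several small batches (hypothetical helper)."""
    image_emb, table_emb = [], []
    for imgs, tabs in micro_batches:
        image_emb.extend(imgs)
        table_emb.extend(tabs)
    return image_emb, table_emb


# Two micro-batches of two pairs each: pooling gives every pair 3 negatives
# instead of the 1 negative available inside a single micro-batch.
micro_batches = [
    ([[1, 0, 0, 0], [0, 1, 0, 0]], [[1, 0, 0, 0], [0, 1, 0, 0]]),
    ([[0, 0, 1, 0], [0, 0, 0, 1]], [[0, 0, 1, 0], [0, 0, 0, 1]]),
]
images, tables = accumulate(micro_batches)
loss = clip_loss(images, tables)
```

In this toy setup the pooled pairs are perfectly aligned, so the loss is close to zero; misaligning the tabular embeddings (for example by reversing their order) increases it.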
Original language | English |
---|---|
Title | ISBI 2025 - 2025 IEEE 22nd International Symposium on Biomedical Imaging, Proceedings |
Publisher | IEEE Computer Society Press |
Publication date | 2025 |
Pages | 1-5 |
ISBN (Electronic) | 9798331520526 |
DOI | |
Status | Published - 2025 |
Event | 22nd IEEE International Symposium on Biomedical Imaging, ISBI 2025 - Houston, USA Duration: 14 Apr 2025 → 17 Apr 2025 |
Conference
Conference | 22nd IEEE International Symposium on Biomedical Imaging, ISBI 2025 |
---|---|
Country/Territory | USA |
City | Houston |
Period | 14/04/2025 → 17/04/2025 |
Sponsor | et al., Houston Methodist, IEEE Signal Processing Society, United Imaging, University of Texas MD Anderson Cancer Center, Verasonics |
Name | Proceedings - International Symposium on Biomedical Imaging |
---|---|
ISSN | 1945-7928 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.