Abstract
Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection in both zero-shot and fine-tuned settings. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0 compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
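The attribute-enhanced, keypoints-as-objects idea from the abstract can be sketched as a prompt-construction step for a GLIP-style grounding model. The keypoint names, attribute wording, and helper below are illustrative assumptions, not the paper's exact prompt format.

```python
# Hypothetical sketch of attribute-enhanced keypoint prompts for a
# GLIP-style grounding model. Keypoint names and attributes below are
# illustrative; the paper's exact prompt wording may differ.

# NABirds-style keypoint names (assumed subset).
KEYPOINTS = ["bill", "crown", "nape", "left eye", "belly"]

# Descriptive attributes for some keypoints (assumed example).
ATTRIBUTES = {
    "bill": "cone-shaped orange",
    "crown": "black",
    "belly": "white",
}

def build_prompt(keypoints, attributes):
    """Join keypoint phrases into one grounding prompt, prepending
    descriptive attributes where available. GLIP-style detectors
    concatenate category phrases into a single text prompt, so each
    keypoint is treated as an object category to be grounded."""
    phrases = []
    for kp in keypoints:
        attr = attributes.get(kp)
        phrases.append(f"{attr} {kp}" if attr else kp)
    return ". ".join(phrases)

print(build_prompt(KEYPOINTS, ATTRIBUTES))
# → cone-shaped orange bill. black crown. nape. left eye. white belly
```

Keypoints without attributes fall back to their bare names, which mirrors the paper's comparison between attribute-enhanced and plain keypoint names.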
Original language | English |
---|---|
Title | Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings |
Editors | Henrik I. Christensen, Peter Corke, Renaud Detry, Jean-Baptiste Weibel, Markus Vincze |
Publisher | Springer |
Publication date | 2023 |
Pages | 444-458 |
ISBN (Print) | 9783031441363 |
DOI | |
Status | Published - 2023 |
Event | 14th International Conference on Computer Vision Systems, ICVS 2023 - Vienna, Austria Duration: 27 Sep 2023 → 29 Sep 2023 |
Conference
Conference | 14th International Conference on Computer Vision Systems, ICVS 2023 |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 27/09/2023 → 29/09/2023 |
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14253 LNCS |
ISSN | 0302-9743 |
Bibliographical note
Funding Information: We thank Grant Van Horn for the mapping between NABirds and CUB, Jonathan M. Wells for helpful conversation, Vésteinn Snæbjarnarson for experimental assistance, and the reviewers for important feedback. This work was supported in part by the Pioneer Centre for AI, DNRF grant number P1.
Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.