Descriptive Attributes for Language-Based Object Keypoint Detection

Jerod Weinman*, Serge Belongie, Stella Frank

*Corresponding author for this work

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer review

Abstract

Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
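The abstract describes treating keypoints as named objects whose text labels are enhanced with descriptive attributes before being passed to a grounding model such as GLIP. The sketch below is a minimal, hypothetical illustration of how such a text query could be assembled from keypoint names and optional attribute phrases; the keypoint names and attribute wording here are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: building a GLIP-style text prompt from keypoint names,
# optionally enhanced with descriptive attribute phrases.
KEYPOINTS = ["beak", "crown", "nape", "left eye", "right eye", "belly", "tail"]

# Illustrative attribute descriptions keyed by keypoint name (invented examples).
ATTRIBUTES = {
    "beak": "the hard, pointed mouth part",
    "crown": "the top of the head",
    "tail": "the feathers extending from the rear of the body",
}

def build_prompt(keypoints, attributes=None):
    """Join keypoint names (with optional attribute phrases) into a single
    period-separated text query, as phrase-grounding detectors typically expect."""
    phrases = []
    for name in keypoints:
        if attributes and name in attributes:
            phrases.append(f"{name}, {attributes[name]}")
        else:
            phrases.append(name)
    return ". ".join(phrases)

if __name__ == "__main__":
    print(build_prompt(KEYPOINTS))              # keypoint names only
    print(build_prompt(KEYPOINTS, ATTRIBUTES))  # attribute-enhanced names
```

The same keypoint vocabulary can thus be queried with or without attributes, which is consistent with the zero-, one-, and few-shot comparisons reported in the abstract.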

Original language: English
Title: Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings
Editors: Henrik I. Christensen, Peter Corke, Renaud Detry, Jean-Baptiste Weibel, Markus Vincze
Publisher: Springer
Publication date: 2023
Pages: 444-458
ISBN (Print): 9783031441363
DOI
Status: Published - 2023
Event: 14th International Conference on Computer Vision Systems, ICVS 2023 - Vienna, Austria
Duration: 27 Sep 2023 - 29 Sep 2023

Conference

Conference: 14th International Conference on Computer Vision Systems, ICVS 2023
Country/Territory: Austria
City: Vienna
Period: 27/09/2023 - 29/09/2023
Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14253 LNCS
ISSN: 0302-9743

Bibliographic note

Funding Information:
We thank Grant Van Horn for the mapping between NABirds and CUB, Jonathan M. Wells for helpful conversation, Vésteinn Snæbjarnarson for experimental assistance, and the reviewers for important feedback. This work was supported in part by the Pioneer Centre for AI, DNRF grant number P1.

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
