MMEarth: Exploring Multi-modal Pretext Tasks for Geospatial Representation Learning

Publication: Contribution to book/anthology/report, Conference article in proceedings, Research, Peer-reviewed

2 citations (Scopus)

Abstract

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks, including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency, which are crucial aspects of global-scale applications. (The MMEarth dataset is available on the project page: vishalned.github.io/mmearth. The dataset construction code is available here: github.com/vishalned/MMEarth-data. The MP-MAE code for training and evaluation is available here: github.com/vishalned/MMEarth-train).
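
The abstract describes pretraining one masked autoencoder against several pretext targets at once. As a rough illustration only, the NumPy sketch below hides a random subset of patches and sums per-modality reconstruction losses computed over those masked patches. The function names, the modality keys, the random patch masking, and the equal default loss weights are all assumptions made for illustration. This is not the MP-MAE implementation in the MMEarth-train repository.

```python
from typing import Dict, Optional

import numpy as np


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MAE-style loss: mean squared error over masked patches only.

    pred and target have shape (num_patches, values_per_patch).
    """
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


def multi_pretext_loss(
    preds: Dict[str, np.ndarray],
    targets: Dict[str, np.ndarray],
    mask: np.ndarray,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of masked reconstruction losses, one term per target modality.

    Assumption: equal weights by default; the actual weighting scheme is not stated here.
    """
    if weights is None:
        weights = {name: 1.0 for name in targets}
    return sum(weights[name] * masked_mse(preds[name], targets[name], mask) for name in targets)


# Usage with hypothetical modality keys; real MMEarth modalities and shapes may differ.
rng = np.random.default_rng(0)
mask = random_patch_mask(num_patches=16, mask_ratio=0.6, rng=rng)
targets = {
    "optical": rng.normal(size=(16, 8)),    # hypothetical optical patch targets
    "elevation": rng.normal(size=(16, 1)),  # hypothetical per-patch elevation target
}
preds = {name: np.zeros_like(t) for name, t in targets.items()}
print(f"combined loss: {multi_pretext_loss(preds, targets, mask):.3f}")
```

Summing the terms lets every pretext target contribute gradient to a shared encoder. How the published method weights or normalizes the individual terms is not stated in this record.
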
Original language: English
Title: Computer Vision – ECCV 2024
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Number of pages: 19
Publisher: Springer
Publication date: 2025
Pages: 164-182
ISBN (Print): 978-3-031-73038-2
ISBN (Electronic): 978-3-031-73039-9
DOI
Status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 to 4 Oct 2024

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/2024 to 04/10/2024
Name: Lecture Notes in Computer Science
Volume: 15122
ISSN: 0302-9743

Bibliographic note

DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
