SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions

Lasse M. Blaabjerg, Nicolas Jonsson, Wouter Boomsma, Amelie Stein*, Kresten Lindorff-Larsen

*Corresponding author for this work

Publication: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)
7 Downloads (Pure)

Abstract

The ability to predict how amino acid changes affect proteins has a wide range of applications, including disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from sequence and structure in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments. We show that by integrating both types of information we obtain a variant effect prediction model that is robust when sequence information is scarce. We also show that SSEmb learns embeddings of the sequence and structure that are useful for other downstream tasks, such as predicting protein-protein binding sites. We envisage that SSEmb may be useful both for variant effect predictions and as a representation for learning to predict protein properties that depend on sequence and structure.
Original language: English
Article number: 9646
Journal: Nature Communications
Volume: 15
Issue number: 1
Number of pages: 9
ISSN: 2041-1723
DOI
Status: Published - 2024

Bibliographical note

Funding Information:
Our research is supported by the PRISM (Protein Interactions and Stability in Medicine and Genomics) center funded by the Novo Nordisk Foundation (NNF18OC0033950 to A.S. and K.L.L.), and by grants from the Carlsberg Foundation (CF21-0392 to K.L.L.), Novo Nordisk Foundation (NNF20OC0062606 and NNF18OC0052719 to W.B.) and the Lundbeck Foundation (R272-2017-4528 to A.S.).

Publisher Copyright:
© The Author(s) 2024.
