TY - JOUR
T1 - Unifying the known and unknown microbial coding sequence space
AU - Vanni, Chiara
AU - Schechter, Matthew S.
AU - Acinas, Silvia G.
AU - Barberán, Albert
AU - Buttigieg, Pier Luigi
AU - Casamayor, Emilio O.
AU - Delmont, Tom O.
AU - Duarte, Carlos M.
AU - Eren, A. Murat
AU - Finn, Robert D.
AU - Kottmann, Renzo
AU - Mitchell, Alex
AU - Sánchez, Pablo
AU - Siren, Kimmo
AU - Steinegger, Martin
AU - Glöckner, Frank Oliver
AU - Fernandez-Guerra, Antonio
PY - 2022
Y1 - 2022
AB - Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of predicted genes are unknown and systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we propose a conceptual framework and a computational workflow that bridge the known-unknown gap in genomes and metagenomes. We showcase our approach by exploring 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes. We quantify the extent of the unknown fraction, its diversity, and its relevance across multiple biomes. Furthermore, we provide a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria, a significant resource for expanding our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how our approach can generate hypotheses that augment experimental data.
N1 - Competing Interest Statement: The authors have declared no competing interest.
U2 - 10.7554/eLife.67667
DO - 10.7554/eLife.67667
M3 - Journal article
C2 - 35356891
VL - 11
JO - eLife
JF - eLife
SN - 2050-084X
M1 - e67667
ER -