TY - JOUR
T1 - Adversarial and variational autoencoders improve metagenomic binning
AU - Líndez, Pau Piera
AU - Johansen, Joachim
AU - Kutuzova, Svetlana
AU - Sigurdsson, Arnor Ingi
AU - Nissen, Jakob Nybo
AU - Rasmussen, Simon
N1 - Publisher Copyright:
© 2023, Springer Nature Limited.
PY - 2023
Y1 - 2023
N2 - Assembly of reads from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by re-grouping the sequences by their organism of origin, thus representing a crucial processing step when exploring the biological diversity of metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB presented similar or better results compared with the state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC) genomes across simulated and real data. In addition, genomes reconstructed using AAMB had higher completeness and greater taxonomic diversity compared with VAMB. Finally, we implemented a pipeline Integrating VAMB and AAMB that enabled improved binning, recovering 20% and 29% more simulated and real NC genomes, respectively, compared to VAMB, with moderate additional runtime.
AB - Assembly of reads from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by re-grouping the sequences by their organism of origin, thus representing a crucial processing step when exploring the biological diversity of metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB presented similar or better results compared with the state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC) genomes across simulated and real data. In addition, genomes reconstructed using AAMB had higher completeness and greater taxonomic diversity compared with VAMB. Finally, we implemented a pipeline Integrating VAMB and AAMB that enabled improved binning, recovering 20% and 29% more simulated and real NC genomes, respectively, compared to VAMB, with moderate additional runtime.
U2 - 10.1038/s42003-023-05452-3
DO - 10.1038/s42003-023-05452-3
M3 - Journal article
C2 - 37865678
AN - SCOPUS:85174562894
VL - 6
JO - Communications Biology
JF - Communications Biology
SN - 2399-3642
M1 - 1073
ER -