Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and for genome recovery through binning; for binning, assembly quality was a further challenge. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including some that were also top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
Original language | English |
---|---|
Journal | Nature Methods |
Volume | 19 |
Issue number | 4 |
Pages (from-to) | 429-440 |
ISSN | 1548-7091 |
DOI | |
Status | Published - 2022 |
Bibliographical note
Funding Information:
We thank all members of the metagenomics community who provided input and feedback on the project in public workshops and gratefully acknowledge funding of the DZIF (project number TI 12.002_00; F. Meyer), German Excellence Cluster RESIST (EXC 2155 project number 390874280; Z.-L.D.) and NFDI4Microbiota (project number 460129525). D.K. was supported in part by the National Science Foundation under grant no. 1664803; A.G. by Saint Petersburg State University (grant ID PURE 73023672); D.A., A. Korobeynikov, D.M. and S.N. by the Russian Science Foundation (grant no. 19-14-00172); C.T.B. and L.I. in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through grant no. GBMF4551 to C.T.B.; R.C. and R.V. by ANR Inception (ANR-16-CONV-0005) and PRAIRIE (ANR-19-P3IA-0001); S.D.K. by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC-COG-2018); J.K. and E.R.R. by the National Science Foundation under grant no. 1845890; S.M. partially by National Science Foundation grant no. 2041984; V.R.M. by the Tony Basten Fellowship, Sydney Medical School Foundation; G.L.R. and Z.Z. partially by National Science Foundation grant nos. 1936791 and 1919691; M.T. by the ERC under the European Union’s Horizon 2020 research and innovation programme (ERC-COG-2018); S.Z. by the Shanghai Municipal Science and Technology Commission (grant no. 2018SHZDZX01) and the 111 Project (grant no. B18015); S. Hacquard by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the ‘2125 DECRyPT’ Priority Program; R.E., E. Goltsman, Zho.W. and A.T. by the Department of Energy (DOE) Office of Biological and Environmental Research under contract number DE-AC02-05CH11231; S.S. by the Swiss National Science Foundation (NCCR Microbiomes – 51NF40_180575).
This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. The work conducted by the US DOE Joint Genome Institute, a DOE Office of Science User Facility, is supported under contract no. DE-AC02-05CH11231.
Publisher Copyright:
© 2022, The Author(s).