HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph

Joshua Daniel Rubin*, Nicola Alexandra Vogel, Shyam Gopalakrishnan, Peter Wad Sackett, Gabriel Renaud

*Corresponding author for this work

Publication: Contribution to journal › Journal article › Research › peer review

Abstract

Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and perform inference based on the detected mutations to this reference. This approach biases haplogroup assignments towards the reference and prohibits accurate calculations of the uncertainty in assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier which uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools by being more robust to lower coverage or incomplete consensus sequences and producing phylogenetically-aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a text file with the haplogroup assignments of the samples along with the level of confidence in the assignments. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.
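
As a rough illustration of what a probabilistic classifier with "phylogenetically-aware confidence scores" can look like, the sketch below turns per-haplogroup log-likelihoods into posterior probabilities and then sums the posterior mass over a clade of the haplogroup tree. It is not HaploCart's implementation: the toy tree, the log-likelihood values, the uniform prior, and the function names `posterior` and `clade_confidence` are all assumptions made for this example.

```python
import math


def posterior(log_likelihoods, log_prior=None):
    """Normalize per-haplogroup log-likelihoods into posterior probabilities."""
    if log_prior is None:
        # Assumption: a uniform prior over all candidate haplogroups.
        log_prior = {h: 0.0 for h in log_likelihoods}
    scores = {h: ll + log_prior[h] for h, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())
    return {h: math.exp(s - top) / total for h, s in scores.items()}


def clade_confidence(post, parent, clade):
    """Sum the posterior mass of every haplogroup at or below `clade`."""
    def in_clade(h):
        # Walk up the tree from h until we hit `clade` or run past the root.
        while h is not None:
            if h == clade:
                return True
            h = parent.get(h)
        return False

    return sum(p for h, p in post.items() if in_clade(h))


# Toy tree (hypothetical): H is the parent of H1 and H2; U is a separate branch.
PARENT = {"H": None, "H1": "H", "H2": "H", "U": None}
# Hypothetical log-likelihoods of the observed data under each haplogroup.
LOGLIK = {"H": -12.0, "H1": -10.0, "H2": -10.5, "U": -20.0}

post = posterior(LOGLIK)
best = max(post, key=post.get)
print("best single haplogroup:", best, round(post[best], 3))
print("confidence in clade H:", round(clade_confidence(post, PARENT, "H"), 3))
```

In this toy setup the data cannot cleanly separate H1 from H2, so the best single assignment carries only moderate confidence, while the enclosing clade H collects the mass of both and is assigned with much higher confidence. Reporting support at several levels of the tree is one way a low-coverage sample can still yield a confident, if coarser, assignment.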

Original language: English
Article number: e1011148
Journal: PLOS Computational Biology
Volume: 19
Issue number: 6
Number of pages: 27
ISSN: 1553-734X
DOI
Status: Published - 2023

Bibliographical note

Funding Information:
Funding for this research was provided by a Novo Nordisk Data Science Investigator grant number NNF20OC0062491 (GR). This funding source provided the salaries for JDR and NAV. Additional funding for computational resources was provided by the Department for Health Technology at DTU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We would like to thank the Department of Healthtech at DTU for usage of the Healthtech Cluster. We would like to thank Daniel Caleb Remero Yianni for his help with the web application and maintenance of computational infrastructure. We would also like to thank Viviane Slon and Ana T. Duggan for their valuable comments on the manuscript. Finally we would like to thank Nanna Elmstedt Bild for her help in creating our graphical illustration of the HaploCart inference algorithm.

Publisher Copyright:
© 2023 Rubin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
