TY - JOUR
T1 - Estimating Gene Conversion Tract Length and Rate From PacBio HiFi Data
AU - Charmouh, Anders Poulsen
AU - Porsborg, Peter Sørud
AU - Hansen, Lasse Thorup
AU - Besenbacher, Søren
AU - Boeg Winge, Sofia
AU - Almstrup, Kristian
AU - Hobolth, Asger
AU - Bataillon, Thomas
AU - Schierup, Mikkel Heide
N1 - © The Author(s) 2025. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2025
Y1 - 2025
AB - Gene conversions are broadly defined as the transfer of genetic material from a "donor" to an "acceptor" sequence and can happen both in meiosis and mitosis. They are a subset of noncrossover (NCO) events and, like crossover (CO) events, gene conversion can generate new combinations of alleles and counteract mutation load by reverting germline mutations through GC-biased gene conversion. Estimating gene conversion rate and the distribution of gene conversion tract lengths remains challenging. We present a new method for estimating tract length, rate, and detection probability of NCO events directly in HiFi PacBio long read data. The method can be used to make inference from sequencing of gametes from a single individual. The method is unbiased even under low single nucleotide variant (SNV) densities and does not necessitate any demographic or evolutionary assumptions. We test the accuracy and robustness of our method using simulated datasets where we vary length of tracts, number of tracts, the genomic SNV density, and levels of correlation between SNV density and NCO event position. Our simulations show that under low SNV densities, like those found in humans, only a minute fraction (∼2%) of NCO events are expected to become visible as gene conversions by moving at least 1 SNV. We finally illustrate our method by applying it to PacBio sequencing data from human sperm.
KW - Gene Conversion
KW - Humans
KW - Polymorphism, Single Nucleotide
KW - Models, Genetic
U2 - 10.1093/molbev/msaf019
DO - 10.1093/molbev/msaf019
M3 - Journal article
C2 - 39982809
SN - 0737-4038
VL - 42
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 2
M1 - msaf019
ER -