TY - JOUR
T1 - Estimating inbreeding coefficients from NGS data
T2 - impact on genotype calling and allele frequency estimation
AU - Garrett Vieira, Filipe Jorge
AU - Fumagalli, Matteo
AU - Albrechtsen, Anders
AU - Nielsen, Rasmus
PY - 2013
Y1 - 2013
N2 - Most methods for Next-Generation Sequencing (NGS) data analyses incorporate information regarding allele frequencies using the assumption of Hardy-Weinberg Equilibrium (HWE) as a prior. However, many organisms, including domesticated, partially selfing, or asexual species, show strong deviations from HWE. For such species, and especially for low-coverage data, it is necessary to obtain estimates of inbreeding coefficients (F) for each individual before calling genotypes. Here, we present two methods for estimating inbreeding coefficients from NGS data based on an Expectation-Maximization (EM) algorithm. We assess the impact of taking inbreeding into account when calling genotypes or estimating the Site Frequency Spectrum (SFS), and demonstrate a marked increase in accuracy on low-coverage, highly inbred samples. We demonstrate the applicability and efficacy of these methods in both simulated and real datasets.
AB - Most methods for Next-Generation Sequencing (NGS) data analyses incorporate information regarding allele frequencies using the assumption of Hardy-Weinberg Equilibrium (HWE) as a prior. However, many organisms, including domesticated, partially selfing, or asexual species, show strong deviations from HWE. For such species, and especially for low-coverage data, it is necessary to obtain estimates of inbreeding coefficients (F) for each individual before calling genotypes. Here, we present two methods for estimating inbreeding coefficients from NGS data based on an Expectation-Maximization (EM) algorithm. We assess the impact of taking inbreeding into account when calling genotypes or estimating the Site Frequency Spectrum (SFS), and demonstrate a marked increase in accuracy on low-coverage, highly inbred samples. We demonstrate the applicability and efficacy of these methods in both simulated and real datasets.
U2 - 10.1101/gr.157388.113
DO - 10.1101/gr.157388.113
M3 - Journal article
C2 - 23950147
VL - 23
SP - 1852
EP - 1861
JO - Genome Research
JF - Genome Research
SN - 1088-9051
ER -