Abstract
Originalsprog | Engelsk |
---|---|
Tidsskrift | Genome Research |
Vol/bind | 19 |
Udgave nummer | 6 |
Sider (fra-til) | 1124-32 |
Antal sider | 8 |
ISSN | 1088-9051 |
DOI | |
Status | Udgivet - 2009 |
Adgang til dokumentet
Citationsformater
- APA
- Standard
- Harvard
- Vancouver
- Author
- BIBTEX
- RIS
I: Genome Research, Bind 19, Nr. 6, 2009, s. 1124-32.
Publikation: Bidrag til tidsskrift › Tidsskriftartikel › Forskning › peer review
}
TY - JOUR
T1 - SNP detection for massively parallel whole-genome resequencing
AU - Li, Ruiqiang
AU - Li, Yingrui
AU - Fang, Xiaodong
AU - Yang, Huanming
AU - Wang, Jian
AU - Kristiansen, Karsten
AU - Wang, Jun
PY - 2009
Y1 - 2009
N2 - Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
AB - Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
U2 - 10.1101/gr.088013.108
DO - 10.1101/gr.088013.108
M3 - Journal article
C2 - 19420381
SN - 1088-9051
VL - 19
SP - 1124
EP - 1132
JO - Genome Research
JF - Genome Research
IS - 6
ER -