TY - JOUR
T1 - Sequencing and de novo assembly of 150 genomes from Denmark as a population reference
AU - Sørensen, Lasse Maretty
AU - Jensen, Jacob Malte
AU - Petersen, Bent
AU - Sibbesen, Jonas Andreas
AU - Liu, Siyang
AU - Villesen, Palle
AU - Skov, Laurits
AU - Belling, Kirstine G.
AU - Have, Christian Theil
AU - Izarzugaza, Jose M. G.
AU - Grosjean, Marie
AU - Bork-Jensen, Jette
AU - Grove, Jakob
AU - Als, Thomas D.
AU - Huang, Shujia
AU - Chang, Yuqi
AU - Xu, Ruiqi
AU - Ye, Weijian
AU - Rao, Junhua
AU - Guo, Xiaosen
AU - Sun, Jihua
AU - Cao, Hongzhi
AU - Ye, Chen
AU - van Beusekom, Johan
AU - Espeseth, Thomas
AU - Flindt, Esben
AU - Friborg, Rune M.
AU - Halager, Anders E.
AU - Le Hellard, Stephanie
AU - Hultman, Christina M.
AU - Lescai, Francesco
AU - Li, Shengting
AU - Lund, Ole
AU - Løngren, Peter
AU - Mailund, Thomas
AU - Matey-Hernandez, Maria Luisa
AU - Mors, Ole
AU - Pedersen, Christian N. S.
AU - Sicheritz-Pontén, Thomas
AU - Sullivan, Patrick
AU - Syed, Ali
AU - Westergaard, David
AU - Yadav, Rachita
AU - Li, Ning
AU - Xu, Xun
AU - Hansen, Torben
AU - Krogh, Anders
AU - Bolund, Lars
AU - Sørensen, Thorkild I. A.
AU - Pedersen, Oluf Borbye
AU - Gupta, Ramneek
AU - Rasmussen, Simon
AU - Besenbacher, Søren
AU - Børglum, Anders D.
AU - Wang, Jun
AU - Eiberg, Hans Rudolf Lytchoff
AU - Kristiansen, Karsten
AU - Brunak, Søren
AU - Schierup, Mikkel Heide
PY - 2017
Y1 - 2017
N2 - Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits [1, 2, 3, 4]. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly [2, 5, 6, 7]. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology [4, 8, 9, 10, 11, 12, 13]. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
AB - Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits [1, 2, 3, 4]. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly [2, 5, 6, 7]. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology [4, 8, 9, 10, 11, 12, 13]. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
U2 - 10.1038/nature23264
DO - 10.1038/nature23264
M3 - Letter
C2 - 28746312
VL - 548
SP - 87
EP - 91
JO - Nature
JF - Nature
SN - 0028-0836
IS - 7665
ER -