Efficient ancestry and mutation simulation with msprime 1.0

Franz Baumdicker, Gertjan Bisschop, Daniel Goldstein, Graham Gower, Aaron P. Ragsdale, Georgia Tsambos, Sha Zhu, Bjarki Eldon, E. Castedo Ellerman, Jared G. Galloway, Ariella L. Gladstein, Gregor Gorjanc, Bing Guo, Ben Jeffery, Warren W. Kretzschmar, Konrad Lohse, Michael Matschiner, Dominic Nelson, Nathaniel S. Pope, Consuelo D. Quinto-Cortes, Murillo F. Rodrigues, Kumar Saunack, Thibaut Sellinger, Kevin Thornton, Hugo Van Kemenade, Anthony W. Wohns, Yan Wong, Simon Gravel, Andrew D. Kern, Jere Koskela, Peter L. Ralph, Jerome Kelleher*

*Corresponding author for this work

Publication: Contribution to journal; Journal article; Research; peer-reviewed

129 Citations (Scopus)
12 Downloads (Pure)

Abstract

Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.

Original language: English
Article number: iyab229
Journal: Genetics
Volume: 220
Issue number: 3
Number of pages: 19
ISSN: 0016-6731
Status: Published - 2022

Bibliographical note

Funding Information:
A.D.K. was supported by US National Institutes of Health awards R01GM117241 and R01HG010774. A.L.G. was supported by US National Institutes of Health award R00HG008696 to Daniel R. Schrider. B.E. was supported by Deutsche Forschungsgemeinschaft grant 273887127 through Priority Programme SPP 1819: Rapid Evolutionary Adaptation (grant STE 325/17-2) to Wolfgang Stephan; B.E. would also like to acknowledge funding through The Icelandic Research Centre (Rannís) through an Icelandic Research Fund Grant of Excellence no. 185151-051 to Einar Árnason, Katrín Halldórsdóttir, Alison Etheridge, Wolfgang Stephan, and B.E. F.B. is funded by the Deutsche Forschungsgemeinschaft EXC 2064/1: Project number 390727645 and EXC 2124: Project number 390838134. G.B. and K.L. are supported by a European Research Council starting grant (ModelGenomLand 757648) to K.L. Graham Gower was supported by a Villum Fonden Young Investigator award to Fernando Racimo (project no. 00025300). Gregor Gorjanc is supported by the Chancellor’s Fellowship of the University of Edinburgh and the UK Biotechnology and Biological Sciences Research Council grant to the Roslin Institute BBS/E/D/30002275. Jere Koskela is supported in part by UK Engineering and Physical Sciences Research Council grant EP/R044732/1. Jerome Kelleher is supported by the Robertson Foundation. P.L.R. was supported by US National Institutes of Health award R01HG010774. S.G. acknowledges funding from the Canada Research Chairs Program, from the Canadian Institutes of Health Research PJT 173300, and from the Canadian Foundation for Innovation.

Publisher Copyright:
© The Author(s) 2021.
