Pervasive incomplete lineage sorting illuminates speciation and selection in primates

Iker Rivas-González, Marjolaine Rousselle, Fang Li, Long Zhou, Julien Y. Dutheil, Kasper Munch, Yong Shao, Dongdong Wu, Mikkel H. Schierup, Guojie Zhang

Publication: Contribution to journal, Journal article, Research, peer review

Abstract

INTRODUCTION
Incomplete lineage sorting generates gene trees that are incongruent with the species tree. It has been described in many phylogenetic clades, including birds, marsupials, and primates. For example, the level of incomplete lineage sorting in the human-chimp-gorilla branch adds up to ~30%, which means that, even though our closest primate relatives are chimps, 15% of our genome more closely resembles the gorilla genome than the chimp genome, and in another 15% the chimp groups with the gorilla first.
RATIONALE
Although incomplete lineage sorting is usually regarded as an obstacle for phylogenetic reconstruction, it holds valuable information about the evolutionary history of the species because its extent depends on the ancestral effective population sizes and the time between speciation events. Additionally, recurrent ancestral selective processes are expected to influence how the proportion of incongruent trees varies along the genome, which makes incomplete lineage sorting a useful tool to study ancient evolutionary events. In this study, we estimate the incomplete lineage sorting landscape by running a coalescent hidden Markov model (CoalHMM) in species trios along a 50-way primate genome alignment. We then leverage the signal of incomplete lineage sorting to reconstruct ancestral effective population parameters and to analyze the genomic determinants that influence the sorting of lineages.
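The dependence on ancestral population size and time between speciation events can be illustrated with the standard coalescent expectation for a species trio. If two lineages fail to coalesce during the interval between the two speciation events, the three lineages join in random order further back, so two of the three possible topologies are incongruent. The sketch below encodes that textbook expectation, (2/3)·exp(−Δt/(2Nₑ)) with Δt in generations. It is not the CoalHMM inference used in the study, and the function name and example values are hypothetical.

```python
import math


def incongruent_fraction(delta_t_generations: float, ne: float) -> float:
    """Expected fraction of gene trees incongruent with the species tree.

    Textbook coalescent expectation for a species trio (an assumption of
    this sketch, not the study's CoalHMM):
      delta_t_generations: generations between the two speciation events
      ne: diploid effective size of the ancestral population in that interval
    """
    if delta_t_generations < 0 or ne <= 0:
        raise ValueError("delta_t_generations must be >= 0 and ne must be > 0")
    # Probability that the two sister lineages do NOT coalesce in the interval
    p_no_coalescence = math.exp(-delta_t_generations / (2.0 * ne))
    # Two of the three equally likely deeper topologies are incongruent
    return (2.0 / 3.0) * p_no_coalescence


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the direction of the effect
    for ne in (10_000, 50_000, 100_000):
        print(ne, round(incongruent_fraction(100_000, ne), 3))
```

Holding the interval fixed, a larger ancestral population gives a larger incongruent fraction; holding the population size fixed, a longer interval gives a smaller one.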
RESULTS
We find widespread incomplete lineage sorting at 29 nodes across the primate tree, in some cases affecting as much as 64% of the genome. Combining CoalHMM with a machine learning pipeline, we reconstruct the speciation times of the primate phylogeny without the need for fossil calibrations. Our speciation time estimates are more recent than divergence times, and they are in agreement with previous estimates based on fossil evidence. Our reconstructed ancestral effective population sizes increase toward the past.
We additionally detect regions that have low or high incomplete lineage sorting levels consistently across several nodes. We show that incomplete lineage sorting proportions increase with the recombination rate in the genomic region—a difference that translates into an up to fourfold variation in the inferred local effective population size. Moreover, we report low levels of incomplete lineage sorting on the X chromosome. This reduction is more pronounced than expected under neutral evolution, which suggests that selective forces affect the X chromosome more strongly than the autosomes, reducing the effective population size of the X chromosome and, subsequently, the levels of incomplete lineage sorting.
We further assess how selection affects the distribution of incomplete lineage sorting patterns by comparing the incomplete lineage sorting proportions of exons with those in intergenic regions. We find that there is an overall decrease in the levels of incomplete lineage sorting in exons that amounts to a reduction of 31% in the local effective population size as compared with intergenic regions.
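One way to see how a difference in incomplete lineage sorting proportions maps onto a difference in local effective population size is to invert the textbook trio expectation, p = (2/3)·exp(−Δt/(2Nₑ)). The sketch below assumes that simple model and that both region classes share the same interval Δt. The study's own estimates come from CoalHMM, and the function names and input values here are hypothetical.

```python
import math


def local_ne(incongruent: float, delta_t_generations: float) -> float:
    """Effective size implied by an incongruent-tree fraction.

    Inverts p = (2/3) * exp(-delta_t / (2 * Ne)); valid for 0 < p < 2/3.
    """
    if not 0.0 < incongruent < 2.0 / 3.0:
        raise ValueError("incongruent fraction must lie strictly between 0 and 2/3")
    if delta_t_generations <= 0:
        raise ValueError("delta_t_generations must be positive")
    return -delta_t_generations / (2.0 * math.log(1.5 * incongruent))


def ne_ratio(p_a: float, p_b: float) -> float:
    """Ratio Ne_a / Ne_b for two region classes sharing the same interval.

    The interval length cancels, so only the two fractions are needed.
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 2.0 / 3.0:
            raise ValueError("fractions must lie strictly between 0 and 2/3")
    return math.log(1.5 * p_b) / math.log(1.5 * p_a)


if __name__ == "__main__":
    # Hypothetical fractions for two region classes (not values from the study)
    ratio = ne_ratio(p_a=0.20, p_b=0.30)
    print(f"Ne_a / Ne_b = {ratio:.2f}; reduction = {1.0 - ratio:.0%}")
```

Because the interval cancels in the ratio, a relative reduction in effective population size can be read off from the two proportions alone under this model.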
Finally, we perform a gene ontology enrichment analysis on low– and high–incomplete lineage sorting genes. We find that immune system genes show large proportions of incomplete lineage sorting for many of the nodes, whereas housekeeping genes with basic cell functions show a lack of incomplete lineage sorting.
CONCLUSION
Most molecular-based methods that aim at timing a species tree provide estimates of divergence times, which are confounded by ancestral population sizes compared with the actual speciation times. We showed that using the coalescent theory and the signal of incomplete lineage sorting allows us to accurately estimate speciation times and ancestral population sizes in the primate tree, gaining key insights regarding some aspects of primate biology. Our study also emphasizes the prevalence of natural selection at linked sites that shapes the landscape of both genetic diversity and incomplete lineage sorting along the primate genome.
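The confounding of divergence times by ancestral population size follows from a basic coalescent result: two lineages sampled from different species cannot coalesce before the speciation event, and once in the ancestral population they take on average 2Nₑ generations to do so. A minimal sketch of that relationship follows, assuming a constant ancestral population size and a fixed generation time. The function name and example values are hypothetical and are not taken from the study.

```python
def expected_divergence_time(speciation_time_years: float,
                             ancestral_ne: float,
                             generation_time_years: float) -> float:
    """Expected sequence divergence time (years) between two species.

    Speciation time plus the mean coalescence time of two lineages in the
    ancestral population (2 * Ne generations), under the simplifying
    assumptions of constant ancestral size and fixed generation time.
    """
    if speciation_time_years < 0 or ancestral_ne <= 0 or generation_time_years <= 0:
        raise ValueError("times and population size must be positive")
    return speciation_time_years + 2.0 * ancestral_ne * generation_time_years


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    t_div = expected_divergence_time(
        speciation_time_years=5_000_000,
        ancestral_ne=50_000,
        generation_time_years=20,
    )
    print(f"expected divergence time: {t_div:,.0f} years")
```

Under these assumptions the gap between divergence and speciation time grows linearly with ancestral effective population size, which is why large ancestral populations make divergence-based dates appear older.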
Original language: English
Journal: Science (New York, N.Y.)
Volume: 380
Issue number: 6648
Number of pages: 10
ISSN: 0036-8075
DOI
Status: Published - 2023
