Mind the Gap: A Neural Network Framework for Imputing Genotypes in Non-Model Species

Katia Bougiouri*

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

Reduced representation sequencing (RRS) has proven to be a cost-effective solution for sequencing subsets of the genome in non-model species for large-scale studies. However, the targeted nature of RRS approaches commonly introduces large amounts of missing data, leading to reduced statistical power and biased estimates in downstream analyses. Genotype imputation, the statistical inference of genotypes at missing sites across the genome, is a powerful way to overcome the caveats associated with missing sites. Typically, genotype imputation requires a reference panel of haplotypes; however, such a panel is not always available for non-model species. In this issue of Molecular Ecology Resources, Mora-Márquez et al. (2024) develop gtImputation, an unsupervised machine learning imputation tool with an interactive GUI, which leverages information from the structure of the data itself, without the need for a reference panel. They show that their method performs as well as, and even surpasses, existing haplotype-clustering and unsupervised machine learning algorithms, particularly for sites with low minor allele frequency (MAF) and for data sets with strong underlying population structure. This innovative framework adds to the ongoing efforts to expand the applicability of imputation to non-model species, enabling a range of analyses that require dense sets of markers while keeping sequencing costs low.

Original language: English
Journal: Molecular Ecology Resources
Number of pages: 3
ISSN: 1755-098X
Publication status: E-pub ahead of print - 2025

Bibliographical note

Publisher Copyright:
© 2025 John Wiley & Sons Ltd.

Keywords

  • genotype imputation
  • mixed populations
  • neural networks
  • non-model species
  • reduced representation sequencing
