Abstract
Differential analysis of bulk RNA-seq data often suffers from a lack of good controls. Here, we present a generative model, trained solely on healthy tissues, that replaces controls. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched in driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
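The "closest normal" idea can be illustrated with a toy sketch. This is not the authors' model: it swaps in a linear low-dimensional model (PCA via SVD) for the generative model, uses synthetic data, and scores genes with a simple residual z-score. All function names, the data, and the scoring rule are assumptions made for illustration only.

```python
import numpy as np


def fit_normal_model(normal, n_components):
    """Fit a linear low-dimensional model (PCA) on healthy samples only.

    Stand-in for the generative model; rows of `normal` are samples,
    columns are genes.
    """
    mean = normal.mean(axis=0)
    centered = normal - mean
    # Rows of vt are principal directions in gene space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    # Per-gene spread of what the model cannot reconstruct on healthy samples.
    residuals = centered - (centered @ basis.T) @ basis
    scale = residuals.std(axis=0) + 1e-8
    return mean, basis, scale


def closest_normal(sample, mean, basis):
    """Project a sample onto the healthy subspace: its in silico control."""
    latent = (sample - mean) @ basis.T
    return mean + latent @ basis


def single_sample_scores(sample, mean, basis, scale):
    """Per-gene deviation of one sample from its closest normal (z-like score)."""
    return (sample - closest_normal(sample, mean, basis)) / scale


# Synthetic demo (assumed data): 200 healthy samples, 50 genes, 3 latent factors.
rng = np.random.default_rng(0)
n_genes, n_latent = 50, 3
loadings = rng.normal(size=(n_latent, n_genes))
normal = rng.normal(size=(200, n_latent)) @ loadings
normal += 0.1 * rng.normal(size=(200, n_genes))

# One "disease" sample: healthy-like, except gene 7 is strongly shifted.
disease = rng.normal(size=n_latent) @ loadings + 0.1 * rng.normal(size=n_genes)
disease[7] += 8.0

mean, basis, scale = fit_normal_model(normal, n_components=n_latent)
scores = single_sample_scores(disease, mean, basis, scale)
top_gene = int(np.argmax(np.abs(scores)))
print("gene with the largest deviation:", top_gene)
```

The point of the sketch is the comparison target: the disease sample is compared with its own projection onto the space learned from healthy samples, so no matched control sample is needed. A real analysis would use count-appropriate likelihoods and proper significance testing rather than this z-like score.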
Original language | English |
---|---|
Article number | 263 |
Journal | Genome Biology |
Volume | 24 |
Issue number | 1 |
Number of pages | 17 |
ISSN | 1474-7596 |
DOI | |
Status | Published - 2023 |
Bibliographical note
Funding Information: Open access funding provided by Royal Library, Copenhagen University Library. AK is funded by two grants from the Novo Nordisk Foundation: Center for Basic Machine Learning Research in Life Science (NNF20OC0062606) and Quantum for Life (NNF20OC0059939). AK, VaS, and IPL receive funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017549, ‘Genomed4all’. YL is supported by the China Scholarship Council (Grant 201804910693).
Publisher Copyright:
© 2023, The Author(s).