TY - JOUR
T1 - Computational identification of signals predictive for nuclear RNA exosome degradation pathway targeting
AU - Wu, Mengjun
AU - Schmid, Manfred
AU - Jensen, Torben Heick
AU - Sandelin, Albin
PY - 2022
Y1 - 2022
N2 - The RNA exosome degrades transcripts in the nucleoplasm of mammalian cells. Its substrate specificity is mediated by two adaptors: the 'nuclear exosome targeting (NEXT)' complex and the 'poly(A) exosome targeting (PAXT)' connection. Previous studies have revealed some DNA/RNA elements that differ between the two pathways, but how informative these features are for distinguishing pathway targeting, or whether additional genomic features that are informative for such classifications exist, is unknown. Here, we leverage the wealth of available genomic data and develop machine learning models that predict exosome targets and subsequently rank the features the models use by their predictive power. As expected, features around transcript end sites were most predictive; specifically, the lack of canonical 3' end processing was highly predictive of NEXT targets. Other associated features, such as promoter-proximal G/C content and 5' splice sites, were informative, but only for distinguishing NEXT and not PAXT targets. Finally, we discovered predictive features not previously associated with exosome targeting, in particular RNA helicase DDX3X binding sites. Overall, our results demonstrate that nucleoplasmic exosome targeting is to a large degree predictable, and our approach can assess the predictive power of previously known and new features in an unbiased way.
AB - The RNA exosome degrades transcripts in the nucleoplasm of mammalian cells. Its substrate specificity is mediated by two adaptors: the 'nuclear exosome targeting (NEXT)' complex and the 'poly(A) exosome targeting (PAXT)' connection. Previous studies have revealed some DNA/RNA elements that differ between the two pathways, but how informative these features are for distinguishing pathway targeting, or whether additional genomic features that are informative for such classifications exist, is unknown. Here, we leverage the wealth of available genomic data and develop machine learning models that predict exosome targets and subsequently rank the features the models use by their predictive power. As expected, features around transcript end sites were most predictive; specifically, the lack of canonical 3' end processing was highly predictive of NEXT targets. Other associated features, such as promoter-proximal G/C content and 5' splice sites, were informative, but only for distinguishing NEXT and not PAXT targets. Finally, we discovered predictive features not previously associated with exosome targeting, in particular RNA helicase DDX3X binding sites. Overall, our results demonstrate that nucleoplasmic exosome targeting is to a large degree predictable, and our approach can assess the predictive power of previously known and new features in an unbiased way.
KW - U1 SNRNP
KW - TRANSCRIPTION
KW - INITIATION
KW - COMPLEX
KW - DECAY
KW - UPSTREAM
KW - REGIONS
U2 - 10.1093/nargab/lqac071
DO - 10.1093/nargab/lqac071
M3 - Journal article
C2 - 36128426
VL - 4
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
SN - 2631-9268
IS - 3
M1 - lqac071
ER -