Link Your Sites (LYS) Scripts: Automated Search of Protein Structures and Mapping of Sites Under Positive Selection Detected by PAML

Lys Sanz Moreta*, Rute R. da Fonseca

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for the assessment of its functional impact and the understanding of its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models like those implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000; Zhang et al. in Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12):2472–2479, 2005) are run on almost complete proteomes, generating large numbers of candidate proteins and making the analysis of individual protein structures and models very time-consuming. Here we present the package Link Your Sites (LYS), which can be used to reduce the number of analysed targets to those for which structural information can be retrieved. LYS consists of two Python wrapper scripts. The first one (i) mines the RCSB database (Berman et al. in The protein data bank. Nucleic Acids Res 28(1):235–242, 2000) using the BLAST alignment tool to find the best matching homologous sequences, (ii) fetches their domain positions by using Prosites (Hamelryck and Manderick in Pdb file parser and structure class implemented in python. Bioinformatics 19(17):2308–2310, 2003; Sigrist et al. in Prosite: a documented database using patterns and profiles as motif descriptors. Brief Bioinf 3(3):265–274, 2002; Sigrist et al. in New and continuing developments at prosite. Nucleic Acids Res 41(D1):D344–D347, 2012), (iii) parses the output of PAML, extracting the positional information of fast-evolving sites and transforming it into the coordinate system of the protein structure, and (iv) outputs one file per gene with the equivalence between the positions in the input sequence and the homologous structure. 
The second script produces figures to be used in publications highlighting the positively selected sites mapped on regions that are known to have functional relevance.

Motivation: Automating the search for protein structures to assess the functional impact of sites found to be under positive selection by codeml, implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000). Building publication-quality figures highlighting the sites on a protein structure model that are within and outside functional domains. Reducing the workload associated with selecting proteins for which a functional assessment of the impact of substitutions can be done using a protein structure. This is especially relevant when analyzing almost complete proteomes, as is the case in large comparative genomic studies.

Software: LYS scripts are executed from the command line. They automatically search for homologous proteins in the RCSB database (Nielsen in Molecular signatures of natural selection. Annu Rev Genet 39:197–218, 2005), determine the functional domain locations, relate them to the positions flagged by the M8 model (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000), and output a data frame that can be used as input by PyMOL (Schrodinger in The pymol molecular graphics system. Version 1 in 2010) to generate a visualization of the results.

Availability: The LYS scripts are easy to install and use, and are available at https://github.com/LysSanzMoreta/LYS_Automatic_Search.
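
The coordinate transformation described in step (iii) of the abstract, together with the check of whether a site falls inside a functional domain, can be illustrated with a minimal Python sketch. This is not the LYS implementation. The function names, the toy alignment, the domain name, and the assumption that the structure's residues are numbered contiguously from a known start are all hypothetical and chosen only for illustration. Real PDB numbering can contain gaps and insertion codes that this sketch ignores.

```python
def map_query_to_structure(aligned_query, aligned_structure, structure_start=1):
    """Map 1-based query positions to structure residue numbers.

    Both arguments are rows of the same pairwise alignment, with '-' as gap.
    Assumption: structure residues are numbered contiguously from
    `structure_start` (no insertion codes, no missing residues).
    """
    if len(aligned_query) != len(aligned_structure):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    query_pos = 0
    structure_pos = structure_start - 1
    for q_char, s_char in zip(aligned_query, aligned_structure):
        if q_char != "-":
            query_pos += 1
        if s_char != "-":
            structure_pos += 1
        # Only columns where both sequences have a residue are mappable.
        if q_char != "-" and s_char != "-":
            mapping[query_pos] = structure_pos
    return mapping


def annotate_sites(selected_sites, mapping, domains):
    """Return (query_pos, structure_pos, domain_or_None) for mappable sites.

    `domains` is a list of (name, start, end) tuples, assumed here to be
    given in structure coordinates with inclusive bounds.
    """
    rows = []
    for site in selected_sites:
        structure_pos = mapping.get(site)
        if structure_pos is None:
            continue  # site aligns to a gap in the structure
        domain = next(
            (name for name, start, end in domains if start <= structure_pos <= end),
            None,
        )
        rows.append((site, structure_pos, domain))
    return rows


# Toy data (hypothetical): a gapped alignment, sites, and one domain.
mapping = map_query_to_structure("MKT-AYIAK", "-KTLAY-AK", structure_start=10)
rows = annotate_sites([1, 4, 6, 8], mapping, [("hypothetical_domain", 12, 14)])
for row in rows:
    print(row)
```

In this toy case, query sites 1 and 6 align to gaps in the structure and are dropped, while sites 4 and 8 are reported with their structure positions and domain membership.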

Original language: English
Book series: Evolutionary Biology
Volume: 47
Issue number: 3
Pages (from-to): 240-245
Number of pages: 6
ISSN: 0071-3260
DOIs
Publication status: Published - 2020

Keywords

  • BLAST
  • Codeml
  • Functional domain
  • Homologous proteins
  • PDB
  • Positive selection
  • Prosites
  • PyMOL
