TY - JOUR
T1 - Benchmarking the identification of a single degraded protein to explore optimal search strategies for ancient proteins
AU - Palomo, Ismael Rodriguez
AU - Nair, Bharath
AU - Chiang, Yun
AU - Dekker, Joannes
AU - Dartigues, Benjamin
AU - Mackie, Meaghan
AU - Evans, Miranda
AU - Macleod, Ruairidh
AU - Olsen, Jesper V.
AU - Collins, Matthew J.
N1 - Funding Information:
We would like to thank Shevan Wilkin, an anonymous reviewer and the recommender Raquel Assis for their time and effort. Their helpful comments and feedback improved the quality of the manuscript. Preprint version 3 of this article has been peer-reviewed and recommended by Peer Community In Mathematical and Computational Biology (https://doi.org/10.24072/pci.mcb.100309; Assis, R., 2024). IRP is currently funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 956410. At the time of producing and writing this work, BN was funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 801199 and, together with MC, by the European Union’s Framework Programme for Research and Innovation Horizon 2020 under Grant Agreement No. 787282 (B2C). MC and MM were also supported by the Danish National Research Foundation (DNRF128). YC and JD are funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 956351.
Publisher Copyright:
© 2024, Centre Mersenne. All rights reserved.
PY - 2024
Y1 - 2024
AB - Palaeoproteomics is a rapidly evolving discipline, and practitioners are constantly developing novel strategies for the analysis and interpretation of complex, degraded protein mixtures. The community has also established standards of good practice for interrogating these data. However, there has been no systematic exploration of how these strategies and standards affect the identification of peptides, post-translational modifications (PTMs) and proteins, or the significance (through the False Discovery Rate) and correctness of those identifications. We systematically investigated the performance of a wide range of sequencing tools and search engines in a controlled system: the experimental degradation of a single purified protein, bovine β-lactoglobulin (BLG), heated at 95 °C and pH 7 for 0, 4 and 128 days. We targeted BLG because it is one of the most robust and ubiquitous proteins in the archaeological record. To evaluate the effect of search space on peptide identification, we tested different reference databases (a targeted dairy protein database and the whole bovine proteome) and three digestion options (tryptic, semi-tryptic and non-specific searches). We also explored alternative strategies, including open search, which allows global identification of PTMs by using a wide precursor mass tolerance, and de novo sequencing to boost sequence coverage. We analysed the samples using Mascot, MaxQuant, MetaMorpheus, pFind, FragPipe and DeNovoGUI (PepNovo+, DirecTag, Novor), benchmarked these tools and discuss the optimal strategy for the characterisation of ancient proteins. We also studied which physicochemical properties of BLG correlate with bias in identification coverage.
KW - benchmarking
KW - beta-lactoglobulin
KW - de novo
KW - False Discovery Rate
KW - open search
KW - Palaeoproteomics
U2 - 10.24072/pcjournal.491
DO - 10.24072/pcjournal.491
M3 - Journal article
AN - SCOPUS:85210438401
VL - 4
JO - Peer Community Journal
JF - Peer Community Journal
SN - 2804-3871
M1 - e107
ER -