Spectra without stories: reporting 94% dark and unidentified ancient proteomes: [version 1; peer review: 2 approved, 1 approved with reservations]

Yun Chiang*, Frido Welker, Matthew James Collins

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Background: Data-dependent, bottom-up proteomics is widely used for identifying proteins and peptides. However, one key challenge is that 70% of fragment ion spectra consistently fail to be assigned by conventional database searching. This ‘dark matter’ of bottom-up proteomics seems to affect fields where non-model organisms, low-abundance proteins, non-tryptic peptides, and complex modifications may be present. While palaeoproteomics may appear to be a niche field, understanding and reporting unidentified ancient spectra requires collaborative innovation in bioinformatics strategies. This may advance the analysis of complex datasets.

Methods: 14.97 million high-impact ancient spectra published in Nature and Science portfolios were mined from public repositories. Identification rates, defined as the proportion of assigned fragment ion spectra, were collected as part of deposited database search outputs or parsed using open-source Python packages.

Results and Conclusions: We report that typically 94% of the published ancient spectra remain unidentified. This phenomenon may be caused by multiple factors, notably the limitations of database searching and the selection of user-defined reference data with advanced modification patterns. These ‘spectra without stories’ highlight the need for widespread data sharing to facilitate methodological development and minimise the loss of often irreplaceable ancient materials. Testing and validating alternative search strategies, such as open searching and de novo sequencing, may also improve overall identification rates. Hence, lessons learnt in palaeoproteomics may benefit other fields grappling with challenging data.
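The identification rate defined in the Methods (the proportion of assigned fragment ion spectra) can be illustrated with a minimal Python sketch. The function names and the example counts below are hypothetical and are not taken from the paper or its deposited data; the authors' actual parsing code is not shown in this record.

```python
def identification_rate(assigned_spectra: int, total_spectra: int) -> float:
    """Proportion of fragment ion spectra that received an assignment."""
    if total_spectra <= 0:
        raise ValueError("total_spectra must be positive")
    if not 0 <= assigned_spectra <= total_spectra:
        raise ValueError("assigned_spectra must lie between 0 and total_spectra")
    return assigned_spectra / total_spectra


def unidentified_fraction(assigned_spectra: int, total_spectra: int) -> float:
    """Complement of the identification rate: the 'dark' share of spectra."""
    return 1.0 - identification_rate(assigned_spectra, total_spectra)


# Hypothetical counts for one dataset (illustrative only, not from the paper).
example_total = 100_000
example_assigned = 6_000

rate = identification_rate(example_assigned, example_total)
dark = unidentified_fraction(example_assigned, example_total)
print(f"identified: {rate:.1%}, unidentified: {dark:.1%}")
```

In practice the two counts would come from a database search output (assigned spectra) and the corresponding raw or peak-list file (total fragment ion spectra).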

Original language: English
Article number: 71
Journal: Open Research Europe
Volume: 4
Number of pages: 13
ISSN: 2732-5121
Publication status: Published - 2024

Bibliographical note

Publisher Copyright: © 2024 Chiang Y et al.

Keywords

  • bioinformatics challenges
  • database searching
  • DDA
  • palaeoproteomics
  • shotgun proteomics
