Please note: This PhD seminar will take place online.
Johra Muhammad Moosa, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Bin Ma
To improve peptide identification rates in the database search analysis of bottom-up proteomics data, many machine learning-based methods have been proposed. These methods train a new scoring function after the initial search to rescore and rerank the peptide-spectrum matches (PSMs). Generally, the retraining uses selected PSMs from the target and decoy databases as positive and negative training examples, respectively. However, this exposes the target-decoy information to the scoring function, potentially invalidating the false discovery rate (FDR) estimation.
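For readers unfamiliar with target-decoy FDR estimation, the following is a minimal sketch of one commonly used simple estimator (decoy count divided by target count above a score threshold). It is not taken from the seminar; the function name `estimate_fdr`, the `(score, is_decoy)` input layout, and the assumption that higher scores are better are all illustrative. The estimate relies on decoy matches behaving like incorrect target matches, which is why a scoring function trained on the target-decoy labels can undermine it.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR among PSMs scoring at or above `threshold`.

    psms: iterable of (score, is_decoy) pairs (hypothetical layout).
    Assumes higher scores indicate better matches.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # no accepted target PSMs, so nothing to report
    # Decoy hits stand in for the unknown number of false target hits.
    return min(1.0, decoys / targets)
```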
We propose a novel method for retraining without revealing the target-decoy information. Our approach considers the top-ranked and the next-ranked peptides for the same spectrum as positive and negative examples, respectively. We demonstrate that this leads to a much-improved identification rate while maintaining accurate FDR estimation.
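The selection of training examples described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `build_training_pairs`, the `(spectrum_id, peptide, score)` input layout, and the convention that a higher initial-search score means a better rank are all hypothetical. Note that the target-decoy label never appears in the input.

```python
from collections import defaultdict


def build_training_pairs(psms):
    """Pick positive and negative training examples per spectrum.

    psms: iterable of (spectrum_id, peptide, score) tuples from the
    initial search (hypothetical layout; higher score = better rank).
    Returns (positives, negatives), each a list of (spectrum_id, peptide).
    """
    by_spectrum = defaultdict(list)
    for spectrum_id, peptide, score in psms:
        by_spectrum[spectrum_id].append((score, peptide))

    positives, negatives = [], []
    for spectrum_id, candidates in by_spectrum.items():
        if len(candidates) < 2:
            continue  # no next-ranked peptide, so no negative example
        # Rank candidates for this spectrum by initial search score.
        candidates.sort(key=lambda c: c[0], reverse=True)
        positives.append((spectrum_id, candidates[0][1]))  # top-ranked
        negatives.append((spectrum_id, candidates[1][1]))  # next-ranked
    return positives, negatives
```

Spectra with only one candidate peptide are skipped in this sketch because they provide no next-ranked match to serve as a negative example.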