Please note: This master’s thesis presentation will be given online.
Soroosh Gholamizoj, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Bin Ma
In proteomics, database search programs are routinely used to identify peptides from tandem mass spectrometry data. However, many low-quality spectra cannot be interpreted by any program. Meanwhile, some high-quality spectra go unidentified because of an incomplete database, software failure, or sub-optimal search parameters. Spectrum quality assessment tools are therefore helpful: they can eliminate poor-quality spectra before the database search and highlight high-quality spectra that were not identified in the initial search. The latter may be valuable candidates for further analysis.
We propose SPEQ, a spectrum quality assessment tool that uses a deep neural network to classify spectra as either high-quality, making them worthy candidates for interpretation, or low-quality, lacking sufficient information for identification. SPEQ was compared with a few other prediction models and demonstrated improved prediction accuracy.
Furthermore, we propose a statistical model that automatically detects the enzyme used for digestion in a proteomics experiment by analyzing the distribution of amino acids in peptides that were de novo sequenced with a nonspecific enzyme setting. Results demonstrate that this algorithm can accurately identify the correct enzyme.
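To make the idea of enzyme detection from amino acid distributions concrete, the following is a minimal sketch and not the statistical model from the thesis. It assumes a simplified, hypothetical rule table (`ENZYME_C_TERM`) listing the residues each enzyme is expected to leave at the peptide C-terminus, and it scores each enzyme by how enriched those residues are at the C-terminus relative to their overall frequency in the de novo peptides. The function name `detect_enzyme` and the scoring formula are illustrative assumptions.

```python
from collections import Counter

# Hypothetical, simplified cleavage rules (an assumption for illustration):
# residues each enzyme is expected to leave at the peptide C-terminus.
ENZYME_C_TERM = {
    "trypsin": {"K", "R"},
    "chymotrypsin": {"F", "W", "Y", "L"},
    "glu-c": {"E", "D"},
}


def detect_enzyme(peptides):
    """Return (best_enzyme, scores) for a list of peptide sequences.

    The score for an enzyme is the fraction of peptides ending in one of
    its residues, divided by the fraction of all residues that belong to
    that set. Values well above 1 suggest C-terminal enrichment.
    """
    peptides = [p for p in peptides if p]
    if not peptides:
        raise ValueError("at least one non-empty peptide is required")

    c_term_counts = Counter(p[-1] for p in peptides)
    background_counts = Counter("".join(peptides))
    total_residues = sum(background_counts.values())

    scores = {}
    for enzyme, residues in ENZYME_C_TERM.items():
        observed = sum(c_term_counts[r] for r in residues) / len(peptides)
        expected = sum(background_counts[r] for r in residues) / total_residues
        # If the residues never occur at all, there is no evidence either way.
        scores[enzyme] = observed / expected if expected > 0 else 0.0

    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy peptides invented for this example, not real data.
    toy_peptides = ["AGLSEK", "VTDFPR", "LLGNAK", "SWEETR"]
    best, scores = detect_enzyme(toy_peptides)
    print(best, scores)
```

A real detector would also need to consider N-terminal cleavage rules, de novo sequencing errors, and enzymes with overlapping specificities, none of which this sketch handles.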