PhD Seminar • Bioinformatics • Mitigating the Missing-fragmentation Problem in De Novo Peptide Sequencing with a Two-stage Graph-based Deep Learning Model

Wednesday, February 7, 2024 9:00 am - 10:00 am EST (GMT -05:00)

Please note: This PhD seminar will take place online.

Ruixue Zhang, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Ming Li

De novo peptide sequencing with tandem mass spectrometry is the process of determining a peptide sequence by analyzing its tandem mass spectrum together with accompanying information about the precursor peptide, such as its mass, charge and retention time. Unlike database search and spectral library search, de novo peptide sequencing deduces the peptide sequence without relying on a prior database. Novel protein discovery and immunopeptidomics depend on highly sensitive de novo peptide sequencing with tandem mass spectrometry. Despite notable improvements from deep learning models, the missing-fragmentation problem remains an important hurdle that severely degrades the performance of de novo peptide sequencing.

In this talk, I will first show that, during peptide prediction, missing fragmentation leads to incorrect amino acids being generated within the affected regions and causes errors to accumulate thereafter. I will then introduce GraphNovo, a new two-stage de novo peptide sequencing algorithm based on a graph neural network, and explain how it mitigates the effects of the missing-fragmentation problem. I will end the talk with the main experimental results, showing the performance of GraphNovo and demonstrating that its first stage can also improve previous de novo peptide sequencing algorithms.