PhD Defence • Bioinformatics • Advanced Machine Learning Techniques for Taxonomic Classification and Clustering of DNA Sequences

Wednesday, November 27, 2024 1:00 pm - 4:00 pm EST (GMT -05:00)

Please note: This PhD defence will take place in DC 2314 and online.

Fatemeh Alipour, PhD candidate
David R. Cheriton School of Computer Science

Supervisors: Professors Lila Kari and Yang Lu

Advancements in genomic sequencing have significantly increased the availability of DNA sequence data, introducing both opportunities and challenges in bioinformatics. This dissertation leverages advanced machine learning techniques to enhance taxonomic classification and clustering of DNA sequences, introducing several innovative algorithms.

We proposed a deep learning-based method for unsupervised clustering of DNA sequences that does not rely on prior taxonomic information and that outperforms traditional clustering methods such as K-Means++ and Gaussian Mixture Models across various genomic datasets. Additionally, we developed a hybrid approach that integrates k-mer composition analysis with host species data to address the taxonomic classification of emerging astroviruses, successfully assigning genus labels to previously unclassified genomes and tackling the challenges posed by interspecies transmission. Moreover, we introduced a novel method that employs twin contrastive learning with convolutional neural networks to cluster Chaos Game Representations of DNA sequences. This method has shown robust performance and enhanced clustering accuracy compared to existing methods. Collectively, these methodologies improve the accuracy and computational efficiency of genomic data analysis and highlight the transformative potential of machine learning in DNA sequence classification.
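For attendees unfamiliar with the two sequence representations named in the abstract, the following is a minimal illustrative sketch of k-mer counting and of the Chaos Game Representation (CGR). It is not the dissertation's implementation. The function names, the corner assignment for A, C, G, and T, and the choice to skip ambiguous bases are assumptions made for illustration.

```python
from collections import Counter

# Assumed corner assignment on the unit square (one common convention;
# the dissertation may use a different one).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def kmer_counts(seq, k):
    """Count overlapping k-mers, ignoring any k-mer with a non-ACGT symbol."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in CORNERS for base in kmer):
            counts[kmer] += 1
    return counts


def cgr_points(seq):
    """Return the CGR trajectory of a sequence as a list of (x, y) points.

    Starting from the centre of the unit square, each nucleotide moves the
    current point halfway toward that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous bases such as N are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not real data
    print(kmer_counts(example, 2))
    print(cgr_points(example)[:2])
```

Binning the CGR points into a 2^k by 2^k grid yields an image whose cell counts correspond to k-mer frequencies, which is what makes CGR images a natural input for convolutional neural networks.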


To attend this PhD defence in person, please go to DC 2314. You can also attend virtually using MS Teams.