PhD Defence • Bioinformatics • Deep Unsupervised Learning for Biodiversity Analysis: Representation Learning and Clustering of Bacterial, Mitochondrial, and Barcode DNA Sequences

Friday, May 10, 2024 1:00 pm - 4:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 2310 and online.

Pablo Millán Arias, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Lila Kari

Amid the recent surge in next-generation sequencing technologies, alignment-free algorithms stand out as a promising alternative to traditional alignment-based methods in phylogenetic analyses. Specifically, the use of genomic signatures has enabled the success of alignment-free methods based on supervised machine learning in taxonomic classification.

Motivated by this success, this dissertation investigates the potential of alignment-free algorithms based on unsupervised learning for genomic signature categorization, and attempts to bridge biodiversity and taxonomic identification with deep unsupervised learning. Our findings reveal deep learning’s untapped potential to capture taxonomic information, even without supervision. The methodologies presented in this dissertation can also be used to learn expressive DNA embeddings and test evolutionary hypotheses.