PhD Seminar • Bioinformatics • Unsupervised and Self-Supervised Learning Approaches to Taxonomic Categorization of DNA Sequences

Thursday, April 18, 2024 2:30 pm - 3:30 pm EDT (GMT -04:00)

Please note: This PhD seminar will take place online, NOT in person as advertised earlier.

Pablo Millán Arias, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Lila Kari

This talk discusses two deep-learning-based approaches that use unlabelled genomic data for taxonomic categorization. First, we introduce an entropy-based clustering method for DNA sequences that employs a discriminative classifier to identify taxonomic clusters without supervision. We then extend these ideas, leveraging self-supervised representation learning to enhance non-parametric DNA sequence clustering and achieving performance comparable to traditional alignment-based methods on synthetic datasets. Our work demonstrates how deep unsupervised learning can be integrated with taxonomic identification, offering new approaches for biodiversity studies.
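For readers unfamiliar with entropy-based clustering, the sketch below illustrates one common form of the idea: a generic information-maximization score computed from a discriminative classifier's cluster probabilities. It is an assumption-laden illustration, not the speaker's method. The function names and the input format (one probability vector per DNA sequence, produced by some hypothetical classifier) are invented for illustration. The score estimates I(X; Y) = H(Y) - H(Y|X). It rewards confident per-sequence assignments (low H(Y|X)) while penalizing collapse into a single cluster (high H(Y)).

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def information_maximization_score(probs: List[List[float]]) -> float:
    """Estimate I(X; Y) = H(Y) - H(Y|X) from per-sequence cluster probabilities.

    probs[i][k] is a (hypothetical) classifier's probability that sequence i
    belongs to cluster k. A clustering model of this kind would be trained
    to maximize the returned value.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal cluster distribution p(y): average of p(y|x) over all sequences.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # H(Y) is high when clusters are used evenly (discourages one giant cluster).
    h_marginal = entropy(marginal)
    # H(Y|X) is low when each sequence is assigned confidently to one cluster.
    h_conditional = sum(entropy(row) for row in probs) / n
    return h_marginal - h_conditional


# Confident, balanced assignments score close to log(2); uniform or collapsed
# assignments score close to 0.
print(information_maximization_score([[1.0, 0.0], [0.0, 1.0]]))
print(information_maximization_score([[0.5, 0.5], [0.5, 0.5]]))
print(information_maximization_score([[1.0, 0.0], [1.0, 0.0]]))
```

The two entropy terms pull in opposite directions. That tension is what lets a classifier discover clusters without labels.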