Please note: This PhD seminar will take place online.
Pablo Millán Arias, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Lila Kari
This work explores the genomic signatures of microbial extremophiles using a comprehensive dataset and a deep learning-based methodology. We first present a supervised learning methodology that classifies their k-mer-based genomic signatures according to their taxonomic labels and environmental characteristics. Next, we compute several feature relevance measures to determine the impact of each k-mer on the classification, and we compare our findings with the existing literature and with known environmental mutagens. Finally, we use an unsupervised learning methodology to cluster DNA fragments from multiple bacterial and archaeal genomes.
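To make the idea of a k-mer-based genomic signature concrete, here is a minimal sketch. It assumes the signature is the normalized frequency vector of all 4^k overlapping DNA k-mers in a fragment. The helper name `kmer_signature` and the default k = 6 are illustrative choices, not taken from the paper, and the paper's exact representation may differ. In this sketch, k-mers that contain non-ACGT symbols (such as N) are skipped.

```python
from itertools import product


def kmer_signature(seq: str, k: int = 6) -> list[float]:
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration only: counts every overlapping
    k-mer over the alphabet ACGT, skips k-mers containing any other
    symbol (e.g. N), and divides by the number of k-mers counted.
    """
    # Fixed lexicographic order of all 4**k k-mers, so that vectors
    # computed from different fragments share the same coordinates.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers with ambiguous bases
            counts[j] += 1
            total += 1
    if total == 0:
        # No valid k-mer found: return an all-zero vector.
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Usage: a 16-dimensional signature (k = 2) for a short toy fragment.
sig = kmer_signature("ACGTACGTNNACGT", k=2)
print(len(sig))  # 16
```

Fragments of different lengths map to vectors of the same dimension (4^k). This is what makes such a representation convenient as input to both supervised classifiers and unsupervised clustering models.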
Our results demonstrate that a deep learning-based methodology can accurately reconstruct the genomes of microbial extremophiles from their fragments. Analyzing the recovered genera then makes it possible to identify potential candidates for convergent evolution. In summary, this research shows the power of deep learning approaches for discovering, and advancing our understanding of, a pervasive environmental component of microbial genomic signatures.
Link to the Scientific Reports (Nature Portfolio) paper on which this seminar is based: https://www.nature.com/articles/s41598-023-42518-y.