David Sankoff
University of Ottawa

Computational Genomics of Flowering Plant Evolution
 
Abstract: The 1990s saw the initiation of a formal theory to account for the evolution of gene order along chromosomes on the time scale of millions of years, one of the key aspects of the origin of new species. The dominant mechanisms for this rearrangement process are the inversion of a chromosomal fragment and the reciprocal exchange of the terminal segments of two chromosomes. This is modelled in terms of non-local operations on signed, fragmented permutations, very different from the local substitutions, deletions and insertions in models of DNA or protein evolution.
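The two operations named above can be sketched on a toy representation. This is only an illustration, not the speaker's software: it assumes each chromosome is a Python list of signed integers (the sign giving gene orientation), and the function names `invert` and `translocate` are hypothetical.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed gene order, e.g. [1, -3, 2]


def invert(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Invert the segment chrom[i:j]: reverse its order and flip every sign."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]


def translocate(a: Chromosome, b: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Reciprocal translocation: exchange the tail of a (from index i)
    with the tail of b (from index j)."""
    return a[:i] + b[j:], b[:j] + a[i:]


if __name__ == "__main__":
    print(invert([1, 2, 3, 4, 5], 1, 4))             # [1, -4, -3, -2, 5]
    print(translocate([1, 2, 3], [4, 5, 6, 7], 1, 2))  # ([1, 6, 7], [4, 5, 2, 3])
```

Both operations move or reorient whole blocks of genes at once, which is what makes them non-local in contrast to single-site substitutions, insertions and deletions.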

These models have enabled the development of rapid algorithms to analyze many problems on comparative genomic data. More recently, processes of gene duplication and gene loss have been incorporated into the repertoire of rearrangement events. A dramatic instance of duplication, which is very rare but which has given rise to some of the most diverse and successful evolutionary lineages, is whole genome duplication (WGD), where every chromosome, every gene, appears in two copies.

WGD happened twice in the forebears of all vertebrates, and has recurred in several fish lineages and amphibians, in some protists and in the ancestor of baker’s yeast. In the flowering plants, however, which number hundreds of thousands of species and include virtually all crops, it has occurred many times: once at the very origin of these plants, and thereafter at least once more in every lineage, with a solitary exception, and sometimes three, four or more times.

After WGD in a species, another mechanism operates simultaneously with traditional rearrangement processes to scramble gene order. This is fractionation: duplicate gene loss on a massive scale, deleting one copy of each gene alternately from one or the other copy of every chromosome. Serious biases result if we analyze such genomes using algorithms designed for traditional chromosomal rearrangements only, and we lose much vital information about how evolution has proceeded.
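A toy simulation may help show why fractionation can be mistaken for rearrangement. It is a sketch under simplifying assumptions that are not taken from the talk: the pre-WGD chromosome is a list of gene labels, each gene has an independent fate ("A", "B" or "AB", meaning which copies survive), and the names `fractionate` and `random_fates` and the parameters `retain_both` and `bias_to_a` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fractionate(ancestor: Sequence[int], fates: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Apply one fate per gene to the two copies created by WGD.

    fates[k] is "A" (gene kept only on copy A), "B" (only on copy B)
    or "AB" (both copies retained).
    """
    copy_a = [g for g, f in zip(ancestor, fates) if "A" in f]
    copy_b = [g for g, f in zip(ancestor, fates) if "B" in f]
    return copy_a, copy_b


def random_fates(n: int, retain_both: float = 0.2, bias_to_a: float = 0.5,
                 seed: int = 0) -> List[str]:
    """Draw independent fates; bias_to_a > 0.5 models biased fractionation."""
    rng = random.Random(seed)
    fates = []
    for _ in range(n):
        if rng.random() < retain_both:
            fates.append("AB")
        else:
            fates.append("A" if rng.random() < bias_to_a else "B")
    return fates


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4, 5, 6]
    a, b = fractionate(ancestor, ["A", "B", "AB", "B", "A", "A"])
    print(a)  # [1, 3, 5, 6]
    print(b)  # [2, 3, 4]
```

In the example, genes 1 and 3 are adjacent on copy A although they were not adjacent in the ancestor. No gene has moved, yet an adjacency-based comparison would count this as a change in gene order.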

We have been working on a comprehensive program for studying fractionation and incorporating it into existing models and analyses of gene order evolution. Among the new problems investigated are run sizes on a single chromosome for retained duplicate genes, biased fractionation towards one chromosome or the other, comparison of WGD genomes with unaffected sister genomes, removal of massive fractionation artifacts in genome rearrangement analysis, and inference of ancestral (pre-WGD) genomes. We illustrate with analyses of many flowering plant genomes, including the newly sequenced coffee and tomato genomes.

Biography: David Sankoff studied mathematics at McGill University. He presently holds the Canada Research Chair in Mathematical Genomics at the University of Ottawa and was a professor at the Université de Montréal for many years, where he was among the first cohort of researchers at the Centre de recherches mathématiques.

He is a Fellow of the Royal Society of Canada and of the International Society for Computational Biology and a former Fellow of the Canadian Institute for Advanced Research. He is a medallist of the Association Francophone pour le Savoir and a recipient of the first Senior Scientist Accomplishment Award from the International Society for Computational Biology and of the Weldon Memorial Prize from Oxford University.

Dr. Sankoff was founding editor of Language Variation and Change (Cambridge) for 20 years and serves on the editorial boards of a number of bioinformatics, computational biology and linguistics journals. His chief interest in computational biology is in algorithms for comparative genomics.