Revised July 3, 2015

CS 482: Computational Techniques in Biological Sequence Analysis


General description

This course introduces the most well known bioinformatics problems and the algorithms behind their solutions. These problems include sequence alignment, large-scale sequence database search, evolutionary tree reconstruction, gene prediction, and protein sequencing. Students explore the underlying computational techniques and skills to solve similar problems.

Logistics

Audience

  • Students taking the Bioinformatics option or students interested in learning how to apply mathematical modeling and algorithmic methods to solve biological problems. Usually taken in fourth year.

Normally available

  • Winter

Related courses

  • Pre-requisites: CS 341, STAT 241 or at least 60% in STAT 231

For official details, see the UW calendar.

Software/hardware used

  • A personal computer for programming

Typical reference(s)

  • R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis, Cambridge Press, 1999

Required preparation

At the start of the course, students should be able to

  • Program in Java, C++, or Python
  • Design algorithms and analyze an algorithm's complexity
  • Describe basic concepts in molecular biology or quickly learn the concepts in the first few weeks

Learning objectives

At the end of the course, students should be able to

  • Find and use common bioinformatics resources and tools
  • Apply the learned modeling and algorithmic techniques to solve computational problems in biology
  • Apply the learned modeling and algorithmic techniques to solve data analysis problems in other areas

Typical syllabus

Introduction

  • Brief review of the fundamentals of molecular biology and genetics in the context of biology as an information science.

Pairwise sequence alignment

  • Classic dynamic programming ideas for pairwise sequence alignment
  • Statistical measures of alignment significance
  • Probabilistic models of homologous sequences

Heuristic sequence alignment

  • Mathematical ideas underlying BLAST, FASTA and other heuristic sequence aligners
  • Applications of sequence alignment
  • Multiple alignment

Exact string matching

  • Suffix trees, suffix arrays, and their application in pairwise and multiple alignment

Sequence annotation

  • Gene finding
  • Sequence feature detection
  • Motif finding
  • Hidden Markov models

Evolutionary tree algorithms

  • Classical and contemporary algorithms for inferring evolutionary trees

Protein sequence identification

  • Mass spectrometry and its application in protein identification and sequencing