This course will cover the fundamental principles of probabilistic inference and computational learning systems. It will introduce the basic methods used in reasoning under uncertainty, computational probability and statistics in large-scale systems, machine learning, neural networks, pattern recognition, and graphical probability modelling. These techniques are now widely applied in scientific data analysis, data mining, trainable recognition/interpretation systems, computer vision algorithms, signal processing algorithms, adaptive resource allocators, adaptive controllers, and software for automated diagnosis. The emphasis will be on understanding the fundamental principles that permit effective probabilistic inference and learning in systems, realizing their inherent theoretical limitations, and exploring the latest advanced techniques.

This course requires prior knowledge of artificial intelligence techniques
and a working knowledge of methods from probability and statistics. It would
be advantageous to have some prior exposure to optimization methods.

Pattern Classification, by Richard Duda, Peter Hart, David Storck, Wiley,
2000. Neural Networks for Pattern Recognition, by Chris Bishop, Oxford, 1995.
Graphical Models for Machine Learning and Digital Communication, by Brendan
Frey, MIT Press, 1998.

3 hours of lecture per week.

What is probabilistic inference? What is machine learning? Learning to do probabilistic inference. Probabilistic inference for learning. Examples of inference and learning in computer vision, scientific data analysis, data mining, pattern recognition, signal processing, and adaptive controllers.

Bayes decision theory and utility theory. Exact inference. Monte Carlo methods: Unbiased samples, rejection sampling and importance sampling, particle filtering. Markov chain Monte Carlo (MCMC) methods: Stationary distributions and detailed balance, Gibbs sampling, the Metropolis algorithm, MCMC in practice. Variational techniques.

Fundamental types of learning problems. Propositional classifiers: Boolean formulae, decision trees, version spaces, bias. Real predictors: loss functions, linear and nonlinear regression, neural nets, local methods. Real classifiers: linear and nonlinear discriminants, neural nets, SVMs, local methods. Maximum likelihood learning. MAP learning. Complexity control, complexity penalization, regularization, cross validation, metric methods. Bayesian learning using probabilistic inference. Model averaging, Bayes, bagging, boosting, general ensemble methods.

Bayesian networks, Markov random fields and factor graphs. Exact inference using the sum-product algorithm and max-product algorithm. Approximate inference using Monte Carlo, Markov chain Monte Carlo, variational methods and the approximate sum-product algorithm.

Newton-type methods. Expectation-maximization algorithm. Generalized expectation-maximization algorithm. Monte Carlo, Markov chain Monte Carlo, variational techniques and the approximate sum-product algorithm. Learning structure. Bayesian learning with incomplete data, model ensembles.

Statistics of learning: learning curves, bias/variance decomposition of generalization error, overfitting/underfitting. Worst case analysis; uniform convergence: Vapnik Chervonenkis dimension, computational learning theory, PAC learning theory.