Positive Semidefinite Matrices & Semidefinite Programming
Symmetric Matrices & Spectral Theorem
A matrix $A \in \mathbb{R}^{n \times n}$ is called symmetric if $A = A^T$. A complex number $\lambda \in \mathbb{C}$ is called an eigenvalue of $A$ if there exists a nonzero vector $u \in \mathbb{C}^n$ such that $Au = \lambda u$. In this case, $u$ is called an eigenvector of $A$ corresponding to $\lambda$.
Spectral Theorem: If $A$ is a symmetric matrix, then
- $A$ has $n$ eigenvalues (counted with multiplicity),
- all eigenvalues of $A$ are real,
- there exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
In other words, we can write $$ A = \sum_{i=1}^n \lambda_i u_i u_i^T, $$ where $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ are the eigenvalues of $A$, and $u_1, \dots, u_n \in \mathbb{R}^n$ are the corresponding (orthonormal) eigenvectors.
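As a quick numerical illustration (a sketch using NumPy, which is our choice and not part of these notes), we can compute such a spectral decomposition with `numpy.linalg.eigh` and verify that the rank-one expansion reconstructs $A$:

```python
import numpy as np

# a small symmetric matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is specialized to symmetric matrices: it returns real eigenvalues
# and an orthonormal set of eigenvectors (the columns of U)
eigvals, U = np.linalg.eigh(A)

# reconstruct A as the sum of rank-one terms lambda_i * u_i u_i^T
A_rec = sum(lam * np.outer(u, u) for lam, u in zip(eigvals, U.T))
assert np.allclose(A, A_rec)
```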
If a symmetric matrix $A$ has only non-negative eigenvalues, then we say that $A$ is positive semidefinite, and write $A \succeq 0$. There are several equivalent definitions of positive semidefinite matrices:
- all eigenvalues of $A$ are non-negative.
- $A = B^T B$ for some matrix $B \in \mathbb{R}^{d \times n}$, where $d \leq n$. The smallest value of $d$ is the rank of $A$.
- $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$.
- $A = LDL^T$ for some diagonal matrix $D$ with non-negative diagonal entries and some lower triangular matrix $L$ with diagonal elements equal to $1$.
- $A$ is in the convex hull of the set of rank-one matrices $uu^T$ for $u \in \mathbb{R}^n$.
- $A = U^T D U$ for some diagonal matrix $D$ with non-negative diagonal entries and some orthogonal matrix $U$ (i.e., $U^T U = I$).
- $A$ is symmetric and all principal minors of $A$ are non-negative. Here, by principal minors we mean the determinants of the submatrices of $A$ obtained by deleting the same set of rows and columns.
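To make a few of these characterizations concrete, here is a short NumPy sketch (our own illustration, not part of the notes) checking the eigenvalue, quadratic-form, and factorization conditions on a matrix that is positive semidefinite by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 4))
A = B.T @ B                        # PSD by the factorization characterization

# all eigenvalues are non-negative (up to floating-point error)
assert np.all(np.linalg.eigvalsh(A) >= -1e-10)

# x^T A x >= 0 for (a sample of) vectors x
for _ in range(100):
    x = rng.standard_normal(4)
    assert x @ A @ x >= -1e-10

# the smallest d with A = B^T B for some B in R^{d x n} is rank(A), here 2
assert np.linalg.matrix_rank(A) == 2
```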
Semidefinite Programming
Let $\mathcal{S}^m := \mathcal{S}^m(\mathbb{R})$ denote the set of $m \times m$ real symmetric matrices.
A semidefinite program (SDP) is an optimization problem of the form $$ \begin{array}{ll} \text{minimize} & \langle C, X \rangle \\ \text{subject to} & \langle A_i, X \rangle = b_i, \quad i = 1, \dots, m \\ & X \succeq 0, \end{array} $$ where $C, A_1, \dots, A_m \in \mathcal{S}^n$ and $b_1, \dots, b_m \in \mathbb{R}$. Moreover, $\langle A, B \rangle := \text{tr}(A^T B)$ is the trace inner product.
We can write an SDP in a way similar to a linear program as follows: $$ \begin{array}{ll} \text{minimize} & c^Tx \\ \text{subject to} & A_1 x_1 + \cdots + A_n x_n \succeq B \\ & x \in \mathbb{R}^n, \end{array} $$ where $A_1, \dots, A_n, B \in \mathcal{S}^m$ and $c \in \mathbb{R}^n$, and we use $C \succeq D$ to denote that $C - D \succeq 0$.
If the matrices $A_1, \dots, A_n, B$ are all diagonal, then $A_1 x_1 + \cdots + A_n x_n \succeq B$ says precisely that each diagonal entry of $A_1 x_1 + \cdots + A_n x_n - B$ is non-negative, which is a system of linear inequalities in $x$; the SDP is then exactly a linear program. Thus, we see that SDPs generalize linear programs.
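To illustrate the standard form above in code, here is a minimal sketch using the CVXPY modeling library (the library choice, instance data, and variable names are ours, not part of the notes):

```python
import cvxpy as cp
import numpy as np

n = 3
C = np.eye(n)            # objective matrix: <C, X> = tr(X)
A1 = np.ones((n, n))     # a single linear constraint <A1, X> = b1
b1 = 1.0

X = cp.Variable((n, n), symmetric=True)
constraints = [cp.trace(A1 @ X) == b1,  # <A1, X> = b1
               X >> 0]                  # X is positive semidefinite
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print(prob.value)        # optimal value of <C, X>
```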
As with linear programs, the following are important structural and algorithmic questions for SDPs:
- When is a given SDP feasible? That is, is there a solution to the constraints at all?
- When is a given SDP bounded? Does it have a minimum, and is the minimum attained? If so, how can we find it?
- Can we characterize optimality?
- How can we know that a given solution is optimal?
- Do the optimal solutions have a nice description?
- Do the solutions have small bit complexity?
- How can we solve SDPs efficiently?
To better understand these questions and the structure of SDPs, we will need to learn a bit about convex algebraic geometry.
Convex Algebraic Geometry
Spectrahedra
To understand the geometry of SDPs, we will need to understand their feasible regions, which are called spectrahedra and are described by Linear Matrix Inequalities (LMIs).
Definition 1 (Linear Matrix Inequality (LMI)): An LMI is an inequality of the form $$A_0 + \sum_{i=1}^n A_i x_i \succeq 0,$$ where $A_0, \ldots, A_n \in \mathcal{S}^m$.
Definition 2 (Spectrahedron): A spectrahedron is a set of the form $$ S = \left\{ x \in \mathbb{R}^n : A_0 + \sum_{i=1}^n A_i x_i \succeq 0 \right\}, $$ where $A_0, \ldots, A_n \in \mathcal{S}^m$.
Note that spectrahedra are convex sets, since they are defined by LMIs, which are convex constraints. Moreover, several important convex sets are spectrahedra, including all polyhedra, disks and balls, convex regions bounded by branches of hyperbolas, and convex regions bounded by (sections of) elliptic curves, among others.
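A standard small example: the closed unit disk in $\mathbb{R}^2$ is the spectrahedron $$ \left\{ (x, y) \in \mathbb{R}^2 : \begin{pmatrix} 1 + x & y \\ y & 1 - x \end{pmatrix} \succeq 0 \right\}. $$ Indeed, a $2 \times 2$ symmetric matrix is positive semidefinite iff its trace and determinant are non-negative; here the trace is $2$ and the determinant is $(1+x)(1-x) - y^2 = 1 - x^2 - y^2$, so the LMI holds exactly when $x^2 + y^2 \leq 1$.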
When considering SDPs, it is also useful to work with a more general class of convex sets, which we call spectrahedral shadows. Spectrahedral shadows are simply projections of spectrahedra onto lower-dimensional spaces.
Testing Membership in Spectrahedra
To be able to solve SDPs efficiently, a first step is to be able to test membership in spectrahedra efficiently. That is, given a spectrahedron $S = \left\{ x \in \mathbb{R}^n : A_0 + \sum_{i=1}^n A_i x_i \succeq 0 \right\}$ and a point $x \in \mathbb{R}^n$, we want to determine whether $x \in S$. Since $x \in S$ if and only if $A_0 + \sum_{i=1}^n A_i x_i \succeq 0$, this is equivalent to testing whether a given symmetric matrix is positive semidefinite.
More succinctly, we have the following decision problem:
- Input: symmetric matrix $A \in \mathcal{S}^m$
- Output: YES if $A \succeq 0$, NO otherwise.
An efficient algorithm for this problem is symmetric Gaussian elimination, which runs in time $O(m^3)$. The algorithm proceeds just as in Gaussian elimination, performing elementary row operations (without row swapping) to reduce $A$ to upper triangular form. However, in this case, every time we apply a row operation, which can be encoded by a lower unitriangular matrix $L$, we also apply the same operation to the columns of $A$ by right-multiplying $A$ by $L^T$. This ensures that $A$ remains symmetric throughout the process, so the end result is in fact a diagonal matrix. One case needs care: if a pivot is zero while its row (equivalently, its column) contains a nonzero entry, we can stop and output NO immediately, since in a positive semidefinite matrix a zero diagonal entry forces the entire corresponding row and column to be zero (otherwise some $2 \times 2$ principal minor would be negative).
As the product of lower unitriangular matrices is again lower unitriangular, the whole process computes a factorization $D = L A L^T$, where $L$ is lower unitriangular and $D$ is diagonal. The algorithm outputs YES iff all diagonal entries of $D$ are non-negative, which is correct because $D \succeq 0$ iff $A \succeq 0$.
To see that $D \succeq 0 \Leftrightarrow A \succeq 0$, note that $D = L A L^T$ with $L$ lower unitriangular, hence invertible. Then $z^T D z = (L^T z)^T A (L^T z)$ for every $z \in \mathbb{R}^m$, and as $z$ ranges over all of $\mathbb{R}^m$, so does $L^T z$. Therefore $z^T D z \geq 0$ for all $z \in \mathbb{R}^m$ if and only if $x^T A x \geq 0$ for all $x \in \mathbb{R}^m$, i.e., $D \succeq 0 \Leftrightarrow A \succeq 0$.
The above proves that our algorithm is correct, and the running time is $O(m^3)$: we perform $O(m^2)$ elementary row and column operations, each of which takes $O(m)$ time.
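Here is a minimal NumPy sketch of this procedure (our own illustration, not code from the notes). Rather than forming $L$ explicitly, it applies each matched row/column operation as a rank-one update of the trailing submatrix and inspects the resulting pivots:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Decide whether the symmetric matrix A is PSD via symmetric
    Gaussian elimination, in O(m^3) time."""
    A = np.array(A, dtype=float)   # work on a copy
    m = A.shape[0]
    for k in range(m):
        if A[k, k] < -tol:
            return False           # negative pivot: A cannot be PSD
        if A[k, k] <= tol:
            # zero pivot: in a PSD matrix the whole row/column must vanish
            if np.any(np.abs(A[k, k + 1:]) > tol):
                return False
            continue
        # net effect of the row operation (L) and the matching column
        # operation (L^T) on the trailing submatrix
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:]) / A[k, k]
        A[k + 1:, k] = 0.0
        A[k, k + 1:] = 0.0
    return True                    # all pivots of D are non-negative

B = np.array([[2.0, -1.0], [0.0, 1.0]])
print(is_psd(B.T @ B))                              # True: B^T B is PSD
print(is_psd(np.array([[0.0, 1.0], [1.0, 0.0]])))   # False
```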
Application: Control Theory
(Not required material; to be written here later. Please see the slides and references for this part.)
SDPs are used in many areas of mathematics and engineering, including control theory, combinatorial optimization, and quantum information theory. Today we will see an application of SDPs to control theory, in particular to the problem of stabilizing a linear, discrete-time dynamical system.