Fundamental Theorem of Markov Chains, PageRank

In this lecture, we will prove the Fundamental Theorem of Markov Chains and discuss the PageRank algorithm. In order to prove the Fundamental Theorem of Markov Chains, we need to review some concepts from linear algebra.

Linear Algebra Review

Eigenvalues, Eigenvectors, and Spectral Radius

Given a square matrix $A \in \mathbb{R}^{n \times n}$, a scalar $\lambda \in \mathbb{C}$ is called an eigenvalue of $A$ if there exists a non-zero unit vector $v \in \mathbb{C}^n$ (that is, $\|v\|_2 = 1$) such that $Av = \lambda v$. The vector $v$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.

The eigenvalues of a matrix $A$ are the roots of the characteristic polynomial $\det(A - tI) = 0$, where $I$ is the identity matrix of size $n \times n$. The characteristic polynomial is a univariate polynomial of degree $n$ in the variable $t$.

The eigenspace corresponding to an eigenvalue $\lambda$ is the set of all eigenvectors corresponding to $\lambda$, together with the zero vector; it is a subspace of $\mathbb{C}^n$.

There are two ways of defining the multiplicity of an eigenvalue $\lambda$:

  1. The algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of the characteristic polynomial.
  2. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace corresponding to $\lambda$.

These two notions of multiplicity are equal for symmetric matrices (by the spectral theorem), but can differ for non-symmetric matrices. For instance, the matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has a single eigenvalue $\lambda = 1$ with algebraic multiplicity 2 and geometric multiplicity 1.
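As a quick numerical aside (an illustration, not part of the lecture; it assumes NumPy is available), we can verify both multiplicities for this matrix: the algebraic multiplicity is read off from the repeated eigenvalue, and the geometric multiplicity equals the nullity of $A - I$.

```python
import numpy as np

# The matrix A = [[1, 1], [0, 1]] from the text: its characteristic
# polynomial is (1 - t)^2, so lambda = 1 has algebraic multiplicity 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Both eigenvalues equal 1 (algebraic multiplicity 2).
eigenvalues = np.linalg.eigvals(A)
assert np.allclose(eigenvalues, [1.0, 1.0])

# The geometric multiplicity is the dimension of the null space of A - I,
# i.e. n - rank(A - I). Here rank(A - I) = 1, so the eigenspace is 1-dimensional.
geometric_multiplicity = A.shape[0] - np.linalg.matrix_rank(A - np.eye(2))
assert geometric_multiplicity == 1
```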

The spectral radius of a matrix $A$ is defined as $\rho(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}$.

The Frobenius norm of a matrix $A$ is defined as $$\|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^n A_{ij}^2} = \sqrt{\operatorname{trace}(A^T A)}.$$

Note that the Frobenius norm of a matrix upper bounds its spectral radius, i.e., $\rho(A) \le \|A\|_F$. One can see this as follows: let $\lambda$ be an eigenvalue of $A$ with unit eigenvector $v$. Then, we have $$|\lambda|^2 = \|Av\|_2^2 = \langle Av, Av \rangle = \langle v, A^T A v \rangle = \operatorname{trace}(v^T A^T A v) = \operatorname{trace}(A^T A\, v v^T) \le \operatorname{trace}(A^T A \cdot I) = \|A\|_F^2.$$

Note that the above argument also shows that the following inequality holds for any unit vector $v$: $\|Av\|_2 \le \|A\|_F$.
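Both inequalities are easy to check numerically (an illustrative aside with an arbitrary test matrix, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))   # an arbitrary test matrix

spectral_radius = max(abs(np.linalg.eigvals(A)))
frobenius_norm = np.linalg.norm(A, "fro")

# rho(A) <= ||A||_F
assert spectral_radius <= frobenius_norm

# ||Av||_2 <= ||A||_F for any unit vector v
v = rng.standard_normal(5)
v /= np.linalg.norm(v)
assert np.linalg.norm(A @ v) <= frobenius_norm
```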


Proposition 1 (Gelfand’s formula): For any matrix $A \in \mathbb{R}^{n \times n}$, we have $\rho(A) = \lim_{k \to \infty} \|A^k\|_F^{1/k}$.
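Since $\rho(A)^k = \rho(A^k) \le \|A^k\|_F$, each term $\|A^k\|_F^{1/k}$ is an upper bound on $\rho(A)$, and Gelfand’s formula says these bounds converge to it. A small numerical sketch (an aside with a hypothetical $2 \times 2$ matrix, assuming NumPy):

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.3, 0.2]])
rho = max(abs(np.linalg.eigvals(A)))   # spectral radius, approx 0.7275

# ||A^k||_F^{1/k} is an upper bound for every k and converges to rho(A).
for k in (1, 10, 100):
    estimate = np.linalg.norm(np.linalg.matrix_power(A, k), "fro") ** (1.0 / k)
    assert estimate >= rho - 1e-12

# by k = 100 the estimate is very close to the spectral radius
assert abs(estimate - rho) < 1e-3
```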


For two vectors $u, v \in \mathbb{R}^n$, we say that $u \ge v$ if $u_i \ge v_i$ for all $i \in [n]$. We say that $u > v$ if $u \ge v$ and $u \ne v$. With this definition at hand, we have the following easy lemma.


Lemma 2 (Positivity Lemma): Let $A \in \mathbb{R}^{n \times n}$ be a positive matrix, i.e., $A_{ij} > 0$ for all $i, j \in [n]$. Let $u, v \in \mathbb{R}^n$ be distinct vectors such that $u \ge v$. Then, we have $(Au)_i > (Av)_i$ for all $i \in [n]$; in particular, $Au > Av$. Moreover, there is $\varepsilon > 0$ such that $Au > (1+\varepsilon) Av$.


Proof: Since $u \ge v$ and $u \ne v$, we have $u - v \ge 0$ and $u - v \ne 0$. Let $\alpha := \min_{i,j \in [n]} A_{ij} > 0$. Then, for every $i \in [n]$, we have $$(A(u-v))_i = \sum_{j=1}^n A_{ij}(u_j - v_j) \ge \alpha \sum_{j=1}^n (u_j - v_j) > 0.$$ Therefore, $Au > Av$. The moreover part follows from taking a small enough $\varepsilon$.

We are now ready to state and prove the main tool that we will use to prove the Fundamental Theorem of Markov Chains.

Perron-Frobenius Theorem

We begin with Perron’s theorem for positive matrices.


Theorem 3 (Perron’s Theorem): Let $A \in \mathbb{R}^{n \times n}$ be a positive matrix. Then, the following hold:

  1. The spectral radius $\rho(A)$ is an eigenvalue of $A$, and it has a positive eigenvector $v \in \mathbb{R}^n_{>0}$.
  2. $\rho(A)$ is the only eigenvalue of $A$ on the circle $\{z \in \mathbb{C} : |z| = \rho(A)\}$.
  3. $\rho(A)$ has geometric multiplicity 1.
  4. $\rho(A)$ is simple, i.e., its algebraic multiplicity is 1.

Proof: By definition of $\rho(A)$, there exists an eigenvalue $\lambda \in \mathbb{C}$ of $A$ such that $|\lambda| = \rho(A)$. Let $v \in \mathbb{C}^n$ be an eigenvector corresponding to $\lambda$, and let $u \in \mathbb{R}^n$ be defined by $u_i := |v_i|$ for all $i \in [n]$. Then, we have $$(Au)_i = \sum_{j=1}^n A_{ij} u_j \ge \Big| \sum_{j=1}^n A_{ij} v_j \Big| = |\lambda v_i| = \rho(A) |v_i| = \rho(A) u_i.$$ Therefore, $Au \ge \rho(A) u$. If $Au \ne \rho(A) u$, then by Lemma 2 we have $A^2 u > \rho(A) A u$, and there is some $\varepsilon > 0$ such that $A^2 u > (1+\varepsilon) \rho(A) A u$. By induction, we have $A^{k+1} u > (1+\varepsilon)^k \rho(A)^k A u$ for all $k \in \mathbb{N}$. Hence, setting $w := Au / \|Au\|_2$, Gelfand’s formula gives $$\rho(A) = \lim_{k \to \infty} \|A^k\|_F^{1/k} \ge \lim_{k \to \infty} \|A^k w\|_2^{1/k} \ge \lim_{k \to \infty} \big( (1+\varepsilon)^k \rho(A)^k \big)^{1/k} = (1+\varepsilon) \rho(A),$$ which is a contradiction. Therefore, the inequality $Au \ge \rho(A) u$ must be an equality, and $u$ is a non-negative eigenvector of $A$ corresponding to $\rho(A)$. Moreover, since $A$ is positive, the eigenvector $u$ must be positive, as $\rho(A) u_i = (Au)_i = \sum_{j=1}^n A_{ij} u_j > 0$ for all $i \in [n]$. This proves the first part of the theorem.

To prove the second part, let $\lambda \in \mathbb{C}$ be an eigenvalue of $A$ such that $|\lambda| = \rho(A)$, but $\lambda \ne \rho(A)$. Let $z \in \mathbb{C}^n$ be an eigenvector corresponding to $\lambda$, and let $w \in \mathbb{R}^n$ be defined by $w_i := |z_i|$ for all $i \in [n]$. Then, by the above discussion, we must have $Aw = \rho(A) w$, that is, $$\sum_{j=1}^n A_{ij} w_j = \rho(A) w_i = \rho(A) |z_i| = |\lambda z_i| = \Big| \sum_{j=1}^n A_{ij} z_j \Big|$$ for all $i \in [n]$.

From the above conditions, we can deduce that there is $\alpha \in \mathbb{C}$ such that $\alpha z = w$: the triangle inequality $|\sum_{j} A_{ij} z_j| \le \sum_{j} A_{ij} |z_j|$ must hold with equality, which forces all the $z_j$ to have the same complex argument. But in this case, we have $$\lambda \alpha z = \alpha A z = A w = \rho(A) w = \rho(A) \alpha z \implies \lambda = \rho(A),$$ which is a contradiction. This proves the second part of the theorem.

Now we are ready to prove item 3: the geometric multiplicity of ρ(A) is 1.

Suppose, for the sake of contradiction, that the geometric multiplicity of $\rho(A)$ is greater than 1. Let $u, v \in \mathbb{R}^n$ be linearly independent eigenvectors corresponding to $\rho(A)$ (by the above discussion, we know that such eigenvectors must be real vectors). Let $\beta > 0$ be such that $u - \beta v \ge 0$ and at least one of the components of $u - \beta v$ is zero. Note that $u - \beta v \ne 0$, as $u$ and $v$ are linearly independent. Then, by Lemma 2, we have $$\rho(A)(u - \beta v) = A(u - \beta v) > 0$$ entrywise, which contradicts the fact that $u - \beta v$ has a zero component. This proves the third part of the theorem.

Finally, we prove the fourth part of the theorem: the algebraic multiplicity of ρ(A) is 1.

Let $v \in \mathbb{R}^n$ be a positive eigenvector corresponding to $\rho(A)$, and let $u \in \mathbb{R}^n$ be a positive eigenvector of $A^T$ corresponding to $\rho(A^T)$ (which is equal to $\rho(A)$, by Gelfand’s formula, since $\|(A^T)^k\|_F = \|A^k\|_F$). We know $u$ exists by applying the first part of the theorem to $A^T$.

Claim: the space $u^{\perp} := \{x \in \mathbb{R}^n : u^T x = 0\}$ is invariant under $A$.

Proof of Claim: Let $x \in u^{\perp}$. Then, we have $u^T A x = (A^T u)^T x = \rho(A^T)\, u^T x = 0$.

Note that $u^{\perp}$ is a subspace of $\mathbb{R}^n$ of dimension $n-1$, and $v \notin u^{\perp}$, as $u^T v > 0$, since both vectors are positive. Hence, we have that $\mathbb{R}^n$ is the direct sum of $u^{\perp}$ and $\operatorname{span}(v)$. Let $w_2, \ldots, w_n$ be a basis of $u^{\perp}$, and let $B \in \mathbb{R}^{n \times n}$ be the matrix whose columns are $v, w_2, \ldots, w_n$.

By the above, $B$ is invertible, and we have that $B^{-1} A B$ leaves the subspaces $B^{-1} \operatorname{span}(v) = \operatorname{span}(e_1)$ and $B^{-1} u^{\perp} = \operatorname{span}(e_2, \ldots, e_n)$ invariant. Thus, $B^{-1} A B$ is a block matrix of the form $$B^{-1} A B = \begin{pmatrix} \rho(A) & 0 \\ 0 & C \end{pmatrix}.$$

Since $A$ and $B^{-1} A B$ are similar, they have the same eigenvalues. Moreover, we have $$\det(A - tI) = \det(B^{-1} A B - tI) = (\rho(A) - t) \det(C - tI).$$ Thus, if $\rho(A)$ had algebraic multiplicity greater than 1, then $C$ would have $\rho(A)$ as an eigenvalue, and therefore $A$ would have $\rho(A)$ as an eigenvalue with geometric multiplicity greater than 1, which is a contradiction. This proves the fourth part of the theorem.
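The conclusions of Perron’s theorem are easy to observe numerically. A small sketch (an illustrative aside with a hypothetical positive matrix, assuming NumPy): the eigenvalue of maximum modulus is real and positive, it is simple, and its eigenvector can be scaled to be entrywise positive.

```python
import numpy as np

# A hypothetical entrywise-positive matrix; its eigenvalues are the
# roots of t^2 - 5t - 2, i.e. (5 +- sqrt(33)) / 2.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
i = np.argmax(abs(eigenvalues))
perron_value = eigenvalues[i]
perron_vector = eigenvectors[:, i].real
perron_vector /= perron_vector[np.argmax(abs(perron_vector))]  # largest entry -> +1

# Part 1: the eigenvalue of maximum modulus is real, positive, and its
# eigenvector can be scaled to be entrywise positive.
assert abs(perron_value - (5 + np.sqrt(33)) / 2) < 1e-12
assert np.all(perron_vector > 0)

# Part 2: every other eigenvalue has strictly smaller modulus.
assert all(abs(mu) < abs(perron_value)
           for j, mu in enumerate(eigenvalues) if j != i)
```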


The Perron-Frobenius theorem is a generalization of Perron’s theorem to non-negative matrices.


Theorem 4 (Perron-Frobenius Theorem): Let $A \in \mathbb{R}^{n \times n}$ be a non-negative matrix which is irreducible and aperiodic. Then, the following hold:

  1. The spectral radius $\rho(A)$ is an eigenvalue of $A$, and it has a positive eigenvector $v \in \mathbb{R}^n_{>0}$.
  2. $\rho(A)$ is the only eigenvalue of $A$ on the circle $\{z \in \mathbb{C} : |z| = \rho(A)\}$.
  3. $\rho(A)$ has geometric multiplicity 1.
  4. $\rho(A)$ is simple, i.e., its algebraic multiplicity is 1.

Proof: By Lemma 1 of Lecture 9, we know that there is a positive integer $m$ such that $A^m$ is positive. Apply Perron’s theorem to $A^m$, and note that the eigenvalues of $A^m$ are the $m$-th powers of the eigenvalues of $A$, with the same eigenvectors.
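As a numerical aside (an illustration with a hypothetical matrix, assuming NumPy), consider a non-negative matrix with a zero entry that is nonetheless irreducible and aperiodic; already its square is positive, and the Perron-Frobenius conclusions hold for it:

```python
import numpy as np

# A non-negative matrix with a zero entry; the underlying graph is
# strongly connected (irreducible) and state 2 has a self-loop (aperiodic).
A = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# Some power of A is entrywise positive -- the fact used in the proof:
assert np.all(np.linalg.matrix_power(A, 2) > 0)

# The Perron eigenvalue is the golden ratio (1 + sqrt(5)) / 2, with a
# positive eigenvector, even though A itself has a zero entry.
eigenvalues, eigenvectors = np.linalg.eig(A)
i = np.argmax(abs(eigenvalues))
v = eigenvectors[:, i]
v = v / v[np.argmax(abs(v))]   # scale so the largest entry is +1
assert abs(eigenvalues[i] - (1 + np.sqrt(5)) / 2) < 1e-12
assert np.all(v > 0)
```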

Fundamental Theorem of Markov Chains

We are now ready to prove (most of) the Fundamental Theorem of Markov Chains.


Theorem 5 (Fundamental Theorem of Markov Chains): Let P be the transition matrix of a finite, irreducible and aperiodic Markov chain. Then, the following statements hold:

  1. There exists a unique stationary distribution $\pi$ of the Markov chain, where $\pi_i > 0$ for all $i \in [n]$, where $n$ is the number of states of the Markov chain.
  2. For any initial distribution $p_0$, we have $\lim_{t \to \infty} \Delta_{TV}(P^t p_0, \pi) = 0$.
  3. The stationary distribution $\pi$ is given by $\pi_i = \lim_{t \to \infty} (P^t)_{ii} = \frac{1}{\tau_{ii}}$, where $\tau_{ii}$ is the expected return time to state $i$.

Proof: We will prove items 1 and 2 of the theorem. As $P$ is the transition matrix of an irreducible and aperiodic Markov chain, we know that $P$ is non-negative, irreducible, and aperiodic. By the Perron-Frobenius theorem, we know that there exists a (up to scaling) unique positive eigenvector $v \in \mathbb{R}^n_{>0}$ of $P$ corresponding to the spectral radius $\rho(P)$. Moreover, we know that $\rho(P) = 1$, since for any non-negative vector $u \in \mathbb{R}^n$ with $\|u\|_1 = 1$, we have $\|Pu\|_1 = 1$, as $Pu$ is the probability distribution of the next state of the Markov chain. Hence $\pi := v / \|v\|_1$ is the unique stationary distribution of the Markov chain.
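A concrete sketch of item 1 (an illustrative aside with a hypothetical 3-state chain, assuming NumPy; the matrix is written in the column-stochastic convention used above, where columns sum to 1 and $Pu$ is the next-state distribution):

```python
import numpy as np

# A hypothetical 3-state chain, column-stochastic: columns sum to 1.
# All entries are positive, so the chain is irreducible and aperiodic.
P = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.8, 0.3],
              [0.1, 0.1, 0.5]])

eigenvalues, eigenvectors = np.linalg.eig(P)
i = np.argmax(eigenvalues.real)
v = eigenvectors[:, i].real
pi = v / v.sum()   # normalize the Perron eigenvector to a distribution

assert abs(eigenvalues[i] - 1.0) < 1e-12   # rho(P) = 1
assert np.all(pi > 0)                      # pi is entrywise positive
assert np.allclose(P @ pi, pi)             # P pi = pi: stationarity
# For this particular chain one can solve (P - I) pi = 0 by hand:
assert np.allclose(pi, [7/30, 18/30, 5/30])
```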

To prove item 2, let $B$ be the change of basis matrix used in the proof of Perron’s theorem. Then, we have that $B^{-1} P B$ is a block matrix of the form $$B^{-1} P B = \begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix},$$ where $C$ is a matrix of size $(n-1) \times (n-1)$ whose eigenvalues lie strictly inside the unit circle, which implies that $\lim_{t \to \infty} C^t = 0$. Thus, we have $$P^t = B \begin{pmatrix} 1 & 0 \\ 0 & C^t \end{pmatrix} B^{-1} \quad \text{and therefore} \quad \lim_{t \to \infty} P^t = B \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} B^{-1}.$$
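This convergence can be observed by power iteration (an illustrative aside with a hypothetical 3-state column-stochastic chain, assuming NumPy): $P^t p_0$ approaches $\pi$ from any start, and $P^t$ itself approaches the rank-one matrix whose columns are all $\pi$.

```python
import numpy as np

# A hypothetical 3-state column-stochastic chain (columns sum to 1);
# its stationary distribution works out to pi = (7/30, 18/30, 5/30).
P = np.array([[0.6, 0.1, 0.2],
              [0.3, 0.8, 0.3],
              [0.1, 0.1, 0.5]])
pi = np.array([7/30, 18/30, 5/30])

# Power iteration: P^t p0 converges to pi regardless of the start p0.
p = np.array([1.0, 0.0, 0.0])   # start concentrated on state 1
for _ in range(200):
    p = P @ p
assert np.allclose(p, pi)

# P^t itself converges to the rank-one matrix whose columns are all pi.
Pt = np.linalg.matrix_power(P, 200)
assert np.allclose(Pt, np.outer(pi, np.ones(3)))
```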

PageRank Algorithm
