PageRank.mw

Example: Google Page Rank 

Reference: CS 370 Course Notes, Chapter 5. 

The Connectivity Matrix 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

>
 

 

Define a directed graph Typesetting:-mrow(Typesetting:-mi( with vertices Typesetting:-mrow(Typesetting:-mi( and edges Typesetting:-mrow(Typesetting:-mi(. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

[a, b, c, d, e, f] (1)
 

> Typesetting:-mrow(Typesetting:-mi(
 

{[a, b], [b, c], [b, d], [c, d], [c, e], [c, f], [d, a], [e, f], [f, a]} (2)
 

> Typesetting:-mrow(Typesetting:-mi(
 

GRAPHLN(directed, unweighted, [a, b, c, d, e, f], Array(%id = 165983624), `GRAPHLN/table/1`, 0) (3)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Plot_2d
 

>
 

> Typesetting:-mrow(Typesetting:-mi(
 

6 (4)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (5)
 

>
 

 

For this application, the connectivity matrix Typesetting:-mrow(Typesetting:-mi( is the transpose of the graph's adjacency matrix. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (6)
 

>
 

 

Note that the definition of the connectivity matrix Typesetting:-mrow(Typesetting:-mi( is:  Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(Typesetting:-mrow(Typesetting:-mo( if there is a directed arc from node Typesetting:-mrow(Typesetting:-mi( to node Typesetting:-mrow(Typesetting:-mi( (and 0 otherwise). 

The nodes represent web pages and a directed arc from node Typesetting:-mrow(Typesetting:-mi( to node Typesetting:-mrow(Typesetting:-mi( represents the existence of a hyperlink from page Typesetting:-mrow(Typesetting:-mi( to page Typesetting:-mrow(Typesetting:-mi(. 

 

The connectivity matrix Typesetting:-mrow(Typesetting:-mi( would be very large for the world wide web (there are billions of web pages). However, the matrix is sparse (meaning that there are many zero entries in the matrix). 

 

The Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi( column of Typesetting:-mrow(Typesetting:-mi( shows the links on page Typesetting:-mrow(Typesetting:-mi(. 

 

The outdegree of node Typesetting:-mrow(Typesetting:-mi(, denoted Typesetting:-mrow(Typesetting:-mi(, is the number of arcs leaving node Typesetting:-mrow(Typesetting:-mi(. 

The column sums in Typesetting:-mrow(Typesetting:-mi( yield the outdegree values:  Typesetting:-mrow(Typesetting:-mi((i.e., the Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi( column sum). 

 

For our example, the outdegrees for nodes Typesetting:-mrow(Typesetting:-mi( are, respectively, Typesetting:-mrow(Typesetting:-mn( 

 

>
 

The Markov Transition Matrix 

Closely related to the connectivity matrix Typesetting:-mrow(Typesetting:-mi( , we define a matrix Typesetting:-mrow(Typesetting:-mi( of probabilities. We imagine a random web surfer, and Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(denotes the probability that the random surfer visits page Typesetting:-mrow(Typesetting:-mi( given that he is at page Typesetting:-mrow(Typesetting:-mi(. Assuming equal probability of choosing any of the outlinks from page Typesetting:-mrow(Typesetting:-mi(, the desired matrix of probabilities is obtained by scaling the connectivity matrix Typesetting:-mrow(Typesetting:-mi( by its column sums. 

 

In other words, Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( if Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( . 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mo(
 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 151738700) (7)
 

>
 

 

The matrix Typesetting:-mrow(Typesetting:-mi( is an example of a Markov transition matrix because its elements satisfy Typesetting:-mrow(Typesetting:-mn(Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( and its column sums are all equal to 1. 

 

>
 

Dead End Pages and Cycles 

>
 

In general, the directed graph representing web pages may contain dead end pages, i.e. pages with no outlinks. The corresponding Markov matrix Typesetting:-mrow(Typesetting:-mi( would have a column of all zeros for any such page. The random surfer would get stuck at such a page. Another problem is cycles: the random surfer might travel endlessly in a cycle of pages such as:  Typesetting:-mrow(Typesetting:-mi( 

 

The following modification is for dealing with dead end pages (see Section 5.2.1 of Course Notes): 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (8)
 

>
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (9)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (10)
 

>
 

 

Note that the outer product matrix in this case is a zero matrix (because we have no dead end pages). 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 151853792) (11)
 

>
 

 

To handle cycles (see Section 5.2.2 of Course Notes), we give the random surfer a probability Typesetting:-mrow(Typesetting:-mi( that he will follow a link from a given page and correspondingly a probability Typesetting:-mrow(Typesetting:-mfenced(Typesetting:-mrow(Typesetting:-mn( that he will teleport to another page (chosen at random) and then continue surfing. Google uses the value Typesetting:-mrow(Typesetting:-mi( 

 

Therefore we update our definition of the Markov transition matrix as follows. As in the Course Notes, we express the formula using the vector Typesetting:-mrow(Typesetting:-mi( by forming an outer product of Typesetting:-mrow(Typesetting:-mi( and its transpose. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

.85 (12)
 

> Typesetting:-mrow(Typesetting:-mi(
 

 

Display Typesetting:-mrow(Typesetting:-mi( to 3 decimal places. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 151861100) (13)
 

>
 

 

An aside: Note that the outer product operation in this case simply forms a Matrix containing ones. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 151870124) (14)
 

 

The matrix Typesetting:-mrow(Typesetting:-mi( again satisfies the properties of a Markov transition matrix. In particular, note that the column sums are all equal to 1. 

 

> Typesetting:-mrow(Typesetting:-mo(
 

 

 

 

 

 

1.000000000
1.000000000
.9999999999
1.000000000
1.000000000
1.000000000 (15)
 

>
 

Page Rank Algorithm 

A vector Typesetting:-mrow(Typesetting:-mi(is a probability vector if its entries satisfy Typesetting:-mrow(Typesetting:-mn( and Typesetting:-mrow(Typesetting:-munder(Typesetting:-mstyle(Typesetting:-mo( . Suppose that Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(is a probability vector representing the initial state of the random surfer, and for this initial state use equal probability for each page, as follows. 

 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (16)
 

 

The Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi( entry Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( represents the probability that the random surfer is at page Typesetting:-mrow(Typesetting:-mi( , and the Markov transition matrix Typesetting:-mrow(Typesetting:-mi( can be used to calculate the probability vector for the random surfer after one hop, as follows. 

 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (17)
 

 

Check that this is still a probability vector. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

1.000000000 (18)
 

 

Similarly for the state of the random surfer after two hops, three hops, etc. 

 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (19)
 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (20)
 

 

Check again that this is a probability vector. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

1.000000000 (21)
 

>
 

Convergence 

 

We wish to obtain a final page ranking as the limit of the above process, because this represents the infinite random hopping of our random surfer. 

 

The question is: Does the above process converge?  Theorem 5.7 in the Course Notes proves that the process converges to a unique vector Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi( for any initial probability vector Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( . The proof relies on the fact that our Markov transition matrix Typesetting:-mrow(Typesetting:-mi( is a positive Markov matrix as defined in the Course Notes. 

 

 

We can therefore apply the Page Rank Vector Algorithm stated in Section 5.7 of the Course Notes. 

 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Vector[column](%id = 151870076) (22)
 

 

Choose a tolerance for the stopping criterion. 

(Note that in the actual application to web pages, fewer digits of accuracy would suffice.) 

 

> Typesetting:-mrow(Typesetting:-mi(
 

0.1e-7 (23)
 

> Typesetting:-mrow(Typesetting:-mo(
 

> Typesetting:-mrow(Typesetting:-mi(
 

38 (24)
 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Vector[column](%id = 150459936) (25)
 

 

Look again at the directed graph we are modelling and compare the page rankings specified by Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( . 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

Plot_2d
 

>
 

Eigenvector Formulation 

The above method is actually performing what is known as the power method for finding the eigenvector of the matrix Typesetting:-mrow(Typesetting:-mi( corresponding to its largest eigenvalue. 

 

To see this, note that if we had converged to the limiting vector Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi( then we would have:  Typesetting:-mrow(Typesetting:-mi(. The definition of an eigenvalue λ and its corresponding eigenvector Typesetting:-mrow(Typesetting:-mi( is that  Typesetting:-mrow(Typesetting:-mi( . Therefore, we are computing the eigenvector Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi(corresponding to the eigenvalue Typesetting:-mrow(Typesetting:-mi(. 

 

Note: The eigenvalues of a matrix Typesetting:-mrow(Typesetting:-mi( are the zeros of the characteristic polynomial  Typesetting:-mrow(Typesetting:-mi( where Typesetting:-mrow(Typesetting:-mi( denotes the identity matrix. The eigenvectors corresponding to an eigenvalue Typesetting:-mrow(Typesetting:-mi( are the solutions of the singular linear system Typesetting:-mrow(Typesetting:-mfenced(Typesetting:-mrow(Typesetting:-mi( which we know is a singular system because the determinant is zero. 

 

This all works because, as discussed in the Course Notes, for a positive Markov matrix Typesetting:-mrow(Typesetting:-mi( the largest eigenvalue is Typesetting:-mrow(Typesetting:-mi( and there is only one eigenvector corresponding to this eigenvalue. 

 

>
 

 

To check these facts, let us compute the eigenvalues and eigenvectors of the matrix Typesetting:-mrow(Typesetting:-mi(. 

 

 

First, for clarification, let us compute the eigenvalues via the characteristic polynomial. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

`+`(`-`(`*`(0.870010422e-1, `*`(lambda))), `-`(`*`(.4154375001, `*`(`^`(lambda, 3)))), `-`(0.1e-8), `-`(`*`(.2200614569, `*`(`^`(lambda, 2)))), `-`(`*`(.1274999999, `*`(`^`(lambda, 4)))), `-`(`*`(.150...
`+`(`-`(`*`(0.870010422e-1, `*`(lambda))), `-`(`*`(.4154375001, `*`(`^`(lambda, 3)))), `-`(0.1e-8), `-`(`*`(.2200614569, `*`(`^`(lambda, 2)))), `-`(`*`(.1274999999, `*`(`^`(lambda, 4)))), `-`(`*`(.150...
(26)
 

> Typesetting:-mrow(Typesetting:-mi(
 

1.000000001, `+`(`-`(0.7801022483e-1), `*`(.6217448261, `*`(I))), `+`(`-`(.3469897693), `*`(.3180736817, `*`(I))), -0.1149411552e-7, `+`(`-`(.3469897693), `-`(`*`(.3180736817, `*`(I)))), `+`(`-`(0.780...
1.000000001, `+`(`-`(0.7801022483e-1), `*`(.6217448261, `*`(I))), `+`(`-`(.3469897693), `*`(.3180736817, `*`(I))), -0.1149411552e-7, `+`(`-`(.3469897693), `-`(`*`(.3180736817, `*`(I)))), `+`(`-`(0.780...
(27)
 

> Typesetting:-mrow(Typesetting:-mi(
 

[1.000000001, .6266196805, .4707151655, 0.1149411552e-7, .4707151655, .6266196805] (28)
 

 

Note that the eigenvalue largest in magnitude is Typesetting:-mrow(Typesetting:-mi( . 

 

>
 

 

We can use the command Eigenvectors in Maple to compute the eigenvalues and eigenvectors. 

 

> Typesetting:-mrow(Typesetting:-mfenced(Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

`+`(1.00000000000000044, `*`(0., `*`(I))) (29)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (30)
 

>
 

 

An eigenvector is unique only up to a scaling factor. The eigenvector we are computing in our Page Rank Algorithm has all entries positive and its 1-norm is equal to 1 (i.e., the sum of the entries is equal to 1). 

 

 

Scale the eigenvector Typesetting:-mrow(Typesetting:-mi( by first making the entries positive, then divide by the 1-norm of Typesetting:-mrow(Typesetting:-mi(. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (31)
 

>
 

 

Check that this normalized eigenvector Typesetting:-mrow(Typesetting:-mi( corresponds to the vector Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( computed by our Page Rank Algorithm. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

38 (32)
 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Vector[column](%id = 150459936) (33)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Vector[column](%id = 150471888) (34)
 

>
 

 

We see that Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( corresponds to the normalized eigenvector Typesetting:-mrow(Typesetting:-mi( to within the accuracy tolerance Typesetting:-mrow(Typesetting:-mi( used in our Page Rank Algorithm. 

 

>
 

Example 2: A Graph with Dead Ends 

 

Consider a second example, this time a case where there are dead end pages. We add two dead end pages to the previous example. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

[a, b, c, d, e, f, g, h] (35)
 

> Typesetting:-mrow(Typesetting:-mi(
 

{[a, b], [b, c], [b, d], [c, d], [c, e], [c, f], [d, a], [e, f], [f, a], [d, g], [e, h]} (36)
 

> Typesetting:-mrow(Typesetting:-mi(
 

GRAPHLN(directed, unweighted, [a, b, c, d, e, f, g, h], Array(%id = 181551868), `GRAPHLN/table/4`, 0) (37)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Plot_2d
 

>
 

> Typesetting:-mrow(Typesetting:-mi(
 

8 (38)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (39)
 

>
 

 

The connectivity matrix Typesetting:-mrow(Typesetting:-mi( is the transpose of the graph's adjacency matrix. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (40)
 

 

Form the probability matrix Typesetting:-mrow(Typesetting:-mi( as before. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mo(
Typesetting:-mrow(Typesetting:-mo(
Typesetting:-mrow(Typesetting:-mo(
Typesetting:-mrow(Typesetting:-mo(
 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 151975096) (41)
 

>
 

The Markov Transition Matrix 

 

Adjust the matrix Typesetting:-mrow(Typesetting:-mi( to deal with dead end pages as discussed in the previous example. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (42)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (43)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (44)
 

>
 

 

Note that the outer product matrix has ones in the columns that were previously zero, and then we multiply by the probability Typesetting:-mrow(Typesetting:-mfrac(Typesetting:-mn( (equal probability of teleporting to any of the pages). 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 152007376) (45)
 

>
 

 

As before, avoid the possibility of cycles by using a probability Typesetting:-mrow(Typesetting:-mfenced(Typesetting:-mrow(Typesetting:-mn( of teleporting to a random page. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

.85 (46)
 

> Typesetting:-mrow(Typesetting:-mi(
 

 

Display Typesetting:-mrow(Typesetting:-mi( to 3 decimal places. 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Matrix(%id = 152016120) (47)
 

>
 

 

The matrix Typesetting:-mrow(Typesetting:-mi( must satisfy the properties of a Markov transition matrix. In particular, the column sums must be all equal to 1. 

 

> Typesetting:-mrow(Typesetting:-mo(
 

 

 

 

 

 

 

 

1.000000000
1.000000000
.9999999999
1.000000000
1.000000000
1.000000000
1.000000000
1.000000000 (48)
 

>
 

Page Rank Algorithm 

 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (49)
 

> Typesetting:-mrow(Typesetting:-mi(
 

Typesetting:-mrow(Typesetting:-mverbatim( (50)
 

> Typesetting:-mrow(Typesetting:-mi(
 

1.000000000 (51)
 

>
 

> Typesetting:-mrow(Typesetting:-mi(
 

0.1e-7 (52)
 

> Typesetting:-mrow(Typesetting:-mo(
 

> Typesetting:-mrow(Typesetting:-mi(
 

30 (53)
 

> Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(
 

Vector[column](%id = 150384888) (54)
 

 

Look again at the directed graph we are modelling and compare the page rankings specified by Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi( . 

 

> Typesetting:-mrow(Typesetting:-mi(
 

> Typesetting:-mrow(Typesetting:-mi(
 

Plot_2d
 

>