Fingerprinting, Polynomial Identities, Matchings and Isolation Lemma
Motivation
It is hard to overstate the importance of algebraic techniques in computer science. Algebraic techniques are used in many areas of computer science, including randomized algorithms (hashing, today’s lecture), parallel algorithms (also this lecture), efficient proof/program verification (PCPs), coding theory, cryptography, and complexity theory.
Fingerprinting
We begin with a basic problem: suppose Alice and Bob each maintain a copy of the same large database (think of each as a server at a company that handles a lot of data). Alice and Bob want to check whether their copies are consistent. However, they do not want to send their entire database to each other, as that would be too expensive.
What can they do? Deterministic consistency checking requires sending the entire database. However, if we use randomness we can do much better, using a technique called fingerprinting.
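The problem above can be stated more succinctly as follows: if Alice's version of the database is given by the string $a = (a_0, a_1, \ldots, a_{n-1})$ and Bob's version is given by the string $b = (b_0, b_1, \ldots, b_{n-1})$, with each $a_i, b_i \in \{0, 1\}$, then we want to decide whether $a = b$.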
Fingerprinting Mechanism
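Let $a = \sum_{i=0}^{n-1} a_i 2^i$ and $b = \sum_{i=0}^{n-1} b_i 2^i$ be the integers whose binary representations are Alice's and Bob's strings, and for a prime $p$ let $h_p(x) = x \bmod p$ be the fingerprint function.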
Now, we can describe the fingerprinting mechanism/protocol as follows:
- Alice picks a random prime $p$ and sends $(p, h_p(a))$, where $h_p(a) = a \bmod p$, to Bob.
- Bob checks if $h_p(a) = h_p(b)$, and sends the answer (a single bit) to Alice.
In the above algorithm, the total number of bits communicated is $O(\log p)$, dominated by Alice's message.
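The protocol above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the databases are modelled as Python integers, the fingerprint is taken to be the residue modulo a random prime, and the helper names `is_prime`, `random_prime`, `alice_message`, `bob_reply` and the parameter `bound` (the largest prime Alice may pick) are hypothetical.

```python
import random


def is_prime(m):
    """Trial division; adequate for the small primes used in this sketch."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def random_prime(bound, rng):
    """Sample a uniformly random prime in [2, bound] by rejection sampling."""
    while True:
        candidate = rng.randint(2, bound)
        if is_prime(candidate):
            return candidate


def alice_message(a, bound, rng):
    """Alice's side: pick a random prime p and send (p, a mod p)."""
    p = random_prime(bound, rng)
    return p, a % p


def bob_reply(b, message):
    """Bob's side: compare his own fingerprint with the one received."""
    p, fingerprint = message
    return b % p == fingerprint
```

Only the pair (p, a mod p) and Bob's one-bit answer cross the wire, so the communication is about twice the bit length of `bound`, regardless of how large the database is.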
Verifying String Inequality
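If $a = b$, then the fingerprints agree for every prime $p$, so the protocol is always correct. If $a \neq b$, the protocol errs exactly when $a \equiv b \pmod{p}$, that is, when $p$ divides $a - b$.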
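Claim: If a number $M$ satisfies $1 \le |M| \le 2^n$, then $M$ has at most $n$ distinct prime divisors.
Proof: each prime divisor of $M$ is at least $2$, so if $M$ has $t$ distinct prime divisors then $|M| \ge 2^t$. Since $|M| \le 2^n$, it follows that $t \le n$.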
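The counting argument can be checked numerically on small inputs. The following is a rough sketch, assuming the claim bounds the number of distinct prime divisors of a nonzero integer of absolute value at most 2^n by n; the function name `distinct_prime_divisors` is hypothetical.

```python
def distinct_prime_divisors(m):
    """Return the distinct prime divisors of a nonzero integer m, in increasing order."""
    m = abs(m)
    primes = []
    d = 2
    while d * d <= m:
        if m % d == 0:
            primes.append(d)
            # Strip every copy of d so that only primes are ever recorded.
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        # Whatever remains is a single prime factor larger than sqrt(original m).
        primes.append(m)
    return primes


# Exhaustive check of the bound for n = 10: every 1 <= M <= 2^10
# has at most 10 distinct prime divisors.
n = 10
assert all(len(distinct_prime_divisors(M)) <= n for M in range(1, 2**n + 1))
```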
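By the above claim, the number of primes $p$ that divide $a - b$ is at most $n$. Hence, if Alice picks $p$ uniformly at random from a set of $N$ primes, the protocol errs with probability at most $n/N$. For example, taking the first $N = n^2$ primes gives error probability at most $1/n$, and by the prime number theorem each of these primes is $O(n^2 \log n)$.
Thus, the number of bits sent is $O(\log p) = O(\log n)$.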
Polynomial Identity Testing
The technique of fingerprinting can be used to solve a more general problem: given two polynomials $f(x)$ and $g(x)$ of degree at most $d$, decide whether $f = g$.
Two polynomials are equal if and only if their difference is the zero polynomial.
Hence, the problem reduces to checking if a polynomial is the zero polynomial.
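Since a nonzero polynomial of degree $d$ has at most $d$ roots, we can evaluate the difference at a point chosen uniformly at random from a set $S$ of size $2d$: if the difference is nonzero, the evaluation is nonzero with probability at least $1/2$.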
If we want to increase the success probability, there are two ways to do it: either we can increase the number of points we check, or we can repeat the above procedure multiple times.
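Both ways of boosting the success probability can be seen in a short sketch. It assumes the two polynomials are given as Python callables on integers, that the sample set is {0, 1, ..., 2d - 1}, and that `degree` is an upper bound on both degrees; the name `probably_equal` and its parameters are hypothetical.

```python
import random


def probably_equal(f, g, degree, trials=20, rng=None):
    """One-sided randomized test for f == g.

    Returns False only if a point with f(x) != g(x) was found, so a False
    answer is always correct. A True answer is wrong with probability at
    most (1/2) ** trials when the degree bound holds.
    """
    rng = rng or random.Random()
    sample_size = max(2 * degree, 1)  # size of the sample set S
    for _ in range(trials):
        x = rng.randrange(sample_size)
        if f(x) != g(x):
            return False
    return True


# (x + 1)^2 and x^2 + 2x + 1 agree everywhere, so the test always says True.
print(probably_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1, degree=2))
```

Raising `trials` repeats the procedure, while replacing `2 * degree` by a larger multiple enlarges the sample set; either change shrinks the error probability.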
The above problem as well as the approach can be generalized to polynomials in many variables.
The general problem is known as polynomial identity testing, which we now formally state:
Polynomial Identity Testing (PIT): Given a polynomial $P(x_1, \ldots, x_n)$ over a field $\mathbb{F}$, decide whether $P$ is identically zero.
What do we mean by “given a polynomial”?
This can come in many forms, but in this class we will only assume that we have access to an oracle that can evaluate the polynomial at any point in $\mathbb{F}^n$.
Generalizing the above approach yields the following lemma, which can be used in a randomized algorithm for polynomial identity testing.
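Lemma 1 (Ore-Schwartz-Zippel-DeMillo-Lipton): Let $P(x_1, \ldots, x_n)$ be a nonzero polynomial of total degree at most $d$ over a field $\mathbb{F}$, and let $S \subseteq \mathbb{F}$ be a finite set. If $a_1, \ldots, a_n$ are chosen independently and uniformly at random from $S$, then $\Pr[P(a_1, \ldots, a_n) = 0] \le d/|S|$.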
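Proof: We prove the lemma by induction on $n$, the number of variables. The base case $n = 1$ holds because a nonzero univariate polynomial of degree at most $d$ has at most $d$ roots.
For the inductive step, we assume that the lemma holds for $n - 1$ variables and write $P(x_1, \ldots, x_n) = \sum_{i=0}^{k} x_n^i \, P_i(x_1, \ldots, x_{n-1})$, where $k$ is the largest power of $x_n$ that appears in $P$. Then $P_k$ is a nonzero polynomial of total degree at most $d - k$.
Now, we have
$$\Pr[P(a_1, \ldots, a_n) = 0] \le \Pr[P_k(a_1, \ldots, a_{n-1}) = 0] + \Pr[P(a_1, \ldots, a_n) = 0 \mid P_k(a_1, \ldots, a_{n-1}) \neq 0] \le \frac{d - k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|},$$
where in the second to last inequality we simply applied the inductive hypothesis for the cases of 1 variable and $n - 1$ variables: once $a_1, \ldots, a_{n-1}$ are fixed with $P_k(a_1, \ldots, a_{n-1}) \neq 0$, what remains is a nonzero univariate polynomial of degree $k$ in $x_n$.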
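The lemma translates directly into a randomized zero test against an evaluation oracle. The sketch below makes several assumptions that are not in the text: the field is taken to be the integers modulo the Mersenne prime 2^61 - 1, the sample set is the whole field, and the oracle is a Python callable taking a list of field elements; the names `FIELD_PRIME` and `is_probably_zero` are hypothetical.

```python
import random

FIELD_PRIME = 2**61 - 1  # assumed field size; any sufficiently large prime would do


def is_probably_zero(oracle, num_vars, trials=10, rng=None):
    """Randomized identity test using only oracle evaluations.

    Returns False as soon as a nonzero evaluation is seen (always correct).
    For a nonzero polynomial of total degree d, each trial misses with
    probability at most d / FIELD_PRIME.
    """
    rng = rng or random.Random()
    for _ in range(trials):
        point = [rng.randrange(FIELD_PRIME) for _ in range(num_vars)]
        if oracle(point) % FIELD_PRIME != 0:
            return False
    return True


def expansion_gap(point):
    """Oracle for (x + y)^2 - (x^2 + 2xy + y^2), which is identically zero."""
    x, y = point
    return (x + y) ** 2 - (x * x + 2 * x * y + y * y)


print(is_probably_zero(expansion_gap, num_vars=2))
```

The oracle is treated as a black box, which matches the access model described above: the test never looks at coefficients, only at values.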
Randomized Matching Algorithms
We now use the above lemma to give a randomized algorithm for the perfect matching problem.
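We begin with the problem of deciding whether a bipartite graph $G = (L \cup R, E)$ has a perfect matching.
Input: A bipartite graph $G = (L \cup R, E)$ with $|L| = |R| = n$.
Output: YES if $G$ has a perfect matching, NO otherwise.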
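Let $X$ be the $n \times n$ matrix with entries $X_{ij} = x_{ij}$ if $(i, j) \in E$ and $X_{ij} = 0$ otherwise, where the $x_{ij}$ are distinct variables.
Since $\det(X) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} X_{i\sigma(i)}$, and each permutation $\sigma$ with a nonzero term corresponds to a distinct perfect matching of $G$ with its own monomial, $\det(X)$ is a nonzero polynomial if and only if $G$ has a perfect matching.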
Thus, we can use Lemma 1 to give a randomized algorithm for the perfect matching problem! In other words, the perfect matching problem for bipartite graphs is a special case of the polynomial identity testing problem.
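Thus, our algorithm is simply to evaluate the polynomial $\det(X)$, which has degree at most $n$, at a random point with coordinates drawn from a set $S$ of size $2n$, and to output YES if and only if the value is nonzero. By Lemma 1, the algorithm never errs when $G$ has no perfect matching, and errs with probability at most $1/2$ otherwise.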
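The algorithm can be sketched as follows. This is an illustration under assumptions not fixed by the text: the determinant is computed modulo the prime 2^61 - 1 by Gaussian elimination, the random values are drawn from the nonzero residues, and vertices on each side are numbered 0 to n - 1; the names `det_mod` and `has_perfect_matching` are hypothetical.

```python
import random

PRIME = 2**61 - 1  # assumed modulus; arithmetic is done in the field of integers mod PRIME


def det_mod(matrix, p):
    """Determinant of a square integer matrix modulo a prime p (Gaussian elimination)."""
    n = len(matrix)
    m = [[x % p for x in row] for row in matrix]
    det = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)  # modular inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p


def has_perfect_matching(n, edges, trials=5, rng=None):
    """One-sided test: a True answer is always correct; False may be wrong."""
    rng = rng or random.Random()
    for _ in range(trials):
        x = [[0] * n for _ in range(n)]
        for i, j in edges:  # edge between left vertex i and right vertex j
            x[i][j] = rng.randrange(1, PRIME)
        if det_mod(x, PRIME) != 0:
            return True
    return False


print(has_perfect_matching(3, [(0, 1), (1, 2), (2, 0)]))
```

Working modulo a large prime keeps every intermediate number small, and each trial fails on a graph with a perfect matching with probability at most n / (PRIME - 1).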
Isolation Lemma
Often in parallel algorithms, when solving a problem with many possible solutions, it is important to make sure that different processors are working towards the same solution.
For this, we need to single out (i.e. isolate) a specific solution without knowing any element of the solution space. How can we do this?
One way to do this is to implicitly define a random ordering on the solution space and then pick the first solution (i.e. lowest order solution) in this ordering. This approach also has applications in distributed computing, where we want to pick a leader among a set of processors, or break deadlocks. We can also use this approach to compute a minimum weight perfect matching in a graph (see references in slides).
We now state the isolation lemma:
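Lemma 2 (Isolation Lemma): given a set system over $[n] = \{1, \ldots, n\}$, if we assign a random weight function $w : [n] \to [2n]$, with each weight chosen independently and uniformly at random, then the probability that there is a unique minimum weight set is at least $1/2$.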
Example: Suppose
Then a random weight function
Remark: The isolation lemma can be quite counter-intuitive.
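A set system can have $2^n$ sets, while the weight of any set is an integer between $0$ and $2n^2$. So on average there can be about $2^n/(2n^2)$ sets of each weight, and yet with probability at least $1/2$ the minimum weight is attained by exactly one set.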
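A quick experiment makes the remark concrete. The sketch below assumes the usual form of the lemma, with weights drawn independently and uniformly from {1, ..., 2n}; the function names and the choice of set system (all 3-element subsets of {1, ..., 6}) are hypothetical and only for illustration.

```python
import random
from itertools import combinations


def set_weight(s, weights):
    """Weight of a set is the sum of the weights of its elements."""
    return sum(weights[e] for e in s)


def has_unique_minimum(sets, weights):
    """True if exactly one set in the system attains the minimum weight."""
    totals = sorted(set_weight(s, weights) for s in sets)
    return len(totals) == 1 or totals[0] < totals[1]


def isolation_frequency(n, sets, trials=2000, rng=None):
    """Fraction of random weight functions w: [n] -> [2n] with a unique minimum set."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(trials):
        weights = {e: rng.randint(1, 2 * n) for e in range(1, n + 1)}
        if has_unique_minimum(sets, weights):
            hits += 1
    return hits / trials


n = 6
family = [frozenset(c) for c in combinations(range(1, n + 1), 3)]
print(isolation_frequency(n, family, rng=random.Random(1)))
```

Even though the 20 sets in this family can only take weights between 3 and 36, so ties between sets are common, the observed frequency of a unique minimum should stay above the 1/2 guaranteed by the lemma.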
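Proof of Isolation Lemma: Let $\mathcal{F}$ be the set system and let $w : [n] \to [2n]$ be the random weight function, with $w(A) = \sum_{i \in A} w(i)$ for a set $A$.
Let $v \in [n]$ and define $\alpha_v = \min_{A \in \mathcal{F},\, v \notin A} w(A) - \min_{B \in \mathcal{F},\, v \in B} w(B \setminus \{v\})$.
Note that $\alpha_v$ does not depend on $w(v)$, and that:
- if $w(v) > \alpha_v$, then $v$ does not belong to any minimum weight set;
- if $w(v) < \alpha_v$, then $v$ belongs to every minimum weight set;
- if $w(v) = \alpha_v$, then $v$ belongs to some but not all minimum weight sets (so this is an ambiguous case).
Since the weight function $w$ assigns $w(v)$ independently of the other weights, and hence independently of $\alpha_v$, we have $\Pr[w(v) = \alpha_v] \le 1/(2n)$. By the union bound, the probability that some element is ambiguous is at most $n \cdot \frac{1}{2n} = \frac{1}{2}$.
Note that if we have two sets $A \neq B$ of minimum weight, then any element of $A \setminus B$ or $B \setminus A$ is ambiguous. Hence, with probability at least $1/2$ there is no ambiguous element, and the minimum weight set is unique.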
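The three cases in the proof can be checked on a small instance. This sketch assumes the standard threshold used in proofs of the lemma: for an element v, take the lightest set avoiding v and subtract the weight, not counting v itself, of the lightest set containing v. The names `alpha` and `classify` are hypothetical, and the set system is assumed to be nonempty.

```python
def alpha(sets, weights, v):
    """Threshold for element v; it does not depend on weights[v]."""
    best_without = min(
        (sum(weights[e] for e in s) for s in sets if v not in s),
        default=float("inf"),  # no set avoids v
    )
    best_with = min(
        (sum(weights[e] for e in s if e != v) for s in sets if v in s),
        default=float("inf"),  # no set contains v
    )
    return best_without - best_with


def classify(sets, weights, v):
    """Compare weights[v] with the threshold, as in the three cases of the proof."""
    a = alpha(sets, weights, v)
    if weights[v] > a:
        return "in no minimum weight set"
    if weights[v] < a:
        return "in every minimum weight set"
    return "ambiguous"


system = [{1, 2}, {3}]
print(classify(system, {1: 1, 2: 1, 3: 5}, 3))  # {1, 2} is lighter than {3}
```

With weights 1, 1, 2 instead, both sets weigh 2, and every element lies in the symmetric difference of the two minimum weight sets, so each one is classified as ambiguous.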