Homework 4
This homework covers the material from lectures 19 to 23.
Due date: July 15th, 10pm Waterloo time.
PDF version of Homework 4.
LaTeX template, in case you want to write it in LaTeX.
Required Exercises
Problem 1 - Distinct Elements (25 points)
Consider again the distinct elements problem that we saw in class.
We are given a stream of elements
We will now analyze a different algorithm for the distinct elements problem:
- Let
be an integer - Pick at random a function
from a strongly 2-universal family. - Let
- Output
- Suppose that
contains distinct elements. Show that - Suppose that
contains distinct elements. Show that - Assuming that
, what is the memory requirement of the algorithm above?
Problem 2 - Hardness of Approximation (25 points)
Given a graph
Interestingly, the complement of an independent set is a vertex cover, so the complement of the MIS is the minimum vertex cover. We’ve seen (twice) how to get a two-approximation for vertex cover. Even though it is complementary, the situation for MIS is much worse.
Early in the study of approximation hardness, MIS was shown to be MAX-SNP-hard, meaning there is some constant to within which it cannot be approximated (unless P = NP).
Suppose one has an
- Prove that if there is an independent set of size
in , then there is an independent set of size in the product graph. - Prove that given an independent set of size
in the product graph, one can find an independent set of size in . - Conclude from the MAX-SNP-hardness of MIS that MIS has no constant-factor approximation (unless P = NP).
Problem 3 - Proving that (25 points)
Given a matrix
Optional question: how would you remove the assumption given above?
Problem 4 - Consensus with Crash Failures (25 points)
We will now analyze the consensus problem in the synchronous model with crash failures.
In this model, there are
We will assume that there will be at most
There are two types of crashes: clean and general.
- In a clean crash, if a processor crashes at round
, then it will not send or receive any messages at round or later. - In a general crash, if a processor
crashes at round , then an arbitrary subset of the messages that sends at round will be delivered to the other processors.
The setup of the consensus problem is as follows:
- Input: Each processor
has an input value . - Output: Each processor
should output a value such that:- Agreement: All alive processors output the same value.
- Validity: If all processors have the same input value
, then for any alive processor. - Termination: All alive processors must output a value at the end.
Part 1 (10 points): give a protocol for consensus in the synchronous model with clean crashes, using one round of communication.
Part 2 (15 points): when there are general crashes, the processors can no longer reach consensus in one round. Hence, we will need a more resilient protocol to reach consensus.
Consider the following protocol for consensus under general crashes, where
Protocol: Consensus with at most
Code for Processor
- Initially, set
- For round
do:- Send
to all processors - Receive
from processor - Set
- if
, output
- Send
To prove correctness of the protocol above, let
- In every execution of the protocol,
for any two alive processors and .
Practice Problems
You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)
Problem 1
You are building a real-time analytics system for a popular blogging platform. The platform has millions of users around the world who are constantly reading different blog posts. Your task is to find the number of unique blog posts read by each user in the past day. Due to the large scale of the platform, you can’t store all the user activity data in memory at once. Also, due to the dynamic nature of the world the data comes to you in a data stream.
The data stream contains a sequence of tuples, where each tuple consists of a user ID and a blog post ID. For example, a tuple may look like this: (“UserID_123”, “BlogPostID_456”), which indicates that User 123 has read Blog Post 456.
Assume that during one day there are
You need to develop and analyze an algorithm that does the following:
- Processes each tuple from the stream one by one.
- provides a list of users that have read more than
of the total number of posts (that is, of ). Your list should include all such users (but may also contain more users) - your algorithm should use
memory
Problem 2
Remember that the characteristic polynomial of a matrix
Define the problem of computing the characteristic polynomial of a matrix
Hint: First note that if I can evaluate a univariate polynomial
Clarification: in class we only saw how to compute the determinant and the inverse of a matrix of field elements. You should only need to use these facts.