Please note: This PhD defence will be take place online.
Nathan Harms, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Eric Blais
This thesis studies two types of problems in sublinear algorithms, where the goal is to perform some computation with as little information about the input as possible. The first type are property testing and learning problems, whose input is a pair (f,D) of a function f and a probability distribution D, and where the algorithm’s access to (f,D) is mediated by an “oracle” who answers limited types of queries; the goal is to design algorithms that use as few queries as possible, while weakening the oracle and loosening the restrictions on the input. The second type of problems we study are communication problems, where multiple parties collaborate to compute a function whose inputs are distributed among the parties.
For the first type of problem, we focus on the fundamental testing vs. learning question: Which hypothesis classes of functions can be tested more efficiently than they can be learned? Our main result is a technique for proving lower bounds on the complexity of testing in the distribution-free sample-based model that corresponds to the standard PAC learning model. We prove, among other results, that testing halfspaces cannot be done significantly more efficiently than learning.
For the second type of problem, we focus on the communication complexity of computing adjacency or distance thresholds in graphs, and our goal is to determine the structural conditions on the graphs that allow for constant-cost randomized communication. We introduce the adjacency and distance sketching problems for graphs, which allow us to relate communication complexity to the well-studied adjacency and distance labeling problems for graphs. This motivates a probabilistic version of the well-studied and wide-open “Implicit Graph Question”, and allows a new technique for designing adjacency and distance labeling schemes. Among many other results, we show that monotone classes of graphs have constant-cost randomized communication protocols for adjacency and distance thresholds if and only if they have bounded arboricity and bounded expansion, respectively.