Polynomial Regression Under Arbitrary Product Distributions

Abstract

In recent work, Kalai, Klivans, Mansour, and Servedio (2005) studied a variant of the “Low-Degree (Fourier) Algorithm” for learning under the uniform probability distribution on $\{0, 1\}^n$. They showed that the L1 polynomial regression algorithm yields agnostic (tolerant to arbitrary noise) learning algorithms with respect to the class of threshold functions — under certain restricted instance distributions, including uniform on $\{0, 1\}^n$ and Gaussian on $\mathbb{R}^n$. In this work we show how all learning results based on the Low-Degree Algorithm can be generalized to give almost identical agnostic guarantees under arbitrary product distributions on instance spaces $X_1 \times \cdots \times X_n$. We also extend these results to learning under mixtures of product distributions.

The main technical innovation is the use of (Hoeffding) orthogonal decomposition and the extension of the “noise sensitivity method” to arbitrary product spaces. In particular, we give a very simple proof that threshold functions over arbitrary product spaces have $\delta$-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).

Publication
Machine Learning

Conference version