Machine Learning

p-Norm Flow Diffusion for Local Graph Clustering

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, node ranking and feature inference. …
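
A minimal sketch of the local-clustering primitive: diffuse mass from a seed node, then sweep the diffusion vector and keep the prefix of lowest conductance. It uses personalized PageRank as a stand-in diffusion, not the paper's p-norm flow diffusion; the graph, seed, and alpha below are illustrative choices.

```python
import networkx as nx

def sweep_cut(G, seed, alpha=0.85):
    # Diffusion: personalized PageRank with all teleport mass on the seed.
    ppr = nx.pagerank(G, alpha=alpha, personalization={seed: 1.0})
    # Rank nodes by degree-normalized diffusion value (the usual sweep order).
    order = sorted(G, key=lambda n: ppr[n] / G.degree(n), reverse=True)
    best_set, best_cond = None, float("inf")
    prefix = set()
    for n in order:
        prefix.add(n)
        if len(prefix) == len(G):
            break                        # conductance undefined for the full graph
        cond = nx.conductance(G, prefix)
        if cond < best_cond:
            best_set, best_cond = set(prefix), cond
    return best_set, best_cond

G = nx.barbell_graph(10, 2)              # two dense cliques joined by a path
cluster, cond = sweep_cut(G, seed=0)
print(sorted(cluster), cond)
```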

Flow-based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance

Clustering points in a vector space or nodes in a graph is a ubiquitous primitive in statistical data analysis, and it is commonly used for exploratory data analysis. In practice, it is often of interest to refine or improve a given cluster that has …
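
As a toy illustration of the refine-an-input-cluster interface (not the flow-based algorithms the paper unifies), the sketch below greedily toggles boundary nodes in or out of the cluster while that lowers conductance; real improvement algorithms solve max-flow subproblems instead.

```python
import networkx as nx

def refine(G, cluster):
    S = set(cluster)                     # must be a nonempty proper subset
    improved = True
    while improved:
        improved = False
        # Nodes on either side of the current cut are candidates to move.
        boundary = {n for e in nx.edge_boundary(G, S) for n in e}
        for n in boundary:
            T = S ^ {n}                  # toggle n's membership
            if 0 < len(T) < len(G) and nx.conductance(G, T) < nx.conductance(G, S):
                S, improved = T, True
    return S

G = nx.barbell_graph(8, 0)               # two 8-cliques joined by one edge
seed = set(range(8)) | {8, 9}            # true clique plus two wrong nodes
print(sorted(refine(G, seed)))           # the wrong nodes get peeled off
```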

Statistical Guarantees for Local Graph Clustering

Local graph clustering methods aim to find small clusters in very large graphs. These methods take as input a graph and a seed node, and they return as output a good cluster in a running time that depends on the size of the output cluster but that is …
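
One common way such guarantees are probed empirically: plant a ground-truth cluster and measure how well a seeded method recovers it. The sketch below sets up that experiment with a stochastic block model, using a placeholder 1-hop recovery step rather than an actual local clustering algorithm; all sizes and probabilities are illustrative.

```python
import networkx as nx

sizes = [30, 170]                        # planted target cluster and the rest
probs = [[0.5, 0.02], [0.02, 0.1]]       # dense inside the target, sparse across
G = nx.stochastic_block_model(sizes, probs, seed=1)
truth = set(range(sizes[0]))             # nodes of the planted cluster

seed_node = 0
recovered = {seed_node} | set(G.neighbors(seed_node))   # placeholder "algorithm"

tp = len(recovered & truth)
print(f"precision={tp / len(recovered):.2f} recall={tp / len(truth):.2f}")
```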

Parallel and Communication Avoiding Least Angle Regression

We are interested in parallelizing the Least Angle Regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic LARS algorithm. The two algorithms …
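
For reference, a minimal single-node LARS fit via scikit-learn's lars_path, just to fix notation for the algorithm being parallelized; the parallel, communication-avoiding variants studied in the paper are not part of scikit-learn. The problem sizes are toy values.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = rng.standard_normal(k)   # sparse ground truth
y = X @ beta_true + 0.01 * rng.standard_normal(n)

# alphas: values at which variables enter; coefs: the full solution path.
alphas, active, coefs = lars_path(X, y, method="lar")
print("order variables entered:", active[:k])
```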

A Flexible Coordinate Descent Method

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust …
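
A toy sketch of the general template, randomized block selection plus a step preconditioned by the corresponding diagonal block of the Hessian, on plain least squares. The paper's method handles general convex composite objectives with approximate curvature, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, bsize = 300, 60, 10
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d)
x = np.zeros(d)

blocks = [np.arange(i, i + bsize) for i in range(0, d, bsize)]
for it in range(200):
    S = blocks[rng.integers(len(blocks))]    # pick a block uniformly at random
    r = A @ x - y
    g = A[:, S].T @ r                        # block gradient
    H = A[:, S].T @ A[:, S]                  # block (partial) Hessian
    x[S] -= np.linalg.solve(H + 1e-8 * np.eye(bsize), g)   # block Newton step
print("residual:", np.linalg.norm(A @ x - y))
```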

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is inhibited by the cost of communicating …

Avoiding communication in primal and dual block coordinate descent methods

Primal and dual block coordinate descent methods are iterative methods for solving regularized and unregularized optimization problems. Distributed-memory parallel implementations of these methods have become popular in analyzing large machine …
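
The core communication-avoiding idea can be sketched serially: gather the small Gram matrix and inner products for s sampled coordinates once, run s local updates against them, then synchronize the residual, so s communication rounds collapse into one. The ridge-regularized least-squares setup below is an illustrative assumption, not the paper's exact derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s, lam = 400, 80, 8, 0.1
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x, r = np.zeros(d), -y                   # residual r = A @ x - y

for outer in range(300):
    idx = rng.choice(d, size=s, replace=False)
    G = A[:, idx].T @ A[:, idx]          # one "communication": s x s Gram matrix
    v = A[:, idx].T @ r                  #   plus s inner products with r
    delta = np.zeros(s)
    for t in range(s):                   # s exact coordinate updates, all local
        grad_t = v[t] + G[t] @ delta + lam * (x[idx[t]] + delta[t])
        delta[t] -= grad_t / (G[t, t] + lam)
    x[idx] += delta
    r += A[:, idx] @ delta               # synchronize the residual once
print("objective:", 0.5 * np.linalg.norm(r) ** 2 + 0.5 * lam * x @ x)
```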

Performance of First- and Second-Order Methods for L1-Regularized Least Squares Problems

We study the performance of first- and second-order optimization methods for l1-regularized sparse least-squares problems as the conditioning of the problem changes and the dimensions of the problem increase up to one trillion. A rigorously defined …
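
As a concrete instance of the first-order family, proximal gradient (ISTA) with soft-thresholding on a toy lasso problem; the sizes and constants below are obviously far from the trillion-scale instances studied in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
n, d, lam = 200, 100, 0.1
A = rng.standard_normal((n, d))
beta = np.zeros(d); beta[:5] = 3.0       # sparse ground truth
y = A @ beta + 0.05 * rng.standard_normal(n)

L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
x = np.zeros(d)
for it in range(500):
    grad = A.T @ (A @ x - y)
    x = soft_threshold(x - grad / L, lam / L)
print("nonzeros recovered:", np.flatnonzero(np.abs(x) > 1e-6))
```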

A Second-Order Method for Strongly-Convex L1-Regularization Problems

In this paper, a robust second-order method is developed for the solution of strongly convex l1-regularized problems. The main aim is to make the proposed method as inexpensive as possible while still solving even difficult problems efficiently. The …
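
A sketch of one standard second-order template for such problems: a proximal Newton step whose quadratic-model subproblem is solved by GLMNET-style coordinate-descent sweeps. This illustrates the general approach under toy assumptions (a quadratic smooth part with an explicit ridge term), not the specific method of the paper.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
n, d, lam, mu = 150, 40, 0.1, 1.0        # mu > 0 makes the smooth part strongly convex
A = rng.standard_normal((n, d))
y = A @ (np.arange(d) < 3).astype(float) + 0.05 * rng.standard_normal(n)
H = A.T @ A + mu * np.eye(d)             # constant Hessian of the smooth part

x = np.zeros(d)
for newton_it in range(10):
    g = A.T @ (A @ x - y) + mu * x       # gradient of the smooth part
    delta = np.zeros(d)
    for sweep in range(5):               # CD passes on the quadratic model
        for j in range(d):
            # Partial derivative of the model at delta, excluding j's own term.
            b = g[j] + H[j] @ delta - H[j, j] * delta[j]
            delta[j] = soft_threshold(x[j] - b / H[j, j], lam / H[j, j]) - x[j]
    x = x + delta                        # unit step (exact for a quadratic model)
print("solution support:", np.flatnonzero(np.abs(x) > 1e-6))
```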