DIMACS/TRIPODS Workshop on Optimization and Machine Learning

Uniform Convergence of Gradients for Non-convex Learning and Optimization

August 13, 2018, 10:00 AM - 10:30 AM

Location:

Iacocca Hall

Lehigh University

Bethlehem PA

Karthik Sridharan, Cornell University

We introduce vector-valued Rademacher complexities as a user-friendly tool to bound the rate at which refined properties of the empirical risk, such as gradients and Hessians, converge to their population counterparts in non-convex settings. Our tools are simple and composable, and they allow one to derive dimension-free uniform convergence bounds for gradients and Hessians in a diverse range of non-convex learning problems under a robust set of assumptions. As an application of our techniques, we give a new analysis of batch gradient descent methods for non-convex generalized linear models and non-convex robust regression models, showing how to use any algorithm that finds approximate stationary points to obtain optimal sample complexity in both high-dimensional (possibly infinite-dimensional) and low-dimensional regimes. This analysis holds under weaker distributional assumptions than in past work and applies even when multiple passes over the dataset are allowed. Moving beyond smooth models, we show that, in contrast to the smooth case, even for simple models such as a single ReLU it is not possible to obtain dimension-independent convergence rates for gradients in the worst case. On the positive side, we show that it is still possible to obtain dimension-independent rates for this and other non-smooth models under a new type of distributional assumption.
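The sketch below (not the speaker's code) illustrates the kind of procedure analyzed in the batch gradient descent application above: plain gradient descent on the empirical risk of a non-convex generalized linear model, stopped once it reaches an approximate stationary point. The sigmoid link, the squared loss, the step size, and the stopping tolerance are illustrative assumptions rather than details taken from the talk.

# A minimal sketch, assuming a sigmoid-link GLM with squared loss (non-convex
# in the parameters). The algorithm is plain batch gradient descent on the
# empirical risk, run until the empirical gradient is small, i.e. until it
# reaches an approximate stationary point. All constants are illustrative.

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def empirical_risk_and_grad(w, X, y):
    """Squared loss for the GLM y ~ sigmoid(<w, x>); non-convex in w."""
    preds = sigmoid(X @ w)
    residual = preds - y
    risk = 0.5 * np.mean(residual ** 2)
    # Chain rule: gradient of 0.5*(sigmoid(Xw) - y)^2 averaged over the sample.
    grad = X.T @ (residual * preds * (1.0 - preds)) / len(y)
    return risk, grad


def batch_gradient_descent(X, y, step=1.0, eps=1e-4, max_iters=10_000):
    """Return a point whose empirical-risk gradient has norm at most eps
    (or the last iterate if the iteration budget runs out)."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        _, grad = empirical_risk_and_grad(w, X, y)
        if np.linalg.norm(grad) <= eps:  # approximate stationary point
            break
        w -= step * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 20
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    w_star = rng.normal(size=d)
    y = sigmoid(X @ w_star) + 0.05 * rng.normal(size=n)
    w_hat = batch_gradient_descent(X, y)
    print("empirical risk at w_hat:", empirical_risk_and_grad(w_hat, X, y)[0])

Under the uniform convergence of gradients described in the abstract, the small empirical-risk gradient at the returned point also certifies a small population-risk gradient, which is how any approximate-stationary-point finder of this kind yields the stated sample complexity guarantees.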