DIMACS/TRIPODS Workshop on Optimization and Machine Learning
August 13, 2018 - August 15, 2018
Organizer(s):
Frank Curtis, Lehigh University
Stefanie Jegelka, Massachusetts Institute of Technology
Satyen Kale, Google
Edo Liberty, Amazon
Han Liu, Northwestern University
Francesco Orabona, Stony Brook University
Katya Scheinberg, Lehigh University
Martin Takáč, Lehigh University
The DIMACS/TRIPODS Workshop on Optimization and Machine Learning is being held at Lehigh University in association with two related events. It is immediately preceded by a three-day summer school for doctoral students interested in improving their theoretical and practical skills related to optimization approaches in machine learning. The workshop is followed by the annual Modeling and Optimization: Theory and Applications (MOPTA) conference, which covers related topics but is much broader in scope.
Although machine learning (ML) is still a relatively new field, it is growing at an astounding rate, and ML models are a driving force in optimization today. While some ML models—notably support vector machines and logistic regression—yield to more traditional convex optimization approaches, others—using deep neural networks—lead to nonlinear and nonconvex problems that stymie traditional solvers. This workshop will address several of the biggest optimization challenges that arise in practical ML applications and be organized around three key themes: 1) stochastic and nonconvex methods; 2) adaptive and online methods for learning under uncertainty; and 3) finding and exploiting special structure.
(1) Nonconvex and stochastic optimization methods. The optimization effort in ML applications has grown rapidly in recent years because of the difficulty and importance of training complex nonconvex models, such as deep neural networks. Stochastic methods have a significant advantage on these problems because computing the accurate function and gradient information required by deterministic methods may be prohibitive for very large data sets. At the same time, it is difficult to construct useful second-order information from stochastic estimates, and hence stochastic methods with fast convergence are still out of reach for training deep neural networks. On the other hand, exploiting second-order information appears necessary for deep neural networks in order to escape saddle-point regions, where training can otherwise stall at inferior solutions. Significant progress has been made recently on stochastic second-order methods and variance-reduction methods for convex models, as well as on fast deterministic second-order methods for deep neural networks. Second-order methods also require advanced matrix-approximation and sketching techniques to run efficiently. These methods are currently being tested and extended to stochastic nonconvex settings. Their promise makes them a timely and important workshop topic.
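To make the variance-reduction idea concrete, the following is a minimal SVRG-style sketch on a toy least-squares problem. The data, step size, and epoch length are illustrative assumptions, not a reference implementation of any method presented at the workshop.

```python
# Minimal SVRG-style sketch of variance reduction on least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def grad_i(x, i):
    """Stochastic gradient of the i-th squared-loss term."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full (deterministic) gradient, computed once per outer epoch."""
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
step = 0.01
for epoch in range(20):
    x_snap = x.copy()          # snapshot point
    mu = full_grad(x_snap)     # full gradient at the snapshot
    for _ in range(n):         # inner loop of cheap stochastic steps
        i = rng.integers(n)
        # variance-reduced gradient estimate: unbiased, with vanishing variance
        g = grad_i(x, i) - grad_i(x_snap, i) + mu
        x -= step * g
    print(epoch, 0.5 * np.mean((A @ x - b) ** 2))
```

The point of the correction term is that the stochastic estimate stays unbiased while its variance shrinks as the iterates approach the snapshot, which is what allows a constant step size and fast convergence in the convex setting.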
(2) Methods for Learning Under Uncertainty. A changing data distribution is a commonly encountered scenario in machine learning, especially when data are generated over time (as in spam filtering). Adaptive and online learning techniques able to handle changing data distributions are thus paramount in such situations. A recent trend is online learning methods that adapt to “easiness” in the data. This line of work provides algorithms that are able to exploit structure in the data (if it exists) in order to learn faster, while retaining optimal worst-case convergence rates. One of the aims of this workshop is to investigate which notions of easiness are amenable to these sorts of algorithms. Another active area of research in adaptive online learning concerns methods that simultaneously provide guarantees competitive with predictors of arbitrary complexity, where the guarantees degrade gracefully as the comparator's complexity grows. These methods are intimately connected with optimization methods and convex analysis. Beyond their performance guarantees, such algorithms also tend to be parameter-free, making them immensely practical because they require no tuning. Another aim of the workshop will be to gain insight into the optimal competitive guarantees that can be obtained in various scenarios of learning with unbounded-complexity predictors.
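The coin-betting viewpoint, which also appears in the program, gives one way to construct such parameter-free methods: the learner bets a data-dependent fraction of its accumulated "wealth" instead of using a tuned step size. Below is a minimal one-dimensional sketch in that spirit; the loss stream, comparator, and initial wealth are illustrative assumptions, not the specific algorithms discussed in the talks.

```python
# Minimal sketch of a parameter-free, coin-betting style online learner in 1-D.
import numpy as np

rng = np.random.default_rng(1)
T = 2000
u = 0.7            # unknown comparator point the learner should approach
wealth = 1.0       # initial endowment; note there is no step size to tune
grad_sum = 0.0     # running sum of negative (sub)gradients ("coin outcomes")
x = 0.0            # first prediction

for t in range(1, T + 1):
    # absolute-loss stream around a noisy target; subgradients lie in [-1, 1]
    target = u + 0.05 * rng.standard_normal()
    g = float(np.sign(x - target))
    wealth -= g * x                      # wealth changes by the coin outcome times the bet
    grad_sum -= g
    x = (grad_sum / (t + 1)) * wealth    # KT-style bet: a fraction of current wealth

print(f"final prediction {x:.3f} vs. comparator {u}")
```

Because the betting fraction is always strictly less than one in magnitude, the wealth stays positive, and the regret against any fixed comparator scales with that comparator's magnitude without the algorithm ever knowing it in advance.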
(3) Special structure for optimization in machine learning. The identification of convexity in machine learning problems led to a surge in machine learning methods and related optimization algorithms, with broad impact in theory and applications. Many of these methods rely on the optimality guarantees and scalability that convex optimization provides. With the rise of deep learning and other complex models, and the concomitant questions about computational versus statistical complexity, there is increasing interest in understanding the properties of nonconvex learning formulations. For example, in some cases nonconvex formulations may be statistically competitive with, if not superior to, convex ones. Still, the current theoretical understanding of empirical results for nonconvex learning is very limited and in need of deeper insight.
Scalability and guarantees for (nonconvex) optimization rely on additional mathematical structure. The workshop will explore types of structure that aid optimization and their potential impact on machine learning problems. Potential topics include geometric optimization, polynomial optimization, relaxations, restricted and generalized versions of convexity, and the blending of discrete and continuous optimization. Examples of the latter are optimization problems arising from learning representations of discrete objects, nonconvex relaxations of discrete optimization problems, and the exploitation of structure from discrete optimization in nonconvex optimization. Recent examples in machine learning include results for matrix factorization and other recovery problems, as well as for submodular optimization.
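As a small example of the kind of structured nonconvex problem referenced above, the sketch below recovers a low-rank matrix by running plain gradient descent on a factored (Burer-Monteiro style) objective. The problem sizes, rank, initialization scale, and step size are illustrative assumptions.

```python
# Minimal sketch: low-rank matrix recovery via gradient descent on the
# factored objective 0.5 * ||U V^T - M||_F^2.
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 50, 40, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # ground-truth rank-r matrix

U = 0.1 * rng.standard_normal((n, r))   # small random initialization
V = 0.1 * rng.standard_normal((m, r))
step = 0.01
for it in range(2000):
    R = U @ V.T - M                     # residual
    gU, gV = R @ V, R.T @ U             # gradients of the factored objective
    U -= step * gU
    V -= step * gV

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```

The factored problem is nonconvex, yet the low-rank structure and a small random initialization are typically enough for simple first-order methods to find a global minimizer in settings like this one, which is exactly the kind of structure-enabled guarantee the theme is about.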
Monday, August 13, 2018
Opening remarks
Plenary talk: Tractable Nonconvex Optimization via Geometry
Suvrit Sra, Massachusetts Institute of Technology
Uniform Convergence of Gradients for Non-convex Learning and Optimization
Karthik Sridharan, Cornell University
Logistic Regression: The Importance of Being Improper
Satyen Kale, Google
Break
Building Algorithms by Playing Games
Jake Abernethy, University of Michigan
Risk Bounds for Classification and Regression Models that Interpolate
Daniel Hsu, Columbia University
Lunch
Plenary talk: Representation, Optimization and Generalization Properties of Deep Neural Networks
Peter Bartlett, University of California, Berkeley
Parameter-free Nonsmooth Convex Stochastic Optimization through Coin Betting
Francesco Orabona, Stony Brook University
Data-dependent Hashing via Nonlinear Spectral Gaps
Alexandr Andoni, Columbia University
Break
Stochastic Optimization for AUC Maximization
Yiming Ying, University at Albany
The Power of Interpolation: Machine Learning without Loss Functions and Regularization
Mikhail Belkin, Ohio State University
Contextual Reinforcement Learning
John Langford, Microsoft Research
Social time at the Comfort Suites Bar
Tuesday, August 14, 2018
Maximizing Submodular Functions Exponentially Faster
Yaron Singer, Harvard University
Stefanie Jegelka, Massachusetts Institute of Technology
Nonconvex Sparse Deconvolution: Geometry and Efficient Methods
John Wright, Columbia University
Break
Hossein Mobahi, Google
Statistical Properties of Stochastic Gradient Descent
Panos Toulis, University of Chicago
Direct Runge-Kutta Discretization Achieves Acceleration
Aryan Mokhtari, Massachusetts Institute of Technology
Lunch
Plenary talk: Better Models in Optimization
John Duchi, Stanford University
Frank-Wolfe Splitting via Augmented Lagrangian Method
Simon Lacoste-Julien, University of Montreal
Second Order Optimization and Non-convex Machine Learning
Michael Mahoney, University of California, Berkeley
Break
Optimization over Nonnegative Polynomials
Amir Ali Ahmadi, Princeton University
Stochastic Quasi-gradient Methods: Variance Reduction via Jacobian Sketching
Peter Richtarik, University of Edinburgh
Wednesday, August 15, 2018
Plenary talk: Deep Learning with Dense Connectivity
Kilian Weinberger, Cornell University
Break
A Positive Outlook on Negative Curvature
Daniel Robinson, Johns Hopkins University
How to Characterize Worst-case Performance of Algorithms for Nonconvex Optimization
Frank Curtis, Lehigh University
Convex Lens for Non-convex Problems
Benjamin Haeffele, Johns Hopkins University
Break
Plenary talk: Practical Conditional Gradient Algorithms
Stephen Wright, University of Wisconsin, Madison
Lunch
Dimensionality Reduction Techniques for Global Optimization
Coralia Cartis, University of Oxford
“Active-set Complexity” of Proximal Gradient: How Long Does It Take to Find the Sparsity Pattern?
Mark Schmidt, University of British Columbia
Stochastic Methods for Non-smooth Non-convex Optimization
Damek Davis, Cornell University
Break
New Framework For Convergence Analysis Of Stochastic Optimization Methods (Part 1)
Katya Scheinberg, Lehigh University
New Framework For Convergence Analysis Of Stochastic Optimization Methods (Part 2)
Courtney Paquette, University of Waterloo
Do We Need 2nd Order Methods in Machine Learning?
Martin Takáč, Lehigh University
For the most up-to-date information on registration and participation, please see the workshop's primary webpage.
The workshop will feature plenary presentations each morning and two parallel sessions of invited presentations each afternoon, as well as a contributed poster session and reception on August 14.
Workshop program and book of abstracts
Videos of DIMACS/TRIPODS and TRIPODS/MOPTA plenary presentations
Sponsored by NSF: TRIPODS Institute for Optimization and Learning and DIMACS, in association with the Special Focus on Bridging Continuous and Discrete Optimization.