Instructor
Prof. Volkan Cevher
Description
Convex optimization offers a unified framework for obtaining numerical solutions to data analytics problems, with provable statistical guarantees of correctness at well-understood computational costs.
To this end, this course reviews recent advances in convex optimization and statistical analysis in the wake of Big Data. We provide an overview of the emerging convex data models and their statistical guarantees, describe scalable numerical solution techniques such as first-order methods and their randomized versions, and illustrate the essential role of parallel and distributed computation.
Throughout the course, we put the mathematical concepts into action with large-scale applications from machine learning, signal processing, and statistics.
Learning outcomes
By the end of the course, the students are expected to understand the so-called time-data tradeoffs in data analytics. In particular, the students must be able to:
- Choose an appropriate convex formulation for a data analytics problem at hand
- Estimate the underlying data size requirements for the correctness of its solution
- Implement an appropriate convex optimization algorithm based on the available computational platform
- Decide on a meaningful level of optimization accuracy for stopping the algorithm
- Characterize the time required for their algorithm to obtain a numerical solution with the chosen accuracy
Prerequisites
Previous coursework in calculus, linear algebra, and probability is required. Familiarity with optimization is useful.
Outline
The course consists of the following topics:
Lecture 1 - “Objects in Space”: definitions of norms, inner products, and metrics for vector, matrix, and tensor objects.
- Basics of complexity theory.
Lecture 2 - The maximum likelihood principle as a motivation for convex optimization.
- Fundamental structures in convex analysis, such as cones, smoothness, and conjugation.
Lecture 3 - Unconstrained, smooth minimization techniques.
- Gradient methods.
- Variable metric algorithms.
- Time-data tradeoffs in ML estimation.
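As a concrete taste of the gradient methods in this lecture, the sketch below runs plain gradient descent with a constant step size on a toy smooth convex function. This is an illustrative example only, not part of the course materials; the function, step size, and iteration count are arbitrary choices for the demonstration.

```python
# Illustrative sketch: gradient descent x_{k+1} = x_k - alpha * f'(x_k)
# on the smooth convex function f(x) = (x - 3)^2, whose gradient is
# f'(x) = 2(x - 3). The gradient is Lipschitz with constant L = 2, so
# the textbook step size 1/L = 0.5 guarantees convergence; a smaller
# alpha = 0.1 is used here.
def gradient_descent(grad, x0, step=0.1, iters=100):
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)   # move against the gradient direction
    return x

x_star = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)  # converges to the minimizer x = 3
```

With a constant step below 1/L, the error contracts by a fixed factor per iteration on this quadratic, which is the linear convergence rate discussed for smooth, strongly convex problems.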
Lecture 4 - Convex geometry of linear inverse problems.
- Structured data models (e.g., sparse and low-rank) and convex gauge functions and formulations that encourage these structures.
- Computational aspects of gauge functions.
Lecture 5 - Composite convex minimization. Regularized M-estimators.
- Time-data tradeoffs in linear inverse problems.
Lecture 6 - Convex demixing.
- Statistical dimension.
- Phase transitions in convex minimization.
- Smoothing approaches for nonsmooth convex minimization.
Lecture 7 - Constrained convex minimization I.
- Introduction to convex duality.
- Classical solution methods (the augmented Lagrangian method, the alternating minimization algorithm, the alternating direction method of multipliers, and the Frank-Wolfe method) and their deficiencies.
Lecture 8 - Constrained convex minimization II.
- Variational gap characterizations and dual smoothing.
- Scalable, black-box optimization techniques.
- Time-data tradeoffs for linear inverse problems.
Lecture 9 - Classical black-box convex optimization techniques.
- Linear programming, semidefinite programming, and the interior-point method (IPM).
- Hierarchies of classical formulations.
- Time and space complexity of the IPM.
Lecture 10 - Time-data tradeoffs in machine learning.
Lecture 11 - Convex methods for Big Data I: randomized coordinate descent methods.
- The PageRank problem and Nesterov’s solution.
- Composite formulations.
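A minimal sketch of the randomized coordinate descent idea from this lecture: at each step, one coordinate is sampled uniformly at random and minimized exactly. The quadratic below is a made-up two-dimensional example, not drawn from the lecture material.

```python
import random

# Randomized coordinate descent on f(x) = 0.5 x'Ax - b'x with a small,
# hypothetical positive definite matrix A. The minimizer solves Ax = b;
# for this A and b, x* = (1/3, 1/3).
A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

random.seed(0)
x = [0.0, 0.0]
for _ in range(2000):
    i = random.randrange(2)                            # sample a coordinate uniformly
    grad_i = sum(A[i][j] * x[j] for j in range(2)) - b[i]
    x[i] -= grad_i / A[i][i]                           # exact minimization along coordinate i
```

The per-iteration cost touches only one row of A, which is the point of coordinate methods at scale: each step is cheap, and randomization yields convergence rates in expectation.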
Lecture 12 - Convex methods for Big Data II: stochastic gradient descent methods.
- Least squares: conjugate gradients vs. a simple stochastic gradient method.
- Dual and gradient averaging schemes.
- Stochastic mirror descent.
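The "simple stochastic gradient method" for least squares can be sketched as follows; the data here are made up and noiseless, purely for illustration. Each iteration samples a single measurement (a_i, b_i) and steps along its gradient with a decaying step size.

```python
import random

# Illustrative sketch: stochastic gradient descent for the scalar
# least-squares problem min_x (1/2n) * sum_i (a_i * x - b_i)^2,
# using one randomly sampled measurement per iteration. The data are
# noiseless with b_i = 2 * a_i, so the minimizer is x* = 2.
a = [1.0, 2.0, 0.5, 1.5]
b = [2.0 * ai for ai in a]

random.seed(0)
x = 0.0
for k in range(1, 20001):
    i = random.randrange(len(a))           # sample a single data point
    grad_i = (a[i] * x - b[i]) * a[i]      # gradient of 0.5 * (a_i x - b_i)^2
    x -= (0.2 / k ** 0.5) * grad_i         # decaying step size, O(1/sqrt(k))
```

Each step costs O(1) regardless of the dataset size, in contrast with conjugate gradients, whose iterations touch all measurements but converge in far fewer steps; this is the trade discussed in the lecture.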
Lecture 13 - Randomized linear algebra routines for scalable convex optimization.
- Probabilistic algorithms for constructing approximate low-rank matrix decompositions.
- Subset selection approaches.
- Theoretical approximation guarantees.
Lecture 14 - The role of parallel and distributed computing.
- How to avoid communication bottlenecks and synchronization.
- Consensus methods.
- Memory lock-free, decentralized, and asynchronous algorithms.