# Mathematics of Data: From Theory to Computation (Fall 2018)

Instructor

Prof. Volkan Cevher

#### Description

Convex optimization offers a unified framework in obtaining numerical solutions to data analytics problems with provable statistical guarantees of correctness at well-understood computational costs.

To this end, this course reviews recent advances in convex optimization and statistical analysis in the wake of Big Data. We provide an overview of the emerging convex data models and their statistical guarantees, describe scalable numerical solution techniques such as stochastic, first-order and primal-dual methods.

Throughout the course, we put the mathematical concepts into action with large-scale applications from machine learning, signal processing, and statistics.

#### Learning outcomes

By the end of the course, the students are expected to understand the so-called time-data tradeoffs in data analytics. In particular, the students must be able to

1. Choose an appropriate convex formulation for a data analytics problem at hand
2. Estimate the underlying data size requirements for the correctness of its solution
3. Implement an appropriate convex optimization algorithm based on the available computational platform
4. Decide on a meaningful level of optimization accuracy for stopping the algorithm
5. Characterize the time required for their algorithm to obtain a numerical solution with the chosen accuracy

#### Prerequisites

Previous coursework in calculus, linear algebra, and probability is required. Familiarity with optimization is useful.

#### Outline

The course consists of the following topics

 lecture 1 Introduction to convex optimization and iterative methods. lecture 2 Review of basic probability theory. Maximum likelihood, M-estimators, and empirical risk minimization as a motivation for convex optimization. lecture 3 Fundamental concepts in convex analysis. Basics of complexity theory. lecture 4 Unconstrained smooth minimization I: Concept of an iterative optimization algorithm. Convergence rates. Characterization of functions. lecture 5 Unconstrained smooth minimization II: Gradient and accelerated gradient methods. lecture 6 Unconstrained smooth minimization III: The quadratic case. The conjugate gradient method. Variable metric algorithms. lecture 7 Stochastic gradient methods. lecture 8 Non-convex optimization. Neural networks. Convergence of SGD on nonconvex problems. lecture 9 Composite convex minimization I. Subgradient method. Proximal and accelerated proximal gradient methods. lecture 10 Composite convex minimization II. Proximal Newton-type methods. Stochastic proximal gradient methods. lecture 11 Constrained convex minimization I. The primal-dual approach. Smoothing approaches for non-smooth convex minimization. lecture 12 Constrained convex minimization II. The Frank-Wolfe method. The universal primal-dual gradient method. The alternating direction method of multipliers (ADMM).