SCOPT: Self-Concordant OPTimization

Overview

SCOPT  is a MATLAB implementation of the proximal gradient, proximal quasi-Newton, proximal Newton, and path-following interior-point algorithms for solving composite convex minimization problems involving self-concordant and self-concordant-like functions:

$$\min_{\mathbf{x}\in\mathbb{R}^n}\Big\{ F(\mathbf{x}) := f(\mathbf{x}) + g(\mathbf{x}) \Big\},$$

where \(f\) is a self-concordant or self-concordant-like function, and \(g\) is a proper, closed and convex function.

  • Briefly, the function \(f\) is called self-concordant (with the parameter \(M_f > 0\)) if \(\vert\varphi”'(t)\vert \leq M_f\varphi”(t)^{3/2}\), where \(\varphi(t) := f(\mathbf{x} + t\mathbf{u})\) for \(\mathbf{x}\in\mathrm{dom}(f), \mathbf{u}\in\mathbb{R}^m\), and \(\mathbf{x} + t\mathbf{u}\in\mathrm{dom}(f)\).

     Examples: \(f(\mathbf{X}) := -\log\det(\mathbf{X})\), \(f(\mathbf{x},t) := -\log(t^2 – \Vert\mathbf{x}\Vert_2^2)\), and \(f(\mathbf{x}) = -\sum\limits_{i=1}^p\log(\mathbf{a}_i^T\mathbf{x} – b_i)\) are all self-concordant.

  • The function \(g\) is usually assumed that its proximal operator $$\mathrm{prox}_g(\mathbf{x}) := \mathrm{arg}\min_{\mathbf{u}\in\mathbb{R}^n}\big\{ g(\mathbf{u}) + (1/2)\Vert\mathbf{u} – \mathbf{x}\Vert_2^2\big\}$$ is ”easy” to compute (e.g., in a closed form or in polynomial time).

SCOPT consists of several algorithms customized for solving the following convex optimization problems (but not limited to those):

  • Sparse Inverse Covariance Estimation Problems

J. Friedman, T. Hastie, and R. Tibshirani. Sparse Inverse Covariance Estimation with the graphical LASSO. Biostatistics, vol. 9, no. 3, pp. 432–441, 2007.

Example: 

$$\min_{\mathbf{X}\succeq 0}\Big\{ -\log\det(\mathbf{X}) + \mathrm{tr}(\mathbf{S}\mathbf{X}) + \rho\Vert\mathrm{vec}(\mathbf{X})\Vert_1\Big\}.$$

  • Poisson Intensity Reconstruction Problems

Z. T. Harmany, R. F. Marcia, and R. M. Willett. This is SPIRAL-TAP: Sparse Poisson Intensity Reconstruction Algorithms – Theory and Practice. IEEE Trans. Image Processing, vol. 21, no. 3, pp. 1084–1096, 2012.

Example:

$$ \min_{\mathbf{x}\geq 0}\Big\{ \sum_{i=1}^m\mathbf{a}_i^T\mathbf{x} – \sum_{i=1}^mb_i\log(\mathbf{a}_i^T\mathbf{x} + \mu_i) + \rho\mathrm{TV}(\mathbf{x})\Big\}.$$

  • Heteroscedastic LASSO problems

N. Stadler, P. Buhlmann, and S. V. de Geer. L1-Penalization for mixture regression models. TEST, vol. 19, no. 2, pp. 209–256, 2010.

Example $$\min_{\sigma>0,\beta\in\mathbb{R}^p}\Big\{-\log(\sigma) + (1/(2n))\Vert\mathbf{X}\beta-\sigma\mathbf{y}\Vert_2^2 + \rho\Vert\beta\Vert_1\Big\}.$$

  • Sparse logistic and multinomial logistic regression problems  

B. Krishnapuram, M. Figueredo, L. Carin, and A. Hertemink. Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), vol. 27, pp. 957–968, 2005.

Example: $$\min_{\mathbf{X}\in\mathbb{R}^{m\times p}}\Big\{(1/N)\sum_{i=1}^N\Big[\log\big(1 + \sum_{j=1}^m\mathrm{exp}((\mathbf{W}^{(j)})^T\mathbf{X}^{(i)})\big) – \sum_{i=1}^my_i^{(j)}(\mathbf{W}^{(j)})^T\mathbf{X}^{(i)}\Big] + \rho\Vert\mathrm{vec}(\mathbf{X})\Vert_1\Big\},$$

where \(y^{(j)}\) and \(\mathbf{W}^{(j)}\) are input data, \(j=1,\dots, N\).

Download

SCOPT can be downloaded at:

References

The theory and algorithms implemented in SCOPT can be found in the following papers:

[1]. Q. Tran-Dinh, A. Kyrillidis, and V. Cevher. A proximal Newton framework for composite minimization: Graph learning without Cholesky decomposition and matrix inversions. Proceedings of the 30th International Conference on Machine Learning (ICML), JMLR W&CP 28(2): 271–279, 2013.

Available at : http://infoscience.epfl.ch/record/183012 

Abstract:  We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.

[2]. Q. Tran-Dinh,  A. Kyrillidis and V. Cevher. Composite Self-concordant Minimization, Journal of Machine Learning Research, 2014.

Available  at : http://infoscience.epfl.ch/record/188126

Abstract:   We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.

[3]. Q. Tran-Dinh,  A. Kyrillidis and V. Cevher. An inexact proximal path-following algorithm for constrained convex minimization, SIAM J. Optimization, vol. 24, no. 4, pp. 1718–1745, 2014.

Available at:  http://infoscience.epfl.ch/record/190317

Abstract :  Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets. In this paper, we address an important instance of this broad class where we assume that the nonsmooth objective is equipped with a tractable proximity operator and that the convex constraint set affords a self-concordant barrier. We provide a new joint treatment of proximal and self-concordant barrier concepts and illustrate that such problems can be efficiently solved, without the need of lifting the problem dimensions, as in disciplined convex optimization approach. We propose an inexact path-following algorithmic framework and theoretically characterize the worst-case analytical complexity of this framework when the proximal subproblems are solved inexactly. To show the merits of our framework, we apply its instances to both synthetic and real-world applications, where it shows advantages over standard interior point methods. As a by-product, we describe how our framework can obtain points on the Pareto frontier of regularized problems with self-concordant objectives in a tuning free fashion.