
Lecture: Convex Cardinality Optimization

Junyu Zhang

Based on Stephen Boyd’s lecture notes

October 28, 2019

1 / 26
Outline

Convex cardinality optimization

Case studies

Hardness of the problem

ℓ1-regularized methods

More problem cases

2 / 26
Convex Cardinality Optimization

The cardinality of x ∈ R^n, written card(x), is the number of nonzero elements in x.

Cardinality is not continuous.

Cardinality is non-convex and non-concave. (Think of the 1-d case card(x), x ∈ R.)

Quasi-concave on R^n_+ since

card(λx + (1 − λ)y) ≥ min{card(x), card(y)}

Useful concept: support (sparsity pattern)

supp(x) := {i : x_i ≠ 0}

3 / 26
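As a quick illustration, here is a minimal NumPy sketch of card(x) and supp(x); the tolerance argument is a practical addition for floating-point data, not part of the definition.

```python
import numpy as np

def card(x, tol=0.0):
    # Number of entries with magnitude above tol (tol = 0 gives the exact cardinality).
    return int(np.sum(np.abs(x) > tol))

def supp(x, tol=0.0):
    # Support (sparsity pattern): indices of the nonzero entries.
    return np.flatnonzero(np.abs(x) > tol)

x = np.array([0.0, 1.5, 0.0, -2.0])
print(card(x))   # 2
print(supp(x))   # [1 3]
```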
More about cardinality

It is often called the “ℓ0-norm”, written ‖x‖_0. However, cardinality is not a norm.
Triangle inequality: ‖x + y‖_0 ≤ ‖x‖_0 + ‖y‖_0. (Yes)
Positive definite: ‖x‖_0 = 0 ⇐⇒ x = 0. (Yes)
Homogeneity: ‖a · x‖_0 = |a| · ‖x‖_0, ∀a ∈ R. (No)

Define d(x, y) := ‖x − y‖_0; then d(·, ·) is a metric (distance).
Triangle inequality: d(x, y) ≤ d(x, z) + d(y, z). (Yes)
Symmetry: d(x, y) = d(y, x). (Yes)
Identity of indiscernibles: d(x, y) = 0 ⇐⇒ x = y. (Yes)

If x ∈ {0, 1}^n, d(·, ·) is called the Hamming distance. It has many applications in data mining, e.g., recommendation systems.

4 / 26
Convex-cardinality problems

A convex-cardinality problem is a problem that would be convex without the card terms.

Minimizing cardinality as the objective:

minimize  card(x)
s.t.      x ∈ C

Having cardinality as a constraint:

minimize  f(x)
s.t.      x ∈ C
          card(x) ≤ k

5 / 26
Sparse portfolio selection
The problem setup:
n assets, asset i has expected return r_i, 1 ≤ i ≤ n.
The covariance matrix of the returns is Σ.
The amount invested in asset i is w_i.

The portfolio selection problem:

minimize  w^T Σ w
s.t.      r^T w = µ,  1^T w = 1.

The solution w* can be dense; many entries may be small but nonzero. Adding a cardinality constraint enforces a sparse portfolio:

minimize  w^T Σ w
s.t.      r^T w = µ,  1^T w = 1,
          card(w) ≤ k.
6 / 26
Sparse portfolio selection

which is often formulated as

minimize  w^T Σ w + γ · card(w)
s.t.      r^T w = µ,  1^T w = 1.

In practice, solve

minimize  w^T Σ w + λ‖w‖_1
s.t.      r^T w = µ,  1^T w = 1.

7 / 26
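A minimal CVXPY sketch of the ℓ1-penalized portfolio problem above; the returns, covariance, target return µ, and weight λ are all synthetic choices made here for illustration.

```python
import numpy as np
import cvxpy as cp

# Synthetic problem data (assumed): n assets, expected returns r, covariance Sigma.
np.random.seed(0)
n = 10
r = np.random.uniform(0.02, 0.10, n)
F = np.random.randn(n, n)
Sigma = F @ F.T / n + 0.01 * np.eye(n)        # positive definite covariance

mu, lam = 0.06, 0.05                          # target return and l1 weight (assumed)
w = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.quad_form(w, Sigma) + lam * cp.norm(w, 1)),
                  [r @ w == mu, cp.sum(w) == 1])
prob.solve()
print("weights that are (near) zero:", int(np.sum(np.abs(w.value) < 1e-4)))
```

Increasing lam typically drives more weights to (near) zero at the cost of a higher variance; the polishing step described later can then re-optimize over just the selected assets.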
High dimensional statistics

Fitting linear models:

y = x^T β* + ε,   ε ∼ N(0, σ^2).

Collected data points {(x_i, y_i)}_{i=1}^n.

The linear regression (least-squares) estimator:

β̂_n = arg min_β  Σ_{i=1}^n (x_i^T β − y_i)^2

Classical scenario, abundant (nondegenerate) data:

lim_{n→∞} β̂_n = β*

8 / 26
High dimensional statistics

Modern situation: x ∈ R^d, with d ≫ n.

Linear regression can be arbitrarily bad.
No theoretical guarantee.

Special structure, sparsity: card(β*) ≤ O(n/ log d) ≪ d

Choose a proper k and solve

β̂_n = arg min_β  Σ_{i=1}^n (x_i^T β − y_i)^2   s.t.  card(β) ≤ k.

In practice, solve

β̂_n = arg min_β  Σ_{i=1}^n (x_i^T β − y_i)^2 + λ_n ‖β‖_1.

With a proper λ_n, a good bound can be derived for ‖β̂_n − β*‖_2.

9 / 26
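A minimal sketch of the ℓ1-penalized regression above on synthetic high-dimensional data, assuming CVXPY is available; the data sizes, noise level, and λ_n are arbitrary illustrative choices.

```python
import numpy as np
import cvxpy as cp

# Synthetic data with d >> n and a k-sparse true coefficient vector (all sizes assumed).
np.random.seed(1)
n, d, k = 50, 200, 5
X = np.random.randn(n, d)
beta_star = np.zeros(d)
beta_star[:k] = np.random.randn(k)
y = X @ beta_star + 0.1 * np.random.randn(n)

lam = 2.0                                    # lambda_n; tuned by cross-validation in practice
beta = cp.Variable(d)
cp.Problem(cp.Minimize(cp.sum_squares(X @ beta - y) + lam * cp.norm(beta, 1))).solve()

print("card(beta_hat):", int(np.sum(np.abs(beta.value) > 1e-4)))
print("||beta_hat - beta*||_2:", float(np.linalg.norm(beta.value - beta_star)))
```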
Sparse coding / Dictionary learning
Unsupervised methods for learning sets of over-complete bases to represent data efficiently.
Belief: natural images are generated by linear combinations of bases.

[Figure omitted; picture from Andrew Ng.]
10 / 26
Sparse coding / Dictionary learning

The set of over-complete bases: Φ = [φ_1, φ_2, · · · , φ_m].

Assume each data point x_i ∈ R^d can be expressed as

x_i = Σ_{j=1}^m a_i^{(j)} · φ_j + ν_i = Φ a_i + ν_i

where ν_i ∈ R^d is additive noise and card(a_i) is small.

Sparse coding: with a := [a_1, ..., a_n],

minimize_{a, Φ}  Σ_{i=1}^n ( ‖x_i − Φ a_i‖^2 + λ‖a_i‖_1 )

Fix a: convex in Φ. Fix Φ: convex in a.

11 / 26
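A minimal alternating-minimization sketch of the sparse coding problem above, exploiting the fact that the objective is convex in a for fixed Φ and convex in Φ for fixed a. The dimensions, λ, the number of passes, and the column normalization of Φ are all assumptions for illustration; CVXPY is assumed available for the ℓ1-regularized a-step.

```python
import numpy as np
import cvxpy as cp

# Synthetic sizes (assumed): d-dimensional data, m > d over-complete bases, n points.
np.random.seed(2)
d, m, n, lam = 20, 40, 30, 0.1
X = np.random.randn(d, n)                    # columns are the data points x_i
Phi = np.random.randn(d, m)                  # initial dictionary
Phi /= np.linalg.norm(Phi, axis=0)           # normalize columns (removes scale ambiguity)
A = np.zeros((m, n))                         # codes a_i as columns

for it in range(5):
    # a-step: with Phi fixed, each a_i solves an l1-regularized least-squares problem.
    for i in range(n):
        a = cp.Variable(m)
        cp.Problem(cp.Minimize(cp.sum_squares(X[:, i] - Phi @ a)
                               + lam * cp.norm(a, 1))).solve()
        A[:, i] = a.value
    # Phi-step: with the codes fixed, the dictionary is an ordinary least-squares fit.
    Phi = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
    Phi /= np.maximum(np.linalg.norm(Phi, axis=0), 1e-12)
```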
Exact solution to convex-cardinality problems

Consider the case:

minimize_{x ∈ R^n}  f(x) + λ · card(x)
s.t.                x ∈ C

If supp(x) is given, the remaining problem is a solvable convex problem.

Exact solution: enumerate all possible supports supp(x) and solve the resulting problems. In the worst case: solve 2^n problems.

The problem is in general (NP-)hard.

12 / 26
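For concreteness, a brute-force sketch for the penalized least-squares case f(x) = ‖Ax − b‖_2^2: it enumerates all 2^n supports and solves a restricted least-squares problem on each, which is only feasible for tiny n. The data and λ below are made up.

```python
import itertools
import numpy as np

# Exhaustive search for: minimize ||Ax - b||_2^2 + lam * card(x)   (tiny n only).
np.random.seed(3)
n, lam = 8, 0.5
A = np.random.randn(20, n)
b = np.random.randn(20)

best_val, best_x = np.inf, None
for k in range(n + 1):
    for S in itertools.combinations(range(n), k):
        x = np.zeros(n)
        if S:
            # Convex subproblem: least squares restricted to the support S.
            x[list(S)] = np.linalg.lstsq(A[:, list(S)], b, rcond=None)[0]
        val = np.sum((A @ x - b) ** 2) + lam * k
        if val < best_val:
            best_val, best_x = val, x

print("best objective:", best_val, " support:", np.flatnonzero(best_x))
```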
Binary linear programming

Binary LP problems:

minimize_x  c^T x
s.t.        Ax ≤ b,  x ∈ {0, 1}^n.

Includes hard problems such as traveling salesman, Hamiltonian cycle, etc.

Observation (“=” is achieved if and only if x ∈ {0, 1}^n):

card(x) + card(1 − x) = Σ_{i=1}^n [card(x_i) + card(1 − x_i)] ≥ n.

Reduction:

minimize_x  c^T x
s.t.        Ax ≤ b,
            card(x) + card(1 − x) = n.

13 / 26
`1 -norm heuristic

Step 1. Replace card(x) with λ‖x‖_1. Solve the problem to get x̂*.
Tune λ to get the desired sparsity: the larger λ is, the sparser the solution is.
Or use more sophisticated versions: Σ_i w_i |x_i|.

Step 2. Polishing: fix the sparsity pattern supp(x̂*) and re-solve the problem without regularization to get x*.

Step 3. In statistics and machine learning, cross-validation is needed to ensure the selection of λ is reasonable.

14 / 26
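A minimal sketch of steps 1 and 2 for minimize ‖Ax − b‖_2^2 subject to card(x) ≤ k, assuming CVXPY is available; the data and the particular λ are made up, and in practice λ would be tuned (e.g. by cross-validation as in step 3).

```python
import numpy as np
import cvxpy as cp

np.random.seed(4)
m, n, lam = 40, 100, 2.0
A = np.random.randn(m, n)
b = np.random.randn(m)

# Step 1: replace card(x) by the l1 norm and solve; tune lam for the desired sparsity.
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm(x, 1))).solve()
S = np.flatnonzero(np.abs(x.value) > 1e-4)        # estimated sparsity pattern supp(x_hat)

# Step 2: polishing -- fix the sparsity pattern and re-solve without regularization.
x_pol = np.zeros(n)
x_pol[S] = np.linalg.lstsq(A[:, S], b, rcond=None)[0]
print("card:", len(S), " polished objective:", float(np.sum((A @ x_pol - b) ** 2)))
```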
The bias in the solution

Solve  minimize_x f(x)  s.t.  card(x) ≤ k  by

x̂_λ = arg min_x  f(x) + λ‖x‖_1

KKT condition: 0 ∈ ∇f(x̂_λ) + λ∂‖x̂_λ‖_1, element-wise:

∇_i f(x̂_λ) = −λ,        if (x̂_λ)_i > 0
∇_i f(x̂_λ) = λ,         if (x̂_λ)_i < 0
∇_i f(x̂_λ) ∈ [−λ, λ],   if (x̂_λ)_i = 0

Even if we find the correct sparsity pattern, there is a nonzero bias. Namely, with B = supp(x̂_λ),

‖∇_B f(x̂_λ)‖_2^2 = |B| · λ^2 ≠ 0

15 / 26
Solving the `1 regularized problem
Suppose f is convex and smooth; consider

minimize_x  f(x) + λ · ‖x‖_1,   s.t.  x ∈ C

ISTA / FISTA method. (Next page)

Subgradient method:

x_{k+1} = x_k − η_k · (∇f(x_k) + λ v_k),   v_k ∈ ∂‖x_k‖_1

ADMM formulation, for simple f such as ‖Ax − b‖^2:

minimize_{x, y}  f(x) + λ · ‖y‖_1,   s.t.  x = y,  x ∈ C

Min-max formulation (‖x‖_1 = max_y {y^T x : ‖y‖_∞ ≤ 1}):

minimize_{x ∈ C}  maximize_{‖y‖_∞ ≤ 1}  f(x) + λ y^T x

...
16 / 26
Iterative shrinkage-thresholding algorithm (ISTA)

Define x̃_{k+1} = x_k − η · ∇f(x_k). ISTA:

x_{k+1} = arg min_x  (1/(2η)) ‖x − x̃_{k+1}‖^2 + λ‖x‖_1

Equivalent formulation:

x_{k+1} = arg min_x  f(x_k) + ⟨∇f(x_k), x − x_k⟩ + λ‖x‖_1 + (1/(2η)) ‖x − x_k‖^2

Separable: each entry solves arg min_z (z − a)^2 + b|z|, with b > 0.
Explicit update:

(x_{k+1})_i = (x̃_{k+1})_i − ηλ,   if (x̃_{k+1})_i ∈ (ηλ, ∞)
(x_{k+1})_i = 0,                  if (x̃_{k+1})_i ∈ [−ηλ, ηλ]
(x_{k+1})_i = (x̃_{k+1})_i + ηλ,  if (x̃_{k+1})_i ∈ (−∞, −ηλ)

Applying Nesterov's acceleration gives fast ISTA (FISTA).


17 / 26
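A self-contained NumPy sketch of ISTA for the lasso instance f(x) = ½‖Ax − b‖_2^2; the ½ scaling, the step-size rule η = 1/‖A‖_2^2, the iteration count, and the data are choices made here for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Entrywise solution of min_z (1/(2*eta))*(z - v)^2 + lam*|z| with t = eta*lam.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    # ISTA for: minimize 0.5*||Ax - b||_2^2 + lam*||x||_1.
    x = np.zeros(A.shape[1])
    eta = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = Lipschitz constant of the gradient
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                 # gradient of the smooth part
        x = soft_threshold(x - eta * grad, eta * lam)
    return x

np.random.seed(5)
A, b = np.random.randn(30, 80), np.random.randn(30)
x_hat = ista(A, b, lam=0.5)
print("card(x_hat):", int(np.sum(np.abs(x_hat) > 1e-8)))
```

FISTA adds a Nesterov momentum step on top of the same proximal update, improving the rate from O(1/k) to O(1/k^2).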
Interpretation as convex relaxation

For R large enough, consider

minimize_x  f(x) + γ · card(x)
s.t.        x ∈ C,  ‖x‖_∞ ≤ R

Equivalent mixed integer programming:

minimize_{x,z}  f(x) + γ · 1^T z
s.t.            |x_i| ≤ R z_i,  for i = 1, ..., n
                x ∈ C,  z ∈ {0, 1}^n

18 / 26
Interpretation as convex relaxation

Relax z ∈ {0, 1}^n to z ∈ [0, 1]^n:

minimize_{x,z}  f(x) + γ · 1^T z
s.t.            |x_i| ≤ R z_i,  for i = 1, ..., n
                x ∈ C,  z ∈ [0, 1]^n

Fixing x, minimizing over z gives z_i = |x_i|/R:

minimize_x  f(x) + (γ/R) · ‖x‖_1
s.t.        x ∈ C

Its optimal value lower bounds the original problem.

19 / 26
Sparse design

Find a design vector x with the smallest card(x) such that a given set of specifications is satisfied.

Zero entries of x simplify the design (components that are not needed).

Problem formulation

minimize  card(x)
s.t.      x ∈ C

Examples
antenna array beamforming (zero coefficients correspond to unneeded
antenna elements)
truss design (zero coefficients correspond to bars that are not needed)
...

20 / 26
Truss design

Figure: Such a truss contains a huge number of steel bars; potentially many of them are not necessary.

21 / 26
Sparse modeling / regressor selection

Fitting data b as a linear combination of k regressors, chosen from n candidate regressors.

minimize_{x ∈ R^n}  ‖Ax − b‖_2
s.t.                card(x) ≤ k

Reformulations:

minimize_{x ∈ R^n}  ‖Ax − b‖_2 + λ · card(x)

minimize_{x ∈ R^n}  ‖Ax − b‖_2^2 + λ · card(x)

minimize_{x ∈ R^n}  card(x)   s.t.  ‖Ax − b‖_2 ≤ ε
x∈R

Lasso, feature selection, etc.

Generalized linear models: logistic regression, Poisson regression, etc.


22 / 26
Sparse signal reconstruction (compressed sensing)

For an unknown signal x ∈ R^d:

noisy measurement y = Ax + v, v ∼ N(0, σ^2 I), where A ∈ R^{n×d} is a known fat matrix, n ≪ d.
prior knowledge: x is sparse, card(x) ≤ k.

Maximum likelihood estimator (MLE):

minimize_{x ∈ R^d}  ‖Ax − y‖_2^2
s.t.                card(x) ≤ k

Exact recovery is expected under certain conditions (for the ℓ1 heuristic).

23 / 26
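A minimal ℓ1-heuristic recovery sketch for the compressed sensing setting above, assuming CVXPY is available; the dimensions, sparsity level, noise level, and the residual bound eps in the constraint are all illustrative choices.

```python
import numpy as np
import cvxpy as cp

np.random.seed(6)
n, d, k, sigma = 60, 200, 5, 0.01
A = np.random.randn(n, d) / np.sqrt(n)        # known "fat" measurement matrix, n << d
x_true = np.zeros(d)
x_true[np.random.choice(d, k, replace=False)] = np.random.randn(k)
y = A @ x_true + sigma * np.random.randn(n)

# l1 heuristic: minimize ||x||_1 subject to the residual being at the noise level.
x = cp.Variable(d)
eps = 1.2 * sigma * np.sqrt(n)                # rough bound on ||v||_2 (assumed)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [cp.norm(A @ x - y, 2) <= eps]).solve()
print("recovery error ||x_hat - x||_2:", float(np.linalg.norm(x.value - x_true)))
```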
Estimation with outliers

Data points {(x_i, y_i)}_{i=1}^n:  y_i = x_i^T β* + ε_i + w_i*.

ε_i ∼ N(0, σ^2) are i.i.d. noise.

w* = [w_1*, ..., w_n*]^T is sparse noise: card(w*) ≤ k.

supp(w*) corresponds to the set of outliers.

The MLE for β:

minimize_{β ∈ R^d, w ∈ R^n}  ‖y − Xβ − w‖_2^2
s.t.                         card(w) ≤ k

24 / 26
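A sketch of the penalized ℓ1 variant of the outlier problem above (replacing card(w) ≤ k with λ‖w‖_1, in the spirit of the earlier heuristic), assuming CVXPY is available; the data, the number of corrupted responses, and λ are synthetic choices.

```python
import numpy as np
import cvxpy as cp

np.random.seed(7)
n, d, lam = 100, 5, 1.0
X = np.random.randn(n, d)
beta_star = np.random.randn(d)
y = X @ beta_star + 0.1 * np.random.randn(n)
outlier_idx = np.random.choice(n, 5, replace=False)
y[outlier_idx] += 10.0                         # sparse gross corruptions w*

beta, w = cp.Variable(d), cp.Variable(n)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta - w)
                       + lam * cp.norm(w, 1))).solve()
print("true outliers:     ", np.sort(outlier_idx))
print("detected outliers: ", np.flatnonzero(np.abs(w.value) > 1.0))
```

The nonzero entries of w flag the suspected outliers; minimizing over w in closed form shows this penalized formulation is closely related to regression with a Huber-type loss.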
Robust principal component analysis∗

Data matrix M ∈ R^{n×n}. M has an approximate low-rank decomposition: M = UV^T + W, U, V ∈ R^{n×k}, k ≪ n, and the noise matrix W is sparse.

Classical PCA problem:

minimize_{L ∈ R^{n×n}}  ‖M − L‖
s.t.                    rank(L) ≤ k

Robust PCA problem:

minimize_{L, W ∈ R^{n×n}}  ‖M − L − W‖
s.t.                       rank(L) ≤ k,  card(W) ≤ k′.

25 / 26
Robust principal component analysis∗

Robust PCA problem:

minimize_{L, W ∈ R^{n×n}}  ‖M − L − W‖
s.t.                       rank(L) ≤ k,  card(W) ≤ k′.

Matrix L is low-rank ⇐⇒ card(σ(L)) is small, where σ(L) is the vector of the n singular values of L.

Nuclear norm: ‖L‖_* := ‖σ(L)‖_1.

Practical formulation of robust PCA:

minimize_{L, W ∈ R^{n×n}}  ‖L‖_* + λ‖W‖_1
s.t.                       L + W = M.

26 / 26
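A small CVXPY sketch of the practical robust PCA formulation above; the low-rank-plus-sparse test matrix is synthetic, and λ = 1/√n is a standard choice in the robust PCA literature (both are assumptions here).

```python
import numpy as np
import cvxpy as cp

np.random.seed(8)
n, r = 30, 3
U, V = np.random.randn(n, r), np.random.randn(n, r)
W_true = np.zeros((n, n))
mask = np.random.rand(n, n) < 0.05             # ~5% sparse corruptions
W_true[mask] = 5.0 * np.random.randn(int(mask.sum()))
M = U @ V.T + W_true                           # low rank + sparse

lam = 1.0 / np.sqrt(n)
L, W = cp.Variable((n, n)), cp.Variable((n, n))
cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(W))),
           [L + W == M]).solve()
print("rank of recovered L:", np.linalg.matrix_rank(L.value, tol=1e-3))
print("card of recovered W:", int(np.sum(np.abs(W.value) > 1e-3)))
```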
