The document discusses the singular value decomposition (SVD) and some of its applications. It defines the SVD of a matrix A as A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values, shows how the SVD gives the pseudo-inverse, least-squares and least-norm solutions, and a geometric interpretation of linear transformations, and relates the condition number to the sensitivity of solutions to errors in the data. It then covers observability and state estimation for linear dynamical systems (observability matrix, noiseless and least-squares observers, uncertainty ellipsoids), least-squares system identification, and estimation of Gaussian random vectors (confidence ellipsoids, MMSE, and MMSE with linear measurements).


General pseudo-inverse

if A has SVD A = UΣV^T, then

    A† = VΣ^{-1}U^T

is the pseudo-inverse or Moore-Penrose inverse of A

- if A is skinny and full rank, A† = (A^TA)^{-1}A^T gives the least-squares solution x_ls = A†y

- if A is fat and full rank, A† = A^T(AA^T)^{-1} gives the least-norm solution x_ln = A†y
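As a quick numerical sanity check (a sketch in NumPy; the matrix sizes and random data are illustrative assumptions, not from the notes), the SVD-based formula agrees with the normal-equations formula and with a standard least-squares solver:

import numpy as np

# Pseudo-inverse via the SVD: A† = V Σ^{-1} U^T (skinny, full-rank case)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))          # skinny, full rank with probability 1
y = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # compact SVD: A = U @ diag(s) @ Vt
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # V Σ^{-1} U^T

# agrees with NumPy's built-in pseudo-inverse and the normal-equations formula
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A_pinv, np.linalg.solve(A.T @ A, A.T))   # (A^T A)^{-1} A^T

# least-squares solution x_ls = A† y matches np.linalg.lstsq
x_ls = A_pinv @ y
assert np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0])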
Full SVD
SVD of A ∈ R^{m×n} with Rank(A) = r:

    A = U_1 Σ_1 V_1^T = [ u_1 ... u_r ] [ σ_1          ] [ v_1^T ]
                                        [      ...     ] [  ...  ]
                                        [          σ_r ] [ v_r^T ]

- find U_2 ∈ R^{m×(m-r)}, V_2 ∈ R^{n×(n-r)} s.t. U = [U_1 U_2] ∈ R^{m×m} and V = [V_1 V_2] ∈ R^{n×n} are orthogonal

- add zero rows/cols to Σ_1 to form Σ ∈ R^{m×n}:

    Σ = [ Σ_1          0_{r×(n-r)}     ]
        [ 0_{(m-r)×r}  0_{(m-r)×(n-r)} ]


then we have

    A = U_1 Σ_1 V_1^T = [ U_1  U_2 ] [ Σ_1          0_{r×(n-r)}     ] [ V_1^T ]
                                     [ 0_{(m-r)×r}  0_{(m-r)×(n-r)} ] [ V_2^T ]

i.e.:

    A = UΣV^T

called full SVD of A

(SVD with positive singular values only called compact SVD)
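A minimal NumPy sketch of the two forms (sizes and the rank-deficient A are illustrative choices): the full_matrices flag switches between the full and compact SVD.

import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # Rank(A) = r

# full SVD: U is m x m, Vt is n x n, and Sigma must be zero-padded to m x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

# compact SVD: keep only the r positive singular values (U_1, Sigma_1, V_1)
U1, s1, V1t = U[:, :r], s[:r], Vt[:r, :]
assert np.allclose(A, U1 @ np.diag(s1) @ V1t)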
Image of unit ball under linear transformation
full SVD:

    A = UΣV^T

gives interpretation of y = Ax:

- rotate (by V^T)
- stretch along axes by σ_i (σ_i = 0 for i > r)
- zero-pad (if m > n) or truncate (if m < n) to get m-vector
- rotate (by U)

Image of unit ball under A

[figure: the unit ball is rotated by V^T, stretched by Σ = diag(2, 0.5), and rotated by U; the resulting ellipse has semiaxes 2u_1 and 0.5u_2]

{Ax | ||x|| ≤ 1} is ellipsoid with principal axes σ_i u_i.
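A short numerical sketch of this picture (the 2×2 matrix A is an arbitrary example): push points of the unit circle through A and check that the longest and shortest image vectors have lengths σ_1 and σ_2, with the longest aligned with u_1.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points with ||x|| = 1
image = A @ circle                                   # points Ax on the ellipse boundary

lengths = np.linalg.norm(image, axis=0)
assert np.isclose(lengths.max(), s[0], atol=1e-3)    # longest semiaxis = sigma_1
assert np.isclose(lengths.min(), s[1], atol=1e-3)    # shortest semiaxis = sigma_2

# the longest image point is (up to sign) aligned with u_1
x_max = image[:, lengths.argmax()]
assert np.isclose(abs(x_max @ U[:, 0]), s[0], atol=1e-3)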
Sensitivity of linear equations to data error
consider y = Ax, A ∈ R^{n×n} invertible; of course x = A^{-1}y

suppose we have an error or noise in y, i.e., y becomes y + δy

then x becomes x + δx with δx = A^{-1}δy

hence we have ||δx|| = ||A^{-1}δy|| ≤ ||A^{-1}|| ||δy||

if ||A^{-1}|| is large,

- small errors in y can lead to large errors in x
- can't solve for x given y (with small errors)
- hence, A can be considered singular in practice
a more refined analysis uses relative instead of absolute errors in x and y

since y = Ax, we also have ||y|| ≤ ||A|| ||x||, hence

    ||δx|| / ||x||  ≤  ||A|| ||A^{-1}||  ||δy|| / ||y||

κ(A) = ||A|| ||A^{-1}|| = σ_max(A)/σ_min(A) is called the condition number of A

we have:

    relative error in solution x  ≤  condition number × relative error in data y

or, in terms of # bits of guaranteed accuracy:

    # bits accuracy in solution  ≈  # bits accuracy in data − log_2 κ

we say

- A is well conditioned if κ is small
- A is poorly conditioned if κ is large

(definition of small and large depend on application)

same analysis holds for least-squares solutions with A nonsquare, κ = σ_max(A)/σ_min(A)
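A small demonstration sketch (the nearly singular matrix and perturbation size are assumed for illustration): an ill-conditioned A amplifies relative data error by up to κ(A), and the bound above always holds.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])           # nearly singular
kappa = np.linalg.cond(A)               # = sigma_max / sigma_min, about 4e4 here
print(f"condition number: {kappa:.1e}")

x_true = np.array([1.0, 1.0])
y = A @ x_true

# perturb the data slightly and re-solve
rng = np.random.default_rng(3)
dy = 1e-6 * rng.standard_normal(2)
x_pert = np.linalg.solve(A, y + dy)

rel_err_y = np.linalg.norm(dy) / np.linalg.norm(y)
rel_err_x = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)
print(rel_err_x <= kappa * rel_err_y)   # bound from the notes: always True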
State estimation set up
we consider the discrete-time system
    x(t + 1) = Ax(t) + Bu(t) + w(t),    y(t) = Cx(t) + Du(t) + v(t)

- w is state disturbance or noise
- v is sensor noise or error
- A, B, C, and D are known
- u and y are observed over time interval [0, t - 1]
- w and v are not known, but can be described statistically, or assumed small (e.g., in RMS value)
State estimation problem
state estimation problem: estimate x(s) from

    u(0), ..., u(t - 1),  y(0), ..., y(t - 1)

- s = 0: estimate initial state
- s = t - 1: estimate current state
- s = t: estimate (i.e., predict) next state

an algorithm or system that yields an estimate x̂(s) is called an observer or state estimator

x̂(s) is denoted x̂(s|t - 1) to show what information the estimate is based on (read: "x̂(s) given t - 1")
Noiseless case
let's look at finding x(0), with no state or measurement noise:

    x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

with x(t) ∈ R^n, u(t) ∈ R^m, y(t) ∈ R^p

then we have

    [ y(0); ...; y(t - 1) ] = O_t x(0) + T_t [ u(0); ...; u(t - 1) ]

(where [a; b; c] denotes vertical stacking)


where

    O_t = [ C        ]
          [ CA       ]
          [ ...      ]
          [ CA^{t-1} ]

    T_t = [ D           0           ...  0 ]
          [ CB          D           ...  0 ]
          [ ...                            ]
          [ CA^{t-2}B   CA^{t-3}B   ...  CB  D ]

- O_t maps initial state into resulting output over [0, t - 1]
- T_t maps input to output over [0, t - 1]

hence we have

    O_t x(0) = [ y(0); ...; y(t - 1) ] - T_t [ u(0); ...; u(t - 1) ]

RHS is known, x(0) is to be determined
hence:

- can uniquely determine x(0) if and only if N(O_t) = {0}
- N(O_t) gives ambiguity in determining x(0)
- if x(0) ∈ N(O_t) and u = 0, output is zero over interval [0, t - 1]
- input u does not affect ability to determine x(0); its effect can be subtracted out
Observability matrix
by the C-H (Cayley-Hamilton) theorem, each A^k is a linear combination of A^0, ..., A^{n-1}

hence for t ≥ n, N(O_t) = N(O) where

    O = O_n = [ C        ]
              [ CA       ]
              [ ...      ]
              [ CA^{n-1} ]

is called the observability matrix

- if x(0) can be deduced from u and y over [0, t - 1] for any t, then x(0) can be deduced from u and y over [0, n - 1]
- N(O) is called the unobservable subspace; describes ambiguity in determining state from input and output
- system is called observable if N(O) = {0}, i.e., Rank(O) = n
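A minimal sketch of the rank test (the A and C below are arbitrary illustrative values): build O = [C; CA; ...; CA^(n-1)] and check Rank(O) = n.

import numpy as np

def observability_matrix(A, C):
    n = A.shape[0]
    blocks, M = [], C
    for _ in range(n):
        blocks.append(M)      # C, CA, CA^2, ...
        M = M @ A
    return np.vstack(blocks)  # (n*p) x n

A = np.array([[0.9, 1.0],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])    # measure only the first state

O = observability_matrix(A, C)
print(np.linalg.matrix_rank(O) == A.shape[0])   # True: the pair (C, A) is observable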
Observers for noiseless case
suppose Rank(O_t) = n (i.e., system is observable) and let F be any left inverse of O_t, i.e., FO_t = I

then we have the observer

    x(0) = F ( [ y(0); ...; y(t - 1) ] - T_t [ u(0); ...; u(t - 1) ] )

which deduces x(0) (exactly) from u, y over [0, t - 1]

in fact we have

    x̂(τ - t + 1) = F ( [ y(τ - t + 1); ...; y(τ) ] - T_t [ u(τ - t + 1); ...; u(τ) ] )

i.e., our observer estimates what the state was t - 1 epochs ago, given the past t - 1 inputs & outputs

observer is a (multi-input, multi-output) finite impulse response (FIR) filter, with inputs u and y, and output x̂
Invariance of unobservable set
fact: the unobservable subspace N(O) is invariant, i.e., if z ∈ N(O), then Az ∈ N(O)

proof: suppose z ∈ N(O), i.e., CA^k z = 0 for k = 0, ..., n - 1

evidently CA^k (Az) = 0 for k = 0, ..., n - 2;

    CA^{n-1}(Az) = CA^n z = - Σ_{i=0}^{n-1} α_i CA^i z = 0

(by C-H) where

    det(sI - A) = s^n + α_{n-1} s^{n-1} + ··· + α_0
Continuous-time observability
continuous-time system with no sensor or state noise:

    ẋ = Ax + Bu,    y = Cx + Du

can we deduce state x from u and y?

let's look at derivatives of y:

    y = Cx + Du
    ẏ = Cẋ + Du̇ = CAx + CBu + Du̇
    ÿ = CA²x + CABu + CBu̇ + Dü

and so on

hence we have

    [ y; ẏ; ...; y^{(n-1)} ] = Ox + T [ u; u̇; ...; u^{(n-1)} ]

where O is the observability matrix and

    T = [ D           0           ...  0 ]
        [ CB          D           ...  0 ]
        [ ...                            ]
        [ CA^{n-2}B   CA^{n-3}B   ...  CB  D ]

(same matrices we encountered in discrete-time case!)
rewrite as

    Ox = [ y; ẏ; ...; y^{(n-1)} ] - T [ u; u̇; ...; u^{(n-1)} ]

RHS is known; x is to be determined

hence if N(O) = {0} we can deduce x(t) from derivatives of u(t), y(t) up to order n - 1

in this case we say system is observable

can construct an observer using any left inverse F of O:

    x = F ( [ y; ẏ; ...; y^{(n-1)} ] - T [ u; u̇; ...; u^{(n-1)} ] )

reconstructs x(t) (exactly and instantaneously) from u(t), ..., u^{(n-1)}(t), y(t), ..., y^{(n-1)}(t)

derivative-based state reconstruction is dual of state transfer using impulsive inputs
A converse
suppose z ∈ N(O) (the unobservable subspace), and u is any input, with x, y the corresponding state and output, i.e.,

    ẋ = Ax + Bu,    y = Cx + Du

then state trajectory x̃ = x + e^{tA} z satisfies

    (d/dt) x̃ = Ax̃ + Bu,    y = Cx̃ + Du

i.e., input/output signals u, y consistent with both state trajectories x, x̃

hence if system is unobservable, no signal processing of any kind applied to u and y can deduce x

unobservable subspace N(O) gives fundamental ambiguity in deducing x from u, y
Least-squares observers
discrete-time system, with sensor noise:

    x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t) + v(t)

we assume Rank(O_t) = n (hence, system is observable)

least-squares observer uses pseudo-inverse:

    x̂(0) = O_t† ( [ y(0); ...; y(t - 1) ] - T_t [ u(0); ...; u(t - 1) ] )

where O_t† = (O_t^T O_t)^{-1} O_t^T

interpretation: x̂_ls(0) minimizes discrepancy between

- output ŷ that would be observed, with input u and initial state x̂(0) (and no sensor noise), and
- output y that was observed,

measured as

    Σ_{τ=0}^{t-1} ||ŷ(τ) - y(τ)||²

can express least-squares initial state estimate as

    x̂_ls(0) = ( Σ_{τ=0}^{t-1} (A^T)^τ C^T C A^τ )^{-1}  Σ_{τ=0}^{t-1} (A^T)^τ C^T ỹ(τ)

where ỹ is the observed output with the portion due to the input subtracted:

    ỹ = y - h * u,  where h is the impulse response
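A small simulation sketch (system matrices, horizon, and noise level are illustrative assumptions): build O_t and T_t, generate noisy outputs, and recover x(0) with the pseudo-inverse of O_t.

import numpy as np

rng = np.random.default_rng(4)
n, m, p, t = 2, 1, 1, 30
A = np.array([[0.95, 0.3], [-0.2, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((p, m))

# block matrices O_t and T_t from the notes
O_t = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])
T_t = np.zeros((p * t, m * t))
for i in range(t):                        # output block i
    T_t[i*p:(i+1)*p, i*m:(i+1)*m] = D
    for j in range(i):                    # input block j < i: C A^(i-j-1) B
        T_t[i*p:(i+1)*p, j*m:(j+1)*m] = C @ np.linalg.matrix_power(A, i - j - 1) @ B

# simulate with sensor noise v
x0 = np.array([1.0, -2.0])
u = rng.standard_normal((t, m))
x, Y = x0.copy(), []
for k in range(t):
    Y.append(C @ x + D @ u[k] + 0.05 * rng.standard_normal(p))
    x = A @ x + B @ u[k]
Y = np.concatenate(Y)

# least-squares observer: x_ls(0) = pinv(O_t) (Y - T_t U)
x0_hat = np.linalg.pinv(O_t) @ (Y - T_t @ u.reshape(-1))
print(x0, x0_hat)      # estimate is close to the true initial state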
Least-squares observer uncertainty ellipsoid
since O_t† O_t = I, we have

    x̃(0) = x̂_ls(0) - x(0) = O_t† [ v(0); ...; v(t - 1) ]

where x̃(0) is the estimation error of the initial state

- in particular, x̂_ls(0) = x(0) if sensor noise is zero (i.e., observer recovers exact state in noiseless case)

now assume sensor noise is unknown, but has RMS value ≤ α,

    (1/t) Σ_{τ=0}^{t-1} ||v(τ)||² ≤ α²

set of possible estimation errors is the ellipsoid

    x̃(0) ∈ E_unc = { O_t† [ v(0); ...; v(t - 1) ]  |  (1/t) Σ_{τ=0}^{t-1} ||v(τ)||² ≤ α² }

- E_unc is the uncertainty ellipsoid for x(0) (least-squares gives best, i.e., smallest, E_unc)
- shape of uncertainty ellipsoid determined by matrix

    ( O_t^T O_t )^{-1} = ( Σ_{τ=0}^{t-1} (A^T)^τ C^T C A^τ )^{-1}

maximum norm of error is

    ||x̂_ls(0) - x(0)|| ≤ α √t ||O_t†||


Infinite horizon uncertainty ellipsoid

the matrix

    P = lim_{t→∞} ( Σ_{τ=0}^{t-1} (A^T)^τ C^T C A^τ )^{-1}

always exists, and gives the limiting uncertainty in estimating x(0) from u, y over longer and longer periods:

- if A is stable, P > 0, i.e., can't estimate initial state perfectly even with infinite number of measurements u(t), y(t), t = 0, ... (since memory of x(0) fades ...)
- if A is not stable, then P can have nonzero nullspace, i.e., initial state estimation error gets arbitrarily small (at least in some directions) as more and more of signals u and y are observed
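A sketch of the limiting computation for a stable A (the example matrices are assumptions): sum W_t = Σ (A^T)^τ C^T C A^τ until it converges, then P = W^{-1}.

import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.7]])   # stable: eigenvalues inside the unit circle
C = np.array([[1.0, 0.0]])               # (C, A) observable

W = np.zeros((2, 2))
M = np.eye(2)                            # holds A^tau
for _ in range(2000):                    # series converges geometrically for stable A
    W += M.T @ C.T @ C @ M
    M = M @ A

P = np.linalg.inv(W)                     # limiting uncertainty ellipsoid matrix
print(np.linalg.eigvalsh(P))             # strictly positive: x(0) is never known exactly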
Continuous-time least-squares state estimation
assume ẋ = Ax + Bu, y = Cx + Du + v is observable

least-squares estimate of initial state x(0), given u(τ), y(τ), 0 ≤ τ ≤ t:

choose x̂_ls(0) to minimize the integral square residual

    J = ∫_0^t || ỹ(τ) - C e^{Aτ} x(0) ||² dτ

where ỹ = y - h * u is the observed output minus the part due to the input

let's expand as J = x(0)^T Q x(0) - 2 r^T x(0) + s, with

    Q = ∫_0^t e^{A^T τ} C^T C e^{Aτ} dτ,    r = ∫_0^t e^{A^T τ} C^T ỹ(τ) dτ,    s = ∫_0^t ỹ(τ)^T ỹ(τ) dτ
setting ∇_{x(0)} J to zero, we obtain the least-squares observer

    x̂_ls(0) = Q^{-1} r = ( ∫_0^t e^{A^T τ} C^T C e^{Aτ} dτ )^{-1}  ∫_0^t e^{A^T τ} C^T ỹ(τ) dτ

estimation error is

    x̃(0) = x̂_ls(0) - x(0) = ( ∫_0^t e^{A^T τ} C^T C e^{Aτ} dτ )^{-1}  ∫_0^t e^{A^T τ} C^T v(τ) dτ

therefore if v = 0 then x̂_ls(0) = x(0)
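A numerical sketch of this observer (system, horizon, noise level, and the Riemann-sum discretization are assumptions; u is taken to be zero so ỹ = y; scipy.linalg.expm supplies the matrix exponential):

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
A = np.array([[0.0, 1.0], [-1.0, -0.4]])
C = np.array([[1.0, 0.0]])
x0 = np.array([1.0, 0.5])

T, dt = 5.0, 1e-2
taus = np.arange(0.0, T, dt)

Q = np.zeros((2, 2))
r = np.zeros(2)
for tau in taus:
    E = expm(A * tau)                                        # e^{A tau}
    y_tilde = C @ E @ x0 + 0.01 * rng.standard_normal(1)     # C e^{A tau} x(0) + v(tau), u = 0
    Q += E.T @ C.T @ C @ E * dt                              # Riemann sum for Q
    r += (E.T @ C.T @ y_tilde).ravel() * dt                  # Riemann sum for r

x0_hat = np.linalg.solve(Q, r)
print(x0, x0_hat)                                            # close to the true initial state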

Given examples we want to model the relation between x


i

and y
i
as y
i
! Ax
i
. Define the estimation problem as:

we differentiate w.r.t. A and set to 0


13
System Identication*
*partial, discreet time, as LTI
y Ax A {x
i
, y
i
}
N
i=1
arg min
A
N

i=1
||y
i
Ax
i
||
2
=
N

i=1
x

i
A

Ax
i
2y

i
Ax
i
+y

i
y
i
A

A
N

i=1
||y
i
Ax
i
||
2
= 0
N

i=1
2Ax
i
x

i
2y
i
x

i
= 0
A
N

i=1
x
i
x

i
=
N

i=1
y
i
x

i
N > n

N
i=1
x
i
x

A =

i=1
y
i
x

i=1
x
i
x

1
y
t
= x
t
x
t+1
= Ax
t
+ A
!10 !8 !6 !4 !2 0 2 4 6 8 10
!10
!8
!6
!4
!2
0
2
4
6
8
Phase plane. x
t+1
=Ax
t
+w
0 10 20 30 40 50 60 70 80 90 100
0
1
2
3
4
5
6
7
8
||A!A||
2
time
s
q
u
a
r
e
d

e
r
r
o
r
||A

A||
2
y Ax A {x
i
, y
i
}
N
i=1
arg min
A
N

i=1
||y
i
Ax
i
||
2
=
N

i=1
x

i
A

Ax
i
2y

i
Ax
i
+y

i
y
i
A

A
N

i=1
||y
i
Ax
i
||
2
= 0
N

i=1
2Ax
i
x

i
2y
i
x

i
= 0
A
N

i=1
x
i
x

i
=
N

i=1
y
i
x

i
N > n

N
i=1
x
i
x

A =

i=1
y
i
x

i=1
x
i
x

1
y
t
= x
t
x
t+1
= Ax
t
+ A
!10 !8 !6 !4 !2 0 2 4 6 8 10
!10
!8
!6
!4
!2
0
2
4
6
8
Phase plane. x
t+1
=Ax
t
+w
0 10 20 30 40 50 60 70 80 90 100
0
1
2
3
4
5
6
7
8
||A!A||
2
time
s
q
u
a
r
e
d

e
r
r
o
r
||A

A||
2
y Ax A {x
i
, y
i
}
N
i=1
arg min
A
N

i=1
||y
i
Ax
i
||
2
=
N

i=1
x

i
A

Ax
i
2y

i
Ax
i
+y

i
y
i
A

A
N

i=1
||y
i
Ax
i
||
2
= 0
N

i=1
2Ax
i
x

i
2y
i
x

i
= 0
A
N

i=1
x
i
x

i
=
N

i=1
y
i
x

i
N > n

N
i=1
x
i
x

A =

i=1
y
i
x

i=1
x
i
x

1
y
t
= x
t
x
t+1
= Ax
t
+ A
!10 !8 !6 !4 !2 0 2 4 6 8 10
!10
!8
!6
!4
!2
0
2
4
6
8
Phase plane. x
t+1
=Ax
t
+w
0 10 20 30 40 50 60 70 80 90 100
0
1
2
3
4
5
6
7
8
||A!A||
2
time
s
q
u
a
r
e
d

e
r
r
o
r
||A

A||
2

If Σ_{i=1}^N x_i x_i^T is full rank (requires N > n) then

    Â = ( Σ_{i=1}^N y_i x_i^T ) ( Σ_{i=1}^N x_i x_i^T )^{-1}

E.g. we'd like to estimate A in the system x_{t+1} = A x_t + ω (ω is noise). To solve, simply replace y_i with x_{i+1} in the above solution, i.e., take y_t = x_{t+1}.

Note that this would also be the most likely A if ω were Gaussian noise with zero mean and unit variance.
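A short simulation sketch of this estimator (dimensions, the true A, and the noise are illustrative assumptions): generate a trajectory of x_{t+1} = A x_t + ω and form Â from the pair sums above.

import numpy as np

rng = np.random.default_rng(6)
n, N = 2, 200
A_true = np.array([[0.95, 0.10],
                   [-0.10, 0.90]])

# simulate x_{t+1} = A x_t + w, with w ~ N(0, I)
X = np.zeros((N + 1, n))
X[0] = rng.standard_normal(n)
for t in range(N):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(n)

xs, ys = X[:-1], X[1:]                 # pairs (x_t, y_t = x_{t+1})
Sxx = xs.T @ xs                        # sum_i x_i x_i^T
Syx = ys.T @ xs                        # sum_i y_i x_i^T
A_hat = Syx @ np.linalg.inv(Sxx)

print(np.sum((A_true - A_hat) ** 2))   # squared Frobenius error, shrinks as N grows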

[Figure] Left: phase plane, values of x_t for x_{t+1} = A x_t + ω with ω ~ N(0, I).
Right: squared error between the true and estimated A as a function of step number; the error Σ_{i,j} (A_ij - Â_ij)² = ||A - Â||_F² is the squared Frobenius norm.
EE363 Winter 2005-06

Lecture 6
Estimation

- Gaussian random vectors
- minimum mean-square estimation (MMSE)
- MMSE with linear measurements
- relation to least-squares, pseudo-inverse
Gaussian random vectors
random vector x ∈ R^n is Gaussian if it has density

    p_x(v) = (2π)^{-n/2} (det Σ)^{-1/2} exp( -(1/2)(v - x̄)^T Σ^{-1} (v - x̄) ),

for some Σ = Σ^T > 0, x̄ ∈ R^n

denoted x ~ N(x̄, Σ)

- x̄ ∈ R^n is the mean or expected value of x, i.e.,

    x̄ = E x = ∫ v p_x(v) dv

- Σ = Σ^T > 0 is the covariance matrix of x, i.e.,

    Σ = E (x - x̄)(x - x̄)^T = E xx^T - x̄ x̄^T = ∫ (v - x̄)(v - x̄)^T p_x(v) dv

density for x ~ N(0, 1):

[figure: the standard normal density p_x(v) = (1/√(2π)) e^{-v²/2}]
mean and variance of scalar random variable x_i are

    E x_i = x̄_i,    E (x_i - x̄_i)² = Σ_ii

hence standard deviation of x_i is √Σ_ii

covariance between x_i and x_j is E (x_i - x̄_i)(x_j - x̄_j) = Σ_ij

correlation coefficient between x_i and x_j is ρ_ij = Σ_ij / ( √Σ_ii √Σ_jj )

mean (norm) square deviation of x from x̄ is

    E ||x - x̄||² = E Tr (x - x̄)(x - x̄)^T = Tr Σ = Σ_{i=1}^n Σ_ii

(using Tr AB = Tr BA)

example: x ~ N(0, I) means x_i are independent identically distributed (IID) N(0, 1) random variables
Confidence ellipsoids

p_x(v) is constant for (v - x̄)^T Σ^{-1} (v - x̄) = α, i.e., on the surface of the ellipsoid

    E_α = { v | (v - x̄)^T Σ^{-1} (v - x̄) ≤ α }

thus x̄ and Σ determine shape of density

can interpret E_α as confidence ellipsoid for x:

the nonnegative random variable (x - x̄)^T Σ^{-1} (x - x̄) has a χ²_n distribution, so Prob(x ∈ E_α) = F_{χ²_n}(α), where F_{χ²_n} is the CDF

some good approximations:

- E_n gives about 50% probability
- E_{n+2√n} gives about 90% probability
geometrically:

- mean x̄ gives center of ellipsoid
- semiaxes are √(αλ_i) u_i, where u_i are (orthonormal) eigenvectors of Σ with eigenvalues λ_i

example: x ~ N(x̄, Σ) with

    x̄ = [ 2 ],    Σ = [ 2  1 ]
         [ 1 ]         [ 1  1 ]

- x_1 has mean 2, std. dev. √2
- x_2 has mean 1, std. dev. 1
- correlation coefficient between x_1 and x_2 is ρ = 1/√2
- E ||x - x̄||² = 3

90% confidence ellipsoid corresponds to α = 4.6:

[figure: scatter of 100 samples in the (x_1, x_2) plane with the ellipse E_4.6; here, 91 out of 100 fall in E_4.6]
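A Monte Carlo sketch of the coverage claim (sample count and seed are arbitrary): draw samples from the example N(x̄, Σ) above and count how many land in E_α for α = 4.6 and for the n + 2√n rule of thumb.

import numpy as np

rng = np.random.default_rng(7)
x_bar = np.array([2.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
n = 2

# draw samples via x = Sigma^{1/2} w + x_bar (a Cholesky factor works as a square root)
L = np.linalg.cholesky(Sigma)
samples = x_bar + rng.standard_normal((100_000, n)) @ L.T

# quadratic form (x - x_bar)^T Sigma^{-1} (x - x_bar) for each sample
d = samples - x_bar
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)

for alpha in (4.6, n + 2 * np.sqrt(n)):          # the example's 90% level and the rule of thumb
    print(alpha, np.mean(q <= alpha))            # both come out near 0.90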
Affine transformation

suppose x ~ N(x̄, Σ_x)

consider the affine transformation of x:

    z = Ax + b,

where A ∈ R^{m×n}, b ∈ R^m

then z is Gaussian, with mean

    E z = E (Ax + b) = A E x + b = A x̄ + b

and covariance

    Σ_z = E (z - z̄)(z - z̄)^T = E A(x - x̄)(x - x̄)^T A^T = A Σ_x A^T

examples:

- if w ~ N(0, I) then x = Σ^{1/2} w + x̄ is N(x̄, Σ)
  (useful for simulating vectors with given mean and covariance)
- conversely, if x ~ N(x̄, Σ) then z = Σ^{-1/2} (x - x̄) is N(0, I)
  (normalizes & decorrelates)
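A sampling sketch of the transformation rule (all numbers below are arbitrary illustrative choices; tolerances are loose to absorb Monte Carlo error): check empirically that z = Ax + b has mean A x̄ + b and covariance A Σ A^T.

import numpy as np

rng = np.random.default_rng(8)
x_bar = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
A = rng.standard_normal((2, 3))
b = np.array([0.1, -0.2])

L = np.linalg.cholesky(Sigma)                        # one valid square root Sigma^{1/2}
x = x_bar + rng.standard_normal((200_000, 3)) @ L.T  # x ~ N(x_bar, Sigma)
z = x @ A.T + b                                      # z = A x + b, row-wise

print(np.allclose(z.mean(axis=0), A @ x_bar + b, atol=5e-2))  # mean ~ A x_bar + b
print(np.allclose(np.cov(z.T), A @ Sigma @ A.T, atol=5e-2))   # covariance ~ A Sigma A^T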
suppose x ~ N(x̄, Σ) and c ∈ R^n

scalar c^T x has mean c^T x̄ and variance c^T Σ c

thus the (unit length) direction of minimum variability for x is u_n, where

    Σ u_n = λ_min u_n,    ||u_n|| = 1

standard deviation of u_n^T x is √λ_min

(similarly for maximum variability)
Degenerate Gaussian vectors
it is convenient to allow Σ to be singular (but still Σ = Σ^T ≥ 0)

(in this case the density formula obviously does not hold)

meaning: in some directions x is not random at all

write Σ as

    Σ = [ Q_+  Q_0 ] [ Σ_+  0 ] [ Q_+  Q_0 ]^T
                     [ 0    0 ]

where Q = [ Q_+  Q_0 ] is orthogonal, Σ_+ > 0

- columns of Q_0 are orthonormal basis for N(Σ)
- columns of Q_+ are orthonormal basis for range(Σ)
then Q^T x = [ z^T  w^T ]^T, where

- z ~ N(Q_+^T x̄, Σ_+) is (nondegenerate) Gaussian (hence, density formula holds)
- w = Q_0^T x = Q_0^T x̄ is not random

(Q_0^T x is called the deterministic component of x)
Linear measurements
linear measurements with noise:

    y = Ax + v

- x ∈ R^n is what we want to measure or estimate
- y ∈ R^m is measurement
- A ∈ R^{m×n} characterizes sensors or measurements
- v is sensor noise

common assumptions:

- x ~ N(x̄, Σ_x)
- v ~ N(v̄, Σ_v)
- x and v are independent

N(x̄, Σ_x) is the prior distribution of x (describes initial uncertainty about x)

v̄ is noise bias or offset (and is usually 0)

Σ_v is noise covariance
thus

    [ x ]  ~  N ( [ x̄ ],  [ Σ_x  0   ] )
    [ v ]         [ v̄ ]   [ 0    Σ_v ]

using

    [ x ]  =  [ I  0 ] [ x ]
    [ y ]     [ A  I ] [ v ]

we can write

    E [ x ]  =  [ x̄        ]
      [ y ]     [ A x̄ + v̄ ]

and

    E [ x - x̄ ] [ x - x̄ ]^T  =  [ I  0 ] [ Σ_x  0   ] [ I  0 ]^T  =  [ Σ_x     Σ_x A^T         ]
      [ y - ȳ ] [ y - ȳ ]       [ A  I ] [ 0    Σ_v ] [ A  I ]       [ A Σ_x   A Σ_x A^T + Σ_v ]

- covariance of measurement y is A Σ_x A^T + Σ_v
- A Σ_x A^T is signal covariance
- Σ_v is noise covariance
Minimum mean-square estimation
suppose x ∈ R^n and y ∈ R^m are random vectors (not necessarily Gaussian)

we seek to estimate x given y

thus we seek a function φ : R^m → R^n such that x̂ = φ(y) is near x

one common measure of nearness: mean-square error,

    E ||φ(y) - x||²

minimum mean-square estimator (MMSE) φ_mmse minimizes this quantity

general solution: φ_mmse(y) = E(x|y), i.e., the conditional expectation of x given y
MMSE for Gaussian vectors
now suppose x ∈ R^n and y ∈ R^m are jointly Gaussian:

    [ x ]  ~  N ( [ x̄ ],  [ Σ_x     Σ_xy ] )
    [ y ]         [ ȳ ]   [ Σ_xy^T  Σ_y  ]

(after a lot of algebra) the conditional density is

    p_{x|y}(v|y) = (2π)^{-n/2} (det Λ)^{-1/2} exp( -(1/2)(v - w)^T Λ^{-1} (v - w) ),

where

    Λ = Σ_x - Σ_xy Σ_y^{-1} Σ_xy^T,    w = x̄ + Σ_xy Σ_y^{-1} (y - ȳ)

hence MMSE estimator (i.e., conditional expectation) is

    x̂ = φ_mmse(y) = E(x|y) = x̄ + Σ_xy Σ_y^{-1} (y - ȳ)

- φ_mmse is an affine function

- MMSE estimation error, x̂ - x, is a Gaussian random vector:

    x̂ - x ~ N( 0, Σ_x - Σ_xy Σ_y^{-1} Σ_xy^T )

- note that

    Σ_x - Σ_xy Σ_y^{-1} Σ_xy^T ≤ Σ_x

  i.e., covariance of estimation error is always less than prior covariance of x
Best linear unbiased estimator
the estimator

    x̂ = φ_blu(y) = x̄ + Σ_xy Σ_y^{-1} (y - ȳ)

makes sense when x, y aren't jointly Gaussian

this estimator

- is unbiased, i.e., E x̂ = E x
- often works well
- is widely used
- has minimum mean square error among all affine estimators

sometimes called the best linear unbiased estimator
MMSE with linear measurements
consider the specific case

    y = Ax + v,    x ~ N(x̄, Σ_x),    v ~ N(v̄, Σ_v),

with x, v independent

MMSE estimate of x given y is the affine function

    x̂ = x̄ + B(y - ȳ),    where B = Σ_x A^T (A Σ_x A^T + Σ_v)^{-1},  ȳ = A x̄ + v̄

interpretation:

- x̄ is our best prior guess of x (before measurement)
- y - ȳ is the discrepancy between what we actually measure (y) and the expected value of what we measure (ȳ)
- estimator modifies prior guess by B times this discrepancy
- estimator blends prior information with measurement
- B gives gain from observed discrepancy to estimate
- B is small if noise term Σ_v in 'denominator' is large
MMSE error with linear measurements
MMSE estimation error, x̃ = x̂ - x, is Gaussian with zero mean and covariance

    Σ_est = Σ_x - Σ_x A^T (A Σ_x A^T + Σ_v)^{-1} A Σ_x

- Σ_est ≤ Σ_x, i.e., measurement always decreases uncertainty about x
- difference Σ_x - Σ_est gives value of measurement y in estimating x
- e.g., (Σ_est,ii / Σ_x,ii)^{1/2} gives fractional decrease in uncertainty of x_i due to measurement

note: error covariance Σ_est can be determined before measurement y is made!
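A sketch with assumed numbers: compute the gain B and the error covariance Σ_est from A, Σ_x, Σ_v alone, then confirm by simulation that the affine estimator's empirical error covariance matches Σ_est.

import numpy as np

rng = np.random.default_rng(9)
x_bar = np.array([1.0, 0.0])
Sigma_x = np.array([[1.0, 0.4],
                    [0.4, 2.0]])
A = np.array([[1.0, 1.0]])            # one scalar measurement of both states
v_bar = np.zeros(1)
Sigma_v = np.array([[0.5]])

# gain and error covariance need only A, Sigma_x, Sigma_v (no data required)
S = A @ Sigma_x @ A.T + Sigma_v
B = Sigma_x @ A.T @ np.linalg.inv(S)
Sigma_est = Sigma_x - B @ A @ Sigma_x
print(Sigma_est)

# Monte Carlo check of the error covariance
N = 200_000
x = x_bar + rng.standard_normal((N, 2)) @ np.linalg.cholesky(Sigma_x).T
v = v_bar + rng.standard_normal((N, 1)) @ np.linalg.cholesky(Sigma_v).T
y = x @ A.T + v
x_hat = x_bar + (y - (A @ x_bar + v_bar)) @ B.T
err = x_hat - x
print(np.allclose(np.cov(err.T), Sigma_est, atol=2e-2))   # True (up to sampling error)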
to evaluate Σ_est, we only need to know

- A (which characterizes sensors)
- prior covariance of x (i.e., Σ_x)
- noise covariance (i.e., Σ_v)

you do not need to know the measurement y (or the means x̄, v̄)

useful for experiment design or sensor selection
