Random Variables

1) A random variable is a function that assigns a real number to each outcome of a random experiment. The sample space is the domain of the random variable, and the set of all values it can take on is its range. 2) The cumulative distribution function (CDF) of a random variable X gives the probability that X is less than or equal to any value x. It ranges from 0 to 1 and is a nondecreasing function of x. 3) For an exponential random variable with parameter λ, the CDF is F(x) = 1 − e^{−λx} for x ≥ 0. The empirical distribution function estimates the true CDF based on a random sample.

Uploaded by

titser
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views17 pages

Random Variables

1) A random variable is a function that assigns a real number to each outcome of a random experiment. The sample space is the domain of the random variable, and the set of all values it can take on is its range. 2) The cumulative distribution function (CDF) of a random variable X gives the probability that X is less than or equal to any value x. It ranges from 0 to 1 and is a non-decreasing function of x. 3) For an exponential random variable with parameter λ, its CDF is 1 - e-λx for x ≥ 0. The empirical distribution function estimates the true CDF based on a random sample.

Uploaded by

titser
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Random Variables Overview
• Definitions
• Cumulative distribution function
• Probability density function
• Functions of random variables
• Expected values
• Mean & variance
• Markov & Chebyshev inequalities
• Independence & marginal distributions
• Bayes rule and conditional probability
• Mean square estimation
• Linear prediction

Definition: Random Variable
[Figure: a random variable as a mapping X(ζ) from each outcome ζ in the sample space S to a point on the real line; the range of X is a subset of the real line.]
• Definition: a random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment
• The sample space S is the domain of the random variable
• The set of all values that X can have is the range of the random variable
• This is a many-to-one mapping. That is, a set of points ζ1, ζ2, . . . may take on the same value of the random variable
• Will abbreviate as simply “RV”

Example 1: Random Variable Definitions
Suppose that a coin is tossed three times and the sequence of heads and tails is noted. What is the sample space for this experiment? Let X be the number of heads in three coin tosses. What is the range of X? List all of the points in the sample space (the domain of X) and the corresponding values of X.

Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X is defined as the probability of the event {X ≤ x}:

F(x) = P[X ≤ x]

• Sometimes it is just called the distribution function
• Here X is the random variable and x is a non-random variable
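The slides leave Example 1 as an exercise. A small MATLAB sketch (not from the original deck; the enumeration itself is standard) that lists the sample space and the corresponding value of X for each outcome:

% Enumerate the sample space of three coin tosses and report
% X = number of heads for each outcome.
outcomes = {'HHH','HHT','HTH','HTT','THH','THT','TTH','TTT'};
for i = 1:numel(outcomes)
    X = sum(outcomes{i} == 'H');              % count heads in this outcome
    fprintf('%s -> X = %d\n', outcomes{i}, X);
end
% The output shows that the range of X is {0, 1, 2, 3}.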
Properties of the CDF
1. 0 ≤ F(x) ≤ 1
2. lim_{x→+∞} F(x) = 1
3. lim_{x→−∞} F(x) = 0
4. F(x) is a nondecreasing function of x. Thus, if a < b, then F(a) ≤ F(b).
5. F(x) is continuous from the right. That is, for h > 0, F(b) = lim_{h→0} F(b + h) = F(b⁺)
6. P[a < X ≤ b] = F(b) − F(a)
7. P[X = b] = F(b) − F(b⁻)

Example 2: Distribution Functions
The arrival time of Joe’s email obeys the exponential probability law with parameter λ:

P[X > x] = 1 for x < 0,    P[X > x] = e^{−λx} for x ≥ 0.

Find the CDF of X for λ = 2 and plot F(x) versus x. Since F(x) = 1 − P[X > x], the CDF is F(x) = 0 for x < 0 and F(x) = 1 − e^{−λx} for x ≥ 0.

Example 2: Distribution Plot
[Figure: plot of the exponential CDF F(x) = 1 − e^{−2x} over 0 ≤ x ≤ 2, rising from 0 toward a dotted asymptote at F(x) = 1.]

Example 2: MATLAB Code

function [] = ExponentialCDF();
close all;
%FigureSet(1);
%figure(1);
FigureSet(1,'LTX');
lambda = 2;
x = 0:0.01:2;
y = 1-exp(-lambda*x);
h = plot(x,y,'b',[0 100],[1 1],'k:');
set(h,'LineWidth',1.5);
axis([0 max(x) 0 1.1]);
xlabel('x');
ylabel('F(x)');
title('Exponential Cumulative Distribution Function');
set(gca,'Box','Off');
AxisSet(8);
print -depsc ExponentialCDF;

Note that FigureSet and AxisSet appear to be course-provided plotting helpers rather than built-in MATLAB functions.
Empirical Distribution Function
Let X1, X2, . . . , Xn be a random sample. The empirical distribution function (EDF) is a function of x which equals the fraction of the Xi that are less than or equal to x, for each x, −∞ < x < ∞; equivalently, S(x) = (1/n) · #{i : Xi ≤ x}.
• The “true” CDF is never known
• All we have is data
• The EDF is a rough estimate of the CDF
• Piecewise-constant function (stairs)
• Assuming the sample consists of distinct values, each step has height 1/n
• Minimum value: 0; maximum value: 1
• Nondecreasing
• Is a random function

Example 3: Empirical Distribution Function Plot
[Figure: EDF of N = 25 exponential samples (stairs), plotted with the true CDF over 0 ≤ x ≤ 2.]

Example 3: MATLAB Code

function [] = ExponentialEDF();
close all;
FigureSet(1,'LTX');
lambda = 2;
N = 25;
R = exprnd(1/lambda,N,1);
x = 0:0.02:max(R);
F = 1-exp(-lambda*x);
h = cdfplot(R);
hold on;
plot(x,F,'r',[0 100],[1 1],'k:');
hold off;
grid off;
set(h,'LineWidth',1.0);
set(gca,'XLim',[0 max(R)]);
set(gca,'YLim',[0 1.1]);
xlabel('x');
ylabel('S(x)');
title(sprintf('Exponential Empirical Distribution Function N:%d',N));
box off;
AxisSet(8);
print -depsc ExponentialEDF;

Note: MATLAB parameterizes the exponential distribution as f(x) = (1/λ) e^{−x/λ} u(x) rather than f(x) = λ e^{−λx} u(x), which is why exprnd is called with 1/lambda.

Random Variable Types
Discrete: An RV whose CDF is a right-continuous, staircase (piecewise constant) function of x.
• Only takes on values from a finite or countably infinite set
• Encountered often in applications involving counting

Continuous: An RV whose CDF is continuous everywhere and can be written as an integral of some nonnegative function f(x):

F(x) = ∫_{−∞}^{x} f(u) du

• Implies P[X = x] = 0 everywhere
• In words, there is an infinitesimal probability that X will be equal to any specific number x
• Nonetheless, an experiment will cause X to equal some value

Mixed: An RV with a CDF that has jumps on a countable set of points, but also increases continuously over one or more intervals. In other words, everything else.
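The cdfplot and exprnd calls above require the Statistics Toolbox. A toolbox-free sketch (not from the original deck; the variable names are mine) that computes the EDF directly by sorting:

% Compute and plot the EDF without cdfplot or exprnd.
lambda = 2;
N = 25;
R = -log(rand(N,1))/lambda;       % exponential samples via the inverse-CDF method
Rs = sort(R);
S = (1:N)'/N;                     % EDF height k/N at the k-th order statistic
stairs([0; Rs], [0; S]);          % piecewise-constant (stairs) plot
hold on;
x = 0:0.01:max(R);
plot(x, 1-exp(-lambda*x), 'r');   % true CDF for comparison
hold off;
xlabel('x');
ylabel('S(x)');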
Definition: Probability Density Function (PDF)
The probability density function (PDF) of a continuous RV is defined as the derivative of F(x):

f(x) = dF(x)/dx

Alternatively,

f(x) = lim_{ε→0} [F(x + ε) − F(x − ε)] / (2ε)

• Conceptually, it is more useful than the CDF
• Does not technically exist for discrete or mixed RVs
  – Can finesse with impulse functions
  – δ(x) = du(x)/dx, where u(x) is the unit step function
• The PDF represents the density of probability at the point x

Properties of the PDF
1. f(x) ≥ 0
2. P[a ≤ X ≤ b] = ∫_{a}^{b} f(u) du
3. F(x) = ∫_{−∞}^{x} f(u) du
4. ∫_{−∞}^{+∞} f(u) du = 1
5. A valid PDF can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral
6. The PDF must be defined for all real values of x
7. If X does not take on some values, this implies f(x) = 0 for those values

Example 4: Exponential PDF
[Figure: two-panel plot of the exponential CDF F(x) (top) and PDF f(x) (bottom) for λ = 2 over −0.5 ≤ x ≤ 2.]

Example 4: MATLAB Code

function [] = ExponentialPDF();
close all;
FigureSet(1,'LTX');
lambda = 2;
x = -0.5:0.005:2;
xl = [min(x) max(x)];
F = zeros(size(x));
id = find(x>=0);
F(id) = 1-exp(-lambda*x(id));
f = zeros(size(x));
f(id) = lambda*exp(-lambda*x(id));
subplot(2,1,1);
h = plot(x,F,'b',xl,[1 1],'k:');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 1.1]);
ylabel('F(x)');
title('Exponential CDF and PDF');
box off;
subplot(2,1,2);
h = plot(x,f,'g');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 2.1]);
xlabel('x');
ylabel('f(x)');
box off;
AxisSet(8);
print -depsc ExponentialPDF;

Histograms
Let X1, X2, . . . , Xn be a random sample. The histogram is a function of x which equals the fraction of the Xi that fall within specified intervals (bins).
• Like the CDF, the “true” PDF is never known
• The histogram is a rough estimate of the PDF
• Usually shown in the form of a bar plot
• Minimum value: 0; maximum value: ∞
• Is a random function
• Perhaps the most common graphical representation of estimated PDFs
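A minimal sketch (not from the original deck; the bin width and names are assumptions) showing how a histogram is scaled so that it estimates the PDF, in the spirit of the examples that follow:

% Histogram-based PDF estimate for exponential data, with the true PDF overlaid.
lambda = 2;
N = 100;
R = -log(rand(N,1))/lambda;           % exponential samples
h = 0.25;                             % bin width (controls the bias-variance tradeoff)
edges = 0:h:ceil(max(R));
counts = histc(R, edges);             % counts per bin
fhat = counts/(N*h);                  % scale so the estimate integrates to 1
bar(edges, fhat, 'histc');
hold on;
x = 0:0.01:max(edges);
plot(x, lambda*exp(-lambda*x), 'r');  % true PDF
hold off;
xlabel('x');
ylabel('f(x)');
legend('Estimated','True');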

Example 5: Histograms
[Figure: three-panel plot for N = 100 exponential samples: the true PDF (top), the histogram estimate (middle), and the two overlaid (bottom), over 0 ≤ x ≤ 5.]

Histogram Comments
• Histograms can be misleading
• The apparent shape of the histogram is sensitive to
  – The bin locations
  – The bin widths
• It can be shown that the bin width affects the bias and the variance of this estimator of the PDF
Histogram Accuracy

ISE = ∫_{−∞}^{∞} (f̂(u) − f(u))² du

BIAS[f̂(x)] = (1/2) f′(x) [h − 2(x − b_j)] + O(h²) for x ∈ (b_j, b_{j+1}]

Var[f̂(x)] = f(x)/(nh) + O(n⁻¹)

MISE = 1/(nh) + h² R(f′)/12 + O(n⁻¹) + O(h³)

where h is the bin width, b_j is the jth bin boundary, and R(φ) = ∫ φ(u)² du.

• The bin width controls the bias-variance tradeoff
• More on all of this later

Example 5: Histograms with Different Bin Centers
[Figure: the same three-panel comparison of the true and estimated PDFs for N = 100, recomputed with shifted bin centers.]

Example 6: Uniform Distribution
Plot the CDF and PDF for a uniform random variable X ∼ U[a, b]. (A sketch answering this example appears after the Gaussian slide below.)
Note: X ∼ U[a, b] denotes that X is drawn from a uniform distribution and has a range of [a, b].

Gaussian RV’s
[Figure: the Gaussian distribution function F(x) and density function f(x), each plotted over −5 ≤ x ≤ 5.]

f(x) = (1/(√(2π) σ)) e^{−(x−m)²/(2σ²)}

F(x) = P[X ≤ x] = (1/(√(2π) σ)) ∫_{−∞}^{x} e^{−(u−m)²/(2σ²)} du

• Denoted as X ∼ N(μ_X, σ²_X)
• Also called the normal distribution
• Arises naturally in many applications
• Central limit theorem (more later)
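For Example 6, a minimal sketch (not from the original deck; a = 0 and b = 1 are assumed endpoints) in the style of the earlier listings:

% CDF and PDF of X ~ U[a,b].
a = 0; b = 1;
x = -0.5:0.005:1.5;
F = min(max((x - a)/(b - a), 0), 1);   % F(x) = (x-a)/(b-a), clipped to [0,1]
f = (x >= a & x <= b)/(b - a);         % f(x) = 1/(b-a) on [a,b], 0 elsewhere
subplot(2,1,1);
plot(x, F, 'b');
ylabel('F(x)');
title('Uniform CDF and PDF');
subplot(2,1,2);
plot(x, f, 'g');
xlabel('x');
ylabel('f(x)');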
Functions of RV’s
• We will work with functions of RVs: Y = g(X)
• Y is also an RV
• Example: Y = aX + b

F_Y(y) = P[Y ≤ y]
       = P[aX + b ≤ y]
       = P[X ≤ (y − b)/a]
       = F_X((y − b)/a)

for a > 0. (For a < 0, the inequality reverses when dividing by a.)
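A quick numerical check of this result (not from the original deck; the constants are arbitrary), comparing the empirical CDF of Y = aX + b against F_X((y − b)/a):

% Verify F_Y(y) = F_X((y-b)/a) for a > 0 by simulation, with X ~ N(0,1).
a = 2; b = 1;
X = randn(1e5, 1);
Y = a*X + b;
y = -3:1:5;                                    % evaluation points
empirical = mean(bsxfun(@le, Y, y));           % fraction of samples with Y <= y
theory = 0.5*(1 + erf(((y - b)/a)/sqrt(2)));   % standard normal CDF at (y-b)/a
disp([empirical; theory]);                     % the two rows should nearly agree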

Expected Values Overview
• To completely describe all of the known information about an RV, we must specify the CDF or PDF
• Given a data set, estimating the CDF/PDF is one of the most difficult problems we will discuss (density estimation)
• Often, much less information about the distribution of X is sufficient
  – Mean
  – Median
  – Standard deviation
  – Range
• These scalar descriptive statistics are called point estimates

Expected Values Defined

E[X] = ∫_{−∞}^{+∞} x f(x) dx

• The expected value of a random variable X is denoted E[X]
• This is called the mean of X
• The expected value of X is only defined if the integral converges absolutely:

∫_{−∞}^{∞} |x| f(x) dx < ∞

• The “best” estimate of the mean of X given a data set is the sample average,

X̄ = (1/N) Σ_{i=1}^{N} x_i ≈ E[X]
Average versus Mean

E[X] = μ_X = ∫_{−∞}^{+∞} x f(x) dx        x̄ = μ̂_X = (1/N) Σ_{i=1}^{N} x_i

Note the distinction between the average and the mean.
• The average is
  – an estimate of the mean
  – calculated from a data set
  – a random variable
• The mean is
  – calculated from a PDF
  – not a random variable
  – a property of the PDF

Expected Values of Functions
We can also calculate the expected values of functions of random variables. Let Y = g(X). Then,

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx

Example: Let g(X) = I(X), where I(X) is the indicator function of the event {X in C} and C is some interval of the real line:

g(X) = 0 if X is not in C,    g(X) = 1 if X is in C

Then

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx = ∫_C f(x) dx = P[X ∈ C]

Thus, the expected value of the indicator of an event is equal to the probability of the event.

Expected Value Properties
1. E[c] = c, where c is a constant
2. E[cX] = c E[X]
3. E[Σ_{k=1}^{N} g_k(X)] = Σ_{k=1}^{N} E[g_k(X)]
4. Proof left as a homework assignment

Variance
The variance of a random variable is defined as follows:

σ²_X ≡ E[(X − μ_X)²]

The nth moment of an RV is defined as

E[Xⁿ] ≡ ∫_{−∞}^{∞} xⁿ f(x) dx

• Variance is a measure of how wide a distribution is
• A measure of dispersion
• There are others as well
• The standard deviation is defined as σ ≡ √(σ²)
• σ²_X = E[X²] − E[X]²
• Both are properties of the CDF and are not RVs
Markov Inequality
The mean and variance of an RV X give us sufficient information to establish bounds on certain probabilities. Suppose that X is a nonnegative random variable.

Markov inequality:

P[X ≥ a] ≤ E[X]/a

Proof:

E[X] = ∫_{0}^{a} x f(x) dx + ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} a f(x) dx
     = a P[X ≥ a]

Example 7: Markov Inequality
The mean height of children in a kindergarten class is 3.5 feet. Find the bound on the probability that a kid in the class is taller than 9 feet.
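The slide leaves the bound as an exercise; applying the inequality directly with E[X] = 3.5 and a = 9 (not worked in the source, but immediate):

P[X ≥ 9] ≤ E[X]/9 = 3.5/9 ≈ 0.39

The true probability is essentially zero, which previews the comment on the next slide that these bounds are very loose.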

Chebyshev Inequality

P[|X − μ| ≥ a] ≤ σ²/a²

Proof: Let D² = (X − μ)², where μ = E[X], so that D = |X − μ|. Then apply the Markov inequality:

P[D ≥ a] = P[D² ≥ a²] ≤ E[(X − μ)²]/a² = σ²/a²

• These bounds are very loose
• Note: if σ² = 0, the Chebyshev inequality implies P[X = μ] = 1

Multiple Random Variables
A vector random variable is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.
• Example: randomly select a student
• X ≡ [H(ζ), W(ζ), A(ζ)]
• where
  – H(ζ) = height of student ζ
  – W(ζ) = weight of student ζ
  – A(ζ) = age of student ζ
Jointly Continuous Random Variables
Random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a PDF. In other words, there is a joint probability density function defined on the real plane such that for any event A,

P[(X, Y) in A] = ∫∫_A f_{X,Y}(u, v) du dv

Properties
• ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f_{X,Y}(u, v) du dv = 1
• f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/(∂x ∂y)

Example 8: Jointly Continuous RV
[Figure: contour plot of a two-dimensional Gaussian density function f(x,y) over roughly −0.5 ≤ x, y ≤ 1.5.]

Example 8: MATLAB Code
(The listing for this example is not recoverable; a reconstruction follows the next figure.)

Example 8: Jointly Continuous RV Continued
[Figure: surface plot of the same two-dimensional Gaussian density f(x,y).]
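The MATLAB listing for Example 8 did not survive extraction. A hypothetical reconstruction (the means, standard deviations, and grid are my assumptions) that produces contour and surface plots like the two figures:

% Contour and surface plots of a bivariate Gaussian density with
% independent components (assumed parameters).
mx = 0.5; my = 0.5;                  % assumed means
sx = 0.4; sy = 0.4;                  % assumed standard deviations
[x,y] = meshgrid(-0.5:0.02:1.5, -0.5:0.02:1.5);
f = 1/(2*pi*sx*sy) * exp(-0.5*((x-mx).^2/sx^2 + (y-my).^2/sy^2));
figure;
contourf(x, y, f);                   % contour view (first figure)
colorbar;
xlabel('x'); ylabel('y');
title('Gaussian Density Function f(x,y)');
figure;
surf(x, y, f);                       % surface view (second figure)
shading interp;
xlabel('x'); ylabel('y'); zlabel('f(x,y)');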
Joint Cumulative Distribution Function (CDF)
[Figure: three sketches of the (X, Y) plane showing the regions whose probabilities give F_{X,Y}(x, y), F_X(x), and F_Y(y).]

There is also a joint CDF:

F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du
             = P[X ≤ x & Y ≤ y]
             = P[X ≤ x, Y ≤ y]

Marginal PDF’s
The marginal PDFs f_X(x) and f_Y(y) are given as follows:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx

• f_X(x) is the same as the PDF of X alone, as if Y had never been considered
• The marginal PDFs can be obtained from the joint PDF
• The joint PDF cannot, in general, be obtained from the marginal PDFs

Independence
Two random variables X and Y are independent if and only if their joint PDF is equal to the product of the marginal PDFs:

f_{X,Y}(x, y) = f_X(x) f_Y(y)

Equivalently, they are independent if and only if their joint CDF is equal to the product of the marginal CDFs:

F_{X,Y}(x, y) = F_X(x) F_Y(y)

• If X and Y are independent, the random variables W = g(X) and Z = h(Y) are also independent

Conditional CDF’s & Bayes’ Theorem
The conditional CDF of Y given X = x:

F_Y(y|x) = lim_{h→0} F_Y(y | x < X ≤ x + h) = [∫_{−∞}^{y} f_{X,Y}(x, y′) dy′] / f_X(x)

Proof omitted.

The conditional PDF of Y given X = x:

f_Y(y|x) = (d/dy) F_Y(y|x) = f_{X,Y}(x, y) / f_X(x)

• This can be viewed as a form of Bayes’ theorem
• Gives the a posteriori probability that Y is close to y given that X is close to x
Conditional CDF’s & Independence
If X and Y are independent,

f_{X,Y}(x, y) = f_X(x) f_Y(y)

and the conditional PDF of Y given X = x is

f_Y(y|x) = f_{X,Y}(x, y)/f_X(x) = f_X(x) f_Y(y)/f_X(x) = f_Y(y)

Conditional Expectation
The conditional expectation of Y given X = x is defined by

E_Y[Y|X = x] = ∫_{−∞}^{∞} y f_Y(y|x) dy

• E_Y[Y|X = x] can be viewed as a function of x: g(x) = E_Y[Y|x]
• g(X) = E_Y[Y|X] is a random variable
• It can be shown that E_Y[Y] = E_X[g(X)] = E_X[E_Y[Y|X]]
• More generally, E_Y[h(Y)] = E_X[E_Y[h(Y)|X]], where

E_X[E_Y[h(Y)|X]] = ∫_{−∞}^{∞} [∫_{−∞}^{∞} h(y) f_Y(y|x) dy] f_X(x) dx

Correlation and Covariance
The jkth moment of X and Y is defined as

E[X^j Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k f_{X,Y}(x, y) dx dy

• The correlation of X and Y is defined as E[XY]
• If E[XY] = 0, we say X and Y are orthogonal
• The covariance of X and Y is defined as

σ²_{X,Y} = E[(X − μ_X)(Y − μ_Y)]

Correlation Coefficient
The correlation coefficient of X and Y is defined as

ρ_{X,Y} = σ²_{X,Y} / (σ_X σ_Y)

• −1 ≤ ρ_{X,Y} ≤ 1
• Extreme values of ρ_{X,Y} indicate a linear relationship between X and Y: Y = aX + b
• ρ_{X,Y} = 1 implies a > 0; ρ_{X,Y} = −1 implies a < 0
• X and Y are said to be uncorrelated if ρ_{X,Y} = 0
• If X and Y are independent, σ²_{X,Y} = 0 (see homework)
• If X and Y are independent, ρ_{X,Y} = 0
• If ρ_{X,Y} = 0, X and Y may not be independent
• Uncorrelated variables are not necessarily independent
• However, if X and Y are Gaussian random variables, then ρ_{X,Y} = 0 implies X and Y are independent
Mean Square Estimation
[Figure: block diagram in which observed variables x1, . . . , xn enter a model that produces the output y.]
• Often we will want to estimate the value of one RV Y from one or more other RVs X: Ŷ = g(X)
• Encountered often in nonlinear modeling and classification
• It may be that Y = g(X)
• The estimation error is defined as Y − g(X)
• We will assign a cost to each error, c(Y − g(X))
• Goal: find g(X) that minimizes E[c(Y − g(X))]
• The most common cost function is the mean squared error (MSE):

MSE = E[(Y − g(X))²]

Example 9: Minimum MSE Estimation
Suppose we wish to estimate a random variable Y with a constant a. What is the best value of a that minimizes the MSE? (A worked sketch follows.)
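The workspace for this example is blank in the source; the standard derivation, stated here for completeness: expand the MSE as

E[(Y − a)²] = E[Y²] − 2a E[Y] + a²

Differentiating with respect to a and setting the result to zero gives −2 E[Y] + 2a = 0, so a* = E[Y]: the best constant estimate of Y in the MSE sense is its mean.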

Example 10: Minimum Linear MSE Estimation
Suppose we wish to estimate a random variable Y with a linear function of X, Ŷ = aX + b. What values of a and b minimize the MSE?

Example 10: Workspace (1) and (2)
(Both workspace slides are blank in the source; a sketch of the derivation follows.)
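A sketch of the standard derivation (not worked in the source): write the MSE as E[(Y − aX − b)²]. Setting the derivative with respect to b to zero gives b* = E[Y] − a E[X]. Substituting b* and setting the derivative with respect to a to zero gives a* = σ²_{X,Y}/σ²_X = ρ_{X,Y} σ_Y/σ_X, which yields the estimator discussed on the next slide.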
MMSE Linear Estimation Discussion

Ŷ = a*X + b* = ρ_{X,Y} σ_Y (X − E[X])/σ_X + E[Y]

• Note that (X − E[X])/σ_X is just a scaled version of X
  – Zero mean
  – Unit variance
  – Sometimes called a z-score
• X_s = σ_Y (X − E[X])/σ_X has the variance of Y
• The term E[Y] ensures that E[Ŷ] = E[Y]
• ρ_{X,Y} specifies the sign and extent of Y relative to X_s
• If uncorrelated, Ŷ = E[Y]
• If perfectly correlated, Ŷ = ±σ_Y (X − E[X])/σ_X + E[Y] = Y

Orthogonality Condition
The orthogonality condition states that the error of the best linear estimator is orthogonal to the observation X − E[X].
• Fundamental result in mean square estimation
• Central to the area of linear estimation
• Enables us to more easily find the minimum MSE of the best linear estimator
• The notation will be simplified as follows:

X̃ = X − E[X]        Ỹ = Y − E[Y]

• These are called centered random variables

Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²]
     = E[(Ỹ − a*X̃)²]
     = E[(Ỹ − a*X̃) Ỹ] − a* E[(Ỹ − a*X̃) X̃]
     = E[(Ỹ − a*X̃) Ỹ]          (the second term vanishes by orthogonality)
     = σ²_Y − a* σ²_{X,Y}
     = σ²_Y − (σ²_{X,Y}/σ²_X) σ²_{X,Y}
     = σ²_Y (1 − ρ²_{X,Y})
Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²] = σ²_Y (1 − ρ²_{X,Y})

• When ρ_{X,Y} = ±1, MMSE = 0
• Perfect correlation implies perfect prediction
• No correlation (ρ_{X,Y} = 0) implies MMSE = σ²_Y

Nonlinear Estimation
• In general, the best estimator of Y given X will be nonlinear
• Suppose we wish to find the g(X) that best approximates Y in the MMSE sense:

min_{g(·)} E_{X,Y}[(Y − g(X))²]

Using conditional expectation,

E_{X,Y}[(Y − g(X))²] = E_X[E_Y[(Y − g(x))² | X = x]]
                     = ∫_{−∞}^{∞} E_Y[(Y − g(x))² | X = x] f_X(x) dx
                     = ∫_{−∞}^{∞} [∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy] f_X(x) dx

Nonlinear Estimation Continued

E_{X,Y}[(Y − g(X))²] = ∫_{−∞}^{∞} [∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy] f_X(x) dx

• The integrand is positive for all x
• Minimized by minimizing E_Y[(Y − g(x))² | X = x] for each x
• g(x) is a constant relative to E_Y[·]
• Reduces to the equivalent example earlier: estimate Y with a constant g(x)
• Therefore, the g(x) that minimizes the MSE is

Ŷ = g*(x) = E_Y[Y|X = x]

• The function g*(x) is called the regression curve
• Has the smallest possible MSE
• Linear estimators are generally worse (larger MSE)

Random Vectors
Let X be a random vector,

X = [X_1, X_2, . . . , X_L]^T

Then the expected value of X is defined as

E[X] = [E[X_1], E[X_2], . . . , E[X_L]]^T
Linear Estimation with Vectors
Suppose that we wish to estimate Y with a linear sum of random variables X_1, X_2, . . . , X_L:

Ŷ = X^T w = Σ_{i=1}^{L} X_i w_i

Then the error Y − Ŷ can be written as

ε = Y − X^T w

and the squared error can be written as

ε² = (Y − X^T w)² = Y² + w^T X X^T w − 2 Y X^T w

Linear Estimation Error

ε² = (Y − X^T w)² = Y² + w^T X X^T w − 2 Y X^T w

The expected value of the squared error is

MSE ≡ E[ε²] = E[(Y − X^T w)²] = E[Y²] + w^T E[X X^T] w − 2 E[Y X^T] w

Correlation Matrix
Let X be a zero-mean random vector. The variance-covariance matrix of the vector X, also called the correlation matrix, can be written as

R = σ²{X} = E[X X^T] =
[ σ²_{X1}      σ²_{X1,X2}   . . .  σ²_{X1,XL} ]
[ σ²_{X2,X1}   σ²_{X2}      . . .  σ²_{X2,XL} ]
[ . . .        . . .        . . .  . . .      ]
[ σ²_{XL,X1}   σ²_{XL,X2}   . . .  σ²_{XL}    ]

R is a symmetric matrix: R = R^T.

Cross-Correlation Matrix
Let Y be a zero-mean scalar random variable. Define P, the cross-correlation matrix, as

P = E[Y X] = [E[Y X_1], E[Y X_2], . . . , E[Y X_L]]^T
Minimum Mean Squared Error
Using the matrices R and P, the MSE can be rewritten:

E[ε²] = E[(Y − X^T w)²]
      = E[Y²] + w^T E[X X^T] w − 2 E[Y X^T] w
      = σ²_Y + w^T R w − 2 w^T P

Take the gradient of the MSE above with respect to w and set the resulting expression equal to zero:

∇_w E[ε²] = ∇_w (σ²_Y + w^T R w − 2 w^T P)
          = R^T w + R w − 2P
          = 2 R w − 2 P
          = 0

Solving for w*, we obtain

w* = R⁻¹ P

Minimum Mean Squared Error (Continued)

w* = R⁻¹ P

Find the minimum MSE by substituting into the equation for the MSE:

min E[ε²] = σ²_Y + w*^T R w* − 2 P^T w*
          = σ²_Y + (R⁻¹P)^T R (R⁻¹P) − 2 P^T (R⁻¹P)
          = σ²_Y + P^T R⁻¹ P − 2 P^T R⁻¹ P
          = σ²_Y − P^T R⁻¹ P
          = σ²_Y − P^T w*
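A numerical illustration (not from the original deck; the dimensions, weights, and noise level are assumptions) of solving w* = R⁻¹P from sample estimates of R and P:

% Estimate the MMSE weights from data: R and P are replaced by sample
% averages, and w* = R^{-1} P is computed with a linear solve.
N = 1e4; L = 3;
X = randn(N, L);                   % zero-mean observation vectors (one per row)
wTrue = [0.5; -1.0; 2.0];          % assumed true weights
Y = X*wTrue + 0.1*randn(N, 1);     % zero-mean target with additive noise
R = (X'*X)/N;                      % sample correlation matrix, E[X X^T]
P = (X'*Y)/N;                      % sample cross-correlation, E[Y X]
w = R\P;                           % w* = R^{-1} P without forming the inverse
mse = mean((Y - X*w).^2);          % should approach the noise variance, 0.01
fprintf('w = [%.3f %.3f %.3f], MSE = %.4f\n', w, mse);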

Closing Comments
• In general, we cannot calculate anything discussed so far
• Everything discussed requires that the true PDF (or CDF) be known
• In practice, we’ll have data, not PDFs
• The theory therefore represents a best-case scenario
• The practical question: how closely can we approximate the true point estimate given only data?
• We will compare our estimators on cases where the true PDF is known
