
STA457: Time Series Analysis

Lecture 11

Lijia Wang

Department of Statistical Sciences


University of Toronto

Lijia Wang (UofT) STA457: Time Series Analysis 1 / 28


Overview

Last Time:
1 Forecasting
2 Estimation
Today:
1 Integrated ARMA (ARIMA) models
2 Building ARIMA models



Outline

1 Integrated Models for Nonstationary Data

2 Building ARIMA Models



Motivation 1

We consider the model

$$x_t = \mu_t + y_t,$$

where $\mu_t = \beta_0 + \beta_1 t$ and $y_t$ is stationary. Differencing such a process leads to a stationary process:

$$\nabla x_t = x_t - x_{t-1} = \beta_1 + y_t - y_{t-1} = \beta_1 + \nabla y_t.$$
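As a rough numerical check of the identity above, the following sketch simulates a linear trend plus white noise and differences it. The coefficients `beta0` and `beta1`, the sample size, and the choice of Gaussian white noise for $y_t$ are illustrative assumptions, not values from the lecture.

```python
import random

def difference(x):
    """Return the first difference x[t] - x[t-1] for t = 1, ..., len(x) - 1."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

random.seed(0)
beta0, beta1 = 2.0, 0.5      # hypothetical trend coefficients
n = 2000

# y_t: Gaussian white noise, used here as a simple stationary process (assumption)
y = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [beta0 + beta1 * t + y[t] for t in range(n)]

dx = difference(x)
mean_dx = sum(dx) / len(dx)

# The differenced series should fluctuate around beta1 with no trend left.
print("mean of differenced series:", round(mean_dx, 3))
```

The sample mean of the differenced series should sit close to `beta1`, which is the constant that survives differencing in $\nabla x_t = \beta_1 + \nabla y_t$.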



Motivation 2

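Another model that leads to first differencing is the case in which $\mu_t$ is stochastic and slowly varying according to a random walk. That is,

$$\mu_t = \mu_{t-1} + \nu_t,$$

where $\nu_t$ is stationary. In this case, with $x_t = \mu_t + y_t$ as before,

$$\nabla x_t = (\mu_t - \mu_{t-1}) + (y_t - y_{t-1}) = \nu_t + \nabla y_t$$

is stationary.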



The Integrated ARMA, or ARIMA, Model

Definition: A process $x_t$ is said to be ARIMA($p, d, q$) if

$$\nabla^d x_t = (1 - B)^d x_t$$

is ARMA($p, q$). In general, we will write the model as

$$\phi(B)(1 - B)^d x_t = \theta(B) w_t.$$

If $E(\nabla^d x_t) = \mu$, we write the model as

$$\phi(B)(1 - B)^d x_t = \delta + \theta(B) w_t,$$

where $\delta = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$.
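To make the operator $(1 - B)^d$ concrete, the sketch below applies it in two ways: by differencing $d$ times in a row, and by expanding $(1 - B)^d$ with binomial coefficients. The function names and the quadratic test series are illustrative assumptions.

```python
from math import comb

def diff_once(x):
    """First difference: x[t] - x[t-1]."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def diff_repeated(x, d):
    """Apply the first difference d times in a row."""
    for _ in range(d):
        x = diff_once(x)
    return x

def diff_binomial(x, d):
    """Apply (1 - B)^d = sum_k C(d, k) (-1)^k B^k directly."""
    return [
        sum(comb(d, k) * (-1) ** k * x[t - k] for k in range(d + 1))
        for t in range(d, len(x))
    ]

x = [t ** 2 for t in range(8)]   # quadratic trend: 0, 1, 4, 9, ...
print(diff_repeated(x, 2))       # [2, 2, 2, 2, 2, 2]
print(diff_binomial(x, 2))       # [2, 2, 2, 2, 2, 2]
```

Both routes agree, and two differences reduce a quadratic trend to a constant, which is why $d = 2$ is sometimes needed for series with curvature in the trend.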



ARIMA forecasting

It should be clear that, since $y_t = \nabla^d x_t$ is ARMA, we can use the previously introduced methods to obtain forecasts of $y_t$, which in turn lead to forecasts for $x_t$.

For example, if $d = 1$, given forecasts $y^n_{n+m}$ for $m = 1, 2, \ldots$, we have $y^n_{n+m} = x^n_{n+m} - x^n_{n+m-1}$, so that

$$x^n_{n+m} = y^n_{n+m} + x^n_{n+m-1}$$

with initial condition $x^n_{n+1} = y^n_{n+1} + x_n$.
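The recursion for $d = 1$ amounts to a running sum started at the last observation. A minimal sketch, with a hypothetical helper name and made-up forecast values for the differenced series:

```python
def integrate_forecasts(x_n, y_forecasts):
    """Turn forecasts of y = diff(x) into forecasts of x.

    Implements x^n_{n+m} = y^n_{n+m} + x^n_{n+m-1},
    starting from the last observed value x_n.
    """
    x_forecasts = []
    previous = x_n
    for y_hat in y_forecasts:
        previous = previous + y_hat
        x_forecasts.append(previous)
    return x_forecasts

# Hypothetical inputs: last observation 100.0 and three forecasts of the differences
print(integrate_forecasts(100.0, [1.0, 0.5, 0.25]))   # [101.0, 101.5, 101.75]
```

For $d > 1$ the same idea would be applied $d$ times, each time starting from the appropriate last observed value of the partially differenced series.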



Example: IMA(1,1) and EWMA model

The ARIMA(0,1,1), or IMA(1,1), model is of interest because many economic time series can be successfully modeled this way. In addition, the model leads to a frequently used forecasting method called exponentially weighted moving averages (EWMA). We will write the model as

$$x_t = x_{t-1} + w_t - \lambda w_{t-1},$$

with $|\lambda| < 1$, for $t = 1, 2, \ldots$, and $x_0 = 0$. We could also include a drift term in the formula.



Example: IMA(1,1) and EWMA model

If we write

$$y_t = w_t - \lambda w_{t-1},$$

we may write the IMA(1,1) as $x_t = x_{t-1} + y_t$. Because $|\lambda| < 1$, $y_t$ has an invertible representation, $w_t = \sum_{j=0}^{\infty} \lambda^j y_{t-j}$, that is, $y_t = -\sum_{j=1}^{\infty} \lambda^j y_{t-j} + w_t$. Substituting $y_t = x_t - x_{t-1}$, we may write

$$x_t = \sum_{j=1}^{\infty} (1 - \lambda)\lambda^{j-1} x_{t-j} + w_t$$

as an approximation for large $t$ (put $x_t = 0$ for $t \le 0$), which is the exponentially weighted moving average (EWMA) form.
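The weighted sum can be turned into a one-step-ahead forecast by dropping the unobserved $w_{n+1}$. The sketch below computes that forecast from the truncated sum and from the recursion $\tilde{x}_{t+1} = (1 - \lambda) x_t + \lambda \tilde{x}_t$. The recursive form is a standard equivalent rewriting that the slides do not state, and the data and the value of $\lambda$ are made up for illustration.

```python
def ewma_sum(x, lam):
    """One-step forecast as sum_{j=1}^{n} (1 - lam) * lam**(j-1) * x_{n+1-j}."""
    n = len(x)
    return sum((1 - lam) * lam ** (j - 1) * x[n - j] for j in range(1, n + 1))

def ewma_recursive(x, lam):
    """Same forecast via forecast <- (1 - lam) * x_t + lam * forecast."""
    forecast = 0.0               # matches the convention x_t = 0 for t <= 0
    for value in x:
        forecast = (1 - lam) * value + lam * forecast
    return forecast

x = [1.0, 2.0, 4.0, 3.0]         # hypothetical observations x_1, ..., x_4
lam = 0.5                        # hypothetical smoothing parameter, |lam| < 1

print(ewma_sum(x, lam))          # 2.8125
print(ewma_recursive(x, lam))    # 2.8125
```

The recursion only needs the previous forecast and the newest observation, which is one reason EWMA is popular in practice. Smaller values of $\lambda$ put more weight on recent observations.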



Outline

1 Integrated Models for Nonstationary Data

2 Building ARIMA Models



Steps for building ARIMA Models

There are a few basic steps to fitting ARIMA models to time series data.
These steps involve:
1 Plotting the data and interpreting the plot.
2 Possibly transforming the data (e.g., log, first difference).
3 Identifying the dependence orders of the model (p, d, q).
4 Parameter estimation.
5 Diagnostics of residuals (interpretation, normality assumptions, ACF
graphs).
6 Model choice.



Steps 1 and 2: Plotting the data and transforming

1. Plotting the Data: First, as with any data analysis, we should construct a time plot of the data and inspect the graph for any anomalies.
2. Transforming the Data: If the variability in the data grows with time, it will be necessary to transform the data to stabilize the variance. In such cases, the Box–Cox class of power transformations can be employed.
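For reference, the usual form of the Box–Cox transformation is $(x_t^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $\log x_t$ for $\lambda = 0$. The slides do not spell the formula out, so the sketch below assumes that standard definition; the function name and the sample values are illustrative.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transformation of strictly positive data.

    lam == 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1) / lam for v in x]

series = [1.0, 4.0, 9.0, 16.0]             # hypothetical positive observations
print(box_cox(series, 0.5))                # square-root-type transform
print(box_cox(series, 0))                  # log transform
```

The log transform ($\lambda = 0$) is the common choice when the variability grows roughly in proportion to the level of the series.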



Step 3: Identifying the dependence orders (p, d, q)

3. Identifying the Dependence Orders of the Model: After suitably transforming the data, the next step is to identify preliminary values of the autoregressive order, $p$, the order of differencing, $d$, and the moving average order, $q$.



Step 3.1: Identifying the differencing order d

3.1 Identifying the differencing order d:


The time plot and the ACF plot can help indicate whether differencing is needed. A slow decay in the sample ACF is an indication that differencing may be needed.
If the plots show that differencing is called for, then difference the data once, $d = 1$, and inspect the time plot of $\nabla x_t$. If additional differencing is necessary, then difference again and inspect a time plot of $\nabla^2 x_t$. We repeat the procedure until the differenced series appears stationary.
Watch out for over-differencing: Be careful not to over-difference because this may introduce dependence where none exists. For example, $x_t = w_t$ is serially uncorrelated, but $\nabla x_t = w_t - w_{t-1}$ is MA(1).
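The over-differencing example can be checked by simulation. For $\nabla w_t = w_t - w_{t-1}$ the theoretical lag-1 autocorrelation is $-\sigma_w^2 / (2\sigma_w^2) = -0.5$. The sketch below compares the sample lag-1 autocorrelation before and after differencing; the seed, sample size, and helper names are illustrative assumptions.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]

random.seed(42)
w = [random.gauss(0.0, 1.0) for _ in range(5000)]   # white noise: no serial correlation
dw = [w[t] - w[t - 1] for t in range(1, len(w))]    # needlessly differenced series

acf_w = sample_acf(w, 1)
acf_dw = sample_acf(dw, 1)
print("lag-1 ACF of w_t:      ", round(acf_w[1], 3))
print("lag-1 ACF of diff(w_t):", round(acf_dw[1], 3))
```

The first value should be near zero and the second near $-0.5$, so a strongly negative lag-1 autocorrelation after differencing can be a hint that the series was differenced once too often.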



Step 3.2: Identifying the AR order p and MA order q

3.2 Identifying the autoregressive order p and moving average order q:


When preliminary values of $d$ have been settled, the next step is to look at the sample ACF and PACF of $\nabla^d x_t$ for whatever values of $d$ have been chosen.
We use the plots to identify the orders $p$ and $q$ following the properties of ACF and PACF introduced in previous lectures.
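As a reminder of those properties, the sketch below computes the PACF from a given ACF with the Durbin–Levinson recursion and applies it to the theoretical ACF of an AR(1) and of an MA(1). The parameter values 0.6 and 0.5 and the function name are illustrative assumptions. For an AR(1) the PACF cuts off after lag 1, while for an MA(1) the ACF cuts off after lag 1 and the PACF tails off.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho[0] must be 1 and rho[h] the autocorrelation at lag h.
    Returns [phi_11, phi_22, ..., phi_KK] with K = len(rho) - 1.
    """
    max_lag = len(rho) - 1
    pacf = []
    prev = []                    # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

max_lag = 5
phi, theta = 0.6, 0.5            # hypothetical AR(1) and MA(1) parameters

acf_ar1 = [phi ** h for h in range(max_lag + 1)]              # tails off
acf_ma1 = [1.0, theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)   # cuts off after lag 1

pacf_ar1 = pacf_from_acf(acf_ar1)
pacf_ma1 = pacf_from_acf(acf_ma1)
print("AR(1) PACF:", [round(v, 3) for v in pacf_ar1])
print("MA(1) PACF:", [round(v, 3) for v in pacf_ma1])
```

With real data the same recursion would be fed the sample ACF, and the cut-offs would only hold approximately, within sampling error.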



Step 3.2: Identifying the AR order p and MA order q

3.2 Identifying the autoregressive order p and moving average order q:


Note that it cannot be the case that both the ACF and PACF cut off.
Because we are dealing with estimates from real data, it will not always be clear whether the sample ACF or PACF is tailing off or cutting off.
Also, two models that are seemingly different can actually be very similar.
With this in mind, we should not worry about being overly precise at this stage of the model fitting. At this point, with a few preliminary values of $p$, $d$, and $q$ at hand, we can start estimating the parameters.



Step 4: Estimate model parameters

4. Estimate model parameters: We estimate model parameters using the method of moments (MOM) and maximum likelihood estimation (MLE) approaches that were introduced in previous lectures.
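As a small illustration of the method-of-moments approach, the sketch below fits an AR(1) by the Yule–Walker equations, which reduce to $\hat{\phi} = \hat{\rho}(1)$ and $\hat{\sigma}_w^2 = \hat{\gamma}(0)(1 - \hat{\phi}^2)$ in this case. The simulated data, true parameter values, and function name are illustrative assumptions; maximum likelihood fitting normally relies on a statistics package and is not shown here.

```python
import random

def yule_walker_ar1(x):
    """Method-of-moments (Yule-Walker) estimates for an AR(1) model."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    gamma1 = sum((x[t + 1] - mean) * (x[t] - mean) for t in range(n - 1)) / n
    phi_hat = gamma1 / gamma0
    sigma2_hat = gamma0 * (1 - phi_hat ** 2)
    return mean, phi_hat, sigma2_hat

# Simulate a zero-mean AR(1) with hypothetical phi = 0.6 and unit noise variance
random.seed(1)
phi_true = 0.6
x = [0.0]
for _ in range(4999):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

mean_hat, phi_hat, sigma2_hat = yule_walker_ar1(x)
print("phi_hat:", round(phi_hat, 3), " sigma2_hat:", round(sigma2_hat, 3))
```

The estimates should land near the true values 0.6 and 1.0, up to sampling error.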



Step 5: Model Diagnostics

5. Diagnostics: This investigation focuses on the analysis of the residuals. The diagnostic results provide a basis for model selection.

The standardized innovations or residuals can be computed by

$$e_t = \frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}},$$

where $\hat{x}_t^{t-1}$ is the one-step-ahead prediction of $x_t$ based on the fitted model and $\hat{P}_t^{t-1}$ is the estimated one-step-ahead error variance. If the model fits well, the standardized residuals should behave as an iid sequence with mean zero and variance one. A normal probability plot or a Q-Q plot can help in identifying departures from normality.
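To see what the formula involves in the simplest case, the sketch below computes standardized innovations for a causal AR(1) with known parameters. It assumes the standard AR(1) results that the first prediction is the mean with variance $\sigma_w^2 / (1 - \phi^2)$, and that later predictions are $\mu + \phi(x_{t-1} - \mu)$ with variance $\sigma_w^2$. The function name, parameter values, and data are made up for illustration.

```python
import math

def ar1_standardized_innovations(x, mu, phi, sigma_w):
    """Standardized one-step-ahead prediction errors for a causal AR(1)."""
    innovations = []
    for t, value in enumerate(x):
        if t == 0:
            prediction = mu                               # no past data: predict the mean
            variance = sigma_w ** 2 / (1 - phi ** 2)      # stationary variance
        else:
            prediction = mu + phi * (x[t - 1] - mu)
            variance = sigma_w ** 2
        innovations.append((value - prediction) / math.sqrt(variance))
    return innovations

# Hypothetical data and parameters
e = ar1_standardized_innovations([1.0, 1.5, 0.25], mu=0.0, phi=0.5, sigma_w=0.5)
print([round(v, 4) for v in e])   # [1.7321, 2.0, -1.0]
```

In practice the parameters are replaced by their estimates, and for general ARMA models the predictions and variances come from the fitting routine rather than from closed-form expressions like these.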



Step 5: Model Diagnostics

A good check on the correlation structure of the residuals is to plot $\hat{\rho}_e(h)$ versus $h$ along with the error bounds of $\pm 2/\sqrt{n}$.
The Ljung–Box–Pierce test can be used to check whether the residual autocorrelations $\hat{\rho}_e(h)$, $h = 1, \ldots, H$, are small in magnitude as a group, rather than one lag at a time.
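The slides do not state the test statistic. The sketch below assumes its standard form, $Q = n(n+2) \sum_{h=1}^{H} \hat{\rho}_e^2(h) / (n - h)$, and also computes the $\pm 2/\sqrt{n}$ bounds. The tiny residual series is a made-up input chosen so the arithmetic can be followed by hand.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]

def ljung_box_q(residuals, max_lag):
    """Ljung-Box-Pierce Q-statistic using lags 1, ..., max_lag."""
    n = len(residuals)
    rho = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(rho[h] ** 2 / (n - h) for h in range(1, max_lag + 1))

residuals = [1.0, -1.0, 1.0, -1.0]        # hypothetical, strongly alternating residuals
bound = 2 / math.sqrt(len(residuals))     # error bound for individual rho_hat(h)
q = ljung_box_q(residuals, 1)

print("bounds: +/-", bound)               # bounds: +/- 1.0
print("Q:", q)                            # Q: 4.5
```

Under the hypothesis of an adequate ARMA($p, q$) fit, $Q$ is compared with a chi-squared distribution with $H - p - q$ degrees of freedom; a p-value can be obtained from a chi-squared table or a routine such as `scipy.stats.chi2.sf`.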



Step 6: Model choice

6. Model choice: There may be multiple candidate models after Step 3. We choose the best one based on the diagnostic results.



Example: Analysis of US GNP Data

In this example, we consider the analysis of quarterly U.S. GNP from 1947(1) to 2002(3), n = 223 observations. The data are real U.S. gross national product in billions of chained 1996 dollars and have been seasonally adjusted. The data were obtained from the Federal Reserve Bank of St. Louis (http://research.stlouisfed.org/).



Example: Step 1&2

Figure: Quarterly U.S. GNP from 1947(1) to 2002(3)



Example: Step 3

When reports of GNP and similar economic indicators are given, it is often the growth rate (percent change), rather than the actual (or adjusted) values, that is of interest. The growth rate, say, $x_t = \nabla \log(y_t)$, is plotted, and it appears to be a stable process.

Figure: U.S. GNP quarterly growth rate



Example: Step 3

Figure: Sample ACF and PACF of the GNP quarterly growth rate



Example: Step 3

Inspecting the sample ACF and PACF, we might feel that two models are
suitable for the data:
1 The ACF is cutting off at lag 2 and the PACF is tailing off. This
would suggest the GNP growth rate follows an MA(2) process, or log
GNP follows an ARIMA(0,1,2) model.
2 The ACF is tailing off and the PACF is cutting off at lag 1. This
suggests an AR(1) model for the growth rate, or ARIMA(1,1,0) for
log GNP.
Rather than focus on one model, we will fit both models.



Example: Step 4

Using MLE to fit the models:

1 For the MA(2) model:

$$\hat{x}_t = .008_{(.001)} + .303_{(.065)}\,\hat{w}_{t-1} + .204_{(.064)}\,\hat{w}_{t-2} + \hat{w}_t, \quad \hat{\sigma}_w = .0094.$$

2 For the AR(1) model:

$$\hat{x}_t = .008_{(.001)}\,(1 - .347) + .347_{(.063)}\,\hat{x}_{t-1} + \hat{w}_t, \quad \hat{\sigma}_w = .0095.$$

The values in parentheses are the corresponding estimated standard errors. All of the regression coefficients are significant, including the constant.
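One way to read the significance claim is to divide each estimate by its standard error. The sketch below does this with the numbers reported above and compares the ratios with 1.96, which assumes approximate normality of the estimators and a 5% level; the slides do not state which criterion was used. It also evaluates the AR(1) intercept $\delta = \mu(1 - \phi_1)$.

```python
# (estimate, standard error) pairs copied from the fitted models above
estimates = {
    "MA(2) constant": (0.008, 0.001),
    "MA(2) theta_1": (0.303, 0.065),
    "MA(2) theta_2": (0.204, 0.064),
    "AR(1) mean": (0.008, 0.001),
    "AR(1) phi_1": (0.347, 0.063),
}

# Ratio of estimate to standard error; values above about 1.96 are
# significant at the 5% level under a normal approximation (assumption).
ratios = {name: est / se for name, (est, se) in estimates.items()}
for name, ratio in ratios.items():
    print(f"{name}: estimate/SE = {ratio:.2f}")

# Intercept of the AR(1) fit: delta = mu * (1 - phi_1)
delta = 0.008 * (1 - 0.347)
print(f"AR(1) intercept delta = {delta:.4f}")   # 0.0052
```

Every ratio exceeds 3, which is consistent with the statement that all coefficients, including the constant, are significant.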



Example: Step 5

We take the MA(2) model as an example:

Figure: Residual plots of the GNP quarterly growth rate



Example: Step 5

Performing the Ljung-Box Test

The figure shows the p-values associated with the Ljung-Box Q-statistic,
at lags H = 3 through H = 20 (with corresponding degrees of freedom
H − 2).
