Dive into AR(p) models, stationarity conditions, and multi-step forecasting methods. Learn how to use model selection criteria, diagnose residuals, and avoid pitfalls for a robust time-series analysis strategy.
So, you’re looking at historical data and notice your variable of interest—call it yₜ—seems to have elements of its own past “echoing” into the present. In finance, this often pops up with interest rates, inflation, exchange rates, even volatility measures. When the current value of a time series depends on some combination of its own past values, we call it an autoregressive (AR) model.
Put more formally (but don’t worry, it’s actually not as scary as it looks):
$$ y_{t} = c + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \dots + \phi_{p} y_{t-p} + \epsilon_t $$
where:
• yₜ is the value of the series at time t.
• c is a constant term (often interpreted as a long-run mean anchor).
• φᵢ are the parameters capturing how much past observations yₜ₋ᵢ contribute to today’s value.
• p is the “order” of the AR process (i.e., how many lags we include).
• εₜ is the error term (also referred to as white noise).
In words, an AR(p) model says: “Today’s value is a sum of (1) a constant, (2) some fraction of yₜ₋₁, (3) some fraction of yₜ₋₂, … all the way up to (p) lags, plus an error term.” Many financial data series—like monthly inflation—can be reasonably well modeled by an AR(1) or AR(2). I remember once we tried an AR(6) for yield spreads, and it felt like we were reading tea leaves back six months—sometimes it’s simply too much history.
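To make that recursion concrete, here is a small Python sketch that simulates an AR(2) path. The parameter values (c = 0.5, φ₁ = 0.6, φ₂ = 0.2) are made up for illustration; they imply a long-run mean of c / (1 − φ₁ − φ₂) = 2.5.

```python
import random

def simulate_ar2(c, phi1, phi2, n, sigma=1.0, seed=42):
    """Simulate n draws from y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + eps_t."""
    rng = random.Random(seed)
    mu = c / (1.0 - phi1 - phi2)  # unconditional (long-run) mean
    y = [mu, mu]                  # start both lags at the long-run mean
    for _ in range(n):
        y.append(c + phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, sigma))
    return y[2:]

series = simulate_ar2(c=0.5, phi1=0.6, phi2=0.2, n=500)
print(len(series))                # 500
print(sum(series) / len(series))  # hovers near the long-run mean 0.5 / (1 - 0.8) = 2.5
```

Run it a few times with different seeds and you'll see the series wander around 2.5 without drifting away, which is exactly the stationary behavior discussed next.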
Below is a small Mermaid diagram to illustrate the general flow of an AR(p) model. Each new observation depends on its recent historical values:
```mermaid
flowchart LR
    A["y(t-1)"] --> B["AR Process <br/>f(y(t-1), y(t-2), ..., y(t-p))"]
    C["y(t-2)"] --> B
    B --> D["Forecast of y(t)"]
```
A critical requirement for AR models to be valid in forecasting is stationarity. Roughly, a stationary process has a constant mean and variance over time, and its autocorrelations depend only on the lag, not on the specific time period. If the process keeps drifting or its variance grows indefinitely, the AR model as stated isn’t going to do its job well.
Mathematically, stationarity often requires that the roots of the characteristic polynomial:
$$ 1 - \phi_{1}z - \phi_{2}z^{2} - \dots - \phi_{p}z^{p} = 0 $$
lie outside the unit circle in the complex plane. Put more simply, we need |φ| < 1 in the AR(1) case (and analogous constraints in higher-order processes) to ensure the series doesn’t explode over time.
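For the AR(2) case, the root condition reduces to three simple inequalities (sometimes called the stationarity triangle): φ₁ + φ₂ < 1, φ₂ − φ₁ < 1, and |φ₂| < 1. A quick sketch of that check:

```python
def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """AR(2) stationarity (triangle) conditions, equivalent to both roots of
    1 - phi1*z - phi2*z^2 = 0 lying outside the unit circle."""
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)

print(ar2_is_stationary(0.6, 0.2))  # True: comfortably inside the triangle
print(ar2_is_stationary(0.9, 0.2))  # False: phi1 + phi2 >= 1 (unit root or explosive)
```

For higher orders there is no tidy set of inequalities, so in practice you'd solve for the polynomial roots numerically.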
If you spot a time series that’s trending upward or downward too strongly, you might suspect nonstationarity. In real-world terms, if your process is heavily trending or has a structural break (like a sudden change in policy or market regime), you’d typically difference the series or apply other transformations before applying a standard AR model.
How do you decide how many lags to include (the order p)? If you include too few lags, you might leave out important dynamics. If you include too many, you can overfit—a big no-no when your ultimate goal might be forecasting. Two common metrics to guide you are:
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
Both weigh goodness-of-fit against model complexity, punishing you for each additional parameter introduced. The difference is that BIC tends to impose a heavier penalty on extra parameters, often leading to a more parsimonious model.
Let’s say you’ve tested AR(1), AR(2), AR(3), and so forth. For each, you calculate:
AIC = −2 ln(L) + 2k
BIC = −2 ln(L) + k ln(n)
where L is the likelihood of the model, k is the number of estimated parameters, and n is the number of observations. In practice, you pick the model that yields the smallest AIC or BIC (often you’ll check both and see whether they agree).
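As a sketch of that comparison in Python, using hypothetical log-likelihood values chosen only for illustration (note that BIC's ln(n) penalty exceeds AIC's 2 per parameter whenever n > e² ≈ 7.4):

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical log-likelihoods from AR(1)-AR(3) fits on n = 120 monthly observations.
fits = {1: -210.4, 2: -205.1, 3: -204.6}
n = 120
for p, log_lik in fits.items():
    k = p + 2  # p AR coefficients, plus the constant and the error variance
    print(f"AR({p}): AIC={aic(log_lik, k):.1f}  BIC={bic(log_lik, k, n):.1f}")
```

With these made-up numbers both criteria favor AR(2): the extra lag in AR(3) improves the likelihood too little to cover its penalty.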
Once you have your AR(p) specification, you’re ready to forecast future values. Let’s keep it simple with an AR(2) example:
$$ y_{t} = c + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \epsilon_t. $$
Here’s how you might forecast one period ahead, at time T:
$$ \hat{y}_{T+1|T} = c + \phi_{1} y_{T} + \phi_{2} y_{T-1}. $$
Notice there’s no ε term in the forecast—because that’s an error with mean zero, so the best prediction for it is 0. For a multi-period forecast, we do it iteratively. For instance, a two-step-ahead forecast:
$$ \hat{y}_{T+2|T} = c + \phi_{1} \hat{y}_{T+1|T} + \phi_{2} y_{T}. $$
Then for the next step, you keep replacing future terms with their own forecasts. This iterative process can quickly accumulate uncertainty, so keep in mind that multi-step forecasts can get fuzzy.
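The iterative substitution above can be sketched in a few lines of Python; the parameter and lag values are made up for illustration:

```python
def forecast_ar2(c, phi1, phi2, y_T, y_Tm1, horizon):
    """Iterate the AR(2) recursion, substituting each forecast for the
    unobserved future value it replaces (future shocks are set to zero)."""
    lag1, lag2 = y_T, y_Tm1
    path = []
    for _ in range(horizon):
        f = c + phi1 * lag1 + phi2 * lag2
        path.append(f)
        lag1, lag2 = f, lag1  # shift: today's forecast becomes tomorrow's lag
    return path

path = forecast_ar2(c=0.5, phi1=0.6, phi2=0.2, y_T=3.0, y_Tm1=2.5, horizon=2)
print([round(v, 4) for v in path])  # [2.8, 2.78]
```

The one-step forecast 2.8 feeds directly into the two-step forecast, which is exactly why errors compound at longer horizons.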
Imagine you have monthly data on a short-term interest rate and you suspect an AR(2) model: you’d estimate c, φ₁, and φ₂ from the historical sample, then plug the last two observed rates into the fitted equation to produce next month’s forecast.
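Here is a minimal pure-Python sketch of the estimation step, done by OLS on the lagged regression (in practice you'd reach for a library routine such as statsmodels' AutoReg). The "interest rate" series below is simulated from known, made-up parameters so we can check that the fit recovers them:

```python
import random

def fit_ar2(y):
    """OLS estimate of (c, phi1, phi2): regress y_t on [1, y_{t-1}, y_{t-2}]."""
    rows = [[1.0, y[t - 1], y[t - 2]] for t in range(2, len(y))]
    targ = [y[t] for t in range(2, len(y))]
    # Build and solve the 3x3 normal equations (X'X) beta = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * z for r, z in zip(rows, targ)) for i in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[col])]
            b[r] -= f * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        beta[r] = (b[r] - sum(A[r][j] * beta[j] for j in range(r + 1, 3))) / A[r][r]
    return tuple(beta)

# Simulate a toy "rate" series with known parameters, then try to recover them.
rng = random.Random(0)
true_c, true_p1, true_p2 = 0.2, 0.5, 0.3
y = [1.0, 1.0]  # start at the implied long-run mean 0.2 / (1 - 0.8) = 1.0
for _ in range(2000):
    y.append(true_c + true_p1 * y[-1] + true_p2 * y[-2] + rng.gauss(0.0, 0.1))
c_hat, p1_hat, p2_hat = fit_ar2(y)
print(round(c_hat, 2), round(p1_hat, 2), round(p2_hat, 2))  # close to 0.2 0.5 0.3
```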
It’s not perfect, but it’s a straightforward approach for short-term forecasting.
Forecast error is simply (actual − forecast). Over multiple observations, you’d check whether errors systematically deviate from zero. If they do, that’s a sign your model might have bias.
Typically, you’d place a confidence interval around forecasts to reflect the uncertainty. For an AR(1) model, the forecast variance for a 1-step-ahead forecast often looks like:
$$ \mathrm{Var}(\hat{y}_{T+1|T} - y_{T+1}) = \sigma_{\epsilon}^2 $$
but grows as you move further out. In practice, you’d estimate the variance of εₜ and account for how the AR coefficients compound earlier forecast errors (each forecasted lag carries its own uncertainty).
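For the AR(1) case this growth has a closed form: the h-step forecast error variance is σ²(1 + φ² + … + φ^(2(h−1))), which converges to the unconditional variance σ²/(1 − φ²) as h grows. A sketch, with assumed values φ = 0.8 and σ² = 1:

```python
def ar1_forecast_variance(phi, sigma2_eps, h):
    """h-step-ahead forecast error variance for a stationary AR(1):
    sigma_eps^2 * (1 + phi^2 + ... + phi^(2(h-1)))."""
    return sigma2_eps * sum(phi ** (2 * j) for j in range(h))

print(ar1_forecast_variance(0.8, 1.0, 1))            # 1.0 (one step: just the shock variance)
print(round(ar1_forecast_variance(0.8, 1.0, 4), 6))  # 2.311744
print(round(1.0 / (1.0 - 0.8 ** 2), 4))              # 2.7778, the long-run ceiling
```

This is why interval forecasts fan out with the horizon: the variance climbs toward the series' unconditional variance, at which point the AR forecast is barely more informative than the long-run mean.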
To evaluate how well your model’s doing, one popular measure is RMSE:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}( \hat{y}_t - y_t )^2}. $$
A smaller RMSE means your forecasts, on average, deviate less from actuals. Note that it does not penalize positive and negative forecast errors differently—it’s a symmetric measure.
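A quick sketch with made-up forecast and actual values; the last line confirms the symmetry:

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error between forecasts and realized values."""
    errs = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

f = [2.8, 2.78, 2.7]
a = [3.0, 2.6, 2.7]
print(round(rmse(f, a), 4))     # 0.1553
print(rmse(f, a) == rmse(a, f)) # True: over- and under-shooting are penalized alike
```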
After you fit the AR(p), you want to confirm that your residuals (the difference between the fitted values and actual observations) are basically white noise—no remaining autocorrelation patterns. One widely used test is the Ljung-Box Q test, which checks the joint significance of autocorrelations at multiple lags. If your residuals are showing strong autocorrelation, your AR(p) might be incomplete or misspecified.
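Here is a sketch of the Q statistic itself, computed from scratch; in practice you'd call a library routine (e.g. statsmodels' acorr_ljungbox, which also returns p-values). Both residual series below are simulated purely for illustration:

```python
import random

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n - k). Under white-noise
    residuals, Q is approximately chi-square with max_lag degrees of freedom
    (fewer once you net out estimated AR parameters)."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q

rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(300)]
print(round(ljung_box_q(white, 10), 2))   # typically well below the 95% cutoff (~18.31 at 10 df)

# Residuals with leftover AR(1) structure should blow the statistic up.
sticky = [0.0]
for _ in range(299):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))
print(round(ljung_box_q(sticky, 10), 2))  # huge -> reject the white-noise hypothesis
```

A Q value far above the chi-square critical value is the formal version of "your residuals still have structure," and it sends you back to respecify the model.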
Additionally, you might:
• Plot the autocorrelation function (ACF) of residuals to see if any patterns remain.
• Use the partial autocorrelation function (PACF) to glean any additional structure.
• Check for signs of structural breaks, since the data-generating process may have shifted at some point.
Several common pitfalls are worth keeping front of mind:
• Overfitting: If you choose a very high p, you may capture historical quirks that won’t repeat. This leads to poor forecasts out-of-sample.
• Ignoring Seasonality: Some series, like monthly sales or GDP, have inherent seasonality. You’d want an ARIMA or seasonal AR approach.
• Structural Breaks: Economic policy changes or major crises can shift relationships. An AR model calibrated on old data might fail to reflect new conditions.
• Nonstationarity: If your series is trending or has a unit root, a simple AR may give misleading results or spurious regressions.
I once had a somewhat funny (though frustrating) experience where an AR(3) that fit historically well suddenly went haywire the moment a central bank changed policy. Our estimates for φ₁, φ₂, and φ₃ were basically out the window due to that break.
Suppose you have an item set describing a company analyzing monthly commodity prices. The data suggests an AR(2) model. A sample question might ask:
• “Based on the AR(2) model, generate a forecast for the next month’s commodity price.” You’d be given the estimated parameters and the most recent observations, and you’d do exactly what we outlined—plugging in the last observed prices, using the estimated φ’s, and producing a forecast.
• Another question might say: “Examine the residual plot and the autocorrelation in the residuals. What might you conclude about the appropriateness of the AR(2) specification?” If you see leftover autocorrelation, you’d suspect an unmodeled effect—maybe seasonality or a higher AR order.
• Or the item set could present multiple AR(p) model fits with different AIC/BIC values. Your job is to pick the “best” model. Typically, you’d choose the one with the lowest AIC or BIC.
In a test or practical context, remember that the AR process is all about how recent history influences the current value. Keep an eye on stationarity (no unit roots or drifting means), systematically test lags using AIC/BIC, and double-check residuals to confirm you haven’t missed any hidden structure.
On the exam, you can expect to:
• Identify stationarity issues or a potential unit root from a vignette’s data.
• Calculate short-term forecasts from a given AR(1) or AR(2).
• Evaluate residual diagnostics using the Ljung-Box test or an ACF/PACF chart.
• Spot differences in AIC/BIC and pick an appropriate model order.
As you practice, watch out for tricky scenarios involving seasonality or structural breaks. Good luck, and don’t forget: sometimes simpler is better. Keep your model as lean as possible—particularly in fast-paced exam conditions.
• CFA Program Curriculum, Level II, “Time‑Series Analysis.”
• Box, G.E.P., Jenkins, G.M., and Reinsel, G.C., “Time Series Analysis: Forecasting and Control.”
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.