Explore a full multiple regression scenario in a CFA-style vignette, from data interpretation through residual diagnostics and exam strategies.
Let’s imagine a portfolio manager, Lisa, who wants to predict the quarterly returns of XYZ Mutual Fund. She suspects several factors play a big role in driving these returns, such as:
• Overall market performance (proxied by the S&P 500 quarterly return)
• Changes in interest rates (quarterly change in short-term interest rates)
• Sector performance (e.g., a specific industry index return)
• Quarterly GDP growth
Lisa’s goal is to build a multiple regression model using Ordinary Least Squares (OLS) with historical data spanning the last 20 quarters (five years). She hopes the model will help forecast future fund performance and guide investment decisions.
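To make that setup concrete, here is a minimal Python sketch of how such a regression could be estimated with statsmodels. The data below are simulated stand-ins (we don't have Lisa's actual 20 quarters), and the column names are invented for illustration:

```python
# A minimal sketch (simulated data, not Lisa's actual quarters): fitting the
# four-factor OLS model on 20 quarterly observations.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 20  # 20 quarters = 5 years

df = pd.DataFrame({
    "market_ret": rng.normal(2.0, 4.0, n),    # S&P 500 quarterly return, %
    "rate_change": rng.normal(0.0, 0.5, n),   # change in short-term rates, %
    "sector_ret": rng.normal(2.5, 5.0, n),    # sector index quarterly return, %
    "gdp_growth": rng.normal(0.6, 0.4, n),    # quarterly GDP growth, %
})
# Simulate fund returns roughly consistent with the vignette's coefficients.
df["fund_ret"] = (0.50 + 0.80 * df["market_ret"] - 0.40 * df["rate_change"]
                  + 0.65 * df["sector_ret"] + 0.10 * df["gdp_growth"]
                  + rng.normal(0, 1.0, n))

X = sm.add_constant(df[["market_ret", "rate_change", "sector_ret", "gdp_growth"]])
model = sm.OLS(df["fund_ret"], X).fit()
print(model.summary())  # coefficients, t-stats, R², adjusted R², F, Durbin-Watson
```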
Now, let’s see how this might show up on a CFA-style vignette. We’ll show some data, walk through it carefully, interpret exactly what’s going on, and figure out if any assumption is violated. Strap in—this is exactly how you might see it on exam day, but we’ll take it step by step.
Below is a simplified summary of the data Lisa collected for 20 quarters:
• Dependent Variable: Fund Return (Y), measured in percentage points each quarter.
• Independent Variables:
  - X₁ = S&P 500 Quarterly Return (market performance)
  - X₂ = Change in Short-Term Interest Rates
  - X₃ = Sector Quarterly Return
  - X₄ = Quarterly GDP Growth
Lisa’s regression equation (hypothesized) is:
Y = β₀ + β₁·X₁ + β₂·X₂ + β₃·X₃ + β₄·X₄ + ε
The following table shows some aggregated summary statistics from the regression output (for illustration, not actual raw data points for all 20 quarters); a quick consistency check on the derived figures follows the table:
Statistic | Value
---|---
β₀ (Intercept) | 0.50
β₁ (Market Return Coefficient) | 0.80
β₂ (Interest Rate Coefficient) | -0.40
β₃ (Sector Return Coefficient) | 0.65
β₄ (GDP Growth Coefficient) | 0.10
Standard Errors (β₁, β₂, β₃, β₄) | 0.15, 0.12, 0.20, 0.05
R-squared | 0.72
Adjusted R-squared | 0.65
F-Statistic (p-value) | 9.64 (< 0.001)
Durbin–Watson Statistic | 1.10
Observations (n) | 20
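Note that the adjusted R-squared and the overall F-statistic are mechanical functions of R², n, and the number of slope coefficients k for an OLS model with an intercept, so you can verify them yourself. A minimal sketch of that check:

```python
# Consistency check: adjusted R² and the overall F-statistic follow directly
# from R², n, and k (number of slope coefficients) in OLS with an intercept.
from scipy import stats

r2, n, k = 0.72, 20, 4
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # ≈ 0.65
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # ≈ 9.64
p_value = stats.f.sf(f_stat, k, n - k - 1)      # < 0.001
print(f"adjusted R² = {adj_r2:.3f}, F = {f_stat:.2f}, p = {p_value:.4f}")
```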
From a quick glance, you might see a few potential highlights: a relatively high R-squared (0.72), some interesting coefficients (especially that negative sign for interest rate changes), and a Durbin–Watson statistic possibly pointing toward some autocorrelation concerns (typically, we like to see a Durbin–Watson around 2 for no autocorrelation).
On the exam, the first thing you’d do is read the entire item set from top to bottom—paying special attention to:
• The time frame (20 quarters, or 5 years of data)
• The dependent variable (Fund Return)
• The independent variables (Market Return, Interest Rate Changes, Sector Return, GDP Growth)
• The given statistical measures (R-squared, standard errors, Durbin–Watson, p-values, etc.)
• Any context clues about data issues (e.g., mention of outliers, non-constant variance, patterns in residuals)
It might sound obvious, but it’s so easy to skip details under time pressure. Anyway, once you’ve identified these bits, you’ll see how they connect—and which ones might indicate trouble if the assumptions aren’t met.
From the vignette (and from Lisa’s assumptions), you’ll note:
(1) Y = β₀ + β₁·X₁ + β₂·X₂ + β₃·X₃ + β₄·X₄ + ε
Where:
• Y = Quarterly Return on XYZ Mutual Fund
• X₁ = S&P 500 Quarterly Return
• X₂ = Change in Short-Term Interest Rates
• X₃ = Sector Quarterly Return
• X₄ = GDP Growth
Lisa performed an OLS regression and found:
• Intercept, β₀ = 0.50
• β₁ = 0.80 (Statistically significant if the t-stat exceeds roughly 1.96 in absolute value at ~5% significance under the normal approximation; with n − k − 1 = 15 degrees of freedom, the exact critical t is about 2.13)
• β₂ = -0.40 (Check if statistically significant)
• β₃ = 0.65 (Check if statistically significant)
• β₄ = 0.10 (Check if borderline or significant)
Suppose each coefficient has the following standard error:
• SE(β₁) = 0.15
• SE(β₂) = 0.12
• SE(β₃) = 0.20
• SE(β₄) = 0.05
And let’s do a quick t-stat check (a short verification sketch follows this list):
• t(β₁) = 0.80 / 0.15 ≈ 5.33 → highly significant (p-value well below 0.01).
• t(β₂) = -0.40 / 0.12 ≈ -3.33 → significant (p-value below 0.01).
• t(β₃) = 0.65 / 0.20 = 3.25 → significant (p-value around 0.005).
• t(β₄) = 0.10 / 0.05 = 2.00 → borderline: it clears the 1.96 normal-approximation cutoff, but with 15 degrees of freedom the exact two-tailed critical t is about 2.13, so it narrowly misses at the 5% level.
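Here is a minimal sketch of those significance checks in Python, using the exact t distribution with n − k − 1 = 15 degrees of freedom rather than the 1.96 normal approximation (the variable names are just labels for illustration):

```python
# Two-tailed p-values for each slope coefficient, using the exact t
# distribution with n - k - 1 = 20 - 4 - 1 = 15 degrees of freedom.
from scipy import stats

coefs = {"market": (0.80, 0.15), "rates": (-0.40, 0.12),
         "sector": (0.65, 0.20), "gdp": (0.10, 0.05)}
df_resid = 20 - 4 - 1
crit = stats.t.ppf(0.975, df_resid)  # ≈ 2.13, exact 5% two-tailed cutoff
for name, (b, se) in coefs.items():
    t = b / se
    p = 2 * stats.t.sf(abs(t), df_resid)
    print(f"{name}: t = {t:.2f}, p = {p:.4f}, significant at 5%? {abs(t) > crit}")
```

Running this shows the GDP coefficient (t = 2.00, p ≈ 0.06) just failing the exact 5% test, which is precisely why an exam vignette would call it "borderline."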
Interpretation (a worked forecast using these coefficients follows the list):
• β₀ = 0.50: If all other factors are zero, you’d expect the fund to return about 0.50% in a quarter.
• β₁ = 0.80: For every 1 percentage point increase in the market (S&P 500) quarterly return, the fund’s return tends to increase by 0.80 percentage points (holding everything else constant).
• β₂ = -0.40: For every 1 percentage point increase in short-term interest rates (a large move for rates), the fund’s return is expected to decrease by 0.40 percentage points. The negative sign implies an inverse relationship, which makes sense if rising rates typically weigh on equity returns.
• β₃ = 0.65: For every 1 percentage point increase in the relevant sector’s quarterly return, the fund’s return increases by 0.65 percentage points.
• β₄ = 0.10: For each 1 percentage point increase in quarterly GDP growth, the fund’s return increases by 0.10 percentage points. A smaller coefficient, but still potentially significant.
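To see the coefficients in action, suppose (purely hypothetical numbers) next quarter brings a 3.00% market return, a 0.25% rise in short-term rates, a 4.00% sector return, and 0.80% GDP growth. The fitted equation then forecasts:

Ŷ = 0.50 + 0.80(3.00) - 0.40(0.25) + 0.65(4.00) + 0.10(0.80)
  = 0.50 + 2.40 - 0.10 + 2.60 + 0.08
  ≈ 5.48% for the quarter.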
Now we come to the detective work: checking whether the assumptions are met (a diagnostics code sketch follows this checklist).
Linearity: Does the relationship between each X and Y appear linear? We assume yes for now, though in a real scenario you might see a residual plot or a mention of curvature in the data.
No Perfect Multicollinearity: Possibly check correlation among the independent variables. If the item set says “The correlation between interest rates and GDP growth is 0.90,” that’s a red flag for multicollinearity. Our scenario doesn’t mention extreme correlations, so we’ll assume we’re good.
Error Term Has a Mean of Zero: Typically, the regression intercept is meant to handle that, so no direct concern from a quick read.
Homoskedasticity: We want constant variance of the residuals. If the vignette says “No pattern is observed in the plot of residuals vs. fitted values,” that’s encouraging. If we see something like “Non-constant variance (heteroskedasticity) is present,” that’s obviously an assumption violation.
No Autocorrelation: The Durbin–Watson (DW) statistic is 1.10, which often suggests possible positive autocorrelation (DW < 1.5 might be suspicious). Because we have time-series data (quarterly data often have momentum or correlation across time), there could be a big question mark about autocorrelation here.
Normality of Errors: Typically tested with something like a Q-Q plot or Shapiro–Wilk test, but often exam vignettes simply say “residuals appear normally distributed,” or they provide a chart. We’ll assume normal errors unless told otherwise.
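If you want to reproduce these checks outside the exam room, here is a minimal sketch of the corresponding statsmodels diagnostics, assuming a fitted OLS result and design matrix like the `model` and `X` objects from the earlier sketch:

```python
# Residual diagnostics for a fitted OLS result (e.g., `model` and `X` from
# the earlier sketch). Each check maps to one of the assumptions above.
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

def run_diagnostics(model, X):
    # No autocorrelation: DW near 2 is reassuring; values well below ~1.5
    # hint at positive serial correlation.
    print("Durbin-Watson:", round(durbin_watson(model.resid), 2))

    # Homoskedasticity: Breusch-Pagan tests whether residual variance is
    # related to the regressors (a small p-value signals heteroskedasticity).
    _, bp_pvalue, _, _ = het_breuschpagan(model.resid, X)
    print("Breusch-Pagan p-value:", round(bp_pvalue, 4))

    # Normality of errors: Jarque-Bera test on the residuals.
    _, jb_pvalue, _, _ = jarque_bera(model.resid)
    print("Jarque-Bera p-value:", round(jb_pvalue, 4))

    # No (near-)perfect multicollinearity: VIFs above roughly 5-10 are a flag.
    for i, col in enumerate(X.columns):
        print(f"VIF({col}) = {variance_inflation_factor(X.values, i):.2f}")
```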
Given the data, the largest red flag is the Durbin–Watson statistic of 1.10. Because DW ≈ 2(1 − r), where r is the first-order autocorrelation of the residuals, a DW of 1.10 implies r ≈ +0.45, pointing toward positive autocorrelation in the errors. If the exam question highlights this, we might need to state that “Yes, the model likely violates the assumption of no autocorrelation.”
• Autocorrelation: With DW = 1.10 and a sample size of 20, the critical values would likely confirm positive autocorrelation. This is common with time series. Remedies include Newey–West (HAC) standard errors, a first-difference specification, or explicitly modeling the errors as an AR(1) process (see the sketch below).
• Heteroskedasticity: We don’t see direct evidence, but we can’t rule it out. If the vignette specifically noted “the residuals fan out (widen) over time,” that would be a sign.
If the question asks, “Is the standard error estimate likely biased given the residual pattern?” you’d probably say “Yes” if there was persistent autocorrelation or heteroskedasticity.
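If serial correlation (or heteroskedasticity) is the diagnosis, one standard remedy is Newey–West (HAC) standard errors, which statsmodels supports directly. A minimal sketch, reusing the hypothetical `df` and `X` from the earlier fit:

```python
# Refit with heteroskedasticity-and-autocorrelation-consistent (HAC, i.e.,
# Newey-West) standard errors. The coefficient estimates are unchanged; only
# the standard errors (and hence t-stats and p-values) are adjusted.
import statsmodels.api as sm

hac_model = sm.OLS(df["fund_ret"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac_model.summary())  # compare these t-stats with the plain-OLS fit
```

With positive autocorrelation, the HAC standard errors are typically larger than the plain OLS ones, so a borderline coefficient like GDP growth could easily lose its significance after the adjustment.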
From the data, we see a fairly strong R-squared of 0.72, and most coefficients appear significant. The model could be quite useful. However, the presence of autocorrelation might suggest that the standard errors we’re using could be understated—meaning that some of the “t-stats” might be higher than they should be if we accounted for the correlation.
In a real exam scenario, you’d highlight:
• The overall significance looks good (F-stat is 9.64 with p < 0.001).
• Each factor matters in the expected direction (market and sector returns enter positively, interest rate changes enter negatively, and GDP growth enters positively but modestly).
• You do need to address serial correlation, which can be done by adjusting your standard errors or using a time-series approach that accounts for autocorrelation.
So how in the world do we do all this under a time crunch? Here are some tips:
• Scan for Keywords: Terms like “non-constant variance,” “increasing spread,” “patterned residuals,” “Durbin–Watson = 1.1,” or “p-values below 0.05.” These are signposts pointing to assumption violations or significance.
• Mark Up the Vignette: Under exam conditions, highlight or underline the key data. Jot quick notes in the margins about which assumption might be in trouble.
• Watch the Clock: Resist the temptation to overthink. Item sets have multiple questions. If you get stuck on one detail for too long, you lose valuable time.
• Use the Tools: You have a calculator—use it for quick t-stat approximations. If you see a coefficient is 0.80 and standard error is 0.10, you can do 0.80 / 0.10 in two seconds.
• Logical Flow: Approach each question systematically:
```mermaid
flowchart LR
    A["Read the Vignette <br/>Identify Data & Variables"] --> B["Formulate the <br/>Multiple Regression Model"]
    B --> C["Estimate Coefficients <br/>Interpret Results"]
    C --> D["Check Assumptions <br/>Residual Diagnostics"]
    D --> E["Conclude & Adjust if Needed"]
```
Think you might need a roadmap? The flowchart above is essentially your mini solution-key blueprint.
Always link these steps back to the data. The exam might present 4–6 multiple-choice items about these details (like “Which assumption is most likely violated?” or “Which coefficient is not statistically significant?”).
Two terms worth keeping straight:
• Vignette-Style Questions: CFA item sets that present a mini-case with relevant data. You have to interpret that data to answer multiple sub-questions.
• Exam-Taking Strategy: A structured approach that focuses on scanning for critical data, diagnosing any assumptions or typical pitfalls, then calculating or interpreting results in a streamlined way.
For further practice:
• Past CFA Institute Level II mock exams: Look for questions involving multiple regression output.
• Third-party CFA prep providers: Many offer specialized guides on answering item set questions under time constraints.
• Online tutorials that show you step-by-step how to read regression output. These can help you practice pattern recognition for assumption violations.
Remember, the best way to build accuracy and speed is to practice with as many item sets as possible. Time yourself, focus on reading carefully, note the given data, identify any disclaimers or red-flag phrases like “residual pattern,” “serial correlation,” or “non-constant variance,” and interpret accordingly. And hey, don’t be too hard on yourself—part of becoming proficient is messing up occasionally and learning from it. Good luck, and see you in the next chapter!