Discover how to detect multiple regression assumption violations through residual plots, understand common patterns and formal tests, and avoid misinterpretations in CFA Level II Quantitative Methods.
Well, let me tell you, I once ran a quick regression in my early days, trying to explain my caffeine consumption as a function of the time of day. I plotted the residuals (i.e., the differences between my actual coffee intake and the model’s predictions) and discovered a weird, funnel-like shape. It turned out that the relationship changed drastically in the afternoon, causing bigger and bigger errors as the day wore on. That was my introduction to the importance of checking residual plots.
In typical multiple regression analysis, diagnosing the behavior of residuals is crucial for ensuring we meet the underlying assumptions—namely homoskedasticity (constant variance), no autocorrelation, and correct functional form, among others. If these assumptions break down, so does our confidence in the estimated coefficients and their statistical significance. This section spotlights how to spot and interpret those red flags quickly using residual plots. It may save you from a misguided conclusion on the CFA exam (and in real-life investment decisions).
Residuals are simply the differences between observed data points and the fitted values from your regression. If your model is well-specified and the assumptions are met, these residuals should look random, hovering around zero with no apparent pattern.
But if, for example, you see a predictable pattern in how residuals change from left to right on a plot, you might be spotting a violation:
• A funnel-shaped “cone” of residuals getting wider indicates heteroskedasticity (unequal variance).
• A systematically up-and-down wave in residuals hints at serial correlation (autocorrelation).
• Large spikes or outliers might point to data entry errors or an incomplete model specification.
Identifying such patterns is simpler than you might think—just plot your residuals and see what’s going on visually. Then, if you see red flags, you can follow up with formal tests. Let’s dig into that process.
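Before any plotting, the residuals themselves have to be computed. The following is a minimal sketch, assuming a single predictor for brevity (the same idea carries over to multiple regression) and simulated data. The helper name `fit_simple_ols` and all of the numbers are illustrative assumptions, not part of the curriculum.

```python
import random

def fit_simple_ols(x, y):
    """Return (intercept, slope) for y = a + b*x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Simulated, well-behaved data (assumed for illustration only)
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

a, b = fit_simple_ols(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - y_hat_i

# With an intercept in the model, OLS residuals average to zero by construction
mean_resid = sum(residuals) / len(residuals)
print(f"intercept={a:.3f}, slope={b:.3f}, mean residual={mean_resid:.2e}")
```

Because the residuals always average to zero when the model has an intercept, a zero mean by itself says nothing about model quality. The pattern around zero is what matters.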
It’s usually a good idea to start with three main residual plots:
1. Residuals vs. Fitted Values
2. Residuals vs. Each Independent Variable
3. Residuals over Time (if you have time‑series data)
In the residuals vs. fitted values plot, you place the fitted (predicted) values of the dependent variable along the x‑axis and the residuals (eᵢ) on the y‑axis. Picture each dot at coordinates (ŷᵢ, eᵢ).
• If the assumptions hold, the dots should be randomly dispersed around zero, forming a horizontal band.
• A “fanning” (or funnel) shape suggests that residuals grow in magnitude as fitted values increase (heteroskedasticity).
• A curved shape could suggest that you’re missing a key nonlinear component in the model (functional form misspecification).
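A funnel can also be checked numerically by comparing the spread of residuals at low and high fitted values. The sketch below is a rough stand-in for eyeballing the plot, assuming simulated data whose error standard deviation grows with the predictor. The data-generating process and the thirds-based comparison are illustrative choices, not a formal test.

```python
import random

random.seed(1)
n = 600
x = [random.uniform(1, 10) for _ in range(n)]
# Assumed data-generating process: error standard deviation grows with x
y = [2.0 + 1.5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# One-predictor OLS fit
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

def spread(values):
    """Standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

# Sort by fitted value, then compare the bottom and top thirds
ordered = sorted(zip(fitted, residuals))
third = n // 3
low_spread = spread([e for _, e in ordered[:third]])
high_spread = spread([e for _, e in ordered[-third:]])
ratio = high_spread / low_spread
print(f"spread (low fitted)={low_spread:.2f}, spread (high fitted)={high_spread:.2f}")
print(f"ratio={ratio:.2f}")
```

A ratio well above 1 is the numerical counterpart of a fan that widens to the right. A ratio near 1 is what a horizontal band would produce.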
In addition to plotting residuals against fitted values, it’s helpful to check them against each predictor (Xⱼ). This highlights whether any single predictor exhibits a systematic pattern:
• If we see an arched pattern for X₂, for example, that might indicate a polynomial term is necessary.
• Increasing or decreasing variance as Xⱼ changes can also be an early sign of heteroskedasticity or outliers.
• Non-random groupings of residuals might indicate that you need a dummy variable to capture distinct subgroups in the data.
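The arched pattern described above can be illustrated with a small sketch. It assumes simulated data in which the true relationship is quadratic but only a straight line is fitted. Averaging residuals within the low, middle, and high ranges of the predictor is an illustrative shortcut for reading the plot.

```python
import random

random.seed(5)
n = 300
x = [random.uniform(0, 10) for _ in range(n)]
# Assumed truth is quadratic, but the fitted model below is linear
y = [1.0 + 0.3 * xi ** 2 + random.gauss(0, 1) for xi in x]

# Misspecified one-predictor OLS fit (no squared term)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Average residual in the low, middle, and high thirds of x
ordered = sorted(zip(x, residuals))
third = n // 3
low_mean = sum(e for _, e in ordered[:third]) / third
mid_mean = sum(e for _, e in ordered[third:2 * third]) / third
high_mean = sum(e for _, e in ordered[2 * third:]) / (n - 2 * third)
print(f"mean residual: low={low_mean:.2f}, mid={mid_mean:.2f}, high={high_mean:.2f}")
```

Positive averages at both ends with a negative average in the middle trace out a U shape, which is the kind of curvature that suggests adding a squared term.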
When your data has a time component—such as daily stock returns or monthly sales—always check how residuals evolve across time:
• If residuals exhibit repetitive patterns or cycles, you may be dealing with autocorrelation.
• Sudden spikes could reflect an external shock (e.g., a policy announcement, market meltdown), an outlier, or a regime change.
• Sometimes, a single big outlier can throw off the entire model, so investigating timestamps around that outlier is helpful.
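For time-ordered residuals, two quick numbers summarize what the eye looks for: the correlation between each residual and the one before it, and how often the residuals change sign. The sketch below assumes simulated errors that follow an AR(1) process with a persistence parameter of 0.7. That value and the helper name `lag1_autocorr` are assumptions for illustration.

```python
import random

random.seed(2)
n = 400
phi = 0.7  # assumed persistence of the error process

x = [random.gauss(0, 1) for _ in range(n)]
errors = []
prev = 0.0
for _ in range(n):
    prev = phi * prev + random.gauss(0, 1)  # AR(1) errors
    errors.append(prev)
y = [0.5 + 1.2 * xi + ei for xi, ei in zip(x, errors)]

# One-predictor OLS fit; observations stay in time order
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def lag1_autocorr(e):
    """Sample correlation between e[t] and e[t-1]."""
    mean = sum(e) / len(e)
    num = sum((e[t] - mean) * (e[t - 1] - mean) for t in range(1, len(e)))
    den = sum((v - mean) ** 2 for v in e)
    return num / den

r1 = lag1_autocorr(residuals)
sign_changes = sum(1 for t in range(1, n) if residuals[t] * residuals[t - 1] < 0)
print(f"lag-1 autocorrelation={r1:.2f}, sign changes={sign_changes} of {n - 1}")
```

Independent residuals would change sign roughly half the time. Far fewer sign changes, together with a clearly positive lag-1 correlation, correspond to the long runs seen in a wave-like plot.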
It’s one thing to say “Look out for patterns” and quite another to recognize them in the wild. Here’s a quick cheat sheet:
• Funnel Shape (aka Cone or “Megaphone” Shape): This typically signals heteroskedasticity. The variance of errors is not constant across different levels of fitted values or a specific independent variable.
• Cyclical or Wave-Like Pattern: Suggests serial correlation. Long runs of same-signed residuals point to positive serial correlation (observations close in time are overly similar), while residuals that flip sign from one observation to the next point to negative serial correlation.
• Vertical Outliers: Data points with extremely large residuals might reflect poorly measured data or conditions outside the range your model was built to handle.
• Shifting Mean of Residuals: If the average of your residuals seems to shift away from zero in certain ranges, your model might be missing a relevant variable or a structural shift.
The process for analyzing residuals is pretty straightforward:
flowchart LR
    A["Generate Residuals from OLS Model"] --> B["Plot Residuals vs. Fitted Values"]
    B --> C["Plot Residuals vs. Each Independent Variable"]
    C --> D["Plot Residuals Over Time (if time-series)"]
    D --> E["Inspect for Patterns<br/>(Funnel, Cycles, Outliers)"]
    E --> F["Apply Formal Tests (BP,<br/>DW, or Ljung-Box)"]
    F --> G["Adjust Model as Needed"]
1. Generate Residuals from the OLS Model: The residuals eᵢ = yᵢ - ŷᵢ are derived from your initial regression output. That’s your raw material for diagnostics.
2. Plot Residuals Against Fitted Values and Each Independent Variable: Check each plot. Focus on whether residuals appear scattered randomly around zero.
3. Check for Randomness of Residual Distribution: A distribution that’s clumping or fanning out means you have to dig deeper.
4. Investigate Patterns Violating the Assumptions: This is where you note whether the pattern might be heteroskedastic, autocorrelated, or otherwise suspicious.
5. Apply Additional Formal Tests (if needed):
• Breusch–Pagan test for heteroskedasticity.
• Durbin–Watson or Ljung–Box test for autocorrelation.
• White Test for general forms of heteroskedasticity.
6. Take Corrective Actions: Adjust your model specification or use robust standard errors. (More on these solutions in Chapter 4: Model Misspecification.)
If you spot or suspect a funnel shape, a Breusch–Pagan (BP) test can confirm heteroskedasticity. The general idea is to regress the squared residuals on the original predictors (or a set of variables suspected of driving the changing variance). A significant BP statistic indicates that residuals’ variance is not constant.
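The auxiliary regression described above can be sketched by hand. This example assumes the n·R² form of the statistic, which is compared with a chi-square distribution whose degrees of freedom equal the number of predictors in the auxiliary regression (one here, so the 5% critical value is 3.841). The data are simulated and the helper `ols` is an illustrative name.

```python
import random

def ols(x, y):
    """Return (intercept, slope, r_squared) for a one-predictor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

random.seed(3)
n = 600
x = [random.uniform(1, 10) for _ in range(n)]
# Assumed heteroskedastic data: error standard deviation grows with x
y = [2.0 + 1.5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Step 1: original regression and its squared residuals
a, b, _ = ols(x, y)
squared_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 2: auxiliary regression of squared residuals on the predictor
_, _, aux_r2 = ols(x, squared_resid)

# Step 3: test statistic and comparison with the chi-square critical value
bp_stat = n * aux_r2
CHI2_CRIT_1DF_5PCT = 3.841
reject_constant_variance = bp_stat > CHI2_CRIT_1DF_5PCT
print(f"BP statistic={bp_stat:.1f}, reject constant variance: {reject_constant_variance}")
```

The test is one-tailed: only a large statistic counts as evidence, because it means the predictors explain a meaningful share of the variation in squared residuals.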
The Durbin–Watson (DW) statistic is primarily used for detecting first-order autocorrelation in the residuals when you have time-series or otherwise sequentially ordered data:
• DW ranges from 0 to 4.
• A value near 2 implies little to no autocorrelation.
• Values closer to 0 or 4 suggest strong positive or negative autocorrelation, respectively.
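The ranges above follow from how the statistic is built: DW is the sum of squared changes in consecutive residuals divided by the sum of squared residuals, which is approximately 2(1 − r), where r is the lag-1 correlation of the residuals. Here is a minimal sketch using two made-up residual sequences chosen only to show the extremes.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residuals: long runs of the same sign (positive autocorrelation)
persistent = [1, 1, 1, -1, -1, -1]
# Made-up residuals: sign flips every period (negative autocorrelation)
alternating = [1, -1, 1, -1, 1, -1]

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(f"{dw_persistent:.3f}")   # 0.667, well below 2
print(f"{dw_alternating:.3f}")  # 3.333, well above 2
```

Consecutive residuals that barely change keep the numerator small and push DW toward 0, while residuals that swing back and forth inflate the numerator and push DW toward 4.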
Another approach for time-series models, the Ljung–Box statistic is computed for various lag lengths to test for correlation in residuals at multiple time lags, not just the first lag.
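The Ljung–Box statistic pools the squared sample autocorrelations across several lags: Q = n(n + 2) Σ ρ̂ₖ² / (n − k), summed over lags k = 1 to m. The sketch below assumes simulated AR(1) residuals, five lags, and a plain chi-square comparison with m degrees of freedom (11.070 at the 5% level for m = 5). It ignores any degrees-of-freedom adjustment for fitted time-series parameters.

```python
import random

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Simulated residuals with assumed AR(1) persistence of 0.6
random.seed(6)
n = 300
residuals = []
prev = 0.0
for _ in range(n):
    prev = 0.6 * prev + random.gauss(0, 1)
    residuals.append(prev)

q_stat = ljung_box_q(residuals, max_lag=5)
CHI2_CRIT_5DF_5PCT = 11.070
print(f"Q(5)={q_stat:.1f}, reject no autocorrelation: {q_stat > CHI2_CRIT_5DF_5PCT}")
```

Because Q accumulates evidence across lags, it can flag autocorrelation that sits at lag 2 or beyond, which a first-order check such as Durbin–Watson is not designed to detect.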
Let’s be honest: if your standard errors are off because of heteroskedasticity, you might be misjudging which coefficients are statistically significant. Similarly, if positive autocorrelation is present, you risk underestimating standard errors and concluding something is more certain than it is. In a real-world context, that could mean an asset allocation strategy that overlooks latent risks or mistimes a market entry. In the CFA exam context, you might get item-set questions where you have to identify these violations and propose appropriate adjustments.
Suppose you’re modeling daily returns of a single stock (Rₜ) on a market index (Mₜ):
Rₜ = α + βMₜ + εₜ
After fitting this regression, you plot the residuals eₜ (y-axis) against the fitted values R̂ₜ (x-axis). You notice the residuals start off snug around zero for low predicted returns, but as you move to higher predicted returns, they spread out widely in both directions. This is a classic funnel shape. You suspect heteroskedasticity because on high-volume or high-volatility days, your errors balloon.
To confirm, run a Breusch–Pagan test. If the test comes out significant, the conventional OLS standard errors are unreliable and may understate or overstate statistical significance, so you might consider heteroskedasticity-robust standard errors or generalized least squares.
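To see what switching to robust standard errors changes, the sketch below compares the conventional OLS standard error of β with White's heteroskedasticity-consistent (HC0) version for a one-predictor market model. HC0 is one common robust estimator, chosen here as an assumption since the text does not name a specific one. The returns are simulated, with error dispersion assumed to grow with the size of the market move.

```python
import math
import random

random.seed(4)
n = 1000
# Simulated daily returns in percent (assumed values, not real data)
market = [random.gauss(0, 1) for _ in range(n)]
stock = [0.02 + 1.2 * m + random.gauss(0, 0.3 + 0.8 * abs(m)) for m in market]

# Fit R_t = alpha + beta * M_t by OLS
m_bar = sum(market) / n
r_bar = sum(stock) / n
sxx = sum((m - m_bar) ** 2 for m in market)
beta = sum((m - m_bar) * (r - r_bar) for m, r in zip(market, stock)) / sxx
alpha = r_bar - beta * m_bar
resid = [r - (alpha + beta * m) for m, r in zip(market, stock)]

# Conventional standard error assumes one common error variance
s2 = sum(e * e for e in resid) / (n - 2)
conventional_se = math.sqrt(s2 / sxx)

# HC0 standard error lets each observation carry its own squared residual
robust_se = math.sqrt(sum((m - m_bar) ** 2 * e * e for m, e in zip(market, resid))) / sxx

print(f"beta={beta:.3f}")
print(f"conventional SE={conventional_se:.4f}, robust SE={robust_se:.4f}")
```

In this setup the largest errors occur on the days with the largest market moves, which are also the observations that matter most for estimating β, so the conventional formula understates the uncertainty. The coefficient estimate itself is unchanged; only the standard error, and therefore the t-statistic, moves.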
• Residual Plots: Visual representations of eᵢ = yᵢ - ŷᵢ that help diagnose whether the model’s assumptions (constant variance, no autocorrelation, correct functional form) hold.
• Funnel-Shaped Residuals: A telltale sign of heteroskedasticity, where error variance increases with predicted values or a specific independent variable.
• Runs/Persistent Patterns: Residuals appearing in clusters or waves, indicating autocorrelation.
• Durbin–Watson Statistic: A test statistic that detects the presence of first-order autocorrelation. A value near 2 suggests no first-order autocorrelation.
• Develop a habit of quickly plotting residuals once you run a regression. Visual checks can often alert you to issues before you even reach for a p-value.
• Memorize major signs: funnel shape → likely heteroskedasticity; wave-like pattern → possible autocorrelation; odd curves or shifts → functional form issues.
• If you see outliers, investigate them individually. Could be data entry errors, or you might be missing a crucial variable.
• The CMA (Correct Model Attitude) is: “Test, see, fix.” Don’t ignore warnings just to keep life simple. The market doesn’t care if you prefer a neat or easy model.
• Practice applying these diagnostic tests to item sets under time pressure. In the exam, you might see a mini-vignette with a regression output, some suspicious residual plot, and a question on how to interpret it.
And that’s pretty much the gist: when in doubt, do the residual plots. If they flash any bright patterns, investigate further with formal tests, and take corrective measures to ensure your results remain reliable.