Explore how incorrectly specifying regression models leads to biased estimates, inefficient results, and poor decisions, with real‑world finance examples for CFA Level II candidates.
Imagine you’re trying to predict tomorrow’s stock returns and, well, your model is just not cooperating. You keep seeing bizarre results—or results that sound too good to be true (which usually means they’re false!). This scenario often hints at “model misspecification,” a fancy phrase for “You might be missing some essential structure in your regression.” In more precise terms, model misspecification simply means the regression equation doesn’t mirror the real mechanisms behind your data. Perhaps you excluded a critical factor, or you used a linear model when you should have used a non-linear one.
For CFA Level II candidates, spotting and correcting these issues is crucial. Misspecified models can skew everything from factor exposures to risk estimates. If you’re an investment analyst relying on data insights to pick stocks, an overlooked variable or a wrong functional form can lead you astray. Let’s dig in and see where these mistakes can creep up, how to detect them, and what they mean for your forecasting and policy decisions.
Model misspecification occurs when the functional form or variables included in a regression do not properly reflect the true relationship between the dependent variable and the independent variable(s). In other words, it’s like trying to fit a square peg into a round hole: the mathematical structure you assume just won’t match the reality of the data. And because markets aren’t forgiving, your model parameters—coefficients, inference statistics, everything—can be off.
Below is a quick diagram showing how misspecification can happen in a simplified regression context:
flowchart LR
    A["True Data-Generating Process (DGP)"] --> B["Actual Stock Returns"]
    A --> C["Critical Factors (e.g., size, value, sentiment)"]
    D["Misspecified Model"] --> E["Estimation <br/> (OLS, etc.)"]
    C --> E
    E --> F["Biased or Inefficient <br/> Coefficient Estimates"]
    B --> F
As you can see, the model labeled “D” may not account for certain critical factors (like “C”), leading to inaccurate (biased or inefficient) coefficient estimates in “F.”
Omitted Variable Bias (OVB) is what you get when you leave out a variable that actually matters to your regression. If that left‑out variable is correlated with one of the variables you’ve kept in the model, your coefficient estimates for the included variable(s) are likely to be off—sometimes significantly. This is the most classic form of misspecification.
Let’s say you’re modeling expected stock returns with factors including market beta and momentum. But you forget firm size—a known influencer of stock returns. If the size factor happens to be correlated with momentum, your estimated momentum coefficient might be either spuriously inflated or deflated.
Why does this matter? If you’re an analyst deciding whether a small cap is undervalued, ignoring the size factor may cause you to misjudge the real effectiveness of momentum signals. The end result: misguided portfolio positions and potential loss.
Mathematically, consider you have a true data‑generating process (DGP):
y = β₀ + β₁x₁ + β₂z + ε    (1)
But you estimate:
y = α₀ + α₁x₁ + ν    (2)
If z is correlated with x₁, the coefficient α₁ in the misspecified model will be biased (i.e., E[α₁] ≠ β₁). This means your entire analysis built on α₁ is potentially misleading.
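To see the bias mechanically, here is a minimal simulation sketch (the coefficient values and variable names are illustrative assumptions, and it presumes NumPy and statsmodels are installed). It generates data from the true DGP in equation (1), then fits both the full model and the misspecified model in equation (2):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5_000

# True DGP: y = 1.0 + 0.5*x1 + 0.8*z + eps, where z is correlated with x1
x1 = rng.normal(size=n)
z = 0.6 * x1 + rng.normal(scale=0.8, size=n)   # correlation with x1 drives the bias
eps = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.8 * z + eps

# Correctly specified model: slope on x1 lands near the true 0.5
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, z]))).fit()

# Misspecified model omitting z: slope on x1 absorbs part of z's effect
omitted = sm.OLS(y, sm.add_constant(x1)).fit()

print("beta1, full model: ", round(full.params[1], 3))     # ~0.5
print("alpha1, omitting z:", round(omitted.params[1], 3))  # ~0.98

Under these assumed numbers, the omitted-variable slope drifts toward roughly 0.5 + 0.8 × 0.6 ≈ 0.98, which matches the textbook bias expression β₁ + β₂ · Cov(x₁, z)/Var(x₁).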
Including irrelevant variables—those not correlated with your dependent variable—doesn’t necessarily create bias. But it can introduce a fair bit of “noise.” In other words, your model might feel inflated or cluttered. This often increases the variance of your estimated coefficients, which means:
• Higher standard errors
• Lower t‑statistics
• Less precise estimates
The classic pitfall is overfitting. You throw everything into your model (maybe because you think more is better) and end up with an equation so tailor‑made to your sample that it fails to generalize out of sample. When you go to forecast, performance can tank. Overfitting is the ultimate frenemy: it looks superb on in-sample metrics, but out of sample it can collapse.
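The pattern is easy to reproduce. The sketch below (purely simulated data; the factor counts and sample sizes are assumptions for illustration, with statsmodels assumed available) adds 30 pure-noise regressors to a one-factor model and compares in-sample and out-of-sample R²:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_train, n_test, n_noise = 120, 120, 30

def make_sample(n):
    x = rng.normal(size=n)
    y = 0.4 * x + rng.normal(size=n)          # one real driver
    noise = rng.normal(size=(n, n_noise))     # irrelevant regressors
    return y, x, noise

y_tr, x_tr, noise_tr = make_sample(n_train)
y_te, x_te, noise_te = make_sample(n_test)

# Parsimonious model: the single relevant factor
X_tr, X_te = sm.add_constant(x_tr), sm.add_constant(x_te)
lean = sm.OLS(y_tr, X_tr).fit()

# Cluttered model: relevant factor plus 30 pure-noise variables
Xn_tr = sm.add_constant(np.column_stack([x_tr, noise_tr]))
Xn_te = sm.add_constant(np.column_stack([x_te, noise_te]))
fat = sm.OLS(y_tr, Xn_tr).fit()

def oos_r2(model, X, y):
    resid = y - model.predict(X)
    return 1 - resid.var() / y.var()

print("In-sample R2:  lean", round(lean.rsquared, 3), "| cluttered", round(fat.rsquared, 3))
print("Out-of-sample: lean", round(oos_r2(lean, X_te, y_te), 3),
      "| cluttered", round(oos_r2(fat, Xn_te, y_te), 3))

Typically the cluttered model posts the higher in-sample R² but the weaker out-of-sample fit—exactly the overfitting signature described above.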
Measuring your variables incorrectly is notorious for causing trouble. Depending on whether the error is random (classical measurement error) or systematic (biased measurement tool), the implications differ.
• Classical Measurement Error
– If it’s in the dependent variable, your coefficient estimates remain unbiased but become less precise.
– If it’s in the independent variable, the coefficient estimates tend to be biased toward zero (attenuation bias; see the code sketch after this list).
• Systematic Measurement Error
– This occurs when your measuring tool systematically underestimates or overestimates the true value. For example, systematically under‑reporting intangible assets on a balance sheet.
– You end up with biased estimates, and in our portfolio or equity analysis context, that can cause big misinterpretations of risk exposures or factor loadings.
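Attenuation bias is easy to demonstrate with a short simulation as well. The sketch below (synthetic data, illustrative parameter values, statsmodels assumed) adds classical random noise to the regressor and shows the estimated slope shrinking toward zero:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

x_true = rng.normal(size=n)                  # the regressor we wish we observed
y = 2.0 + 1.0 * x_true + rng.normal(size=n)  # true slope is 1.0

# Classical (random) error in the observed regressor
x_obs = x_true + rng.normal(scale=1.0, size=n)

clean = sm.OLS(y, sm.add_constant(x_true)).fit()
noisy = sm.OLS(y, sm.add_constant(x_obs)).fit()

# Attenuation factor here is Var(x)/(Var(x) + Var(error)) = 0.5
print("Slope with true x: ", round(clean.params[1], 3))   # ~1.0
print("Slope with noisy x:", round(noisy.params[1], 3))   # ~0.5

With equal variances for the true regressor and the measurement noise, theory says the slope should shrink by a factor of one-half, which is roughly what the simulation reports.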
I once worked on a project involving credit default swaps (CDS) spreads where the data vendor systematically reported certain emerging market CDS quotes on a rolling two-day lag. Didn’t realize the mismatch until results started looking… messy. In hindsight, that was measurement error at its finest—systematic, not random. We had to rebuild the entire dataset with correct timing, and only then did the regression coefficients behave as expected.
If your model is purely linear but the real relationship is curved (like a quadratic), you’ll see patterns in your residual plots (e.g., residuals fanning out or systematically curving around zero). This is a giveaway that you might need a transformed or polynomial term.
For instance, the relationship between a company’s market capitalization and returns might have diminishing effects after a certain point. If you don’t capture that quadratic effect, you could get biased inferences.
Sometimes two variables interact. For example, in corporate finance, an increase in interest rates might affect large firms differently than small firms. If you think these effects are simply additive, you could be missing cross-effects. Adding an interaction term can significantly improve the model.
• Residual plots: if you see systematic curvature or patterns, that’s a big red flag.
• Statistical tests or information criteria (like Akaike Information Criterion—AIC, or the Bayesian Information Criterion—BIC) might suggest adding polynomial or interaction terms (see the sketch after this list).
• Domain knowledge: if you suspect an effect that’s not purely linear, test it out.
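As a concrete illustration of the second bullet, the sketch below simulates an assumed concave size–return relationship (all numbers are illustrative; statsmodels assumed), fits a linear and a quadratic specification, compares AIC/BIC, and runs a crude residual check:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_000

size = rng.normal(size=n)   # e.g., standardized log market cap
ret = 0.05 + 0.03 * size - 0.02 * size**2 + rng.normal(scale=0.05, size=n)

linear = sm.OLS(ret, sm.add_constant(size)).fit()
quad = sm.OLS(ret, sm.add_constant(np.column_stack([size, size**2]))).fit()

# Lower AIC/BIC favors the quadratic specification when the true effect is curved
print("Linear    AIC:", round(linear.aic, 1), " BIC:", round(linear.bic, 1))
print("Quadratic AIC:", round(quad.aic, 1), " BIC:", round(quad.bic, 1))

# Crude residual check: linear-fit residuals should trend with size^2
corr = np.corrcoef(linear.resid, size**2)[0, 1]
print("Corr(linear residuals, size^2):", round(corr, 2))

Under these assumptions, the quadratic specification posts the lower AIC/BIC, and the linear model’s residuals correlate noticeably with size², the tell-tale curvature pattern you would also see in a residual plot.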
Once your model is off, you can expect:
• Biased Coefficients: Wrong estimates of factor exposures or risk sensitivities.
• Distorted Standard Errors: Leading to misjudgment of significance levels.
• Spurious Relationships: You might find a “significant” relationship that’s an artifact of misspecification.
• Poor Forecast Accuracy: GIGO—Garbage In, Garbage Out. If your specification is wrong, your predictive power falls apart.
• Suboptimal Decisions: In portfolio management or policy analysis, flawed inferences can translate to the wrong capital allocation, misinformed strategic moves, or risk that is systematically overlooked.
You’ll likely face complex item sets on the CFA exam that test your ability to pick apart regression outputs. So, you want to be a pro at recognizing:
• Telltale signs of omitted variables (maybe through strange residual patterns or large shifts in coefficients when a new variable is introduced).
• Overfitting and how it can lead to seemingly strong R² but weak out‑of‑sample performance.
• Correctly specifying the functional form and looking out for potential interactions.
• Ensuring your data is accurate and not systematically off, which can hamper your entire analysis.
In real-world finance, your job may be to present or vet models that drive investment strategies, risk measurement, or valuation. If the model is misspecified, you might be underestimating risk exposures or inflating expected returns. That can lead to serious underperformance or unrecognized vulnerabilities in a portfolio.
• Consult Residual Plots: A quick check can reveal structural patterns or major red flags.
• Start Simple, Add Gradually: Begin with parsimonious models and only add variables if there’s a theoretical or data-driven rationale.
• Watch for Overfitting: Use holdout samples or cross-validation when possible. If your model’s brilliance disappears in the holdout, it’s probably overfitted.
• Use Domain Knowledge: If there’s a well-known driver (e.g., sector dummy in equity returns), don’t omit it just because you can’t find clean data. Data cleaning is often worth the extra effort.
• Sensitivity Analysis: Vary the specification. If your main results change drastically with minor specification tweaks, you might be dealing with misspecification; a simple version of this check is sketched below.
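One simple sensitivity routine is to re-estimate the coefficient you care about under several nested specifications and see whether it stays stable. The sketch below uses an illustrative simulated panel (the column names irate, size, and mom are assumptions, not curriculum notation) with statsmodels’ formula interface:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative panel: replace with your own return/factor data
rng = np.random.default_rng(5)
n = 500
df = pd.DataFrame({
    "irate": rng.normal(size=n),
    "size": rng.normal(size=n),
    "mom": rng.normal(size=n),
})
df["ret"] = 0.02 - 0.30 * df["irate"] + 0.10 * df["size"] + rng.normal(scale=0.5, size=n)

# Re-estimate the coefficient of interest (irate) under several specifications
specs = [
    "ret ~ irate",
    "ret ~ irate + size",
    "ret ~ irate + size + mom",
]
for formula in specs:
    fit = smf.ols(formula, data=df).fit()
    print(f"{formula:28s} -> irate coef: {fit.params['irate']: .3f}")

# If the key coefficient swings wildly across specifications,
# suspect omitted variables or another form of misspecification.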
Assume a regression (simplified for demonstration):
y = β₀ + β₁ × (Interest Rate) + β₂ × (Firm Size) + ε
Where y is stock return, Interest Rate is your measure of the current interest environment, and Firm Size is a log of market cap. Suppose in reality you also need an interaction term:
y = β₀ + β₁ × (Interest Rate) + β₂ × (Firm Size) + β₃ × (Interest Rate × Firm Size) + ε
If β₃ ≠ 0 but you fail to include it, your estimates of β₁ and β₂ might incorrectly reflect the effects of interest rates and size on returns. A negative interest rate environment might be catastrophic for smaller firms but less so for large, diversified blue-chips. Without the interaction, your final recommendation might ignore the unique vulnerabilities of small firms, leading to a potentially flawed portfolio tilt.
Below is a short Python snippet (using statsmodels’ formula interface) that checks for an interaction:
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to be a DataFrame with one row per observation and columns:
#   ret (stock return), irate (interest-rate measure), size (log market cap)

# Baseline specification without the interaction
model_simple = smf.ols("ret ~ irate + size", data=df).fit()
print(model_simple.summary())

# Add the interaction term and re-estimate
df['interaction'] = df['irate'] * df['size']
model_interact = smf.ols("ret ~ irate + size + interaction", data=df).fit()
print(model_interact.summary())

# Equivalent shorthand: the formula "ret ~ irate * size" expands to the
# main effects plus the irate:size interaction automatically.
If the coefficient on “interaction” is significant and the residual plots look healthier, you’ve likely corrected a functional form misspecification.
• Omitted Variable Bias: Distortion in estimated coefficients from excluding a relevant factor.
• Functional Form: The assumed mathematical structure (linear, log‑linear, polynomial, etc.) of a regression.
• Measurement Error: Discrepancy between actual and recorded values, which can bias or reduce the efficiency of estimates.
• Overfitting: Including too many variables or using an overly complex specification that fits noise rather than true relationships.
• Bias: Systematic deviation between an estimator’s expected value and the true parameter.
• Efficiency: The precision of an estimator (lower variance is more efficient).
• Systematic Error: Error that consistently skews the measurement or estimates in one direction.
• Noise: Random variation not explained by the model.
• Read the Vignette Carefully: Identify relevant variables the passage hints at. A missing factor or interaction may point to potential misspecification.
• Inspect the Tables: Look at the residual patterns or note if R² improvements are suspiciously large when you add or remove variables.
• Time Management: Don’t get bogged down in re-deriving formulas. Focus on conceptual understanding for quick detection of misspecification.
• Know the Common Culprits: Omitted variables, incorrect functional forms, and measurement error are tested frequently.
• Keep the Big Picture in Mind: If the scenario implies a nonlinear effect or cross-factor synergy, check for an interaction term or polynomial expansion in the answer choices.
• Greene, W. H. Econometric Analysis (7th ed.). New York: Pearson.
• Wooldridge, J. M. Introductory Econometrics: A Modern Approach (6th ed.). Boston: Cengage.
• Kennedy, P. A Guide to Econometrics. Malden, MA: Blackwell Publishing.
• Academic Journals: The Journal of Finance, The Review of Financial Studies.
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.