Learn how to estimate causal treatment effects in panel data settings using Differences-in-Differences (DiD), interpret the impact of policy interventions, and handle pitfalls with parallel trends and unbalanced data.
If you’ve ever wondered whether a new regulation or policy really changed market behavior—or if financial metrics would have changed anyway—Differences-in-Differences (DiD) might be your best friend. In plain terms, DiD helps you compare what happened in a group affected by a policy (the “treatment” group) with what happened in a similar group that wasn’t affected (the “control” group). Then it subtracts out any common trends that both groups would have experienced over time.
I remember a time when I helped analyze a tax incentive policy that only impacted large manufacturing firms. We weren’t sure if it was truly the tax incentive that drove the resulting uptick in profitability, or if there was a general economic boom affecting everyone. By using DiD, we teased out that difference. And guess what? The policy did matter, but only half as much as the initial raw data suggested. That is precisely the power of DiD.
Now, let’s break down the mechanics, assumptions, and best practices you’ll need for your CFA® 2025 Level II exam and real-world applications.
DiD analysis is typically done in a panel data framework (sometimes cross-sectional time series) with observations across two groups over at least two time periods. One group experiences a “treatment” (like a policy, regulation, or event) at a specific point in time, while the comparison group does not.
• Treatment Group: Receives an intervention at time tᵢ.
• Control Group: Remains unaffected by the intervention.
• Pre-Treatment Period: Observations before the intervention.
• Post-Treatment Period: Observations after the intervention.
The math behind DiD is straightforward on the surface, but do note the critical assumption: in the absence of any intervention, the treated and control groups would have followed parallel trends over time. That is, if you plotted the outcome variable for both groups from pre- to post-intervention, you’d see lines that move in roughly the same direction and by the same magnitude, just offset by some constant difference. If that parallel relationship holds, any additional divergence you see after the policy is presumably due to the treatment itself.
A very common expression for Differences-in-Differences is:
$$ \text{DiD} = \bigl(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\bigr) \;-\; \bigl(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\bigr), $$
where
– \( \bar{Y}_{T,\text{post}} \) is the average outcome for the treatment group in the post-treatment period.
– \( \bar{Y}_{T,\text{pre}} \) is the average outcome for the treatment group in the pre-treatment period.
– \( \bar{Y}_{C,\text{post}} \) is the average outcome for the control group in the post-treatment period.
– \( \bar{Y}_{C,\text{pre}} \) is the average outcome for the control group in the pre-treatment period.
This difference-between-differences is the estimated average treatment effect of the policy change, net of time trends that affect everyone and net of pre-existing differences between the groups.
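If you want to verify the arithmetic yourself, here is a minimal pandas sketch of that four-means calculation. The DataFrame df and its columns (outcome, treat, post) are hypothetical placeholders for illustration, not data from the curriculum:

```python
import pandas as pd

# Hypothetical panel: 'treat' flags the treatment group, 'post' flags the post-period.
df = pd.DataFrame({
    "treat":   [1, 1, 1, 1, 0, 0, 0, 0],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "outcome": [5.0, 6.2, 5.2, 6.4, 5.1, 5.6, 5.3, 5.8],
})

# Average outcome in each of the four group/period cells.
means = df.groupby(["treat", "post"])["outcome"].mean()

# DiD = (treated post - treated pre) - (control post - control pre)
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(f"DiD estimate: {did:.2f}")
```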
DiD can also be estimated with a regression that typically looks like this:

$$ Y_{it} = \beta_0 + \beta_1 \,\text{Treatment}_i + \beta_2 \,\text{Post}_t + \beta_3 \,\bigl(\text{Treatment}_i \times \text{Post}_t\bigr) + \varepsilon_{it}, $$

where:
• \( Y_{it} \): outcome of entity \( i \) at time \( t \).
• \( \text{Treatment}_i \): dummy (1 if entity \( i \) is in the treatment group, 0 if in control).
• \( \text{Post}_t \): dummy (1 if time period is “post-treatment,” 0 otherwise).
• \( \text{Treatment}_i \times \text{Post}_t \): interaction term capturing the DiD effect.
In this specification:
– \(\beta_3\) measures the causal treatment effect, i.e., the difference-in-differences.
– \(\beta_1\) picks up average differences between treatment and control (constant over time).
– \(\beta_2\) picks up any differences over time that affect both groups equally.
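To see why \(\beta_3\) is the DiD estimate, write out the four group-period means implied by the regression (ignoring the error term):

$$
\begin{aligned}
E[Y \mid \text{Control, pre}] &= \beta_0, \\
E[Y \mid \text{Control, post}] &= \beta_0 + \beta_2, \\
E[Y \mid \text{Treatment, pre}] &= \beta_0 + \beta_1, \\
E[Y \mid \text{Treatment, post}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3.
\end{aligned}
$$

Subtracting the control group’s pre-to-post change from the treatment group’s pre-to-post change cancels \(\beta_1\) and \(\beta_2\), leaving exactly \(\beta_3\).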
When reading a CFA exam vignette, look out for clues like “A new regulation is introduced in 2022 that applies only to certain banks based on asset size.” You might have a dataset from 2020, 2021 (pre) and 2023, 2024 (post). The question might ask, “How do you interpret the coefficient on the interaction term between Post and Treated banks?” That’s your DiD coefficient.
Below is a simple diagram to visualize the concept of DiD. Think of it like two lines, one for the control group, one for the treatment group, each measured before and after the policy:
```mermaid
graph TB
    A["Control Group <br/>Pre-Treatment"] --> B["Control Group <br/>Post-Treatment"]
    C["Treatment Group <br/>Pre-Treatment"] --> D["Treatment Group <br/>Post-Treatment"]
```
If we took only the difference between C→D, we’d overstate (or understate) the policy’s effect if both groups were generally on an upward (or downward) trend. By subtracting A→B (the difference for the control group) from C→D, we get the net effect.
Let’s say a new regulation was introduced in 2021 that only targets “large” mutual funds—perhaps something about additional disclosure or constraints on portfolio composition. We have data from 2019 and 2020 (pre-regulation) and 2022 (post-regulation). We want to see the average effect on fund performance (e.g., net return) among large vs. small mutual funds.
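For concreteness, suppose the average net returns came out as follows (purely hypothetical numbers, used only to illustrate the arithmetic):

| Group | Pre-regulation avg. return | Post-regulation avg. return | Change |
|---|---|---|---|
| Large funds (treated) | 5.0% | 6.3% | +1.3% |
| Small funds (control) | 5.2% | 5.7% | +0.5% |

The DiD estimate is the treated change minus the control change: \(1.3\% - 0.5\% = 0.8\%\).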
So we interpret 0.8% as the net effect of that new regulation, presumably after netting out any overall market improvements that would have raised everyone’s returns anyway. On the exam, you might see a table with these kinds of numbers and be asked to compute that difference or interpret the corresponding regression coefficient.
At the risk of repeating ourselves, let’s highlight arguably the single most important assumption: in the absence of treatment, the two groups would have continued along parallel outcome trends. If that assumption is violated—say, because large mutual funds were already heading in a significantly different direction than small mutual funds—then your DiD estimates might be biased.
Possible ways to spot or check parallel trends:
• Plot the outcome variable for both groups over time (before the policy) and look for roughly similar slopes; a plotting sketch follows this list.
• Regress the outcome on group/time interactions before the policy and see if the difference is stable.
• Use domain knowledge. Maybe large funds always adopt new technology or face different constraints. If those generate systematically different performance patterns, you might question the parallel trends assumption.
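Here is a minimal plotting sketch for that first check; the pre-policy observations, column names, and values are all hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical pre-policy observations for both groups.
pre = pd.DataFrame({
    "year":    [2019, 2019, 2020, 2020, 2019, 2019, 2020, 2020],
    "treat":   [1,    1,    1,    1,    0,    0,    0,    0],
    "outcome": [5.0,  5.2,  5.3,  5.5,  5.1,  5.3,  5.4,  5.6],
})

# Average outcome by year for each group; roughly parallel lines before the
# policy lend support to the parallel trends assumption.
trend = pre.groupby(["year", "treat"])["outcome"].mean().unstack()
trend.columns = ["Control", "Treatment"]
trend.plot(marker="o")
plt.xlabel("Year")
plt.ylabel("Average outcome")
plt.title("Pre-policy trends by group")
plt.show()
```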
• Nonparallel Trends: As mentioned, if there’s a reason the treatment group would’ve grown faster (or slower) even without the policy, DiD can incorrectly attribute that difference to the policy.
• External Shocks or Macro Events: If something big happened (like a massive market crash) that affected the treatment group differently than the control group, we might again attribute that difference erroneously to the policy.
• Unbalanced Panels: Sometimes not all entities are observed in each period. If data is “missing” or “uneven,” your comparison might be skewed.
• Staggered Treatments: In real life, interventions might not occur at a single point in time for all subjects. Some states adopt a policy in 2020, others in 2021, and so on. Handling that in a standard DiD requires careful modifications (such as two-way fixed effects or specialized methods).
If you have a basic dataset in Python’s pandas, you can run a DiD regression with statsmodels:
```python
import statsmodels.formula.api as smf

# df is assumed to contain columns 'outcome', 'treat', and 'post',
# plus 'entity_id' for cluster-robust standard errors if needed.

model = smf.ols('outcome ~ treat + post + treat:post', data=df).fit(
    cov_type='cluster',
    cov_kwds={'groups': df['entity_id']}
)
print(model.summary())
```
Here, the term “treat:post” is your interaction capturing the DiD effect. If you see a significant coefficient on “treat:post,” that suggests a meaningful difference in the treatment group’s post-period outcome beyond what’s explained by time and group-level differences alone.
• Look for Vignette Clues: If a scenario states “Company X was subject to a new regulation in 2023, but Company Y was not,” and you see data from 2022 and 2024, that’s a strong hint that a DiD approach might be the intended method.
• Check the Parallel Trend: The exam might test whether you can identify a scenario that violates or satisfies the parallel trends assumption.
• Interpret the Coefficient: If the question specifically asks, “What does the coefficient of the interaction term represent?” your immediate reaction: “It’s the estimated treatment effect under DiD.”
• Watch for Over-Controlling: Sometimes you might see a question about controlling for other factors. You want to ensure that you’re not controlling away the effect you’re trying to estimate or introducing endogeneity.
• Time Constraints: In exam settings, item set (vignette) questions can be quite data-dense. Practice quickly scanning for evidence of “pre” vs. “post” periods and “treated” vs. “control” groups.
In the broader context of panel data, you might also incorporate fixed effects for entities (firms, funds, etc.) or time if you suspect there are unobserved characteristics that vary at the firm level or the year level. However, be mindful that the standard DiD approach already accounts for many of these sources of bias, especially if data collection is consistent across groups.
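As a rough sketch of that idea (not the only way to do it), entity and year dummies can be added directly to the statsmodels formula; df, entity_id, and year are again assumed column names rather than anything prescribed by the curriculum:

```python
import statsmodels.formula.api as smf

# Two-way fixed effects variant: C() creates dummies for each entity and year,
# which absorb the 'treat' and 'post' main effects; the interaction remains.
fe_model = smf.ols(
    'outcome ~ treat:post + C(entity_id) + C(year)',
    data=df
).fit(cov_type='cluster', cov_kwds={'groups': df['entity_id']})

# The coefficient on treat:post is still the DiD-style treatment effect.
print(fe_model.params.filter(like='treat:post'))
```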
Differences-in-Differences is remarkably powerful for making a cause-and-effect statement, which is crucial in finance research and real-world policy analysis. It’s kind of like hearing two sets of traffic noise: one from a busy street and one from a quiet neighborhood. If the entire city has a festival (time effect), noise levels will rise everywhere—but if your busy street had a special event on top of that festival, that extra jump you hear is basically the “treatment.”
As you approach the CFA® Level II exam, keep an eye out for scenarios that fit the DiD pattern. Understand the parallel trends assumption, how to interpret the interaction term, and the common pitfalls. If you nail these, you’ll be in great shape for any item set that tests your ability to identify or interpret a DiD design.
• Angrist, J.D., & Pischke, J.-S. (2014). Mastering ’Metrics: The Path from Cause to Effect. Princeton University Press.
• Gormley, T.A., & Matsa, D.A. (2014). “Common Errors: How to (and Not to) Control for Unobserved Heterogeneity.” Review of Financial Studies, 27(2), 617–661.
• Official CFA Institute Curriculum, 2025 Edition (Level II), for foundational treatments of regression and panel-data topics.
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.