Avoiding common mistakes in regression, time-series, and correlation analysis when tackling CFA Level II item sets. Tips for verifying assumptions, reading disclaimers, and distinguishing correlation from causation.
It’s easy to get tripped up in the rush of reading CFA vignette questions—especially under the pressure of the clock. One of the sneakiest mistakes I’ve seen involves mixing up the signs of regression coefficients or ignoring the standard error measures entirely. I remember meeting a test-taker who was so focused on the magnitude of a coefficient that he forgot to note its negative sign. He was thrilled to find what he thought was a strong, positive relationship between market returns and a stock’s performance—until he realized he had interpreted a negative coefficient as a positive one.
Similar oversights happen when ignoring standard errors or p-values. In many item sets, you might see a table that provides the regression coefficient, the standard error, and a t-statistic. If you jump directly to the coefficient and skip its associated standard error or t-statistic, you might incorrectly conclude that a parameter is significant. That’s a quick route to choosing the wrong answer.
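To make that habit concrete, here's a minimal sketch of the significance check, using made-up vignette numbers (the coefficient, standard error, and sample size below are assumptions for illustration):

```python
import scipy.stats as st

# Hypothetical regression-table values from a vignette
b_hat = 0.45   # estimated slope coefficient
se_b = 0.22    # its standard error
n, k = 30, 2   # observations and independent variables (assumed)

t_stat = b_hat / se_b            # H0: true coefficient equals zero
df = n - k - 1
t_crit = st.t.ppf(0.975, df)     # two-tailed 5% critical value

print(f"t-stat: {t_stat:.3f}, critical value: {t_crit:.3f}")
print("Significant" if abs(t_stat) > t_crit else "Not significant at 5%")
```

Here the coefficient "looks" healthy, yet the t-stat of about 2.045 narrowly misses the critical value of about 2.052 with 27 degrees of freedom. Skipping the standard error would have led you straight to the wrong answer.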
Also, watch out for incorrectly applying formulas from memory. If the question indicates, “Use a t-distribution with n – k – 1 degrees of freedom,” be sure you’re referencing the right degrees of freedom. A lot of partial-knowledge traps arise where the candidate uses a z-statistic—maybe from Level I or because they see a large sample size—when the question specifically states a small sample with the t-distribution. Even with large data sets, the exam may set up a scenario that calls for a different test distribution.
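To see why the distribution choice matters, compare the critical values directly; the sample size and regressor count below are assumed purely for illustration:

```python
from scipy.stats import norm, t

n, k = 25, 2              # hypothetical small sample with two regressors
df = n - k - 1            # 22 degrees of freedom

z_crit = norm.ppf(0.975)  # about 1.96, the Level I reflex
t_crit = t.ppf(0.975, df) # about 2.07 with 22 degrees of freedom

print(f"z: {z_crit:.3f} vs. t (df={df}): {t_crit:.3f}")
# Defaulting to 1.96 would overstate significance for borderline t-stats
```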
Let’s face it: correlation is a big deal in finance. Spotting patterns in returns or fundamental factors can lead to profitable insights. But correlation does not prove causation. You might see a compelling chart or regression table in a vignette claiming that “X strongly correlates with Y.” However, the pitfall is to jump to “X causes Y.” Vignette authors often love to set a trap by associating a correlation (maybe 0.85 or something eye-catching) with the idea that one variable drives the other.
Imagine you see a scenario analyzing two economic variables—for instance, average consumer credit and real estate sales volumes—both trending upward over a decade. The data might show a strong correlation. The temptation is to say something like, “Because consumer credit increased, real estate sales also rose.” But what if there was strong economic growth overall, and that factor influenced both credit and real estate markets simultaneously? The question might specifically mention an external factor like interest rate policy changes. If you skip that detail, you’ll jump to the correlation-equals-causation trap. The exam often includes disclaimers (e.g., data is from a booming decade) that are subtle hints you can’t automatically conclude a cause-and-effect relationship.
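A quick simulation shows how a common driver can manufacture exactly this kind of correlation; the series and coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# A common factor (say, broad economic growth) drives both series
growth = np.cumsum(rng.normal(0.5, 1.0, 120))
credit = 2.0 * growth + rng.normal(0, 3.0, 120)        # "consumer credit"
real_estate = 1.5 * growth + rng.normal(0, 3.0, 120)   # "real estate sales"

# Strong correlation, yet neither series causes the other
print(f"Correlation: {np.corrcoef(credit, real_estate)[0, 1]:.2f}")
```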
Another big pitfall is forgetting to check assumptions, particularly in multiple regression (see Chapter 2 for more details) and time-series analysis (Chapter 6). The presence of heteroskedasticity or autocorrelation can throw off test statistics and confidence intervals. If you see a question that presents a suspicious pattern of residuals—maybe they increase in magnitude over time or cluster in certain periods—this often indicates something is wrong with the usual regression assumptions.
Heteroskedasticity means the variance of the residuals is not constant across observations. When that happens, our standard errors are messed up—they might be smaller or larger than they should be, causing us to incorrectly judge coefficient significance. Autocorrelation (often referred to as serial correlation) arises when residuals are correlated over time, which can also skew test results. If a vignette states that residuals show a pattern or if the Durbin-Watson statistic is suspiciously low, be cautious about using ordinary least squares (OLS) results without adjustments.
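If you want intuition for what a "suspiciously low" Durbin-Watson number means, the statistic is easy to compute by hand; the residuals below are invented to show a red-flag pattern:

```python
import numpy as np

def durbin_watson(residuals):
    """DW near 2 suggests no serial correlation; values well below 2
    point to positive autocorrelation."""
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

# Residuals that drift rather than bounce around zero (a warning sign)
resid = np.array([0.5, 0.7, 0.9, 1.1, 0.8, 0.6, -0.2, -0.5, -0.8, -1.0])
print(f"Durbin-Watson: {durbin_watson(resid):.2f}")  # roughly 0.2, far below 2
```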
A typical exam mistake is to assume you can interpret the slope coefficients normally even if standard assumptions break down. Perhaps the vignette specifically mentions that the “Breusch-Pagan test indicates significant heteroskedasticity.” That’s a major red flag not to rely on uncorrected standard errors. So always read the disclaimers and check if the vignette is nudging you to use remedial measures, like robust standard errors or a specialized method.
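For context on what that Breusch-Pagan disclaimer is reporting, here's a sketch using statsmodels; the simulated data, with error variance that grows with x, is an assumption built in purely to trigger the test:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + np.abs(x))  # heteroskedastic errors

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
lm_stat, lm_pval, _, _ = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan p-value: {lm_pval:.4f}")  # small p-value: heteroskedastic

# One remedy: White-style robust (HC) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")
print("naive SEs:", ols.bse, "robust SEs:", robust.bse)
```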
The exam, in item-set format, is filled with "close but not quite" answer choices. Let's say you recall that the test statistic for a slope coefficient is:

t = (b̂₁ − B₁) / s(b̂₁), with n − k − 1 degrees of freedom,

where b̂₁ is the estimated slope, B₁ is the hypothesized value (often zero), and s(b̂₁) is the coefficient's standard error.
But you’re fuzzy on the degrees of freedom. The exam provides four plausible answers that differ by the degrees of freedom used or a slight difference in the formula. If you rely on partial knowledge, you might zero in on an answer that looks right at a glance but fails a deeper check. It’s exactly the kind of question that can punish overconfidence.
Another partial-knowledge trap is mixing up the sample standard deviation with the population standard deviation. Or you might recall the process for the F-test in multiple regression but forget that with autocorrelation, the F-statistic may not be valid in its usual form. As a result, you end up picking an answer that was "kind of close" to the standard approach but not correct for the specific scenario given. These small details matter, and the exam is well designed to exploit them.
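On the standard-deviation mix-up, it helps to remember the difference is just a degrees-of-freedom choice; here's a tiny sketch with invented returns:

```python
import numpy as np

returns = np.array([0.02, -0.01, 0.03, 0.015, -0.005])

pop_sd = np.std(returns)             # ddof=0: divide by n (population formula)
sample_sd = np.std(returns, ddof=1)  # ddof=1: divide by n - 1 (sample formula)

print(f"population sd: {pop_sd:.5f}, sample sd: {sample_sd:.5f}")
# On small samples the gap is material, and answer choices exploit it
```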
Sometimes you open a vignette and see a line in the first or second paragraph like, “Our data was collected during a period of unusually low interest rates,” or “Management changed its accounting policies midway through the sample period.” Here’s a confession: in my early study days, I breezed right past disclaimers like these, thinking they weren’t relevant to the problem’s main calculations. Big mistake.
Details about data collection, sample biases, or a limited timeframe can drastically change your interpretation. The exam might expect you to question the stability of the regression or the representativeness of the sample. What if the analysis was done only during a bull market? You can’t easily extrapolate the results to a bear market. A mention of an “economic regime shift” often signals that the underlying relationships in your data may have changed. For instance, if a regression is run across two very different economic regimes, any single set of coefficients might not hold consistently across both. That’s where faulty inferences happen—unless, of course, you pick up on the disclaimers.
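If you're curious how a regime shift can be checked formally, here's a minimal Chow-style break test; the chow_f helper, the split point, and the simulated data are all hypothetical illustrations, not required exam machinery:

```python
import numpy as np
import statsmodels.api as sm

def chow_f(y, X, split):
    """Chow test F-statistic for a structural break at index `split`."""
    k = X.shape[1]
    ssr_full = sm.OLS(y, X).fit().ssr
    ssr_1 = sm.OLS(y[:split], X[:split]).fit().ssr
    ssr_2 = sm.OLS(y[split:], X[split:]).fit().ssr
    num = (ssr_full - ssr_1 - ssr_2) / k
    den = (ssr_1 + ssr_2) / (len(y) - 2 * k)
    return num / den

rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = sm.add_constant(x)
# Slope changes from 0.5 to 2.0 halfway through the sample (a regime shift)
y = np.where(np.arange(100) < 50, 0.5 * x, 2.0 * x) + rng.normal(size=100)
print(f"Chow F-stat: {chow_f(y, X, 50):.2f}")
# A large F suggests the coefficients are not stable across regimes
```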
Many item sets revolve around time-series data. Nonstationarity is a biggie. If a series has a unit root or a strong trend, you can't just run an OLS regression and assume standard inference applies. Chapter 6 delves deeper into stationarity, but in a nutshell, if the data is nonstationary, your test statistics can be meaningless. I've seen many exam takers get tricked when a question states "the data show a rising pattern over time, with mean and variance changing," yet they proceed as if the standard regression assumptions hold. Potential stationarity issues should set off alarm bells.
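A standard first check for a unit root is the augmented Dickey-Fuller test; here's a sketch on a simulated random walk (the data is invented):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=250))  # a unit-root series by construction

adf_stat, pvalue = adfuller(random_walk)[:2]
print(f"ADF statistic: {adf_stat:.2f}, p-value: {pvalue:.3f}")
# A large p-value means we cannot reject a unit root, so standard OLS
# inference on this series is unreliable
```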
Creating a short mental or written checklist during the exam can help ensure you're avoiding common mishaps. It might be something you quickly reference before finalizing your answers:

• Identify the data type: cross-sectional or time-series?
• Check each coefficient's sign, standard error, and the correct degrees of freedom.
• Test the assumptions: any hint of heteroskedasticity, autocorrelation, or nonstationarity?
• Scan for disclaimers, regime shifts, or sample-period caveats.
• Distinguish correlation from causation before endorsing any causal claim.
Below is a simple Mermaid diagram of how these “pitfall checks” typically flow in your mind:
```mermaid
flowchart LR
    A["Start: Read Vignette Carefully"]
    B["Identify Data Type <br/> (Cross-Section vs. Time-Series)"]
    C["Assess Assumptions <br/> (Heteroskedasticity, Autocorrelation)"]
    D["Check Relevance <br/> of Coefficients & p-values"]
    E["Look for Disclaimers <br/> or Regime Shifts"]
    F["Finalize Interpretation <br/> and Avoid Correlation≠Causation Pitfall"]
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
```
Referring to this flow—either conceptually or literally—can be a lifesaver in the exam. Also, remember to cross-check formula references with what the vignette or the question specifically tells you to use. If you see something like “An analyst used White’s robust standard errors,” that’s your cue to think about heteroskedasticity and how it’s being addressed.
Consider a scenario where a company’s stock performance is regressed on the S&P 500 returns and the 10-year Treasury yield. The vignette reveals:
• Data is from 2010–2020.
• Standard errors appear suspiciously small, and a test suggests strong autocorrelation.
• The question states: “Due to historically low interest rates throughout this period, be cautious in interpreting the Treasury yield coefficient.”
In the item set, one answer choice might say: “The coefficient on the 10-year Treasury yield is reliably negative, indicating that whenever yields drop, the firm’s stock experiences a decline.” Another choice might say: “The coefficient estimate is likely unreliable due to serial correlation and a sample covering a period of atypical interest rates.” The second is probably the better choice because it acknowledges the disclaimers and the presence of autocorrelation, which can invalidate naive interpretations.
Alternatively, consider a time-series example with daily bond returns. The vignette might hint that the series has a unit root (“the average and variance appear to change over the sample”). A partial-knowledge trap is to accept standard OLS t-statistics at face value: maybe the regression shows a “spurious” relationship that arises only because both series trend upward with time. The correct approach—recognizing the potential nonstationarity—would lead you to question the entire regression method or look for something like a cointegration test (covered in advanced time-series chapters).
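You can reproduce the spurious-regression effect in a few lines: regress one simulated random walk on another, completely independent one (everything below is simulated, not exam data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
walk_a = np.cumsum(rng.normal(size=300))  # two independent random walks:
walk_b = np.cumsum(rng.normal(size=300))  # no true relationship at all

res = sm.OLS(walk_a, sm.add_constant(walk_b)).fit()
print(f"R-squared: {res.rsquared:.2f}, slope t-stat: {res.tvalues[1]:.1f}")
# Trending series routinely produce high R-squared and huge t-stats
# even when the variables are unrelated
```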
Sometimes, seeing a quick snippet of code clarifies how partial knowledge can trick us. Suppose you do a quick correlation test in Python:
```python
import numpy as np

# Daily returns for two assets (hypothetical figures)
asset_a = np.array([0.01, 0.002, -0.005, 0.015, 0.009])
asset_b = np.array([0.008, 0.0, -0.01, 0.02, 0.012])

# Pairwise correlation between the two return series
corr_value = np.corrcoef(asset_a, asset_b)[0, 1]
print(f"Correlation: {corr_value:.2f}")
```
This code prints a correlation of about 0.98, which is quite high. But before concluding that one asset's returns cause the other's, you should check whether both assets are trending because of a common factor, like the overall market environment. In exam terms, the item set might provide just that high correlation figure, then a fluff statement about "Asset A typically leads the movements of Asset B," luring you into concluding causation. Don't fall for it.
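To make the common-factor check concrete, one approach is to regress each series on the shared factor and correlate the residuals; the market factor and loadings below are simulated assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
market = rng.normal(0, 0.01, 500)                  # a shared market factor
asset_a = 1.2 * market + rng.normal(0, 0.004, 500)
asset_b = 0.9 * market + rng.normal(0, 0.004, 500)

raw_corr = np.corrcoef(asset_a, asset_b)[0, 1]

# Strip the market factor out of each series, then re-correlate residuals
resid_a = asset_a - np.polyval(np.polyfit(market, asset_a, 1), market)
resid_b = asset_b - np.polyval(np.polyfit(market, asset_b, 1), market)
partial_corr = np.corrcoef(resid_a, resid_b)[0, 1]

print(f"raw: {raw_corr:.2f}, after removing the market factor: {partial_corr:.2f}")
```

With these simulated loadings, the raw correlation is high while the residual correlation sits near zero, showing the co-movement came from the shared factor rather than from one asset driving the other.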
CFA Level II item sets are designed to push your analytical abilities and see if you can integrate multiple details—regression output, disclaimers about data, and the big difference between correlation and causation. Always slow down just enough to read each detail carefully. The disclaimers, the sign on coefficients, or the mention of outliers might be where the real test lies. It’s much more about being aware and methodical than simply memorizing facts.
Ultimately, skipping assumption checks or ignoring disclaimers can cost you big time. By following a structured approach—looking at the sample type, verifying stationarity (if time-series), assessing disclaimers, and carefully reviewing standard errors—you’ll drastically reduce the chance of falling for a trap. And trust me, the exam writers are experts at setting them.