This section explores how historical data influences risk estimation, the pitfalls of sampling error and survivorship bias, and strategies for adapting data to reflect changing market conditions.
Have you ever tried predicting something about the future by looking at past data—only to find yourself blindsided by changes you didn’t see coming? Well, we’ve all been there. In portfolio management, historical data is a valuable tool for estimating risk and return. However, it’s not always as reliable as it seems. Markets evolve, biases creep in, and sometimes the future simply proves the truth of that old disclaimer: “Past performance is not necessarily indicative of future results.”
This section examines how historical data is used in the risk estimation process and, crucially, when it might lead you astray. We’ll look at common pitfalls like sampling error and survivorship bias, explore structural shifts that can make old data less meaningful, and discuss methods such as Bayesian approaches and regime-switching models aimed at bridging the gap between the past and the (unknown) future.
Portfolio managers generally rely on two broad classes of estimates for risk and return:
• Historical Estimates: These use actual market data, such as historical returns and volatilities, to predict the future.
• Forward-Looking (Subjective) Estimates: These rely on investors’ or analysts’ expectations of market changes, economic cycles, and other forward indicators.
Historical data provides a foundation. It’s often easier to obtain, can be validated by auditing past records, and typically covers multiple market cycles. But forward-looking estimates incorporate current market sentiments, macroeconomic forecasts, and structural trends that aren’t always visible in older data sets. An effective risk estimation strategy often uses both, balancing the rigor of historical numbers with the insight and flexibility of subjective forecasting.
Survivorship bias is a classic trap. Imagine you’re building a performance analysis of hedge funds, but you only include the funds that still exist. Those that closed shop or went bankrupt are conveniently omitted from your sample. This can paint an overly rosy picture of returns (and sometimes an understated measure of risk), because the failures aren’t reflected.
From a personal perspective, I recall analyzing a set of equity funds after the dot-com bubble. My data was missing all the tech funds that went belly-up. So, the average return figure suggested the industry overall did “okay,” which was a bit, well, misleading. If you ignore the non-survivors, you may underestimate volatility and overestimate fund performance.
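The distortion is easy to quantify with a toy example (the return figures below are invented purely for illustration, not taken from any real fund data):

```python
import statistics

# Hypothetical one-year fund returns (illustrative numbers only).
surviving_funds = [0.12, 0.09, 0.15, 0.08]   # funds still in the database
defunct_funds = [-0.40, -0.65, -0.30]        # funds that closed and get dropped

biased_mean = statistics.mean(surviving_funds)                 # survivors only
true_mean = statistics.mean(surviving_funds + defunct_funds)   # full sample

print(f"Survivors only: {biased_mean:.1%}")  # flattering average
print(f"Full sample:    {true_mean:.1%}")    # includes the failures
```

Dropping just three defunct funds flips the average from comfortably positive to negative—exactly the kind of overstated performance (and understated risk) survivorship bias produces.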
Sampling error arises when your sample (historical data set) doesn’t accurately represent the true population of possible outcomes. A small sample can amplify anomalies and outliers, causing you to draw conclusions that don’t generalize well.
In risk estimation, a short historical period may capture only one distinct market regime—say a prolonged bull market. If you run optimization or risk estimates on that limited sample, you might project unrealistically low volatility. Then, if the market experiences a downturn similar to 2008 or the initial COVID-19 shock, your risk estimates prove dangerously off the mark.
Even if you manage to collect a sufficiently large sample over many years, markets undergo structural changes. Technological innovations—like the rise of algorithmic trading or digital asset platforms—reshape liquidity, transaction costs, and even the correlation dynamics among asset classes. Regulatory shifts, such as new capital requirements or changes in monetary policy frameworks, can drastically alter interest rate behaviors and credit market structures.
These changes matter because historical relationships (e.g., the correlation between equities and bonds, or the volatility of commodities) can shift as the underlying market structure evolves. The historical data might indicate one steady pattern—until something like high-frequency trading or a major central bank intervention comes along and blows that pattern out of the water.
Being aware of the pitfalls is only half the battle. We also want some tools to address them actively. Let’s look at two: Bayesian methods and regime-switching models.
Bayesian statistics offers a framework where we can update our “prior” beliefs about an asset’s risk and return with new “evidence”—such as more recent market data or expert judgments. The formula for updating a posterior distribution (in simplified form) is often written as:

$$ P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \;\propto\; P(D \mid \theta)\,P(\theta) $$

where \\(\theta\\) stands for the parameters of interest (e.g., mean return and volatility) and \\(D\\) for the observed data.
In plain terms, if the historical data suggests a certain mean return and volatility, but you have reason to believe, based on the latest market signals, that risk has fundamentally increased, you can incorporate that newer view into your prior. Then, the final (posterior) estimate is a blend of this prior and the recent data’s likelihood. Over time, as new data arrives, your estimate evolves—hopefully capturing real shifts in risk profiles as they occur.
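As a minimal sketch of such an update, here is a standard normal-conjugate blend of a historical prior with a recent-data signal (all numbers are illustrative assumptions):

```python
# Normal-conjugate update of an expected-return estimate.
# All figures are hypothetical, chosen only to show the mechanics.

prior_mean, prior_var = 0.07, 0.02**2  # historical belief: 7% return, 2% sd
data_mean, data_var = 0.02, 0.03**2    # recent-sample signal: 2% return, 3% sd

# Posterior precision is the sum of the prior and data precisions.
post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)

print(f"Posterior mean: {post_mean:.2%}, posterior sd: {post_var**0.5:.2%}")
```

Note how the posterior mean lands between the prior and the new data, weighted by how precise each source is—the “blend” described above, made explicit.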
Markets don’t always behave in a single consistent “mode.” Instead, they may switch between regimes—for example, a stable/low-volatility regime and a turbulent/high-volatility regime. Regime-switching models allow us to estimate different sets of parameters, such as means, variances, and correlations, depending on the prevailing market state.
A simplified two-regime model might say:
• In Regime 1 (low volatility), equity returns average 8%, with an annual volatility of 10%.
• In Regime 2 (high volatility), equity returns average 1%, with an annual volatility of 25%.
Statistical techniques will estimate the probability of transitioning from one regime to the other. When the model suggests, for instance, a high probability of shifting to Regime 2, your risk estimates can reflect that sharper volatility spike, rather than blindly relying on the weighted average of historical returns.
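Plugging the two-regime figures above into a probability-weighted blend shows how regime probabilities move the point estimates (the 70/30 split is an assumed model output, used only for illustration):

```python
# Blend the two-regime example into point estimates, given current
# regime probabilities (the 70/30 split is a hypothetical model output).
regimes = {
    "low_vol":  {"mean": 0.08, "vol": 0.10},
    "high_vol": {"mean": 0.01, "vol": 0.25},
}
prob = {"low_vol": 0.70, "high_vol": 0.30}

exp_return = sum(prob[r] * regimes[r]["mean"] for r in regimes)

# Mixture variance = weighted within-regime variance
#                  + weighted dispersion of regime means.
exp_var = sum(
    prob[r] * (regimes[r]["vol"] ** 2 + (regimes[r]["mean"] - exp_return) ** 2)
    for r in regimes
)

print(f"Blended return: {exp_return:.2%}, blended vol: {exp_var**0.5:.2%}")
```

If the model instead signaled a high probability of Regime 2, the blended volatility would jump toward 25%—capturing the sharper spike rather than a naive long-run average.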
It’s often helpful to combine quantitative insights with qualitative observations. Let’s say you see a historically moderate correlation between stocks and bonds, but you notice central banks implementing extraordinary monetary policies. A purely quantitative approach might not capture that shift in real time. By reading policy statements, analyzing forward guidance, or simply being mindful of global macro developments, you can get a heads-up on possible structural changes.
In practice, that means cross-checking numerical results from your models with fundamental or anecdotal evidence. If your forward-looking scenario suggests a surge in inflation risk, but your purely historical dataset doesn’t, you might want to increase your inflation-risk estimates to reflect the new conditions, especially if other macro indicators agree.
Below is a simple Mermaid diagram showing how historical data feeds into risk estimation, highlighting points where distortion or bias can occur:
```mermaid
flowchart TB
    A["Historical Data <br/>(Sample)"] --> B["Sampling Effects"]
    A --> C["Potential Biases <br/>(Survivorship)"]
    B --> D["Risk <br/>Estimates"]
    C --> D
    D --> E["Final Portfolio <br/>Decisions"]
```
At each step, we can introduce adjustments—such as removing survivorship bias or weighting data by economic regimes—before we finalize estimates.
• Check for Survivorship Bias: Make sure your dataset includes both winners and losers, even if they’re no longer in the market.
• Use Sufficiently Long Horizons (If Relevant): But always question whether the historical period is truly representative of current market dynamics.
• Incorporate Qualitative Inputs: Market narratives, investor sentiment, and structural changes often stay a few steps ahead of purely quantitative data.
• Consider Multiple Regimes: If the environment changes from low to high volatility or from expansion to recession, consider dynamic or regime-based modeling.
• Update with Bayesian Methods: Combine prior knowledge with new evidence. Keep updating your estimates as conditions evolve.
• Sensitivity and Scenario Analysis: Subject your estimates to various hypothetical scenarios. Ask, “What if market volatility doubles overnight?”
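The last bullet can be sketched with a quick parametric stress test (a toy example: the portfolio size, volatility, zero-mean assumption, and 95% normal quantile are all assumptions for illustration):

```python
# "What if market volatility doubles overnight?" -- a parametric VaR stress.
# Hypothetical portfolio and parameters; a sketch, not a production model.
from statistics import NormalDist

portfolio_value = 1_000_000
annual_vol = 0.15
z_95 = NormalDist().inv_cdf(0.95)  # one-sided 95% normal quantile, ~1.645

def var_95(vol: float) -> float:
    """One-year 95% parametric VaR, assuming a zero mean return."""
    return portfolio_value * vol * z_95

print(f"Base VaR:     {var_95(annual_vol):,.0f}")
print(f"Stressed VaR: {var_95(2 * annual_vol):,.0f}")  # scales with vol
```

In this simple normal setting VaR scales linearly with volatility, so the stressed loss estimate doubles—an immediate sanity check on whether hedging budgets and risk capital could absorb such a shock.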
Not too long ago, I worked with a small family office that relied heavily on data from 2010–2020 to forecast the risk of their long-only equity strategy. This sample, while containing some volatility, largely missed the extreme stress seen in 2008 and the dot-com crash. Over time, it became clear their hedging budgets were too low, and risk capital was too lean for a worst-case scenario. By adding a Bayesian “shock” layer that accounted for older crisis data and applying a modest regime-switching approach, they recalibrated their overall risk. When markets turned choppy during the COVID-19 outbreak, the office was at least partially insulated due to a more conservative stance.
• Emphasize Basic Definitions: When the exam asks about survivorship bias or sampling error, your first step is to nail the definitions and quickly illustrate their impact on returns and volatility.
• Show Your Work: If you’re asked to re-estimate returns considering an older set of data, demonstrate each step clearly.
• Bring In Forward-Looking Views: Even if the question focuses on historical data, try to discuss how new info or signals could refine that data.
• Connect to Scenario Analysis: This reading is closely tied to the next sections on scenario and sensitivity analysis (e.g., Section 2.12); look for opportunities to link the two in exam answers.
• Summarize Pitfalls: The exam often rewards students who can “teach back” common pitfalls like sampling error and survivorship bias succinctly.
• Dimson, E., Marsh, P., and Staunton, M. (various years). “Global Investment Returns Yearbook.” This research highlights how omitting defunct companies can bias data.
• CFA Institute Program Curriculum. Discussion on historical vs. forward-looking estimates in risk management contexts.
• Hamilton, J.D. (1989). “A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle.” Econometrica, which introduced regime-switching approaches.
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.