Explore various sampling techniques, understand how they influence statistical inference, and learn ways to minimize errors in financial data analysis.
Sampling is a fundamental technique in finance and many other fields. At its core, it’s about selecting a manageable subset (a sample) from a larger group (a population) so we can make statements about that larger group without having to examine every single member. In finance, we might draw samples of past returns to forecast future performance, sample bond issuers to understand credit risk trends, or sample investors to gauge sentiment.
Although the term “sampling” might sound straightforward, there are multiple ways to do it, and each approach comes with its own strengths, weaknesses, and biases. On the bright side, a well-chosen sampling method can save enormous amounts of time and money, while providing insights that are close to what you’d get if you studied everyone (or everything). On the darker side, a poor sampling choice can lead to misleading results and potentially disastrous investment decisions.
I remember a time, early in my career, when I was working on a quick internal survey of portfolio managers to measure how much value they believed active management provided versus passive investing. I relied on convenience sampling—basically asking the folks in my immediate network—and ended up with a skewed set of results: nearly everyone was outraged by the idea of indexing because they were a specialized group of active managers. That personal experience really highlighted for me how critical the sampling method can be. If you pick the wrong approach, well, your “insights” won’t be so insightful.
Below, we explore the various sampling methods, discuss how you might apply them in an investment context, and show how to minimize both sampling and non-sampling errors for more robust conclusions.
Before diving into specific sampling methods, let’s clarify a few terms you’re bound to encounter:
In a perfect world, your sample’s statistic would precisely reflect the population’s parameter. In reality, samples are never perfect. Part of our job as analysts is to choose a method of sampling and a sample size that balance practicality with accuracy.
Simple random sampling is the foundation. Each member of the population has an equal probability of being chosen, and selections are made entirely by chance. This approach is popular because it’s conceptually straightforward and often considered the gold standard for unbiasedness—assuming you can truly randomize the selection.
In a practical investment scenario, simple random sampling might mean pulling random stock returns from a comprehensive list of thousands of global equities. If all securities truly have the same chance of being selected, you minimize selection bias. However, pure randomness can be challenging to ensure in practice (especially if your database has structural issues), and it can become very expensive or time-consuming for large populations.
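A minimal sketch of simple random sampling in Python, using a hypothetical universe of ticker labels (the names and sizes here are made up for illustration). `random.sample` draws without replacement, giving every security an equal chance of selection:

```python
import random

# Hypothetical universe of global equity tickers.
population = [f"STOCK_{i}" for i in range(10_000)]

random.seed(42)  # fixed seed so the draw is reproducible

# Each member has an equal probability of being chosen; no duplicates.
sample = random.sample(population, k=100)

print(len(sample))       # sample size requested
print(len(set(sample)))  # no security appears twice
```

In practice, the hard part is not the draw itself but ensuring the list you draw from (the sampling frame) really covers the whole population.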
Stratified sampling involves dividing the population into subgroups (called strata) based on certain characteristics—maybe geography, market capitalization, sector, or credit rating—and then randomly sampling within each subgroup. If each stratum’s internal population is relatively homogeneous on some key trait, stratified sampling can yield more precise estimates.
In portfolio management, you might stratify by market cap (small-cap, mid-cap, large-cap). Then, you randomly sample a certain number of stocks within each group. This approach ensures you capture small-cap patterns even if the total proportion of small-cap stocks in the market is modest. In practice, stratification often leads to lower variance in your estimates (like mean returns) compared to a simple random sample, especially if the strata are well-chosen.
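The market-cap stratification above can be sketched as proportional allocation: sample from each stratum in proportion to its share of the population. The stratum sizes below are hypothetical:

```python
import random

random.seed(7)

# Hypothetical universe grouped by market-cap stratum.
strata = {
    "small_cap": [f"SML_{i}" for i in range(600)],
    "mid_cap":   [f"MID_{i}" for i in range(300)],
    "large_cap": [f"LRG_{i}" for i in range(100)],
}

def stratified_sample(strata, total_n):
    """Draw a simple random sample from each stratum,
    sized in proportion to the stratum's share of the population."""
    pop_size = sum(len(members) for members in strata.values())
    picks = {}
    for name, members in strata.items():
        n = round(total_n * len(members) / pop_size)
        picks[name] = random.sample(members, n)
    return picks

sample = stratified_sample(strata, total_n=50)
for name, picks in sample.items():
    print(name, len(picks))  # 30 small-cap, 15 mid-cap, 5 large-cap
```

You could also deliberately oversample a small stratum (disproportionate allocation) if precision within that subgroup matters more than overall efficiency.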
Cluster sampling flips the stratification concept somewhat by dividing the population into heterogeneous groups that ideally resemble mini versions of the entire population. Then you randomly select a few clusters and study either all or a random selection of members within those clusters.
For instance, you might cluster by global region—say, North America, Europe, Asia-Pacific—and hope each region has a diverse set of investments. You then pick a random sample of these regions and collect all your data from within the chosen clusters. It’s more cost-effective than traveling to every possible region or company, but it can introduce higher sampling error if your clusters vary significantly from each other.
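The two stages of the regional example can be sketched as follows (region names and member counts are hypothetical). Stage one randomly selects clusters; stage two, in this one-stage variant, takes every member of each chosen cluster:

```python
import random

random.seed(3)

# Hypothetical clusters; ideally each region mirrors the full population.
clusters = {
    "north_america": [f"NA_{i}" for i in range(40)],
    "europe":        [f"EU_{i}" for i in range(35)],
    "asia_pacific":  [f"AP_{i}" for i in range(45)],
    "latin_america": [f"LA_{i}" for i in range(20)],
}

# Stage 1: randomly choose which clusters to study.
chosen = random.sample(list(clusters), k=2)

# Stage 2: collect every member of each chosen cluster.
sample = [member for region in chosen for member in clusters[region]]
print(chosen, len(sample))
```

A two-stage variant would instead draw a random subsample within each chosen cluster, trading a little extra sampling error for lower data-collection cost.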
Systematic sampling involves picking a random starting point and then choosing members at regular intervals (the skip or sampling interval). In a list of 10,000 subscribers to an investment newsletter, you might randomly pick someone in the first 100, then select every 100th subscriber thereafter.
It’s an easy approach to implement—especially if you have a neatly ordered list—but watch out for hidden patterns. For example, if the list is ordered by zip code and you inadvertently skip entire types of neighborhoods, you might end up with a weirdly biased sample.
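The newsletter example above can be sketched in a few lines: pick a random start within the first interval, then step through the list at that fixed interval:

```python
import random

random.seed(11)

subscribers = list(range(1, 10_001))  # hypothetical ordered list of 10,000
interval = 100  # population size / desired sample size

# Random start in the first interval, then every 100th subscriber.
start = random.randrange(interval)
sample = subscribers[start::interval]

print(len(sample))  # 10,000 / 100 = 100 subscribers
```

If the list ordering has a cycle whose length matches the interval, this design will systematically over- or under-sample certain types of members, which is exactly the zip-code pitfall described above.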
Convenience sampling is like picking fruit from the lowest branch of a tree: you just grab whatever is easiest. This might be volunteers who respond to a call for feedback on a website or passersby in a busy financial district.
It’s quick and cost-effective but can be spectacularly biased. People with strong opinions or abundant free time may be overrepresented. Despite its flaws, convenience sampling can be a practical preliminary tool in finance projects—especially if you’re under a severe time constraint and just need some early-stage feel for data. Still, any major conclusions drawn from convenience samples are suspect.
Judgmental sampling relies on the researcher’s expertise to choose what they believe is a representative sample. For instance, an investment manager might pick “best-in-class” corporate bonds for a fixed-income study, believing those selections represent bond market performance overall.
The risk is that the researcher’s judgment might be off (or simply biased). This approach can be valuable when genuine expertise is available—like a senior analyst who thoroughly knows a niche market—but you always have to weigh the possibility of unintentional bias.
Below is a Mermaid diagram summarizing the relationship among different sampling approaches:
```mermaid
flowchart LR
    A["Population"] --> B["Sample <br/>Selection"]
    B --> C["Simple Random Sampling"]
    B --> D["Stratified Random Sampling"]
    B --> E["Cluster Sampling"]
    B --> F["Systematic Sampling"]
    B --> G["Convenience Sampling"]
    B --> H["Judgmental Sampling"]
```
Sampling error is essentially the gap between the sample statistic (like the sample mean) and the true population parameter (like the true mean). Even in a perfectly executed random design, there’s no full escape from sampling error unless you sample the entire population. This is part of the reason analysts rely on confidence intervals and hypothesis testing (covered in subsequent sections) to assess uncertainty.
But it’s not just about random chance. Non-sampling errors can be equally deadly. Imagine you have a small glitch in your code that misclassifies certain bonds as having zero coupon. Or consider that some respondents might lie in a survey about their risk tolerance. These are non-sampling issues and can compromise your entire analysis—even if you used the world’s best sampling technique.
In an investment firm, non-sampling errors might arise from data snooping or incorrectly labeled data sets. As you might guess, controlling for these errors requires rigorous data cleaning, consistency checks, and robust operational processes.
One straightforward way to reduce sampling error is to increase the sample size. The more data points you have, the closer your sample mean is likely to be to the true mean (by virtue of the Law of Large Numbers). However, in analyzing, say, global stock returns, you can’t just keep collecting data forever—it might be cost-prohibitive or logistically impossible.
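A quick simulation illustrates the point (the population here is synthetic, generated with assumed parameters purely for demonstration). As the sample size grows, the average gap between the sample mean and the true population mean shrinks:

```python
import random
import statistics

random.seed(0)

# Synthetic population of 100,000 "daily returns" (assumed parameters).
population = [random.gauss(0.0005, 0.02) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def avg_abs_error(n, trials=200):
    """Average absolute gap between sample mean and true mean over many draws."""
    errs = [abs(statistics.fmean(random.sample(population, n)) - true_mean)
            for _ in range(trials)]
    return statistics.fmean(errs)

errors = {n: avg_abs_error(n) for n in (50, 500, 5_000)}
for n, err in errors.items():
    print(f"n={n:>5}  avg |sampling error| = {err:.6f}")
```

Consistent with the standard-error formula, a 100-fold increase in sample size cuts the typical sampling error by roughly a factor of ten, not one hundred—one reason brute-force data collection hits diminishing returns.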
Another strategy is to choose a more effective sampling method for the problem at hand. If the population naturally segments into meaningful subgroups, stratified sampling can yield narrower confidence intervals than a simple random sample of the same size. Similarly, if practical constraints hamper your ability to gather data from diverse corners of the market, cluster sampling might be the only cost-effective design.
Let’s connect sampling to a scenario you could well see in your investment practice. Suppose you oversee a multi-asset portfolio with positions in global equities, bonds, and alternative investments. You want to estimate the overall volatility of this portfolio over the next quarter to stress-test your capital requirements.
• If you draw a simple random sample of historical daily returns, you might omit the fact that certain markets are rarely traded during certain times, leading to biases in your variance estimate.
• A stratified approach might separate equities by region (North America, EMEA, Asia-Pacific), then sample daily returns within each region. You’d do something similar for bonds and alternatives. This ensures representation of all key subgroups.
• Alternatively, if you have consistent historical data for only a handful of markets, you might cluster these by geographical zone. You’d randomly pick a few representative zones and gather historical returns data from each. This cuts data collection costs.
Your choice depends on data availability, cost, and the level of precision you need. The final takeaway: the better the sampling design, the more confidence you can have in your risk estimates and subsequent capital allocation decisions.
Sampling is a significant topic on the CFA® exams, especially in quantitative methods sections. At the Level I stage, you’re expected to know definitions, rationale, and basic calculations (like how a sample mean differs from the population mean). By Level II and Level III, you’ll face more advanced applications—like how to interpret regression outputs or how sampling design can target a particular research question in portfolio analysis. For instance, in a performance attribution scenario, you might see vignettes describing how an analyst formed a data sample and be asked to judge its reliability and potential biases.
In the broader context of ethical standards (CFA Institute Code of Ethics and Standards of Professional Conduct), using an unreliable sample to make inflated performance claims can easily violate the responsibility to clients and the market. So, thorough knowledge of sampling designs isn’t just academic—it helps preserve integrity and trust in your analyses.
• Always question your sampling approach: Is it truly capturing the population, or is there hidden systematic bias?
• Confirm your sampling frame is complete and up to date, especially in fast-changing fields like emerging market equities or cryptocurrencies.
• Document your assumptions and processes. If you use cluster sampling, justify your cluster definitions and check if they indeed represent miniature versions of the overall population.
• Watch out for convenience and judgmental sampling biases. Sometimes you have no choice (maybe you have a single data vendor), but at least be mindful of the limits of your design.
• Know the key differences: For instance, how stratified sampling differs from cluster sampling.
• Expect item-set questions where a portfolio manager applies systematic sampling incorrectly because the securities list has a cyclical ordering.
• Practice short calculations: For example, identifying sample mean, sample standard deviation, or sampling error from a small data set.
• Be prepared for conceptual questions about the trade-off between cost and accuracy in a real-world scenario.
• Before tackling more advanced topics like hypothesis testing (Chapter 8) or regression (Chapters 10 and 14), make sure you’re comfortable with how your data was sampled in the first place.
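The short calculations mentioned above can be practiced in a few lines. The return series here is hypothetical; note that `statistics.stdev` uses the n − 1 divisor, i.e. the sample (not population) standard deviation:

```python
import math
import statistics

# Hypothetical small sample of monthly returns (in %).
returns = [1.2, -0.8, 0.5, 2.1, -1.4, 0.9]

n = len(returns)
mean = statistics.fmean(returns)   # sample mean
sd = statistics.stdev(returns)     # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)             # standard error of the sample mean

print(f"mean = {mean:.4f}, s = {sd:.4f}, SE = {se:.4f}")
```

The standard error is the quantity that feeds directly into the confidence intervals and hypothesis tests covered in the subsequent sections.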
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.