A thorough exploration of descriptive statistics, hypothesis testing, and their practical application in finance, including central tendency measures, correlation, parametric vs. non-parametric tests, and common pitfalls.
Descriptive statistics are all about summarizing and understanding data. When we say “descriptive,” we mean tools like the mean and median that help us pinpoint typical values, or the variance that tells us how numbers spread out. Let’s walk through the main categories: measures of central tendency (Where’s the middle?), measures of dispersion (How spread out is the data?), and correlation/covariance (How do variables move together?).
Sure, we all know that various formulae exist, but it’s super important to grasp why these metrics matter in finance. If you’re analyzing a stock’s returns over time (maybe monthly returns of your favorite large-cap stock), you’d probably start with the average return (the arithmetic mean) and the standard deviation (a measure of the fluctuations in returns). Then, if you’re building a portfolio, you might check how your stock’s returns correlate with those of other assets. All of these tasks—or “checks,” if you like—fall under descriptive statistics.
• Mean (Arithmetic Mean):
The arithmetic mean (often just called "mean") is the average value of a set of observations. If we have n observations x₁, x₂, …, xₙ, their mean is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
This measure is widely used to represent expected returns when analyzing stocks or entire portfolios.
• Median:
The median is the midpoint of the dataset when your observations are arranged in ascending (or descending) order. That is to say, half the observations are above the median and half are below. If you have an odd number of observations, it’s simply the middle one. If it’s even, you take the average of the two middle ones. In investment contexts, the median is less sensitive to extreme outliers, such as a one-off quarter with extraordinary returns (either extremely high or negative).
• Mode:
The mode is the most frequently occurring value in a dataset. In finance, it is less often used for returns (which are numerical and continuous, so exact repeats are rare), but you might encounter the mode in categorical data—say, the most common credit rating an issuer has across different rating agencies. A quick Python sketch after this list puts all three central tendency measures side by side.
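To make these three measures concrete, here is a minimal Python sketch using only the standard library's statistics module, applied to a made-up series of monthly returns:

```python
import statistics

# Hypothetical monthly returns in percent (invented for illustration)
returns = [1.2, -0.5, 2.3, 0.8, 1.2, -3.1, 0.8, 1.2]

print(statistics.mean(returns))    # arithmetic mean: sum of observations / n
print(statistics.median(returns))  # middle of the sorted data (here, the average
                                   # of the two middle values, since n is even)
print(statistics.mode(returns))    # most frequent value: 1.2 appears three times
```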
• Range:
The simplest measure of dispersion is the range, which is the difference between the maximum and minimum values in your dataset. If your monthly returns range from –3% to +5%, then your range is 8%. It’s straightforward but can be misleading if there is just one big outlier.
• Variance and Standard Deviation:
Variance is the average of the squared deviations from the mean. For a population with mean μ, you subtract μ from each data point xᵢ, square the result, sum across all N observations, and divide by the population size N:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Standard deviation (σ) is simply the square root of variance. For a sample (with sample size n), we use the sample variance estimator:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The n – 1 in the denominator is that little correction we do (the so-called Bessel’s correction) to adjust the bias when estimating a population variance from a sample.
• Coefficient of Variation (CV):
The coefficient of variation is the ratio of the standard deviation to the mean:

$$\mathrm{CV} = \frac{s}{\bar{x}}$$
We see this used in finance to compare risk per unit of return across different assets. A lower CV indicates a more favorable risk–return trade-off (generally speaking). The sketch after this list computes the range, variance, standard deviation, and CV in one pass.
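Putting the dispersion measures together on the same hypothetical return series: note that statistics.variance and statistics.stdev apply Bessel's correction (n − 1), while statistics.pvariance would divide by N for a population.

```python
import statistics

returns = [1.2, -0.5, 2.3, 0.8, 1.2, -3.1, 0.8, 1.2]  # hypothetical monthly returns (%)

rng = max(returns) - min(returns)      # range: maximum minus minimum
var_s = statistics.variance(returns)   # sample variance (divides by n - 1)
std_s = statistics.stdev(returns)      # sample standard deviation, sqrt of sample variance
cv = std_s / statistics.mean(returns)  # coefficient of variation: risk per unit of mean return

print(f"Range: {rng:.2f}  Variance: {var_s:.4f}  Std dev: {std_s:.4f}  CV: {cv:.2f}")
```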
In portfolio management, we really care about how different assets move together because that guides diversification decisions.
• Covariance:
Covariance measures the degree to which two variables move together (or in opposite directions). For instance, you might see that the returns of stock A and stock B have a positive covariance, meaning when stock A's return is above its average, stock B's return tends to be above its average too. The sample covariance is:

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
• Correlation:
Correlation is a standardized version of covariance, bounded between –1 and +1. With correlation (ρ), we remove the influence of scale:

$$\rho_{XY} = \frac{s_{XY}}{s_X \, s_Y}$$
A correlation close to +1 means a very strong positive relationship. A correlation near –1 indicates a strong negative relationship, with the two variables moving in opposite directions. A correlation around 0 indicates no linear relationship. A short sketch after this list computes both statistics.
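As a quick check on the covariance and correlation formulas, here is a sketch using the standard library (the covariance and correlation functions require Python 3.10+); both return series are invented for illustration:

```python
import statistics

stock_a = [1.0, 2.1, -0.4, 3.2, 0.5]   # hypothetical monthly returns (%)
stock_b = [0.8, 1.9, -0.2, 2.7, 0.9]

cov_ab = statistics.covariance(stock_a, stock_b)   # sample covariance (n - 1 denominator)
rho_ab = statistics.correlation(stock_a, stock_b)  # Pearson correlation, always in [-1, +1]

print(f"Covariance: {cov_ab:.4f}  Correlation: {rho_ab:.4f}")
```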
Sometimes, your dataset is the entire population. But in finance, we typically have a sample—like we’re studying monthly returns from the last five years to infer something about the “true” distribution of returns. The sample approach means we often use the sample mean and sample variance. Remember that if a question specifically calls something the “population variance,” you’d divide by N, not (n – 1). Keep an eye on vignettes: they might specify population assumptions or highlight small sample sizes. That detail can decide whether you use z- or t-tests.
Let me just say: it’s easy to get turned off by the formal structure of hypothesis testing. But trust me, you’ll be happy to have a systematic framework when you’re deciding if your portfolio’s “alpha” is truly different from zero.
The standard approach is to set up a Null Hypothesis (H₀) and an Alternative Hypothesis (H₁). H₀ is typically the “no change” or “no difference” scenario (like “Mean return is 5%” or “Coefficient is zero”). H₁ is the “there is a difference” scenario (“Mean return is not 5%” or “Coefficient is not zero”).
Then you choose a significance level (α), such as 0.05 or 5%. If your test statistic is extreme enough (equivalently, if your p-value comes in below α), you reject the null. If not, you fail to reject the null. Notice we say "fail to reject" and not "accept" the null, because it's possible the test is just not powerful enough to detect the difference.
A standard set of steps is typically followed:
```mermaid
flowchart LR
    A["Formulate Null (H₀)<br/>and Alternative (H₁)"] --> B["Choose Significance Level<br/>(α)"]
    B --> C["Determine Appropriate Test<br/>Statistic (z, t, F, or χ²)"]
    C --> D["Formulate Decision Rule"]
    D --> E["Calculate Test Statistic<br/>and Make Decision"]
```
At the end, you’ll “Reject H₀” if your evidence is strong, or “Fail to reject H₀” if it’s not.
• One-tailed:
A one-tailed test is used when you expect an effect in a specific direction. For instance, “The portfolio return is greater than 3%.” Your alternative hypothesis might be H₁: μ > 3%.
• Two-tailed:
A two-tailed test is used when the difference could go in either direction. "The portfolio return differs from 3%." That means H₁: μ ≠ 3%. If you see wording like "different from" or "not equal to," it's likely a two-tailed scenario.
The p-value is the probability of obtaining results at least as extreme as those observed, assuming H₀ is true. If that p-value is smaller than α, such an extreme result would be unlikely if H₀ were true, so you reject H₀. Exam vignettes might present p-values directly, saving you the trouble of using distribution tables. Just check whether the quoted p-value is already two-tailed, or whether it is one-tailed and needs to be doubled for a two-tailed test. Such details can be tricky in exam contexts.
• Type I Error (False Positive):
You reject a true null hypothesis. If you set α = 5%, then whenever H₀ is actually true there's a 5% chance you commit this error in repeated testing. (The short simulation after this list shows that rejection rate settling near α.)
• Type II Error (False Negative):
You fail to reject a false null hypothesis. So, you might incorrectly conclude your portfolio's alpha is zero when, in reality, the portfolio did generate alpha.
• Power of a Test:
The probability of correctly rejecting a false null hypothesis. This is 1 – β, where β is the probability of a Type II error. If you have a large enough sample (and a real effect to detect), your test power increases.
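One way to internalize the Type I error definition is a quick simulation: generate many samples for which H₀ really is true, run the t-test on each, and watch the rejection rate land near α. A sketch, with arbitrary simulation settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)        # seeded for reproducibility
alpha, n_trials, n = 0.05, 10_000, 30  # arbitrary settings for illustration

# H0 is TRUE here: every sample is drawn from a distribution whose mean is exactly 0
rejections = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

# With H0 true, the rejection rate should settle near alpha = 0.05
print(f"Empirical Type I error rate: {rejections / n_trials:.3f}")
```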
• Parametric Tests:
These tests assume the underlying population follows a certain distribution (often normal). Examples include the z-test and t-test.
• Non-Parametric Tests:
These tests do not rely on assumptions about the population distribution. They're handy when you suspect severe non-normality, have ordinal data, or face small sample sizes. Familiar examples include the rank-sum test and the Wilcoxon signed-rank test, sketched below.
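For instance, here is a short sketch of the Wilcoxon signed-rank test via scipy on invented paired data (say, a strategy's monthly returns before and after a rule change), where you would rather not assume normality:

```python
from scipy import stats

# Hypothetical paired observations: returns before/after a strategy change (%)
before = [1.1, 0.4, -0.3, 2.0, 0.7, 1.5, -0.1, 0.9]
after  = [1.6, 0.9,  0.2, 2.4, 1.1, 1.8,  0.3, 1.2]

# Wilcoxon signed-rank test: ranks the paired differences, no normality assumption
stat, p = stats.wilcoxon(before, after)
print(f"W = {stat:.1f}, p = {p:.4f}")
```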
When you’re analyzing regression output, you’re actually running loads of little hypothesis tests. For each coefficient (like a slope coefficient in multiple regression), you check whether it differs significantly from zero using a t-test. The default null is H₀: βᵢ = 0.
Similarly, you can run an F-test to check whether multiple coefficients as a group differ from zero. For example, in a 3-factor asset pricing model, you might test whether all three betas are simultaneously zero. If the F-statistic is large enough (or its p-value small enough), you conclude that at least one of those betas is non-zero.
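To see both tests on one output, here is a sketch with statsmodels on simulated data: a hypothetical fund that truly loads on only the first of three made-up factors. The coefficient t-statistics test each H₀: βᵢ = 0, and the regression F-test checks whether the slope coefficients are jointly zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120  # e.g., ten years of monthly observations

# Simulated factor returns; the fund loads only on the first factor (plus noise)
factors = rng.normal(size=(n, 3))
fund = 0.5 * factors[:, 0] + rng.normal(scale=0.5, size=n)

X = sm.add_constant(factors)   # prepend the intercept column
model = sm.OLS(fund, X).fit()

print(model.tvalues)   # t-statistics, one per coefficient, for H0: beta_i = 0
print(model.f_pvalue)  # p-value of the F-test that all slope betas are jointly zero
```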
Confidence intervals are basically ranges of values that are likely to contain the true parameter. A 95% confidence interval around the mean might look like:

$$\bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}$$

where s/√n is the standard error of the mean, and t(α/2, df) is the critical value from the t-table for a 95% confidence interval (two-tailed) with df = n – 1 degrees of freedom. These intervals pop up in finance when you're trying to estimate the future return of a stock or the effect size of a factor in a regression.
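Here is a short sketch of that interval in Python, again on a made-up return sample, using scipy for the t critical value:

```python
import math
import statistics
from scipy import stats

returns = [0.6, 1.4, -0.8, 2.1, 0.3, 1.0, -0.2, 1.7, 0.9, 0.5]  # hypothetical sample
n = len(returns)

x_bar = statistics.mean(returns)
se = statistics.stdev(returns) / math.sqrt(n)  # standard error of the mean: s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-tailed 95% critical value, df = n - 1

lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```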
• Statistical Significance vs. Practical Significance:
Just because something is statistically significant doesn’t mean it’s big enough in practical terms. Context matters—a 0.03% outperformance might be “significant” but maybe not worth your transaction costs.
• Sample Size Woes:
With small samples, you rely on t-distributions (rather than z). Also, small samples can reduce your test’s power.
• Read the Fine Print in Vignettes:
The exam might say “Assume all relevant populations are normally distributed, and the sample size is 100.” That green-lights a z-test. Or it might say “n=12,” so you pivot to a t-test. These details matter.
• Interpretation in Context:
If you’re testing the significance of correlation, ensure you’re using the right formula and distribution. If you’re testing volatility, you might bump into χ² tests or F-tests, especially in certain portfolio variance contexts.
Hypothesis testing and descriptive statistics provide the backbone for analyzing data, whether you’re just describing a single return series or exploring regression coefficients. At Level II, you’ll see many item sets that throw in a short table of returns or regression outputs, followed by questions about test results, confidence intervals, and significance. Keep your eyes on sample sizes, normal distribution assumptions, one-tailed vs. two-tailed clarifications, and watch out for Type I/Type II error pitfalls.
Now—best advice? Practice interpreting p-values, reading summary tables, and running quickly through the five-step hypothesis testing framework in your head. That mental muscle memory is critical under exam-time pressure.
• CFA Institute Level II Curriculum – Official readings on hypothesis testing, correlation, and basic regression.
• Freedman, Pisani, and Purves, “Statistics” – A classic text for fundamental statistics.
• Walpole et al., “Probability & Statistics for Engineers & Scientists” – A deeper look into real-world applications.