Explore the fundamental concepts and practical techniques of simple linear regression, focusing on least squares estimation, slope and intercept interpretation, and application within a financial context.
In investment analysis, we often encounter situations where we want to understand how one variable influences another—like how a company’s revenue might be affected by advertising expenditure, or how a stock’s daily returns move with changes in a benchmark index. Simple linear regression (SLR) is a great tool to tackle these questions because it allows us to estimate the relationship between a single independent variable (X) and a dependent variable (Y).
The typical setup is:
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
where:
• \(\beta_0\) = intercept
• \(\beta_1\) = slope coefficient
• \(\varepsilon\) = error term (random noise with mean zero)
In this section, you’ll learn the mechanics of how these \(\beta\) coefficients are estimated using the least squares criterion and how to interpret them in the context of financial data. Also, we’ll look at common pitfalls and best practices from an investment analyst’s viewpoint, and we’ll explore how this knowledge can be applied in exam scenarios and in everyday finance work.
Imagine you want to estimate how changes in a market index (X), like the S&P 500, might affect an individual stock’s returns (Y). If you gather a series of observations on X and Y (say, 100 days of data), simple linear regression helps you draw the “best” straight line that fits the data. The line should capture the overall trend: when X goes up, Y might also go up (positive correlation) or go down (negative correlation).
So, a quick personal anecdote here: I remember once trying to figure out if changes in oil prices could predict the daily returns of an airline company’s stock. The results weren’t as clear-cut as I expected—there was some relationship, but it wasn’t super strong. That’s typical: real data can be messy, which is why an organized method like least squares is so valuable.
The “least squares” part comes from minimizing the sum of the squared differences between the actual observed values of Y (the airline’s actual daily returns) and the fitted (or predicted) values of Y from the regression line.
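Mathematically, for each observation \(i\):
\[ e_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \]
These \(e_i\) are called residuals, and they measure how far each data point lies from the regression line. We define the sum of squared errors (SSE) as:
\[ \text{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2 \]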
We find \(\hat{\beta}_0\) (estimated intercept) and \(\hat{\beta}_1\) (estimated slope) that make this SSE as small as possible.
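To make the "as small as possible" idea concrete, here is a minimal sketch that evaluates the SSE for a few candidate lines. The data values and the helper name `sse` are hypothetical, chosen only for illustration, and `np.polyfit` is used as a convenient way to obtain the least squares line.

```python
import numpy as np

# Hypothetical illustrative data (not from the text), both in percent.
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-0.5, 0.4, 1.1, 2.3, 2.7])

def sse(b0, b1, x, y):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# A degree-1 polyfit returns the least squares [slope, intercept].
b1_hat, b0_hat = np.polyfit(x, y, 1)

sse_ols = sse(b0_hat, b1_hat, x, y)
sse_slope_nudged = sse(b0_hat, b1_hat + 0.2, x, y)      # slightly steeper line
sse_intercept_nudged = sse(b0_hat + 0.2, b1_hat, x, y)  # slightly higher line

print(f"SSE at least squares estimates: {sse_ols:.4f}")
print(f"SSE with slope nudged:          {sse_slope_nudged:.4f}")
print(f"SSE with intercept nudged:      {sse_intercept_nudged:.4f}")
```

Any move away from the least squares coefficients, in either direction, increases the SSE.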
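To estimate the slope \(\hat{\beta}_1\):
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n} (X_i - \overline{X})^2} \]
where \(\overline{X}\) is the average of all \(X_i\) values and \(\overline{Y}\) is the average of all \(Y_i\) values. Once we have \(\hat{\beta}_1\), we estimate the intercept:
\[ \hat{\beta}_0 = \overline{Y} - \hat{\beta}_1 \overline{X} \]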
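For readers who want to see where the slope and intercept estimates come from, here is a brief sketch of the standard calculus argument. The symbols \(b_0\) and \(b_1\) are introduced here only to denote candidate values of the intercept and slope; setting both partial derivatives of the SSE to zero gives the least squares solution.

```latex
\begin{aligned}
\mathrm{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_0}
  &= -2\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right) = 0
  \;\;\Longrightarrow\;\; b_0 = \overline{Y} - b_1\,\overline{X} \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial b_1}
  &= -2\sum_{i=1}^{n} X_i\left(Y_i - b_0 - b_1 X_i\right) = 0 \\[4pt]
\text{substituting } b_0:\quad
  & \sum_{i=1}^{n} X_i\left[(Y_i - \overline{Y}) - b_1 (X_i - \overline{X})\right] = 0 \\[4pt]
\Longrightarrow\quad
  b_1 &= \frac{\sum_{i=1}^{n} X_i (Y_i - \overline{Y})}{\sum_{i=1}^{n} X_i (X_i - \overline{X})}
       = \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n} (X_i - \overline{X})^2}
\end{aligned}
```

The last equality holds because \(\sum (Y_i - \overline{Y}) = 0\) and \(\sum (X_i - \overline{X}) = 0\), so subtracting \(\overline{X}\) times either sum changes nothing.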
If, for instance, we find that \(\hat{\beta}_1 = 1.5\) in the airline example, that would imply that for every 1 percentage point increase in the oil price change (X), the airline’s stock return (Y) is expected to rise by 1.5 percentage points on average. Meanwhile, \(\hat{\beta}_0\) is the predicted return when the oil price change is exactly zero. Interpret it with caution whenever \(X = 0\) lies outside the range of your data or has no practical meaning.
Let’s do a quick numeric example. Suppose we have five days of data for an airline’s stock return versus changes in the price of oil:
• Day 1: (X = −2%, Y = 4%)
• Day 2: (X = 0%, Y = 3%)
• Day 3: (X = 1%, Y = 1%)
• Day 4: (X = 2%, Y = −1%)
• Day 5: (X = 5%, Y = 2%)
(Here, X is daily change in oil price, Y is daily stock return.)
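Compute \(\overline{X}\) and \(\overline{Y}\):
\[ \overline{X} = \frac{-2 + 0 + 1 + 2 + 5}{5} = 1.2, \qquad \overline{Y} = \frac{4 + 3 + 1 - 1 + 2}{5} = 1.8 \]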
Compute the numerator \(\sum (X_i - \overline{X})(Y_i - \overline{Y})\). This gets a bit tedious by hand, so for a small data set, you can do a quick summation. Or you can do what many analysts do: open Excel or Python and let the software handle the arithmetic.
Estimate \(\hat{\beta}_1\). The slope measures how the airline’s daily return tends to move with the daily change in the oil price.
Estimate \(\hat{\beta}_0\). This is the predicted daily return at \(X = 0\), that is, on a day when the oil price does not change. The fitted line itself always passes through the point of means, here \((\overline{X}, \overline{Y}) = (1.2\%, 1.8\%)\).
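The four steps above can be traced by hand in a few lines of NumPy. This is a minimal sketch using the five observations listed earlier; the variable names (`s_xy`, `s_xx`, and so on) are my own labels for the intermediate sums.

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 2.0, 5.0])   # daily % change in oil price
y = np.array([4.0, 3.0, 1.0, -1.0, 2.0])   # daily % return on the airline stock

# Step 1: sample means
x_bar = x.mean()   # 1.2
y_bar = y.mean()   # 1.8

# Step 2: numerator (sum of cross-deviations) and denominator (sum of squared X deviations)
s_xy = np.sum((x - x_bar) * (y - y_bar))   # -9.8
s_xx = np.sum((x - x_bar) ** 2)            # 26.8

# Steps 3 and 4: slope, then intercept
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

print(f"slope     = {beta1_hat:.4f}")   # about -0.3657
print(f"intercept = {beta0_hat:.4f}")   # about  2.2388
```

In this tiny sample the slope is negative: days with larger oil price increases tend to coincide with lower airline returns. With only five points, treat that as an illustration of the mechanics rather than evidence of a real relationship.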
If you run these five observations in a spreadsheet using, for example, Excel’s “LINEST” function, you’ll get an intercept and slope that best fit the data. The outcome might not be strongly significant with only five observations, but it demonstrates how the process works.
If you want to see how you might do this in Python:
```python
import numpy as np
import statsmodels.api as sm

X = np.array([-2, 0, 1, 2, 5])   # daily % changes in oil price
Y = np.array([4, 3, 1, -1, 2])   # daily % returns on the airline stock

# OLS in statsmodels has no intercept by default, so prepend a column of ones.
X_design = sm.add_constant(X)
model = sm.OLS(Y, X_design).fit()
print(model.summary())
```
You’ll see output with the estimated slope (\(\hat{\beta}_1\)) and intercept (\(\hat{\beta}_0\)), plus many diagnostics, like R-squared and t-statistics, which we’ll discuss in upcoming sections (look forward to 10.2, 10.3, and beyond for a deeper dive).
Below is a Mermaid diagram that shows a simplified workflow:
```mermaid
flowchart LR
    A["Raw data: (X_i, Y_i)"] --> B["Compute means <br/> of X (X-bar) <br/> and Y (Y-bar)"]
    B --> C["Compute slope <br/> (beta1)"]
    C --> D["Compute intercept <br/> (beta0)"]
    D --> E["Form model: <br/> Y = beta0 + beta1 * X"]
    E --> F["Analyze residuals: <br/> Y_i - (beta0 + beta1*X_i)"]
```
You gather the data points, compute the means, then use them to get \(\hat{\beta}_1\) and \(\hat{\beta}_0\). Finally, you can track how well your model performs by examining residuals.
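As a sketch of that last residual step, the snippet below fits the line to the five airline observations and checks two properties that least squares residuals have whenever an intercept is included: they sum to zero, and they are uncorrelated with X in the sample. The use of `np.polyfit` here is my choice of convenience, not something the workflow requires.

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 2.0, 5.0])   # daily % change in oil price
y = np.array([4.0, 3.0, 1.0, -1.0, 2.0])   # daily % return on the airline stock

# A degree-1 polyfit returns the least squares [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

fitted = intercept + slope * x
residuals = y - fitted

print("residuals:             ", np.round(residuals, 3))
print("sum of residuals:      ", residuals.sum())         # zero up to rounding error
print("sum of residuals * X:  ", (residuals * x).sum())   # zero up to rounding error
```

If either sum is clearly non-zero, the coefficients being checked are not the least squares estimates.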
The slope tells us how much Y is expected to change when X changes by one unit. So if \(\hat{\beta}_1 = 2\) and both variables are daily returns in percent, then for every 1 percentage point increase in X (your independent variable), your dependent variable Y goes up 2 percentage points on average.
This “on average” part is key. We’re talking about expected changes in Y, not a guarantee for every single day. Markets are noisy: some days your predictive model will overestimate, other days it’ll underestimate.
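A small simulation can illustrate the "on average" point. The sketch below assumes a hypothetical data-generating process with a true slope of 2 plus random noise; all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data-generating process: Y = 0 + 2 * X + noise
n_days = 1000
true_intercept, true_slope = 0.0, 2.0
x = rng.normal(loc=0.0, scale=1.0, size=n_days)
noise = rng.normal(loc=0.0, scale=1.0, size=n_days)
y = true_intercept + true_slope * x + noise

slope_hat, intercept_hat = np.polyfit(x, y, 1)
residuals = y - (intercept_hat + slope_hat * x)

# A positive residual means the actual Y exceeded the prediction (model underestimated).
share_underestimated = np.mean(residuals > 0)

print(f"estimated slope: {slope_hat:.3f}")
print(f"share of days the model underestimated Y: {share_underestimated:.2%}")
print(f"largest single-day miss: {np.abs(residuals).max():.2f}")
```

The estimated slope lands close to 2, yet on individual days the prediction misses in both directions, sometimes by a wide margin.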
The intercept \(\hat{\beta}_0\) indicates the expected value of Y when the independent variable X is zero. But watch out: in finance, \(X=0\) sometimes lies outside the data’s practical range or is not meaningful at all. A “zero interest rate” or “zero GDP growth” environment, for example, may be relevant in one sample period and never observed in another. Interpreting intercepts often requires domain knowledge.
I recall being slightly confused by an intercept in a bond-yield regression once, because the intercept suggested negative yield at times when inflation rates were near zero. It was a clue that the model might be simplified or that something else was going on in the market—like risk premium factors we hadn’t included.
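One common way to get a more interpretable intercept, offered here as an option rather than something this section prescribes, is to center X at its mean before fitting. A minimal sketch with the five airline observations shows that the slope is unchanged while the intercept becomes the predicted Y at the average X, which equals \(\overline{Y}\).

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 2.0, 5.0])   # daily % change in oil price
y = np.array([4.0, 3.0, 1.0, -1.0, 2.0])   # daily % return on the airline stock

# Original fit: intercept is the prediction at X = 0.
slope, intercept = np.polyfit(x, y, 1)

# The fitted line passes through the point of means.
pred_at_mean_x = intercept + slope * x.mean()

# Centered fit: intercept is now the prediction at X = X-bar.
slope_c, intercept_c = np.polyfit(x - x.mean(), y, 1)

print(f"original: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"centered: slope={slope_c:.4f}, intercept={intercept_c:.4f}")
print(f"prediction at mean X: {pred_at_mean_x:.4f}, mean of Y: {y.mean():.4f}")
```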
Though this section focuses primarily on estimating \(\hat{\beta}_0\) and \(\hat{\beta}_1\), in practice you won’t stop there: you’ll want to check how well the model fits the data. That typically involves examining the residuals and computing measures of fit. We’ll explore these in Chapter 10.3 (Residual Analysis and Model Diagnostics) and in Chapter 10.4 (Measures of Fit), so stay tuned.
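• Overfitting and Small Samples: Even in simple linear regression, it is easy to read too much into a small dataset, because a handful of points can produce a slope that looks meaningful purely by chance. Be equally careful with non-stationary variables (like some macroeconomic time series), which can produce relationships that look strong but are spurious.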
• Omitted Variable Bias: If you suspect other factors explain Y (like interest rates, exchange rates, or inflation), ignoring them could mean your slope \(\hat{\beta}_1\) is biased.
• Extrapolation: The fitted line might look great within your data’s range, but if you extrapolate far beyond the range of X, the predictions can mislead you.
• Interpretation of Intercept: As mentioned, be careful. If \(X=0\) is not in the data’s range, \(\hat{\beta}_0\) might not have a meaningful real-world interpretation.
• Unit Scales: Always confirm the units—like whether your X is in percentages or decimal form. A slope of 0.02 means something very different if X is in decimals vs. percentages.
• Violations of Regression Assumptions: Non-constant variance in residuals or correlation of errors over time can produce unreliable standard errors, affecting inference. See Chapter 10.2 and 10.3 for how to detect and handle these.
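The unit-scale pitfall in the list above is easy to demonstrate. This sketch refits the five airline observations with X expressed first in percent and then in decimal form, leaving Y in percent both times; the variable names are illustrative.

```python
import numpy as np

x_pct = np.array([-2.0, 0.0, 1.0, 2.0, 5.0])   # oil price change in percent (1.0 = 1%)
y_pct = np.array([4.0, 3.0, 1.0, -1.0, 2.0])   # stock return in percent

x_dec = x_pct / 100.0                          # same changes in decimal form (0.01 = 1%)

slope_pct, intercept_pct = np.polyfit(x_pct, y_pct, 1)
slope_dec, intercept_dec = np.polyfit(x_dec, y_pct, 1)

print(f"X in percent:  slope={slope_pct:.4f}, intercept={intercept_pct:.4f}")
print(f"X in decimals: slope={slope_dec:.4f}, intercept={intercept_dec:.4f}")
```

Dividing X by 100 multiplies the slope by 100 and leaves the intercept alone. The fitted relationship is identical; only the "one unit of X" that the slope refers to has changed.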
In R, the equivalent one-line fit is lm(Y ~ X).
Simple linear regression is particularly handy when you want to examine a primary driver of returns, like how interest rate changes might influence bond fund performance or how a currency pair might affect an importer’s stock price. However, in real capital markets, multiple influences often exist. That’s why more advanced models (multiple regression, factor models, etc.) can provide broader insights. Still, as the initial building block, SLR is indispensable.
• You might encounter a constructed-response question where you’re given a small data set and asked to calculate the slope. Prep by practicing short calculations and memorizing the slope and intercept formulas.
• For item sets, you might be interpreting regression output. Expect questions about the meaning of the slope, intercept, SSE, or how to minimize SSE.
• Always keep an eye on conceptual traps: Don’t assume correlation implies causation, and be mindful of the domain-specific meaning of the intercept.
• Practice explaining the regression line in plain English. This is crucial in portfolio management contexts (e.g., “beta” in the Capital Asset Pricing Model is basically the slope of a regression of stock returns on market returns).
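The "beta is a regression slope" point in the last bullet can be sketched numerically. The returns below are simulated from hypothetical parameters (a built-in slope of 1.3), and raw rather than excess returns are used to keep the example short, which is a simplification relative to the CAPM.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical simulated daily returns in decimal form (not real market data).
n_days = 250
market = rng.normal(loc=0.0005, scale=0.01, size=n_days)
stock = 0.0002 + 1.3 * market + rng.normal(loc=0.0, scale=0.015, size=n_days)

# Beta as the slope of a regression of stock returns on market returns.
beta_regression, alpha_regression = np.polyfit(market, stock, 1)

# Beta as covariance(stock, market) divided by variance(market).
cov_matrix = np.cov(stock, market)
beta_cov = cov_matrix[0, 1] / cov_matrix[1, 1]

print(f"beta from regression slope: {beta_regression:.4f}")
print(f"beta from cov / var:        {beta_cov:.4f}")
```

The two numbers agree because the slope formula is algebraically the sample covariance of X and Y divided by the sample variance of X. The estimate will differ somewhat from 1.3 because of the noise added to the simulated stock returns.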