Learn how Simple Linear Regression uses Ordinary Least Squares (OLS) to estimate relationships between a single independent variable and a dependent variable. Explore key assumptions, interpret coefficients, and evaluate model fit using R², SEE, and ANOVA.
I still remember the first time I came across a scatterplot in finance class: dots everywhere, me scratching my head, thinking, “Um, how do I make sense of this?” So, let’s explore that question together in a simple, straightforward way. Simple Linear Regression helps us draw a line through those dots, or data points, so we can (hopefully) spot meaningful trends and make predictions.
Simple Linear Regression is an important tool in quantitative methods (see other sections in Chapter 2, such as 2.8 Hypothesis Testing, for complementary statistical concepts). It tries to explain or predict a “dependent” variable (often denoted y) using a single “independent” or “explanatory” variable (x). We assume there is some sort of linear relationship between x and y, so you might write it like this:
(1) y = α + βx + ε
Where:
• y = Dependent variable (the outcome we’re trying to predict)
• x = Independent variable (the predictor we use to explain or predict y)
• α = Intercept (sometimes called a constant)
• β = Slope coefficient
• ε = Error term, capturing unexplained variation
Once we estimate α and β, we get a line—our regression line—that ideally does a decent job of fitting our observed data.
The main driver behind simple linear regression is the assumption that the predicted value of y is a linear function of x. In real life, we might use monthly sales (y) as a function of advertising expenditure (x). Another time, we might use a company’s stock return (y) as a function of the market index’s return (x). The possibilities are basically endless in finance, from modeling interest rates to forecasting corporate earnings.
Mathematically, we want to fit a line through the points (xᵢ, yᵢ) such that it captures the overall pattern. The “best fit” line is almost always found using the Ordinary Least Squares (OLS) method.
OLS is a fancy name for the criterion we use to select the optimal α (intercept) and β (slope). We choose α and β to minimize the sum of squared residuals (the quantity labeled SSE in the ANOVA discussion later in this section). Those residuals are just the difference between each observed yᵢ and the predicted value ŷᵢ (i.e., yᵢ - ŷᵢ).
Here’s how we get the estimates of our parameters:
• Slope coefficient (β̂):
$$ \hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} $$
• Intercept (α̂):
$$ \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} $$
where:
• \( \bar{x} \) is the mean of all x values.
• \( \bar{y} \) is the mean of all y values.
So this is what literally ties your regression line to the data. The slope captures how y changes if x changes by one unit. The intercept is the predicted value of y when x = 0 (though in some cases, x = 0 might not make practical sense—like when x is “years of education,” or “advertising dollars,” or “time,” but let’s keep moving).
Imagine you have monthly sales data (y) and monthly advertising budgets (x). Maybe your dataset is (in thousands of dollars for advertising, and thousands of units sold for sales numbers):
• Month 1: x = 10, y = 65
• Month 2: x = 12, y = 70
• Month 3: x = 9, y = 60
• Month 4: x = 11, y = 68
• …
Plugging these into the formulas will yield specific values of α̂ and β̂. And like magic, you get a line: ŷ = α̂ + β̂x. That line then helps you forecast sales for next month, provided you have an advertising budget guess.
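If you’d rather let the computer do the arithmetic, here’s a minimal Python sketch (NumPy assumed) that applies the slope and intercept formulas above to just the four months shown; the figures are illustrative, not real data:

```python
import numpy as np

# Advertising spend (x, in $ thousands) and units sold (y, in thousands)
# for the four illustrative months listed above.
x = np.array([10.0, 12.0, 9.0, 11.0])
y = np.array([65.0, 70.0, 60.0, 68.0])

x_bar, y_bar = x.mean(), y.mean()

# Slope: beta_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: alpha_hat = y_bar - beta_hat * x_bar
alpha_hat = y_bar - beta_hat * x_bar

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")

# Forecast: plug a planned advertising budget into y_hat = alpha_hat + beta_hat * x
print(f"Forecast at x = 13: {alpha_hat + beta_hat * 13:.2f}")
```

Swapping in your full dataset is just a matter of extending the two arrays; the formulas don’t change.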
Interpretation is often more important than fancy math. The slope (β̂) is the expected change in y per unit change in x. If β̂ = 2, that tells you that for every 1-unit increase in x, the expected change in y is +2 units. If x moves from 10 to 11, we’d expect the prediction to move from ŷ₁ to ŷ₂, a difference of 2. In plain language: “We expect an additional 2 units of y if x goes up by 1.”
The intercept (α̂) is the value of y when x = 0. But as we mentioned earlier, this may not always be meaningful. For instance, if x represents “years of work experience,” then x = 0 is “no experience.” The intercept is the predicted y for that hypothetical scenario. In some business cases, x = 0 might be something we never actually see in practice, or it might not even be possible. So if the intercept is huge or negative or doesn’t make sense in real-world terms, we usually just say “it’s the value of the line at x = 0,” but we might not interpret it too literally.
Well, linear regression is powerful, but it comes with a few big assumptions. Think of them like the ground rules we assume about our data and error terms.
• Linearity: The relationship between x and y is linear. If the actual relationship is curved or something more complicated, simple linear regression might not be the perfect fit.
• Independence of Errors: The residuals (ε) should not be correlated with each other. If your data is time-ordered, it’s easy to have autocorrelation. That’s basically consecutive points being related in a pattern.
• Homoscedasticity: The variance of the error terms is constant across all values of x. When variance changes with x—maybe the residuals get bigger for bigger values of x—that’s heteroscedasticity.
• Normality of Errors: The error terms are assumed to be normally distributed with mean zero and variance σ². This assumption influences the validity of confidence intervals and hypothesis tests.
• No or Little Multicollinearity: In simple linear regression with only one x, you don’t have to worry too much about multicollinearity (that’s more relevant for multiple regression, which is covered in other chapters).
When these assumptions hold, the OLS estimators are “Best Linear Unbiased Estimators” (BLUE). If they don’t hold, well, the inferences and predictions might go haywire.
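As a quick, informal way to eyeball two of these assumptions, here’s a minimal Python sketch (NumPy assumed; the helper names are hypothetical). It computes the Durbin–Watson statistic for autocorrelation and a rough heteroscedasticity hint based on how |residuals| move with x. Formal diagnostic tests live in full econometrics packages; this is just a back-of-the-envelope version.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic on time-ordered residuals.
    Values near 2 suggest little first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def hetero_hint(resid, x):
    """Correlation between |residuals| and x.
    A value far from 0 hints that residual spread changes with x."""
    return np.corrcoef(np.abs(resid), x)[0, 1]
```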
After we fit a line, we want to see how well it explains the variation in y. That’s where R², the coefficient of determination, and the Standard Error of the Estimate (SEE) step in.
R² is a number between 0 and 1 that basically says, “Hey, this fraction of the variation in y is explained by the x variable(s) in your model.” The bigger the R², the more variability your model accounts for. Let’s define it:
$$ R^2 = \frac{\text{Explained Sum of Squares (SSR)}}{\text{Total Sum of Squares (SST)}} $$
In words, SSR is the portion of the variation in y explained by the regression line, and SST is the total variation in y around its mean.
• If R² = 0.80, it suggests that 80% of the variation in y is explained by the model.
The SEE is the standard deviation of the residuals. It shows, on average, how far off your predictions are from the actual observed y values. A smaller SEE means your model predictions are closer to reality. The formula for SEE (sometimes denoted sᵧₓ) is:
$$ \text{SEE} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} $$
where n is the number of data points. We use (n - 2) because we estimated two parameters: α and β.
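Here’s a small Python sketch (NumPy assumed; the function and argument names are just illustrative, following the earlier example) that computes R² and SEE directly from the sums of squares:

```python
import numpy as np

def fit_stats(x, y, alpha_hat, beta_hat):
    """Return (R^2, SEE) for the fitted line y_hat = alpha_hat + beta_hat * x."""
    y_hat = alpha_hat + beta_hat * x
    sse = np.sum((y - y_hat) ** 2)        # unexplained (error) sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    ssr = sst - sse                       # explained (regression) sum of squares
    r_squared = ssr / sst
    see = np.sqrt(sse / (len(y) - 2))     # n - 2: two parameters estimated
    return r_squared, see
```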
Analysis of Variance (ANOVA) is used to test the overall significance of your regression model. If you like exploring sums of squares (SST, SSR, SSE), this is your jam. It partitions the total variability (SST) into explained variation (SSR) and unexplained variation (SSE). Then it uses an F-test to see if your model is “statistically significant”—basically checking whether β ≠ 0.
You can visualize it like this:
```mermaid
flowchart LR
    A["Total Variation <br/> (SST)"] --> B["Explained Variation <br/> (SSR)"]
    A --> C["Unexplained Variation <br/> (SSE)"]
```
Where:
• SST = SSR + SSE
• SSR is the regression sum of squares.
• SSE is the error sum of squares.
We often see an F-test determined by:
$$
F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/1}{\text{SSE}/(n - 2)}
$$
If this F-statistic is sufficiently large (and the p-value is small), we reject the null hypothesis that β=0, indicating that x is significantly related to y. If not, our fancy regression line might be no better than a random guess.
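To connect the formula to code, here’s a hedged sketch (SciPy assumed for the F distribution; names follow the earlier sketches) that computes the F-statistic and its right-tail p-value for a simple regression:

```python
import numpy as np
from scipy import stats

def f_test(x, y, alpha_hat, beta_hat):
    """ANOVA F-test for simple regression: 1 df for the slope, n - 2 for error."""
    n = len(y)
    y_hat = alpha_hat + beta_hat * x
    sse = np.sum((y - y_hat) ** 2)            # unexplained variation (SSE)
    ssr = np.sum((y_hat - y.mean()) ** 2)     # explained variation (SSR)
    f_stat = (ssr / 1) / (sse / (n - 2))      # MSR / MSE
    p_value = stats.f.sf(f_stat, 1, n - 2)    # right-tail p-value
    return f_stat, p_value
```

A small p-value (say, below 0.05) is the code-level counterpart of rejecting the null hypothesis that β = 0.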
All right, so once we have that line ŷ = α̂ + β̂x, we can predict (or forecast) y for a given x. Let’s say you want to forecast next month’s sales (y) given your planned advertising budget (x). That’s straightforward: just plug x into your equation.
Because estimates aren’t guaranteed, we often attach intervals around them. Two common intervals:
• Confidence Intervals (CI): e.g., an interval for the average predicted y. “We’re 95% confident that the mean value of y for a given x is between these two numbers.”
• Prediction Intervals (PI): e.g., an interval for a particular predicted value of y. “We’re 95% confident that an individual outcome for y for a given x will be between these two numbers.”
A PI is wider than a CI because it accounts for the extra uncertainty of an individual observation, not just the average.
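Here’s a sketch of the standard interval formulas for simple regression at a chosen x₀ (SciPy’s t distribution assumed; the function name and inputs are illustrative). Notice the “+ 1” under the square root in the prediction interval: that term is exactly the extra uncertainty of a single new observation.

```python
import numpy as np
from scipy import stats

def intervals(x, alpha_hat, beta_hat, see, x0, level=0.95):
    """Return (confidence interval, prediction interval) for y at x0."""
    n = len(x)
    y0 = alpha_hat + beta_hat * x0                     # point forecast
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)    # two-sided critical value
    leverage = 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    ci_half = t_crit * see * np.sqrt(leverage)         # mean response at x0
    pi_half = t_crit * see * np.sqrt(1 + leverage)     # individual outcome at x0
    return (y0 - ci_half, y0 + ci_half), (y0 - pi_half, y0 + pi_half)
```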
So, does that mean all financial data is linear? Ha, not even close. If your data is strongly skewed or you see a pattern in the residuals (like a funnel shape indicating heteroscedasticity), you might try transformations:
• Log transform: Replace y with ln(y) (and/or x with ln(x)).
• Polynomial: Fit y = α + β₁x + β₂x² + … (a polynomial shape).
• Other functional forms: Exponential, reciprocal, etc.
These transformations can address nonlinearity or reduce heteroscedasticity. In economics, for instance, it’s common to regress ln(y) on ln(x) for elasticity analysis. But for the scope of single-equation, simple linear regression, we typically stick to the format y = α + βx + ε.
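Still, as a quick illustration of the log-log form, here’s a minimal sketch (NumPy assumed; the helper name is hypothetical) that reuses the same OLS formulas on ln(x) and ln(y), so the fitted slope can be read as an elasticity. Both variables must be strictly positive for the logs to exist.

```python
import numpy as np

def log_log_fit(x, y):
    """OLS of ln(y) on ln(x); the slope is an elasticity estimate."""
    lx, ly = np.log(x), np.log(y)
    slope = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
    intercept = ly.mean() - slope * lx.mean()
    return intercept, slope
```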
Let’s imagine Bob, a personal friend, who complains: “I opened a small coffee stall in the weekend farmers’ market, and I notice that on days with more foot traffic (x), I sell more pastry combos (y). But how do I forecast next weekend’s sales?” Step one: gather data on foot traffic and combos sold across multiple weeks. Step two: apply simple linear regression, estimate α̂ and β̂, interpret them. Step three: forecast combos sold for a projected foot traffic figure you might expect next weekend. Step four: check how well the model fits the data using R². If you see that R² is 0.90, that’s a strong relationship, but if it’s 0.25, maybe foot traffic alone is insufficient to explain sales.
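Putting Bob’s four steps into code, here’s a sketch using NumPy’s polyfit with completely made-up foot-traffic and combo-sales numbers; the point is the workflow, not the figures.

```python
import numpy as np

# Hypothetical data: weekend foot traffic (visitors) and pastry combos sold.
traffic = np.array([120, 150, 90, 200, 170, 130], dtype=float)
combos  = np.array([ 35,  44, 28,  60,  50,  38], dtype=float)

# Step 2: estimate the line (np.polyfit returns [slope, intercept] for degree 1).
beta_hat, alpha_hat = np.polyfit(traffic, combos, 1)

# Step 3: forecast combos for an expected 180 visitors next weekend.
forecast = alpha_hat + beta_hat * 180

# Step 4: check fit with R^2 = 1 - SSE/SST.
resid = combos - (alpha_hat + beta_hat * traffic)
r_squared = 1 - np.sum(resid ** 2) / np.sum((combos - combos.mean()) ** 2)

print(f"Forecast: {forecast:.1f} combos, R^2 = {r_squared:.2f}")
```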
• Ordinary Least Squares (OLS): A method to estimate α and β by minimizing the sum of squared residuals.
• Intercept (α): Predicted value of y when x = 0.
• Slope Coefficient (β): Expected change in y for a one-unit change in x.
• Residual (Error) Term (ε): The difference between actual y and predicted y.
• Homoscedasticity: Constant variance of the error term across different x values.
• Coefficient of Determination (R²): Fraction of variance in y explained by the model.
• ANOVA (Analysis of Variance): Statistical method to evaluate the model’s significance by partitioning total variation into explained and unexplained components.
• Standard Error of Estimate (SEE): Standard deviation of the model’s residuals, measuring predictive accuracy.
• Gujarati, D. N., & Porter, D. C. (2008). Basic Econometrics. McGraw-Hill.
• Kennedy, P. (2008). A Guide to Econometrics. Wiley-Blackwell.
These texts provide a deeper dive into regression analysis, the math behind OLS, and advanced diagnostic tests for validating assumptions.
So, that’s the gist of simple linear regression. Sure, it can get fancier with multiple regressors, cross-sectional and time-series data intricacies, or advanced techniques like robust standard errors (to tame heteroscedasticity). But for a core understanding, the straightforward approach of a single x explaining y is a pretty awesome starting point for any aspiring analyst or curious mind.
Anyway, ready to test your understanding?
Important Notice: FinancialAnalystGuide.com provides supplemental CFA study materials, including mock exams, sample exam questions, and other practice resources to aid your exam preparation. These resources are not affiliated with or endorsed by the CFA Institute. CFA® and Chartered Financial Analyst® are registered trademarks owned exclusively by CFA Institute. Our content is independent, and we do not guarantee exam success. CFA Institute does not endorse, promote, or warrant the accuracy or quality of our products.