Explore Probit regression and other nonlinear models for binary or limited-dependent variables, including Tobit, Heckman, and logistic approaches, with practical finance applications.
So, we’ve all been there—running a standard Ordinary Least Squares (OLS) regression only to realize that our dependent variable isn’t really suitable for a plain old linear model in some critical way. Maybe it’s binary: takes a value of 0 if a firm does not default and 1 if it does. Or perhaps it’s bounded within [0, 1], such as a proportion of a portfolio allocated to a risky asset. In such situations, the linear assumption can be surprisingly misleading. Let’s say you’re trying to model the probability of a certain investment outcome. If you use OLS, the predicted values can easily be outside the range [0, 1], which is obviously nonsensical for a probability.
That’s where nonlinear models come to the rescue. These models ensure our predictions and inferences align with the data’s shape and constraints. A binary outcome often suggests logistic or probit regression. A continuous variable that piles up at zero (say, holdings of a particular asset class, which are exactly zero for many investors) might call for a censored or truncated model (Tobit), while a situation with sample selection might demand the Heckman correction.
These approaches are pretty standard in finance. For instance, you might want to:
• Estimate the probability of default (where the observed outcome is 0 or 1).
• Investigate the probability of a stock paying a dividend or not.
• Deal with “corner solutions” (e.g., 0 holdings in a certain asset).
This section explores the most common nonlinear regression methods used in investment analysis, risk modeling, and research vignettes you’ll see at the CFA Level II exam. Understanding why we pick one type of model over another is critical, particularly when you come across item-set questions illustrating how real data might not “fit” the simpler linear form.
I remember the first time I encountered a probit model for a class project. I was crossing my fingers that the difference between “probit” and “logit” would be super obvious. Spoiler alert: the difference is subtle. Probit regression maps the linear predictor Xβ into a probability through the cumulative distribution function (CDF) of the standard normal distribution (the probit link function is the inverse of that CDF). Instead of modeling the response directly, probit introduces a latent variable framework:
Let’s suppose there’s an unobservable (latent) variable:
p* = Xβ + ε
where X is your set of predictors, β is the vector of coefficients, and ε ~ Normal(0, 1). We do not observe p*, but we do observe a binary outcome:
p = 1 if p* > 0, and p = 0 if p* ≤ 0.
Hence, the probability that p = 1 is given by:
P(p = 1 | X) = Φ(Xβ),
where Φ(·) is the standard normal CDF. If you mapped out Φ(Xβ) on paper, it’s basically an S-shaped curve between 0 and 1, ensuring probabilities remain valid.
Under probit, we typically estimate β using Maximum Likelihood Estimation (MLE). The likelihood function is constructed around the probability that each observation is a 1 or a 0, given X. We then choose the β that maximizes this likelihood. In practice, software packages handle the math for us, but you’ll see references to log-likelihood values, z-statistics for coefficients, and standard errors that we interpret in a manner similar to OLS.
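To make that concrete, here is a minimal sketch of fitting a probit by MLE in Python with statsmodels. The data are simulated and the variable names (a leverage ratio, an interest-rate spread, a 0/1 default flag) are hypothetical stand-ins, but the workflow—build a design matrix, fit by maximum likelihood, read off the log-likelihood and z-statistics—mirrors what any package reports.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500

# Hypothetical predictors: a leverage ratio and an interest-rate spread.
leverage = rng.normal(0.5, 0.2, n)
spread = rng.normal(2.0, 0.8, n)

# Simulate a latent default propensity and the observed 0/1 default flag.
latent = -2.0 + 1.5 * leverage + 0.6 * spread + rng.normal(0.0, 1.0, n)
default = (latent > 0).astype(int)

# Probit estimated by maximum likelihood.
X = sm.add_constant(np.column_stack([leverage, spread]))
probit_res = sm.Probit(default, X).fit()

print(probit_res.summary())          # coefficients, z-stats, log-likelihood
print("Log-likelihood:", probit_res.llf)
```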
One of the biggest initial challenges is interpreting the coefficients. With OLS, it’s easy: a one-unit increase in X is associated with a β increase in Y. For probit, a coefficient tells you how a one-unit change in that predictor shifts the latent index Xβ, measured in standard deviations of the normal error, which is not super intuitive off the bat. Typically, we check marginal effects: how a small change in X changes the predicted probability.
We can express the marginal effect of X_k on P(p=1) as:
∂Φ(Xβ) / ∂X_k = φ(Xβ) × β_k,
where φ(·) is the standard normal PDF. Notice that the marginal effect depends on X (because φ(Xβ) is evaluated at the sample’s Xβ values). Often, you’ll see partial effects reported at the mean of X or at other representative values.
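Continuing the sketch above (the same probit_res and X objects), here is one way to compute the marginal effect at the mean by hand and compare it with what statsmodels reports. Evaluating at the sample mean of X is just one common choice of representative point.

```python
import numpy as np
from scipy.stats import norm

# Uses `probit_res` and `X` from the probit sketch above.
beta = probit_res.params                 # estimated coefficients (constant first)
x_mean = X.mean(axis=0)                  # a representative point: the sample mean
xb = float(x_mean @ beta)                # linear index at that point

# phi(Xb) * beta_k for the slope coefficients (the constant has no marginal effect).
manual_me = norm.pdf(xb) * beta[1:]
print("Marginal effects at the mean (manual):", manual_me)

# statsmodels reports the same quantity via get_margeff.
print(probit_res.get_margeff(at="mean").summary())
```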
“But wait,” you’re thinking, “I’ve also heard of logistic regression—are they the same?” More or less, yes, in the sense that they both handle a binary dependent variable. The difference? Logit uses the logistic function instead of the normal CDF:
P(y = 1 | X) = 1 / [1 + e^(-Xβ)].
If you take the log of [P / (1 - P)], you get the “log odds,” which is linear in X. In practice, the choice between probit and logit can be a bit subjective: probit is more common in economics (and older finance texts) because it’s historically associated with the idea of underlying normal errors. Logit is more frequent in machine learning and data science contexts. In typical finance vignettes, you may see either. Probit’s results and logit’s results are usually quite similar, save for scaling differences in the coefficients.
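As a quick illustration of that similarity, the sketch below fits a logit on the same simulated data from the earlier probit example and compares the fitted probabilities. The exact scale factor between the two coefficient vectors depends on the data; the oft-quoted 1.6 to 1.8 range is only a rule of thumb.

```python
import numpy as np
import statsmodels.api as sm

# Uses `default`, `X`, and `probit_res` from the first probit sketch above.
logit_res = sm.Logit(default, X).fit()

print("Probit coefficients:", probit_res.params.round(3))
print("Logit coefficients: ", logit_res.params.round(3))

# Fitted probabilities are typically very close, even though the raw
# coefficients differ by a scale factor (roughly 1.6 to 1.8 in practice).
p_probit = probit_res.predict(X)
p_logit = logit_res.predict(X)
print("Max gap in fitted probabilities:", float(np.abs(p_probit - p_logit).max()))
```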
Sometimes, the dependent variable isn’t just 0 or 1. Maybe it’s a continuous variable that’s often censored (like a measure that piles up at 0). Or there’s a sample selection issue—only certain observations get recorded or included. These are classic “limited dependent variable” situations.
Suppose you’re analyzing daily trading activity across instruments, but a large portion of the observations are zero (no trades reported on certain instruments on certain days). Plain OLS estimates might be biased because the standard assumptions don’t hold. The Tobit model steps in when the data are censored at some value (often zero).
The Tobit idea is that there is a latent variable y* = Xβ + ε that we would observe if it exceeded the threshold c (often c = 0). If y* ≤ c, then we observe y = c. The model is useful if you suspect your outcome is more or less linear beyond the threshold but is jammed at that threshold for a non-trivial fraction of the sample.
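One quick way to see the mechanics is to write out the left-censored-at-zero log-likelihood directly and maximize it numerically, as in the sketch below. This is a bare-bones illustration on simulated data with made-up names, not a production estimator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n_obs = 1000
x = rng.normal(size=n_obs)
X_cens = np.column_stack([np.ones(n_obs), x])

# Simulate a latent outcome and censor it from below at zero.
y_star = 0.5 + 1.0 * x + rng.normal(scale=1.5, size=n_obs)
y = np.maximum(y_star, 0.0)

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                 # keeps sigma positive
    xb = X_cens @ beta
    cens = y <= 0.0
    # Censored observations contribute P(y* <= 0); uncensored ones the normal density.
    ll_cens = norm.logcdf(-xb[cens] / sigma)
    ll_unc = norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)
    return -(ll_cens.sum() + ll_unc.sum())

start = np.zeros(3)                           # beta0, beta1, log(sigma)
fit = minimize(neg_loglik, start, method="BFGS")
b0, b1, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print("Tobit estimates:", round(b0, 3), round(b1, 3), round(sigma_hat, 3))
```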
Ever read a finance paper that says “we only have data for companies that issue dividends, so the sample is biased”? That’s a selection bias problem. The Heckman model corrects for sample selection by using two steps:
• Step 1: Estimate a probit model of whether an observation is selected into the sample (e.g., whether the firm pays a dividend at all).
• Step 2: Estimate the outcome equation on the selected observations, adding a correction term derived from Step 1 as an extra regressor.
This approach introduces something called the inverse Mills ratio (IMR), which, in plain terms, corrects the second equation coefficients for the non-random nature of the sample. The end result is a “selection-corrected” predictor, giving you less-biased estimates of the relationship among the variables of interest.
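Here is a bare-bones two-step sketch on simulated data: a probit for the selection equation, the inverse Mills ratio computed from its fitted index, and an OLS outcome equation on the selected subsample with the IMR added as a regressor. All variable names and the data-generating process are invented for illustration, and the step-two standard errors would need an adjustment in practice.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(11)
n_obs = 2000
z = rng.normal(size=n_obs)        # drives selection (e.g., propensity to pay dividends)
w = rng.normal(size=n_obs)        # drives the outcome of interest

# Correlated errors are what create the selection bias we want to correct.
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n_obs)
e_sel, e_out = errors[:, 0], errors[:, 1]

selected = (0.3 + 1.0 * z + e_sel) > 0        # who shows up in the sample
y = 1.0 + 0.5 * w + e_out                     # outcome, observed only if selected

# Step 1: probit for selection, then the inverse Mills ratio at the fitted index.
Z = sm.add_constant(z)
gamma = sm.Probit(selected.astype(int), Z).fit(disp=0).params
zg = Z @ gamma
imr = norm.pdf(zg) / norm.cdf(zg)

# Step 2: OLS on the selected subsample, with the IMR as an extra regressor.
# (In practice the step-2 standard errors also need an adjustment.)
W = sm.add_constant(np.column_stack([w[selected], imr[selected]]))
heckman_res = sm.OLS(y[selected], W).fit()
print(heckman_res.params)                     # const, slope on w, IMR coefficient
```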
As soon as you start using probit, logit, or the fancier limited dependent variable models, focusing solely on raw coefficients can be perplexing. That’s why marginal effects are typically reported. They answer, “What is the effect on the predicted outcome or probability if we tweak a single independent variable?”
At the same time, you might wonder how to measure the fit of these nonlinear models since you can’t just rely on R-squared the way we do with OLS. You have:
• Likelihood Ratio (LR) test: Compares the log-likelihood of the full model versus a restricted model.
• Pseudo-R²: Not the same as the OLS R-squared, but helps measure improvement over a null model. Common ones are McFadden’s, Cox & Snell, and Nagelkerke.
• Classification-based metrics: If you’re dealing with a binary outcome, confusion matrices, accuracy, precision, recall, and F1 scores can be quite relevant.
For the CFA exam, you might see references to “pseudo-R²” or the significance of the LR test. They want to see if you understand how to gauge model usefulness in a setting where the predicted variable sits in [0, 1].
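If you want to see where those numbers come from, the snippet below pulls the full-model and null-model log-likelihoods from the probit fitted in the first sketch (the probit_res object) and computes the LR statistic and McFadden’s pseudo-R² by hand; statsmodels also exposes them directly.

```python
from scipy.stats import chi2

# Uses the `probit_res` object fitted in the first sketch above.
ll_full = probit_res.llf            # log-likelihood of the fitted model
ll_null = probit_res.llnull         # log-likelihood of an intercept-only model

lr_stat = 2 * (ll_full - ll_null)   # likelihood-ratio test statistic
df = probit_res.df_model            # number of slope restrictions being tested
p_value = chi2.sf(lr_stat, df)

mcfadden = 1 - ll_full / ll_null    # McFadden's pseudo-R-squared

print(f"LR statistic: {lr_stat:.2f} (p-value: {p_value:.4f})")
print(f"McFadden pseudo-R2: {mcfadden:.3f}")
# statsmodels also exposes these directly as probit_res.llr,
# probit_res.llr_pvalue, and probit_res.prsquared.
```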
Working with maximum likelihood is usually painless if your software converges. Sometimes, though, MLE can be finicky—especially with small sample sizes or poorly specified functional forms. Watch out for:
• Convergence Failures: If your data doesn’t supply enough variation, the iterative solver can fail.
• Separation Issues: If X can perfectly separate 0 and 1 outcomes, the model might blow up (infinite coefficient estimates).
• Multicollinearity: Just like in OLS, correlated predictors can create large standard errors.
Validation or calibration tests (like the Hosmer-Lemeshow test) can help you see whether the predicted probabilities track the observed data well. In finance terms, a poorly specified model might systematically misstate the probability of default for certain subgroups (e.g., high-yield vs. investment-grade issuers), and a calibration check helps flag that.
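A rough, home-rolled version of that calibration idea is sketched below: sort observations by predicted probability, split them into deciles, and compare observed versus expected event counts in each group. The grouping scheme and degrees of freedom follow the conventional Hosmer-Lemeshow setup, and the inputs here are made up.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_obs, p_hat, groups=10):
    """Group by predicted probability and compare observed vs. expected events."""
    order = np.argsort(p_hat)
    stat = 0.0
    for idx in np.array_split(order, groups):        # roughly equal-sized bins
        n_g = len(idx)
        observed = y_obs[idx].sum()
        expected = p_hat[idx].sum()
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat, chi2.sf(stat, groups - 2)           # conventional df = groups - 2

# Made-up inputs: probabilities and outcomes that are well calibrated by construction.
rng = np.random.default_rng(3)
p_hat = rng.uniform(0.05, 0.95, 500)
y_obs = rng.binomial(1, p_hat)
print(hosmer_lemeshow(y_obs, p_hat))
```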
Now, imagine an exam vignette that shows a partial output from a probit model predicting whether a bond defaults. You’ll see something like:
• Coefficients (beta estimates).
• Standard errors or z-stats.
• Possibly a pseudo-R² or classification table.
• A mention of the significance of each predictor.
They might ask (1) how to interpret a certain coefficient, (2) the effect of a one-unit change in interest rate spread on the probability of default, or (3) how the model’s “fit” is evaluated. The correct approach is to highlight that each coefficient influences the latent propensity and that we typically convert that effect into marginal probability changes.
If the question is about making a “go/no-go” investment call based on the predicted probability, the classification-based approach might come up: “At a threshold of 0.5, would you classify this outcome as default or non-default?” Always look for the detail in the question stem. They might set a threshold of 0.2 or 0.7, depending on risk tolerance. That’s all fair game in a Level II item set.
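Mechanically, the classification step is just a comparison against the stated cutoff, as in this small sketch (the probabilities and outcomes are invented):

```python
import numpy as np

# Hypothetical predicted default probabilities and realized outcomes.
p_default = np.array([0.08, 0.15, 0.35, 0.55, 0.72, 0.90])
actual = np.array([0, 0, 1, 0, 1, 1])

for threshold in (0.2, 0.5, 0.7):
    predicted = (p_default >= threshold).astype(int)
    accuracy = float((predicted == actual).mean())
    print(f"threshold={threshold}: predicted={predicted.tolist()}, accuracy={accuracy:.2f}")
```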
One helpful way to think about these models is as a flowchart:
```mermaid
flowchart LR
    A["Binary or <br/>Limited Dependent Variable?"] --> B["Binary <br/>Outcome?"]
    B -->|Yes| C["Probit or Logit"]
    B -->|No| D["Censored/Truncated? <br/>(Tobit) or <br/>Selection Problem? <br/>(Heckman)"]
    C --> E["Check <br/>MLE Convergence <br/>And Fit"]
    D --> E
```
In finance contexts:
• Use probit/logit if your dependent variable is binary, like forecasting a corporate bond’s default event.
• Use Tobit if the dependent variable is censored, e.g., a portion of your sample is at a boundary (like 0).
• Use Heckman if you only observe certain data points due to selection issues.
Let’s do a quick scenario:
Imagine we want to predict whether a firm will issue new equity in the upcoming quarter (Issue = 1 if yes, 0 if no). We collect data on:
• The firm’s current leverage ratio (LEVERAGE).
• Expected GDP growth (GDPGROWTH).
• The firm’s stock return performance relative to market (RELRETURN).
A probit model might look like:
Issue_i* = β₀ + β₁ LEVERAGE_i + β₂ GDPGROWTH_i + β₃ RELRETURN_i + ε_i,
Issue_i = 1 if Issue_i* > 0, and 0 otherwise.
We run MLE and obtain estimates. Possibly:
β₀ = -1.20, β₁ = 0.50, β₂ = 0.80, β₃ = -0.10.
If LEVERAGE increases by 1—other variables held constant—we see a positive shift in the latent variable. Because β₁ = 0.50 > 0, that suggests the probability of issuing equity goes up (maybe the firm feels levered enough that it needs more equity). However, to find the exact probability change, we’d compute:
ΔP = φ(Xβ) × 0.50
evaluated at a specific point Xβ. If Xβ is around 0, φ(0) = 0.3989, so the marginal effect might be 0.3989 × 0.50 ≈ 0.1994. That is roughly a 20 percentage-point increase in the probability per one-unit increase in LEVERAGE (a local approximation), though the exact number depends on the actual Xβ for that observation.
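For completeness, here is the same arithmetic in a few lines of Python, using the hypothetical coefficients above and evaluating at Xβ = 0; it also shows the exact probability change for a full one-unit move in LEVERAGE for comparison.

```python
from scipy.stats import norm

# Hypothetical coefficients from the equity-issuance example.
beta = {"const": -1.20, "LEVERAGE": 0.50, "GDPGROWTH": 0.80, "RELRETURN": -0.10}

xb = 0.0                                       # evaluate at a point where Xb = 0
marginal_leverage = norm.pdf(xb) * beta["LEVERAGE"]
print(round(marginal_leverage, 4))             # about 0.199, as in the text

# For a full one-unit jump in LEVERAGE, compare the CDF before and after:
exact_change = norm.cdf(xb + beta["LEVERAGE"]) - norm.cdf(xb)
print(round(exact_change, 4))                  # about 0.19, close to the marginal effect
```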
• Probit Model: A binary outcome model in which the probability of a 1 is the standard normal CDF evaluated at Xβ.
• Logit Model: A binary outcome model based on the logistic function, where the log-odds are linear in predictors.
• Latent Variable: An unobserved variable underlying observed outcomes, e.g., propensity to default.
• Marginal Probability Effect: The partial derivative of the probability function with respect to a predictor.
• Tobit Model: A model addressing censored outcomes, often at zero.
• Heckman Correction: A two-step procedure to address sample selection bias, incorporating the inverse Mills ratio.
• Maximum Likelihood Estimation (MLE): A common estimation method maximizing the likelihood function of the data.
• Pseudo-R²: A measure of fit for categorical or limited dependent variables, not directly comparable to OLS R².
• Double-check whether an OLS approach is valid. If data is binary, bounded, or censored, you probably want a nonlinear model.
• Be ready to interpret partial effects or marginal effects in exam item sets. A direct coefficient interpretation can be misleading.
• Understand that “pseudo-R²” is not the same as R². Don’t be alarmed if it’s smaller than what you’d see in typical OLS regressions.
• Probit vs. logit is often a matter of convenience. Unless the exam specifically focuses on the difference in link functions, results are usually similar in practice.
• For the Tobit and Heckman models, watch out for vignettes pointing out systematic data issues—like many zero values or a sample that’s clearly self-selected.