Abstract
In this article, we introduce lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso, and postestimation ordinary least squares. The methods are suitable for the high-dimensional setting, where the number of predictors, p, may be large and possibly greater than the number of observations, n. We offer three approaches for selecting the penalization (“tuning”) parameters: information criteria (implemented in lasso2), K-fold cross-validation and h-step-ahead rolling cross-validation for cross-section, panel, and time-series data (cvlasso), and theory-driven (“rigorous” or plugin) penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results to compare the performances of the penalization approaches.
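The three commands named in the abstract correspond to the three approaches for choosing the penalization parameters. As a minimal sketch of typical usage, assuming lassopack has been installed from SSC and that y and x1–x100 are hypothetical variables in memory (option names should be checked against the package's help files):

```stata
* One-time installation from SSC
ssc install lassopack

* lasso2: estimate the lasso path over a grid of lambdas and
* select the model by the extended BIC information criterion
lasso2 y x1-x100, lic(ebic)

* cvlasso: 10-fold cross-validation, then re-estimate at the
* lambda that minimizes the estimated prediction error
cvlasso y x1-x100, nfolds(10)
cvlasso, lopt

* rlasso: theory-driven ("rigorous") penalization with
* heteroskedasticity-robust penalty loadings
rlasso y x1-x100, robust
```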
| Original language | English |
|---|---|
| Pages (from-to) | 176-235 |
| Number of pages | 60 |
| Journal | The Stata Journal |
| Volume | 20 |
| Issue number | 1 |
| Early online date | 24 Mar 2020 |
| DOIs | |
| Publication status | Published - Mar 2020 |
Keywords
- cross-validation
- cvlasso
- cvlassologit
- elastic net
- lasso
- lasso2
- lasso2 postestimation
- lassologit
- lassologit postestimation
- rlasso
- rlasso postestimation
- rlassologit
- square-root lasso
- st0594
ASJC Scopus subject areas
- Mathematics (miscellaneous)