Abstract
In this article, we introduce lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso, and postestimation ordinary least squares. The methods are suitable for the high-dimensional setting, where the number of predictors, p, may be large and possibly greater than the number of observations, n. We offer three approaches for selecting the penalization ("tuning") parameters: information criteria (implemented in lasso2); K-fold cross-validation and h-step-ahead rolling cross-validation for cross-section, panel, and time-series data (cvlasso); and theory-driven ("rigorous" or plugin) penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results comparing the performance of the penalization approaches.
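To illustrate how the two cross-validation schemes differ in the way they split the data, here is a minimal Python sketch that generates training and validation index sets for K-fold and for h-step-ahead rolling cross-validation. It is not lassopack's or cvlasso's implementation: the functions `kfold_splits` and `rolling_splits` and the parameters `origin` and `fixed_window` are hypothetical names chosen for illustration, and the K-fold folds here are contiguous blocks rather than randomly assigned.

```python
def kfold_splits(n, k):
    """Yield (train, validate) index lists for K-fold CV on observations 0..n-1.

    Simplifying assumption: folds are contiguous blocks (no random assignment).
    """
    # Spread the remainder n % k over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validate = list(range(start, start + size))
        # Every observation outside the current fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validate
        start += size


def rolling_splits(n_periods, origin, h=1, fixed_window=False):
    """Yield (train, validate) index lists for h-step-ahead rolling CV.

    Data are assumed to be time-ordered periods 0..n_periods-1. The first
    training window contains `origin` periods; each step extends it by one
    period (or slides it, if fixed_window=True) and validates on the single
    period h steps after the end of the training window.
    """
    for t in range(origin - 1, n_periods - h):
        # Expanding window starts at period 0; fixed window keeps `origin` periods.
        start = t - origin + 1 if fixed_window else 0
        yield list(range(start, t + 1)), [t + h]


if __name__ == "__main__":
    for train, validate in kfold_splits(5, 2):
        print("K-fold   train:", train, "validate:", validate)
    for train, validate in rolling_splits(6, origin=3, h=1):
        print("rolling  train:", train, "validate:", validate)
```

In a typical cross-validation procedure, the model would be fit on each training set over a grid of penalty values, the prediction error computed on the matching validation set, and the penalty with the lowest average validation error selected. The rolling scheme never trains on observations that come after the validation period, which is why it suits time-series data where K-fold splits would leak future information into the fit.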
Original language | English |
---|---|
Pages (from-to) | 176-235 |
Number of pages | 60 |
Journal | The Stata Journal |
Volume | 20 |
Issue number | 1 |
Early online date | 24 Mar 2020 |
Publication status | Published - Mar 2020 |
Keywords
- cross-validation
- cvlasso
- cvlassologit
- elastic net
- lasso
- lasso2
- lasso2 postestimation
- lassologit
- lassologit postestimation
- rlasso
- rlasso postestimation
- rlassologit
- square-root lasso
- st0594
ASJC Scopus subject areas
- Mathematics (miscellaneous)