lassopack is a suite of programs for penalized regression methods suitable for the high-dimensional setting where the number of predictors p may be large and possibly greater than the number of observations. The package consists of three main programs: lasso2 implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. cvlasso supports K-fold cross-validation and rolling cross-validation for cross-section, panel and time-series data. rlasso implements theory-driven penalization for the lasso and square-root lasso for cross-section and panel data.

The lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996), the square-root lasso (Belloni et al. 2011) and the adaptive lasso (Zou 2006) are regularization methods that use L1 norm penalization to achieve sparse solutions: of the full set of p predictors, typically most will have coefficients set to zero. Ridge regression (Hoerl & Kennard 1970) relies on L2 norm penalization; the elastic net (Zou & Hastie 2005) uses a mix of L1 and L2 penalization. lasso2 implements all these estimators. rlasso uses the theory-driven penalization methodology of Belloni et al. (2012, 2013, 2014, 2016) for the lasso and square-root lasso. cvlasso implements K-fold cross-validation and h-step ahead rolling cross-validation (for time-series and panel data) to choose the penalization parameters for all the implemented estimators. In addition, rlasso implements the Chernozhukov et al. (2013) sup-score test of joint significance of the regressors, which is suitable for the high-dimensional setting.

pdslasso and ivlasso are routines for estimating structural parameters in linear models with many controls and/or instruments. The routines use methods for estimating sparse high-dimensional models, specifically the lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996) and the square-root lasso (Belloni et al. 2011, 2014).
These estimators are used to select controls (pdslasso) and/or instruments (ivlasso) from a large set of variables (possibly numbering more than the number of observations), in a setting where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) causal variables of interest. Two approaches are implemented in pdslasso and ivlasso: (1) the "post-double-selection" (PDS) methodology of Belloni et al. (2012, 2013, 2014, 2015, 2016); (2) the "post-regularization" (CHS) methodology of Chernozhukov, Hansen and Spindler (2015). For instrumental variable estimation, ivlasso implements weak-identification-robust hypothesis tests and confidence sets using the Chernozhukov et al. (2013) sup-score test. The implementation of these methods in pdslasso and ivlasso requires the Stata program rlasso (available in the separate Stata module lassopack), which provides lasso and square-root lasso estimation with data-driven penalization.
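The sparsity property of L1 penalization described above can be illustrated with a textbook special case. The sketch below is written in Python rather than Stata and is not how lassopack or pdslasso compute their estimates. It assumes an orthonormal design, in which the lasso solution for each coefficient is the soft-thresholded OLS estimate and the ridge solution is a proportional shrinkage of it. The function names, the OLS values and the penalty level `lam` are hypothetical, chosen only for illustration.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*abs(b) over b.

    Under an orthonormal design (an assumption of this sketch), this is
    the lasso estimate for a coefficient whose OLS estimate is z.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(z - b)**2 + 0.5*lam*b**2 over b.

    Ridge shrinks every coefficient toward zero but never sets a
    nonzero OLS estimate exactly to zero.
    """
    return z / (1.0 + lam)


# Hypothetical OLS estimates and penalty level, for illustration only.
ols = [2.0, -1.2, 0.3, -0.1, 0.05]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

n_zero_lasso = sum(1 for b in lasso if b == 0.0)
n_zero_ridge = sum(1 for b in ridge if b == 0.0)

print("lasso:", lasso)
print("ridge:", ridge)
print("zeros (lasso, ridge):", n_zero_lasso, n_zero_ridge)
```

In this toy setting the L1 penalty removes the three smallest coefficients while the L2 penalty keeps all five, which is the sense in which the lasso performs variable selection and ridge regression does not.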
Place of Publication: Boston, USA
Publisher: Boston College Department of Economics
Publication status: Published - 24 Jan 2019
- high-dimensional models
- elastic net
Ahrens, A. (Author), Hansen, C. (Author), & Schaffer, M. E. (Author). (2019). PDSLASSO: Stata module for post-selection and post-regularization OLS or IV estimation and inference. Software, Boston College Department of Economics. Retrieved from https://ideas.repec.org/c/boc/bocode/s458458.html