PDSLASSO & LASSOPACK: Stata module for post-selection and post-regularization OLS or IV estimation and inference

Achim Ahrens, Christian Hansen, Mark Edwin Schaffer (Photographer)

Research output: Non-textual form (Software)

Abstract

lassopack is a suite of programs for penalized regression methods suitable for the high-dimensional setting, where the number of predictors p may be large and possibly greater than the number of observations. The pdslasso package contains routines for estimating structural parameters in linear models with many controls and/or instruments.

The lassopack package consists of six main programs. lasso2 implements the lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. cvlasso supports K-fold cross-validation and rolling cross-validation for cross-section, panel and time-series data. rlasso implements theory-driven penalization for the lasso and square-root lasso for cross-section and panel data. lassologit, cvlassologit and rlassologit are the corresponding programs for logistic lasso regression.

The lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996), the square-root lasso (Belloni et al. 2011) and the adaptive lasso (Zou 2006) are regularization methods that use L1-norm penalization to achieve sparse solutions: of the full set of p predictors, typically most will have coefficients set to zero. Ridge regression (Hoerl & Kennard 1970) relies on L2-norm penalization; the elastic net (Zou & Hastie 2005) uses a mix of L1 and L2 penalization. lasso2 implements all these estimators. rlasso uses the theory-driven penalization methodology of Belloni et al. (2012, 2013, 2014, 2016) for the lasso and square-root lasso. cvlasso implements K-fold cross-validation and h-step-ahead rolling cross-validation (for time-series and panel data) to choose the penalization parameters for all the implemented estimators. lassologit, rlassologit and cvlassologit extend support to the case where the dependent variable is a binary response. In addition, rlasso implements the Chernozhukov et al. (2013) sup-score test of joint significance of the regressors, which is suitable for the high-dimensional setting.
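The difference between L1 and L2 penalization can be made concrete with a minimal Python sketch (standard library only) of coordinate descent for the elastic net. This is a hypothetical illustration, not lasso2's code: it assumes the glmnet-style objective (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2), centred predictors with no intercept, and no penalty loadings; lasso2's own scaling of the penalty may differ. Setting alpha=1 gives the lasso and alpha=0 gives ridge regression. The function names and toy data are made up for illustration.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the L1 'shrink or kill' step."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, max_iter=1000, tol=1e-10):
    """Coordinate descent for the (assumed, glmnet-style) objective
        (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha=1 is the lasso, alpha=0 is ridge regression.
    X is a list of n rows of p values; columns are assumed centred and
    non-zero, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) x_j'(partial residual that excludes predictor j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            # The L1 part soft-thresholds; the L2 part inflates the denominator.
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy data (hypothetical): three orthogonal, centred predictors; y = 3*x0 + 0.2*x1.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.2, -2.8, 2.8, -3.2]

b_lasso = elastic_net(X, y, lam=0.5, alpha=1.0)  # x1 and x2 set exactly to zero
b_enet = elastic_net(X, y, lam=0.5, alpha=0.5)   # mix of L1 and L2 shrinkage
b_ridge = elastic_net(X, y, lam=0.5, alpha=0.0)  # shrinks x1 but keeps it nonzero
print(b_lasso, b_enet, b_ridge)
```

On this orthogonal toy design each coefficient is simply the soft-thresholded (lasso) or proportionally shrunk (ridge) least-squares coefficient, which is why the lasso returns exact zeros for the weak predictor x1 while ridge only shrinks it towards zero.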
The pdslasso package consists of two programs, pdslasso and ivlasso, which are routines for estimating structural parameters in linear models with many controls and/or instruments. The routines use methods for estimating sparse high-dimensional models, specifically the lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996) and the square-root lasso (Belloni et al. 2011, 2014). These estimators are used to select controls (pdslasso) and/or instruments (ivlasso) from a large set of variables (possibly numbering more than the number of observations), in a setting where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) causal variables of interest. Two approaches are implemented in pdslasso and ivlasso: (1) the "post-double-selection" (PDS) methodology of Belloni et al. (2012, 2013, 2014, 2015, 2016); (2) the "post-regularization" (CHS) methodology of Chernozhukov, Hansen and Spindler (2015). For instrumental variable estimation, ivlasso implements weak-identification-robust hypothesis tests and confidence sets using the Chernozhukov et al. (2013) sup-score test. The implementation of these methods in pdslasso and ivlasso requires the Stata program rlasso (available in the separate Stata module lassopack).
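The post-double-selection idea can be sketched in plain Python (standard library only): (1) lasso of the outcome y on the candidate controls W, (2) lasso of the causal variable d on W, (3) OLS of y on d plus the union of the controls selected in either step. This is a hypothetical illustration under simplifying assumptions, not the pdslasso implementation: it uses a fixed, user-supplied penalty instead of rlasso's theory-driven penalty, assumes centred data with no intercept, handles a single causal variable, and returns only the point estimate (no standard errors). All function names and the toy data are made up for illustration.

```python
def lasso(X, y, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumes centred, non-zero columns (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    b, resid = [0.0] * p, list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            shrunk = max(abs(rho) - lam, 0.0)  # soft-thresholding
            new_bj = (shrunk if rho > 0 else -shrunk) / col_sq[j]
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def ols(X, y):
    """Least squares via the normal equations X'X b = X'y (Gauss-Jordan elimination)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * v for a, v in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


def post_double_selection(y, d, W, lam_y, lam_d):
    """PDS for one causal variable d: (1) lasso y on W, (2) lasso d on W,
    (3) OLS of y on d plus the union of controls selected in (1) and (2).
    Returns the coefficient on d and the selected control indices."""
    sel_y = {j for j, b in enumerate(lasso(W, y, lam_y)) if b != 0.0}
    sel_d = {j for j, b in enumerate(lasso(W, d, lam_d)) if b != 0.0}
    union = sorted(sel_y | sel_d)
    X = [[d[i]] + [W[i][j] for j in union] for i in range(len(y))]
    return ols(X, y)[0], union


def hadamard_col(k, n=8):
    """Column k of the n x n Sylvester-Hadamard matrix: +/-1 entries, mutually orthogonal."""
    return [(-1) ** bin(i & k).count("1") for i in range(n)]


# Hypothetical toy data: control w0 drives d, and its direct effect on y (-3)
# exactly cancels its indirect effect through d (1.5 * 2), so y alone "hides" it.
w0, w1, w2 = hadamard_col(1), hadamard_col(2), hadamard_col(4)
W = [list(row) for row in zip(w0, w1, w2)]
d = [2 * a + e for a, e in zip(w0, hadamard_col(3))]
y = [1.5 * di - 3 * a + c + 0.1 * e for di, a, c, e in zip(d, w0, w1, hadamard_col(6))]

effect_pds, controls = post_double_selection(y, d, W, lam_y=0.3, lam_d=0.3)

# Single selection for comparison: only controls picked by the lasso of y on W.
sel_y = [j for j, b in enumerate(lasso(W, y, 0.3)) if b != 0.0]
effect_naive = ols([[d[i]] + [W[i][j] for j in sel_y] for i in range(len(y))], y)[0]

print(controls, round(effect_pds, 6), round(effect_naive, 6))  # [0, 1] 1.5 0.3
```

In the toy data the lasso of y on the controls drops w0 because its reduced-form coefficient is zero, even though w0 confounds d; adding the controls selected by the lasso of d on W recovers the true effect of 1.5, while the single-selection regression returns 0.3.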

Original language: English
Place of Publication: Boston, USA
Publisher: Boston College Department of Economics
Media of output: Online
Publication status: Published, 24 Jan 2019

Keywords

  • econometrics
  • high-dimensional models
  • inference
  • lasso
  • elastic net
  • sparsity
