PDSLASSO: Stata module for post-selection and post-regularization OLS or IV estimation and inference

Achim Ahrens, Christian Hansen, Mark Edwin Schaffer

Research output: Non-textual form (Software)

Abstract

lassopack is a suite of programs for penalized regression methods suitable for the high-dimensional setting, where the number of predictors p may be large and possibly greater than the number of observations. The package consists of three main programs. lasso2 implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. cvlasso supports K-fold cross-validation and rolling cross-validation for cross-section, panel and time-series data. rlasso implements theory-driven penalization for the lasso and square-root lasso for cross-section and panel data.

The lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996), the square-root lasso (Belloni et al. 2011) and the adaptive lasso (Zou 2006) are regularization methods that use L1 norm penalization to achieve sparse solutions: of the full set of p predictors, typically most will have coefficients set to zero. Ridge regression (Hoerl & Kennard 1970) relies on L2 norm penalization; the elastic net (Zou & Hastie 2005) uses a mix of L1 and L2 penalization. lasso2 implements all these estimators. rlasso uses the theory-driven penalization methodology of Belloni et al. (2012, 2013, 2014, 2016) for the lasso and square-root lasso. cvlasso implements K-fold cross-validation and h-step ahead rolling cross-validation (for time-series and panel data) to choose the penalization parameters for all the implemented estimators. In addition, rlasso implements the Chernozhukov et al. (2013) sup-score test of joint significance of the regressors, which is suitable for the high-dimensional setting.

pdslasso and ivlasso are routines for estimating structural parameters in linear models with many controls and/or instruments. The routines use methods for estimating sparse high-dimensional models, specifically the lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani 1996) and the square-root lasso (Belloni et al. 2011, 2014). These estimators are used to select controls (pdslasso) and/or instruments (ivlasso) from a large set of variables (possibly numbering more than the number of observations), in a setting where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) causal variables of interest. Two approaches are implemented in pdslasso and ivlasso: (1) the "post-double-selection" (PDS) methodology of Belloni et al. (2012, 2013, 2014, 2015, 2016); (2) the "post-regularization" (CHS) methodology of Chernozhukov, Hansen and Spindler (2015). For instrumental variable estimation, ivlasso implements weak-identification-robust hypothesis tests and confidence sets using the Chernozhukov et al. (2013) sup-score test. The implementation of these methods in pdslasso and ivlasso requires the Stata program rlasso (available in the separate Stata module lassopack), which provides lasso and square-root lasso estimation with data-driven penalization.
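
As a rough illustration of the post-double-selection idea described above, the steps for a single exogenous causal variable can be written out as follows. This is a textbook-style sketch, not the exact objective used by rlasso: the notation (y_i, d_i, x_i, alpha, lambda, the selected sets I_1 and I_2) is assumed here for illustration, and the actual routines use their own normalization, data-driven penalty levels and penalty loadings.

```latex
% Assumed notation (not from the abstract):
%   y_i : outcome,  d_i : causal variable of interest,
%   x_i : p-vector of candidate controls (p may exceed n)
\begin{align*}
  % Structural equation; alpha is the parameter of interest
  y_i &= \alpha d_i + x_i'\beta + \varepsilon_i \\[4pt]
  % Generic lasso objective for a response w_i (L1 penalty gives sparsity)
  \hat{b}(w) &= \arg\min_{b} \; \frac{1}{n}\sum_{i=1}^{n} (w_i - x_i'b)^2
               + \frac{\lambda}{n}\,\lVert b \rVert_1 \\[4pt]
  % Step 1: controls that predict the outcome
  \hat{I}_1 &= \{\, j : \hat{b}_j(y) \neq 0 \,\} \\
  % Step 2: controls that predict the causal variable
  \hat{I}_2 &= \{\, j : \hat{b}_j(d) \neq 0 \,\} \\[4pt]
  % Step 3: OLS of y on d and the union of selected controls
  (\hat{\alpha}, \hat{\beta}) &= \arg\min_{a,\,b \,:\, b_j = 0 \ \forall j \notin \hat{I}_1 \cup \hat{I}_2}
      \; \sum_{i=1}^{n} (y_i - a\,d_i - x_i'b)^2
\end{align*}
```

Taking the union of the two selected sets is what distinguishes this from naive single selection: a control that matters for d but is only weakly related to y is still retained, which guards against omitted-variable bias in the final OLS step.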
Original language: English
Place of Publication: Boston, USA
Publisher: Boston College Department of Economics
Publication status: Published - 24 Jan 2019

Keywords

  • econometrics
  • high-dimensional models
  • inference
  • lasso
  • elastic net
  • sparsity
