Fit lasso for regression using coordinate descent.

cdlasso(formula,
      data,
      nfolds = 0,
      weights = NULL,
      nlambda = 100,
      lambda.min.ratio = ifelse(n < n.xvar, 0.01, 1e-04),
      lambda = NULL,
      threshold = 1e-7,
      eps = .0001,
      maxit = 5000,
      efficiency = ifelse(n.xvar < 500, "covariance", "naive"),
      seed = NULL,
      do.trace = FALSE)

Arguments

formula

Formula describing the model to be fit.

data

Data frame containing response and features.

nfolds

Number of cross-validation folds. The default, 0, means no cross-validation is performed.

weights

Observation weights. Default is 1 for each observation.

nlambda

The number of lambda values; default is 100.

lambda.min.ratio

Smallest value of lambda, expressed as a fraction of lambda.max, the smallest lambda for which all coefficients are zero. A very small value of lambda.min.ratio leads to a saturated fit when the number of observations n is less than the number of features n.xvar.

lambda

Lasso lambda sequence. Default is an internally selected sequence based on nlambda and lambda.min.ratio. For experts only.

threshold

Convergence threshold for coordinate descent. Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than threshold times the null deviance.

eps

Multiplication factor applied to lambda.min.ratio used to define the smallest lambda value.

maxit

Maximum number of passes over the data for all lambda values.

efficiency

Selects the "covariance" or "naive" updating mode; the default depends on the number of variables. The "covariance" mode saves all inner products and can be significantly faster in certain settings than "naive" mode, which loops over all n observations each time an inner product is formed.

seed

Negative integer specifying the seed for the random number generator.

do.trace

Number of seconds between updates to the user on approximate time to completion.
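
How nlambda and lambda.min.ratio work together to define a default lambda sequence can be sketched in R as follows. This is a minimal illustration, assuming the glmnet-style objective (1/(2n)) RSS + lambda * sum(abs(beta)) with centered features and a centered response. The helper lambda_grid is hypothetical and not part of this package, and cdlasso may standardize or scale differently.

```r
## Hypothetical helper (not a package function): glmnet-style lambda grid.
## Assumes the objective (1/(2n)) * RSS + lambda * sum(abs(beta)).
lambda_grid <- function(x, y, nlambda = 100, lambda.min.ratio = 1e-4) {
  n  <- nrow(x)
  xc <- scale(x, center = TRUE, scale = FALSE)  # center each feature
  yc <- y - mean(y)                             # center the response
  ## smallest lambda at which every coefficient is zero
  lambda.max <- max(abs(crossprod(xc, yc))) / n
  ## nlambda values, equally spaced on the log scale, largest first
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

## simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(50 * 5), 50, 5)
y <- x[, 1] - 2 * x[, 2] + rnorm(50)

lam <- lambda_grid(x, y, nlambda = 10, lambda.min.ratio = 0.01)
print(lam)
```

The last value of the grid equals lambda.min.ratio times the first, so shrinking lambda.min.ratio extends the path toward less regularized fits.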

Details

Use coordinate descent to fit lasso to a regression model.
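
The idea behind coordinate descent for the lasso, following Friedman et al. (2010), can be sketched in plain R. This is an illustration only, not the native routine used by cdlasso. The function cd_lasso_sketch and its mode argument are hypothetical names; features are standardized to mean 0 and mean square 1, convergence is judged by the largest coefficient change rather than by the change in the objective, and the "covariance" branch precomputes the full inner-product matrix for simplicity.

```r
## Soft-thresholding operator used by each coordinate update
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

## Hypothetical sketch (not the package routine): lasso at a single lambda
cd_lasso_sketch <- function(x, y, lambda, mode = c("naive", "covariance"),
                            threshold = 1e-7, maxit = 5000) {
  mode <- match.arg(mode)
  n <- nrow(x)
  p <- ncol(x)
  ## standardize features to mean 0 and mean square 1; center the response
  x <- scale(x, center = TRUE, scale = FALSE)
  x <- sweep(x, 2, sqrt(colMeans(x^2)), "/")
  y <- y - mean(y)
  beta <- numeric(p)
  r <- y                                  # current residual (naive mode)
  if (mode == "covariance") {
    xty <- drop(crossprod(x, y)) / n      # saved inner products <x_j, y> / n
    xtx <- crossprod(x) / n               # saved inner products <x_j, x_k> / n
  }
  for (it in seq_len(maxit)) {
    delta <- 0
    for (j in seq_len(p)) {
      ## g = <x_j, residual> / n, computed in one of two equivalent ways
      g <- if (mode == "naive") {
        sum(x[, j] * r) / n               # loops over all n observations
      } else {
        xty[j] - sum(xtx[j, ] * beta)     # uses saved inner products
      }
      bj <- soft(g + beta[j], lambda)     # update coordinate j
      d <- bj - beta[j]
      if (d != 0) {
        if (mode == "naive") r <- r - x[, j] * d
        beta[j] <- bj
        delta <- max(delta, abs(d))
      }
    }
    if (delta < threshold) break          # simplified convergence check
  }
  beta                                    # coefficients on the standardized scale
}

## simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- 2 * x[, 1] - x[, 2] + rnorm(100)
b.naive <- cd_lasso_sketch(x, y, lambda = 0.1, mode = "naive")
b.cov   <- cd_lasso_sketch(x, y, lambda = 0.1, mode = "covariance")
print(cbind(naive = b.naive, covariance = b.cov))
```

Both modes target the same solution; they differ only in how the inner product with the residual is obtained, which is where the speed difference arises.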

Value

A list containing the fitted lasso solution path. The list contains:

convgCount

Convergence counter returned by the coordinate-descent routine.

lambdaCount

Number of lambda values in the fitted solution path.

lambda

The sequence of lambda values used.

beta

Matrix of regression coefficients for the lasso solution path. Rows correspond to values in lambda; columns contain the intercept followed by the encoded predictor variables.

xvar

Numeric predictor matrix used in the fit, after any internal encoding of the supplied data.

yvar

Response vector used in the fit.

yHat

Cross-validated fitted values or predictions by lambda and observation. Returned only when cross-validation output is available, such as when nfolds is greater than 1.

lambda.min.indx

Index of the lambda value with minimum cross-validation error. Returned only when cross-validation output is available.

lambda.1se.min.indx

Index of the minimum lambda value within one standard error of the minimum cross-validation error. Returned only when cross-validation output is available.

lambda.1se.max.indx

Index of the maximum lambda value within one standard error of the minimum cross-validation error. Returned only when cross-validation output is available.

lambda.cvm

Mean cross-validation error for each lambda. Returned only when cross-validation output is available.

lambda.cvsd

Cross-validation standard-error values for each lambda. Returned only when cross-validation output is available.

ctime.internal

Timing information reported by the native coordinate-descent routine.

ctime.external

Elapsed R-side timing, computed from proc.time().
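
How the cross-validation indices relate to lambda.cvm and lambda.cvsd can be illustrated with the usual one-standard-error rule. The sketch below uses made-up numbers and assumes the cutoff is the minimum mean error plus its standard error; whether cdlasso computes the indices in exactly this way is an assumption.

```r
## Made-up cross-validation summaries, for illustration only
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)   # decreasing lambda path
cvm    <- c(9.0, 6.0, 5.2, 5.0, 5.1)        # mean CV error per lambda
cvsd   <- c(0.5, 0.4, 0.3, 0.3, 0.3)        # CV standard error per lambda

## index of the lambda with minimum CV error
min.indx <- which.min(cvm)

## lambdas whose CV error is within one standard error of the minimum
cutoff <- cvm[min.indx] + cvsd[min.indx]
within <- which(cvm <= cutoff)

## largest and smallest lambda satisfying the one-standard-error rule
indx.1se.max <- within[which.max(lambda[within])]
indx.1se.min <- within[which.min(lambda[within])]

print(c(min = min.indx, se.max = indx.1se.max, se.min = indx.1se.min))
```

The largest qualifying lambda gives the sparsest model whose error is statistically close to the minimum, which is why the example at the end of this page uses lambda.1se.max.indx.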

Author

Hemant Ishwaran and Udaya B. Kogalur

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22.

Examples

# \donttest{

## ------------------------------------------------------------
## regression example: boston housing
## ------------------------------------------------------------

if (requireNamespace("mlbench", quietly = TRUE)) {
## load the data
data(BostonHousing, package = "mlbench")

## 10-fold cross-validation
o <- cdlasso(medv ~., BostonHousing, nfolds=10)

## lasso solution
bhat <- data.frame(bhat.min=o$beta[o$lambda.min.indx,],
                   bhat.1se=o$beta[o$lambda.1se.max.indx[1],])
print(bhat)

## compare to results from glmnet
if (library("glmnet", logical.return = TRUE)) {

  oo <- cv.glmnet(data.matrix(o$xvar), o$yvar, nfolds=10)
  bhat2 <- cbind(data.matrix(coef(oo, s=oo$lambda.min)),
                 data.matrix(coef(oo, s=oo$lambda.1se)))
  rownames(bhat2) <- rownames(bhat)
  print(bhat2)

}
}
# }