Fit lasso for regression using coordinate descent.

cdlasso(formula,
      data,
      nfolds = 0,
      weights = NULL,
      nlambda = 100,
      lambda.min.ratio = ifelse(n < n.xvar, 0.01, 1e-04),
      lambda = NULL,
      threshold = 1e-7,
      eps = .0001,
      maxit = 5000,
      efficiency = ifelse(n.xvar < 500, "covariance", "naive"),
      seed = NULL,
      do.trace = FALSE)

Arguments

formula

Formula describing the model to be fit.

data

Data frame containing response and features.

nfolds

Number of cross-validation folds. The default, 0, performs no cross-validation.

weights

Observation weights. Default is 1 for each observation.

nlambda

The number of lambda values; default is 100.

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the smallest value of lambda for which all coefficients are zero. The default is 0.01 if the number of observations n is less than the number of features n.xvar, and 1e-04 otherwise. A very small value of lambda.min.ratio leads to a saturated fit when n is less than n.xvar.

lambda

Lasso lambda sequence. Default is an internally selected sequence based on nlambda and lambda.min.ratio. For experts only.

threshold

Convergence threshold for coordinate descent. Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than threshold times the null deviance.

eps

Multiplication factor applied to lambda.min.ratio used to define the smallest lambda value.

maxit

Maximum number of passes over the data for all lambda values.

efficiency

Selects the update mode, "covariance" or "naive". The default is "covariance" when the number of variables n.xvar is less than 500 and "naive" otherwise. "covariance" saves all inner products and can be significantly faster in certain settings than "naive", which loops over all n observations each time an inner product is formed.

seed

Negative integer specifying seed for the random number generator.

do.trace

Number of seconds between updates to the user on approximate time to completion.
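
The roles of nlambda and lambda.min.ratio can be illustrated with a minimal base R sketch. It follows the convention of Friedman et al. (2010): a log-spaced sequence running from lambda.max down to lambda.min.ratio times lambda.max. The helper lambda_path is hypothetical and not part of the package, and the sequence cdlasso constructs internally may differ, for example in how features are standardized.

```r
## Hypothetical helper (not part of the package): a glmnet-style lambda path.
lambda_path <- function(x, y, nlambda = 100, lambda.min.ratio = 1e-04) {
  n <- nrow(x)
  ## center the columns, then scale them using the 1/n variance convention
  xc <- scale(x, center = TRUE, scale = FALSE)
  xs <- sweep(xc, 2, sqrt(colSums(xc^2) / n), "/")
  yc <- y - mean(y)
  ## smallest lambda at which every coefficient is zero
  lambda.max <- max(abs(crossprod(xs, yc))) / n
  ## log-spaced values from lambda.max down to lambda.min.ratio * lambda.max
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

## simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(50 * 5), 50, 5)
y <- x[, 1] - 2 * x[, 2] + rnorm(50)
lam <- lambda_path(x, y, nlambda = 10, lambda.min.ratio = 0.01)
print(lam)
```

The sequence is decreasing, so the first value gives the null model and the last value the least penalized fit.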

Details

Use coordinate descent to fit lasso to a regression model.
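
As a rough illustration of the idea, the sketch below runs cyclic coordinate descent with soft-thresholding for a single value of lambda, in the spirit of the naive update in which each inner product is a sum over all n observations. The functions soft_threshold and cd_lasso_one are hypothetical and are not the package's implementation. The sketch assumes centered features scaled so that mean(x_j^2) = 1, a centered response, no observation weights, and a simplified stopping rule standing in for the one described under threshold.

```r
## Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

## Hypothetical helper (not the package's code): naive coordinate descent
## for one lambda. Assumes x is centered with mean(x_j^2) = 1 and y is
## centered, so no intercept is needed.
cd_lasso_one <- function(x, y, lambda, beta = rep(0, ncol(x)),
                         threshold = 1e-7, maxit = 5000) {
  n <- nrow(x)
  null_dev <- mean(y^2)                    # null deviance per observation
  r <- drop(y - x %*% beta)                # current residual
  for (it in seq_len(maxit)) {
    max_change <- 0
    for (j in seq_len(ncol(x))) {
      ## inner product of x_j with the partial residual (O(n) work)
      zj <- sum(x[, j] * r) / n + beta[j]
      bj <- soft_threshold(zj, lambda)
      if (bj != beta[j]) {
        r <- r - x[, j] * (bj - beta[j])   # keep the residual in sync
        max_change <- max(max_change, (bj - beta[j])^2)
        beta[j] <- bj
      }
    }
    ## simplified stopping rule (an assumption of this sketch)
    if (max_change < threshold * null_dev) break
  }
  beta
}

## simulated, standardized data for illustration only
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 4), n, 4)
x <- scale(x, center = TRUE, scale = FALSE)
x <- sweep(x, 2, sqrt(colMeans(x^2)), "/")
y <- 3 * x[, 1] + rnorm(n)
y <- y - mean(y)

lambda.max <- max(abs(crossprod(x, y))) / n
b0 <- cd_lasso_one(x, y, lambda = 1.01 * lambda.max)  # above lambda.max
b1 <- cd_lasso_one(x, y, lambda = 0.1 * lambda.max)   # weaker penalty
print(rbind(b0, b1))
```

Above lambda.max every coefficient stays at zero. With the weaker penalty the first feature, which drives the simulated response, enters the model.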

Value

Lasso solution path with the following values.

beta

Matrix of coefficient estimates along the lasso solution path, with one row per lambda value.

lambda

The sequence of lambda values used.

lambda.min.indx

Index for value of lambda that gives the minimum cross-validation error. Only applies if nfolds is greater than 1.

lambda.1se.min.indx

Index for the minimum lambda value within 1 standard error of the minimum cross-validation error. This is the more liberal choice, since a smaller lambda shrinks less and keeps more variables. Only applies if nfolds is greater than 1.

lambda.1se.max.indx

Index for the maximum lambda value within 1 standard error of the minimum cross-validation error. This is the more conservative choice, since a larger lambda shrinks more and keeps fewer variables. Only applies if nfolds is greater than 1.
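
The three indices can be pictured with a small base R sketch of the usual one-standard-error rule. The vectors cvm and cvsd and the numbers in them are made up for illustration. They are not returned by cdlasso, and the package's internal bookkeeping may differ.

```r
## Hypothetical cross-validation summaries (illustrative numbers only):
## lambda is decreasing, cvm is the mean CV error, cvsd its standard error.
lambda <- c(1.00, 0.50, 0.25, 0.12, 0.06, 0.03)
cvm    <- c(9.0, 6.0, 4.2, 4.0, 4.1, 4.5)
cvsd   <- c(0.5, 0.4, 0.3, 0.3, 0.3, 0.3)

## index of the lambda with minimum CV error
min.indx <- which.min(cvm)

## all lambda values whose CV error is within 1 SE of the minimum
cutoff <- cvm[min.indx] + cvsd[min.indx]
within <- which(cvm <= cutoff)

## largest lambda within 1 SE: sparser, more conservative model
max.1se.indx <- within[which.max(lambda[within])]
## smallest lambda within 1 SE: denser, more liberal model
min.1se.indx <- within[which.min(lambda[within])]

print(c(min = min.indx, max.1se = max.1se.indx, min.1se = min.1se.indx))
```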

Author

Hemant Ishwaran and Udaya B. Kogalur

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22.

Examples

# \donttest{

## ------------------------------------------------------------
## regression example: boston housing
## ------------------------------------------------------------

## load the data
data(BostonHousing, package = "mlbench")

## 10-fold cross-validation
o <- cdlasso(medv ~., BostonHousing, nfolds=10)

## lasso solution
bhat <- data.frame(bhat.min=o$beta[o$lambda.min.indx,],
                   bhat.1se=o$beta[o$lambda.1se.max.indx[1],])
print(bhat)

## compare to results from glmnet
if (library("glmnet", logical.return = TRUE)) {

  oo <- cv.glmnet(data.matrix(o$xvar), o$yvar, nfolds=10)
  bhat2 <- cbind(data.matrix(coef(oo, s=oo$lambda.min)),
                 data.matrix(coef(oo, s=oo$lambda.1se)))
  rownames(bhat2) <- rownames(bhat)
  print(bhat2)

}

# }