Understanding glmnet a little better (1)
It has been a while since my last update (I say that every time).
Background
I was reading "スパース回帰分析とパターン認識" (Sparse Regression Analysis and Pattern Recognition) from the データサイエンス入門シリーズ (Introduction to Data Science series) and found it very interesting, so, as usual, I decided to look inside glmnet.
Note that I have never used Lasso or Ridge at work, so my understanding may be wrong in places. Please bear that in mind.

スパース回帰分析とパターン認識 (データサイエンス入門シリーズ)
- Authors: 梅津 佑太, 西井 龍映, 上田 勇祐
- Release date: 2020/02/28
- Format: paperback (softcover)
This is the book. It is a good one.
Running glmnet
As with GAM last time, let's first check what kind of result glmnet produces. I run Code 1.2 from p. 12 of "スパース回帰分析とパターン認識" (hereafter "the textbook"), slightly modified.
The textbook's code is available for download.
My environment is as follows.
    > sessionInfo()
    R version 3.6.0 (2019-04-26)
    Platform: x86_64-apple-darwin15.6.0 (64-bit)
    Running under: macOS Mojave 10.14.6

    Matrix products: default
    BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

    locale:
    [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] compiler_3.6.0  tools_3.6.0     grid_3.6.0      lattice_0.20-38
    library(glmnet)
    library(plotmo)

    x <- scale(LifeCycleSavings[, 2:5])
    y <- LifeCycleSavings[, 1] - mean(LifeCycleSavings[, 1])

    lasso <- glmnet(x, y, family = "gaussian", alpha = 1)  # alpha = 1 gives lasso
    ridge <- glmnet(x, y, family = "gaussian", alpha = 0)  # alpha = 0 gives ridge

    ## set the output directory as appropriate
    png("./Image/glmnet_dive_01_01.png", width = 600, height = 400)
    plot_glmnet(lasso, xvar = "lambda", label = TRUE)
    dev.off()

    png("./Image/glmnet_dive_01_02.png", width = 600, height = 400)
    plot_glmnet(ridge, xvar = "lambda", label = TRUE)
    dev.off()


For a detailed interpretation of the results, please see the textbook. In short, glmnet adds a penalty to the objective function that grows with the size of the regression coefficients, so the fit shrinks the coefficients toward 0.
As the plots show, varying the strength of the penalty lets you see how the coefficient of each variable changes.
In these plots the penalty gets stronger from left to right, and for both Lasso and Ridge the coefficients become smaller (shrink) toward 0.
With Lasso the coefficients reach exactly 0, whereas with Ridge they stay tiny but nonzero to the very end (the Degrees of Freedom shown at the top of the plot stays at 4). A method that, like Lasso, can estimate some coefficients as exactly 0 is called sparse estimation.
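The difference between "exactly 0" and "small but nonzero" can be seen in a toy setting. The following is a minimal sketch, assuming orthonormal predictors (an assumption of this example, not something glmnet requires), in which the lasso solution is the soft-thresholding of the least-squares coefficient and the ridge solution is a proportional rescaling. The function names and the input values are hypothetical.

```r
# Soft-thresholding: closed-form lasso solution for one coefficient
# when the predictors are orthonormal (assumption of this sketch).
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Ridge under the same assumption: every coefficient is scaled by 1 / (1 + lambda).
ridge_shrink <- function(z, lambda) z / (1 + lambda)

z <- c(-2.0, -0.4, 0.1, 1.5)   # hypothetical least-squares coefficients
lasso_coef <- soft_threshold(z, 0.5)
ridge_coef <- ridge_shrink(z, 0.5)

print(lasso_coef)  # small coefficients are set to exactly 0
print(ridge_coef)  # all coefficients shrink but none becomes 0
```

Coefficients whose absolute value is below the threshold are cut to exactly 0 by the lasso rule, while the ridge rule only divides by a constant, which is why the ridge paths never touch 0.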
Implementation of glmnet
Now let's see how the function glmnet is implemented.
As usual, we first look at the whole function to get an overview.
    function (x, y, family = c("gaussian", "binomial", "poisson",
        "multinomial", "cox", "mgaussian"), weights, offset = NULL,
        alpha = 1, nlambda = 100,
        lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
        lambda = NULL, standardize = TRUE, intercept = TRUE,
        thresh = 1e-07, dfmax = nvars + 1,
        pmax = min(dfmax * 2 + 20, nvars), exclude,
        penalty.factor = rep(1, nvars), lower.limits = -Inf,
        upper.limits = Inf, maxit = 1e+05,
        type.gaussian = ifelse(nvars < 500, "covariance", "naive"),
        type.logistic = c("Newton", "modified.Newton"),
        standardize.response = FALSE,
        type.multinomial = c("ungrouped", "grouped"),
        relax = FALSE, trace.it = 0, ...)
    {
        ### 1. Parameter setup, preprocessing, error checks
        family = match.arg(family)
        if (alpha > 1) {
            warning("alpha >1; set to 1")
            alpha = 1
        }
        if (alpha < 0) {
            warning("alpha<0; set to 0")
            alpha = 0
        }
        alpha = as.double(alpha)
        this.call = match.call()
        nlam = as.integer(nlambda)
        y = drop(y)
        np = dim(x)
        if (is.null(np) | (np[2] <= 1))
            stop("x should be a matrix with 2 or more columns")
        nobs = as.integer(np[1])
        if (missing(weights))
            weights = rep(1, nobs)
        else if (length(weights) != nobs)
            stop(paste("number of elements in weights (", length(weights),
                ") not equal to the number of rows of x (", nobs,
                ")", sep = ""))
        nvars = as.integer(np[2])
        dimy = dim(y)
        nrowy = ifelse(is.null(dimy), length(y), dimy[1])
        if (nrowy != nobs)
            stop(paste("number of observations in y (", nrowy,
                ") not equal to the number of rows of x (", nobs,
                ")", sep = ""))
        vnames = colnames(x)
        if (is.null(vnames))
            vnames = paste("V", seq(nvars), sep = "")
        ne = as.integer(dfmax)
        nx = as.integer(pmax)
        if (missing(exclude))
            exclude = integer(0)
        if (any(penalty.factor == Inf)) {
            exclude = c(exclude, seq(nvars)[penalty.factor == Inf])
            exclude = sort(unique(exclude))
        }
        if (length(exclude) > 0) {
            jd = match(exclude, seq(nvars), 0)
            if (!all(jd > 0))
                stop("Some excluded variables out of range")
            penalty.factor[jd] = 1
            jd = as.integer(c(length(jd), jd))
        }
        else jd = as.integer(0)
        vp = as.double(penalty.factor)
        internal.parms = glmnet.control()
        if (internal.parms$itrace)
            trace.it = 1
        else {
            if (trace.it) {
                glmnet.control(itrace = 1)
                on.exit(glmnet.control(itrace = 0))
            }
        }
        if (any(lower.limits > 0)) {
            stop("Lower limits should be non-positive")
        }
        if (any(upper.limits < 0)) {
            stop("Upper limits should be non-negative")
        }
        lower.limits[lower.limits == -Inf] = -internal.parms$big
        upper.limits[upper.limits == Inf] = internal.parms$big
        if (length(lower.limits) < nvars) {
            if (length(lower.limits) == 1)
                lower.limits = rep(lower.limits, nvars)
            else stop("Require length 1 or nvars lower.limits")
        }
        else lower.limits = lower.limits[seq(nvars)]
        if (length(upper.limits) < nvars) {
            if (length(upper.limits) == 1)
                upper.limits = rep(upper.limits, nvars)
            else stop("Require length 1 or nvars upper.limits")
        }
        else upper.limits = upper.limits[seq(nvars)]
        cl = rbind(lower.limits, upper.limits)
        if (any(cl == 0)) {
            fdev = glmnet.control()$fdev
            if (fdev != 0) {
                glmnet.control(fdev = 0)
                on.exit(glmnet.control(fdev = fdev))
            }
        }
        storage.mode(cl) = "double"
        isd = as.integer(standardize)
        intr = as.integer(intercept)
        if (!missing(intercept) && family == "cox")
            warning("Cox model has no intercept")
        jsd = as.integer(standardize.response)
        thresh = as.double(thresh)
        if (is.null(lambda)) {
            if (lambda.min.ratio >= 1)
                stop("lambda.min.ratio should be less than 1")
            flmin = as.double(lambda.min.ratio)
            ulam = double(1)
        }
        else {
            flmin = as.double(1)
            if (any(lambda < 0))
                stop("lambdas should be non-negative")
            ulam = as.double(rev(sort(lambda)))
            nlam = as.integer(length(lambda))
        }
        is.sparse = FALSE
        ix = jx = NULL
        if (inherits(x, "sparseMatrix")) {
            is.sparse = TRUE
            x = as(x, "CsparseMatrix")
            x = as(x, "dgCMatrix")
            ix = as.integer(x@p + 1)
            jx = as.integer(x@i + 1)
            x = as.double(x@x)
        }
        if (trace.it) {
            if (relax)
                cat("Training Fit\n")
            pb <- createPB(min = 0, max = nlam, initial = 0, style = 3)
        }
        kopt = switch(match.arg(type.logistic), Newton = 0,
            modified.Newton = 1)
        if (family == "multinomial") {
            type.multinomial = match.arg(type.multinomial)
            if (type.multinomial == "grouped")
                kopt = 2
        }
        kopt = as.integer(kopt)

        ### 2. Fitting
        fit = switch(family,
            gaussian = elnet(x, is.sparse, ix, jx, y, weights, offset,
                type.gaussian, alpha, nobs, nvars, jd, vp, cl, ne, nx,
                nlam, flmin, ulam, thresh, isd, intr, vnames, maxit),
            poisson = fishnet(x, is.sparse, ix, jx, y, weights, offset,
                alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin,
                ulam, thresh, isd, intr, vnames, maxit),
            binomial = lognet(x, is.sparse, ix, jx, y, weights, offset,
                alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin,
                ulam, thresh, isd, intr, vnames, maxit, kopt, family),
            multinomial = lognet(x, is.sparse, ix, jx, y, weights, offset,
                alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin,
                ulam, thresh, isd, intr, vnames, maxit, kopt, family),
            cox = coxnet(x, is.sparse, ix, jx, y, weights, offset,
                alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin,
                ulam, thresh, isd, vnames, maxit),
            mgaussian = mrelnet(x, is.sparse, ix, jx, y, weights, offset,
                alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin,
                ulam, thresh, isd, jsd, intr, vnames, maxit))
        if (trace.it) {
            utils::setTxtProgressBar(pb, nlam)
            close(pb)
        }

        ### 3. Postprocessing
        if (is.null(lambda))
            fit$lambda = fix.lam(fit$lambda)
        fit$call = this.call
        fit$nobs = nobs
        class(fit) = c(class(fit), "glmnet")
        if (relax)
            relax.glmnet(fit, x = x, y = y, weights = weights,
                offset = offset, lower.limits = lower.limits,
                upper.limits = upper.limits, check.args = FALSE, ...)
        else fit
    }
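As shown above, glmnet proceeds in the following steps:
- parameter setup, preprocessing, and error checks
- fitting
- postprocessing
This is the same structure as glm and gam, which I looked at in earlier posts.
Let's go through each step in detail.
1. Parameter setup, preprocessing, and error checks
We start with the part that sets parameters and preprocesses the inputs. The first thing glmnet does is check that the family argument is valid.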
    ## check that the given family is an acceptable argument
    family = match.arg(family)
The families available in glmnet differ from those in glm: Gamma, inverse.gaussian and the quasi- families cannot be used, but multinomial, cox and mgaussian can.
Here multinomial appears to mean the multinomial distribution and mgaussian the multivariate normal distribution.
The family check uses the match.arg function.
Its behavior is a little hard to grasp; the linked blog post is a helpful reference.
Next, alpha is checked:
    ## alpha
    ### parameter that sets how the penalty is split between Lasso and Ridge
    ### the glmnet penalty is (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
    ### alpha lies in [0, 1]; 1 corresponds to Lasso and 0 to Ridge
    if (alpha > 1) {
      warning("alpha >1; set to 1")
      alpha = 1
    }
    if (alpha < 0) {
      warning("alpha<0; set to 0")
      alpha = 0
    }
    alpha = as.double(alpha)
In glmnet, alpha controls the share of the penalty placed on the L1 norm and on the L2 norm of the regression coefficients.
More concretely, glmnet defines the penalty term as follows (from p. 19 of https://cran.r-project.org/web/packages/glmnet/glmnet.pdf):
$$(1-\alpha)\frac{1}{2}||\beta||^2_2 + \alpha||\beta||_1$$
The code at the top of this post used alpha = 1 or alpha = 0. From the expression above, alpha = 1 removes the penalty on the L2 norm and leaves only the L1 norm (Lasso), and conversely alpha = 0 removes the penalty on the L1 norm and leaves the L2 norm (Ridge).
Setting alpha inside (0, 1) blends the two in the corresponding proportions.
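To make the role of alpha concrete, here is a small sketch that evaluates the penalty term directly. The function name enet_penalty and the coefficient vector are hypothetical and only for illustration; this is not how glmnet computes the penalty internally.

```r
# Elastic-net penalty as written in the glmnet manual (without the lambda factor):
#   (1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1
enet_penalty <- function(beta, alpha) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(-1.5, 0, 2)                  # hypothetical coefficient vector

p_lasso <- enet_penalty(beta, 1)       # only the L1 norm remains
p_ridge <- enet_penalty(beta, 0)       # only half the squared L2 norm remains
p_mix   <- enet_penalty(beta, 0.5)     # an even blend of the two

print(c(lasso = p_lasso, ridge = p_ridge, mix = p_mix))
```

With alpha = 0.5 the result is the average of the two extreme cases, which is what "blending in the corresponding proportions" means here.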
I could not work out why the penalty on the L2 norm carries a factor of 1/2.
The paper cited in the glmnet help already defines it as $(1-\alpha)1/2||\beta||^2_2$.
scikit-learn likewise multiplies the L2 norm by 0.5 (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).
If anyone knows the reason, please tell me.
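One common explanation, offered here as a plausible reading rather than a statement of the authors' intent, is that the 1/2 is a convenience that cancels the factor 2 produced by differentiating the square. A sketch for the gaussian case, assuming each column of x is standardized so that $\frac{1}{N}\sum_i x_{ij}^2 = 1$, with $r_i^{(j)}$ the partial residual that leaves out variable $j$ and $S(z,\gamma)=\mathrm{sign}(z)(|z|-\gamma)_+$ the soft-thresholding operator:

```latex
% Objective (gaussian family)
\min_{\beta_0,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
  +\lambda\Bigl[(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr]

% The ridge part differentiates without a stray factor of 2
\frac{\partial}{\partial\beta_j}\,(1-\alpha)\tfrac{1}{2}\beta_j^{2}=(1-\alpha)\beta_j

% which gives the coordinate-wise update
\beta_j\leftarrow
  \frac{S\!\bigl(\tfrac{1}{N}\sum_{i}x_{ij}\,r_i^{(j)},\;\lambda\alpha\bigr)}
       {1+\lambda(1-\alpha)}
```

Without the 1/2 the denominator would read $1+2\lambda(1-\alpha)$, so the factor only rescales how $\lambda$ and $\alpha$ are interpreted; it does not change the family of solutions.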
Next, match.call() is used to rewrite the call with its arguments in their canonical (fully named) form:

    ## match.call
    this.call = match.call()

That alone probably does not say much, so let's check with the following example:

    myfun <- function(abc, def, ghi) {
      return(abc + 2*def + 3*ghi)
    }

This defines a function that takes the arguments abc, def and ghi.
In R, when no argument names are given, the values are assigned in order:

    > myfun(1, 2, 3)
    [1] 14

When only some arguments are named, the named ones are assigned as specified and the rest appear to be assigned in order.

    > myfun(def = 3, 4, 5)
    [1] 25

Argument names can also be abbreviated as long as the abbreviation is unambiguous:

    > myfun(d = 3, 4, 5)
    [1] 25

On the other hand, the following call fails because two arguments start with g, so the name cannot be resolved uniquely.

    > myfun2 <- function(abc, def, ghi, gjk) {
    +   return(abc + 2*def + 3*ghi + 4*gjk)
    + }
    > myfun2(g = 3, 4, 5, 6)
    Error in myfun2(g = 3, 4, 5, 6) : 
      argument 1 matches multiple formal arguments

So what happens when the call is passed through match.call?

    > match.call(myfun, call("myfun", 1, def = 3, ghi = 5))
    myfun(abc = 1, def = 3, ghi = 5)

As you can see, it returns which value was assigned to which argument. Handy.
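Inside glmnet, match.call() is called with no arguments, which captures the call of the enclosing function itself. A minimal sketch of that pattern, using a hypothetical function record_call modeled on the myfun example:

```r
# Store the matched call alongside the result, as glmnet does with fit$call.
record_call <- function(abc, def, ghi) {
  this.call <- match.call()   # the call to record_call with full argument names
  list(value = abc + 2 * def + 3 * ghi, call = this.call)
}

# Abbreviated names (d, g) are expanded to def and ghi in the stored call.
res <- record_call(1, d = 2, g = 3)
print(res$value)
print(res$call)
```

Because the stored call uses full argument names, it can be printed or re-evaluated later without depending on how the user abbreviated the arguments.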
Next comes nlambda.
This is not $\lambda$ itself (the strength of the penalty) but the number of $\lambda$ values to try (number of lambda); the default is 100.

    ## nlambda
    nlam = as.integer(nlambda)

From here on, y, x and weights are checked:

    ## drop
    y = drop(y)
    ## x
    ### x must have at least two columns, so single-predictor regression does not seem to be possible
    np = dim(x)
    if (is.null(np) | (np[2] <= 1)) 
      stop("x should be a matrix with 2 or more columns")
    ### number of records in x
    nobs = as.integer(np[1])
    ### weights
    ### if missing, every weight is set to 1; error if the length of weights differs from nobs
    if (missing(weights)) 
      weights = rep(1, nobs)
    else if (length(weights) != nobs) 
      stop(paste("number of elements in weights (", length(weights), 
                 ") not equal to the number of rows of x (", nobs, 
                 ")", sep = ""))
    ### number of variables
    nvars = as.integer(np[2])
    ## y
    dimy = dim(y)
    ### number of records in y
    nrowy = ifelse(is.null(dimy), length(y), dimy[1])
    ### error if y and x have different numbers of records
    if (nrowy != nobs) 
      stop(paste("number of observations in y (", nrowy, ") not equal to the number of rows of x (", 
                 nobs, ")", sep = ""))
    ## variable names
    vnames = colnames(x)
    if (is.null(vnames)) 
      vnames = paste("V", seq(nvars), sep = "")

The drop applied to y removes redundant dimensions, that is, dimensions of length 1.
After that, an error is raised if the number of rows of x does not match weights or y.
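The effect of drop and of the row-count logic can be checked in base R. This is a small sketch; count_rows is a hypothetical helper that mirrors the nrowy line above with an ordinary if/else.

```r
# A one-column matrix has dim c(3, 1); drop() removes the length-1 dimension.
y_mat <- matrix(c(1.2, 3.4, 5.6), ncol = 1)
y_vec <- drop(y_mat)

print(dim(y_mat))   # two dimensions
print(dim(y_vec))   # NULL: y_vec is now a plain vector

# Same idea as glmnet's nrowy: use length() for vectors, the first dim otherwise.
count_rows <- function(y) {
  dimy <- dim(y)
  if (is.null(dimy)) length(y) else dimy[1]
}

n_vec <- count_rows(y_vec)               # vector response
n_mat <- count_rows(matrix(0, 5, 2))     # matrix response (e.g. mgaussian)
```

A genuinely multi-column y is left untouched by drop, which is why the code still has to handle both the vector and the matrix case.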
The next part sets limits on the variables included in the model and on the variables allowed to be nonzero (I am not fully sure about nx (= pmax), so I quote the help text):

    ## degrees of freedom
    ### upper limit on the number of variables in the model
    ### dfmax = nvars + 1
    ne = as.integer(dfmax)
    ### upper limit on the number of variables that are ever nonzero (?)
    ### Limit the maximum number of variables ever to be nonzero
    ### pmax = min(dfmax * 2 + 20, nvars)
    nx = as.integer(pmax)
    ### variables to exclude
    if (missing(exclude)) 
      exclude = integer(0)

Next, penalty.factor is handled; it allows a different penalty for each variable.
Because this value multiplies lambda, setting penalty.factor = 0 for a particular variable removes its penalty (so the variable is always kept in the model):

    ## apply a different penalty to each variable
    ### the default is 1
    ### variables with Inf are treated as excluded
    if (any(penalty.factor == Inf)) {
      exclude = c(exclude, seq(nvars)[penalty.factor == Inf])
      exclude = sort(unique(exclude))
    }
    if (length(exclude) > 0) {
      jd = match(exclude, seq(nvars), 0)
      if (!all(jd > 0)) 
        stop("Some excluded variables out of range")
      penalty.factor[jd] = 1
      jd = as.integer(c(length(jd), jd))
    } else jd = as.integer(0)
    vp = as.double(penalty.factor)

Let's try this out.
Taking the code from the top of the post, I set lambda to an arbitrary value.

    x <- scale(LifeCycleSavings[, 2:5])
    y <- LifeCycleSavings[, 1] - mean(LifeCycleSavings[, 1])

    > coef(glmnet(x, y, family = "gaussian", alpha = 1, lambda = 0.3))
    5 x 1 sparse Matrix of class "dgCMatrix"
                           s0
    (Intercept)  1.182354e-15
    pop15       -1.691002e+00
    pop75        .           
    dpi          .           
    ddpi         9.816514e-01

Here the second and third variables, pop75 and dpi, were estimated as 0.
If we now set penalty.factor to 0 for these variables,

    > coef(glmnet(x, y, family = "gaussian", alpha = 1, lambda = 0.3, 
    +             penalty.factor = c(1, 0, 0, 1)))
    5 x 1 sparse Matrix of class "dgCMatrix"
                           s0
    (Intercept)  9.523943e-16
    pop15       -7.827680e-01
    pop75        8.127991e-01
    dpi         -1.560908e-01
    ddpi         6.812498e-01

they now get nonzero estimates.
Conversely, if we increase penalty.factor for pop15,

    > coef(glmnet(x, y, family = "gaussian", alpha = 1, lambda = 0.3, 
    +             penalty.factor = c(2, 0, 0, 1)))
    5 x 1 sparse Matrix of class "dgCMatrix"
                           s0
    (Intercept)  7.266786e-16
    pop15        .           
    pop75        1.374655e+00
    dpi          2.586151e-02
    ddpi         9.300500e-01

it is dropped from the model.
Furthermore, with penalty.factor = Inf the variable is treated as an excluded variable.
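One detail worth keeping in mind when reading these numbers: the glmnet help notes that the penalty factors are internally rescaled to sum to nvars. The sketch below only reproduces that arithmetic in base R; rescale_pf is a hypothetical helper and the rescaling itself happens inside glmnet's compiled code, so treat this as an illustration of the documented behavior rather than the actual implementation.

```r
# Rescale penalty factors so that they sum to the number of variables,
# as described in the glmnet help for penalty.factor.
rescale_pf <- function(pf) pf * length(pf) / sum(pf)

pf_a <- rescale_pf(c(1, 0, 0, 1))   # second example above
pf_b <- rescale_pf(c(2, 0, 0, 1))   # third example above

lambda <- 0.3
print(lambda * pf_a)   # effective per-variable penalty in the second example
print(lambda * pf_b)   # effective per-variable penalty in the third example
```

Under this reading, moving from c(1, 0, 0, 1) to c(2, 0, 0, 1) raises the effective penalty on pop15 while lowering it on ddpi, which would be consistent with the ddpi coefficient growing between the two fits.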
Next, the parameters held by glmnet.control are read in.

    ## parameters that glmnet keeps internally as defaults
    internal.parms = glmnet.control()
    ### show a progress bar?
    if (internal.parms$itrace) 
      trace.it = 1
    else {
      if (trace.it) {
        glmnet.control(itrace = 1)
        on.exit(glmnet.control(itrace = 0))
      }
    }
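The pair glmnet.control(itrace = 1) and on.exit(glmnet.control(itrace = 0)) is the usual R idiom for changing a global setting temporarily and restoring it when the function exits, even on error. A minimal base-R sketch of the same idiom, using options(digits = ...) as a stand-in for glmnet.control; the function with_digits is hypothetical.

```r
# Temporarily change a global option and restore it on exit.
with_digits <- function(value, digits) {
  old <- options(digits = digits)  # options() returns the previous settings
  on.exit(options(old))            # runs when the function exits, even after an error
  format(value)                    # format() picks up the temporary setting
}

before <- getOption("digits")
out    <- with_digits(pi, 3)
after  <- getOption("digits")

print(out)
print(c(before = before, after = after))  # unchanged outside the function
```

The same mechanism is used again further down for fdev, where the previous value is saved first and put back on exit.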
Next, lower and upper limits on the regression coefficients are set. Note that the lower limit can only be non-positive and the upper limit only non-negative.

    ## lower and upper limits
    ### lower.limits accepts only non-positive values
    if (any(lower.limits > 0)) {
      stop("Lower limits should be non-positive")
    }
    ### upper.limits, conversely, accepts only non-negative values
    if (any(upper.limits < 0)) {
      stop("Upper limits should be non-negative")
    }
    ### infinite limits (the default) are replaced with a specific value (9.9e35)
    lower.limits[lower.limits == -Inf] = -internal.parms$big
    upper.limits[upper.limits == Inf] = internal.parms$big
    ### consistency check against nvars:
    ### a scalar lower.limits is applied to all nvars variables;
    ### if lower.limits is longer than nvars, only the first nvars values are used
    if (length(lower.limits) < nvars) {
      if (length(lower.limits) == 1) 
        lower.limits = rep(lower.limits, nvars)
      else stop("Require length 1 or nvars lower.limits")
    } else lower.limits = lower.limits[seq(nvars)]
    ### consistency check against nvars (same as for lower.limits)
    if (length(upper.limits) < nvars) {
      if (length(upper.limits) == 1) 
        upper.limits = rep(upper.limits, nvars)
      else stop("Require length 1 or nvars upper.limits")
    } else upper.limits = upper.limits[seq(nvars)]
    ### lower and upper limits
    ### cl = coefficient limits?
    cl = rbind(lower.limits, upper.limits)
    ### when a lower or upper limit contains 0
    ### (a guard against errors from division by zero?)
    if (any(cl == 0)) {
      ### fdev is the minimum (fractional) change in deviance
      ### minimum fractional change in deviance for stopping path; factory default = 1.0e-5
      fdev = glmnet.control()$fdev
      if (fdev != 0) {
        glmnet.control(fdev = 0)
        on.exit(glmnet.control(fdev = fdev)) # executed when the function exits
      }
    }
    storage.mode(cl) = "double"

Then come the flags for standardization and the intercept. The standardization itself is carried out inside the functions called later, so only the flags are set here.

    ## standardization
    ### standardize and intercept default to TRUE, so these become 1
    isd = as.integer(standardize)
    intr = as.integer(intercept)
    ### warning for Cox regression
    if (!missing(intercept) && family == "cox") 
      warning("Cox model has no intercept")
    ### standardize.response: whether to standardize the response when family = "mgaussian"
    jsd = as.integer(standardize.response)

The threshold used to judge convergence is set.

    ## convergence
    ### convergence threshold for coordinate descent
    thresh = as.double(thresh)
Next come the settings for lambda. I could not fully understand how flmin and ulam are used, so I skip the explanation here.
As the help also says, lambda should normally be given as a vector of candidate values rather than a single value.

    Avoid supplying a single value for lambda (for predictions after CV use predict() instead).

    ## lambda
    ### strength of the penalty
    ### if lambda is not given, flmin is set to lambda.min.ratio and ulam to a length-1 placeholder (0)
    ### lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04)
    if (is.null(lambda)) {
      if (lambda.min.ratio >= 1) 
        stop("lambda.min.ratio should be less than 1")
      flmin = as.double(lambda.min.ratio)
      ulam = double(1)
    }
    ### if lambda is given, flmin (lower limit?) is set to 1 and ulam (upper limit?) to lambda sorted in decreasing order
    else {
      flmin = as.double(1)
      if (any(lambda < 0)) 
        stop("lambdas should be non-negative")
      ulam = as.double(rev(sort(lambda)))
      nlam = as.integer(length(lambda))
    }
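As far as I can tell from the comments in glmnet's Fortran source, flmin doubles as a switch: a value below 1 means "build the lambda sequence automatically, going down to flmin times the largest lambda", while a value of 1 or more means "use the user-supplied values in ulam". The sketch below reconstructs the automatic sequence for the gaussian lasso in base R. It is an approximation under stated assumptions (alpha = 1, unit weights, columns standardized with the 1/N variance), not glmnet's actual code, and the variable names are my own.

```r
x <- scale(LifeCycleSavings[, 2:5])
y <- LifeCycleSavings[, 1] - mean(LifeCycleSavings[, 1])
n <- nrow(x)

# scale() divides by the 1/(N-1) standard deviation; rescale so that
# each column has unit variance under the 1/N convention assumed here.
x_std <- x * sqrt(n / (n - 1))

alpha <- 1
# Smallest lambda at which every coefficient is zero (assumed formula).
lambda_max <- max(abs(crossprod(x_std, y))) / (n * alpha)

nlambda <- 100
lambda_min_ratio <- 1e-4   # the default when nobs >= nvars

# nlambda values, equally spaced on the log scale, from lambda_max downwards.
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_max * lambda_min_ratio),
                      length.out = nlambda))
print(head(lambda_seq))
```

If this reading is right, the first few values should be close to the head of lasso$lambda for the fit at the top of the post. Note that glmnet may stop the path early, so the fitted object can hold fewer than nlambda values.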
Next is the handling of sparse matrices.
If the input X is a sparse matrix, it is converted to the dgCMatrix format.
dgCMatrix is a column-oriented sparse matrix format.

    ## sparse matrix
    ### if x is a Matrix::sparseMatrix, convert it to Matrix::dgCMatrix
    ### dgCMatrix: compressed sparse storage in column order (CSC format)
    is.sparse = FALSE
    ix = jx = NULL
    if (inherits(x, "sparseMatrix")) {
      is.sparse = TRUE
      x = as(x, "CsparseMatrix")
      x = as(x, "dgCMatrix")
      ### x@p holds the cumulative count of nonzero values per column (length: number of columns + 1)
      ### diff(x@p + 1) gives the number of nonzero values in each column
      ix = as.integer(x@p + 1)
      ### x@i holds the row index of each nonzero value (so length(x@i) equals the number of nonzero values)
      ### it is 0-indexed, so the +1 is presumably there to match R's convention
      jx = as.integer(x@i + 1)
      ### x@x is the vector of the nonzero values themselves
      x = as.double(x@x)
    }

While we are here, let's also look at how values are stored in a sparse matrix. Create a sparse matrix as follows:

    set.seed(1234)
    i <- c(1, 5, 18)
    j <- c(4, 13, 19)
    n <- rnorm(3)
    m <- matrix(0, 20, 20)
    for (k in 1:length(n)) {
      m[i[k], j[k]] <- n[k]
    }
    s_m <- as(m, "dgCMatrix")

Here s_m is the matrix m represented as a sparse matrix.
Checking with str() shows that s_m stores the following:
- @ i: row indices of the nonzero elements (note that they are 0-indexed)
- @ p: cumulative count of nonzero elements per column
- @ Dim: dimensions of the matrix
- @ Dimnames: names for each dimension of the matrix
- @ x: values of the nonzero elements
- @ factors: ? (I was not sure about this one)

    > str(s_m)
    Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:3] 0 4 17
      ..@ p       : int [1:21] 0 0 0 0 1 1 1 1 1 1 ...
      ..@ Dim     : int [1:2] 20 20
      ..@ Dimnames:List of 2
      .. ..$ : NULL
      .. ..$ : NULL
      ..@ x       : num [1:3] -1.207 0.277 1.084
      ..@ factors : list()

@ i holds the row index of each nonzero element, so it corresponds to the row indices i used when building m; because it is 0-indexed, each number is smaller by 1.

    > print(i - 1)
    [1]  0  4 17
    > print(s_m@i)
    [1]  0  4 17

The slightly confusing one is @ p. It stores the cumulative number of nonzero elements column by column, so its length follows the number of columns (a leading 0 is added, which makes the length the number of columns + 1).
In this example the matrix has 20 columns, so the length is 21.

    > length(s_m@p)
    [1] 21

Since this vector holds cumulative counts of nonzero elements, taking differences tells us which columns of the original matrix contain nonzero elements.

    > diff(s_m@p)
     [1] 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0

Let's compare with j, which specified the column indices:

    > which(diff(s_m@p) == 1)
    [1]  4 13 19
    > j
    [1]  4 13 19

They match.
In the code that follows, ix receives the cumulative counts of nonzero elements per column (+1) and jx receives the row indices (+1).
x receives the nonzero values of the original sparse matrix as a plain vector, so when the matrix of explanatory variables is sparse, from this point on it is handled as a vector rather than a matrix.
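Putting the three slots together, the original (row, column, value) triplets can be rebuilt from i, p and x alone, which is essentially the information glmnet passes on as jx, ix and x. This is a small sketch using the Matrix package; the values in v are made up for the example, and sparseMatrix() is used here instead of as(m, "dgCMatrix") simply to build the object directly.

```r
library(Matrix)

i <- c(1, 5, 18)            # row indices (1-based)
j <- c(4, 13, 19)           # column indices (1-based)
v <- c(-1.2, 0.3, 1.1)      # made-up nonzero values

# sparseMatrix() returns a column-compressed (dgCMatrix) object by default.
s_m <- sparseMatrix(i = i, j = j, x = v, dims = c(20, 20))

rows <- s_m@i + 1L                                      # back to 1-based rows
cols <- rep(seq_len(ncol(s_m)), times = diff(s_m@p))    # expand counts per column

triplets <- data.frame(row = rows, col = cols, value = s_m@x)
print(triplets)
```

The key step is rep(..., times = diff(p)): each column index is repeated as many times as that column has nonzero entries, which lines the column indices up with the entries of i and x.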
Next is the progress bar (not that I have ever seen it appear).

    ## progress bar
    if (trace.it) {
      if (relax) 
        cat("Training Fit\n")
      pb <- createPB(min = 0, max = nlam, initial = 0, style = 3)
    }

Finally, the optimization method is selected.
When family is binomial or multinomial, the glmnet arguments type.logistic and type.multinomial are evaluated, and the function called in a later stage changes accordingly.
Specifically, they determine which of lognet2m, lognetn and multlognetn is chosen.
I will cover this on another occasion (that is the plan).

    ## optimization method (for logistic and multinomial logistic)
    ### type.logistic = c("Newton", "modified.Newton")
    ### returns 0 for Newton and 1 for modified.Newton
    ### If "Newton" then the exact hessian is used (default), while "modified.Newton" uses an upper-bound on the hessian, and can be faster.
    kopt = switch(match.arg(type.logistic), Newton = 0, modified.Newton = 1)
    ### type.multinomial = c("ungrouped", "grouped")
    ### for multinomial logistic with "grouped", kopt becomes 2
    ### If "grouped" then a grouped lasso penalty is used on the multinomial coefficients for a variable. This ensures they are all in our out together.
    ### The default is "ungrouped"
    if (family == "multinomial") {
      type.multinomial = match.arg(type.multinomial)
      if (type.multinomial == "grouped") 
        kopt = 2
    }
    kopt = as.integer(kopt)

match.arg was used at the start for the family check and appears here again, so let's take the opportunity to confirm how it behaves:

    ### define a function that takes type.logistic as an argument
    myfun <- function(a = "aaa", type.logistic = c("Newton", "modified.Newton")) {
      ### match the argument against the choices in the function definition:
      ### assign 0 for Newton and 1 for modified.Newton
      kopt <- switch(match.arg(type.logistic), Newton = 0, modified.Newton = 1)
      kopt
    }

With this function defined, the calls below return 0, 0 and 1 respectively.

    > myfun()
    [1] 0
    > myfun(type.logistic = "Newton")
    [1] 0
    > myfun(type.logistic = "modified.Newton")
    [1] 1
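Two further behaviors of match.arg are easy to miss: it accepts unambiguous abbreviations of a choice, and it raises an error for a value that is not among the choices. A short sketch with a hypothetical function pick_method ("BFGS" is just an arbitrary invalid value):

```r
pick_method <- function(type.logistic = c("Newton", "modified.Newton")) {
  # With the default vector left untouched, match.arg returns the first choice.
  match.arg(type.logistic)
}

default_choice <- pick_method()        # first element of the choices
partial_choice <- pick_method("mod")   # partial match to "modified.Newton"

# An unknown value is an error; catch it so the script keeps running.
invalid_choice <- tryCatch(pick_method("BFGS"),
                           error = function(e) "error")

print(c(default_choice, partial_choice, invalid_choice))
```

This is why glmnet(x, y, family = "gauss") works, while a misspelled family fails immediately at the first line of the function.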
2. Fitting
That completes the parameter setup and preprocessing, so next is the fitting.
That said, this part only switches the function to call according to family, so let's skip the details for now.

    # fitting
    ## choose the function to call according to family
    fit = switch(family, 
                 ### elnet for gaussian
                 gaussian = elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, 
                                  alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                                  ulam, thresh, isd, intr, vnames, maxit), 
                 ### fishnet for poisson
                 poisson = fishnet(x, is.sparse, ix, jx, y, weights, offset, 
                                   alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                                   ulam, thresh, isd, intr, vnames, maxit), 
                 ### lognet for binomial
                 binomial = lognet(x, is.sparse, ix, jx, y, weights, offset, 
                                   alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                                   ulam, thresh, isd, intr, vnames, maxit, kopt, family), 
                 ### lognet for multinomial as well
                 multinomial = lognet(x, is.sparse, ix, jx, y, weights, offset, 
                                      alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                                      ulam, thresh, isd, intr, vnames, maxit, kopt, family), 
                 ### coxnet for cox
                 cox = coxnet(x, is.sparse, ix, jx, y, weights, offset, 
                              alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                              ulam, thresh, isd, vnames, maxit), 
                 ### mrelnet for mgaussian
                 mgaussian = mrelnet(x, is.sparse, ix, jx, y, weights, offset, 
                                     alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, 
                                     ulam, thresh, isd, jsd, intr, vnames, maxit))
    ## progress bar
    if (trace.it) {
      utils::setTxtProgressBar(pb, nlam)
      close(pb)
    }
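Comparing the arguments passed to each of these functions gives the following table (some of them I could not work out):
| Argument | Description | elnet | fishnet | lognet | coxnet | mrelnet |
|---|---|---|---|---|---|---|
| x | Matrix of explanatory variables | yes | yes | yes | yes | yes |
| is.sparse | Whether x is a sparse matrix | yes | yes | yes | yes | yes |
| ix | Cumulative count of nonzero elements in the sparse matrix | yes | yes | yes | yes | yes |
| jx | Row indices of nonzero elements in the sparse matrix | yes | yes | yes | yes | yes |
| y | Matrix of the response variable | yes | yes | yes | yes | yes |
| weights | Observation weights | yes | yes | yes | yes | yes |
| offset | Offset | yes | yes | yes | yes | yes |
| type.gaussian | 1:covariance, 2:naïve | yes | - | - | - | - |
| alpha | Parameter balancing the weight on L1 and L2 | yes | yes | yes | yes | yes |
| nobs | Number of records | yes | yes | yes | yes | yes |
| nvars | Number of explanatory variables | yes | yes | yes | yes | yes |
| jd | ? (built from exclude in the code above: the count followed by the indices of excluded variables) | yes | yes | yes | yes | yes |
| vp | Penalty weight for each variable (penalty.factor) | yes | yes | yes | yes | yes |
| cl | ? (built in the code above as rbind(lower.limits, upper.limits)) | yes | yes | yes | yes | yes |
| ne | Upper limit on the number of variables in the model. ne = dfmax = nvars + 1 | yes | yes | yes | yes | yes |
| nx | Upper limit on the number of variables ever nonzero? | yes | yes | yes | yes | yes |
| nlam | Number of lambda values | yes | yes | yes | yes | yes |
| flmin | ? | yes | yes | yes | yes | yes |
| ulam | ? | yes | yes | yes | yes | yes |
| thresh | Convergence threshold | yes | yes | yes | yes | yes |
| isd | Whether to standardize | yes | yes | yes | yes | yes |
| jsd | ? (set in the code above from standardize.response) | - | - | - | - | yes |
| intr | Whether to include an intercept | yes | yes | yes | - | yes |
| vnames | Variable names | yes | yes | yes | yes | yes |
| maxit | Upper limit on the number of iterations | yes | yes | yes | yes | yes |
| kopt | Optimization method | - | - | yes | - | - |
| family | family | - | - | yes | - | - |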
3. Postprocessing
Finally, the postprocessing.

    # postprocessing
    ## if lambda was not specified and fit$lambda contains three or more values, replace the first one
    ## glmnet::fix.lam
    ## function (lam) {
    ##   if (length(lam) > 2) {
    ##     llam = log(lam)
    ##     lam[1] = exp(2 * llam[2] - llam[3])
    ##   }
    ##   lam
    ## }
    if (is.null(lambda)) 
      fit$lambda = fix.lam(fit$lambda)
    ## call
    fit$call = this.call
    ## number of records
    fit$nobs = nobs
    ## add glmnet to the class
    class(fit) = c(class(fit), "glmnet")

    # return value
    ## if relax is TRUE, refit the model without penalty for each active set on the solution path
    ## If TRUE then for each active set in the path of solutions, the model is refit without any regularization. See details for more information. 
    ## This argument is new, and users may experience convergence issues with small datasets, especially with non-gaussian families. 
    ## Limiting the value of 'maxp' can alleviate these issues in some cases.
    if (relax) 
      relax.glmnet(fit, x = x, y = y, weights = weights, offset = offset, 
                   lower.limits = lower.limits, upper.limits = upper.limits, check.args = FALSE, 
                   ...)
    else fit
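The fix.lam step makes more sense with numbers. My understanding, which I have not verified against the compiled code, is that when the lambda sequence is generated automatically the first value comes back as a very large placeholder (the 9.9e35 "big" value seen earlier), and fix.lam replaces it by extending the log-linear spacing of the second and third values. A sketch with a copy of the function as quoted above and made-up inputs:

```r
# Copy of glmnet's fix.lam as quoted in the comments above.
fix_lam <- function(lam) {
  if (length(lam) > 2) {
    llam <- log(lam)
    # extrapolate one step back on the log scale: log(l1) = 2*log(l2) - log(l3)
    lam[1] <- exp(2 * llam[2] - llam[3])
  }
  lam
}

# Hypothetical raw sequence: a huge placeholder followed by a geometric sequence.
lam_raw <- c(9.9e35, 1.84, 1.68, 1.53)
lam_fixed <- fix_lam(lam_raw)

print(lam_fixed)   # first entry replaced by 1.84^2 / 1.68
```

On the log scale the replaced value continues the same step as the following two, so the reported sequence looks evenly spaced from its first element.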
The step that stands out in this postprocessing is the relax part.
According to the help, relax does the following:

    If relax=TRUE a duplicate sequence of models is produced, where each active set in the elastic-net path is refit without regularization. The result of this is a matching "glmnet" object which is stored on the original object in a component named "relaxed", and is part of the glmnet output.

So it appears to be an option that takes the variables selected by glmnet and fits the model again without the penalty.
This too is quickest to understand by trying it, so I run the following:

    lasso_02 <- glmnet(x, y, family = "gaussian", relax = T)

Compared with the earlier result (lasso), a component lasso_02$relaxed has been added, and its structure is almost the same as that of lasso.

    > str(lasso)
    List of 12
     $ a0       : Named num [1:68] 6.11e-16 6.71e-16 7.26e-16 7.76e-16 8.22e-16 ...
      ..- attr(*, "names")= chr [1:68] "s0" "s1" "s2" "s3" ...
     $ beta     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      .. ..@ i       : int [1:216] 0 0 0 0 0 3 0 3 0 3 ...
      .. ..@ p       : int [1:69] 0 0 1 2 3 4 6 8 10 12 ...
      .. ..@ Dim     : int [1:2] 4 68
      .. ..@ Dimnames:List of 2
      .. .. ..$ : chr [1:4] "pop15" "pop75" "dpi" "ddpi"
      .. .. ..$ : chr [1:68] "s0" "s1" "s2" "s3" ...
      .. ..@ x       : num [1:216] -0.181 -0.347 -0.497 -0.634 -0.757 ...
      .. ..@ factors : list()
     $ df       : int [1:68] 0 1 1 1 1 2 2 2 2 2 ...
     $ dim      : int [1:2] 4 68
     $ lambda   : num [1:68] 2.02 1.84 1.68 1.53 1.39 ...
     $ dev.ratio: num [1:68] 0 0.0352 0.0645 0.0888 0.1089 ...
     $ nulldev  : num 984
     $ npasses  : int 562
     $ jerr     : int 0
     $ offset   : logi FALSE
     $ call     : language glmnet(x = x, y = y, family = "gaussian", alpha = 1)
     $ nobs     : int 50
     - attr(*, "class")= chr [1:2] "elnet" "glmnet"
    > str(lasso_02$relaxed)
    List of 12
     $ a0       : Named num [1:68] 6.11e-16 1.29e-15 1.29e-15 1.29e-15 1.29e-15 ...
      ..- attr(*, "names")= chr [1:68] "s0" "s1" "s2" "s3" ...
     $ beta     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      .. ..@ i       : int [1:216] 0 0 0 0 0 3 0 3 0 3 ...
      .. ..@ p       : int [1:69] 0 0 1 2 3 4 6 8 10 12 ...
      .. ..@ Dim     : int [1:2] 4 68
      .. ..@ Dimnames:List of 2
      .. .. ..$ : chr [1:4] "pop15" "pop75" "dpi" "ddpi"
      .. .. ..$ : chr [1:68] "s0" "s1" "s2" "s3" ...
      .. ..@ x       : num [1:216] -2.04 -2.04 -2.04 -2.04 -1.98 ...
      .. ..@ factors : list()
     $ df       : int [1:68] 0 1 1 1 1 2 2 2 2 2 ...
     $ dim      : int [1:2] 4 68
     $ lambda   : num [1:68] 2.02 1.84 1.68 1.53 1.39 ...
     $ dev.ratio: num [1:68] 0 0.208 0.208 0.208 0.208 ...
     $ nulldev  : num 984
     $ npasses  : int 562
     $ jerr     : int 0
     $ offset   : logi FALSE
     $ call     : language glmnet(x = x, y = y, family = "gaussian", relax = T)
     $ nobs     : int 50
     - attr(*, "class")= chr [1:2] "elnet" "glmnet"

Looking a little closer at lasso_02$relaxed, beta for example contains the following values.

    > lasso_02$relaxed$beta[, 1:6]
    4 x 6 sparse Matrix of class "dgCMatrix"
          s0        s1        s2        s3        s4        s5
    pop15  . -2.040996 -2.040996 -2.040996 -2.040996 -1.980216
    pop75  .         .         .         .         .         .
    dpi    .         .         .         .         .         .
    ddpi   .         .         .         .         .  1.270865

These are the coefficients obtained by fitting an ordinary linear regression on the variables selected at each step as the penalty weight is varied little by little.
For example, lasso_02$relaxed$beta[, 6] holds the regression coefficients of pop15 and ddpi, the variables selected at that step.
Let's check whether they actually match the result of lm:

    > coef(lm(y ~ x[, c(1, 4)]))
          (Intercept) x[, c(1, 4)]pop15  x[, c(1, 4)]ddpi 
         1.364331e-15     -1.980216e+00      1.270865e+00 

They match.
However, the intercept estimate stored in lasso_02$relaxed$a0 seems to differ slightly:

    > lasso_02$relaxed$a0[6]
             s5 
    1.28119e-15 

I wonder why.
I thought it might be a difference in standardization, but that does not seem to be it, and I could not find the reason.

    lasso_03 <- glmnet(x, y, family = "gaussian", relax = T, standardize = F)

    > lasso_03$relaxed$a0[6]
             s5 
    1.28119e-15 
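One plausible reading of the intercept discrepancy is that both values are simply floating-point noise around 0: y was centered by hand and scale() centers every column of x, so the exact least-squares intercept is 0, and values of order 1e-15 differ only in rounding. A base-R sketch that checks this on the lm side (it does not call glmnet, so it cannot confirm what happens inside relax.glmnet):

```r
x <- scale(LifeCycleSavings[, 2:5])
y <- LifeCycleSavings[, 1] - mean(LifeCycleSavings[, 1])

# Intercept of the same two-variable regression as in the post.
b0_lm <- unname(coef(lm(y ~ x[, c(1, 4)]))[1])

print(b0_lm)                              # a value of the order of machine precision
print(isTRUE(all.equal(b0_lm, 0)))        # numerically indistinguishable from 0

# The two intercepts reported in the post also agree within numerical tolerance.
print(isTRUE(all.equal(1.28119e-15, 1.364331e-15)))
```

Seen this way, the two intercepts agree for practical purposes, and the slope coefficients, which are far from 0, match to the digits shown.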
That is all for the implementation of glmnet().
Next time, let's take a detailed look at elnet, which is called in the fitting step.
Unlike with gam, installing the glmnet library did not give me the source code, so I obtained the Fortran source by following the linked reference.
See you next time.