I want to do variable selection with Stan
Background
One frustration when modeling with Stan is that variable selection is hard. I am personally not a big fan of "mechanical" variable selection using, say, stepwise methods or the Lasso1, but even so, there are times when I am tempted to rely on these techniques to move an analysis along efficiently.
In such cases, if you are using glm, the step function makes variable selection easy, but with Stan it is not so simple. While looking for a good alternative, I found a promising-looking library called {projpred} in Stan's GitHub repository, so in this post I will introduce it and experiment with how accurately it selects variables2.
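For readers unfamiliar with the glm workflow mentioned above, here is a minimal sketch of what base R's step function does (mtcars is used purely as a stand-in dataset; it is not part of this post's analysis):

```r
# AIC-based backward elimination with base R's step().
# Starting from the full model, step() drops predictors one at a
# time while the AIC keeps improving.
fit <- lm(mpg ~ ., data = mtcars)
step(fit, trace = 0)  # trace = 0 suppresses the per-step log
```

The returned object is an lm fit containing only the surviving predictors; this one-liner convenience is exactly what is missing when the model is written in Stan.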
Data preparation
Loading the libraries
In this analysis I will use the {rstanarm} library instead of {rstan}. {rstanarm} lets you Bayesianize a model while keeping syntax much like R's glm; for example, the {rstanarm} vignette introduces a script like the following (slightly modified)3.
library(rstanarm)
data("womensrole", package = "HSAUR3")
womensrole$total <- womensrole$agree + womensrole$disagree
womensrole_bglm_1 <- stan_glm(cbind(agree, disagree) ~ education + gender,
                              data = womensrole,
                              family = binomial(link = "logit"),
                              prior = student_t(df = 7),
                              prior_intercept = student_t(df = 7),
                              chains = 4, cores = 1, seed = 123)
> womensrole_bglm_1
stan_glm
 family:       binomial [logit]
 formula:      cbind(agree, disagree) ~ education + gender
 observations: 42
 predictors:   3
------
             Median MAD_SD
(Intercept)   2.5    0.2
education    -0.3    0.0
genderFemale  0.0    0.1

Sample avg. posterior predictive distribution of y:
         Median MAD_SD
mean_PPD 24.3    0.8

------
For info on the priors used see help('prior_summary.stanreg').
I will set aside what these numbers mean for now; the point is that this is the library we will use throughout this post.
Next, load the other libraries. This part follows the vignette as-is.
library(projpred)
library(ggplot2)
library(bayesplot)
theme_set(theme_bw())
options(mc.cores = parallel::detectCores())
Creating the simulation data
To run the experiment, let's create simulation data. We could of course use the dataset bundled with the library, but simply repeating the vignette would not be very interesting, so let's create several datasets as follows and compare the results:
- few predictors (20), continuous variables only
- many predictors (100), continuous variables only
- few predictors (20), including dummy variables
- many predictors (100), including dummy variables
As you can see, I prepared four patterns varying the number of predictors and the presence of dummy variables. Why these patterns? Simply because, reading the vignette, I noticed it used nothing but continuous variables, and because the sample data had only about 20 predictors, few enough that I felt I could manage on my own without relying on mechanical selection.
For each pattern we generate data with no particular twist. Note that the true model is identical for patterns 1 and 2, and for patterns 3 and 4. Patterns 2 and 4 therefore represent variable selection in a situation where, compared with patterns 1 and 3 respectively, only irrelevant data have been added.
Set the parameters as follows.
set.seed(1)
N <- 100      # number of observations
xn_1 <- 20    # number of variables for pattern 1
xn_2 <- 100   # number of variables for pattern 2
b1 <- 0.5     # regression coefficient of X[, 1]
b2 <- 0.8     # regression coefficient of X[, 2]
var_e <- 1    # residual standard deviation (passed to rnorm as sd)
For patterns 1 & 2, first generate X from random numbers and construct Y from two of its columns. The 20 columns containing those two, and all 100 columns, then become the predictors for the respective patterns.
X <- matrix(rnorm(N * xn_2, 0, 1), nrow = N, ncol = xn_2)
Y <- 1.0 + X[, 1] * b1 + X[, 2] * b2 + rnorm(N, 0, var_e)
X1 <- X[, 1:xn_1]
X2 <- X[, 1:xn_2]
dat1 <- data.frame("Y" = Y, "X" = I(X1))
dat2 <- data.frame("Y" = Y, "X" = I(X2))
By passing X1 through I() when building the data frame, the whole of X1 can be redefined as a single column "X". This makes a formula like the following possible.
> lm(Y ~ X, dat1)

Call:
lm(formula = Y ~ X, data = dat1)

Coefficients:
(Intercept)           X1           X2           X3           X4           X5           X6
   1.038393     0.422365     0.824111     0.006940    -0.064897     0.060024    -0.004623
         X7           X8           X9          X10          X11          X12          X13
  -0.096471     0.040521     0.123807    -0.051311     0.125116     0.009435     0.014924
        X14          X15          X16          X17          X18          X19          X20
   0.065920    -0.037135    -0.110772    -0.015537    -0.003969    -0.136834     0.095512
Of course, passing the matrix as-is and specifying the predictors with "." works just as well.
tmp <- data.frame("Y" = Y, "X" = X1)
> lm(Y ~ ., tmp)

Call:
lm(formula = Y ~ ., data = tmp)

Coefficients:
(Intercept)          X.1          X.2          X.3          X.4          X.5          X.6
   1.038393     0.422365     0.824111     0.006940    -0.064897     0.060024    -0.004623
        X.7          X.8          X.9         X.10         X.11         X.12         X.13
  -0.096471     0.040521     0.123807    -0.051311     0.125116     0.009435     0.014924
       X.14         X.15         X.16         X.17         X.18         X.19         X.20
   0.065920    -0.037135    -0.110772    -0.015537    -0.003969    -0.136834     0.095512
Next, patterns 3 & 4. We build the data following essentially the same flow, but include dummy variables in part of it.
Z <- matrix(rnorm(N * xn_2, 0, 1), nrow = N, ncol = xn_2)
# dichotomize every 10th column (base ifelse, so no extra package is needed)
Z[, seq(1, 100, 10)] <- ifelse(Z[, seq(1, 100, 10)] > 0, 1, 0)
Y <- 1.0 + Z[, 1] * b1 + Z[, 2] * b2 + rnorm(N, 0, var_e)
X3 <- Z[, 1:xn_1]
X4 <- Z[, 1:xn_2]
dat3 <- data.frame("Y" = Y, "X" = I(X3))
dat4 <- data.frame("Y" = Y, "X" = I(X4))
Fitting
Fitting with stan_glm
Now let's fit models to these datasets. The script looks like this:
n <- N
D <- xn_1
p0 <- 5 # prior guess for the number of relevant variables
# scale for tau (notice that stan_glm will automatically scale this by sigma)
tau0 <- p0 / (D - p0) * 1 / sqrt(n)
prior_coeff <- hs(global_scale = tau0, slab_scale = 1) # regularized horseshoe prior
fit1 <- stan_glm(Y ~ X, family = gaussian(), data = dat1, prior = prior_coeff,
                 seed = 777, chains = 4, iter = 2000)
D <- xn_2
p0 <- 5
tau0 <- p0 / (D - p0) * 1 / sqrt(n)
prior_coeff <- hs(global_scale = tau0, slab_scale = 1)
fit2 <- stan_glm(Y ~ X, family = gaussian(), data = dat2, prior = prior_coeff,
                 seed = 777, chains = 4, iter = 2000)

D <- xn_1
p0 <- 5
tau0 <- p0 / (D - p0) * 1 / sqrt(n)
prior_coeff <- hs(global_scale = tau0, slab_scale = 1)
fit3 <- stan_glm(Y ~ X, family = gaussian(), data = dat3, prior = prior_coeff,
                 seed = 777, chains = 4, iter = 2000)

D <- xn_2
p0 <- 5
tau0 <- p0 / (D - p0) * 1 / sqrt(n)
prior_coeff <- hs(global_scale = tau0, slab_scale = 1)
fit4 <- stan_glm(Y ~ X, family = gaussian(), data = dat4, prior = prior_coeff,
                 seed = 777, chains = 4, iter = 2000)
I will omit the sampler's console output.
Checking the results
Now that fitting has finished without trouble, let's first check the results. The fit1 object stores a great deal of information; inspect the parameter distributions as follows (output excerpted).
> fit1$stan_summary
                     mean      se_mean         sd        2.5%         10%           25%           50%           75%        90%      97.5%    n_eff      Rhat
(Intercept)  1.026278e+00 0.0012818960 0.08107422  0.87096060  0.92283578  9.695387e-01  1.025776e+00  1.083455e+00 1.12962235 1.18570039 4000.000 1.0002199
X1           3.908290e-01 0.0016254360 0.09798225  0.19341915  0.26638997  3.278100e-01  3.940279e-01  4.542220e-01 0.51236797 0.57793402 3633.750 0.9999048
X2           8.240328e-01 0.0013608445 0.08516982  0.65884688  0.71552972  7.672683e-01  8.238806e-01  8.816342e-01 0.93310548 0.98857084 3917.008 0.9995082
X3          -1.059357e-03 0.0005388889 0.03408233 -0.08025937 -0.03484370 -1.106505e-02 -1.177711e-04  1.032889e-02 0.03180827 0.07368474 4000.000 1.0005552
X4          -1.872770e-02 0.0007089570 0.04483838 -0.13979248 -0.07768171 -3.228712e-02 -3.977780e-03  3.213492e-03 0.01638515 0.04656703 4000.000 0.9998376
X5           2.138502e-02 0.0006738661 0.04261903 -0.03511247 -0.01201396 -1.858049e-03  6.263328e-03  3.500771e-02 0.07857940 0.13847975 4000.000 0.9993905
# (rest omitted)
Looking at these results, X1 and X2 come out roughly at the values we set.
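As a quicker sanity check (a sketch; coef() on a stanreg object returns the posterior medians), we can pull out just the two relevant coefficients and compare them with the true values b1 = 0.5 and b2 = 0.8:

```r
# Posterior medians of the two truly relevant coefficients; the
# remaining coefficients should be shrunk toward zero by the
# regularized horseshoe prior.
round(coef(fit1)[c("X1", "X2")], 2)
```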
Now, at last, let's turn to variable selection with {projpred}. The script is as follows:
sel1 <- varsel(fit1, method = 'forward')
> sel1$varsel$vind # variables ordered as they enter during the search
 X2  X1 X20 X16 X19 X11  X5 X14 X18  X4  X7  X9  X8 X17 X10 X15 X13 X12  X3  X6
  2   1  20  16  19  11   5  14  18   4   7   9   8  17  10  15  13  12   3   6
Hooray! Columns 1 and 2 are properly selected! I think the plot below also shows that the variables from the third onward contribute nothing to the model's accuracy. By the way, elpd here stands for expected log predictive density; I cannot do the topic of information criteria justice in passing, so I will limit myself to quoting the vignette.
The loo method for stanreg objects provides an interface to the loo package for approximate leave-one-out cross-validation (LOO). The LOO Information Criterion (LOOIC) has the same purpose as the Akaike Information Criterion (AIC) that is used by frequentists. Both are intended to estimate the expected log predictive density (ELPD) for a new dataset.
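For reference, the LOO estimate described in the quote can be computed directly from the fitted object; a minimal sketch (the actual numbers depend on the simulated data):

```r
# Approximate leave-one-out cross-validation for the full model.
# loo() is the stanreg interface to the loo package mentioned in
# the vignette quote above.
loo1 <- loo(fit1)
print(loo1)  # reports elpd_loo, p_loo and looic with standard errors
```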
varsel_plot(sel1, stats=c('elpd', 'rmse'))
(Figure: elpd and RMSE as a function of the number of selected variables, pattern 1)
Let's also look at the distributions of the individual parameters, restricting the plot to the five top-ranked predictors.
# Visualise the five most relevant variables in the full model
mcmc_areas(as.matrix(sel1),
           pars = names(sel1$varsel$vind[1:5])) +
  coord_cartesian(xlim = c(-2, 2))
(Figure: posterior distributions of the five top-ranked coefficients, pattern 1)
You can see that the variables with no effect are distributed tightly around zero.
Let's check the remaining patterns in the same way. Inspecting them one by one is tedious, so I will do them all at once.
sel2 <- varsel(fit2, method = 'forward')
sel3 <- varsel(fit3, method = 'forward')
sel4 <- varsel(fit4, method = 'forward')
> sel2$varsel$vind
 X2  X1 X87 X24 X83 X71 X89 X18 X95 X85  X4 X67 X19 X16 X60 X56 X55 X53 X42 X20
  2   1  87  24  83  71  89  18  95  85   4  67  19  16  60  56  55  53  42  20
> sel3$varsel$vind
 X2 X14  X4  X1 X18  X5 X16  X3  X9 X15 X19 X20  X6  X8 X17 X11  X7 X12 X13 X10
  2  14   4   1  18   5  16   3   9  15  19  20   6   8  17  11   7  12  13  10
> sel4$varsel$vind
 X2 X90 X74 X45 X64 X14 X25 X52 X60  X4 X20 X16 X22  X1 X63 X41  X9 X26 X53  X5
  2  90  74  45  64  14  25  52  60   4  20  16  22   1  63  41   9  26  53   5
Whoops. In every pattern column 2 is properly selected, but in patterns 3 and 4 column 1 is not picked up next. Not good. Let's look at the plots.
varsel_plot(sel2, stats = c('elpd', 'rmse'))
varsel_plot(sel3, stats = c('elpd', 'rmse'))
varsel_plot(sel4, stats = c('elpd', 'rmse'))
(Figure: elpd and RMSE versus number of selected variables, patterns 2-4)
This is only natural given that the correct variable set was not selected, but by the second variable the elpd and RMSE already touch those of the full model; in other words, only the first variable contributes to improving the model's accuracy.
Looking at the parameter distributions, compared with pattern 1 the tails are wider across the board. Moreover, in pattern 3, although the distribution does trail off to the right thanks to the positive effect, it ends up almost indistinguishable from the other, ineffective variables. Dummy variables apparently make parameter estimation difficult.
mcmc_areas(as.matrix(sel2), pars = names(sel2$varsel$vind[1:5])) +
  coord_cartesian(xlim = c(-2, 2))
mcmc_areas(as.matrix(sel3), pars = names(sel3$varsel$vind[1:5])) +
  coord_cartesian(xlim = c(-2, 2))
mcmc_areas(as.matrix(sel4), pars = names(sel4$varsel$vind[1:5])) +
  coord_cartesian(xlim = c(-2, 2))
(Figure: posterior distributions of the five top-ranked coefficients, patterns 2-4)
Follow-up experiment
So, when dummy variables are present, what is the actual probability of recovering the true parameters? Let's repeat pattern 3 one hundred times and count in how many runs the true variable set is selected.
n <- 100
t <- 100
D <- 20
p0 <- 5
b1 <- 0.5
b2 <- 0.8
tau0 <- p0 / (D - p0) * 1 / sqrt(n)
out <- matrix(0, t, 2)
for (i in 1:t) {
  x <- matrix(rnorm(n * D, 0, 1), nrow = n)
  x[, seq(1, D, 5)] <- ifelse(x[, seq(1, D, 5)] > 0, 1, 0)
  y <- 1.5 + x[, 1] * b1 + x[, 2] * b2 + rnorm(n, 0, 1)
  dat <- data.frame("y" = y, "x" = I(x))
  prior_coeff <- hs(global_scale = tau0, slab_scale = 1)
  fit0 <- stan_glm(y ~ x, family = gaussian(), data = dat,
                   prior = prior_coeff, chains = 4, iter = 2000)
  sel0 <- varsel(fit0)
  out[i, ] <- names(sel0$varsel$vind[1:2])  # record the first two selected variables
}
> table(out[, 1])

 x2
100

> table(out[, 2])

 x1 x10 x11 x12 x13 x14 x15 x17 x18 x19 x20  x3  x4  x5  x6  x7  x9
 55   2   2   3   3   1   1   5   5   1   1   2   7   4   1   4   3
Remarkably, the selection rate for column 2, the continuous variable, is 100%. Column 1, the dummy variable, on the other hand, is selected only about half the time. The magnitude of the regression coefficients no doubt plays a role here, but I find this a rather shocking result.
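The counts above can be turned into selection rates directly; a small sketch using the `out` matrix from the loop:

```r
# Proportion of the 100 runs in which each true variable entered the
# search in its expected position (counts taken from the tables above).
mean(out[, 1] == "x2")  # continuous variable selected first: 1.00
mean(out[, 2] == "x1")  # dummy variable selected second: 0.55
```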
Closing remarks
Summarizing the results of the experiments so far:
- With continuous variables, increasing the number of predictors from 20 to 100 causes no serious problems for variable selection
- Variable selection for dummy variables is hard; under this experiment's conditions it failed half the time
- Dummy variables make parameter estimation difficult
In real-world data analysis the true parameters are never known in advance, so there are times when we have no choice but to accept the outcome of variable selection. The results of this experiment, however, show that the probability of selecting the true variable set can be only about one half, which I think illustrates just how dangerous mechanical screening of variables can be.
That said, {projpred}, which I used for this experiment, strikes me as a handy library, so I intend to keep making good use of it while continuing to look for better approaches to variable selection.
1. Exploratory analysis aside, mechanical screening of variables leaves me feeling "that's not modeling!"
2. There are surely other good approaches as well; for example, Matsuura's modeling book introduces a technique called the "Bayesian Lasso", which prunes unnecessary predictors by using a double-exponential prior for the regression coefficients.
3. I am not sure how this relates to Gelman's arm::bayesglm, but at least it does not appear to be a dependency.