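Truth be told, I have started using Stan at work as well, but what I have so far is still simplistic, and much of it is low-level stuff such as models that do not really suit the data. Deftly combining noninformative priors and hierarchical priors to get elegant sampling is still a distant dream, which is a pretty sad state of affairs (sob).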
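Meanwhile, before I knew it, @berobero11's Stan-related blog posts had become incredibly comprehensive, enough to astonish even Professor Kubo, so there is not much point in me rambling on here. If you want to learn how to use Stan quickly, please go and read @berobero11's blog (laughs). My plan is simply to keep writing modest posts that trace those articles, for example while working through example problems along the lines of infer.NET.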
So, for now, I have skimmed the Stan manual, stan-reference-2.1.0.pdf*1, and am writing down a rough summary limited to the areas I am interested in. All Stan code examples are quoted from the manual.
Introduction
Basically, Stan (or rather, any MC / MCMC sampler) carries out likelihood calculations for you using Monte Carlo methods. So virtually any statistical model of the type where "it is enough for the likelihood calculation to converge" should be doable in Stan. In other words, even modeling techniques not named here can presumably be run in Stan, as long as you manage to code them up.

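The intuition behind likelihood calculation by Monte Carlo is as introduced in the previous post. The important thing is that, whatever the model, you must be able to rewrite the original model equation properly in the form "left-hand side = right-hand side (a probability distribution)". For a normal linear model, for example,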
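$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathrm{Normal}(0, \sigma)$
↓
$y_i - (\alpha + \beta x_i) \sim \mathrm{Normal}(0, \sigma)$
↓
$y_i \sim \mathrm{Normal}(\alpha + \beta x_i, \sigma)$
↓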
y[i] ~ normal( alpha + beta * x[i], sigma )
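In other words, the point is to be able to transform the model until it can be expressed in Stan code as above. Once you get used to it you will be able to write these straight away, but for complex models it seems best to first write down the model equations carefully from first principles and then transform them into Stan code.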
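Multivariate analysis
Pretty much anything is possible, but take care: Stan's syntax can be fiddly, and the degree of convergence can change completely depending on how the sampling is set up. When in doubt, do not hesitate to ask someone like @berobero11 (dumping everything on him, lol).
Normal linear model
First, the most basic of basics: the normal linear model. It can be computed with the normal distribution normal(mu, sigma).
(Example)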
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  for (n in 1:N)
    y[n] ~ normal(alpha + beta * x[n], sigma);
}
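To make concrete what a statement like `y[n] ~ normal(alpha + beta * x[n], sigma)` contributes, here is a minimal Python sketch (Python rather than Stan, and not taken from the manual) of the corresponding log-likelihood, up to constants that a sampler is free to drop. The function names and the toy data are assumptions made purely for illustration.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y; sigma is the standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def linear_normal_loglik(x, y, alpha, beta, sigma):
    """One log-density term per observation, summed over the data."""
    return sum(normal_logpdf(yn, alpha + beta * xn, sigma) for xn, yn in zip(x, y))

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

ll_true = linear_normal_loglik(x, y, alpha=1.0, beta=2.0, sigma=1.0)
ll_off = linear_normal_loglik(x, y, alpha=0.0, beta=1.0, sigma=1.0)
print(ll_true, ll_off)
```

Parameter values that reproduce the data give the larger log-likelihood, which is the quantity the sampler explores.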
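Logistic regression and probit models
These can be computed with the Bernoulli distribution bernoulli(theta). Note that the outcome variable must be binary and declared as int; otherwise you simply get a compile error. The example below is the logistic version; for probit, replace inv_logit with Phi.
(Example)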
data {
  int<lower=0> N;
  real x[N];
  int<lower=0,upper=1> y[N];
}
parameters {
  real alpha;
  real beta;
}
model {
  for (n in 1:N)
    y[n] ~ bernoulli(inv_logit(alpha + beta * x[n]));
}
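As a rough illustration of what `bernoulli(inv_logit(...))` evaluates, the following Python sketch computes the inverse logit and the Bernoulli log-likelihood by hand. It is not from the manual; the helper names and the toy data are hypothetical.

```python
import math

def inv_logit(u):
    """Logistic function 1 / (1 + exp(-u)), branching to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def bernoulli_logit_loglik(x, y, alpha, beta):
    """Sum of log Bernoulli probabilities with success probability inv_logit(alpha + beta * x)."""
    total = 0.0
    for xn, yn in zip(x, y):
        theta = inv_logit(alpha + beta * xn)
        total += math.log(theta) if yn == 1 else math.log(1.0 - theta)
    return total

# Hypothetical toy data: the outcome is 1 when x is positive.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

ll_flat = bernoulli_logit_loglik(x, y, alpha=0.0, beta=0.0)   # every theta is 0.5
ll_slope = bernoulli_logit_loglik(x, y, alpha=0.0, beta=2.0)  # slope matches the data
print(ll_flat, ll_slope)
```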
Multinomial logit model
If binary logit models are possible, multinomial logit naturally is too. It can be implemented with the categorical distribution categorical(theta). Incidentally, ordered logistic regression is available directly as ordered_logistic(eta, c).
(Example)
data {
  int K;
  int N;
  int D;
  int y[N];
  vector[D] x[N];
}
parameters {
  matrix[K,D] beta;
}
model {
  for (k in 1:K)
    for (d in 1:D)
      beta[k,d] ~ normal(0,5);
  for (n in 1:N)
    y[n] ~ categorical(softmax(beta * x[n]));
}
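The `categorical(softmax(beta * x[n]))` line can be unpacked as in the Python sketch below: a matrix-vector product, a softmax, and the log probability of the observed category. This is an illustration under assumed names and toy data, not code from the manual.

```python
import math

def softmax(v):
    """Map real scores to probabilities; subtracting the max keeps exp() from overflowing."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def multi_logit_loglik(x, y, beta):
    """x: list of D-vectors, y: categories coded 1..K (as in Stan), beta: K x D matrix."""
    total = 0.0
    for xn, yn in zip(x, y):
        eta = [sum(b * xi for b, xi in zip(row, xn)) for row in beta]  # beta * x[n]
        theta = softmax(eta)
        total += math.log(theta[yn - 1])  # shift to 0-based indexing
    return total

# Hypothetical toy data with K = 3 categories and D = 2 predictors.
x = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1, 2, 3]
beta_zero = [[0.0, 0.0] for _ in range(3)]

ll_uniform = multi_logit_loglik(x, y, beta_zero)  # all categories equally likely
print(softmax([1.0, 2.0, 3.0]), ll_uniform)
```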
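Others
Hierarchical logistic models and item-response theory models (1PL Rasch model / multilevel 2PL model) also appear, but I do not understand them well, so I will skip them here...*2
Econometric time series analysis
Taking advantage of Stan's ability to attach whatever probability distribution you like to a model, you can also estimate the parameters of econometric time series models without any fuss. As we saw in the "Econometric time series analysis with R" series of posts, those models are estimated by maximum likelihood, so of course there is no reason they cannot be done in Stan.
AR model
Take lagged values of the series for as many steps as the lag order, and model each observation with the normal distribution normal(mu, sigma) conditional on them. As an extension, ARCH models can also be estimated, but since the GARCH model is explained in more detail, I will skip ARCH here.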
(Example: AR(1) model)
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;  // scale must be positive
}
model {
  for (n in 2:N)
    y[n] ~ normal(alpha + beta * y[n-1], sigma);
}
(Example: AR(K) model)
data {
  int<lower=0> K;
  int<lower=0> N;
  real y[N];
}
parameters {
  real alpha;
  real beta[K];
  real<lower=0> sigma;  // scale must be positive
}
model {
  for (n in (K+1):N) {
    real mu;
    mu <- alpha;
    for (k in 1:K)
      mu <- mu + beta[k] * y[n-k];
    y[n] ~ normal(mu, sigma);
  }
}
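The AR(1) model conditions each observation on the previous one, so the first observation contributes no term. A minimal Python sketch of that conditional log-likelihood follows; the function names and the noise-free toy series are assumptions for illustration, not part of the manual.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y; sigma is the standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def ar1_loglik(y, alpha, beta, sigma):
    """Conditional log-likelihood: terms for n = 2..N in Stan's 1-based indexing."""
    total = 0.0
    for n in range(1, len(y)):
        total += normal_logpdf(y[n], alpha + beta * y[n - 1], sigma)
    return total

# Hypothetical noise-free series following y[n] = 0.5 + 0.8 * y[n-1].
y = [1.0]
for _ in range(5):
    y.append(0.5 + 0.8 * y[-1])

ll_true = ar1_loglik(y, alpha=0.5, beta=0.8, sigma=1.0)
ll_off = ar1_loglik(y, alpha=0.0, beta=0.0, sigma=1.0)
print(ll_true, ll_off)
```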
GARCH model
Heads up: part of the explanation of the parameters used here is actually given in the manual's ARCH(1) section.
(Example: GARCH(1,1) model)
data {
  int<lower=0> T;
  real r[T];
  real<lower=0> sigma1;
}
parameters {
  real mu;
  real<lower=0> alpha0;
  real<lower=0,upper=1> alpha1;
  real<lower=0,upper=(1-alpha1)> beta1;
}
transformed parameters {
  real<lower=0> sigma[T];
  sigma[1] <- sigma1;
  for (t in 2:T)
    sigma[t] <- sqrt(alpha0
                     + alpha1 * pow(r[t-1] - mu, 2)
                     + beta1 * pow(sigma[t-1], 2));
}
model {
  r ~ normal(mu,sigma);
}
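The heart of the GARCH(1,1) example is the volatility recursion in the transformed parameters block. The Python sketch below reproduces only that recursion so the arithmetic can be checked by hand; the function name and the numbers are hypothetical.

```python
import math

def garch11_sigma(r, mu, alpha0, alpha1, beta1, sigma1):
    """Volatility path: sigma[t]^2 = alpha0 + alpha1 * (r[t-1] - mu)^2 + beta1 * sigma[t-1]^2."""
    sigma = [sigma1]  # the first scale is supplied as data, as in the Stan model
    for t in range(1, len(r)):
        var_t = alpha0 + alpha1 * (r[t - 1] - mu) ** 2 + beta1 * sigma[t - 1] ** 2
        sigma.append(math.sqrt(var_t))
    return sigma

# Hypothetical returns and parameters chosen so the result is easy to verify:
# 1 + 0.5 * 2^2 + 0.25 * 2^2 = 4, so the second scale is 2.
sigma = garch11_sigma(r=[2.0, 0.0], mu=0.0, alpha0=1.0, alpha1=0.5, beta1=0.25, sigma1=2.0)
print(sigma)
```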
MA model
An MA(Q) model can be written using vector notation as follows.
(Example: MA(Q) model)
data {
  int<lower=0> Q;       // num previous noise terms
  int<lower=3> T;       // num observations
  vector[T] y;          // observation at time t
}
parameters {
  real mu;              // mean
  real<lower=0> sigma;  // error scale
  vector[Q] theta;      // error coeff, lag -t
}
transformed parameters {
  vector[T] epsilon;    // error term at time t
  for (t in 1:T) {
    epsilon[t] <- y[t] - mu;
    for (q in 1:min(t-1,Q))
      epsilon[t] <- epsilon[t] - theta[q] * epsilon[t - q];
  }
}
model {
  vector[T] eta;
  mu ~ cauchy(0,2.5);
  theta ~ cauchy(0,2.5);
  sigma ~ cauchy(0,2.5);
  for (t in 1:T) {
    eta[t] <- mu;
    for (q in 1:min(t-1,Q))
      eta[t] <- eta[t] + theta[q] * epsilon[t - q];
  }
  y ~ normal(eta,sigma);
}
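The error terms in the MA(Q) example are built recursively, using only as many past errors as exist at each time step. A small Python sketch of that recursion, with a hypothetical function name and toy numbers, may help when checking the indexing:

```python
def ma_residuals(y, mu, theta):
    """epsilon[t] = y[t] - mu - sum_q theta[q] * epsilon[t - q], over the available lags."""
    q_max = len(theta)
    eps = []
    for t in range(len(y)):  # 0-based t, so t earlier errors are available
        e = y[t] - mu
        for q in range(1, min(t, q_max) + 1):
            e -= theta[q - 1] * eps[t - q]
        eps.append(e)
    return eps

# Hypothetical MA(1) toy case: 1, then 2 - 0.5 * 1, then 3 - 0.5 * 1.5.
eps = ma_residuals(y=[1.0, 2.0, 3.0], mu=0.0, theta=[0.5])
print(eps)
```

The bound `min(t, q_max)` here plays the role of `min(t-1,Q)` in the Stan code, shifted by one because Python indexes from zero.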
ARMA model
Here I quote only the ARMA(1,1) example.
(Example: ARMA(1,1) model)
data {
  int<lower=1> T;       // num observations
  real y[T];            // observed outputs
}
parameters {
  real mu;              // mean coeff
  real phi;             // autoregression coeff
  real theta;           // moving avg coeff
  real<lower=0> sigma;  // noise scale
}
model {
  vector[T] nu;         // prediction for time t
  vector[T] err;        // error for time t
  nu[1] <- mu + phi * mu;   // assume err[0] == 0
  err[1] <- y[1] - nu[1];
  for (t in 2:T) {
    nu[t] <- mu + phi * y[t-1] + theta * err[t-1];
    err[t] <- y[t] - nu[t];
  }
  mu ~ normal(0,10);    // priors
  phi ~ normal(0,2);
  theta ~ normal(0,2);
  sigma ~ cauchy(0,5);
  err ~ normal(0,sigma);    // likelihood
}
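The prediction and error vectors in the ARMA(1,1) example depend on each other step by step. The following Python sketch mirrors just that bookkeeping, including the manual's choice of `mu + phi * mu` for the first prediction; the function name and inputs are made up for illustration.

```python
def arma11_errors(y, mu, phi, theta):
    """Return (nu, err): one-step predictions and their errors, assuming err before t=1 is 0."""
    nu = [mu + phi * mu]
    err = [y[0] - nu[0]]
    for t in range(1, len(y)):
        nu.append(mu + phi * y[t - 1] + theta * err[t - 1])
        err.append(y[t] - nu[t])
    return nu, err

# Hypothetical toy inputs: nu = [0, 0.5 * 1 + 0.5 * 1], err = [1 - 0, 2 - 1].
nu, err = arma11_errors(y=[1.0, 2.0], mu=0.0, phi=0.5, theta=0.5)
print(nu, err)
```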
Others
Stochastic volatility models and hidden Markov models can apparently be coded as well. The latter aside, I never handle the former in my usual work and know little about it, so I will skip them...
Measurement error and meta-analysis
I know absoluuuutely nothing about this area, so I am skipping all of it (hey, come on!). That said, Stan lets you model measurement error freely, so, as in the manual's examples, it seems genuinely useful for cases such as meta-analysis where you want to pool measurement errors and evaluate them together.
Clustering
As you can tell from the fact that the categorical distribution categorical(theta) is implemented, clustering can also be implemented in a Bayesian fashion.
K-means
The manual includes the ordinary version based on Euclidean distance.
(Example: "Soft" K-means)
data {
  int<lower=0> N;  // number of data points
  int<lower=1> D;  // number of dimensions
  int<lower=1> K;  // number of clusters
  vector[D] y[N];  // observations
}
transformed data {
  real<upper=0> neg_log_K;
  neg_log_K <- -log(K);
}
parameters {
  vector[D] mu[K]; // cluster means
}
transformed parameters {
  real<upper=0> soft_z[N,K]; // log unnormalized clusters
  for (n in 1:N)
    for (k in 1:K)
      soft_z[n,k] <- neg_log_K - 0.5 * dot_self(mu[k] - y[n]);
}
model {
  // prior
  for (k in 1:K)
    mu[k] ~ normal(0,1);
  // likelihood
  for (n in 1:N)
    increment_log_prob(log_sum_exp(soft_z[n]));
}
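In the soft K-means example, each point contributes `log_sum_exp` of its per-cluster scores, which sums over cluster assignments instead of sampling them. The Python sketch below computes those scores for one point. Normalizing them into soft assignment weights is an extra step added here for illustration; it is not in the quoted code, and all names are hypothetical.

```python
import math

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = max(v)
    return m + math.log(sum(math.exp(a - m) for a in v))

def soft_z_for_point(y_n, mus):
    """Per-cluster score: -log(K) - 0.5 * squared Euclidean distance to each mean."""
    k = len(mus)
    return [-math.log(k) - 0.5 * sum((m - yi) ** 2 for m, yi in zip(mu, y_n)) for mu in mus]

def soft_assignments(soft_z):
    """Normalize the scores into weights that sum to 1 (illustrative addition)."""
    lse = log_sum_exp(soft_z)
    return [math.exp(z - lse) for z in soft_z]

# Hypothetical 1-D point midway between two cluster means.
weights_mid = soft_assignments(soft_z_for_point([0.0], [[1.0], [-1.0]]))
# Hypothetical 2-D point sitting on the first of two well-separated means.
weights_near = soft_assignments(soft_z_for_point([0.0, 0.0], [[0.0, 0.0], [10.0, 10.0]]))
print(weights_mid, weights_near)
```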
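A column on "generative models are hard to do the Bayesian way"
The manual spends an entire section explaining something to this effect at length (laughs). Please read the manual itself for details, but the comment is that non-identifiability and multimodality are the culprits. For now, let's leave the explanation to @berobero11 or someone*3 and move on.
Mixture models
This example actually comes up elsewhere in the manual too: an ordinary Gaussian mixture can be built by combining the categorical distribution categorical(theta) with the normal distribution normal(mu, sigma).
(Example: mixture model)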
data {
  int<lower=1> K;   // number of mixture components
  int<lower=1> N;   // number of data points
  real y[N];        // observations
}
parameters {
  simplex[K] theta;                 // mixing proportions
  real mu[K];                       // locations of mixture components
  real<lower=0,upper=10> sigma[K];  // scales of mixture components
}
model {
  real ps[K];       // temp for log component densities
  for (k in 1:K) {
    mu[k] ~ normal(0,10);
  }
  for (n in 1:N) {
    for (k in 1:K) {
      ps[k] <- log(theta[k])
               + normal_log(y[n],mu[k],sigma[k]);
    }
    increment_log_prob(log_sum_exp(ps));
  }
}
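The mixture example never samples a component label. For each observation it adds `log(theta[k])` to the log density under component k and combines the K terms with `log_sum_exp`. Here is a hedged Python sketch of the same computation, with hypothetical function names and toy values:

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y; sigma is the standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = max(v)
    return m + math.log(sum(math.exp(a - m) for a in v))

def mixture_loglik(y, theta, mu, sigma):
    """Log-likelihood of a K-component normal mixture with the labels summed out."""
    total = 0.0
    for yn in y:
        ps = [math.log(theta[k]) + normal_logpdf(yn, mu[k], sigma[k])
              for k in range(len(theta))]
        total += log_sum_exp(ps)
    return total

# Sanity check with hypothetical values: two identical components
# must give the same result as a single normal distribution.
y = [-0.3, 0.1, 1.2]
ll_mix = mixture_loglik(y, theta=[0.3, 0.7], mu=[0.0, 0.0], sigma=[1.0, 1.0])
ll_single = sum(normal_logpdf(yn, 0.0, 1.0) for yn in y)
print(ll_mix, ll_single)
```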
Latent Dirichlet Allocation (LDA)
So, although there are apparently various restrictions, if you assign a Dirichlet distribution dirichlet(alpha) as the prior and compute the likelihood with the categorical distribution categorical(theta), you can run Latent Dirichlet Allocation itself.
(Example: LDA)
data {
  int<lower=2> K;               // num topics
  int<lower=2> V;               // num words
  int<lower=1> M;               // num docs
  int<lower=1> N;               // total word instances
  int<lower=1,upper=V> w[N];    // word n
  int<lower=1,upper=M> doc[N];  // doc ID for word n
  vector<lower=0>[K] alpha;     // topic prior
  vector<lower=0>[V] beta;      // word prior
}
parameters {
  simplex[K] theta[M];  // topic dist for doc m
  simplex[V] phi[K];    // word dist for topic k
}
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] <- log(theta[doc[n],k]) + log(phi[k,w[n]]);
    increment_log_prob(log_sum_exp(gamma));  // likelihood
  }
}
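As with the mixture model, the LDA code sums out the topic of each word. The probability of word w in document d is the sum over topics of theta[d,k] * phi[k,w], computed on the log scale. The Python sketch below checks that identity on a tiny hand-made example; the names and numbers are assumptions, not from the manual.

```python
import math

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = max(v)
    return m + math.log(sum(math.exp(a - m) for a in v))

def lda_loglik(w, doc, theta, phi):
    """w, doc: 1-based word and document IDs per token; theta: M x K; phi: K x V."""
    total = 0.0
    for wn, dn in zip(w, doc):
        gamma = [math.log(theta[dn - 1][k]) + math.log(phi[k][wn - 1])
                 for k in range(len(phi))]
        total += log_sum_exp(gamma)
    return total

# Hypothetical setup: one document, two topics, two words, one observed token.
theta = [[0.5, 0.5]]
phi = [[0.9, 0.1],
       [0.2, 0.8]]
ll_token = lda_loglik(w=[1], doc=[1], theta=theta, phi=phi)
print(math.exp(ll_token))  # 0.5 * 0.9 + 0.5 * 0.2, up to rounding
```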
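Others
Various other topics follow, such as models that use Gaussian processes and the fact that Cholesky decomposition is available through the cholesky_decompose() function*4, but they are beyond my current understanding, so I will skip them for now. I will pick them up again when I need them. Well, well, this Stan training of mine still looks like a long, hard road.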