Anova-Freak and Bayesian Hipster #rstats
This article is originally published at https://strengejacke.wordpress.com
I’m pleased to announce an update of my sjstats-package. New features are specifically implemented for the Anova and Bayesian statistic and summary functions. Here’s a short overview of what’s new…
Beside the already implemented functions to calculate eta-squared, partial eta-squared and omega-squared, it is now also possible to calculate partial omega-squared statistics for Anova-tables.
library(sjstats) # load sample data data(efc) # fit linear model fit <- aov( c12hour ~ as.factor(e42dep) + as.factor(c172code) + c160age, data = efc ) omega_sq(fit, partial = TRUE) #> # A tibble: 3 x 2 #> term partial.omegasq #> 1 as.factor(e42dep) 0.278 #> 2 as.factor(c172code) 0.00547 #> 3 c160age 0.0649
omega_sq() have a
ci.lvl-argument, which – if not
NULL – also computes a confidence interval.
For eta-squared, i.e.
partial = FALSE, due to non-symmetry, confidence intervals are based on bootstrap-methods. Confidence intervals for partial omega-squared, i.e.
partial = TRUE – are also based on bootstrapping. In these cases,
n indicates the number of bootstrap samples to be drawn to compute the confidence intervals.
eta_sq(fit, partial = TRUE, ci.lvl = .8) #> # A tibble: 3 x 4 #> term partial.etasq conf.low conf.high #> 1 as.factor(e42dep) 0.281 0.247 0.310 #> 2 as.factor(c172code) 0.00788 0.00109 0.0160 #> 3 c160age 0.0665 0.0467 0.0885
Bayiasian statistics and summaries
Some of the summary-functions for Bayesian models were polished in the current update, e.g.
hdi(). Let’s fit a (non-sense) model with the great brms-package first (you’ll see later why I don’t use rstanarm here):
library(lme4) library(brms) library(dplyr) data(sleepstudy) sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE) sleepstudy % group_by(mygrp) %>% mutate(mysubgrp = sample(1:30, size = n(), replace = TRUE)) m <- brm( Reaction ~ Days + (1 | mygrp / mysubgrp) + (1 | Subject), data = sleepstudy )
Highest Density Intervals
hdi() calculates by default the 89% HDI for Bayesian regression models:
hdi(m) #> # A tibble: 3 x 3 #> term hdi.low hdi.high #> 1 b_Intercept 232.00 268.0 #> 2 b_Days 9.15 11.8 #> 3 sigma 27.20 33.5
Now you can calculate the HDI for multiple tresholds (probabilities) at once, using the
hdi(m, prob = c(.5, .8, .95)) #> # A tibble: 3 x 7 #> term hdi.low_0.5 hdi.high_0.5 hdi.low_0.8 hdi.high_0.8 hdi.low_0.95 hdi.high_0.95 #> 1 b_Intercept 244.00 258.0 237.00 265.0 229.00 274.0 #> 2 b_Days 9.97 11.1 9.51 11.6 8.90 12.1 #> 3 sigma 28.90 31.5 27.60 32.5 26.60 33.8
A tidy summary, similar to the
tidy()-function from the broom-package, can be obtained with
tidy_stan(). By default, for multilevel models, this function only shows the fixed effects. Use the different options from the
type-argument if you want random effects only or both random and fixed effects to be printed.
tidy_stan(m) #> # A tibble: 3 x 8 #> term estimate std.error hdi.low hdi.high n_eff Rhat mcse #> 1 b_Intercept 250.0 10.900 233.00 268.0 0.326 1.00 0.312 #> 2 b_Days 10.6 0.835 9.18 11.8 1.000 1.00 0.013 #> 3 sigma 30.2 1.930 27.30 33.4 1.000 1.00 0.202
Intraclass Correlation Coefficient (ICC)
The ICC is a useful statistic for random-intercept models, which describes the proportion of variance that is explained by the random effects structure (see
?icc for more details). The ICC is based on the variance and correlation components of the model, which you typically get with the
VarCorr function. According to the brms-package, there’s a good and a bad news: from a developers perspective, the „bad“ news is that
VarCorr.brmsfit has recently changed in its internal code structure, so I had to revise my
icc()-function to work with brms-models again, with success:
icc(m) #> Bayesian mixed model #> Family: gaussian (identity) #> Formula: list() Reaction ~ Days + (1 | mygrp/mysubgrp) + (1 | Subject) list() #> #> ICC (mygrp): 0.032317 #> ICC (mygrp:mysubgrp): 0.013529 #> ICC (Subject): 0.598439
The good news is, that
VarCorr.brmsfit has recently changed in its internal code structure. Now it is also possible to get the variance and correlation components for each sample of the model, which allows us to compute the ICC for each sample of the model, which again in turn allows us to compute an ICC including uncertainty intervals very quickly – simply add
posterior = TRUE to the
icc(m, posterior = TRUE) #> # Random Effect Variances and ICC #> #> Family: gaussian (identity) #> Formula: Reaction ~ Days + (1 | mygrp/mysubgrp) + (1 | Subject) #> #> ## mygrp #> ICC: 0.022 (HDI 89%: 0.000-0.104) #> Between-group: 57.406 (HDI 89%: 0.000-277.588) #> #> ## mygrp:mysubgrp #> ICC: 0.011 (HDI 89%: 0.000-0.050) #> Between-group: 28.535 (HDI 89%: 0.000-127.271) #> #> ## Subject #> ICC: 0.578 (HDI 89%: 0.419-0.737) #> Between-group: 1449.446 (HDI 89%: 682.362-2413.380) #> #> ## Residuals #> Within-group: 909.039 (HDI 89%: 722.154-1085.644)
By default, the 89% interval around the ICC „point estimate“ is shown in the output, however, the
print()-method has an argument to change the interval range to any value, e.g.
print(icc(m, posterior = TRUE), prob = .5, digits = 2).
The colored output above is not just playing with CSS, the R console output is indeed colored!
Please visit source website for post related comments.