R / R News / Rstats

Anova-Freak and Bayesian Hipster #rstats

by Daniel · March 27, 2018

This article is originally published at https://strengejacke.wordpress.com

I’m pleased to announce an update of my sjstats-package. New features are specifically implemented for the Anova and Bayesian statistic and summary functions. Here’s a short overview of what’s new…

Anova statistics

Beside the already implemented functions to calculate eta-squared, partial eta-squared and omega-squared, it is now also possible to calculate partial omega-squared statistics for Anova-tables.

library(sjstats)
# load sample data
data(efc)

# fit linear model
fit <- aov(
  c12hour ~ as.factor(e42dep) + as.factor(c172code) + c160age,
  data = efc
)

omega_sq(fit, partial = TRUE)
#> # A tibble: 3 x 2
#>   term                partial.omegasq
#> 1 as.factor(e42dep)           0.278  
#> 2 as.factor(c172code)         0.00547
#> 3 c160age                     0.0649

Furthermore, eta_sq() and omega_sq() have a ci.lvl-argument, which – if not NULL – also computes a confidence interval.

For eta-squared, i.e. eta_sq() with partial = FALSE, due to non-symmetry, confidence intervals are based on bootstrap-methods. Confidence intervals for partial omega-squared, i.e. omega_sq() with partial = TRUE – are also based on bootstrapping. In these cases, n indicates the number of bootstrap samples to be drawn to compute the confidence intervals.

eta_sq(fit, partial = TRUE, ci.lvl = .8)
#> # A tibble: 3 x 4
#>   term                partial.etasq conf.low conf.high
#> 1 as.factor(e42dep)         0.281    0.247      0.310 
#> 2 as.factor(c172code)       0.00788  0.00109    0.0160
#> 3 c160age                   0.0665   0.0467     0.0885

Bayiasian statistics and summaries

Some of the summary-functions for Bayesian models were polished in the current update, e.g. hdi(). Let’s fit a (non-sense) model with the great brms-package first (you’ll see later why I don’t use rstanarm here):

library(lme4)
library(brms)
library(dplyr)
data(sleepstudy)

sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE)
sleepstudy %
  group_by(mygrp) %>%
  mutate(mysubgrp = sample(1:30, size = n(), replace = TRUE))

m <- brm(
  Reaction ~ Days + (1 | mygrp / mysubgrp) + (1 | Subject),
  data = sleepstudy
)

Highest Density Intervals

hdi() calculates by default the 89% HDI for Bayesian regression models:

hdi(m)
#> # A tibble: 3 x 3
#>   term        hdi.low hdi.high
#> 1 b_Intercept  232.00    268.0
#> 2 b_Days         9.15     11.8
#> 3 sigma         27.20     33.5

Now you can calculate the HDI for multiple tresholds (probabilities) at once, using the prob-argument:

hdi(m, prob = c(.5, .8, .95))
#> # A tibble: 3 x 7
#>   term        hdi.low_0.5 hdi.high_0.5 hdi.low_0.8 hdi.high_0.8 hdi.low_0.95 hdi.high_0.95
#> 1 b_Intercept      244.00        258.0      237.00        265.0       229.00         274.0
#> 2 b_Days             9.97         11.1        9.51         11.6         8.90          12.1
#> 3 sigma             28.90         31.5       27.60         32.5        26.60          33.8

Tidy summary

A tidy summary, similar to the tidy()-function from the broom-package, can be obtained with tidy_stan(). By default, for multilevel models, this function only shows the fixed effects. Use the different options from the type-argument if you want random effects only or both random and fixed effects to be printed.

tidy_stan(m)
#> # A tibble: 3 x 8
#>   term        estimate std.error hdi.low hdi.high n_eff  Rhat   mcse
#> 1 b_Intercept    250.0    10.900  233.00    268.0 0.326  1.00  0.312 
#> 2 b_Days          10.6     0.835    9.18     11.8 1.000  1.00  0.013
#> 3 sigma           30.2     1.930   27.30     33.4 1.000  1.00  0.202

Intraclass Correlation Coefficient (ICC)

The ICC is a useful statistic for random-intercept models, which describes the proportion of variance that is explained by the random effects structure (see ?icc for more details). The ICC is based on the variance and correlation components of the model, which you typically get with the VarCorr function. According to the brms-package, there’s a good and a bad news: from a developers perspective, the „bad“ news is that VarCorr.brmsfit has recently changed in its internal code structure, so I had to revise my icc()-function to work with brms-models again, with success:

icc(m)
#> Bayesian mixed model
#>  Family: gaussian (identity)
#> Formula: list() Reaction ~ Days + (1 | mygrp/mysubgrp) + (1 | Subject) list()
#> 
#>            ICC (mygrp): 0.032317
#>   ICC (mygrp:mysubgrp): 0.013529
#>          ICC (Subject): 0.598439

The good news is, that VarCorr.brmsfit has recently changed in its internal code structure. Now it is also possible to get the variance and correlation components for each sample of the model, which allows us to compute the ICC for each sample of the model, which again in turn allows us to compute an ICC including uncertainty intervals very quickly – simply add posterior = TRUE to the icc()-function call:

icc(m, posterior = TRUE)
#> # Random Effect Variances and ICC
#> 
#> Family: gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp/mysubgrp) + (1 | Subject)
#> 
#> ## mygrp
#>           ICC: 0.022 (HDI 89%: 0.000-0.104)
#> Between-group: 57.406 (HDI 89%: 0.000-277.588)
#> 
#> ## mygrp:mysubgrp
#>           ICC: 0.011 (HDI 89%: 0.000-0.050)
#> Between-group: 28.535 (HDI 89%: 0.000-127.271)
#> 
#> ## Subject
#>           ICC: 0.578 (HDI 89%: 0.419-0.737)
#> Between-group: 1449.446 (HDI 89%: 682.362-2413.380)
#> 
#> ## Residuals
#> Within-group: 909.039 (HDI 89%: 722.154-1085.644)

By default, the 89% interval around the ICC „point estimate“ is shown in the output, however, the print()-method has an argument to change the interval range to any value, e.g.
print(icc(m, posterior = TRUE), prob = .5, digits = 2).

Final words

The colored output above is not just playing with CSS, the R console output is indeed colored!

Thanks for visiting r-craft.org
This article is originally published at https://strengejacke.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Anova-Freak and Bayesian Hipster #rstats

You may also like...

Categories

Anova-Freak and Bayesian Hipster #rstats

Anova statistics

Bayiasian statistics and summaries

Highest Density Intervals

Tidy summary

Intraclass Correlation Coefficient (ICC)

Final words

You may also like...

R Weekly 2021-W42 VSCode, rpp, fill region between ggplot2 lines

Worldwide COVID-19 spread visualization with R

Homebrew 2.0.0 Released == homebrewanalytics package updated

Categories