Author: Oliver Guggenbühl

Henrik Singmann – Computational Psychology (http://singmann.org)

Install R without support for long doubles (noLD) on Ubuntu
http://singmann.org/install-r-without-support-for-long-doubles/
Mon, 22 Jun 2020

R packages on CRAN need to pass a series of technical checks. These checks can also be invoked by any user by running R CMD check on the package tar.gz (to emulate CRAN as closely as possible, one should also set the --as-cran option when doing so). These checks need to be passed before a package is accepted on CRAN. In addition, these checks are run regularly for each package on CRAN to ensure that new R features or updates of upstream packages do not break the package. Furthermore, CRAN checks regularly become stricter. Thus, keeping a package on CRAN may require regular effort from the package maintainer. Whereas this can sometimes be rather frustrating for the maintainer, partly because of CRAN's rather short two-week limit in case of newly appearing issues, it is one of the features that ensures the high technical quality of packages on CRAN.

As an example of the increasingly strict checks, CRAN now performs a set of additional checks on top of the regular CRAN checks on all R platforms that are shown on a package's check page (e.g., for the MPTmultiverse). These additional checks include tests for memory access errors (e.g., using valgrind), R compiled using alternative compilers, different numerical algebra libraries, but also tests with an R version built without support for long doubles (i.e., noLD). It has now happened for the second time that one of my packages showed a problem on the R version without long double support.

In my case, the problem on the R version without long double support appeared in the package examples or in the unit tests of the package. Therefore, I not only wanted to fix the check issue, I also wanted to understand what was happening. Thus, I needed a working version of R without support for long doubles. Unfortunately, the description of this setup is rather sparse. The only information on CRAN is:

tests on x86_64 Linux with R-devel configured using --disable-long-double

Other details as https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-gcc

Similarly sparse information is given in Writing R Extensions:

If you must try to establish a tolerance empirically, configure and build R with --disable-long-double and use appropriate compiler flags (such as -ffloat-store and -fexcess-precision=standard for gcc, depending on the CPU type) to mitigate the effects of extended-precision calculations.

Unfortunately, my first approach in which I simply tried to add the --disable-long-double option to the R-devel install script failed. After quite a bit of searching I found the solution on the RStudio community forum thanks to David F. Severski. In addition to --disable-long-double one also needs to add --enable-long-double=no to configure. At least on Ubuntu, this successfully compiles an R version without long double support. This can be confirmed with a call to capabilities() in R.

The rest of this post gives a list of all the packages I needed to install on a fresh Ubuntu version to successfully compile R in this way (e.g., from here). This set of packages should of course also hold for compiling normal R versions. I hope I did not forget too many packages, but this hopefully covers most. Feel free to post a comment if something is missing and I will try to update the list.

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install gfortran
sudo apt-get install gcc-multilib
sudo apt-get install gobjc++
sudo apt-get install libpcre2-dev
sudo apt-get install xorg-dev
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libblas-dev
sudo apt-get install texlive-fonts-extra
sudo apt-get install default-jdk
sudo apt-get install aptitude
sudo aptitude install libreadline-dev
sudo apt-get install curl

In addition to the necessary packages, the following packages probably lead to a better R user experience (after installing these a restart may help):

sudo apt-get install xfonts-100dpi 
sudo apt-get install xfonts-75dpi
sudo apt-get install qpdf
sudo apt-get install pandoc
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt-get install git
sudo apt-get install gdebi-core
sudo apt-get install libcairo2-dev
sudo apt-get install libtiff-dev

The last two packages should allow you to add --with-cairo=yes to the configure script below. The gdebi-core package above might be needed for installing RStudio.

After this, we should be able to build R. For this, I followed the `RStudio` instructions for installing multiple R versions in parallel. We begin by setting an environment variable and downloading R.

export R_VERSION=4.0.1

curl -O https://cran.rstudio.com/src/base/R-4/R-${R_VERSION}.tar.gz
tar -xzvf R-${R_VERSION}.tar.gz
cd R-${R_VERSION}

We can then install R (here I set the options for disabling long doubles):

./configure \
    --prefix=/opt/R/${R_VERSION} \
    --enable-R-shlib \
    --with-blas \
    --with-lapack \
    --disable-long-double \
    --enable-long-double=no

make 
sudo make install

To test the installation we can use:

/opt/R/${R_VERSION}/bin/R --version

Finally, we need to create a symbolic link:

sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

We can then run R and check the capabilities of the installation:

> capabilities()
       jpeg         png        tiff       tcltk         X11 
      FALSE        TRUE       FALSE       FALSE        TRUE 
       aqua    http/ftp     sockets      libxml        fifo 
      FALSE        TRUE        TRUE        TRUE        TRUE 
     cledit       iconv         NLS     profmem       cairo 
       TRUE        TRUE        TRUE       FALSE       FALSE 
        ICU long.double     libcurl 
       TRUE       FALSE        TRUE

Or shorter:

> capabilities()[["long.double"]]
[1] FALSE

afex_plot(): Publication-Ready Plots for Factorial Designs
http://singmann.org/afex_plot/
Tue, 25 Sep 2018

I am happy to announce that a new version of afex (version 0.22-1) has appeared on CRAN. This version comes with two major changes; for more details, see the NEWS file. To get the new version including all packages used in the examples run:

install.packages("afex", dependencies = TRUE)

First, afex does not load or attach package emmeans automatically anymore. This reduces the package footprint and makes it more lightweight. If you want to use afex without using emmeans, you can do this now. The consequence of this is that you have to attach emmeans explicitly if you want to continue using emmeans() et al. in the same manner. Simply add library("emmeans") to the top of your script just below library("afex") and things remain unchanged. Alternatively, you can use emmeans::emmeans() without attaching the package.

Second and more importantly, I have added a new plotting function to afex. afex_plot() visualizes results from factorial experiments combining estimated marginal means and associated uncertainties (i.e., error bars) in the foreground with a depiction of the raw data in the background. Currently, afex_plot() supports ANOVAs and mixed models fitted with afex as well as mixed models fitted with lme4 (support for more models will come in the next version). As shown in the example below, afex_plot() makes it easy to produce nice looking plots that are ready to be incorporated into publications. Importantly, afex_plot() allows different types of error bars, including within-subjects confidence intervals, which makes it particularly useful for fields where such designs are very common (e.g., psychology). Furthermore, afex_plot() is built on ggplot2 and designed in a modular manner, making it easy to customize the plot to one's personal preferences.

afex_plot() requires the fitted model object as its first argument and then has three arguments determining how the factor(s) are displayed (a minimal sketch follows this list):
  • x is necessary and specifies the factor(s) plotted on the x-axis
  • trace is optional and specifies the factor(s) plotted as separate lines (i.e., with each factor level present at each x-axis tick)
  • panel is optional and specifies the factor(s) which separate the plot into different panels.
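
To illustrate just these three arguments, here is a minimal sketch using the obk.long example data that ships with afex. The data set, its factor names, and the particular assignment of factors to x, trace, and panel are my own illustration and are not part of the main example below:

library("afex")
data(obk.long)  # treatment and gender between subjects, phase and hour within
o1 <- aov_ez("id", "value", obk.long,
             between = c("treatment", "gender"),
             within = c("phase", "hour"))
afex_plot(o1, x = "hour", trace = "treatment", panel = "phase")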

The remaining arguments make it easy to customize the plot in various ways. A comprehensive overview is provided in the new vignette; further details, specifically regarding which types of error bars are supported, are given on its help page (which also has many more examples).

Let us look at an example. We take data from a 3 by 2 within-subject experiment that also features prominently in the vignette. Note that we plot within-subjects confidence intervals (by setting error = "within") and then customize the plot quite a bit by changing the theme, using nicer labels, removing some y-axis ticks, adding colour, and using a customized geom (geom_boxjitter from the ggpol package) for displaying the data in the background.

library("afex") 
library("ggplot2") 
data(md_12.1)
aw <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

afex_plot(aw, x = "angle", trace = "noise", error = "within",
          mapping = c("shape", "fill"), dodge = 0.7,
          data_geom = ggpol::geom_boxjitter, 
          data_arg = list(
            width = 0.5, 
            jitter.width = 0,
            jitter.height = 10,
            outlier.intersect = TRUE),
          point_arg = list(size = 2.5), 
          error_arg = list(size = 1.5, width = 0),
          factor_levels = list(angle = c("0°", "4°", "8°"),
                               noise = c("Absent", "Present")), 
          legend_title = "Noise") +
  labs(y = "RTs (in ms)", x = "Angle (in degrees)") +
  scale_y_continuous(breaks=seq(400, 900, length.out = 3)) +
  theme_bw(base_size = 15) + 
  theme(legend.position="bottom", panel.grid.major.x = element_blank())

ggsave("afex_plot.png", device = "png", dpi = 600,
       width = 8.5, height = 8, units = "cm") 

In the plot, the black dots are the means and the thick black lines the 95% within-subject confidence intervals. The raw data is displayed in the background with a half box plot showing the median and upper and lower quartile as well as the raw data. The raw data is jittered on the y-axis to avoid perfect overlap.


One final thing to note: in the vignette on CRAN as well as on the help page there is an error in the code. The name of the argument for changing the labels of the factor levels is factor_levels and not new_levels. The vignette linked above and here uses the correct argument name. This is already corrected on GitHub and will be corrected on CRAN with the next release.

Diffusion/Wiener Model Analysis with brms – Part III: Hypothesis Tests of Parameter Estimates
http://singmann.org/wiener-model-analysis-with-brms-part-iii/
Thu, 06 Sep 2018

This is the third part of my blog series on fitting the 4-parameter Wiener model with brms. The first part discussed how to set up the data and model. The second part was concerned with (mostly graphical) model diagnostics and the assessment of the adequacy (i.e., the fit) of the model. This third part will inspect the parameter estimates of the model with the goal of determining whether there is any evidence for differences between the conditions. As before, this part is completely self-sufficient and can be run without running the code of Parts I or II.

As I promised in the second part of this series of blog posts, the third part did not take another two months to appear. No, this time it took almost eight months. I apologize for this, but we all know the planning fallacy and a lot of more important things got in the way (e.g., teaching).

As this part is relatively long, I will provide a brief overview. The next section contains a short explanation of the way in which we will perform hypothesis testing. This is followed by a short section loading some packages and the fitted model object and giving a small recap of the model. After this comes one relatively long section looking at the drift rate parameters in various ways. Then we will take a look at each of the other three parameters in turn. Of special importance will be the subsection on the non-decision time. As described in more detail below, I believe that this parameter cannot be interpreted. In the end, I give a brief overview of some of the limits of the present model and how it could be improved upon.

Bayesian Hypothesis Testing

The goal of this post is to provide evidence for differences in parameter estimates between conditions. This post will present different ways to do so. Importantly, "different ways" of producing such evidence is meant only in a technical sense. In statistical terms we will always do basically the same thing: inspect difference distributions resulting from linear combinations of cell-wise posterior distributions of the group-level model parameter estimates. The somewhat technical phrase "linear combinations of cell-wise posterior distributions" often simply means the difference between two distributions; for example, the difference distribution resulting from subtracting the posterior of the speed condition from the posterior of the accuracy condition.

As a reminder, a posterior distribution is the probability distribution of the parameter conditional on data and model (where the latter includes the parameter priors). It answers the question of which parameter values are likely given our prior knowledge and the data. Therefore, the posterior distribution of a difference answers, for example, which difference values between two conditions are likely and which are not. With such a difference distribution we can then do two things.

First, we can check whether the x%-highest posterior density (HPD) or credibility interval of this difference distribution includes 0. If 0 is within the 95% HPD interval it could be seen as a plausible value. If 0 is outside the 95% interval we could regard it as not plausible enough and would conclude that there is evidence for a difference.

Second, we can evaluate how much of the difference distribution is on one side of 0. If this value is considerably away from 50%, this constitutes evidence for a difference. For example, if all of the posterior samples for a specific difference are larger than zero, this provides considerable evidence that the difference is above 0.
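
As a minimal sketch of these two checks in R, consider a placeholder vector diff_samples standing in for actual posterior draws of a difference (the real draws are extracted from the fitted model below; note also that quantile() gives an equal-tailed interval, whereas the analyses below use HPD intervals):

diff_samples <- rnorm(4000, mean = 0.2, sd = 0.15)  # placeholder values only
quantile(diff_samples, probs = c(0.025, 0.975))     # (1) does the 95% interval include 0?
mean(diff_samples > 0)                              # (2) proportion of the difference above 0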

The approach of investigating posterior distributions to gauge differences between conditions is only one approach to hypothesis testing in a Bayesian setting. And, at least in the psychological literature, it is not the most popular one. More specifically, many of the more vocal proponents of Bayesian statistics in the psychological literature advocate hypothesis testing using Bayes factors (e.g., ). One prominent exception to this rule in psychology is maybe John Kruschke. However, he proposes yet another approach of inference based on posterior distributions than the one used here. In general, I agree with many of the arguments pro Bayes factors, especially in cases such as the current one in which all relevant hypotheses or competing models are nested within one large (super) model.

The main difficulty when using Bayes factors is their extreme sensitivity to the parameter priors. In a situation with nested models, this is in principle not such a big problem, because one could use Jeffreys's default prior approach (e.g., ). This approach has been extended to general ANOVA designs (I am sure the authors were not the first to have this idea, but they were at least the first to popularize it in psychology). Quentin Gronau and colleagues have applied it to accumulator models, including the diffusion model. The general idea is to reparameterize the model using effect parameters which are normalized using, for example, the residual variance. For example, for a two-sample design one would parameterize the model using a standardized difference such as Cohen's d. Then it is comparatively easy and uncontroversial to put a prior on the standardized effect size measure. In the present case, in which the model does not contain a residual variance parameter, one could use the variance estimate of the group-level distribution for each parameter for such a normalization.

Unfortunately, to the best of my knowledge, brms does not contain the ability to specify a parameterization and prior distribution in line with Jeffreys's default Bayes factor approach. And as far as I remember from a discussion I had on this topic with Paul Bürkner some time ago, it is also unlikely brms will ever get this ability. Consequently, I feel that brms is not the right tool for model selection using Bayes factors. Whereas it now offers this ability from a technical side (using our bridgesampling package), it only allows models with an unnormalized parameterization. I believe that such a parameterization is in most cases not appropriate for Bayes factor based model selection as the priors cannot be specified in a 'default' manner. Thus, I cannot recommend brms for Bayes factor based model selection at the moment. In sum, the reason for basing our inferences solely on posterior distributions in the present case is practical constraints and not philosophical considerations.

One final word of caution for the psychological readership. Whereas Bayes factors are clearly extremely popular in psychology, this is not the case in many other scientific disciplines. For example, the patron saint of applied Bayesian statistics, Andrew Gelman, is a self-declared opponent of Bayes factors: "I generally hate Bayes factors myself". As far as I can see, this disagreement comes from the different types of data different people work with. When working with observational (or correlational) data, as Andrew Gelman usually does, tests of the presence of effects (or of nullity) are either a big no-no (e.g., when wanting to do causal inference) or simply not interesting. We know that the real world is full of relationships, especially small ones, between arbitrary things. So getting effects simply by increasing N is just not interesting and estimation is the more interesting approach. In contrast, for experimental data, we often have true null hypotheses and testing them makes a lot of sense. For example, if Bem was right and there truly were PSI, we could surely exploit this somehow. But as far as we can tell, the effect is truly null. In this case we really need hypothesis testing.

Getting Started

We start with loading some packages for analyzing the posterior. Since the beginning of this series, I have more and more become a fan of the whole tidyverse, so we import it completely. We of course also need brms. As shown below, we will need a few more packages (especially emmeans and tidybayes), but these are only loaded when needed.

library("brms")
library("tidyverse")
theme_set(theme_classic()) # theme for ggplot2
options(digits = 3)

Then we also need the posterior samples, which we can load in the same way as before from my GitHub page. Note that we need neither the data nor the posterior predictive distribution this time.

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda",
              file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))

We begin with looking at the group-level posteriors. An overview of their posterior distributions can be obtained using the summary function.

summary(fit_wiener)
# (output abbreviated to the population-level effects)
#                                    Estimate Est.Error l-95% CI u-95% CI
# conditionaccuracy:frequencyhigh      -2.944    0.1971   -3.345   -2.562
# conditionspeed:frequencyhigh         -2.716    0.2135   -3.125   -2.299
# conditionaccuracy:frequencynw_high    2.238    0.1429    1.965    2.511
# conditionspeed:frequencynw_high       1.989    0.1785    1.626    2.332
# bs_conditionaccuracy                  1.898    0.1448    1.610    2.186
# bs_conditionspeed                     1.357    0.0813    1.200    1.525
# ndt_conditionaccuracy                 0.323    0.0173    0.289    0.358
# ndt_conditionspeed                    0.262    0.0154    0.232    0.293
# bias_conditionaccuracy                0.471    0.0107    0.449    0.491
# bias_conditionspeed                   0.499    0.0127    0.474    0.524
# Warning message:
# There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
# See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

As a reminder, we have data from a lexical decision task (i.e., participants have to decide whether presented strings are a word or not) and frequency is the factor determining the true status of a string, with high referring to words and nw_high to non-words. Consequently, for the drift rate (the first four rows in the results table) the frequency factor determines the sign of the parameter estimates with the drift rate for words (rows 1 and 2) being clearly negative (i.e., those trials mostly hit the lower boundary for the word decision) and the drift rate for non-words (rows 3 and 4) being clearly positive (i.e., those trials mostly hit the upper boundary for non-word decisions). Furthermore, there could be differences between the drift rates in the accuracy or speed conditions. Specifically, in the speed conditions drift rates seem to be less extreme (i.e., nearer to 0) compared to the accuracy conditions.

The other three parameters only differ between the levels of the condition factor. Given the experimental manipulation of accuracy versus speed conditions, we expect differences for the boundary separation, parameters starting with bs_. For the non-decision time, parameters starting with ndt_, there also appears to be a small effect as the 95% intervals overlap only slightly. However, as discussed in detail below, we should be careful in interpreting this difference. Finally, for the bias, parameters starting with bias_, there might be a difference or not. Furthermore, at least in the accuracy condition there appears to be a bias for "word" responses.

One way to test differences between conditions is using the hypothesis function in brms. However, I was not able to get it to work with the current model. I suspect the reason for this is the somewhat unconventional parameterization where each cell gets one parameter (in some sense each cell has its own intercept, but there is no overall intercept). This contrasts with a more "standard" parameterization in which there is one intercept (for either the unweighted means or one of the cells) and the remaining parameters capture the differences between the intercept and the cell means. As a reminder, I chose this unconventional parameterization in the first place to make the specification of the parameter priors easier. Additionally, this is a common parameterization when programming cognitive models by hand.

emmeans and tidybayes: Differences in the Drift Rate

An alternative is to use the great emmeans package by Russell Lenth. I am a huge fan of emmeans and use it all the time when using "normal" statistical models (e.g., ANOVAs, mixed models), independent of whether I use frequentist methods (e.g., via afex) or Bayesian methods (e.g., rstanarm or brms). Unfortunately, it appears as if emmeans at the moment only allows an analysis of the main parameter of the response distribution for models estimated with brms, which in our case is the drift rate. If someone were to extend emmeans to allow using brms models with all parameters, I would be very happy and thankful. In any case, I highly recommend checking out the emmeans vignettes to get an overview of what types of follow-up tests are possible with this great package.

As I recently learned, emmeans works quite nicely together with tidybayes, a package that enables working with posterior draws within the tidyverse. tidybayes has a surprisingly large package footprint (i.e., it imports quite a lot of other packages) for a package with comparatively little functionality. I guess this is a consequence of being embedded within the tidyverse. In any case, many of the imported packages are already in the search path thanks to loading the tidyverse above, so attaching should not take that long here.

library("emmeans")
library("tidybayes")

We begin with emmeans only to assure ourselves that it works as expected. For this, we get the estimated marginal means plus 95% highest posterior density (HPD) intervals; the estimates of the central tendency (which is the median of the posterior samples in both cases) match the fixed-effects output above. As a reminder, the fact that the cell estimates match the parameter estimates is of course a consequence of the unusual parameterization, which is picked up correctly by emmeans. The lower and upper bounds of the intervals differ slightly between the summary output from brms and emmeans, a consequence of using different ways of calculating the intervals (i.e., quantiles versus HPD intervals).

fit_wiener %>%
  emmeans( ~ condition*frequency) 
#  condition frequency emmean lower.HPD upper.HPD
#  accuracy  high       -2.94     -3.34     -2.56
#  speed     high       -2.72     -3.10     -2.28
#  accuracy  nw_high     2.24      1.96      2.50
#  speed     nw_high     1.99      1.64      2.34
# 
# HPD interval probability: 0.95

Using HPD Intervals And Histograms

As a first test, we are interested in assessing whether there is evidence for a difference between speed and accuracy conditions for both words (i.e., frequency = high) and non-words (i.e., frequency = nw_high). There are many ways to do this with emmeans; one of them is via the by argument and the pairs function.

fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs
# frequency = high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed   -0.225   -0.6964     0.256
# 
# frequency = nw_high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed    0.249   -0.0647     0.550
# 
# HPD interval probability: 0.95

Here, we do not have a lot of evidence that there is a difference for either stimulus type, as both HPD intervals include 0.

Instead of getting the summary of the distribution via emmeans, we can also use the capabilities of tidybayes and extract the samples in a tidy way. Then we use one of the convenient aggregation functions coming with tidybayes and aggregate the samples based on the same conditioning variable. After trying a few different options, I have the feeling that emmeans' hpd.summary() function uses the same approach for calculating HPD intervals as tidybayes, as both results match.

samp1 <- fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs %>% 
  gather_emmeans_draws()
samp1 %>% 
  median_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.225 -0.696   0.256   0.95 median hdi      
# 2 accuracy - speed nw_high    0.249 -0.0647  0.550   0.95 median hdi

Instead of the median, we can also use the mode as our point estimate. In the present case the differences between both are not large but noticeable for the word stimuli.

samp1 %>% 
  mode_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.190 -0.696   0.256   0.95 mode   hdi      
# 2 accuracy - speed nw_high    0.252 -0.0647  0.550   0.95 mode   hdi

Further, we might use a different way of calculating HPD intervals. I have the feeling that Rob Hyndman's hdrcde package provides the most elaborate set of functions for estimating highest density intervals. Consequently, this is what we use next. Note that the package needs to be installed for that.

To use it in a tidy way, we write a short function returning a data.frame in a list. Thus, when called within summarise we get a list-column. Consequently, we have to call unnest to get a nice output.

get_hdi <- function(x, level = 95) {
  tmp <- hdrcde::hdr(x, prob = level)
  list(data.frame(mode = tmp$mode[1], lower = tmp$hdr[1,1], upper = tmp$hdr[1,2]))
}
samp1 %>% 
  summarise(hdi = get_hdi(.value)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower upper
#   <fct>            <fct>      <dbl>   <dbl> <dbl>
# 1 accuracy - speed high      -0.227 -0.712  0.247
# 2 accuracy - speed nw_high    0.249 -0.0616 0.558

The results differ again slightly, but not too much. Perhaps more importantly, there is still no real evidence for a difference in the drift rate between conditions. Even when looking only at 80% HPD intervals there is only evidence for a difference for the non-word stimuli.

samp1 %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower  upper
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>
# 1 accuracy - speed high      -0.212 -0.540  0.0768
# 2 accuracy - speed nw_high    0.246  0.0547 0.442

Because we have the samples in a convenient form, we could now evaluate whether there is any evidence for a drift rate difference between conditions for both word and non-word stimuli combined. One problem for this is, however, that the direction of the effect differs between words and non-words. This is a consequence of the fact that word stimuli require a response at the lower decision boundary and non-words a response at the upper boundary. Consequently, we need to multiply the effect by -1 for one of the conditions. After that, we can take the mean of both conditions. We do this via tidyverse magic and also add the number of values that are aggregated in this way to the table. This is just a precaution to make sure that our logic is correct and we always aggregate exactly two values. As the final check shows, this is the case.

samp2 <- samp1 %>% 
  mutate(val2 = if_else(frequency == "high", -1*.value, .value)) %>% 
  group_by(contrast, .draw) %>% 
  summarise(value = mean(val2),
            n = n())
all(samp2$n == 2)
# [1] TRUE

We can then investigate the resulting difference distribution. One way to do so is in a graphical manner via a histogram. As recommended by Hadley Wickham, it makes sense to play around with the number of bins a bit until the figure looks good. Given we have quite a large number of samples, 75 bins seemed good to me. With fewer bins there was not enough granularity, with more bins I felt there were too many small peaks.

ggplot(samp2, aes(value)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

This shows that, whereas quite a bit of the posterior mass is to the right of 0, a non-negligible part is still to the left. So there is some evidence for a difference, but it is still not very strong, even when looking at words and non-words together.

We can also investigate this difference distribution via the HPD intervals. To get a better overview we now look at several interval sizes:

hdrcde::hdr(samp2$value, prob = c(99, 95, 90, 80, 85, 50))
# $`hdr`
#        [,1]  [,2]
# 99% -0.1825 0.669
# 95% -0.0669 0.554
# 90% -0.0209 0.505
# 85%  0.0104 0.471
# 80%  0.0333 0.445
# 50%  0.1214 0.340
# 
# $mode
# [1] 0.225
# 
# $falpha
#    1%    5%   10%   15%   20%   50% 
# 0.116 0.476 0.757 0.984 1.161 1.857 

This shows that only for the 85% interval and smaller intervals is 0 excluded. Note, you can use hdrcde::hdr.den instead of hdrcde::hdr to get a graphical overview of the output.
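
For example, a one-line sketch of that graphical call (the prob values are chosen arbitrarily here):

hdrcde::hdr.den(samp2$value, prob = c(95, 80, 50))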

Using Bayesian p-values

An approach that requires less arbitrary cutoffs than HPDs (for which we have to define the width) is to calculate the actual proportion of samples below 0:

mean(samp2$value < 0)
# [1] 0.0665

As explained above, if this proportion were small, this would constitute evidence for a difference. Here, the proportion of samples below 0 is .067. Unfortunately, .067 is a bit above the magical cutoff of .05, which is universally accepted as delineating small from big numbers, or perhaps more appropriately, likely from unlikely probabilities.

Let us look at such a proportion a bit more in depth. If two posterior distributions lie exactly on top of each other, the resulting difference distribution is centered on 0 and exactly 50% of the difference distribution is on either side of 0. Thus, a proportion of 50% corresponds to the least evidence for a difference, or alternatively, to the strongest evidence for an absence of a difference. One further consequence is that both values near 0 and values near 1 are indicative of a difference, albeit in different directions. To make interpretation of these proportions easier, I suggest always calculating them in such a way that small values represent evidence for a difference (e.g., by subtracting the proportion from 1 if it is above .5).

But what does this proportion tell us exactly? It represents the probability that there is a difference in a specific direction. Thus, it represents one-sided evidence for a difference. In contrast, for a 95% HPD we remove 2.5% from each side of the difference distribution. To ensure this proportion has the same two-sided property as our HPD intervals, we need to multiply it by 2. A further benefit of this multiplication is that it stretches the range to the whole probability scale (i.e., from 0 to 1).

Thus, the resulting value is a probability (i.e., ranging from 0 to 1), with values near zero denoting evidence for a difference and values near one providing some evidence against a difference. Thus, in contrast to a classical p-value it is a continuous measure of evidence for (when near 0) or against (when near 1) a difference between the parameter estimates. Given its superficial similarity with classical p-values (i.e., low values are seen as evidence for a difference), we could call it a version of a Bayesian p-value, or pB for short. In the present case we could say: the pB value for a difference between speed and accuracy conditions in drift rate across word and non-word stimuli is .13, indicating that the evidence for a difference is at best weak.
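
As a small sketch, the two-sided pB described above can be wrapped in a helper function (p_b is a hypothetical helper name of mine; samp2 is the difference distribution created earlier, and the resulting value matches the proportion of .0665 reported above):

p_b <- function(x) 2 * min(mean(x < 0), mean(x > 0))
p_b(samp2$value)
# [1] 0.133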

Bayesian p-values of course allow us to misuse them in the same way that we can misuse classical p-values, for example, by introducing arbitrary cutoff values such as .05. Imagine for a second that we are interested in testing whether there are differences in the absolute amount of evidence as measured via drift rate for any of the four cells of the design (I am not suggesting that this is particularly sensible). For this, we would have to transform the posteriors for all drift rates onto the same side (note, we do not want to take the absolute values as we still want to retain the information about switching from positive to negative drift rates or the other way around), for example, by multiplying the drift rate for words by -1. We do so and then inspect the cell means.

samp3 <- fit_wiener %>%
  emmeans( ~ condition*frequency) %>% 
  gather_emmeans_draws() %>% 
  mutate(.value = if_else(frequency == "high", -1 * .value, .value),
         intera = paste(condition, frequency, sep = ".")) 
samp3 %>% 
  mode_hdi(.value)
# # A tibble: 4 x 8
# # Groups:   condition [2]
#   condition frequency .value .lower .upper .width .point .interval
#   <fct>     <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy  high        2.97   2.56   3.34   0.95 mode   hdi      
# 2 accuracy  nw_high     2.25   1.96   2.50   0.95 mode   hdi      
# 3 speed     high        2.76   2.28   3.10   0.95 mode   hdi      
# 4 speed     nw_high     2.00   1.64   2.34   0.95 mode   hdi

Inspection of the four cell means suggests that the drift rate values for words are larger than the values for non-words.

To get an overview of all pairwise differences using an arbitrary cut-off value, I have written two functions that return a compact letter display of all pairwise comparisons. The functions require the data in the wide format, with each column representing the draws for one parameter. Note that the compact letter display is calculated via another package, multcompView, which needs to be installed before using these functions.

get_p_matrix <- function(df, only_low = TRUE) {
  # pre-define matrix
  out <- matrix(-1, nrow = ncol(df), ncol = ncol(df), dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(ncol(df))) {
    for (j in seq_len(ncol(df))) {
      out[i, j] <- mean(df[,i] < df[,j]) 
    }
  }
  if (only_low) out[out > .5] <- 1- out[out > .5]
  out
}

cld_pmatrix <- function(model, pars, level = 0.05) {
  p_matrix <- get_p_matrix(model)
  lp_matrix <- (p_matrix < (level/2) | p_matrix > (1-(level/2)))
  cld <- multcompView::multcompLetters(lp_matrix)$Letters
  cld
}
samp3 %>% 
  ungroup() %>% ## to get rid of unneeded columns
  select(.value, intera, .draw) %>% 
  spread(intera, .value) %>% 
  select(-.draw) %>% ## we need to get rid of all columns not containing draws
  cld_pmatrix()
# accuracy.high accuracy.nw_high       speed.high    speed.nw_high 
#           "a"              "b"              "a"              "b"

In a compact letter display, conditions that share a common letter do not differ according to the criterion; conditions that do not share a common letter do differ according to the criterion. Here, the compact letter display is not super informative and just recovers what we have seen above: the drift rates for the words form one group and the drift rates for the non-words form another group. In cases with more conditions or more complicated difference patterns, compact letter displays can be quite informative.

We could have also used the functionality of tidybayes to inspect all pairwise comparisons. Note that it is important to use ungroup before invoking the compare_levels function. Otherwise we get an error that is difficult to understand (the grouping appears to be a consequence of using emmeans).

samp3 %>% 
  ungroup %>% 
  compare_levels(.value, by = intera) %>% 
  mode_hdi()
# # A tibble: 6 x 7
#   intera                           .value  .lower  .upper .width .point .interval
#   <fct>                             <dbl>   <dbl>   <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy.nw_high - accuracy.high -0.715 -1.09   -0.351    0.95 mode   hdi      
# 2 speed.high - accuracy.high       -0.190 -0.696   0.256    0.95 mode   hdi      
# 3 speed.nw_high - accuracy.high    -0.946 -1.46   -0.526    0.95 mode   hdi      
# 4 speed.high - accuracy.nw_high     0.488  0.0879  0.876    0.95 mode   hdi      
# 5 speed.nw_high - accuracy.nw_high -0.252 -0.550   0.0647   0.95 mode   hdi      
# 6 speed.nw_high - speed.high       -0.741 -1.12   -0.309    0.95 mode   hdi

Differences in Other Parameters

As discussed above, to look at the differences in the other parameters we apparently cannot use emmeans anymore. Luckily, tidybayes still offers the possibility to extract the posterior samples in a tidy way using either gather_draws or spread_draws. It appears that for either of those you need to pass the specific variable names you want to extract. We get them via get_variables:

get_variables(fit_wiener)[1:10]
# [1] "b_conditionaccuracy:frequencyhigh"    "b_conditionspeed:frequencyhigh"      
# [3] "b_conditionaccuracy:frequencynw_high" "b_conditionspeed:frequencynw_high"   
# [5] "b_bs_conditionaccuracy"               "b_bs_conditionspeed"                 
# [7] "b_ndt_conditionaccuracy"              "b_ndt_conditionspeed"                
# [9] "b_bias_conditionaccuracy"             "b_bias_conditionspeed"

Boundary Separation

We will use spread_draws to analyze the boundary separation. First we extract the draws and then immediately calculate the difference distribution between both.

samp_bs <- fit_wiener %>%
  spread_draws(b_bs_conditionaccuracy, b_bs_conditionspeed) %>% 
  mutate(bs_diff = b_bs_conditionaccuracy - b_bs_conditionspeed)
samp_bs
# # A tibble: 2,000 x 6
#    .chain .iteration .draw b_bs_conditionaccuracy b_bs_conditionspeed bs_diff
#     <int>      <int> <int>                  <dbl>               <dbl>   <dbl>
#  1      1          1     1                   1.73                1.48   0.250
#  2      1          2     2                   1.82                1.41   0.411
#  3      1          3     3                   1.80                1.28   0.514
#  4      1          4     4                   1.85                1.42   0.424
#  5      1          5     5                   1.86                1.37   0.493
#  6      1          6     6                   1.81                1.36   0.450
#  7      1          7     7                   1.67                1.34   0.322
#  8      1          8     8                   1.90                1.47   0.424
#  9      1          9     9                   1.99                1.20   0.790
# 10      1         10    10                   1.76                1.19   0.569
# # ... with 1,990 more rows

Now we can of course use the same tools as above. For example, look at the histogram. Here, I again chose 75 bins.

samp_bs %>% 
  ggplot(aes(bs_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

The histogram reveals pretty convincing evidence for a difference. It appears as if only two samples are below 0. We confirm this suspicion and then calculate the Bayesian p-value. As it turns out, it is also extremely small.

sum(samp_bs$bs_diff < 0)
# [1] 2
mean(samp_bs$bs_diff < 0) *2
# [1] 0.002

All in all we can be pretty confident that manipulating speed versus accuracy conditions affects the boundary separation in the current data set. Exactly as expected.

Non-Decision Time

For assessing differences in the non-decision time, we use gather_draws. One benefit of this function compared to spread_draws is that it makes it easy to obtain the marginal estimates. As already said above, the HPD intervals overlap only very little, suggesting that there is a difference between the conditions. We save the resulting marginal estimates for later in a new data.frame called ndt_mean.

samp_ndt <- fit_wiener %>%
  gather_draws(b_ndt_conditionaccuracy, b_ndt_conditionspeed) 
(ndt_mean <- samp_ndt %>% 
  median_hdi())
# # A tibble: 2 x 7
#   .variable               .value .lower .upper .width .point .interval
#   <chr>                    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 b_ndt_conditionaccuracy  0.323  0.293  0.362   0.95 median hdi      
# 2 b_ndt_conditionspeed     0.262  0.235  0.295   0.95 median hdi

To evaluate the difference, the easiest approach seems to me again to spread the two variables across columns and then calculate the difference (i.e., similar to starting with spread_draws in the first place). We can then again plot the resulting difference distribution.

samp_ndt2 <- samp_ndt %>% 
  spread(.variable, .value) %>% 
  mutate(ndt_diff = b_ndt_conditionaccuracy - b_ndt_conditionspeed)  

samp_ndt2 %>% 
  ggplot(aes(ndt_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

As previously speculated, there appears to be strong evidence for a difference. We can further confirm this via the Bayesian p-value:

mean(samp_ndt2$ndt_diff < 0) * 2
# [1] 0.005

So far this looks as if we found another clear difference in parameter estimates due to the manipulation. But this conclusion would be premature. In fact, investigating the non-decision time from the 4-parameter Wiener model estimated in this way is completely misleading. Instead of capturing a meaningful feature of the response time distribution, the non-decision time parameter is only sensitive to very few data points. Specifically, the non-decision time basically only reflects a specific feature of the distribution of minimum response times per participant and per condition or cell for which it is estimated. I will demonstrate this in the following for our example data.

We first need to load the data in the same manner as in the previous posts. We then calculate the minimum RTs per participant and condition.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
min_val <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = min(rt))

To investigate the problem, we want to graphically compare the distribution of minimum RTs with the estimates of the non-decision time. For this, we need to add a condition column with matching condition names to the ndt_mean data.frame created above. Then, we can plot both into the same plot. We also add several summary statistics regarding the distribution of individual minimum RTs. Specifically, the black points show the individual minimum RTs for each of the two conditions; the blue + shows the median and the blue x the mean of the individual minimum RTs; the blue circle shows the midpoint between the largest and smallest value of the minimum RT distributions; the red square shows the point estimate of the non-decision time parameter with corresponding 95% HPD intervals.

ndt_mean$condition <- c("accuracy", "speed")

ggplot(min_val, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  geom_pointrange(data = ndt_mean, 
                  aes(y = .value, ymin = .lower, ymax = .upper), 
                  shape = 15, size = 1, color = "red") +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point")

What this graph rather impressively shows is that the estimate of the non-decision time almost perfectly matches the midpoint between largest and smallest minimum RT (i.e., the blue dot). Let us put this in perspective by comparing the number of minimum data points (i.e., the number of participants) to the number of total data points.

speed_acc %>% 
  group_by(condition) %>% 
  summarise(n())
# # A tibble: 2 x 2
#   condition `n()`
#   <fct>     <int>
# 1 accuracy   5221
# 2 speed      5241

length(unique(speed_acc$id))
# [1] 17

17 / 5000
# [1] 0.0034

This shows that the non-decision time parameter, one of only four model parameters, is essentially completely determined by less than .5% of the data. If any of these minimum RTs is an outlier (which at least in the accuracy condition seems likely) a single response time can have an immense influence on the parameter estimate. In other words, it can hardly be assumed that with the current implementation the non-decision time parameter reflects an actual latent process. Instead, it simply reflects the midpoint between smallest and largest minimum RT per participant and condition, slightly weighted toward the mass of the distribution of minimum RTs. This parameter estimate should not be used to draw substantive conclusions.

In the present case, this confound does not appear to be too consequential. If only one of the data points in the accuracy condition is an outlier and the other data points are faithful representatives of the leading edge of the response time distribution (which is essentially what the non-decision time is supposed to capture), the current parameter estimates underestimate the true difference. Using a more robust ad-hoc measure of the leading edge, specifically the 10% trimmed mean of the 40 fastest RTs per participant and condition plotted below, further supports this conclusion. This graph also does not contain any clear outliers anymore. For reference, the non-decision time estimates are still included. Nevertheless, having a parameter be essentially driven by very few data points seems completely at odds with the general idea of cognitive modeling and the interpretation of non-decision times obtained with such a model cannot be recommended.

min_val2 <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = mean(sort(rt)[1:40], trim = 0.1))

ggplot(min_val2, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point") +
  geom_point(data = ndt_mean, aes(y = .value), shape = 15, 
             size = 2, color = "red")

It is important to note that this confound does not hold for all implementations of the diffusion model, but is specific to the 4-parameter Wiener model as implemented here. There are solutions for avoiding this problem, two of which I want to list here. First, one could add across-trial variability in the non-decision time. This variability is often assumed to come from a uniform distribution, which can capture outliers at the leading edge of the response time distribution. Second, instead of only fitting a diffusion model one could assume that some of the responses are contaminants coming from a different process, for example random responses from a uniform distribution ranging from the absolute minimum to maximum RT. Technically, this would constitute a mixture model between the diffusion process and a uniform distribution with either a free or fixed mixture/contamination rate (e.g., ). It should be relatively easy to implement such a mixture model via a custom_family in brms and I hope to find the time to do that at some later point.
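
To make the idea concrete, here is a minimal sketch of such a contaminant-mixture density (not the brms custom_family itself, just the likelihood it would need to encode). It assumes the RWiener package for the Wiener density, the parameter names follow that package, the contaminant is assumed to pick either response with probability .5, and all numbers in the final call are placeholders:

mix_wiener_density <- function(rt, resp, alpha, tau, beta, delta,
                               p_cont, rt_min, rt_max) {
  # Wiener density of this RT/response combination
  d_wiener <- RWiener::dwiener(rt, alpha = alpha, tau = tau, beta = beta,
                               delta = delta, resp = resp)
  # contaminant: uniform over the observed RT range, either response with prob. .5
  d_cont <- dunif(rt, min = rt_min, max = rt_max) * 0.5
  (1 - p_cont) * d_wiener + p_cont * d_cont
}

# e.g., a 0.6 s response at the upper boundary with a 5% contamination rate
mix_wiener_density(0.6, "upper", alpha = 1.5, tau = 0.3, beta = 0.5,
                   delta = 2, p_cont = 0.05, rt_min = 0.25, rt_max = 2.5)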

I am of course not the first one to discover this behavior of the 4-parameter Wiener model (see e.g., ). However, this problem seems especially prevalent in a Bayesian setting as the 4-parameter model variant is readily available and model variants appropriately dealing with this problem are not. Some time ago I asked Chris Donkin and Greg Cox what they thought would be the best way to address this issue and the one thing I remember from this discussion was Chris’ remark that, when he uses the 4-parameter Wiener model, he simply ignores the non-decision time parameter. That still seems like the best course of action to me.

I hope there are not too many papers out there that use the 4-parameter model in such a way and interpret differences in the non-decision time parameter. If you know of one, I would be interested to learn about it. Either write me a mail or post it in the comments below.

Starting Point / Bias

Finally, we can take a look at the starting point or bias. We do this again using spread_draws and then plot the resulting difference distribution.

samp_bias <- fit_wiener %>%
  spread_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  mutate(bias_diff = b_bias_conditionaccuracy - b_bias_conditionspeed)
samp_bias %>% 
  ggplot(aes(bias_diff)) +
  geom_histogram(bins = 100) +
  geom_vline(xintercept = 0)

The difference distribution suggests there might be a difference. Consequently, we calculate the Bayesian p-value next. Note that we calculate the difference in the other direction this time so that evidence for a difference is represented by small values.

mean(samp_bias$bias_diff > 0) *2
# [1] 0.046

We get lucky and our Bayesian p-value is just below .05, encouraging us to believe that the difference is real. To round this up, we again take a look at the estimates:

fit_wiener %>%
  gather_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 4
#   .variable                 mode lower upper
#   <chr>                    <dbl> <dbl> <dbl>
# 1 b_bias_conditionaccuracy 0.470 0.457 0.484
# 2 b_bias_conditionspeed    0.498 0.484 0.516

Together with the evidence for a difference we can now postulate in a more confident manner that for the accuracy condition there is a bias toward the lower boundary and the “word” responses, whereas evidence accumulation starts unbiased in the speed condition.

Closing Words

This third part wraps up the most important steps in a diffusion model analysis with brms. Part I shows how to set up the model, Part II shows how to evaluate the adequacy of the model, and the present Part III shows how to inspect the parameters and test hypotheses about them.

As I have mentioned quite a bit throughout these parts, the model used here is not the full diffusion model, but the 4-parameter Wiener model. Whereas this makes estimation possible in the first place, it comes with a few problems. One of them was discussed at length in the present part: the estimate of the non-decision time parameter essentially captures a feature of the distribution of minimum RTs. If these are contaminated by responses that cannot be assumed to come from the same process as the other responses (which I believe a priori to be quite likely), the estimate becomes rather meaningless. My takeaway from this is that I would not interpret these estimates at all. I feel that the dangers outweigh the benefits by far.

Another feature of the 4-parameter Wiener model is that, in the absence of a bias for any of the response options, it predicts equal mean response times for correct and error responses. This is perhaps the main theoretical constraint which has led to the development of many of the more highly parameterized model variants, such as the full (i.e., 7-parameter) diffusion model. An overview of this issue can, for example, be found in . They write (p. 335):

Depending on the experimental manipulation, RTs for errors are sometimes shorter than RTs for correct responses, sometimes longer, and sometimes there is a crossover in which errors are slower than correct responses when accuracy is low and faster than correct responses when accuracy is high. The models must be capable of capturing all these aspects of a data set.

For the present data we find a specific pattern that is often seen as typical. As shown below, error RTs are quite a bit slower than correct RTs in the accuracy condition. This effect cannot be found in the speed condition where, if anything, error RTs are faster than correct RTs.

speed_acc %>% 
  mutate(correct = stim_cat == response) %>% 
  group_by(condition, correct, id) %>% 
  summarise(mean = mean(rt), 
            se = mean(rt)/sqrt(n())) %>% 
  summarise(mean = mean(mean),
            se = mean(se))
# # A tibble: 4 x 4
# # Groups:   condition [?]
#   condition correct  mean     se
#   <fct>     <lgl>   <dbl>  <dbl>
# 1 accuracy  FALSE   0.751 0.339 
# 2 accuracy  TRUE    0.693 0.0409
# 3 speed     FALSE   0.491 0.103 
# 4 speed     TRUE    0.513 0.0314

Given this difference in the relative speeds of correct and error responses in the accuracy condition, it may seem unsurprising that the accuracy condition is also the one in which we have a measurable bias, specifically a bias towards the word responses. However, as can be seen by adding stim_cat into the group_by call above (see the sketch below), the difference in the relative error rate is particularly strong for non-words, where a bias toward words should lead to faster errors. Thus, it appears that some of the more subtle effects in the data are not fully accounted for in the current model variant.
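
For reference, this is a sketch of the call meant above, with stim_cat added to the grouping (output omitted; everything else follows the code block above):

speed_acc %>% 
  mutate(correct = stim_cat == response) %>% 
  group_by(condition, stim_cat, correct, id) %>% 
  summarise(mean = mean(rt)) %>% 
  summarise(mean = mean(mean))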

The canonical way of dealing with differences in the relative speed of errors in diffusion modeling is via across-trial variabilities in the model parameters (see ). Variability in the starting point (introduced by Laming, 1968) allows error RTs to be faster than correct RTs. Variability in the drift rate (introduced by ) allows error RTs to be slower than correct RTs. (As discussed above, variability in the non-decision time allows its parameter estimates to be less influenced by contaminants or individual outliers.) However, as described below, introducing these variabilities in a Bayesian framework comes with its own problems. Furthermore, there is a recent discussion of the value of these variabilities from a measurement standpoint.

Possible Future Extensions

Whereas this series comes to an end here, there are a few further things that seem either important, interesting, or viable. Maybe I will have some time in the future to talk about these as well, but I suggest not expecting them anytime soon.

  • One important thing we have not yet looked at is the estimates of the group-level parameters (i.e., standard deviations and correlations). They may contain important information about the specific data set and research question, but also about the tradeoffs of the model parameters.

  • Replacing the pure Wiener process with a mixture between a Wiener and a uniform distribution to be able to interpret the non-decision time. As written above, this should be doable with a custom_family in brms.

  • As described above, one of the driving forces for modern response time models, such as the 7-parameter diffusion model, were differences in the relative speed of error and correct RTs. These are usually explained via variabilities in the model parameters. One relatively straightforward way to implement these variabilities in a Bayesian setting would be via the hierarchical structure. For example, each participant gets a by-trial random intercept for the drift rate, + (0+id||trial) (the double-bar notation should ensure that these are uncorrelated across participants); a sketch of such a formula is given below the list. Whereas this sounds conceptually simple, I doubt such a model will converge in a reasonable timeframe. Furthermore, as shown by , a model in which the shape of the variability distribution is essentially unconstrained (as is the case when only constraining it via the prior as suggested here) is not testable. The model becomes unfalsifiable as it can predict any data pattern. Given its theoretical importance, this nevertheless seems an important angle to explore.

  • Fitting the Wiener model takes quite a lot of time. It would be interesting to compare the fit using full Bayesian inference (i.e., sampling as done here) with variational Bayes (i.e., parametric approximation of the posterior), which is also implemented in Stan. I expect that it does not work that well, but the comparison would still be interesting. Recently, diagnostics for variational Bayes were introduced.

  • The diffusion model is of course only one model for response time data. A popular alternative is the LBA. I know there are some implementations in Stan out there, so if they could be accessed via brms, this would be quite interesting.
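
To make the by-trial variability idea from above a bit more concrete, here is a rough, untested sketch of how the drift-rate formula from part I could be extended. Note that the data set contains no trial identifier, so the trial column and the formula name used here are hypothetical additions for illustration only.

## hypothetical sketch, not run: by-trial drift-rate variability
speed_acc$trial <- factor(seq_len(nrow(speed_acc)))  # unique trial identifier
formula_trial_var <- bf(
  rt | dec(response2) ~ 0 + condition:frequency +
    (0 + condition:frequency|p|id) +
    (0 + id || trial),  # by-trial deviations, uncorrelated across participants
  bs ~ 0 + condition + (0 + condition|p|id),
  ndt ~ 0 + condition + (0 + condition|p|id),
  bias ~ 0 + condition + (0 + condition|p|id))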

The RMarkdown file for this post is available here.

Diffusion/Wiener Model Analysis with brms – Part II: Model Diagnostics and Model Fit

This is the considerably belated second part of my blog series on fitting diffusion models (or better, the 4-parameter Wiener model) with brms. The first part discusses how to set up the data and model. This second part is concerned with perhaps the most important steps in each model-based data analysis: model diagnostics and the assessment of model fit. Note, the code in this part is completely self-sufficient and can be run without running the code of part I.

Setup

At first, we load quite a few packages that we will need down the way. Obviously brms, but also some of the packages from the tidyverse (i.e., dplyr, tidyr, tibble, and ggplot2). It took me a little time to jump on the tidyverse bandwagon, but now that I use it more and more I cannot deny its utility. If your data can be made ‘tidy’, the coherent set of tools offered by the tidyverse make many seemingly complicated tasks pretty easy. A few examples of this will be shown below. If you need more introduction, I highly recommend the awesome ‘R for Data Science’ book by Grolemund and Wickham, which they made available for free! We also need gridExtra for combining plots and DescTools for the concordance correlation coefficient CCC used below.

library("brms")
library("dplyr")
library("tidyr")
library("tibble")    # for rownames_to_column
library("ggplot2")
library("gridExtra") # for grid.arrange
library("DescTools") # for CCC

As in part I, we need package rtdists for the data.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1

I have uploaded the binary R data file containing the fitted model object as well as the generated posterior predictive distributions to github, from which we can download them directly into R. Note that I needed to go via a temporary folder. If there is a way without that I would be happy to learn about it.
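
One alternative that should avoid the temporary folder (untested here, so treat it as a suggestion rather than part of the original workflow) is to pass a URL connection directly to load():

# possibly simpler alternative: load the .rda files straight from the URLs
load(url("https://singmann.github.io/files/brms_wiener_example_fit.rda"))
load(url("https://singmann.github.io/files/brms_wiener_example_predictions.rda"))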

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda", 
              file.path(tmp, "brms_wiener_example_fit.rda"))
download.file("https://singmann.github.io/files/brms_wiener_example_predictions.rda", 
              file.path(tmp, "brms_wiener_example_predictions.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_predictions.rda"))

Model Diagnostics

We already know from part I that there are a few divergent transitions. If this were a real analysis we therefore would not be satisfied with the current fit and try to rerun brm with an increased adapt_delta with the hope that this removes the divergent transitions. The Stan warning guidelines clearly state that “the validity of the estimates is not guaranteed if there are post-warmup divergences”. However, it is unclear what the actual impact of the small number of divergent transitions (< 10) observed here is on the posterior. Also, it is unclear what one can do if adapt_delta cannot be increased anymore and the model also cannot be reparameterized. Should all fits with any divergent transitions be completely disregarded? I hope the Stan team provides more guidelines to such questions in the future.

Coming back to our fit, as a first step in our model diagnostics we check the R-hat statistic as well as the number of effective samples. Specifically, we look at the parameters with the highest R-hat values and the lowest numbers of effective samples.

tail(sort(rstan::summary(fit_wiener$fit)$summary[,"Rhat"]))
#                      sd_id__conditionaccuracy:frequencyhigh 
#                                                        1.00 
#                              r_id__bs[15,conditionaccuracy] 
#                                                        1.00 
#                                    b_bias_conditionaccuracy 
#                                                        1.00 
# cor_id__conditionspeed:frequencyhigh__ndt_conditionaccuracy 
#                                                        1.00 
#                                   sd_id__ndt_conditionspeed 
#                                                        1.00 
#  cor_id__conditionspeed:frequencynw_high__bs_conditionspeed 
#                                                        1.01 
head(sort(rstan::summary(fit_wiener$fit)$summary[,"n_eff"]))
#                                     lp__ 
#                                      462 
#        b_conditionaccuracy:frequencyhigh 
#                                      588 
#                sd_id__ndt_conditionspeed 
#                                      601 
#      sd_id__conditionspeed:frequencyhigh 
#                                      646 
#           b_conditionspeed:frequencyhigh 
#                                      695 
# r_id[12,conditionaccuracy:frequencyhigh] 
#                                      712

Both are unproblematic (i.e., R-hat < 1.05 and n_eff > 100) and suggest that the sampler has converged on the stationary distribution. If anyone has a similar one-liner to return the number of divergent transitions, I would be happy to learn about it.
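
For what it is worth, the following should do the trick in reasonably recent versions of rstan and brms (I have not verified it against this exact fit, so consider it a suggestion):

# number of post-warmup divergent transitions, two equivalent ways
rstan::get_num_divergent(fit_wiener$fit)
sum(subset(nuts_params(fit_wiener), Parameter == "divergent__")$Value)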

We also visually inspect the chain behavior of a few semi-randomly selected parameters.

pars <- parnames(fit_wiener)
pars_sel <- c(sample(pars[1:10], 3), sample(pars[-(1:10)], 3))
plot(fit_wiener, pars = pars_sel, N = 6, 
     ask = FALSE, exact_match = TRUE, newpage = TRUE, plot = TRUE)

This visual inspection confirms the earlier conclusion. For all parameters the posteriors look well-behaved and the chains appear to mix well.

Finally, in the literature there are some discussions about parameter trade-offs for the diffusion and related models. These trade-offs supposedly make fitting the diffusion model in a Bayesian setting particularly complicated. To investigate whether fitting the Wiener model with HMC as implemented in Stan (i.e., NUTS) also shows this pattern we take a look at the joint posterior of the fixed-effects of the main Wiener parameters for the accuracy condition. For this we use the stanfit method of the pairs function and set the condition to "divergent__". This plots the few divergent transitions above the diagonal and the remaining samples below the diagonal.

pairs(fit_wiener$fit, pars = pars[c(1, 3, 5, 7, 9)], condition = "divergent__")

This plot shows some correlations, but nothing too dramatic. HMC appears to sample quite efficiently from the Wiener model.

Next we also take a look at the correlations across all parameters (not only the fixed effects).

posterior <- as.mcmc(fit_wiener, combine_chains = TRUE)
cor_posterior <- cor(posterior)
cor_posterior[lower.tri(cor_posterior, diag = TRUE)] <- NA
cor_long <- as.data.frame(as.table(cor_posterior))
cor_long <- na.omit(cor_long)
tail(cor_long[order(abs(cor_long$Freq)),], 10)
#                              Var1                         Var2   Freq
# 43432        b_ndt_conditionspeed  r_id__ndt[1,conditionspeed] -0.980
# 45972 r_id__ndt[4,conditionspeed] r_id__ndt[11,conditionspeed]  0.982
# 46972        b_ndt_conditionspeed r_id__ndt[16,conditionspeed] -0.982
# 44612        b_ndt_conditionspeed  r_id__ndt[6,conditionspeed] -0.983
# 46264        b_ndt_conditionspeed r_id__ndt[13,conditionspeed] -0.983
# 45320        b_ndt_conditionspeed  r_id__ndt[9,conditionspeed] -0.984
# 45556        b_ndt_conditionspeed r_id__ndt[10,conditionspeed] -0.985
# 46736        b_ndt_conditionspeed r_id__ndt[15,conditionspeed] -0.985
# 44140        b_ndt_conditionspeed  r_id__ndt[4,conditionspeed] -0.990
# 45792        b_ndt_conditionspeed r_id__ndt[11,conditionspeed] -0.991

This table lists the ten largest absolute values of correlations among posteriors for all pairwise combinations of parameters. The value in column Freq, somewhat unintuitively, is the observed correlation among the posteriors of the two parameters listed in the two previous columns. To create this table I used a trick from Stack Overflow using as.table, which is responsible for labeling the column containing the correlation value Freq.

What the table shows is some extreme correlations for the individual-level deviations (the first index in the square brackets of the parameter names seems to be the participant number). Let us visualize these correlations as well.

pairs(fit_wiener$fit, pars = 
        c("b_ndt_conditionspeed", 
          "r_id__ndt[11,conditionspeed]",
          "r_id__ndt[4,conditionspeed]"), 
      condition = "divergent__")

This plot shows that some of the individual-level parameters are not well estimated.

However, overall these extreme correlations appear rather rarely.

hist(cor_long$Freq, breaks = 40)

Overall the model diagnostics do not show any particularly worrying behavior (with the exception of the divergent transitions). We have learned that a few of the individual-level estimates for some of the parameters are not very trustworthy. However, this does not disqualify the overall fit. The main takeaway from this is that we would need to be careful in interpreting the individual-level estimates. Thus, we assume the fit is okay and continue with the next step of the analysis.

Assessing Model Fit

We will now investigate the model fit. That is, we will investigate whether the model provides an adequate description of the observed data. We will mostly do so via graphical checks. To do so, we need to prepare the posterior predictive distribution and the data. As a first step, we combine the posterior predictive distributions with the data.

d_speed_acc <- as_tibble(cbind(speed_acc, as_tibble(t(pred_wiener))))

Then we calculate three important measures (or test statistics T()) on the individual level for each cell of the design (i.e., combination of condition and frequency factors):

  • Probability of giving an upper boundary response (i.e., respond “nonword”).
  • Median RTs for responses to the upper boundary.
  • Median RTs for responses to the lower boundary.

We first calculate this for each sample of the posterior predictive distribution. We then summarize these three measures by calculating the median and some additional quantiles across the posterior predictive distribution. We calculate all of this in one step using a somewhat long combination of dplyr and tidyr magic.

d_speed_acc_agg <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(prob.upper = mean(. > 0),
                    medrt.lower = median(abs(.[. < 0]) ),
                    medrt.upper = median(.[. > 0] )
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "measure"), sep = "_") %>% 
  spread(measure, value) %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise_at(.vars = vars(prob.upper, medrt.lower, medrt.upper), 
               .funs = funs(median = median(., na.rm = TRUE),
                            llll = quantile(., probs = 0.01,na.rm = TRUE),
                            lll = quantile(., probs = 0.025,na.rm = TRUE),
                            ll = quantile(., probs = 0.1,na.rm = TRUE),
                            l = quantile(., probs = 0.25,na.rm = TRUE),
                            h = quantile(., probs = 0.75,na.rm = TRUE),
                            hh = quantile(., probs = 0.9,na.rm = TRUE),
                            hhh = quantile(., probs = 0.975,na.rm = TRUE),
                            hhhh = quantile(., probs = 0.99,na.rm = TRUE)
               ))

Next, we calculate the three measures also for the data and combine them with the results from the posterior predictive distribution in one data.frame using left_join.

speed_acc_agg <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(prob.upper = mean(response == "nonword"),
            medrt.upper = median(rt[response == "nonword"]),
            medrt.lower = median(rt[response == "word"])
  ) %>% 
  ungroup %>% 
  left_join(d_speed_acc_agg)

Aggregated Model-Fit

The first important question is whether our model can adequately describe the overall patterns in the data aggregated across participants. For this we simply aggregate the results obtained in the previous step (i.e., the summary results from the posterior predictive distribution as well as the test statistics from the data) using mean.

d_speed_acc_agg2 <- speed_acc_agg %>% 
  group_by(condition, frequency) %>% 
  summarise_if(is.numeric, mean, na.rm = TRUE) %>% 
  ungroup

We then use these summaries and plot predictions (in grey and black) as well as data (in red) for the three measures. The inner (fat) error bars show the 80% credibility intervals (CIs), the outer (thin) error bars show the 95% CIs. The black circle shows the median of the posterior predictive distributions.

new_x <- with(d_speed_acc_agg2, 
              paste(rep(levels(condition), each = 2), 
                    levels(frequency), sep = "\n"))

p1 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  ggtitle("Response Probabilities") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

p2 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.upper_lll, ymax =  medrt.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.upper_ll, ymax =  medrt.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.upper_median), shape = 1) +
  geom_point(aes(y = medrt.upper), shape = 4, col = "red") +
  ggtitle("Median RTs upper") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

p3 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.lower_lll, ymax =  medrt.lower_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.lower_ll, ymax =  medrt.lower_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.lower_median), shape = 1) +
  geom_point(aes(y = medrt.lower), shape = 4, col = "red") +
  ggtitle("Median RTs lower") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

grid.arrange(p1, p2, p3, ncol = 2)

 

Inspection of the plots shows no dramatic misfit. Overall the model appears to be able to describe the general patterns in the data. Only the response probabilities for words (i.e., frequency = high) appear to be estimated too low. The red crosses appear to be outside the 80% CIs but possibly also outside the 95% CIs.

The plots of the RTs show an interesting (but not surprising) pattern. The posterior predictive distributions for the rare responses (i.e., “word” responses for upper/non-word stimuli and “nonword” response to lower/word stimuli) are relatively wide. In contrast, the posterior predictive distributions for the common responses are relatively narrow. In each case, the observed median is inside the 80% CI and also quite near to the predicted median.

Individual-Level Fit

To investigate the pattern of predicted response probabilities further, we take a look at them on the individual level. We again plot the response probabilities in the same way as above, but separated by participant id.

ggplot(speed_acc_agg, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  facet_wrap(~id, ncol = 3) +
  ggtitle("Prediced (in grey) and observed (red) response probabilities by ID") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

This plot shows a similar pattern as the aggregated data. For none of the participants do we observe dramatic misfit. Furthermore, response probabilities to non-word stimuli appear to be predicted rather well. In contrast, response probabilities for word-stimuli are overall predicted to be lower than observed. However, this misfit does not seem to be too strong.

As a next step we look at the coverage probabilities of our three measures across individuals. That is, we calculate for each of the measures, for each of the cells of the design, and for each of the CIs (i.e., 50%, 80%, 95%, and 99%), the proportion of participants for which the observed test statistics falls into the corresponding CI.

speed_acc_agg %>% 
  mutate(prob.upper_99 = (prob.upper >= prob.upper_llll) & 
           (prob.upper <= prob.upper_hhhh),
         prob.upper_95 = (prob.upper >= prob.upper_lll) & 
           (prob.upper <= prob.upper_hhh),
         prob.upper_80 = (prob.upper >= prob.upper_ll) & 
           (prob.upper <= prob.upper_hh),
         prob.upper_50 = (prob.upper >= prob.upper_l) & 
           (prob.upper <= prob.upper_h),
         medrt.upper_99 = (medrt.upper > medrt.upper_llll) & 
           (medrt.upper < medrt.upper_hhhh),
         medrt.upper_95 = (medrt.upper > medrt.upper_lll) & 
           (medrt.upper < medrt.upper_hhh),
         medrt.upper_80 = (medrt.upper > medrt.upper_ll) & 
           (medrt.upper < medrt.upper_hh),
         medrt.upper_50 = (medrt.upper > medrt.upper_l) & 
           (medrt.upper < medrt.upper_h),
         medrt.lower_99 = (medrt.lower > medrt.lower_llll) & 
           (medrt.lower < medrt.lower_hhhh),
         medrt.lower_95 = (medrt.lower > medrt.lower_lll) & 
           (medrt.lower < medrt.lower_hhh),
         medrt.lower_80 = (medrt.lower > medrt.lower_ll) & 
           (medrt.lower < medrt.lower_hh),
         medrt.lower_50 = (medrt.lower > medrt.lower_l) & 
           (medrt.lower < medrt.lower_h)
  ) %>% 
  group_by(condition, frequency) %>% ## grouping factors without id
  summarise_at(vars(matches("\\d")), mean, na.rm = TRUE) %>% 
  gather("key", "mean", -condition, -frequency) %>% 
  separate("key", c("measure", "ci"), "_") %>% 
  spread(ci, mean) %>% 
  as.data.frame()
#    condition frequency     measure    50     80    95    99
# 1   accuracy      high medrt.lower 0.706 0.8824 0.882 1.000
# 2   accuracy      high medrt.upper 0.500 0.8333 1.000 1.000
# 3   accuracy      high  prob.upper 0.529 0.7059 0.765 0.882
# 4   accuracy   nw_high medrt.lower 0.500 0.8125 0.938 0.938
# 5   accuracy   nw_high medrt.upper 0.529 0.8235 1.000 1.000
# 6   accuracy   nw_high  prob.upper 0.529 0.8235 0.941 0.941
# 7      speed      high medrt.lower 0.471 0.8824 0.941 1.000
# 8      speed      high medrt.upper 0.706 0.9412 1.000 1.000
# 9      speed      high  prob.upper 0.000 0.0588 0.588 0.647
# 10     speed   nw_high medrt.lower 0.706 0.8824 0.941 0.941
# 11     speed   nw_high medrt.upper 0.471 0.7647 1.000 1.000
# 12     speed   nw_high  prob.upper 0.235 0.6471 0.941 1.000

As can be seen, for the RTs, the coverage probability is generally in line with the width of the CIs or even above it. Furthermore, for the common response (i.e., upper for frequency = nw_high and lower for frequency = high), the coverage probability is 1 for the 99% CIs in all cases.

Unfortunately, for the response probabilities, the coverage is not that great, especially in the speed condition and for tighter CIs. However, for the wide CIs the coverage probabilities are at least acceptable. Overall the results so far suggest that the model provides an adequate account. There are some misfits that should be kept in mind if one is interested in extending the model or fitting it to new data, but overall it provides a satisfactory account.

QQ-plots: RTs

The final approach for assessing the fit of the model will be based on more quantiles of the RT distribution (i.e., so far we only looked at the .5 quantile, the median). We will then plot individual observed versus predicted (i.e., mean from posterior predictive distribution) quantiles across conditions. For this we first calculate the quantiles per sample from the posterior predictive distribution and then aggregate across the samples. This is achieved via dplyr::summarise_at using a list column and tidyr::unnest to unstack the columns (see section 25.3 in “R for Data Science”). We then combine the aggregated predicted RT quantiles with the observed RT quantiles.

quantiles <- c(0.1, 0.25, 0.5, 0.75, 0.9)

pp2 <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(lower = list(rownames_to_column(
                 data.frame(q = quantile(abs(.[. < 0]), probs = quantiles)))),
                    upper = list(rownames_to_column(
                      data.frame(q = quantile(.[. > 0], probs = quantiles ))))
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "boundary"), sep = "_") %>% 
  unnest(value) %>% 
  group_by(id, condition, frequency, boundary, rowname) %>% # grouping vars + new vars
  summarise(predicted = mean(q, na.rm = TRUE))

rt_pp <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(lower = list(rownames_to_column(
    data.frame(observed = quantile(rt[response == "word"], probs = quantiles)))),
    upper = list(rownames_to_column(
      data.frame(observed = quantile(rt[response == "nonword"], probs = quantiles ))))
  ) %>% 
  ungroup %>% 
  gather("boundary", "value", -id, -condition, -frequency) %>%
  unnest(value) %>% 
  left_join(pp2)

To evaluate the agreement between observed and predicted quantiles we calculate for each cell and quantile the concordance correlation coefficient (CCC; e.g., Barchard, 2012, Psych. Methods). The CCC is a measure of absolute agreement between two values and thus better suited than a simple correlation. It is scaled from -1 to 1, where 1 represents perfect agreement, 0 no relationship, and -1 perfect negative agreement (i.e., a correlation of -1 with the same mean and variance for the two variables).
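
As a quick toy illustration of the difference between the CCC and the ordinary correlation (this snippet is mine and not part of the original analysis; the variables x and y are arbitrary):

x <- c(1, 2, 3, 4, 5)
y <- x + 1                       # perfectly correlated, but shifted upwards
cor(x, y)                        # Pearson correlation: 1
DescTools::CCC(x, y)$rho.c$est   # CCC is below 1 because agreement is not absolute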

The following code produces QQ-plots for each condition and quantile separately for responses to the upper boundary and lower boundary. The value in the upper left of each plot gives the CCC measures of absolute agreement.

plot_text <- rt_pp %>% 
  group_by(condition, frequency, rowname, boundary) %>% 
  summarise(ccc = format(
    CCC(observed, predicted, na.rm = TRUE)$rho.c$est, 
    digits = 2))

p_upper <- rt_pp %>% 
  filter(boundary == "upper") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "upper", ],
            aes(x = 0.5, y = 1.8, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Upper responses") +
  theme_bw()

p_lower <- rt_pp %>% 
  filter(boundary == "lower") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "lower", ],
            aes(x = 0.5, y = 1.6, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Lower responses") +
  theme_bw()

grid.arrange(p_upper, p_lower, ncol = 1)

Results show that overall the fit is better for the accuracy than the speed conditions. Furthermore, fit is better for the common response (i.e., nw_high for upper and high for lower responses). This latter observation is again not too surprising.

When comparing the fit for the different quantiles it appears that at least the median (i.e., 50%) shows acceptable values for the common response. However, especially in the speed condition the account of the other quantiles is not great. Nevertheless, dramatic misfit is only observed for the rare responses.

One possibility for some of the low CCCs in the speed conditions may be the comparatively low variances in some of the cells. For example, for both speed conditions that are common (i.e., speed & nw_high for upper responses and speed & high for lower responses) visual inspection of the plot suggests an acceptable account while at the same time some CCC values are low (i.e., < .5). Only for the 90% quantile in the speed conditions (and somewhat less so for the 75% quantile) do we see some systematic deviations: the model predicts slower RTs than observed.

Taken together, the model appears to provide an at least acceptable account. The only slightly worrying patterns are (a) that the model predicts a slightly better performance for the word stimuli than observed (i.e., a lower predicted rate of non-word responses than observed for word stimuli) and (b) that in the speed conditions the model predicts somewhat longer RTs for the 75% and 90% quantiles than observed.

The next step will be to look at differences between parameters as a function of the speed-accuracy condition. This is the topic of the third blog post. I am hopeful it will not take two months this time.

 

Diffusion/Wiener Model Analysis with brms – Part I: Introduction and Estimation

Stan is probably the most interesting development in computational statistics in the last few years, at least for me. The version of Hamiltonian Monte-Carlo (HMC) implemented in Stan (NUTS, ) is extremely efficient and the range of probability distributions implemented in the Stan language allows one to fit an extremely wide range of models. Stan has considerably changed which models I think can be realistically estimated, both in terms of model complexity and data size. It is not an overstatement to say that Stan (and particularly rstan) has considerably changed the way I analyze data.

One of the R packages that allows one to implement Stan models in a very convenient manner and which has created a lot of buzz recently is brms . It allows one to specify a wide range of models using the R formula interface. Based on the formula and a specification of the family of the model, it generates the model code, compiles it, and then passes it together with the data to rstan for sampling. Because I usually program my models by hand (thanks to the great Stan documentation), I have so far stayed away from brms.

However, I recently learned that brms also allows the estimation of the Wiener model (i.e., the 4-parameter diffusion model, ) for simultaneously accounting for responses and corresponding response times for data from two-choice tasks. Such data is quite common in psychology and the diffusion model is one of the more popular cognitive models out there . In a series of (probably 3) posts I provide an example of applying the Wiener model to some published data using brms. This first part shows how to set up and estimate the model. The second part gives an overview of model diagnostics and an assessment of model fit via posterior predictive distributions. The third part shows how to inspect and compare the posterior distributions of the parameters.

In addition to brms and a working C++ compiler, this first part also needs package RWiener for generating the posterior predictive distribution within brms and package rtdists for the data.

library("brms")

Data and Model

A graphical illustration of the Wiener diffusion model for two-choice reaction times. An evidence counter starts at value `alpha`*`beta` and evolves with random increments. The mean increment is `delta` . The process terminates as soon as the accrued evidence exceeds `alpha` or deceeds 0. The decision process starts at time `tau` from the stimulus presentation and terminates at the reaction time. [This figure and caption are taken from Wabersich and Vandekerckhove (2014, The R Journal, CC-BY license).]

I expect the reader to already be familiar with the Wiener model and will only provide a very brief introduction here, for more see . The Wiener model is a continuous-time evidence accumulation model for binary choice tasks. It assumes that in each trial evidence is accumulated in a noisy (diffusion) process by a single accumulator. Evidence accumulation starts at the start point and continues until the accumulator hits one of the two decision bounds in which case the corresponding response is given. The total response time is the sum of the decision time from the accumulation process plus non-decisional components. In sum, the Wiener model allows one to decompose responses in a binary choice task and the corresponding response times into four latent processes (a small simulation sketch illustrating these parameters follows the list):

  • The drift rate (delta) is the average slope of the accumulation process towards the boundaries. The larger the (absolute value of the) drift rate, the stronger the evidence for the corresponding response option.
  • The boundary separation (alpha) is the distance between the two decision bounds and interpreted as a measure of response caution.
  • The starting point (beta) of the accumulation process is a measure of response bias towards one of the two response boundaries.
  • The non-decision time (tau) captures all non-decisional process such as stimulus encoding and response processes.
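
To get a feel for these four parameters, one can simulate data from the model, for example with the RWiener package mentioned above. The parameter values below are arbitrary choices for illustration only.

library("RWiener")
set.seed(1)
# 500 trials with boundary separation 1.5, non-decision time 0.3,
# no bias (0.5), and positive drift rate 1 (towards the upper boundary)
sim <- rwiener(n = 500, alpha = 1.5, tau = 0.3, beta = 0.5, delta = 1)
head(sim)                   # columns: q (RT in seconds) and resp ("upper"/"lower")
mean(sim$resp == "upper")   # proportion of upper-boundary responses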

We will analyze part of the data from Experiment 1 of . The data comes from 17 participants performing a lexical decision task in which they have to decide if a presented string is a word or non-word. Participants made decisions either under speed or accuracy emphasis instructions in different experimental blocks. This data comes with the rtdists package (which provides the PDF, CDF, and RNG for the full 7-parameter diffusion model). After removing some extreme RTs, we restrict the analysis to high-frequency words (frequency = high) and the corresponding high-frequency non-words (frequency = nw_high) to reduce estimation time. To set up the model we also need a numeric response variable in which 0 corresponds to responses at the lower response boundary and 1 corresponds to responses at the upper boundary. For this we transform the categorical response variable response to numeric and subtract 1 such that a word response corresponds to the lower response boundary and a nonword response to the upper boundary.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1
str(speed_acc)
'data.frame':    10462 obs. of  10 variables:
 $ id       : Factor w/ 17 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ block    : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ condition: Factor w/ 2 levels "accuracy","speed": 2 2 2 2 2 2 2 2 2 2 ...
 $ stim     : Factor w/ 1611 levels "1001","1002",..: 1271 46 110 666 422 ...
 $ stim_cat : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 2 1 1 2 ...
 $ frequency: Factor w/ 2 levels "high","nw_high": 2 1 1 1 1 1 2 1 1 2 ...
 $ response : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 1 1 1 1 ...
 $ rt       : num  0.773 0.39 0.435 0.427 0.622 0.441 0.308 0.436 0.412 ...
 $ censor   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ response2: num  1 0 0 0 0 0 0 0 0 0 ...

Model Formula

The important decision that has to be made before setting up a model is which parameters are allowed to differ between which conditions (i.e., factor levels). One common constraint of the Wiener model (and other evidence-accumulation models) is that the parameters that are set before the evidence accumulation process starts (i.e., boundary separation, starting point, and non-decision time) cannot change based on stimulus characteristics that are not known to the participant before the start of the trial. Thus, the item-type, in the present case word versus non-word, is usually only allowed to affect the drift rate. We follow this constraint. Furthermore, all four parameters are allowed to vary between speed and accuracy condition as this is manipulated between blocks of trials. Also note that all relevant variables are manipulated within-subjects. Thus, the maximal random-effects structure entails corresponding random-effects parameters for each fixed-effect. To set up the model we need to invoke the bf() function and construct one formula for each of the four parameters of the Wiener model.

formula <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                (0 + condition:frequency|p|id), 
               bs ~ 0 + condition + (0 + condition|p|id), 
               ndt ~ 0 + condition + (0 + condition|p|id),
               bias ~ 0 + condition + (0 + condition|p|id))

The first formula is for the drift rate and is also used for specifying the column containing the RTs (rt) and the response or decision (response2) on the left hand side. On the right hand side one can specify fixed effects as well as random effects in a way similar to lme4. The drift rate is allowed to vary between both variables, condition and frequency (stim_cat would be equivalent), thus we estimate fixed effects as well as random effects for both factors as well as their interaction.

We then also need to set up one formula for each of the other three parameters (which are only allowed to vary by condition). For these formulas, the left hand side denotes the parameter names:

  • bs: boundary separation (alpha)
  • ndt: non-decision time (tau)
  • bias: starting point (beta)

The right hand side again specifies the fixed- and random-effects. Note that one common approach for setting up evidence accumulation models is to specify that one response boundary represents correct responses and the other response boundary denotes incorrect responses (in contrast to the current approach in which the response boundaries represent the two actual response options). In such a situation one cannot estimate the starting point and it needs to be fixed to 0.5 (i.e., replace the formula with bias = 0.5); a sketch of what this could look like is shown below.
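
For illustration, such an accuracy-coded variant could look roughly as follows. This is a hypothetical sketch I did not run; the variable correct2 and the formula name formula_acc do not exist in the data or the original post and would first need to be created.

# hypothetical accuracy-coded setup (not used in this post)
speed_acc$correct2 <- as.numeric(speed_acc$stim_cat == speed_acc$response)
formula_acc <- bf(rt | dec(correct2) ~ 0 + condition:frequency + 
                    (0 + condition:frequency|p|id), 
                  bs ~ 0 + condition + (0 + condition|p|id), 
                  ndt ~ 0 + condition + (0 + condition|p|id),
                  bias = 0.5)  # starting point fixed at 0.5, as described above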

Two further points are relevant in the formulas. First, I have used a somewhat uncommon parameterization and suppressed the intercept (e.g., ~ 0 + condition instead of ~ condition). The reason for this is that when an intercept is present, categorical variables (i.e., factors) with k levels are coded with k-1 deviation variables that represent deviations from the intercept. Thus, in a Bayesian setting one needs to consider the choice of prior for these deviation variables. In contrast, when suppressing the intercept the model can be setup such that each factor level (or design cells in case of involvement of more than one factor) receives its own parameter, as done here. This essentially allows the same prior for each parameter (as long as one does not expect the parameters to vary dramatically). Furthermore, when programming a model oneself this is a common parameterization. To see the differences between the different parameterizations compare the following two calls (model.matrix is the function that creates the parameterization internally). Only the first creates a separate parameter for each condition.

unique(model.matrix(~0+condition, speed_acc))
##     conditionaccuracy conditionspeed
## 36                  0              1
## 128                 1              0
unique(model.matrix(~condition, speed_acc))
##     (Intercept) conditionspeed
## 36            1              1
## 128           1              0

Note that when more than one factor is involved and one wants to use this parameterization, one needs to combine the factors using the : and not *. This can be seen when running the code below. Also note that when combining the factors with : without suppressing the intercept, the resulting model has one parameter more than can be estimated (i.e., the model-matrix is rank deficient). So care needs to be taken at this step.

unique(model.matrix(~ 0 + condition:frequency, speed_acc))
unique(model.matrix(~ 0 + condition*frequency, speed_acc))
unique(model.matrix(~ condition:frequency, speed_acc))

Second, brms formulas provide a way to estimate correlations among random-effects parameters of different formulas. To achieve this, one can place an identifier in the middle of the random-effects formula that is separated by | on both sides. Correlations among random-effects will then be estimated for all random-effects formulas that share the same identifier. In our case, we want to estimate the full random-effects matrix with correlations among all model parameters, following the “latent-trait approach” . We therefore place the same identifier (p) in all formulas. Thus, correlations will be estimated among all individual-level deviations across all four Wiener parameters. To estimate correlations only among the random-effects parameters of each formula, simply omit the identifier (e.g., (0 + condition|id)). Furthermore, note that brms, similar to afex, supports suppressing the correlations among categorical random-effects parameters via || (e.g., (0 + condition||id)).

Family, Link-Functions, and Priors

The next step is to set up the priors. For this we can invoke the get_prior function. This function requires one to specify the formula, data, as well as the family of the model. family is the argument where we tell brms that we want to use the wiener model. We also use it to specify the link function for the four Wiener parameters. Because the drift rate can take on any value (i.e., from -Inf to Inf), the default link function is "identity" (i.e., no transformation) which we retain. The other three parameters all have a restricted range. The boundary needs to be larger than 0, the non-decision time needs to be larger than 0 and smaller than the smallest RT, and the starting point needs to be between 0 and 1. The default link-functions respect these constraints and use "log" for the first two parameters and "logit" for the bias. This certainly is a possibility, but has a number of drawbacks leading me to use the "identity" link function for all parameters. First, when parameters are transformed, the priors need to be specified on the untransformed scale. Second, the individual-level deviations (i.e., the random-effects estimates) are assumed to come from a multivariate normal distribution. Parameter transformations would entail that these individual-level deviations are only normally distributed on the untransformed scale. Likewise, the correlations of parameter deviations across parameters would also be on the untransformed scale. Both make the interpretation of the random effects difficult.

When specifying the parameters without transformation (i.e., link = "identity") care must be taken that the priors place most mass on values inside the allowed range. Likewise, starting values need to be inside the allowed range (a sketch of how such starting values could be supplied is shown below). Using the identity link function also comes with drawbacks discussed at the end. However, as long as parameters outside the allowed range occur only rarely, such a model can converge successfully and it makes the interpretation easier.
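
One way to provide such starting values is via a function passed to the inits argument of brm(). The following is only a rough sketch that I did not use for the reported fit; the names and dimensions of b, b_bs, b_ndt, and b_bias correspond to the population-level parameter vectors visible in the generated Stan code shown further below, and parameters not listed are initialized by Stan as usual.

# sketch of an inits function keeping all starting values in the allowed range
initfun <- function() {
  list(
    b      = rnorm(4, 0, 1),        # drift rates (4 design cells)
    b_bs   = runif(2, 1, 2),        # boundary separation > 0
    b_ndt  = runif(2, 0.05, 0.15),  # non-decision time > 0 and below the minimum RT
    b_bias = runif(2, 0.4, 0.6)     # starting point in (0, 1)
  )
}
# e.g., brm(formula, data = speed_acc, family = wiener(...), prior = prior,
#           inits = initfun, chains = 4)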

The get_prior function returns a data.frame containing all parameters of the model. If parameters have default priors these are listed as well. One needs to define priors either for individual parameters, parameter classes, or parameter classes for specific groups, or dpars. Note that all parameters that do not have a default prior should receive a specific prior.

get_prior(formula,
          data = speed_acc, 
          family = wiener(link_bs = "identity", 
                          link_ndt = "identity", 
                          link_bias = "identity"))

[Two empty columns to the right were removed from the following output.]

                 prior class                               coef group resp dpar 
1                          b                                                    
2                          b    conditionaccuracy:frequencyhigh                 
3                          b conditionaccuracy:frequencynw_high                 
4                          b       conditionspeed:frequencyhigh                 
5                          b    conditionspeed:frequencynw_high                 
6               lkj(1)   cor                                                    
7                        cor                                       id           
8  student_t(3, 0, 10)    sd                                                    
9                         sd                                       id           
10                        sd    conditionaccuracy:frequencyhigh    id           
11                        sd conditionaccuracy:frequencynw_high    id           
12                        sd       conditionspeed:frequencyhigh    id           
13                        sd    conditionspeed:frequencynw_high    id           
14                         b                                               bias 
15                         b                  conditionaccuracy            bias 
16                         b                     conditionspeed            bias 
17 student_t(3, 0, 10)    sd                                               bias 
18                        sd                                       id      bias 
19                        sd                  conditionaccuracy    id      bias 
20                        sd                     conditionspeed    id      bias 
21                         b                                                 bs 
22                         b                  conditionaccuracy              bs 
23                         b                     conditionspeed              bs 
24 student_t(3, 0, 10)    sd                                                 bs 
25                        sd                                       id        bs 
26                        sd                  conditionaccuracy    id        bs 
27                        sd                     conditionspeed    id        bs 
28                         b                                                ndt 
29                         b                  conditionaccuracy             ndt 
30                         b                     conditionspeed             ndt 
31 student_t(3, 0, 10)    sd                                                ndt 
32                        sd                                       id       ndt 
33                        sd                  conditionaccuracy    id       ndt 
34                        sd                     conditionspeed    id       ndt

Priors can be defined with the prior or set_prior function allowing different levels of control. One benefit of the way the model is parameterized is that we only need to specify priors for one set of parameters per Wiener parameters (i.e., b) and do not have to distinguish between intercept and other parameters.

We base our choice of priors on prior knowledge of likely parameter values for the Wiener model, but otherwise try to specify them in a weakly informative manner. That is, they should restrict the range to plausible values but not affect the estimation any further. For the drift rate we use a Cauchy distribution with location 0 and scale 5, so that roughly 70% of the prior mass is between -10 and 10. For the boundary separation we use a normal prior with mean 1.5 and standard deviation 1, for the non-decision time a normal prior with mean 0.2 and standard deviation 0.1, and for the bias a normal prior with mean 0.5 (i.e., no bias) and standard deviation 0.2.
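The claim about the prior mass can be checked directly with the corresponding distribution functions in base R (a quick sanity check, not part of the model code):

# proportion of Cauchy(0, 5) prior mass between -10 and 10 (drift rate prior)
pcauchy(10, location = 0, scale = 5) - pcauchy(-10, location = 0, scale = 5)
## [1] 0.7048328

# analogous check for one of the normal priors, e.g., boundary separation in (0, 3)
pnorm(3, mean = 1.5, sd = 1) - pnorm(0, mean = 1.5, sd = 1)
## [1] 0.8663856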

prior <- c(
 prior("cauchy(0, 5)", class = "b"),
 set_prior("normal(1.5, 1)", class = "b", dpar = "bs"),
 set_prior("normal(0.2, 0.1)", class = "b", dpar = "ndt"),
 set_prior("normal(0.5, 0.2)", class = "b", dpar = "bias")
)

With this information we can use the make_stancode function and inspect the full model code. The important thing is to make sure that all parameters listed in the parameters block have a prior listed in the model block. We can also see, at the beginning of the model block, that none of our parameters is transformed, just as desired (a bug in a previous version of brms prevented anything but the default links for the Wiener model parameters).

make_stancode(formula, 
              family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
              data = speed_acc, 
              prior = prior)

 

// generated with brms 1.10.2
functions { 

  /* Wiener diffusion log-PDF for a single response
   * Args: 
   *   y: reaction time data
   *   dec: decision data (0 or 1)
   *   alpha: boundary separation parameter > 0
   *   tau: non-decision time parameter > 0
   *   beta: initial bias parameter in [0, 1]
   *   delta: drift rate parameter
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real wiener_diffusion_lpdf(real y, int dec, real alpha, 
                              real tau, real beta, real delta) { 
     if (dec == 1) {
       return wiener_lpdf(y | alpha, tau, beta, delta);
     } else {
       return wiener_lpdf(y | alpha, tau, 1 - beta, - delta);
     }
   }
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  int<lower=1> K_bs;  // number of population-level effects 
  matrix[N, K_bs] X_bs;  // population-level design matrix 
  int<lower=1> K_ndt;  // number of population-level effects 
  matrix[N, K_ndt] X_ndt;  // population-level design matrix 
  int<lower=1> K_bias;  // number of population-level effects 
  matrix[N, K_bias] X_bias;  // population-level design matrix 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  vector[N] Z_1_2; 
  vector[N] Z_1_3; 
  vector[N] Z_1_4; 
  vector[N] Z_1_bs_5; 
  vector[N] Z_1_bs_6; 
  vector[N] Z_1_ndt_7; 
  vector[N] Z_1_ndt_8; 
  vector[N] Z_1_bias_9; 
  vector[N] Z_1_bias_10; 
  int<lower=1> NC_1; 
  int<lower=0,upper=1> dec[N];  // decisions 
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  real min_Y = min(Y); 
} 
parameters { 
  vector[K] b;  // population-level effects 
  vector[K_bs] b_bs;  // population-level effects 
  vector[K_ndt] b_ndt;  // population-level effects 
  vector[K_bias] b_bias;  // population-level effects 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations 
  matrix[M_1, N_1] z_1;  // unscaled group-level effects 
  // cholesky factor of correlation matrix 
  cholesky_factor_corr[M_1] L_1; 
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; 
  vector[N_1] r_1_1 = r_1[, 1]; 
  vector[N_1] r_1_2 = r_1[, 2]; 
  vector[N_1] r_1_3 = r_1[, 3]; 
  vector[N_1] r_1_4 = r_1[, 4]; 
  vector[N_1] r_1_bs_5 = r_1[, 5]; 
  vector[N_1] r_1_bs_6 = r_1[, 6]; 
  vector[N_1] r_1_ndt_7 = r_1[, 7]; 
  vector[N_1] r_1_ndt_8 = r_1[, 8]; 
  vector[N_1] r_1_bias_9 = r_1[, 9]; 
  vector[N_1] r_1_bias_10 = r_1[, 10]; 
} 
model { 
  vector[N] mu = X * b; 
  vector[N] bs = X_bs * b_bs; 
  vector[N] ndt = X_ndt * b_ndt; 
  vector[N] bias = X_bias * b_bias; 
  for (n in 1:N) { 
    mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_1_2[J_1[n]]) * Z_1_2[n] + (r_1_3[J_1[n]]) * Z_1_3[n] + (r_1_4[J_1[n]]) * Z_1_4[n]; 
    bs[n] = bs[n] + (r_1_bs_5[J_1[n]]) * Z_1_bs_5[n] + (r_1_bs_6[J_1[n]]) * Z_1_bs_6[n]; 
    ndt[n] = ndt[n] + (r_1_ndt_7[J_1[n]]) * Z_1_ndt_7[n] + (r_1_ndt_8[J_1[n]]) * Z_1_ndt_8[n]; 
    bias[n] = bias[n] + (r_1_bias_9[J_1[n]]) * Z_1_bias_9[n] + (r_1_bias_10[J_1[n]]) * Z_1_bias_10[n]; 
  } 
  // priors including all constants 
  target += cauchy_lpdf(b | 0, 5); 
  target += normal_lpdf(b_bs | 1.5, 1); 
  target += normal_lpdf(b_ndt | 0.2, 0.1); 
  target += normal_lpdf(b_bias | 0.5, 0.2); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 10 * student_t_lccdf(0 | 3, 0, 10); 
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  target += normal_lpdf(to_vector(z_1) | 0, 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    for (n in 1:N) { 
      target += wiener_diffusion_lpdf(Y[n] | dec[n], bs[n], ndt[n], bias[n], mu[n]); 
    } 
  } 
} 
generated quantities { 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1); 
  vector<lower=-1,upper=1>[NC_1] cor_1; 
  // take only relevant parts of correlation matrix 
  cor_1[1] = Cor_1[1,2]; 
  [...]
  cor_1[45] = Cor_1[9,10]; 
}

[The output was slightly modified.]

The last piece we need before we can finally estimate the model is a function that generates initial values. Without initial values that lead to an identifiable model for all data points, estimation will not start. The function needs to provide initial values for all parameters listed in the parameters block of the model. Note that many of those parameters have at least one dimension with a parameterized extent (e.g., K). We can use make_standata to create the data set used by brms for the estimation and obtain the necessary size information from it. We then use this data object (i.e., a list) to generate correctly sized initial values in the function initfun (note that initfun relies on tmp_dat being in the global environment, which is something of a code smell).

tmp_dat <- make_standata(formula, 
                         family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
                            data = speed_acc, prior = prior)
str(tmp_dat, 1, give.attr = FALSE)
## List of 26
##  $ N          : int 10462
##  $ Y          : num [1:10462(1d)] 0.773 0.39 0.435  ...
##  $ K          : int 4
##  $ X          : num [1:10462, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_1      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_2      : num [1:10462(1d)] 0 1 1 1 1 1 0 1 1 0 ...
##  $ Z_1_3      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_4      : num [1:10462(1d)] 1 0 0 0 0 0 1 0 0 1 ...
##  $ K_bs       : int 2
##  $ X_bs       : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_5   : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_6   : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_ndt      : int 2
##  $ X_ndt      : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_7  : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_8  : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_bias     : int 2
##  $ X_bias     : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_9 : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_10: num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ J_1        : int [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ N_1        : int 17
##  $ M_1        : int 10
##  $ NC_1       : num 45
##  $ dec        : num [1:10462(1d)] 1 0 0 0 0 0 0 0 0 0 ...
##  $ prior_only : int 0

initfun <- function() {
  list(
    b = rnorm(tmp_dat$K),
    b_bs = runif(tmp_dat$K_bs, 1, 2),
    b_ndt = runif(tmp_dat$K_ndt, 0.1, 0.15),
    b_bias = rnorm(tmp_dat$K_bias, 0.5, 0.1),
    sd_1 = runif(tmp_dat$M_1, 0.5, 1),
    z_1 = matrix(rnorm(tmp_dat$M_1*tmp_dat$N_1, 0, 0.01),
                 tmp_dat$M_1, tmp_dat$N_1),
    L_1 = diag(tmp_dat$M_1)
  )
}
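Before starting the sampler it can be worth a quick check that initfun() returns values of the right shapes (a small sketch, not part of the original workflow; the object name init1 is only used here for illustration):

set.seed(42)    # reproducible starting values (optional)
init1 <- initfun()
str(init1, 1)   # should list b, b_bs, b_ndt, b_bias, sd_1, z_1, and L_1
stopifnot(length(init1$b) == tmp_dat$K,
          all(dim(init1$z_1) == c(tmp_dat$M_1, tmp_dat$N_1)))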

Estimation (i.e., Sampling)

Finally, we have all pieces together and can estimate the Wiener model using the brm function. Note that this will take roughly a full day, or longer depending on the speed of your PC. We also already increase the maximal treedepth to 15. We probably should have increased adapt_delta above its default value of .8 as well, as there are a few divergent transitions, but this is left as an exercise for the reader.

After estimation is finished, we see that there are a few (< 10) divergent transitions. If this were a real analysis and not just an example, we would need to increase adapt_delta to a larger value (e.g., .95 or .99) and rerun the estimation. Here, however, we immediately move on to the second step and obtain samples from the posterior predictive distribution using predict. For this it is important to specify the number of posterior samples (here we use 500). In addition, it is important to set summary = FALSE, to obtain the actual posterior predictive distribution rather than a summary of it, and negative_rt = TRUE. The latter ensures that predicted responses at the lower boundary receive a negative sign whereas predicted responses at the upper boundary receive a positive sign.

fit_wiener <- brm(formula, 
                  data = speed_acc,
                  family = wiener(link_bs = "identity", 
                                  link_ndt = "identity",
                                  link_bias = "identity"),
                  prior = prior, inits = initfun,
                  iter = 1000, warmup = 500, 
                  chains = 4, cores = 4, 
                  control = list(max_treedepth = 15))
NPRED <- 500
pred_wiener <- predict(fit_wiener, 
                       summary = FALSE, 
                       negative_rt = TRUE, 
                       nsamples = NPRED)
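To make the negative_rt coding explicit (positive values = upper boundary, negative values = lower boundary), the signed matrix of posterior predictive samples can be decoded as follows (a small illustration; the object names pred_rt and pred_resp are not part of the original script):

pred_rt   <- abs(pred_wiener)                           # predicted response times in seconds
pred_resp <- ifelse(pred_wiener > 0, "upper", "lower")  # predicted boundary for each sample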

Because both steps are quite time intensive (estimation takes about one day, obtaining the posterior predictive distribution a few hours), we save the results of both steps. Given the comparatively large size of both objects, using 'xz' compression (i.e., the strongest compression available in R) seems like a good idea.

save(fit_wiener, file = "brms_wiener_example_fit.rda", 
     compress = "xz")
save(pred_wiener, file = "brms_wiener_example_predictions.rda", 
     compress = "xz")
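In a later session the saved objects can simply be reloaded, so neither step has to be rerun (standard R, nothing brms-specific):

load("brms_wiener_example_fit.rda")          # restores the fitted model object fit_wiener
load("brms_wiener_example_predictions.rda")  # restores the posterior predictive samples pred_wiener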

The second part shows how to perform model diagnostics and how to assess model fit. The third part shows how to test for differences in parameters between conditions.

ANOVA in R: afex may be the solution you are looking for

Prelude: When you start with R and try to estimate a standard ANOVA, which is relatively simple in commercial software like SPSS, R kind of sucks. Especially for unbalanced designs or designs with repeated measures, replicating the results from such software in base R may require considerable effort. For a newcomer (and even an old-timer) this can be somewhat off-putting. After I had gained experience developing my first package and was once again struggling with R and ANOVA, I had enough and decided to develop afex. If you know this feeling, afex is also for you.


A new version of afex (0.18-0) was accepted on CRAN a few days ago. This version only fixes a small bug that was introduced in the last version: aov_ez did not work with more than one covariate (thanks to tkerwin for reporting this bug).

I want to use this opportunity to introduce one of the main functionalities of afex. It provides a set of functions that make calculating ANOVAs easy. In the default settings, afex automatically uses appropriate orthogonal contrasts for factors, transforms numerical variables into factors, uses so-called Type III sums of squares, and allows for any number of factors including repeated-measures (or within-subjects) factors and mixed/split-plot designs. Together this guarantees that the ANOVA results correspond to the results obtained from commercial statistical packages such as SPSS or SAS. On top of this, the ANOVA object returned by afex (of class afex_aov) can be directly used for follow-up or post-hoc tests/contrasts using the lsmeans package.

Example Data

Let me illustrate how to calculate an ANOVA with a simple example. We use data courtesy of Andrew Heathcote and colleagues. The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants. Here we only look at three factors:

  • task is a between-subjects (or independent-samples) factor: 25 participants worked on the lexical decision task (lexdec; i.e., participants had to decide whether the presented string was a word or a nonword) and 20 participants on the naming task (naming; i.e., participants had to read the presented string out loud).
  • stimulus is a repeated-measures or within-subjects factor that codes whether a presented string was a word or nonword.
  • length is also a repeated-measures factor that gives the number of characters of the presented strings and has three levels: 4, 5, and 6.

The dependent variable is the response latency or response time for each presented string. More specifically, as is common in the literature, we analyze the log of the response times, log_rt. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords. To use these data in an ANOVA one needs to aggregate them such that only one observation per participant and cell of the design (i.e., combination of all factors) remains. As we will see, afex does this automatically for us (this is one of the features I blatantly stole from ez).

library(afex)
data("fhch2010") # load data (comes with afex) 

mean(!fhch2010$correct) # error rate
# [1] 0.01981546
fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors

str(fhch2010) # structure of the data
# 'data.frame': 13222 obs. of  10 variables:
#  $ id       : Factor w/ 45 levels "N1","N12","N13",..: 1 1 1 1 1 1 1 1 ...
#  $ task     : Factor w/ 2 levels "naming","lexdec": 1 1 1 1 1 1 1 1 1 1 ...
#  $ stimulus : Factor w/ 2 levels "word","nonword": 1 1 1 2 2 1 2 2 1 2 ...
#  $ density  : Factor w/ 2 levels "low","high": 2 1 1 2 1 2 1 1 1 1 ...
#  $ frequency: Factor w/ 2 levels "low","high": 1 2 2 2 2 2 1 2 1 2 ...
#  $ length   : Factor w/ 3 levels "4","5","6": 3 3 2 2 1 1 3 2 1 3 ...
#  $ item     : Factor w/ 600 levels "abide","acts",..: 363 121 ...
#  $ rt       : num  1.091 0.876 0.71 1.21 0.843 ...
#  $ log_rt   : num  0.0871 -0.1324 -0.3425 0.1906 -0.1708 ...
#  $ correct  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

We first load the data and remove the roughly 2% erroneous responses. The structure of the data.frame (obtained via str()) shows us that the data have a few more factors than discussed here. To specify our ANOVA we first use the function aov_car(), which works very similarly to base R's aov(), but, like all afex ANOVA functions, uses car::Anova() (read as function Anova() from package car) as the backend for calculating the ANOVA.

Specifying an ANOVA

(a1 <- aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), fhch))
# Contrasts set to contr.sum for the following variables: task
# Anova Table (Type 3 tests)
# 
# Response: log_rt
#                 Effect          df  MSE          F   ges p.value
# 1                 task       1, 43 0.23  13.38 ***   .22   .0007
# 2               length 1.83, 78.64 0.00  18.55 ***  .008  <.0001
# 3          task:length 1.83, 78.64 0.00       1.02 .0004     .36
# 4             stimulus       1, 43 0.01 173.25 ***   .17  <.0001
# 5        task:stimulus       1, 43 0.01  87.56 ***   .10  <.0001
# 6      length:stimulus 1.70, 72.97 0.00       1.91 .0007     .16
# 7 task:length:stimulus 1.70, 72.97 0.00       1.21 .0005     .30
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
# 
# Sphericity correction method: GG 
# Warning message:
# More than one observation per cell, aggregating the data using mean (i.e, fun_aggregate = mean)!

The printed output is an ANOVA table that could basically be copied into a manuscript as is. One sees the terms in column Effect, the degrees of freedom (df), the mean squared error (MSE; I would probably remove this column in a manuscript), the F-value (F, which also carries the significance stars), and the p-value (p.value). The only somewhat uncommon column is ges, which provides generalized eta-squared, ‘the recommended effect size statistics for repeated measure designs’. The standard output also reports Greenhouse-Geisser (GG) corrected df for repeated-measures factors with more than two levels (to account for possible violations of sphericity). Note that these corrected df are not integers.
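If a different sphericity correction (or none at all) is preferred in the printed table, it can be requested when printing the afex_aov object; a brief sketch, assuming the correction argument of afex's nice() method:

nice(a1, correction = "HF")    # Huynh-Feldt correction instead of Greenhouse-Geisser
nice(a1, correction = "none")  # uncorrected degrees of freedom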

We can also see a warning notifying us that afex has detected that each participant and cell of the design provides more than one observation, which are then automatically aggregated using mean. The warning serves to notify the user in case this was not intended (i.e., when there should be only one observation per participant and cell of the design). The warning can be suppressed by specifying fun_aggregate = mean explicitly in the call to aov_car, as shown below.
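For completeness, the same ANOVA with the aggregation made explicit (and the warning thereby silenced) could be specified as:

aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), 
        fhch, fun_aggregate = mean)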

The formula passed to aov_car basically needs to be the same as for standard aov with a few differences:

  • It must have an Error term specifying the column containing the participant (or unit of observation) identifier (e.g., minimally +Error(id)). This is necessary to allow the automatic aggregation even in designs without repeated-measures factor.
  • Repeated-measures factors only need to be defined in the Error term and do not need to be enclosed by parentheses. Consequently, the following call produces the same ANOVA:
    aov_car(log_rt ~ task + Error(id/length*stimulus), fhch)

     

In addition to aov_car, afex provides two further functions for calculating ANOVAs. These functions produce the same output but differ in how the ANOVA is specified.

  • aov_ez allows the ANOVA specification not via a formula but via character vectors (and is similar to ez::ezANOVA()):
    aov_ez(id = "id", dv = "log_rt", fhch, between = "task", within = c("length", "stimulus"))
  • aov_4 requires a formula for which the id and repeated-measures factors need to be specified as in lme4::lmer() (with the same simplification that repeated-measures factors only need to be specified in the random part):
    aov_4(log_rt ~ task + (length*stimulus|id), fhch)
    aov_4(log_rt ~ task*length*stimulus + (length*stimulus|id), fhch)
    

Follow-up Tests

A common requirement after the omnibus test provided by the ANOVA is some sort of follow-up analysis. For this purpose, afex is fully integrated with lsmeans.

For example, assume we are interested in the significant task:stimulus interaction. As a first step we might want to investigate the marginal means of these two factors:

lsmeans(a1, c("stimulus","task"))
# NOTE: Results may be misleading due to involvement in interactions
#  stimulus task        lsmean         SE    df    lower.CL    upper.CL
#  word     naming -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  naming -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
#  word     lexdec  0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword  lexdec  0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 

From this we can see that naming trials seem to be generally slower (as a reminder, the dv is log-transformed RT in seconds, so values below 0 correspond to RTs between 0 and 1). It also appears that the difference between word and nonword trials is larger in the naming task than in the lexdec task. We test this with the following code using a few different lsmeans functions. We first use lsmeans again, but this time with task as the conditioning variable specified in by. Then we use pairs() to obtain all pairwise comparisons within each conditioning stratum (i.e., level of task). This already provides us with the correct tests, but does not control the family-wise error rate across both tests. To get that, we simply update() the returned results and remove the conditioning by setting by = NULL. In the call to update we can already specify the method for error control, and we specify 'holm' because it is uniformly more powerful than Bonferroni.

# set up conditional marginal means:
(ls1 <- lsmeans(a1, c("stimulus"), by="task"))
# task = naming:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word     -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
# 
# task = lexdec:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word      0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword   0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 
update(pairs(ls1), by=NULL, adjust = "holm")
#  contrast       task      estimate         SE df t.ratio p.value
#  word - nonword naming -0.31424037 0.02080113 43 -15.107  <.0001
#  word - nonword lexdec -0.05309159 0.01860509 43  -2.854  0.0066
# 
# Results are averaged over the levels of: length 
# P value adjustment: holm method for 2 tests

Hmm. These results show that the stimulus effects in both task conditions are independently significant. Obviously, the difference between them must then also be significant, right?

pairs(update(pairs(ls1), by=NULL))
# contrast                              estimate         SE df t.ratio p.value
# wrd-nnwrd,naming - wrd-nnwrd,lexdec -0.2611488 0.02790764 43  -9.358  <.0001

It obviously is. As a reminder, the interaction tests exactly this, the difference of the differences. And we can actually recover the F-value of the interaction using lsmeans alone by invoking yet another of its functions, test(..., joint = TRUE).

test(pairs(update(pairs(ls1), by=NULL)), joint=TRUE)
# df1 df2      F p.value
#   1  43 87.565  <.0001

These last two examples were perhaps not particularly interesting from a statistical point of view, but they show an important ability of lsmeans. Any set of estimated marginal means produced by lsmeans, including any sort of (custom) contrasts, can be used again for further tests or for calculating new sets of marginal means. And with test() we can even obtain joint F-tests over several parameters using joint = TRUE. lsmeans is extremely powerful and one of my most frequently used packages; it basically performs all tests following an omnibus test (and in its latest version it directly interfaces with rstanarm, so it can now also be used for a lot of Bayesian stuff, but that is the topic of another blog post).

Finally, lsmeans can also be used directly for plotting by invoking lsmip:

lsmip(a1, task ~ stimulus)

Note that lsmip does not add error bars to the estimated marginal means, but only plots the point estimates. There are mainly two reasons for this. First, as soon as repeated-measures factors are involved, it is difficult to decide which error bars to plot. Standard error bars based on the standard error of the mean are not appropriate for within-subjects comparisons. For those, one would need to use within-subject intervals (see also here or here). Especially for plots such as the current one with both independent-samples and repeated-measures factors (i.e., mixed within-between or split-plot designs), no error bar will allow comparisons across both dimensions. Second, only ‘if the SE [i.e., standard error] of the mean is exactly 1/2 the SE of the difference of two means — which is almost never the case — it would be appropriate to use overlapping confidence intervals to test comparisons of means’ (lsmeans author Russell Lenth; the link provides an alternative).

We can also use lsmeans in combination with lattice to plot the results on the unconstrained scale (i.e., after back-transforming the data from the log scale to the original scale of response times in seconds). The plot is not shown here.

lsm1 <- summary(lsmeans(a1, c("stimulus","task")))
lsm1$lsmean <- exp(lsm1$lsmean)
require(lattice)
xyplot(lsmean ~ stimulus, lsm1, group = task, type = "b", 
       auto.key = list(space = "right"))

 

Summary

  • afex provides a set of functions that make specifying standard ANOVAs for an arbitrary number of between-subjects (i.e., independent-sample) or within-subjects (i.e., repeated-measures) factors easy: aov_car(), aov_ez(), and aov_4().
  • In its default settings, the afex ANOVA functions replicate the results of commercial statistical packages such as SPSS or SAS (using orthogonal contrasts and Type III sums of squares).
  • Fitted ANOVA models can be passed to lsmeans for follow-up tests, custom contrast tests, and plotting.
  • For specific questions visit the new afex support forum: afex.singmann.science (I think we just need someone to ask the first ANOVA question to get the ball rolling).
  • For more examples see the vignette or here (blog post by Ulf Mertens) or download the full example R script used here.

As a caveat, let me end this post with some cautionary remarks from Douglas Bates (fortunes::fortune(184)), who explains why ANOVA in R is intentionally not the same as in other software packages (i.e., he justifies why it ‘sucks’):

You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape when fitting a single linear model may take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of “give me every possible statistic that could be calculated from this model, whether or not it makes sense”. The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it.
— Douglas Bates (in reply to the suggestion to include type III sums of squares and lsmeans in base R to make it more similar to SAS or SPSS)
R-help (March 2007)

Maybe he is right, but maybe what I have described here is useful to some degree.

Mixed models for ANOVA designs with one observation per unit of observation and cell of the design

Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part obviously focuses on R, specifically on lme4 and afex.

One part of the chapter was supposed to deal with designs that cannot be estimated with the maximal random effects structure justified by the design because there is only one observation per participant and cell of the design. Such designs are the classical repeated-measures ANOVA designs, as ANOVA cannot deal with replicates at the cell level (i.e., those are usually aggregated to yield one observation per cell and unit of observation). Based on my previous thoughts, which turned out to be wrong, we wrote the following:

Random Effects Structures for Traditional ANOVA Designs

The estimation of the maximal model is not possible when there is only one observation per participant and cell of a repeated-measures design. These designs are typically analyzed using a repeated-measures ANOVA. Currently, there are no clear guidelines on how to proceed in such situations, but we will try to provide some advice. If there is only a single random effects grouping factor, for example participants, we feel that instead of a mixed model, it is appropriate to use a standard repeated-measures ANOVA that addresses sphericity violations via the Greenhouse-Geisser correction.

One alternative strategy that employs mixed models and that we do not recommend consists of using the random-intercept-only model or removing the random slopes for the highest within-subject interaction. The resulting model assumes invariance of the omitted random effects across participants. If this assumption is violated, such a model produces results that cannot be trusted. […]

Fortunately, we asked Jake Westfall to take a look at the chapter and Jake responded:

I don’t think I agree with this. In the situation you describe, where we have a single random factor in a balanced ANOVA-like design with 1 observation per unit per cell, personally I am a proponent of the omit-the-highest-level-random-interaction approach. In this kind of design, the random slopes for the highest-level interaction are perfectly confounded with the trial-level error term (in more technical language, the model is only identifiable up to the sum of these two variance components), which is what causes the identifiability problems when one tries to estimate the full maximal model there. (You know all of this of course.) So two equivalent ways to make the model identifiable are to (1) omit the error term, i.e., force the residual variance to be 0, or (2) omit the random slopes for the highest-level interaction. Both of these approaches should (AFAIK) result in a statistically equivalent model, but lme4 does not provide an easy way to do (1), so I generally recommend (2). The important point here is that the standard errors should still be correct in either case — because these two variance components are confounded, omitting e.g. the random interaction slopes simply causes that omitted variance component to be implicitly added to the residual variance, where it is still incorporated into the standard errors of the fixed effects in the appropriate way (because the standard error of the fixed interaction looks roughly like sqrt[(var_error + var_interaction)/n_subjects]). I think one could pretty easily put together a little simulation that would demonstrate this.

Hmm, that sounds very reasonable, but can my intuition about random effects structures and mixed models really be that wrong? To investigate this, I followed Jake's advice and coded a short simulation to test it, and as it turns out, Jake is right and I was wrong.

In the simulation we will simulate a simple repeated-measures design with one factor that has three levels. Importantly, each unit of observation will have only one observation per factor level. We will then fit the simulated data with both a repeated-measures ANOVA and a random-intercept-only mixed model and compare their p-values. Note again that for such a design we cannot estimate random slopes for the condition effect.

First, we need a few packages and set some parameters for our simulation:

require(afex)
set_sum_contrasts() # for orthogonal sum-to-zero contrasts
require(MASS) 

NSIM <- 1e4  # number of simulated data sets
NPAR <- 30  # number of participants per cell
NCELLS <- 3  # number of cells (i.e., groups)

Now we need to generate the data. For this I employed an approach that is clearly not the most parsimonious, but it most clearly follows the formulation of a mixed model that has random variability in the condition effect and, on top of this, residual variance (i.e., the two confounded variance components).

We first create a bare-bones data.frame with a participant id and condition column and a corresponding model.matrix. Then we create the three random parameters (i.e., the intercept and the two parameters for the three conditions) using a zero-centered multivariate normal with a specified variance-covariance matrix. We then loop over the participants and compute the predictions resulting from the three random effects parameters. Only after this do we add uncorrelated residual variance to the observations for each simulated data set.

dat <- expand.grid(condition = factor(letters[seq_len(NCELLS)]),
                   id = factor(seq_len(NPAR)))
head(dat)
#   condition id
# 1         a  1
# 2         b  1
# 3         c  1
# 4         a  2
# 5         b  2
# 6         c  2

mm <- model.matrix(~condition, dat)
head(mm)
#   (Intercept) condition1 condition2
# 1           1          1          0
# 2           1          0          1
# 3           1         -1         -1
# 4           1          1          0
# 5           1          0          1
# 6           1         -1         -1

Sigma_c_1 <- matrix(0.6, NCELLS,NCELLS)
diag(Sigma_c_1) <- 1
d_c_1 <- replicate(NSIM, mvrnorm(NPAR, rep(0, NCELLS), Sigma_c_1), simplify = FALSE)

gen_dat <- vector("list", NSIM)
for(i in seq_len(NSIM)) {
  gen_dat[[i]] <- dat
  gen_dat[[i]]$dv <- NA_real_
  for (j in seq_len(NPAR)) {
    gen_dat[[i]][(j-1)*3+(1:3),"dv"] <- mm[1:3,] %*% d_c_1[[i]][j,]
  }
  gen_dat[[i]]$dv <- gen_dat[[i]]$dv+rnorm(nrow(mm), 0, 1)
}

Now we only need a function that estimates the ANOVA and the mixed model for each data set and returns the p-values, and then we loop over the simulated data sets.

## functions returning p-value for ANOVA and mixed model
within_anova <- function(data) {
  suppressWarnings(suppressMessages(
  a <- aov_ez(id = "id", dv = "dv", data, within = "condition", return = "univariate", anova_table = list(es = "none"))
  ))
  c(without = a[["univariate.tests"]][2,6],
    gg = a[["pval.adjustments"]][1,2],
    hf = a[["pval.adjustments"]][1,4])
}

within_mixed <- function(data) {
  suppressWarnings(
    m <- mixed(dv~condition+(1|id),data, progress = FALSE)  
  )
  c(mixed=anova(m)$`Pr(>F)`)
}

p_c1_within <- vapply(gen_dat, within_anova, rep(0.0, 3))
m_c1_within <- vapply(gen_dat, within_mixed, 0.0)

The following graphs show the results (GG are the results using the Greenhouse-Geisser adjustment for sphericity violations).

ylim <- c(0, 700)
par(mfrow = c(1,3))
hist(p_c1_within[1,], breaks = 20, main = "ANOVA (default)", xlab = "p-value", ylim=ylim)
hist(p_c1_within[2,], breaks = 20, main = "ANOVA (GG)", xlab = "p-value", ylim=ylim)
hist(m_c1_within, breaks = 20, main = "Random-Intercept Model", xlab = "p-value", ylim=ylim)

What these graphs clearly show is that the p-value distributions for the standard repeated-measures ANOVA and the random-intercept mixed model are virtually identical. My intuition was wrong and Jake was right.

We also see that for both the ANOVA and the mixed model the rate of significant findings with p < .05 is slightly above the nominal level. More specifically:

mean(p_c1_within[1,] < 0.05) # ANOVA default
# [1] 0.0684
mean(p_c1_within[2,] < 0.05) # ANOVA GG
# [1] 0.0529
mean(p_c1_within[3,] < 0.05) # ANOVA HF
# [1] 0.0549
mean(m_c1_within < 0.05)     # random-intercept mixed model
# [1] 0.0701

These additional results indicate that one perhaps also needs to adjust the degrees of freedom of mixed models for violations of sphericity. But this is not the topic of today’s post.

To sum up, this simulation shows that removing the highest-order random slope seems to be the right decision if one wants to use a mixed model for a design with one observation per participant and cell of the design, while still implementing the ‘maximal random effects structure’.

One more thing to note: Ben Bolker raised the same issue and pointed us to one of his example analyses of the starling data that is relevant to the current question (alternatively, the more up-to-date Rmd file). We are very grateful that Jake and Ben took the time to go through our chapter!

You can also download the RMarkdown file of the simulation.

rtdists 0.7-2: response time distributions now with Rcpp and faster
(26 May 2017)

It took us quite a while, but we have finally released a new version of rtdists to CRAN which provides a few significant improvements. As a reminder, rtdists

[p]rovides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.

The main reason it took us relatively long to push the new version was that we had a problem with the C code for the diffusion model that we needed to sort out first. Specifically, the CDF (i.e., pdiffusion) in versions prior to 0.5-2 did not produce correct results in many cases (one consequence of this is that the model predictions given in the previous blog post are wrong). As a temporary fix, we resorted to the correct but slow numerical integration of the PDF (i.e., ddiffusion) to obtain the CDF in version 0.5-2 and later. Importantly, it appears as if the error was not present in fast-dm, which is the source of the C code we use. Matthew Gretton carefully investigated the original C code, changed it such that it connects to R via Rcpp, and realized that there are two different variants of the CDF, a fast variant and a precise variant. Up to this point we had only used the fast variant and, as it turns out, this was responsible for our incorrect results. We now use the precise variant by default (which only seems to be marginally slower) as it produces the correct results for all cases we have tested (and we have tested quite a few).
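
The relationship that was used as the temporary fix can also serve as a quick sanity check: the defective CDF should match numerical integration of the defective PDF. A minimal sketch of such a check (parameter values are arbitrary and purely illustrative):

library(rtdists)
# the analytical CDF and the numerically integrated PDF should closely agree
p_direct <- pdiffusion(rt = 1, response = "upper", a = 1, v = 2, t0 = 0.3)
p_integr <- integrate(ddiffusion, lower = 0, upper = 1,
                      response = "upper", a = 1, v = 2, t0 = 0.3)$value
all.equal(p_direct, p_integr, tolerance = 1e-4)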

In addition to a few more minor changes (see NEWS for the full list), we made two more noteworthy changes. First, all diffusion functions as well as rLBA received a major performance update, mainly in situations with trial-wise parameters. It should now almost always be fastest to call the diffusion functions (e.g., ddiffusion) only once with vectorized parameters instead of calling them several times for different sets of parameters. The speed-up with the new version depends on the number of unique parameter sets, but even with only a few different sets it should be clearly noticeable. For completely trial-wise parameters the speed-up should be quite dramatic.
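
To illustrate what this means in practice, here is a small sketch of the calling pattern (the values are arbitrary and this is not a benchmark; the actual speed-up will depend on data and machine):

# one vectorized call with trial-wise drift rates vs. an equivalent loop
rts <- runif(1000, 0.4, 1.5)
vs  <- rnorm(1000, mean = 2, sd = 0.5)  # trial-wise drift rates
d_vec  <- ddiffusion(rts, response = "upper", a = 1, v = vs, t0 = 0.3)
d_loop <- vapply(seq_along(rts), function(i)
  ddiffusion(rts[i], response = "upper", a = 1, v = vs[i], t0 = 0.3), 0.0)
all.equal(d_vec, d_loop)  # identical results, but the single vectorized call is much faster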

Second, I also updated the vignette, which now uses the tidyverse in, I believe, a somewhat more reasonable manner. Specifically, it is now built on nested data (via tidyr::nest) and purrr::map instead of relying heavily on dplyr::do. The problem I had with dplyr::do is that it often leads to somewhat ugly syntax. The changes in the vignette are mainly due to me reading Chapter 25 of the great R for Data Science book by Wickham and Grolemund. However, I still prefer lattice over ggplot2.
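
For readers unfamiliar with that pattern, its general shape is roughly the following (a sketch, not the actual vignette code; fit_one is a hypothetical function that fits one participant’s data):

library(dplyr)
library(tidyr)
library(purrr)
fits <- speed_acc %>%
  group_by(id) %>%
  nest() %>%                         # one row per participant, data in a list-column
  mutate(fit = map(data, fit_one))   # fit each participant's data separately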

Example Analysis

To show the now correct behavior of the diffusion CDF, let me repeat the example from the last post. As a reminder, we somewhat randomly pick one participant from the speed_acc data set and fit both the diffusion model and the LBA to the data.

require(rtdists)

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & 
                   speed_acc$condition == "accuracy" & 
                   speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 


ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, 
            A = pars["A"], 
            b = pars["A"]+pars["b"], 
            t0 = pars["t0"], 
            mean_v = pars[c("v1", "v2")], 
            sd_v = c(1, pars["sv"]), 
            silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), 
                   rt=p11$rt, response=p11$corr)
p11_norm[1:3]
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182940 -2.7409230  1.0449963  0.4513604  0.1243441  0.2609968 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0


ll_diffusion <- function(pars, rt, response) 
{
  densities <- ddiffusion(rt, response=response, 
                          a=pars["a"], 
                          v=pars["v"], 
                          t0=pars["t0"], 
                          sz=pars["sz"], 
                          st0=pars["st0"],
                          sv=pars["sv"])
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_diff <- nlminb(start, ll_diffusion, lower = 0, 
                   rt=p11$rt, response=p11$corr)
p11_diff[1:3]
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727202 0.3385602 0.4621645 0.2017950 1.0551706 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0

As is common, we pass the negative summed log-likelihood to the optimization algorithm (here trusty nlminb) and hence lower values of objective indicate a better fit. Results show that the LBA provides a somewhat better account. The interesting question is whether this somewhat better fit translates into a visibly better fit when comparing observed and predicted quantiles.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, 
                               A = p11_norm$par["A"], 
                               b = p11_norm$par["A"]+p11_norm$par["b"], 
                               t0 = p11_norm$par["t0"], 
                               mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                               sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, 
                          A = p11_norm$par["A"], 
                          b = p11_norm$par["A"]+p11_norm$par["b"], 
                          t0 = p11_norm$par["t0"], 
                          mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                          sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871710 0.5510265 0.6081855 0.6809796 0.8301286
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, 
                        A = p11_norm$par["A"], 
                        b = p11_norm$par["A"]+p11_norm$par["b"], 
                        t0 = p11_norm$par["t0"], 
                        mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                        sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684374 0.5529575 0.6273737 0.7233961 0.9314820


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- pdiffusion(rt = 20,  response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))  
# [1] 0.964723

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, 
                                      response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))
# [1] 0.4748271 0.5489903 0.6081182 0.6821927 0.8444566
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), 
                                    response = "lower",
                                    a=p11_diff$par["a"], 
                                    v=p11_diff$par["v"], 
                                    t0=p11_diff$par["t0"], 
                                    sz=p11_diff$par["sz"], 
                                    st0=p11_diff$par["st0"], 
                                    sv=p11_diff$par["sv"]))
# [1] 0.4776565 0.5598018 0.6305120 0.7336275 0.9770047


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

The fit plot compares observed quantiles (triangles) with predicted quantiles (circles connected by lines). Here we decided to plot the 10%, 30%, 50%, 70%, and 90% quantiles. In each plot, the x-axis shows RTs and the y-axis cumulative probabilities. From this it follows that the upper line and points correspond to the correct trials (which are common) and the lower line and points to the incorrect trials (which are uncommon). For both models the fit looks pretty good, especially for the correct trials. However, it appears that the LBA does a slightly better job of predicting the very fast and slow trials here, which may be responsible for its better fit in terms of summed log-likelihood. In contrast, the diffusion model seems somewhat better at predicting the long tail of the error trials.

Checking the CDF

Finally, we can also check whether the analytical CDF does in fact correspond to the empirical CDF of the data. For this we compare the analytical CDF function pdiffusion to the empirical CDF obtained from random deviates. One thing to be careful about is that pdiffusion provides the ‘defective’ CDF, which only approaches one if one adds the CDFs for both response boundaries. Consequently, to compare the empirical CDF for one response with the analytical CDF, we need to scale the latter to also go from 0 to 1 (simply by dividing it by its maximal value). Here we use the parameter values obtained in the previous fit.

rand_rts <- rdiffusion(1e5, a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"])
plot(ecdf(rand_rts[rand_rts$response == "upper","rt"]))

normalised_pdiffusion <- function(rt, ...) pdiffusion(rt, ...) / pdiffusion(rt = Inf, ...)
curve(normalised_pdiffusion(x, response = "upper",
                            a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"]), 
      add=TRUE, col = "yellow", lty = 2)

This figure shows that the analytical CDF (in yellow) lies perfectly on top of the empirical CDF (in black). If it does not for you, you are still using an old version of rtdists. We have also added a series of unit tests to rtdists that compare the empirical CDF to the analytical CDF (using ks.test) for a variety of parameter values, to catch such a problem should it ever occur again.
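
Those tests are roughly of the following form (a simplified sketch, not the actual test code in the package), reusing the rand_rts and normalised_pdiffusion objects defined above:

# compare random deviates to the normalised analytical CDF via a Kolmogorov-Smirnov test
ks.test(rand_rts[rand_rts$response == "upper", "rt"],
        normalised_pdiffusion, response = "upper",
        a = p11_diff$par["a"], v = p11_diff$par["v"], t0 = p11_diff$par["t0"],
        sz = p11_diff$par["sz"], st0 = p11_diff$par["st0"], sv = p11_diff$par["sv"])
# a large p-value indicates no detectable discrepancy between empirical and analytical CDF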

New Version of rtdists on CRAN (v. 0.4-9): Accumulator Models for Response Time Distributions
(3 April 2016)

I have just submitted a new version of rtdists to CRAN (v. 0.4-9). As I haven’t mentioned rtdists here yet, let me simply copy its description as a short introduction; a longer introduction follows below:

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.

Cognitive models of response time distributions are (usually) bivariate distributions that simultaneously account for choices and corresponding response latencies. The arguably most prominent of these models are the Ratcliff diffusion model and the linear ballistic accumulator (LBA). The main assumption of both is the idea of an internal evidence accumulation process. As soon as the accumulated evidence reaches a specific threshold the corresponding response is invariably given. To predict errors, the evidence accumulation process in each model can reach the wrong threshold (because it is noisy or because of variability in its direction). The central parameters of both models are the quality of the evidence accumulation process (the drift rate) and the position of the threshold. The latter can be voluntarily set by the decision maker, for example to trade off speed and accuracy. Additionally, the models can account for an initial bias towards one response (via the position of the start point) and non-decision processes. To account for differences between the distributions beyond their differential weighting (e.g., fast or slow errors), the models allow trial-by-trial variability of most parameters.

The new version of rtdists provides a completely new interface for the LBA and a considerably overhauled interface for the diffusion model. In addition, the package now provides quantile functions for both models. In line with the general R naming scheme for distribution functions, the density functions start with d (dLBA & ddiffusion), the distribution functions with p (pLBA & pdiffusion), the quantile functions with q (qLBA & qdiffusion), and the random generation functions with r (rLBA & rdiffusion). All main functions are now fully vectorized across all parameters and also across response (i.e., boundary or accumulator).
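
To illustrate the naming scheme, here is a small sketch of the four LBA functions with arbitrary illustrative parameter values (not values estimated from data):

library(rtdists)
dLBA(rt = 0.7, response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(1, 1))  # density
pLBA(rt = 0.7, response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(1, 1))  # CDF
qLBA(p = 0.3,  response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(1, 1))  # quantile
rLBA(n = 10,                 A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(1, 1))  # random generation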

As an example, I will show how to estimate both models for a single-participant data set using trial-wise maximum likelihood estimation (in contrast to the often-used binned chi-square estimation). We will be using one (somewhat randomly picked) participant from the data set that comes as an example with rtdists, speed_acc. Thanks to EJ Wagenmakers for providing this data and allowing it to be published on CRAN. We first prepare the data and plot the response time distribution.

require(rtdists)

require(lattice) # for plotting
lattice.options(default.theme = standard.theme(color = FALSE))
lattice.options(default.args = list(as.table = TRUE))

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & speed_acc$condition == "accuracy" & speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 

densityplot(~rt, p11, group = corr, auto.key=TRUE, plot.points=FALSE, weights = rep(1/nrow(p11), nrow(p11)), ylab = "Density")

[Figure: defective density plot of correct and error response times for participant 11]

The plot obviously does not show the true densities of both response time distributions (which can also be inferred from the warning messages produced by the call to densityplot), but rather the defective densities, for which only the sum of both integrals is one. This shows that there are indeed a lot more correct responses (around 96% of the data) and that the error RTs have quite a long tail.

To estimate the LBA for these data we simply need a wrapper function to which we can pass the RTs and responses and which returns the summed log-likelihood of all data points (actually the negative value of that, because most optimizers minimize by default). This function and the data then only need to be passed to our optimizer of choice (I like nlminb). To make the model identifiable we fix the SD of the drift rate for error RTs to 1 (other choices would be possible). The model converges at a maximum likelihood estimate (MLE) of 211.42 with parameters that look reasonable (note that the boundary b is parametrized as A + b). One might wonder about the negative mean drift rate for error RTs, but the default for the LBA is a normal truncated at zero, so even though the mean is negative, it only produces positive drift rates (negative drift rates could produce undefined RTs).

ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, A = pars["A"], b = pars["A"]+pars["b"], t0 = pars["t0"], mean_v = pars[c("v1", "v2")], sd_v = c(1, pars["sv"]), silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), rt=p11$rt, response=p11$corr)
p11_norm
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182951 -2.7409929  1.0449789  0.4513499  0.1243456  0.2609930 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 57
# 
# $evaluations
# function gradient 
#       76      395 
# 
# $message
# [1] "relative convergence (4)"

We might also want to fit the diffusion model to these data. For this we need a similar wrapper. However, as the diffusion model can fail for certain parameter combinations, the safest way is to wrap the ddiffusion call in a tryCatch call. Note that the diffusion model is already identified, as the diffusion constant is set to 1 internally. Obtaining this fit can take longer than for the LBA and might need a few tries with different random starting values to reach the MLE, which is at 207.55. The lower maximized log-likelihood indicates that the diffusion model provides a somewhat worse account of this data set, but the parameters look reasonable.

ll_diffusion <- function(pars, rt, boundary) 
{
  densities <- tryCatch(ddiffusion(rt, boundary=boundary, a=pars[1], v=pars[2], t0=pars[3], z=0.5, sz=pars[4], st0=pars[5], sv=pars[6]), error = function(e) 0)
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_fit <- nlminb(start, ll_diffusion, lower = 0, rt=p11$rt, boundary=p11$corr)
p11_fit
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727201 0.3385602 0.3499652 0.2017950 1.0551704 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 31
# 
# $evaluations
# function gradient 
#       50      214 
# 
# $message
# [1] "relative convergence (4)"

Finally, we might be interested in assessing the fit of the models graphically in addition to simply comparing their MLEs. Specifically, we will produce a version of a quantile probability plot in which we plot the RTs and cumulative probabilities for the .1, .3, .5, .7, and .9 quantiles and compare the model predictions with the corresponding values from the data. For this we need both the CDFs and the quantile functions. The cumulative probabilities are simply the quantiles for each response; for example, the .1 quantile for the error RTs is .1 times the overall error rate (which is .04166667). Therefore, the first step in obtaining the model predictions is to obtain the predicted error rate by evaluating the CDF at infinity (or a high value). We then use this predicted error rate to get the actual quantiles for each response, which are in turn used to obtain the corresponding predicted RTs via the quantile functions. Finally, we plot predictions and observed data separately for both models.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871709 0.5510265 0.6081855 0.6809797 0.8301290
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684367 0.5529569 0.6273732 0.7233959 0.9314825


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- do.call(pdiffusion, args = c(rt = 20, as.list(p11_fit$par), boundary = "upper")))  
# [1] 0.938958

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "upper"))
# [1] 0.4963608 0.5737010 0.6361651 0.7148225 0.8817063
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "lower"))
# [1] 0.4483908 0.5226722 0.5828972 0.6671577 0.8833553


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

[Figure: quantile probability plots comparing observed and predicted quantiles for the LBA (left) and the diffusion model (right)]

The plot confirms the somewhat better fit of the LBA compared to the diffusion model for this data set; while the LBA provides a basically perfect fit for the correct RTs, the diffusion model is somewhat off, especially for the higher quantiles. However, both models have similar problems predicting the long tail of the error RTs.

Many thanks to my package coauthors, Andrew Heathcote, Scott Brown, and Matthew Gretton, for developing rtdists with me. And also many thanks to Andreas and Jochen Voss for releasing their C code of the diffusion model under the GPL.

Hierarchical MPT in Stan I: Dealing with Divergent Transitions via Control Arguments http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/ http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/#comments Sat, 05 Mar 2016 12:54:12 +0000 http://singmann.org/?p=337 I have recently restarted working with Stan and unfortunately ran into the problem that my (hierarchical) Bayesian models often produced divergent transitions. And when this happens, the warning basically only suggests increasing adapt_delta:

Warning messages:
1: There were X divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
2: Examine the pairs() plot to diagnose sampling problems

However, increasing adapt_delta often does not help, even with values as high as .99. Also, I never found pairs() especially illuminating. This is the first of two blog posts dealing with this issue. In this (the first) post I will show which Stan settings need to be changed to remove the divergent transitions (to foreshadow, these are adapt_delta, stepsize, and max_treedepth). In the next blog post I will show how reparameterizing the model following Stan recommendations can remove divergent transitions, often without the need to fiddle extensively with the sampler settings, while at the same time dramatically improving the fitting speed.

My model had some similarities to the multinomial processing tree (MPT) example in the Lee and Wagenmakers cognitive modeling book. As I am a big fan of both MPTs and the book, I investigated the issue of divergent transitions using this example. Luckily, a first implementation of all the examples of Lee and Wagenmakers in Stan has been provided by Martin Šmíra (who is now working on his PhD in Birmingham) and is part of the Stan example models. I submitted a pull request with the changes to the model discussed here, so they are now also part of the example models (together with a README file discussing those changes).

The example uses the pair-clustering model that is also discussed in the paper that formally introduced MPTs. The model has three parameters: c for cluster storage, r for cluster retrieval, and u for unique storage-retrieval. For the hierarchical structure the model employs the latent-trait approach: The group-level (i.e., hyper-) parameters are estimated separately on the unconstrained space from minus to plus infinity. Individual-level parameters are added to the group means as displacements estimated from a multivariate normal with mean zero and a freely estimated variance-covariance matrix. Only then is the unconstrained space mapped onto the unit range (i.e., 0 to 1), which represents the parameter space, via the probit transformation. This makes it possible to freely estimate the correlations among the individual parameters on the unconstrained space while at the same time constraining the transformed parameters to the allowed range.
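
To make this construction concrete, here is a minimal sketch in R (simulated numbers, not the actual model code or estimates): an individual-level parameter is the group-level mean plus an individual displacement on the unconstrained scale, which is then mapped to the unit interval via the standard normal CDF (pnorm).

set.seed(123)
mu_c    <- 0.3                    # hypothetical group-level mean on the unconstrained scale
delta_c <- rnorm(10, 0, 0.5)      # hypothetical individual displacements
c_i     <- pnorm(mu_c + delta_c)  # individual c parameters, now in (0, 1)
round(c_i, 2)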

The original implementation employed two features that are particularly useful for models estimated via Gibbs sampling (as implemented in JAGS), but not so much for the NUTS sampler implemented in Stan: (a) a scaled inverse Wishart prior for the covariance matrix, chosen for its computational convenience, and (b) parameter expansion to move the scale parameters of the variance-covariance matrix away from zero and ensure reasonable priors.

The original implementation of the model in Stan is simply a literal translation of the JAGS code given in Lee and Wagenmakers. Consequently, it retains the Gibbs-specific features. When fitting this model it seems to produce stable estimates, but Stan reports several divergent transitions after warmup. Given that the estimates seem stable and the results basically replicate what is reported in Lee and Wagenmakers (Figures 14.5 and 14.6), one may wonder why one should not simply trust these results. I can give no full explanation, so let me copy the relevant part from the shinystan help. The last section is the important one: it clearly says not to use the results if there are any divergent transitions.

n_divergent

Quick definition The number of leapfrog transitions with diverging error. Because NUTS terminates at the first divergence this will be either 0 or 1 for each iteration. The average value of n_divergent over all iterations is therefore the proportion of iterations with diverging error.

More details

Stan uses a symplectic integrator to approximate the exact solution of the Hamiltonian dynamics and when stepsize is too large relative to the curvature of the log posterior this approximation can diverge and threaten the validity of the sampler. n_divergent counts the number of iterations within a given sample that have diverged and any non-zero value suggests that the samples may be biased in which case the step size needs to be decreased. Note that, because sampling is immediately terminated once a divergence is encountered, n_divergent should be only 0 or 1.

If there are any post-warmup iterations for which n_divergent = 1 then the results may be biased and should not be used. You should try rerunning the model with a higher target acceptance probability (which will decrease the step size) until n_divergent = 0 for all post-warmup iterations.

My first step in trying to get rid of the divergent transitions was to increase adapt_delta as suggested by the warning. But as said initially, this did not help in this case, even with quite high values such as .99 or .999. Fortunately, the quote above tells us that divergent transitions are related to the stepsize with which the sampler traverses the posterior. stepsize is also one of the control arguments one can pass to Stan in addition to adapt_delta. Unfortunately, the stan help page is relatively uninformative with respect to the stepsize argument and does not even provide its default value; it simply says stepsize (double, positive). Bob Carpenter clarified on the Stan mailing list that the default value is 1 (referring to the CmdStan documentation). He goes on:

The step size is just the initial step size.  It lets the first few iterations move around a bit and set relative scales on the parameters.  It’ll also reduce numerical issues. On the negative side, it will also be slower because it’ll take more steps at a smaller step size before hitting a U-turn.

The adapt_delta (target acceptance rate) determines what the step size will be during sampling — the higher the accept rate, the lower the step size has to be.  The lower the step size, the less likely there are to be divergent (numerically unstable) transitions.

Taken together, this means that divergent transitions can be dealt with by increasing adapt_delta above the default value of .8 while at the same time decreasing the initial stepsize below the default value of 1. As this may increase the necessary number of steps, one might also need to increase max_treedepth above the default value of 10. After trying various values, the following set of control arguments seems to remove all divergent transitions in the example model (at the cost of prolonging the fitting process quite considerably):

control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20)

As this uses rstan, the R interface to Stan, here is the full call:

samples_1 <- stan(model_code=model,   
                  data=data, 
                  init=myinits,  # If not specified, gives random inits
                  pars=parameters,
                  iter=myiterations, 
                  chains=3, 
                  thin=1,
                  warmup=mywarmup,  # Stands for burn-in; Default = iter/2
                  control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20)
)
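
After running the model, we can verify that the divergent transitions are really gone. A quick check (a small sketch assuming the rstan fit object samples_1 from the call above) is to count the post-warmup divergences directly from the sampler parameters:

sampler_params <- rstan::get_sampler_params(samples_1, inc_warmup = FALSE)
# sum the divergent__ indicator over all post-warmup iterations and chains
sum(sapply(sampler_params, function(x) sum(x[, "divergent__"])))
# this should return 0 if the control settings above did their job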

With these values the traceplots of the post-warmup samples look pretty good, even for the sigma parameters, which occasionally have problems moving away from 0. As you can see from these nice plots, rstan uses ggplot2 for its traceplots.

traceplot(samples_1, pars = c("muc", "mur", "muu", "Omega", "sigma", "lp__"))

[Figure: traceplots of the post-warmup samples for muc, mur, muu, Omega, sigma, and lp__]

]]>
http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/feed/ 2 337
" ["response"]=> array(2) { ["code"]=> int(200) ["message"]=> string(2) "OK" } ["cookies"]=> array(0) { } ["filename"]=> NULL ["http_response"]=> object(WP_HTTP_Requests_Response)#5542 (5) { ["response":protected]=> object(WpOrg\Requests\Response)#5543 (10) { ["body"]=> string(1463553) " Henrik Singmann – Computational Psychology http://singmann.org Tue, 23 Jun 2020 12:50:27 +0000 en-US hourly 1 73426105 Install R without support for long doubles (noLD) on Ubuntu http://singmann.org/install-r-without-support-for-long-doubles/ http://singmann.org/install-r-without-support-for-long-doubles/#comments Mon, 22 Jun 2020 20:08:19 +0000 http://singmann.org/?p=894 R packages on CRAN needs to pass a series of technical checks. These checks can also be invoked by any user when running R CMD check on the package tar.gz (to emulate CRAN as much as possible one should also set the --as-cran option when doing so). These checks need to be passed before a package is accepted on CRAN. In addition, these checks are regularly run for each package on CRAN to ensure that new R features or updates of upstream packages do not break the package. Furthermore, CRAN checks regularly become stricter. Thus, keeping a package on CRAN may require regular effort from the package maintainer. Whereas this sometimes can be rather frustrating for the maintainer, partly because of CRAN’s rather short two week limit in case of newly appearing issues, this is one the features that ensures the high technical quality of packages on CRAN.

As an example for the increasingly stricter checks, CRAN now also performs a set of additional checks in addition to the CRAN checks on all R platforms that are shown on a packages check page (e.g., for the MPTmultiverse). These additional checks include tests for memory access errors (e.g., using valgrind), R compiled using alternative compilers, different numerical algebra libraries, but also tests for an R version without support for long doubles (i.e., noLD). It now has happened for the second time that one of my packages showed a problem on the R version without long double support

In my case, the problem on the R version without long double support appeared in the package examples or in the unit tests of the package. Therefore, I did not only want to fix the check issue, I also wanted to understand what was happening. Thus, I needed a working version of R without support for long doubles. Unfortunately, the description of this setup is rather sparse. The only information on CRAN is rather sparse:

tests on x86_64 Linux with R-devel configured using --disable-long-double

Other details as https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-gcc

Similarly sparse information is given in Writing R Extensions:

If you must try to establish a tolerance empirically, configure and build R with –disable-long-double and use appropriate compiler flags (such as -ffloat-store and -fexcess-precision=standard for gcc, depending on the CPU type86) to mitigate the effects of extended-precision calculations.

Unfortunately, my first approach in which I simply tried to add the --disable-long-double option to the R-devel install script failed. After quite a bit of searching I found the solution on the RStudio community forum thanks to David F. Severski. In addition to --disable-long-double one also needs to add --enable-long-double=no to configure. At least on Ubuntu, this successfully compiles an R version without long double support. This can be confirmed with a call to capabilities() in R.

The rest of this post gives a list of all the packages I needed to install on a fresh Ubuntu version (e.g., from here) to successfully compile R in this way. This set of packages should of course also be sufficient for compiling regular R versions. I hope I did not forget any packages, but the list should cover most of them. Feel free to post a comment if something is missing and I will try to update the list.

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install gfortran
sudo apt-get install gcc-multilib
sudo apt-get install gobjc++
sudo apt-get install libpcre2-dev
sudo apt-get install xorg-dev
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libblas-dev
sudo apt-get install texlive-fonts-extra
sudo apt-get install default-jdk
sudo apt-get install aptitude
sudo aptitude install libreadline-dev
sudo apt-get install curl

In addition to the necessary packages, the following packages probably lead to a better R user experience (after installing these a restart may help):

sudo apt-get install xfonts-100dpi 
sudo apt-get install xfonts-75dpi
sudo apt-get install qpdf
sudo apt-get install pandoc
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt-get install git
sudo apt-get install gdebi-core
sudo apt-get install libcairo2-dev
sudo apt-get install libtiff-dev

The last two packages (libcairo2-dev and libtiff-dev) should allow you to add --with-cairo=yes to the configure call below. The gdebi-core package might be needed for installing RStudio.

After this, we should be able to build R. For this, I followed the RStudio instructions for installing multiple R versions in parallel. We begin by setting an environment variable and downloading R.

export R_VERSION=4.0.1

curl -O https://cran.rstudio.com/src/base/R-4/R-${R_VERSION}.tar.gz
tar -xzvf R-${R_VERSION}.tar.gz
cd R-${R_VERSION}

We can then install R (here I set the options for disabling long doubles):

./configure \
    --prefix=/opt/R/${R_VERSION} \
    --enable-R-shlib \
    --with-blas \
    --with-lapack \
    --disable-long-double \
    --enable-long-double=no

make 
sudo make install

To test the installation we can use:

/opt/R/${R_VERSION}/bin/R --version

Finally, we need to create symbolic links:

sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

We can then run R and check the capabilities of the installation:

> capabilities()
       jpeg         png        tiff       tcltk         X11 
      FALSE        TRUE       FALSE       FALSE        TRUE 
       aqua    http/ftp     sockets      libxml        fifo 
      FALSE        TRUE        TRUE        TRUE        TRUE 
     cledit       iconv         NLS     profmem       cairo 
       TRUE        TRUE        TRUE       FALSE       FALSE 
        ICU long.double     libcurl 
       TRUE       FALSE        TRUE

Or shorter:

> capabilities()[["long.double"]]
[1] FALSE
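
As a purely illustrative aside: functions such as sum() normally accumulate in extended precision when long doubles are available, so on a noLD build the last digits of long sums may differ slightly from a regular build. Whether the toy example below actually shows a difference depends on the platform; it is only meant to illustrate the kind of discrepancy the noLD checks are designed to catch.

x <- rep(1/3, 1e6)
print(sum(x), digits = 17)  # last digits may differ between noLD and regular builds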

 

 

 

 

]]>
http://singmann.org/install-r-without-support-for-long-doubles/feed/ 2 894
afex_plot(): Publication-Ready Plots for Factorial Designs http://singmann.org/afex_plot/ http://singmann.org/afex_plot/#respond Tue, 25 Sep 2018 17:44:35 +0000 http://singmann.org/?p=744 I am happy to announce that a new version of afex (version 0.22-1) has appeared on CRAN. This version comes with two major changes; for more details see the NEWS file. To get the new version, including all packages used in the examples, run:

install.packages("afex", dependencies = TRUE)

First, afex does not load or attach package emmeans automatically anymore. This reduces the package footprint and makes it more lightweight. If you want to use afex without using emmeans, you can do this now. The consequence of this is that you have to attach emmeans explicitly if you want to continue using emmeans() et al. in the same manner. Simply add library("emmeans") to the top of your script just below library("afex") and things remain unchanged. Alternatively, you can use emmeans::emmeans() without attaching the package.

Second and more importantly, I have added a new plotting function to afex. afex_plot() visualizes results from factorial experiments, combining estimated marginal means and associated uncertainties (i.e., error bars) in the foreground with a depiction of the raw data in the background. Currently, afex_plot() supports ANOVAs and mixed models fitted with afex as well as mixed models fitted with lme4 (support for more models will come in the next version). As shown in the example below, afex_plot() makes it easy to produce nice-looking plots that are ready to be incorporated into publications. Importantly, afex_plot() allows different types of error bars, including within-subjects confidence intervals, which makes it particularly useful for fields where such designs are very common (e.g., psychology). Furthermore, afex_plot() is built on ggplot2 and designed in a modular manner, making it easy to customize the plot to one's personal preferences.

afex_plot() requires the fitted model object as its first argument and then has three arguments determining which factor or factors are displayed and how:
x is necessary and specifies the factor(s) plotted on the x-axis
trace is optional and specifies the factor(s) plotted as separate lines (i.e., with each factor-level present at each x-axis tick)
panel is optional and specifies the factor(s) which separate the plot into different panels.

The further arguments make it easy to customize the plot in various ways. A comprehensive overview is provided in the new vignette; further details, specifically regarding which types of error bars are supported, are given on the help page (which also has many more examples).
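
To make the role of these three arguments concrete, here is a bare-bones sketch that uses the same example data and model as the full example below (md_12.1 ships with afex); everything besides the display arguments is left at its defaults.

library("afex")
data(md_12.1)
aw <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))
afex_plot(aw, x = "angle", trace = "noise")  # noise as separate lines
afex_plot(aw, x = "angle", panel = "noise")  # noise as separate panels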

Let us look at an example. We take data from a 3 by 2 within-subject experiment that also features prominently in the vignette. Note that we plot within-subjects confidence intervals (by setting error = "within") and then customize the plot quite a bit by changing the theme, using nicer labels, removing some y-axis ticks, adding colour, and using a customized geom (geom_boxjitter from the ggpol package) for displaying the data in the background.

library("afex") 
library("ggplot2") 
data(md_12.1)
aw <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

afex_plot(aw, x = "angle", trace = "noise", error = "within",
          mapping = c("shape", "fill"), dodge = 0.7,
          data_geom = ggpol::geom_boxjitter, 
          data_arg = list(
            width = 0.5, 
            jitter.width = 0,
            jitter.height = 10,
            outlier.intersect = TRUE),
          point_arg = list(size = 2.5), 
          error_arg = list(size = 1.5, width = 0),
          factor_levels = list(angle = c("0°", "4°", "8°"),
                               noise = c("Absent", "Present")), 
          legend_title = "Noise") +
  labs(y = "RTs (in ms)", x = "Angle (in degrees)") +
  scale_y_continuous(breaks=seq(400, 900, length.out = 3)) +
  theme_bw(base_size = 15) + 
  theme(legend.position="bottom", panel.grid.major.x = element_blank())

ggsave("afex_plot.png", device = "png", dpi = 600,
       width = 8.5, height = 8, units = "cm") 

In the plot, the black dots are the means and the thick black lines the 95% within-subject confidence intervals. The raw data are displayed in the background with a half box plot showing the median and the upper and lower quartiles next to the individual data points, which are jittered on the y-axis to avoid perfect overlap.


One final thing to note: in the vignette on CRAN as well as on the help page there is an error in the code. The name of the argument for changing the labels of the factor levels is factor_levels and not new_levels. The vignette linked above and here uses the correct argument name. This is already corrected on GitHub and will be corrected on CRAN with the next release.

]]>
http://singmann.org/afex_plot/feed/ 0 744
Diffusion/Wiener Model Analysis with brms – Part III: Hypothesis Tests of Parameter Estimates http://singmann.org/wiener-model-analysis-with-brms-part-iii/ http://singmann.org/wiener-model-analysis-with-brms-part-iii/#comments Thu, 06 Sep 2018 15:58:49 +0000 http://singmann.org/?p=708 This is the third part of my blog series on fitting the 4-parameter Wiener model with brms. The first part discussed how to set up the data and model. The second part was concerned with (mostly graphical) model diagnostics and the assessment of the adequacy (i.e., the fit) of the model. This third part will inspect the parameter estimates of the model with the goal of determining whether there is any evidence for differences between the conditions. As before, this part is completely self-sufficient and can be run without running the code of Parts I or II.

As I promised in the second part of this series of blog posts, the third part did not take another two months to appear. No, this time it took almost eight months. I apologize for this, but we all know the planning fallacy, and a lot of more important things got in the way (e.g., teaching).

As this part is relatively long, I will provide a brief overview. The next section contains a short explanation for the way in which we will perform hypothesis testing. This is followed by a short section loading some packages and the fitted model object and giving a small recap of the model. After this comes one relatively long section looking at the drift rate parameters in various ways. Then we will take a look at each of the other three parameters in turn. Of special importance will be the subsection on the non-decision time. As described in more detail below, I believe that this parameter cannot be interpreted. In the end, I give a brief overview of some of the limits of the present model and how it could be improved upon.

Bayesian Hypothesis Testing

The goal of this post is to provide evidence for differences in parameter estimates between conditions. This post will present different ways to do so. Importantly, “different ways” is only meant in a technical sense, as in statistical terms we will always do basically the same thing: inspect difference distributions resulting from linear combinations of cell-wise posterior distributions of the group-level model parameter estimates. The somewhat technical phrase “linear combinations of cell-wise posterior distributions” often simply means the difference between two distributions, for example, the difference distribution resulting from subtracting the posterior of the speed condition from the posterior of the accuracy condition.

As a reminder, a posterior distribution is the probability distribution of a parameter conditional on data and model (where the latter includes the parameter priors). It answers the question of which parameter values are likely given our prior knowledge and the data. Therefore, the posterior distribution of a difference answers, for example, which difference values between two conditions are likely or not. With such a difference distribution we can then do two things.

First, we can check whether the x%-highest posterior density (HPD) or credibility interval of this difference distribution includes 0. If 0 is within the 95% HPD interval it could be seen as a plausible value. If 0 is outside the 95% interval we could regard it as not plausible enough and would conclude that there is evidence for a difference.

Second, we can evaluate how much of the difference distribution is on one side of 0. If this value is considerably away from 50%, this constitutes evidence for a difference. For example, if all of the posterior samples for a specific difference are larger than zero, this provides considerable evidence that the difference is above 0.
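
Both checks are easy to perform once one has posterior samples of a difference. Here is a minimal sketch with simulated draws (not the actual model posteriors; a simple quantile-based credible interval stands in for an HPD interval):

set.seed(1)
diff_draws <- rnorm(4000, mean = 0.2, sd = 0.15)  # hypothetical posterior of a difference
quantile(diff_draws, probs = c(0.025, 0.975))     # check 1: does the 95% interval include 0?
mean(diff_draws > 0)                              # check 2: proportion of the difference above 0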

The approach of investigating posterior distributions to gauge differences between conditions is only one approach for hypothesis testing in a Bayesian setting. And, at least in the psychological literature, it is not the most popular one. More specifically, many of the more vocal proponents of Bayesian statistics in the psychological literature advocate hypothesis testing using Bayes factors. One prominent exception to this rule in psychology is maybe John Kruschke, who, however, proposes yet another approach that, like the one used here, is based on posterior distributions. In general, I agree with many of the arguments pro Bayes factors, especially in cases such as the current one in which all relevant hypotheses or competing models are nested within one large (super) model.

The main difficulty when using Bayes factors is their extreme sensitivity to the parameter priors. In a situation with nested models, this is in principle not such a big problem, because one could use Jeffrey’s default prior approach. This approach has been extended to general ANOVA designs (its proponents were surely not the first to have this idea, but they were at least the first to popularize it in psychology). Quentin Gronau and colleagues have applied it to accumulator models, including the diffusion model. The general idea is to reparameterize the model using effect parameters which are normalized using, for example, the residual variance. For example, for a two-sample design one would parameterize the model using a standardized difference such as Cohen’s d. Then it is comparatively easy and uncontroversial to put a prior on the standardized effect-size measure. In the present case, in which the model does not contain a residual variance parameter, one could use the variance estimate of the group-level distribution for each parameter for such a normalization.

Unfortunately, to the best of my knowledge brms does not offer the ability to specify a parameterization and prior distribution in line with Jeffrey’s default Bayes factor. And as far as I remember a discussion I had on this topic with Paul Bürkner some time ago, it is also unlikely brms will ever get this ability. Consequently, I feel that brms is not the right tool for model selection using Bayes factors. Whereas it now offers this ability from a technical side (using our bridgesampling package), it only allows models with an unnormalized parameterization. I believe that such a parameterization is in most cases not appropriate for Bayes factor based model selection, as the priors cannot be specified in a ‘default’ manner. Thus, I cannot recommend brms for Bayes factor based model selection at the moment. In sum, the reason for basing our inferences solely on posterior distributions in the present case is practical constraints and not philosophical considerations.

One final word of caution for the psychological readership. Whereas Bayes factors are clearly extremely popular in psychology, this is not the case in many other scientific disciplines. For example, the patron saint of applied Bayesian statistics, Andrew Gelman, is a self-declared opponent of Bayes factors: “I generally hate Bayes factors myself”. As far as I can see, this disagreement comes from the different types of data different people work with. When working with observational (or correlational) data, as Andrew Gelman usually does, tests of the presence of effects (or of nullity) are either a big no-no (e.g., when wanting to do causal inference) or simply not interesting. We know that the real world is full of relationships, especially small ones, between arbitrary things. So getting effects simply by increasing N is just not interesting and estimation is the more interesting approach. In contrast, for experimental data, we often have true null hypotheses and testing those makes a lot of sense. For example, if Bem was right and there truly were PSI, we could surely exploit this somehow. But as far as we can tell, the effect is truly null. In this case we really need hypothesis testing.

Getting Started

We start with loading some packages for analyzing the posterior. Since the beginning of this series, I have more and more become a fan of the whole tidyverse, so we import it completely. We of course also need brms. As shown below, we will need a few more packages (especially emmeans and tidybayes), but these are only loaded when needed.

library("brms")
library("tidyverse")
theme_set(theme_classic()) # theme for ggplot2
options(digits = 3)

Then we will also need the posterior samples, which we can load in the same way as before from my github page. Note that we need neither the data nor the posterior predictive distribution this time.

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda",
file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))

We begin with looking at the group-level posteriors. An overview of their posterior distributions can be obtained using the summary function.

summary(fit_wiener)
#                                    Estimate Est.Error l-95% CI u-95% CI
# conditionaccuracy:frequencyhigh      -2.944    0.1971   -3.345   -2.562
# conditionspeed:frequencyhigh         -2.716    0.2135   -3.125   -2.299
# conditionaccuracy:frequencynw_high    2.238    0.1429    1.965    2.511
# conditionspeed:frequencynw_high       1.989    0.1785    1.626    2.332
# bs_conditionaccuracy                  1.898    0.1448    1.610    2.186
# bs_conditionspeed                     1.357    0.0813    1.200    1.525
# ndt_conditionaccuracy                 0.323    0.0173    0.289    0.358
# ndt_conditionspeed                    0.262    0.0154    0.232    0.293
# bias_conditionaccuracy                0.471    0.0107    0.449    0.491
# bias_conditionspeed                   0.499    0.0127    0.474    0.524
# Warning message:
# There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
# See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

As a reminder, we have data from a lexical decision task (i.e., participants have to decide whether presented strings are a word or not) and frequency is the factor determining the true status of a string, with high referring to words and nw_high to non-words. Consequently, for the drift rate (the first four rows in the results table) the frequency factor determines the sign of the parameter estimates with the drift rate for words (rows 1 and 2) being clearly negative (i.e., those trials mostly hit the lower boundary for the word decision) and the drift rate for non-words (rows 3 and 4) being clearly positive (i.e., those trials mostly hit the upper boundary for non-word decisions). Furthermore, there could be differences between the drift rates in the accuracy or speed conditions. Specifically, in the speed conditions drift rates seem to be less extreme (i.e., nearer to 0) compared to the accuracy conditions.

The other three parameters only differ by the condition factor. Given the experimental manipulation of accuracy versus speed conditions, we expect differences for the boundary separation, the parameters starting with bs_. For the non-decision time, the parameters starting with ndt_, there also appears to be a small effect, as the 95% intervals only overlap slightly. However, as discussed in detail below, we should be careful in interpreting this difference. Finally, for bias, the parameters starting with bias_, there might be a difference or not. Furthermore, at least in the accuracy condition there appears to be a bias for “word” responses.

One way to test differences between conditions is using the hypothesis function in brms. However, I was not able to get it to work with the current model. I suspect the reason for this is the somewhat unconventional parameterization where each cell gets one parameter (in some sense each cell has its own intercept, but there is no overall intercept). This contrasts with a more “standard” parameterization in which there is one intercept (for either the unweighted means or one of the cells) and the remaining parameters capture the differences between the intercept and the cell means. As a reminder, I chose this unconventional parameterization in the first place to make the specification of the parameter priors easier. Additionally, this is a common parameterization when programming cognitive models by hand.

emmeans and tidybayes: Differences in the Drift Rate

An alternative is to use the great emmeans package by Russell Lenth. I am a huge fan of emmeans and use it all the time when using “normal” statistical models (e.g., ANOVAs, mixed models), independent of whether I use frequentist methods (e.g., via afex) or Bayesian methods (e.g., rstanarm or brms). Unfortunately, it appears as if emmeans at the moment only allows an analysis of the main parameter of the response distribution for models estimated with brms, which in our case is the drift rate. If someone were to extend emmeans to allow using brms models with all parameters, I would be very happy and thankful. In any case, I highly recommend checking out the emmeans vignettes to get an overview of the types of follow-up tests that are possible with this great package.

As I recently learned, emmeans works quite nicely together with tidybayes, a package that enables working with posterior draws within the tidyverse. tidybayes has a surprisingly large package footprint (i.e., it imports quite a lot of other packages) for a package with a comparatively small functionality. I guess this is a consequence of being embedded within the tidyverse. In any case, many of the imported packages are already in the search path thanks to loading the tidyverse above and attaching should not take that long here.

library("emmeans")
library("tidybayes")

We begin with emmeans only to assure ourselves that it works as expected. For this, we get the estimated marginal means plus 95%-highest posterior density (HPD) intervals which match the output of the fixed effects for the estimate of the central tendency (which is the median of the posterior samples in both cases). As a reminder, the fact that the cell estimates match the parameter estimates is of course a consequence of the unusual parameterization which is picked up correctly by emmeans. The lower and upper bounds of the intervals differ slightly between the summary output from brms and emmeans, a consequence of using different ways of calculating the intervals (i.e., quantiles versus HPD intervals).

fit_wiener %>%
  emmeans( ~ condition*frequency) 
#  condition frequency emmean lower.HPD upper.HPD
#  accuracy  high       -2.94     -3.34     -2.56
#  speed     high       -2.72     -3.10     -2.28
#  accuracy  nw_high     2.24      1.96      2.50
#  speed     nw_high     1.99      1.64      2.34
# 
# HPD interval probability: 0.95

Using HPD Intervals And Histograms

As a first test, we are interested in assessing whether there is evidence for a difference between speed and accuracy conditions for both words (i.e., frequency = high) and non-words (i.e., frequency = nw_high). There are many ways to do this with emmeans; one of them is via the by argument and the pairs function.

 

fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs
# frequency = high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed   -0.225   -0.6964     0.256
# 
# frequency = nw_high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed    0.249   -0.0647     0.550
# 
# HPD interval probability: 0.95

Here, we do not have a lot of evidence that there is a difference for either stimulus type, as both HPD intervals include 0.

Instead of getting the summary of the distribution via emmeans, we can also use the capabilities of tidybayes and extract the samples in a tidy way. Then we use one of the convenient aggregation functions coming with tidybayes and aggregate the samples based on the same conditioning variable. After trying a few different options, I have the feeling that emmeans' hpd.summary() function uses the same approach for calculating HPD intervals as tidybayes, as both results match.

samp1 <- fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs %>% 
  gather_emmeans_draws()
samp1 %>% 
  median_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.225 -0.696   0.256   0.95 median hdi      
# 2 accuracy - speed nw_high    0.249 -0.0647  0.550   0.95 median hdi

Instead of the median, we can also use the mode as our point estimate. In the present case the differences between both are not large but noticeable for the word stimuli.

samp1 %>% 
  mode_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.190 -0.696   0.256   0.95 mode   hdi      
# 2 accuracy - speed nw_high    0.252 -0.0647  0.550   0.95 mode   hdi

Further, we might use a different way of calculating HPD intervals. I have the feeling that Rob Hyndman’s hdrcde package provides the most elaborate set of functions for estimating highest density intervals. Consequently, this is what we use next. Note that the package needs to be installed for that.

To use it in a tidy way, we write a short function returning a data.frame in a list. Thus, when called within summarise we get a list-column. Consequently, we have to call unnest to get a nice output.

get_hdi <- function(x, level = 95) {
  tmp <- hdrcde::hdr(x, prob = level)
  list(data.frame(mode = tmp$mode[1], lower = tmp$hdr[1,1], upper = tmp$hdr[1,2]))
}
samp1 %>% 
  summarise(hdi = get_hdi(.value)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower upper
#   <fct>            <fct>      <dbl>   <dbl> <dbl>
# 1 accuracy - speed high      -0.227 -0.712  0.247
# 2 accuracy - speed nw_high    0.249 -0.0616 0.558

The results differ again slightly, but not too much. Perhaps more importantly, there is still no real evidence for a difference in the drift rate between conditions. Even when looking only at 80% HPD intervals there is only evidence for a difference for the non-word stimuli.

samp1 %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower  upper
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>
# 1 accuracy - speed high      -0.212 -0.540  0.0768
# 2 accuracy - speed nw_high    0.246  0.0547 0.442

Because we have the samples in a convenient form, we can now evaluate whether there is any evidence for a drift rate difference between conditions across both word and non-word stimuli. One problem for this is, however, that the direction of the effect differs between words and non-words. This is a consequence of the fact that word stimuli require a response at the lower decision boundary and non-words a response at the upper boundary. Consequently, we need to multiply the effect by -1 for one of the stimulus types. After that, we can take the mean of both conditions. We do this via tidyverse magic and also add the number of values that are aggregated in this way to the table. This is just a precaution to make sure that our logic is correct and we always aggregate exactly two values. As the final check shows, this is the case.

samp2 <- samp1 %>% 
  mutate(val2 = if_else(frequency == "high", -1*.value, .value)) %>% 
  group_by(contrast, .draw) %>% 
  summarise(value = mean(val2),
            n = n())
all(samp2$n == 2)
# [1] TRUE

We can then investigate the resulting difference distribution. One way to do so is in a graphical manner via a histogram. As recommended by Hadley Wickham, it makes sense to play around with the number of bins a bit until the figure looks good. Given that we have quite a large number of samples, 75 bins seemed good to me. With fewer bins there was not enough granularity, with more bins I felt there were too many small peaks.

ggplot(samp2, aes(value)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

This shows that, whereas quite a bit of the posterior mass is to the right of 0, a non-negligible part is still to the left. So there is some evidence for a difference, but it is still not very strong, even when looking at words and non-words together.

We can also investigate this difference distribution via the HPD intervals. To get a better overview we now look at several intervals sizes:

hdrcde::hdr(samp2$value, prob = c(99, 95, 90, 80, 85, 50))
# $`hdr`
#        [,1]  [,2]
# 99% -0.1825 0.669
# 95% -0.0669 0.554
# 90% -0.0209 0.505
# 85%  0.0104 0.471
# 80%  0.0333 0.445
# 50%  0.1214 0.340
# 
# $mode
# [1] 0.225
# 
# $falpha
#    1%    5%   10%   15%   20%   50% 
# 0.116 0.476 0.757 0.984 1.161 1.857 

This shows that only for the 85% interval and smaller intervals is 0 excluded. Note, you can use hdrcde::hdr.den instead of hdrcde::hdr to get a graphical overview of the output.
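
For example (using samp2 from above), the following should produce a density plot of the difference distribution with the corresponding highest density regions marked below it:

hdrcde::hdr.den(samp2$value, prob = c(95, 80, 50))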

Using Bayesian p-values

An approach that requires less arbitrary cutoffs than HPDs (for which we have to define the width) is to calculate the actual proportion of samples below 0:

mean(samp2$value < 0)
# [1] 0.0665

As explained above, if this proportion were small, this would constitute evidence for a difference. Here, the proportion of samples below 0 is .067. Unfortunately, .067 is a bit above the magical cutoff of .05, which is universally accepted as delineating small from big numbers, or perhaps more appropriately, likely from unlikely probabilities.

Let us look at such a proportion a bit more in depth. If two posterior distributions are lying exactly on top of each other, the resulting difference distribution is centered on 0 and exactly 50% of the difference distribution would be on either side of 0. Thus, a proportion of 50% corresponds to the least evidence for a difference, or alternatively, to the strongest evidence for an absence of a difference. One further consequence is that both values near 0 and values near 1 are indicative of a difference, albeit in different directions. To make interpretation of these proportions easier, I suggest always calculating them in such a way that small values represent evidence for a difference (e.g., by subtracting the proportion from 1 if it is above .5).

But what does this proportion tell us exactly? It represents the probability that there is a difference in a specific direction. Thus, it represents one-sided evidence for a difference. In contrast, for a 95% HPD interval we remove 2.5% from each side of the difference distribution. To ensure this proportion has the same two-sided property as our HPD intervals, we need to multiply it by 2. A further benefit of this multiplication is that it stretches the range to the whole probability scale (i.e., from 0 to 1).

Thus, the resulting value is a probability (i.e., ranging from 0 to 1), with values near zero denoting evidence for a difference and values near one providing some evidence against a difference. In contrast to a classical p-value it is therefore a continuous measure of evidence for (when near 0) or against (when near 1) a difference between the parameter estimates. Given its superficial similarity with classical p-values (i.e., low values are seen as evidence for a difference), we could call it a version of a Bayesian p-value, or pB for short. In the present case we could say: The pB value for a difference between speed and accuracy conditions in drift rate across word and non-word stimuli is .13, indicating that the evidence for a difference is at best weak.
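
In code, this two-sided pB can be obtained from the difference distribution samp2 used above by folding the proportion onto the lower side and doubling it, as described above:

p_one_sided <- mean(samp2$value < 0)          # proportion of the difference below 0 (.0665)
p_B <- 2 * min(p_one_sided, 1 - p_one_sided)  # two-sided Bayesian p-value
p_B
# [1] 0.133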

Bayesian p-values of course allow us to misuse them in the same way that we can misuse classical p-values, for example, by introducing arbitrary cutoff values such as .05. Imagine for a second that we are interested in testing whether there are differences in the absolute amount of evidence, as measured via drift rate, for any of the four cells of the design (I am not suggesting that this is particularly sensible). For this, we would have to transform the posteriors for all drift rates onto the same side (note that we do not want to take the absolute values, as we still want to retain the information whether a posterior switches from positive to negative drift rates or the other way around), for example, by multiplying the drift rates for words by -1. We do so and then inspect the cell means.

samp3 <- fit_wiener %>%
  emmeans( ~ condition*frequency) %>% 
  gather_emmeans_draws() %>% 
  mutate(.value = if_else(frequency == "high", -1 * .value, .value),
         intera = paste(condition, frequency, sep = ".")) 
samp3 %>% 
  mode_hdi(.value)
# # A tibble: 4 x 8
# # Groups:   condition [2]
#   condition frequency .value .lower .upper .width .point .interval
#   <fct>     <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy  high        2.97   2.56   3.34   0.95 mode   hdi      
# 2 accuracy  nw_high     2.25   1.96   2.50   0.95 mode   hdi      
# 3 speed     high        2.76   2.28   3.10   0.95 mode   hdi      
# 4 speed     nw_high     2.00   1.64   2.34   0.95 mode   hdi

Inspection of the four cell means suggests that the drift rate values for words are larger than the values for non-words.

To get an overview of all pairwise differences using an arbitrary cut-off value, I have written two functions that return a compact letter display of all pairwise comparisons. The functions require the data in the wide format, with each column representing the draws for one parameter. Note that the compact letter display is calculated via another package, multcompView, which needs to be installed before using these functions.

get_p_matrix <- function(df, only_low = TRUE) {
  # df: one column of posterior draws per parameter
  # pre-define matrix holding the pairwise proportions
  out <- matrix(-1, nrow = ncol(df), ncol = ncol(df), 
                dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(ncol(df))) {
    for (j in seq_len(ncol(df))) {
      out[i, j] <- mean(df[, i] < df[, j]) 
    }
  }
  # express proportions such that small values indicate a difference
  if (only_low) out[out > .5] <- 1 - out[out > .5]
  out
}

cld_pmatrix <- function(df, level = 0.05) {
  # compact letter display based on the matrix of pairwise proportions
  p_matrix <- get_p_matrix(df)
  lp_matrix <- (p_matrix < (level/2) | p_matrix > (1 - (level/2)))
  cld <- multcompView::multcompLetters(lp_matrix)$Letters
  cld
}
samp3 %>% 
  ungroup() %>% ## to get rid of unneeded columns
  select(.value, intera, .draw) %>% 
  spread(intera, .value) %>% 
  select(-.draw) %>% ## we need to get rid of all columns not containing draws
  cld_pmatrix()
# accuracy.high accuracy.nw_high       speed.high    speed.nw_high 
#           "a"              "b"              "a"              "b"

In a compact letter display, conditions that share a common letter do not differ according to the criterion, whereas conditions that do not share a common letter do differ according to the criterion. Here, the compact letter display is not super informative and just recovers what we have seen above: the drift rates for the words form one group and the drift rates for the non-words form another group. In cases with more conditions or more complicated difference patterns, compact letter displays can be quite informative.

We could have also used the functionality of tidybayes to inspect all pairwise comparisons. Note that it is important to use ungroup before invoking the compare_levels function. Otherwise we get an error that is difficult to understand (the grouping appears to be a consequence of using emmeans).

samp3 %>% 
  ungroup %>% 
  compare_levels(.value, by = intera) %>% 
  mode_hdi()
# # A tibble: 6 x 7
#   intera                           .value  .lower  .upper .width .point .interval
#   <fct>                             <dbl>   <dbl>   <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy.nw_high - accuracy.high -0.715 -1.09   -0.351    0.95 mode   hdi      
# 2 speed.high - accuracy.high       -0.190 -0.696   0.256    0.95 mode   hdi      
# 3 speed.nw_high - accuracy.high    -0.946 -1.46   -0.526    0.95 mode   hdi      
# 4 speed.high - accuracy.nw_high     0.488  0.0879  0.876    0.95 mode   hdi      
# 5 speed.nw_high - accuracy.nw_high -0.252 -0.550   0.0647   0.95 mode   hdi      
# 6 speed.nw_high - speed.high       -0.741 -1.12   -0.309    0.95 mode   hdi

Differences in Other Parameters

As discussed above, to look at the differences in the other parameters we apparently cannot use emmeans anymore. Luckily, tidybayes still offers the possibility to extract the posterior samples in a tidy way using either gather_draws or spread_draws. It appears that for either of those you need to pass the specific variable names you want to extract. We get them via get_variables:

get_variables(fit_wiener)[1:10]
# [1] "b_conditionaccuracy:frequencyhigh"    "b_conditionspeed:frequencyhigh"      
# [3] "b_conditionaccuracy:frequencynw_high" "b_conditionspeed:frequencynw_high"   
# [5] "b_bs_conditionaccuracy"               "b_bs_conditionspeed"                 
# [7] "b_ndt_conditionaccuracy"              "b_ndt_conditionspeed"                
# [9] "b_bias_conditionaccuracy"             "b_bias_conditionspeed"

Boundary Separation

We will use spread_draws to analyze the boundary separation. First we extract the draws and then immediately calculate the difference distribution between both.

samp_bs <- fit_wiener %>%
  spread_draws(b_bs_conditionaccuracy, b_bs_conditionspeed) %>% 
  mutate(bs_diff = b_bs_conditionaccuracy - b_bs_conditionspeed)
samp_bs
# # A tibble: 2,000 x 6
#    .chain .iteration .draw b_bs_conditionaccuracy b_bs_conditionspeed bs_diff
#     <int>      <int> <int>                  <dbl>               <dbl>   <dbl>
#  1      1          1     1                   1.73                1.48   0.250
#  2      1          2     2                   1.82                1.41   0.411
#  3      1          3     3                   1.80                1.28   0.514
#  4      1          4     4                   1.85                1.42   0.424
#  5      1          5     5                   1.86                1.37   0.493
#  6      1          6     6                   1.81                1.36   0.450
#  7      1          7     7                   1.67                1.34   0.322
#  8      1          8     8                   1.90                1.47   0.424
#  9      1          9     9                   1.99                1.20   0.790
# 10      1         10    10                   1.76                1.19   0.569
# # ... with 1,990 more rows

Now we can of course use the same tools as above. For example, look at the histogram. Here, I again chose 75 bins.

samp_bs %>% 
  ggplot(aes(bs_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

The histogram reveals pretty convincing evidence for a difference. It appears as if only two samples are below 0. We confirm this suspicion and then calculate the Bayesian p-value. As it turns out, it is also extremely small.

sum(samp_bs$bs_diff < 0)
# [1] 2
mean(samp_bs$bs_diff < 0) *2
# [1] 0.002

All in all we can be pretty confident that manipulating speed versus accuracy conditions affects the boundary separation in the current data set. Exactly as expected.

Non-Decision Time

For assessing differences in the non-decision time, we use gather_draws. One benefit of this function compared to spread_draws is that it makes it easy to obtain the marginal estimates. As already said above, the HPD intervals overlap only very little, suggesting that there is a difference between the conditions. We save the resulting marginal estimates for later in a new data.frame called ndt_mean.

samp_ndt <- fit_wiener %>%
  gather_draws(b_ndt_conditionaccuracy, b_ndt_conditionspeed) 
(ndt_mean <- samp_ndt %>% 
  median_hdi())
# # A tibble: 2 x 7
#   .variable               .value .lower .upper .width .point .interval
#   <chr>                    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 b_ndt_conditionaccuracy  0.323  0.293  0.362   0.95 median hdi      
# 2 b_ndt_conditionspeed     0.262  0.235  0.295   0.95 median hdi

To evaluate the difference, the easiest approach to me seems again to be to spread the two variables into separate columns and then calculate the difference (i.e., similar to starting with spread_draws in the first place). We can then again plot the resulting difference distribution.

samp_ndt2 <- samp_ndt %>% 
  spread(.variable, .value) %>% 
  mutate(ndt_diff = b_ndt_conditionaccuracy - b_ndt_conditionspeed)  

samp_ndt2 %>% 
  ggplot(aes(ndt_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

As previously speculated, there appears to be strong evidence for a difference. We can further confirm this via the Bayesian p-value:

mean(samp_ndt2$ndt_diff < 0) * 2
# [1] 0.005

So far this looks as if we found another clear difference in parameter estimates due to the manipulation. But this conclusion would be premature. In fact, investigating the non-decision time from the 4-parameter Wiener model estimated in this way is completely misleading. Instead of capturing a meaningful feature of the response time distribution, the non-decision time parameter is only sensitive to very few data points. Specifically, the non-decision time basically only reflects a specific feature of the distribution of minimum response times per participant and per condition or cell for which it is estimated. I will demonstrate this in the following for our example data.

We first need to load the data in the same manner as in the previous posts. We then calculate the minimum RTs per participant and condition.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
min_val <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = min(rt))

To investigate the problem, we want to graphically compare the distribution of minimum RTs with the estimates for the non-decision time. For this, we need to add a condition column with matching condition names to the ndt_mean data.frame created above. Then, we can plot both into the same plot. We also add several summary statistics regarding the distribution of individual minimum RTs. Specifically, the black points show the individual minimum RTs for each of the two conditions; the blue + shows the median and the blue x the mean of the individual minimum RTs; the blue circle shows the midpoint between the largest and smallest value of the minimum RT distributions; the red square shows the point estimate of the non-decision time parameter with corresponding 95% HPD intervals.

ndt_mean$condition <- c("accuracy", "speed")

ggplot(min_val, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  geom_pointrange(data = ndt_mean, 
                  aes(y = .value, ymin = .lower, ymax = .upper), 
                  shape = 15, size = 1, color = "red") +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point")

What this graph rather impressively shows is that the estimate of the non-decision time almost perfectly matches the midpoint between the largest and smallest minimum RT (i.e., the blue circle). Let us put this in perspective by comparing the number of minimum data points (i.e., the number of participants) to the total number of data points.

speed_acc %>% 
  group_by(condition) %>% 
  summarise(n())
# # A tibble: 2 x 2
#   condition `n()`
#   <fct>     <int>
# 1 accuracy   5221
# 2 speed      5241

length(unique(speed_acc$id))
# [1] 17

17 / 5000
# [1] 0.0034

This shows that the non-decision time parameter, one of only four model parameters, is essentially completely determined by less than .5% of the data. If any of these minimum RTs is an outlier (which at least in the accuracy condition seems likely) a single response time can have an immense influence on the parameter estimate. In other words, it can hardly be assumed that with the current implementation the non-decision time parameter reflects an actual latent process. Instead, it simply reflects the midpoint between smallest and largest minimum RT per participant and condition, slightly weighted toward the mass of the distribution of minimum RTs. This parameter estimate should not be used to draw substantive conclusions.

In the present case, this confound does not appear to be too consequential. If only one of the data points in the accuracy condition is an outlier and the other data points are faithful representatives of the leading edge of the response time distribution (which is essentially what the non-decision time is supposed to capture), the current parameter estimates underestimate the true difference. Using a more robust ad-hoc measure of the leading edge, specifically the 10% trimmed mean of the 40 fastest RTs per participant and condition plotted below, further supports this conclusion. This graph also does not contain any clear outliers anymore. For reference, the non-decision time estimates are still included. Nevertheless, having a parameter be essentially driven by very few data points seems completely at odds with the general idea of cognitive modeling and the interpretation of non-decision times obtained with such a model cannot be recommended.

min_val2 <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = mean(sort(rt)[1:40], trim = 0.1))

ggplot(min_val2, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point") +
  geom_point(data = ndt_mean, aes(y = .value), shape = 15, 
             size = 2, color = "red")

It is important to note that this confound does not hold for all implementations of the diffusion model, but is specific to the 4-parameter Wiener model as implemented here. There are solutions for avoiding this problem, two of which I want to list here. First, one could add across-trial variability in the non-decision time. This variability is often assumed to come from a uniform distribution, which can capture outliers at the leading edge of the response time distribution. Second, instead of fitting only a diffusion model, one could assume that some of the responses are contaminants coming from a different process, for example random responses from a uniform distribution ranging from the absolute minimum to maximum RT. Technically, this would constitute a mixture model between the diffusion process and a uniform distribution with either a free or fixed mixture/contamination rate (e.g., ). It should be relatively easy to implement such a mixture model via a custom_family in brms and I hope to find the time to do that at some later point.
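
To illustrate the general idea (not the brms custom_family implementation alluded to above), such a mixture density could look roughly as follows. This is only a sketch: it uses rtdists::ddiffusion for the Wiener part, and the function and argument names (mix_density, p_cont, rt_min, rt_max) are made up for illustration.

# density of a Wiener/uniform contaminant mixture: with probability p_cont a
# response is a contaminant with a uniform RT between rt_min and rt_max (and
# either response equally likely), otherwise it comes from the Wiener process
mix_density <- function(rt, response, a, v, t0, p_cont, rt_min, rt_max) {
  (1 - p_cont) * rtdists::ddiffusion(rt, response = response, 
                                     a = a, v = v, t0 = t0) +
    p_cont * dunif(rt, min = rt_min, max = rt_max) / 2
}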

I am of course not the first one to discover this behavior of the 4-parameter Wiener model (see e.g., ). However, this problem seems especially prevalent in a Bayesian setting as the 4-parameter model variant is readily available and model variants appropriately dealing with this problem are not. Some time ago I asked Chris Donkin and Greg Cox what they thought would be the best way to address this issue and the one thing I remember from this discussion was Chris’ remark that, when he uses the 4-parameter Wiener model, he simply ignores the non-decision time parameter. That still seems like the best course of action to me.

I hope there are not too many papers out there that use the 4-parameter model in such a way and interpret differences in the non-decision time parameter. If you know of one, I would be interested to learn about it. Either write me a mail or post it in the comments below.

Starting Point / Bias

Finally, we can take a look at the starting point or bias. We do this again using spread_draws and then plot the resulting difference distribution.

samp_bias <- fit_wiener %>%
  spread_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  mutate(bias_diff = b_bias_conditionaccuracy - b_bias_conditionspeed)
samp_bias %>% 
  ggplot(aes(bias_diff)) +
  geom_histogram(bins = 100) +
  geom_vline(xintercept = 0)

The difference distribution suggests there might be a difference. Consequently, we calculate the Bayesian p-value next. Note that we calculate the difference in the other direction this time so that evidence for a difference is represented by small values.

mean(samp_bias$bias_diff > 0) *2
# [1] 0.046

We get lucky and our Bayesian p-value is just below .05, encouraging us to believe that the difference is real. To round this up, we again take a look at the estimates:

fit_wiener %>%
  gather_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 4
#   .variable                 mode lower upper
#   <chr>                    <dbl> <dbl> <dbl>
# 1 b_bias_conditionaccuracy 0.470 0.457 0.484
# 2 b_bias_conditionspeed    0.498 0.484 0.516

Together with the evidence for a difference we can now postulate in a more confident manner that for the accuracy condition there is a bias toward the lower boundary and the “word” responses, whereas evidence accumulation starts unbiased in the speed condition.

Closing Words

This third part wraps up the most important steps in a diffusion model analysis with brms. Part I shows how to set up the model, Part II shows how to evaluate the adequacy of the model, and the present Part III shows how to inspect the parameters and test hypotheses about them.

As I have mentioned quite a bit throughout these parts, the model used here is not the full diffusion model, but the 4-parameter Wiener model. Whereas this makes estimation possible in the first place, it comes with a few problems. One of them was discussed at length in the present part. The estimate of the non-decision time parameter essentially captures a feature of the distribution of minimum RTs. If these are contaminated by responses that cannot be assumed to come from the same process as the other responses (which I believe a priori to be quite likely), the estimate becomes rather meaningless. My take away from this is that I would not interpret these estimates at all. I feel that the dangers outweigh the benefits by far.

Another feature of the 4-parameter Wiener model is that, in the absence of a bias for any of the response options, it predicts equal mean response times for correct and error responses. This is perhaps the main theoretical constraint which has led to the development of many of the more highly parameterized model variants, such as the full (i.e., 7-parameter) diffusion model. An overview of this issue can, for example, be found in . They write (p. 335):

Depending on the experimental manipulation, RTs for errors are sometimes shorter than RTs for correct responses, sometimes longer, and sometimes there is a crossover in which errors are slower than correct responses when accuracy is low and faster than correct responses when accuracy is high. The models must be capable of capturing all these aspects of a data set.

For the present data we find a specific pattern that is often seen as typical. As shown below, error RTs are quite a bit slower than correct RTs in the accuracy condition. This effect cannot be found in the speed condition where, if anything, error RTs are faster than correct RTs.

speed_acc %>% 
  mutate(correct = stim_cat == response) %>% 
  group_by(condition, correct, id) %>% 
  summarise(mean = mean(rt), 
            se = mean(rt)/sqrt(n())) %>% 
  summarise(mean = mean(mean),
            se = mean(se))
# # A tibble: 4 x 4
# # Groups:   condition [?]
#   condition correct  mean     se
#   <fct>     <lgl>   <dbl>  <dbl>
# 1 accuracy  FALSE   0.751 0.339 
# 2 accuracy  TRUE    0.693 0.0409
# 3 speed     FALSE   0.491 0.103 
# 4 speed     TRUE    0.513 0.0314

Given this difference in the relative speeds of correct and error responses in the accuracy condition, it may seem unsurprising that the accuracy condition is also the one in which we have a measurable bias, specifically a bias towards the word responses. However, as can be seen by adding stim_cat to the group_by call above (a sketch of this is shown below), the difference in the relative speed of correct and error responses is particularly pronounced for non-words, where a bias toward words should lead to faster errors. Thus, it appears that some of the more subtle effects in the data are not fully accounted for in the current model variant.
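
For completeness, here is a sketch of the aggregation just described with stim_cat added to the grouping (output omitted; the column names are those used throughout this post):

speed_acc %>% 
  mutate(correct = stim_cat == response) %>% 
  group_by(condition, stim_cat, correct, id) %>% 
  summarise(mean = mean(rt)) %>% 
  summarise(mean = mean(mean))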

The canonical way of dealing with differences in the relative speed of errors in diffusion modeling is via across-trial variabilities in the model parameters (see ). Variability in the starting point (introduced by Laming, 1968) allows error RTs to be faster than correct RTs. Variability in the drift rate (introduced by ) allows error RTs to be slower than correct RTs. (As discussed above, variability in the non-decision time allows its parameter estimates to be less influenced by contaminants or individual outliers.) However, as described below, introducing these variabilities in a Bayesian framework comes with its own problems. Furthermore, there is a recent discussion of the value of these variabilities from a measurement standpoint.

Possible Future Extensions

Whereas this series comes to an end here, there are a few further things that seem either important, interesting, or viable. Maybe I will have some time in the future to talk about these as well, but I suggest not expecting them anytime soon.

  • One important thing we have not yet looked at is the estimates of the group-level parameters (i.e., standard deviations and correlations). They may contain important information about the specific data set and research question, but also about the tradeoffs of the model parameters.

  • Replacing the pure Wiener process with a mixture between a Wiener and a uniform distribution to be able to interpret the non-decision time. As written above, this should be doable with a custom_family in brms.

  • As described above, one of the driving forces for modern response time models, such as the 7-parameter diffusion model, were differences in the relative speed of error and correct RTs. These are usually explained via variabilities in the model parameters. One relatively straightforward way to implement these variabilities in a Bayesian setting would be via the hierarchical structure. For example, each participant gets a by-trial random intercept for the drift rate, + (0+id||trial) (the double-bar notation should ensure that these are uncorrelated across participants). Whereas this sounds conceptually simple, I doubt such a model would converge in a reasonable timeframe. Furthermore, as shown by , a model in which the shape of the variability distribution is essentially unconstrained (as is the case when only constraining it via the prior, as suggested here) is not testable. The model becomes unfalsifiable as it can predict any data pattern. Given the importance of this approach from a theoretical point of view, it nevertheless seems to be an extremely important angle to explore.

  • Fitting the Wiener model takes quite a lot of time. It would be interesting to compare the fit using full Bayesian inference (i.e., sampling as done here) with variational Bayes (i.e., parametric approximation of the posterior), which is also implemented in Stan. I expect that it does not work that well, but the comparison would still be interesting. Recently, diagnostics for variational Bayes were introduced.

  • The diffusion model is of course only one model for response time data. A popular alternative is the LBA. I know there are some implementations in Stan out there, so if they could be accessed via brms, this would be quite interesting.

The RMarkdown file for this post is available here.

Diffusion/Wiener Model Analysis with brms – Part II: Model Diagnostics and Model Fit

This is the considerably belated second part of my blog series on fitting diffusion models (or better, the 4-parameter Wiener model) with brms. The first part discusses how to set up the data and model. This second part is concerned with perhaps the most important steps in each model-based data analysis: model diagnostics and the assessment of model fit. Note that the code in this part is completely self-sufficient and can be run without running the code of part I.

Setup

At first, we load quite a few packages that we will need along the way. Obviously brms, but also some of the packages from the tidyverse (i.e., dplyr, tidyr, tibble, and ggplot2). It took me a little time to jump on the tidyverse bandwagon, but now that I use it more and more I cannot deny its utility. If your data can be made 'tidy', the coherent set of tools offered by the tidyverse makes many seemingly complicated tasks pretty easy. A few examples of this will be shown below. If you need more of an introduction, I highly recommend the awesome 'R for Data Science' book by Grolemund and Wickham, which they have made available for free! We also need gridExtra for combining plots and DescTools for the concordance correlation coefficient (CCC) used below.

library("brms")
library("dplyr")
library("tidyr")
library("tibble")    # for rownames_to_column
library("ggplot2")
library("gridExtra") # for grid.arrange
library("DescTools") # for CCC

As in part I, we need package rtdists for the data.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1

I have uploaded the binary R data file containing the fitted model object as well as the generated posterior predictive distributions to GitHub, from which we can download them directly into R. Note that I needed to go via a temporary folder; if there is a way to avoid that, I would be happy to learn about it.

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda", 
              file.path(tmp, "brms_wiener_example_fit.rda"))
download.file("https://singmann.github.io/files/brms_wiener_example_predictions.rda", 
              file.path(tmp, "brms_wiener_example_predictions.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_predictions.rda"))

Model Diagnostics

We already know from part I that there are a few divergent transitions. If this were a real analysis, we therefore would not be satisfied with the current fit and would try to rerun brm with an increased adapt_delta in the hope that this removes the divergent transitions. The Stan warning guidelines clearly state that "the validity of the estimates is not guaranteed if there are post-warmup divergences". However, it is unclear what the actual impact of the small number of divergent transitions (< 10) observed here is on the posterior. Also, it is unclear what one can do if adapt_delta cannot be increased anymore and the model also cannot be reparameterized. Should all fits with any divergent transitions be completely disregarded? I hope the Stan team provides more guidelines on such questions in the future.
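
For completeness, such a rerun would presumably look roughly like the following sketch (not evaluated here, as it requires sampling the full model again):

# refit with a stricter adapt_delta in the hope of removing the divergent
# transitions; brms should be able to reuse the already compiled model
fit_wiener2 <- update(fit_wiener, control = list(adapt_delta = 0.99))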

Coming back to our fit, as a first step in our model diagnostics we check the R-hat statistic as well as the number of effective samples. Specifically, we look at the parameters with the highest R-hat values and the lowest numbers of effective samples.

tail(sort(rstan::summary(fit_wiener$fit)$summary[,"Rhat"]))
#                      sd_id__conditionaccuracy:frequencyhigh 
#                                                        1.00 
#                              r_id__bs[15,conditionaccuracy] 
#                                                        1.00 
#                                    b_bias_conditionaccuracy 
#                                                        1.00 
# cor_id__conditionspeed:frequencyhigh__ndt_conditionaccuracy 
#                                                        1.00 
#                                   sd_id__ndt_conditionspeed 
#                                                        1.00 
#  cor_id__conditionspeed:frequencynw_high__bs_conditionspeed 
#                                                        1.01 
head(sort(rstan::summary(fit_wiener$fit)$summary[,"n_eff"]))
#                                     lp__ 
#                                      462 
#        b_conditionaccuracy:frequencyhigh 
#                                      588 
#                sd_id__ndt_conditionspeed 
#                                      601 
#      sd_id__conditionspeed:frequencyhigh 
#                                      646 
#           b_conditionspeed:frequencyhigh 
#                                      695 
# r_id[12,conditionaccuracy:frequencyhigh] 
#                                      712

Both are unproblematic (i.e., R-hat < 1.05 and n_eff > 100) and suggest that the sampler has converged on the stationary distribution. If anyone has a similar oneliner to return the number of divergent transitions, I would be happy to learn about it.
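
One option that seems to work (though there may be more elegant ones, e.g., rstan::get_num_divergent in more recent rstan versions) is to extract the sampler parameters from the underlying stanfit object directly:

# count post-warmup divergent transitions across all chains
sum(sapply(rstan::get_sampler_params(fit_wiener$fit, inc_warmup = FALSE),
           function(x) sum(x[, "divergent__"])))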

We also visually inspect the chain behavior of a few semi-randomly selected parameters.

pars <- parnames(fit_wiener)
pars_sel <- c(sample(pars[1:10], 3), sample(pars[-(1:10)], 3))
plot(fit_wiener, pars = pars_sel, N = 6, 
     ask = FALSE, exact_match = TRUE, newpage = TRUE, plot = TRUE)

This visual inspection confirms the earlier conclusion. For all parameters the posteriors look well-behaved and the chains appear to mix well.

Finally, in the literature there are some discussions about parameter trade-offs for the diffusion and related models. These trade-offs supposedly make fitting the diffusion model in a Bayesian setting particularly complicated. To investigate whether fitting the Wiener model with HMC as implemented in Stan (i.e., NUTS) also shows this pattern, we take a look at the joint posterior of the fixed effects of the main Wiener parameters for the accuracy condition. For this we use the stanfit method of the pairs function and set the condition to "divergent__". This plots the few divergent transitions above the diagonal and the remaining samples below the diagonal.

pairs(fit_wiener$fit, pars = pars[c(1, 3, 5, 7, 9)], condition = "divergent__")

This plot shows some correlations, but nothing too dramatic. HMC appears to sample quite efficiently from the Wiener model.

Next we also take a look at the correlations across all parameters (not only the fixed effects).

posterior <- as.mcmc(fit_wiener, combine_chains = TRUE)
cor_posterior <- cor(posterior)
cor_posterior[lower.tri(cor_posterior, diag = TRUE)] <- NA
cor_long <- as.data.frame(as.table(cor_posterior))
cor_long <- na.omit(cor_long)
tail(cor_long[order(abs(cor_long$Freq)),], 10)
#                              Var1                         Var2   Freq
# 43432        b_ndt_conditionspeed  r_id__ndt[1,conditionspeed] -0.980
# 45972 r_id__ndt[4,conditionspeed] r_id__ndt[11,conditionspeed]  0.982
# 46972        b_ndt_conditionspeed r_id__ndt[16,conditionspeed] -0.982
# 44612        b_ndt_conditionspeed  r_id__ndt[6,conditionspeed] -0.983
# 46264        b_ndt_conditionspeed r_id__ndt[13,conditionspeed] -0.983
# 45320        b_ndt_conditionspeed  r_id__ndt[9,conditionspeed] -0.984
# 45556        b_ndt_conditionspeed r_id__ndt[10,conditionspeed] -0.985
# 46736        b_ndt_conditionspeed r_id__ndt[15,conditionspeed] -0.985
# 44140        b_ndt_conditionspeed  r_id__ndt[4,conditionspeed] -0.990
# 45792        b_ndt_conditionspeed r_id__ndt[11,conditionspeed] -0.991

This table lists the ten largest absolute correlations among the posteriors for all pairwise combinations of parameters. The value in column Freq is, somewhat unintuitively, the observed correlation among the posteriors of the two parameters listed in the two previous columns. To create this table I used a trick from Stack Overflow based on as.table, which is responsible for the column containing the correlation values being labeled Freq.

What the table shows is some extreme correlations for the individual-level deviations (the first index in the squared brackets of the parameter names seems to be the participant number). Let us visualize these correlations as well.

pairs(fit_wiener$fit, pars = 
        c("b_ndt_conditionspeed", 
          "r_id__ndt[11,conditionspeed]",
          "r_id__ndt[4,conditionspeed]"), 
      condition = "divergent__")

This plot shows that some of the individual-level parameters are not well estimated.

However, overall these extreme correlations appear rather rarely.

hist(cor_long$Freq, breaks = 40)

Overall the model diagnostics do not show any particularly worrying behavior (with the exception of the divergent transitions). We have learned that a few of the individual-level estimates for some of the parameters are not very trustworthy. However, this does not disqualify the overall fit. The main take away from this fact is that we would need to be careful in interpreting the individual-level estimates. Thus, we assume the fit is okay and continue with the next step of the analysis.

Assessing Model Fit

We will now investigate the model fit. That is, we will investigate whether the model provides an adequate description of the observed data. We will mostly do so via graphical checks. To do so, we need to prepare the posterior predictive distribution and the data. As a first step, we combine the posterior predictive distributions with the data.

d_speed_acc <- as_tibble(cbind(speed_acc, as_tibble(t(pred_wiener))))

Then we calculate three important measures (or test statistics T()) on the individual level for each cell of the design (i.e., combination of condition and frequency factors):

  • Probability of giving an upper boundary response (i.e., responding "nonword").
  • Median RTs for responses to the upper boundary.
  • Median RTs for responses to the lower boundary.

We first calculate this for each sample of the posterior predictive distribution. We then summarize these three measures by calculating the median and some additional quantiles across the posterior predictive distribution. We calculate all of this in one step using a somewhat long combination of dplyr and tidyr magic.

d_speed_acc_agg <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(prob.upper = mean(. > 0),
                    medrt.lower = median(abs(.[. < 0]) ),
                    medrt.upper = median(.[. > 0] )
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "measure"), sep = "_") %>% 
  spread(measure, value) %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise_at(.vars = vars(prob.upper, medrt.lower, medrt.upper), 
               .funs = funs(median = median(., na.rm = TRUE),
                            llll = quantile(., probs = 0.01,na.rm = TRUE),
                            lll = quantile(., probs = 0.025,na.rm = TRUE),
                            ll = quantile(., probs = 0.1,na.rm = TRUE),
                            l = quantile(., probs = 0.25,na.rm = TRUE),
                            h = quantile(., probs = 0.75,na.rm = TRUE),
                            hh = quantile(., probs = 0.9,na.rm = TRUE),
                            hhh = quantile(., probs = 0.975,na.rm = TRUE),
                            hhhh = quantile(., probs = 0.99,na.rm = TRUE)
               ))

Next, we calculate the three measures also for the data and combine them with the results from the posterior predictive distribution in one data.frame using left_join.

speed_acc_agg <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(prob.upper = mean(response == "nonword"),
            medrt.upper = median(rt[response == "nonword"]),
            medrt.lower = median(rt[response == "word"])
  ) %>% 
  ungroup %>% 
  left_join(d_speed_acc_agg)

Aggregated Model-Fit

The first important question is whether our model can adequately describe the overall patterns in the data aggregated across participants. For this we simply aggregate the results obtained in the previous step (i.e., the summary results from the posterior predictive distribution as well as the test statistics from the data) using mean.

d_speed_acc_agg2 <- speed_acc_agg %>% 
  group_by(condition, frequency) %>% 
  summarise_if(is.numeric, mean, na.rm = TRUE) %>% 
  ungroup

We then use these summaries and plot predictions (in grey and black) as well as data (in red) for the three measures. The inner (fat) error bars show the 80% credibility intervals (CIs), the outer (thin) error bars show the 95% CIs. The black circle shows the median of the posterior predictive distributions.

new_x <- with(d_speed_acc_agg2, 
              paste(rep(levels(condition), each = 2), 
                    levels(frequency), sep = "\n"))

p1 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  ggtitle("Response Probabilities") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

p2 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.upper_lll, ymax =  medrt.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.upper_ll, ymax =  medrt.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.upper_median), shape = 1) +
  geom_point(aes(y = medrt.upper), shape = 4, col = "red") +
  ggtitle("Median RTs upper") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

p3 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.lower_lll, ymax =  medrt.lower_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.lower_ll, ymax =  medrt.lower_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.lower_median), shape = 1) +
  geom_point(aes(y = medrt.lower), shape = 4, col = "red") +
  ggtitle("Median RTs lower") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

grid.arrange(p1, p2, p3, ncol = 2)


Inspection of the plots shows no dramatic misfit. Overall the model appears to be able to describe the general patterns in the data. Only the response probabilities for words (i.e., frequency = high) appear to be estimated too low: the red ×s appear to lie outside the 80% CIs and possibly also outside the 95% CIs.

The plots of the RTs show an interesting (but not surprising) pattern. The posterior predictive distributions for the rare responses (i.e., “word” responses for upper/non-word stimuli and “nonword” response to lower/word stimuli) are relatively wide. In contrast, the posterior predictive distributions for the common responses are relatively narrow. In each case, the observed median is inside the 80% CI and also quite near to the predicted median.

Individual-Level Fit

To investigate the pattern of predicted response probabilities further, we take a look at them on the individual level. We again plot the response probabilities in the same way as above, but separated by participant id.

ggplot(speed_acc_agg, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  facet_wrap(~id, ncol = 3) +
  ggtitle("Prediced (in grey) and observed (red) response probabilities by ID") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

This plot shows a similar pattern as the aggregated data. For none of the participants do we observe dramatic misfit. Furthermore, response probabilities to non-word stimuli appear to be predicted rather well. In contrast, response probabilities for word-stimuli are overall predicted to be lower than observed. However, this misfit does not seem to be too strong.

As a next step we look at the coverage probabilities of our three measures across individuals. That is, we calculate for each of the measures, for each of the cells of the design, and for each of the CIs (i.e., 50%, 80%, 95%, and 99%), the proportion of participants for which the observed test statistics falls into the corresponding CI.

speed_acc_agg %>% 
  mutate(prob.upper_99 = (prob.upper >= prob.upper_llll) & 
           (prob.upper <= prob.upper_hhhh),
         prob.upper_95 = (prob.upper >= prob.upper_lll) & 
           (prob.upper <= prob.upper_hhh),
         prob.upper_80 = (prob.upper >= prob.upper_ll) & 
           (prob.upper <= prob.upper_hh),
         prob.upper_50 = (prob.upper >= prob.upper_l) & 
           (prob.upper <= prob.upper_h),
         medrt.upper_99 = (medrt.upper > medrt.upper_llll) & 
           (medrt.upper < medrt.upper_hhhh),
         medrt.upper_95 = (medrt.upper > medrt.upper_lll) & 
           (medrt.upper < medrt.upper_hhh),
         medrt.upper_80 = (medrt.upper > medrt.upper_ll) & 
           (medrt.upper < medrt.upper_hh),
         medrt.upper_50 = (medrt.upper > medrt.upper_l) & 
           (medrt.upper < medrt.upper_h),
         medrt.lower_99 = (medrt.lower > medrt.lower_llll) & 
           (medrt.lower < medrt.lower_hhhh),
         medrt.lower_95 = (medrt.lower > medrt.lower_lll) & 
           (medrt.lower < medrt.lower_hhh),
         medrt.lower_80 = (medrt.lower > medrt.lower_ll) & 
           (medrt.lower < medrt.lower_hh),
         medrt.lower_50 = (medrt.lower > medrt.lower_l) & 
           (medrt.lower < medrt.lower_h)
  ) %>% 
  group_by(condition, frequency) %>% ## grouping factors without id
  summarise_at(vars(matches("\\d")), mean, na.rm = TRUE) %>% 
  gather("key", "mean", -condition, -frequency) %>% 
  separate("key", c("measure", "ci"), "_") %>% 
  spread(ci, mean) %>% 
  as.data.frame()
#    condition frequency     measure    50     80    95    99
# 1   accuracy      high medrt.lower 0.706 0.8824 0.882 1.000
# 2   accuracy      high medrt.upper 0.500 0.8333 1.000 1.000
# 3   accuracy      high  prob.upper 0.529 0.7059 0.765 0.882
# 4   accuracy   nw_high medrt.lower 0.500 0.8125 0.938 0.938
# 5   accuracy   nw_high medrt.upper 0.529 0.8235 1.000 1.000
# 6   accuracy   nw_high  prob.upper 0.529 0.8235 0.941 0.941
# 7      speed      high medrt.lower 0.471 0.8824 0.941 1.000
# 8      speed      high medrt.upper 0.706 0.9412 1.000 1.000
# 9      speed      high  prob.upper 0.000 0.0588 0.588 0.647
# 10     speed   nw_high medrt.lower 0.706 0.8824 0.941 0.941
# 11     speed   nw_high medrt.upper 0.471 0.7647 1.000 1.000
# 12     speed   nw_high  prob.upper 0.235 0.6471 0.941 1.000

As can be seen, for the RTs, the coverage probability is generally in line with the width of the CIs or even above it. Furthermore, for the common response (i.e., upper for frequency = nw_high and lower for frequency = high), the coverage probability is 1 for the 99% CIs in all cases.

Unfortunately, for the response probabilities the coverage is not that great, especially in the speed condition and for tighter CIs. However, for the wide CIs the coverage probabilities are at least acceptable. Overall, the results so far suggest that the model provides an adequate account. There are some misfits that should be kept in mind if one is interested in extending the model or fitting it to new data, but overall it provides a satisfactory account.

QQ-plots: RTs

The final approach for assessing the fit of the model is based on more quantiles of the RT distribution (so far we only looked at the .5 quantile, the median). We will plot individual observed versus predicted (i.e., mean from the posterior predictive distribution) quantiles across conditions. For this we first calculate the quantiles per sample from the posterior predictive distribution and then aggregate across the samples. This is achieved via dplyr::summarise_at using a list column and tidyr::unnest to unstack the columns (see section 25.3 in "R for Data Science"). We then combine the aggregated predicted RT quantiles with the observed RT quantiles.

quantiles <- c(0.1, 0.25, 0.5, 0.75, 0.9)

pp2 <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(lower = list(rownames_to_column(
                 data.frame(q = quantile(abs(.[. < 0]), probs = quantiles)))),
                    upper = list(rownames_to_column(
                      data.frame(q = quantile(.[. > 0], probs = quantiles ))))
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "boundary"), sep = "_") %>% 
  unnest(value) %>% 
  group_by(id, condition, frequency, boundary, rowname) %>% # grouping vars + new vars
  summarise(predicted = mean(q, na.rm = TRUE))

rt_pp <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(lower = list(rownames_to_column(
    data.frame(observed = quantile(rt[response == "word"], probs = quantiles)))),
    upper = list(rownames_to_column(
      data.frame(observed = quantile(rt[response == "nonword"], probs = quantiles ))))
  ) %>% 
  ungroup %>% 
  gather("boundary", "value", -id, -condition, -frequency) %>%
  unnest(value) %>% 
  left_join(pp2)

To evaluate the agreement between observed and predicted quantiles, we calculate for each cell and quantile the concordance correlation coefficient (CCC; e.g., Barchard, 2012, Psychological Methods). The CCC is a measure of absolute agreement between two variables and thus better suited than a simple correlation. It is scaled from -1 to 1, where 1 represents perfect agreement, 0 no relationship, and -1 a correlation of -1 with the two variables having the same mean and variance.
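
A small made-up example illustrating the difference between the CCC and a simple correlation:

x <- c(0.4, 0.5, 0.6, 0.7)
y <- x + 0.3          # perfectly correlated with x, but shifted upwards
cor(x, y)             # 1: the correlation is blind to the constant shift
CCC(x, y)$rho.c$est   # well below 1: the CCC penalizes the absolute disagreement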

The following code produces QQ-plots for each condition and quantile separately for responses to the upper boundary and lower boundary. The value in the upper left of each plot gives the CCC measures of absolute agreement.

plot_text <- rt_pp %>% 
  group_by(condition, frequency, rowname, boundary) %>% 
  summarise(ccc = format(
    CCC(observed, predicted, na.rm = TRUE)$rho.c$est, 
    digits = 2))

p_upper <- rt_pp %>% 
  filter(boundary == "upper") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "upper", ],
            aes(x = 0.5, y = 1.8, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Upper responses") +
  theme_bw()

p_lower <- rt_pp %>% 
  filter(boundary == "lower") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "lower", ],
            aes(x = 0.5, y = 1.6, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Lower responses") +
  theme_bw()

grid.arrange(p_upper, p_lower, ncol = 1)

Results show that overall the fit is better for the accuracy than for the speed conditions. Furthermore, the fit is better for the common response (i.e., nw_high for upper and high for lower responses). This latter observation is again not too surprising.

When comparing the fit for the different quantiles it appears that at least the median (i.e., 50%) shows acceptable values for the common response. However, especially in the speed condition the account of the other quantiles is not great. Nevertheless, dramatic misfit is only observed for the rare responses.

One possibility for some of the low CCCs in the speed conditions may be the comparatively low variance in some of the cells. For example, for both speed cells with the common response (i.e., speed & nw_high for upper responses and speed & high for lower responses) visual inspection of the plot suggests an acceptable account while at the same time some CCC values are low (i.e., < .5). Only for the 90% quantile in the speed conditions (and somewhat less so for the 75% quantile) do we see systematic deviations: the model predicts slower RTs than observed.

Taken together, the model appears to provide an at least acceptable account. The only slightly worrying patterns are (a) that the model predicts slightly better performance for the word stimuli than observed (i.e., a lower predicted rate of non-word responses for word stimuli than observed) and (b) that in the speed conditions the model predicts somewhat longer RTs for the 75% and 90% quantiles than observed.

The next step will be to look at differences between parameters as a function of the speed-accuracy condition. This is the topic of the third blog post. I am hopeful it will not take two months this time.

 

Diffusion/Wiener Model Analysis with brms – Part I: Introduction and Estimation http://singmann.org/wiener-model-analysis-with-brms-part-i/ Sun, 26 Nov 2017

Stan is probably the most interesting development in computational statistics in the last few years, at least for me. The version of Hamiltonian Monte-Carlo (HMC) implemented in Stan (NUTS) is extremely efficient, and the range of probability distributions implemented in the Stan language allows one to fit an extremely wide range of models. Stan has considerably changed which models I think can be realistically estimated, both in terms of model complexity and data size. It is not an overstatement to say that Stan (and particularly rstan) has considerably changed the way I analyze data.

One of the R packages that allows one to implement Stan models in a very convenient manner, and which has created a lot of buzz recently, is brms. It allows one to specify a wide range of models using the R formula interface. Based on the formula and a specification of the model family, it generates the model code, compiles it, and then passes it together with the data to rstan for sampling. Because I usually program my models by hand (thanks to the great Stan documentation), I have so far stayed away from brms.

However, I recently learned that brms also allows the estimation of the Wiener model (i.e., the 4-parameter diffusion model) for simultaneously accounting for responses and corresponding response times in data from two-choice tasks. Such data is quite common in psychology, and the diffusion model is one of the more popular cognitive models out there. In a series of (probably 3) posts I provide an example of applying the Wiener model to some published data using brms. This first part shows how to set up and estimate the model. The second part gives an overview of model diagnostics and an assessment of model fit via posterior predictive distributions. The third part shows how to inspect and compare the posterior distributions of the parameters.

In addition to brms and a working C++ compiler, this first part also needs package RWiener for generating the posterior predictive distribution within brms and package rtdists for the data.

library("brms")

Data and Model

A graphical illustration of the Wiener diffusion model for two-choice reaction times. An evidence counter starts at value `alpha`*`beta` and evolves with random increments. The mean increment is `delta` . The process terminates as soon as the accrued evidence exceeds `alpha` or deceeds 0. The decision process starts at time `tau` from the stimulus presentation and terminates at the reaction time. [This figure and caption are taken from Wabersich and Vandekerckhove (2014, The R Journal, CC-BY license).]

I expect the reader to already be familiar with the Wiener model and will only provide a very brief introduction here. The Wiener model is a continuous-time evidence accumulation model for binary choice tasks. It assumes that in each trial evidence is accumulated in a noisy (diffusion) process by a single accumulator. Evidence accumulation starts at the start point and continues until the accumulator hits one of the two decision bounds, in which case the corresponding response is given. The total response time is the sum of the decision time from the accumulation process plus non-decisional components. In sum, the Wiener model allows one to decompose the responses in a binary choice task and the corresponding response times into four latent processes (a short simulation illustrating these parameters follows the list below):

  • The drift rate (delta) is the average slope of the accumulation process towards the boundaries. The larger the (absolute value of the) drift rate, the stronger the evidence for the corresponding response option.
  • The boundary separation (alpha) is the distance between the two decision bounds and interpreted as a measure of response caution.
  • The starting point (beta) of the accumulation process is a measure of response bias towards one of the two response boundaries.
  • The non-decision time (tau) captures all non-decisional processes such as stimulus encoding and response processes.
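
To get an intuition for how these parameters map onto data, it can help to simulate a few trials. The following is just a sketch with arbitrary parameter values, using rwiener() from the RWiener package (which is needed later for the posterior predictive distribution anyway):

library("RWiener")
set.seed(1)
# 1000 trials: moderate boundary separation, 0.3 s non-decision time,
# unbiased starting point, and a positive drift rate
sim <- rwiener(n = 1000, alpha = 1.5, tau = 0.3, beta = 0.5, delta = 1)
head(sim)                    # one row per trial: RT (q) and boundary (resp)
prop.table(table(sim$resp))  # positive drift: mostly upper-boundary responses
aggregate(q ~ resp, sim, mean)

Increasing alpha mainly slows responses down and reduces errors, whereas moving beta away from 0.5 makes one of the two responses more likely a priori.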

We will analyze part of the data from Experiment 1 of the original study. The data comes from 17 participants performing a lexical decision task in which they had to decide whether a presented string is a word or a non-word. Participants made decisions either under speed or under accuracy emphasis instructions in different experimental blocks. This data comes with the rtdists package (which provides the PDF, CDF, and RNG for the full 7-parameter diffusion model). After removing some extreme RTs, we restrict the analysis to high-frequency words (frequency = high) and the corresponding high-frequency non-words (frequency = nw_high) to reduce estimation time. To set up the model we also need a numeric response variable in which 0 corresponds to responses at the lower response boundary and 1 corresponds to responses at the upper boundary. For this we transform the categorical response variable response to numeric and subtract 1 such that a word response corresponds to the lower response boundary and a nonword response to the upper boundary.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1
str(speed_acc)
'data.frame':    10462 obs. of  10 variables:
 $ id       : Factor w/ 17 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ block    : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ condition: Factor w/ 2 levels "accuracy","speed": 2 2 2 2 2 2 2 2 2 2 ...
 $ stim     : Factor w/ 1611 levels "1001","1002",..: 1271 46 110 666 422 ...
 $ stim_cat : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 2 1 1 2 ...
 $ frequency: Factor w/ 2 levels "high","nw_high": 2 1 1 1 1 1 2 1 1 2 ...
 $ response : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 1 1 1 1 ...
 $ rt       : num  0.773 0.39 0.435 0.427 0.622 0.441 0.308 0.436 0.412 ...
 $ censor   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ response2: num  1 0 0 0 0 0 0 0 0 0 ...

Model Formula

The important decision that has to be made before setting up a model is which parameters are allowed to differ between which conditions (i.e., factor levels). One common constraint of the Wiener model (and other evidence-accumulation models) is that the parameters that are set before the evidence accumulation process starts (i.e., boundary separation, starting point, and non-decision time) cannot change based on stimulus characteristics that are not known to the participant before the start of the trial. Thus, the item type, in the present case word versus non-word, is usually only allowed to affect the drift rate. We follow this constraint. Furthermore, all four parameters are allowed to vary between the speed and accuracy conditions as this is manipulated between blocks of trials. Also note that all relevant variables are manipulated within-subjects. Thus, the maximal random-effects structure entails corresponding random-effects parameters for each fixed effect. To set up the model we need to invoke the bf() function and construct one formula for each of the four parameters of the Wiener model.

formula <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                (0 + condition:frequency|p|id), 
               bs ~ 0 + condition + (0 + condition|p|id), 
               ndt ~ 0 + condition + (0 + condition|p|id),
               bias ~ 0 + condition + (0 + condition|p|id))

The first formula is for the drift rate and is also used for specifying the column containing the RTs (rt) and the response or decision (response2) on the left hand side. On the right hand side one can specify fixed effects as well as random effects in a way similar to lme4. The drift rate is allowed to vary between both variables, condition and frequency (stim_cat would be equivalent), thus we estimate fixed effects as well as random effects for both factors as well as their interaction.

We then also need to set up one formula for each of the other three parameters (which are only allowed to vary by condition). For these formulas, the left hand side denotes the parameter names:

  • bs: boundary separation (alpha)
  • ndt: non-decision time (tau)
  • bias: starting point (beta)

The right hand side again specifies the fixed and random effects. Note that one common approach for setting up evidence accumulation models is to specify that one response boundary represents correct responses and the other represents incorrect responses (in contrast to the current approach in which the response boundaries represent the two actual response options). In such a situation one cannot estimate the starting point and it needs to be fixed to 0.5 (i.e., replace the bias formula with bias = 0.5).
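
To sketch what this could look like for the current model (the object name is just for illustration; in that approach the drift-rate formula would additionally need responses coded as correct vs. incorrect):

formula_fixed_bias <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                           (0 + condition:frequency|p|id), 
                         bs ~ 0 + condition + (0 + condition|p|id), 
                         ndt ~ 0 + condition + (0 + condition|p|id),
                         bias = 0.5)  # starting point fixed instead of estimated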

Two further points are relevant in the formulas. First, I have used a somewhat uncommon parameterization and suppressed the intercept (e.g., ~ 0 + condition instead of ~ condition). The reason for this is that when an intercept is present, categorical variables (i.e., factors) with k levels are coded with k-1 deviation variables that represent deviations from the intercept. Thus, in a Bayesian setting one needs to consider the choice of prior for these deviation variables. In contrast, when suppressing the intercept the model can be set up such that each factor level (or design cell in case more than one factor is involved) receives its own parameter, as done here. This essentially allows the same prior for each parameter (as long as one does not expect the parameters to vary dramatically). Furthermore, when programming a model oneself this is a common parameterization. To see the differences between the parameterizations compare the following two calls (model.matrix is the function that creates the parameterization internally). Only the first creates a separate parameter for each condition.

unique(model.matrix(~0+condition, speed_acc))
##     conditionaccuracy conditionspeed
## 36                  0              1
## 128                 1              0
unique(model.matrix(~condition, speed_acc))
##     (Intercept) conditionspeed
## 36            1              1
## 128           1              0

Note that when more than one factor is involved and one wants to use this parameterization, one needs to combine the factors using the : and not *. This can be seen when running the code below. Also note that when combining the factors with : without suppressing the intercept, the resulting model has one parameter more than can be estimated (i.e., the model-matrix is rank deficient). So care needs to be taken at this step.

unique(model.matrix(~ 0 + condition:frequency, speed_acc))
unique(model.matrix(~ 0 + condition*frequency, speed_acc))
unique(model.matrix(~ condition:frequency, speed_acc))
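
One can also make the rank deficiency explicit by comparing the number of columns of the model matrix with its rank; the following is a minimal check:

X3 <- model.matrix(~ condition:frequency, speed_acc)  # intercept plus interaction
ncol(X3)       # number of parameters in this parameterization
qr(X3)$rank    # rank of the model matrix: one less than the number of columns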

Second, brms formulas provide a way to estimate correlations among random-effects parameters of different formulas. To achieve this, one can place an identifier in the middle of the random-effects formula that is separated by | on both sides. Correlations among random effects will then be estimated for all random-effects formulas that share the same identifier. In our case, we want to estimate the full random-effects matrix with correlations among all model parameters, following the “latent-trait approach”. We therefore place the same identifier (p) in all formulas. Thus, correlations will be estimated among all individual-level deviations across all four Wiener parameters. To estimate correlations only among the random-effects parameters of each formula, simply omit the identifier (e.g., (0 + condition|id)). Furthermore, note that brms, similar to afex, supports suppressing the correlations among categorical random-effects parameters via || (e.g., (0 + condition||id)).

Family, Link-Functions, and Priors

The next step is to set up the priors. For this we can invoke the get_prior function. This function requires one to specify the formula, the data, and the family of the model. family is the argument where we tell brms that we want to use the wiener model. We also use it to specify the link function for the four Wiener parameters. Because the drift rate can take on any value (i.e., from -Inf to Inf), the default link function is "identity" (i.e., no transformation), which we retain. The other three parameters all have a restricted range. The boundary separation needs to be larger than 0, the non-decision time needs to be larger than 0 and smaller than the smallest RT, and the starting point needs to be between 0 and 1. The default link functions respect these constraints and use "log" for the first two parameters and "logit" for the bias. This certainly is a possibility, but it has a number of drawbacks that lead me to use the "identity" link function for all parameters. First, when parameters are transformed, the priors need to be specified on the untransformed scale. Second, the individual-level deviations (i.e., the random-effects estimates) are assumed to come from a multivariate normal distribution. Parameter transformations would entail that these individual deviations are only normally distributed on the untransformed scale. Likewise, the correlations of parameter deviations across parameters would also be on the untransformed scale. Both make the interpretation of the random effects difficult.

When specifying the parameters without transformation (i.e., link = "identity"), care must be taken that the priors place most mass on values inside the allowed range. Likewise, starting values need to be inside the allowed range. Using the identity link function also comes with drawbacks discussed at the end. However, as long as parameters outside the allowed range occur only rarely, such a model can converge successfully and it makes the interpretation easier.

The get_prior function returns a data.frame containing all parameters of the model. If parameters have default priors, these are listed as well. Priors can be defined for individual parameters, for parameter classes, for parameter classes within specific groups, or for distributional parameters (dpars). Note that all parameters that do not have a default prior should receive a specific prior.

get_prior(formula,
          data = speed_acc, 
          family = wiener(link_bs = "identity", 
                          link_ndt = "identity", 
                          link_bias = "identity"))

[Two empty columns to the right were removed from the following output.]

                 prior class                               coef group resp dpar 
1                          b                                                    
2                          b    conditionaccuracy:frequencyhigh                 
3                          b conditionaccuracy:frequencynw_high                 
4                          b       conditionspeed:frequencyhigh                 
5                          b    conditionspeed:frequencynw_high                 
6               lkj(1)   cor                                                    
7                        cor                                       id           
8  student_t(3, 0, 10)    sd                                                    
9                         sd                                       id           
10                        sd    conditionaccuracy:frequencyhigh    id           
11                        sd conditionaccuracy:frequencynw_high    id           
12                        sd       conditionspeed:frequencyhigh    id           
13                        sd    conditionspeed:frequencynw_high    id           
14                         b                                               bias 
15                         b                  conditionaccuracy            bias 
16                         b                     conditionspeed            bias 
17 student_t(3, 0, 10)    sd                                               bias 
18                        sd                                       id      bias 
19                        sd                  conditionaccuracy    id      bias 
20                        sd                     conditionspeed    id      bias 
21                         b                                                 bs 
22                         b                  conditionaccuracy              bs 
23                         b                     conditionspeed              bs 
24 student_t(3, 0, 10)    sd                                                 bs 
25                        sd                                       id        bs 
26                        sd                  conditionaccuracy    id        bs 
27                        sd                     conditionspeed    id        bs 
28                         b                                                ndt 
29                         b                  conditionaccuracy             ndt 
30                         b                     conditionspeed             ndt 
31 student_t(3, 0, 10)    sd                                                ndt 
32                        sd                                       id       ndt 
33                        sd                  conditionaccuracy    id       ndt 
34                        sd                     conditionspeed    id       ndt

Priors can be defined with the prior or set_prior function allowing different levels of control. One benefit of the way the model is parameterized is that we only need to specify priors for one set of parameters per Wiener parameter (i.e., class b) and do not have to distinguish between the intercept and other parameters.

We base our choice of the priors on prior knowledge of likely parameter values for the Wiener model, but otherwise try to specify them in a weakly informative manner. That is, they should restrict the range to likely values but not affect the estimation any further. For the drift rate we use a Cauchy distribution with location 0 and scale 5 so that roughly 70% of the prior mass is between -10 and 10. For the boundary separation we use a normal prior with mean 1.5 and standard deviation of 1, for the non-decision time a normal prior with mean 0.2 and standard deviation of 0.1, and for the bias we use a normal with mean of 0.5 (i.e., no bias) and standard deviation of 0.2.
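
These statements about the prior mass can be checked directly with the corresponding distribution functions in base R:

# proportion of the cauchy(0, 5) prior for the drift rate between -10 and 10
diff(pcauchy(c(-10, 10), location = 0, scale = 5))
## [1] 0.7048328
# proportion of the normal(0.2, 0.1) prior for the non-decision time below 0
# (i.e., outside the allowed range)
pnorm(0, mean = 0.2, sd = 0.1)
## [1] 0.02275013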

prior <- c(
 prior("cauchy(0, 5)", class = "b"),
 set_prior("normal(1.5, 1)", class = "b", dpar = "bs"),
 set_prior("normal(0.2, 0.1)", class = "b", dpar = "ndt"),
 set_prior("normal(0.5, 0.2)", class = "b", dpar = "bias")
)

With this information we can use the make_stancode function and inspect the full model code. The important thing is to make sure that all parameters listed in the parameters block have a prior listed in the model block. We can also see, at the beginning of the model block, that none of our parameters is transformed just as desired (a bug in a previous version of brms prevented anything but the default links for the Wiener model parameters).

make_stancode(formula, 
              family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
              data = speed_acc, 
              prior = prior)

 

// generated with brms 1.10.2
functions { 

  /* Wiener diffusion log-PDF for a single response
   * Args: 
   *   y: reaction time data
   *   dec: decision data (0 or 1)
   *   alpha: boundary separation parameter > 0
   *   tau: non-decision time parameter > 0
   *   beta: initial bias parameter in [0, 1]
   *   delta: drift rate parameter
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real wiener_diffusion_lpdf(real y, int dec, real alpha, 
                              real tau, real beta, real delta) { 
     if (dec == 1) {
       return wiener_lpdf(y | alpha, tau, beta, delta);
     } else {
       return wiener_lpdf(y | alpha, tau, 1 - beta, - delta);
     }
   }
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  int<lower=1> K_bs;  // number of population-level effects 
  matrix[N, K_bs] X_bs;  // population-level design matrix 
  int<lower=1> K_ndt;  // number of population-level effects 
  matrix[N, K_ndt] X_ndt;  // population-level design matrix 
  int<lower=1> K_bias;  // number of population-level effects 
  matrix[N, K_bias] X_bias;  // population-level design matrix 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  vector[N] Z_1_2; 
  vector[N] Z_1_3; 
  vector[N] Z_1_4; 
  vector[N] Z_1_bs_5; 
  vector[N] Z_1_bs_6; 
  vector[N] Z_1_ndt_7; 
  vector[N] Z_1_ndt_8; 
  vector[N] Z_1_bias_9; 
  vector[N] Z_1_bias_10; 
  int<lower=1> NC_1; 
  int<lower=0,upper=1> dec[N];  // decisions 
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  real min_Y = min(Y); 
} 
parameters { 
  vector[K] b;  // population-level effects 
  vector[K_bs] b_bs;  // population-level effects 
  vector[K_ndt] b_ndt;  // population-level effects 
  vector[K_bias] b_bias;  // population-level effects 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations 
  matrix[M_1, N_1] z_1;  // unscaled group-level effects 
  // cholesky factor of correlation matrix 
  cholesky_factor_corr[M_1] L_1; 
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; 
  vector[N_1] r_1_1 = r_1[, 1]; 
  vector[N_1] r_1_2 = r_1[, 2]; 
  vector[N_1] r_1_3 = r_1[, 3]; 
  vector[N_1] r_1_4 = r_1[, 4]; 
  vector[N_1] r_1_bs_5 = r_1[, 5]; 
  vector[N_1] r_1_bs_6 = r_1[, 6]; 
  vector[N_1] r_1_ndt_7 = r_1[, 7]; 
  vector[N_1] r_1_ndt_8 = r_1[, 8]; 
  vector[N_1] r_1_bias_9 = r_1[, 9]; 
  vector[N_1] r_1_bias_10 = r_1[, 10]; 
} 
model { 
  vector[N] mu = X * b; 
  vector[N] bs = X_bs * b_bs; 
  vector[N] ndt = X_ndt * b_ndt; 
  vector[N] bias = X_bias * b_bias; 
  for (n in 1:N) { 
    mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_1_2[J_1[n]]) * Z_1_2[n] + (r_1_3[J_1[n]]) * Z_1_3[n] + (r_1_4[J_1[n]]) * Z_1_4[n]; 
    bs[n] = bs[n] + (r_1_bs_5[J_1[n]]) * Z_1_bs_5[n] + (r_1_bs_6[J_1[n]]) * Z_1_bs_6[n]; 
    ndt[n] = ndt[n] + (r_1_ndt_7[J_1[n]]) * Z_1_ndt_7[n] + (r_1_ndt_8[J_1[n]]) * Z_1_ndt_8[n]; 
    bias[n] = bias[n] + (r_1_bias_9[J_1[n]]) * Z_1_bias_9[n] + (r_1_bias_10[J_1[n]]) * Z_1_bias_10[n]; 
  } 
  // priors including all constants 
  target += cauchy_lpdf(b | 0, 5); 
  target += normal_lpdf(b_bs | 1.5, 1); 
  target += normal_lpdf(b_ndt | 0.2, 0.1); 
  target += normal_lpdf(b_bias | 0.5, 0.2); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 10 * student_t_lccdf(0 | 3, 0, 10); 
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  target += normal_lpdf(to_vector(z_1) | 0, 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    for (n in 1:N) { 
      target += wiener_diffusion_lpdf(Y[n] | dec[n], bs[n], ndt[n], bias[n], mu[n]); 
    } 
  } 
} 
generated quantities { 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1); 
  vector<lower=-1,upper=1>[NC_1] cor_1; 
  // take only relevant parts of correlation matrix 
  cor_1[1] = Cor_1[1,2]; 
  [...]
  cor_1[45] = Cor_1[9,10]; 
}

[The output was slightly modified.]

The last piece we need before we can finally estimate the model is a function that generates initial values. Without initial values that yield a finite likelihood for all data points, sampling will not start. The function needs to provide initial values for all parameters listed in the parameters block of the model. Note that many of those parameters have at least one dimension with a parameterized extent (e.g., K). We can use make_standata to create the data set used by brms for the estimation and obtain the necessary information from it. We then use this data object (i.e., a list) to generate correctly sized initial values in the function initfun (note that initfun relies on the fact that tmp_dat is in the global environment, which is something of a code smell).

tmp_dat <- make_standata(formula, 
                         family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
                            data = speed_acc, prior = prior)
str(tmp_dat, 1, give.attr = FALSE)
## List of 26
##  $ N          : int 10462
##  $ Y          : num [1:10462(1d)] 0.773 0.39 0.435  ...
##  $ K          : int 4
##  $ X          : num [1:10462, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_1      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_2      : num [1:10462(1d)] 0 1 1 1 1 1 0 1 1 0 ...
##  $ Z_1_3      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_4      : num [1:10462(1d)] 1 0 0 0 0 0 1 0 0 1 ...
##  $ K_bs       : int 2
##  $ X_bs       : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_5   : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_6   : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_ndt      : int 2
##  $ X_ndt      : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_7  : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_8  : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_bias     : int 2
##  $ X_bias     : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_9 : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_10: num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ J_1        : int [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ N_1        : int 17
##  $ M_1        : int 10
##  $ NC_1       : num 45
##  $ dec        : num [1:10462(1d)] 1 0 0 0 0 0 0 0 0 0 ...
##  $ prior_only : int 0

initfun <- function() {
  list(
    b = rnorm(tmp_dat$K),
    b_bs = runif(tmp_dat$K_bs, 1, 2),
    b_ndt = runif(tmp_dat$K_ndt, 0.1, 0.15),
    b_bias = rnorm(tmp_dat$K_bias, 0.5, 0.1),
    sd_1 = runif(tmp_dat$M_1, 0.5, 1),
    z_1 = matrix(rnorm(tmp_dat$M_1*tmp_dat$N_1, 0, 0.01),
                 tmp_dat$M_1, tmp_dat$N_1),
    L_1 = diag(tmp_dat$M_1)
  )
}

Estimation (i.e., Sampling)

Finally, we have all pieces together and can estimate the Wiener model using the brm function. Note that this will take roughly a full day, or longer depending on the speed of your PC. We also already increase the maximal treedepth to 15. We probably should have also increased adapt_delta above the default value of .8 as there are a few divergent transitions, but this is left as an exercise to the reader.

After estimation is finished, we see that there are a few (< 10) divergent transitions. If this were a real analysis and not only an example, we would need to increase adapt_delta to a larger value (e.g., .95 or .99) and rerun the estimation. In this case, however, we immediately proceed to the second step and obtain samples from the posterior predictive distribution using predict. For this it is important to specify the number of posterior samples (here we use 500). In addition, it is important to set summary = FALSE, to obtain the actual posterior predictive distribution and not a summary of it, and negative_rt = TRUE. The latter ensures that predicted responses at the lower boundary receive a negative sign whereas predicted responses at the upper boundary receive a positive sign.

fit_wiener <- brm(formula, 
                  data = speed_acc,
                  family = wiener(link_bs = "identity", 
                                  link_ndt = "identity",
                                  link_bias = "identity"),
                  prior = prior, inits = initfun,
                  iter = 1000, warmup = 500, 
                  chains = 4, cores = 4, 
                  control = list(max_treedepth = 15))
NPRED <- 500
pred_wiener <- predict(fit_wiener, 
                       summary = FALSE, 
                       negative_rt = TRUE, 
                       nsamples = NPRED)

Because both steps are quite time intensive (estimation 1 day, obtaining posterior predictives a few hours), we save the results of both steps. Given the comparatively large size of both objects, using the 'xz' compression (i.e., the strongest in R) seems like a good idea.

save(fit_wiener, file = "brms_wiener_example_fit.rda", 
     compress = "xz")
save(pred_wiener, file = "brms_wiener_example_predictions.rda", 
     compress = "xz")

The second part shows how to perform model diagnostics and how to assess the model fit. The third part shows how to test for differences in parameters between conditions.

ANOVA in R: afex may be the solution you are looking for http://singmann.org/anova-in-r-afex-may-be-the-solution-you-are-looking-for/ Mon, 05 Jun 2017

Prelude: When you start with R and try to estimate a standard ANOVA, which is relatively simple in commercial software like SPSS, R kind of sucks. Especially for unbalanced designs or designs with repeated measures, replicating the results from such software in base R may require considerable effort. For a newcomer (and even an old timer) this can be somewhat off-putting. After I had gained experience developing my first package and was once again struggling with R and ANOVA, I had enough and decided to develop afex. If you know this feeling, afex is also for you.


A new version of afex (0.18-0) was accepted on CRAN a few days ago. This version only fixes a small bug that was introduced in the last version: aov_ez did not work with more than one covariate (thanks to tkerwin for reporting this bug).

I want to use this opportunity to introduce one of the main functionalities of afex. It provides a set of functions that make calculating ANOVAs easy. In the default settings, afex automatically uses appropriate orthogonal contrasts for factors, transforms numerical variables into factors, uses so-called Type III sums of squares, and allows for any number of factors including repeated-measures (or within-subjects) factors and mixed/split-plot designs. Together this guarantees that the ANOVA results correspond to the results obtained from commercial statistical packages such as SPSS or SAS. On top of this, the ANOVA object returned by afex (of class afex_aov) can be directly used for follow-up or post-hoc tests/contrasts using the lsmeans package.
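
These defaults can also be inspected or changed globally via afex_options(); the following is just a sketch, with option names as used in recent afex versions:

library("afex")
afex_options("type")             # sums-of-squares type used by default
afex_options("check_contrasts")  # whether contrasts are set to contr.sum by default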

Example Data

Let me illustrate how to calculate an ANOVA with a simple example. We use data courtesy of Andrew Heathcote and colleagues. The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants. Here we only look at three factors:

  • task is a between-subjects (or independent-samples) factor: 25 participants worked on the lexical decision task (lexdec; i.e., participants had to make a binary decision whether the presented string is a word or a nonword) and 20 participants on the naming task (naming; i.e., participants had to say the presented string out loud).
  • stimulus is a repeated-measures or within-subjects factor that codes whether a presented string was a word or nonword.
  • length is also a repeated-measures factor that gives the number of characters of the presented strings with three levels: 4, 5, and 6.

The dependent variable is the response latency or response time for each presented string. More specifically, as is common in the literature, we analyze the log of the response times, log_rt. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords. To use this data in an ANOVA one needs to aggregate the data such that only one observation per participant and cell of the design (i.e., combination of all factors) remains. As we will see, afex does this automatically for us (this is one of the features I blatantly stole from ez).

library(afex)
data("fhch2010") # load data (comes with afex) 

mean(!fhch2010$correct) # error rate
# [1] 0.01981546
fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors

str(fhch2010) # structure of the data
# 'data.frame': 13222 obs. of  10 variables:
#  $ id       : Factor w/ 45 levels "N1","N12","N13",..: 1 1 1 1 1 1 1 1 ...
#  $ task     : Factor w/ 2 levels "naming","lexdec": 1 1 1 1 1 1 1 1 1 1 ...
#  $ stimulus : Factor w/ 2 levels "word","nonword": 1 1 1 2 2 1 2 2 1 2 ...
#  $ density  : Factor w/ 2 levels "low","high": 2 1 1 2 1 2 1 1 1 1 ...
#  $ frequency: Factor w/ 2 levels "low","high": 1 2 2 2 2 2 1 2 1 2 ...
#  $ length   : Factor w/ 3 levels "4","5","6": 3 3 2 2 1 1 3 2 1 3 ...
#  $ item     : Factor w/ 600 levels "abide","acts",..: 363 121 ...
#  $ rt       : num  1.091 0.876 0.71 1.21 0.843 ...
#  $ log_rt   : num  0.0871 -0.1324 -0.3425 0.1906 -0.1708 ...
#  $ correct  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

We first load the data and remove the roughly 2% errors. The structure of the data.frame (obtained via str()) shows us that the data has a few more factors than discussed here. To specify our ANOVA we first use the function aov_car(), which works very similarly to base R aov() but, like all afex functions, uses car::Anova() (read as function Anova() from package car) as the backend for calculating the ANOVA.

Specifying an ANOVA

(a1 <- aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), fhch))
# Contrasts set to contr.sum for the following variables: task
# Anova Table (Type 3 tests)
# 
# Response: log_rt
#                 Effect          df  MSE          F   ges p.value
# 1                 task       1, 43 0.23  13.38 ***   .22   .0007
# 2               length 1.83, 78.64 0.00  18.55 ***  .008  <.0001
# 3          task:length 1.83, 78.64 0.00       1.02 .0004     .36
# 4             stimulus       1, 43 0.01 173.25 ***   .17  <.0001
# 5        task:stimulus       1, 43 0.01  87.56 ***   .10  <.0001
# 6      length:stimulus 1.70, 72.97 0.00       1.91 .0007     .16
# 7 task:length:stimulus 1.70, 72.97 0.00       1.21 .0005     .30
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
# 
# Sphericity correction method: GG 
# Warning message:
# More than one observation per cell, aggregating the data using mean (i.e, fun_aggregate = mean)!

The printed output is an ANOVA table that could basically be copied to a manuscript as is. One sees the terms in column Effect, the degrees of freedom (df), the mean-squared error (MSE, I would probably remove this column in a manuscript), the F-value (F, which also contains the significance stars), and the p-value (p.value). The only somewhat uncommon column is ges, which provides generalized eta-squared, ‘the recommended effect size statistics for repeated measure designs’. The standard output also reports Greenhouse-Geisser (GG) corrected df for repeated-measures factors with more than two levels (to account for possible violations of sphericity). Note that these corrected df are not integers.

We can also see a warning notifying us that afex has detected that each participant and cell of the design provides more than one observation, which are then automatically aggregated using mean. The warning serves to notify the user in case this was not intended (i.e., when there should be only one observation per participant and cell of the design). The warning can be suppressed by specifying fun_aggregate = mean explicitly in the call to aov_car.
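
For example, the following call is equivalent to the one above but makes the aggregation explicit and therefore runs without the warning:

aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), 
        fhch, fun_aggregate = mean)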

The formula passed to aov_car basically needs to be the same as for standard aov with a few differences:

  • It must have an Error term specifying the column containing the participant (or unit of observation) identifier (e.g., minimally +Error(id)). This is necessary to allow the automatic aggregation even in designs without repeated-measures factor.
  • Repeated-measures factors only need to be defined in the Error term and do not need to be enclosed by parentheses. Consequently, the following call produces the same ANOVA:
    aov_car(log_rt ~ task + Error(id/length*stimulus), fhch)

     

In addition to aov_car, afex provides two further functions for calculating ANOVAs. These functions produce the same output but differ in how the ANOVA is specified.

  • aov_ez allows the ANOVA specification not via a formula but via character vectors (and is similar to ez::ezANOVA()):
    aov_ez(id = "id", dv = "log_rt", fhch, between = "task", within = c("length", "stimulus"))
  • aov_4 requires a formula for which the id and repeated-measures factors need to be specified as in lme4::lmer() (with the same simplification that repeated-measures factors only need to be specified in the random part):
    aov_4(log_rt ~ task + (length*stimulus|id), fhch)
    aov_4(log_rt ~ task*length*stimulus + (length*stimulus|id), fhch)
    

Follow-up Tests

A common requirement after the omnibus test provided by the ANOVA is some sort of follow-up analysis. For this purpose, afex is fully integrated with lsmeans.

For example, assume we are interested in the significant task:stimulus interaction. As a first step we might want to investigate the marginal means of these two factors:

lsmeans(a1, c("stimulus","task"))
# NOTE: Results may be misleading due to involvement in interactions
#  stimulus task        lsmean         SE    df    lower.CL    upper.CL
#  word     naming -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  naming -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
#  word     lexdec  0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword  lexdec  0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 

From this we can see that naming trials seem to be generally slower (as a reminder, the dv is log-transformed RT in seconds, so values below 0 correspond to RTs between 0 and 1). It also appears that the difference between word and nonword trials is larger in the naming task than in the lexdec task. We test this with the following code using a few different lsmeans functions. We first use lsmeans again, but this time using task as the conditioning variable specified in by. Then we use pairs() for obtaining all pairwise comparisons within each conditioning stratum (i.e., level of task). This provides us already with the correct tests, but does not control the family-wise error rate across both tests. To get that, we simply update() the returned results and remove the conditioning by setting by=NULL. In the call to update we can already specify the method for error control, and we specify 'holm' because it is uniformly more powerful than Bonferroni.

# set up conditional marginal means:
(ls1 <- lsmeans(a1, c("stimulus"), by="task"))
# task = naming:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word     -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
# 
# task = lexdec:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word      0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword   0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 
update(pairs(ls1), by=NULL, adjust = "holm")
#  contrast       task      estimate         SE df t.ratio p.value
#  word - nonword naming -0.31424037 0.02080113 43 -15.107  <.0001
#  word - nonword lexdec -0.05309159 0.01860509 43  -2.854  0.0066
# 
# Results are averaged over the levels of: length 
# P value adjustment: holm method for 2 tests

Hmm. These results show that the stimulus effects in both task conditions are independently significant. Obviously, the difference between them must then also be significant, right?

pairs(update(pairs(ls1), by=NULL))
# contrast                              estimate         SE df t.ratio p.value
# wrd-nnwrd,naming - wrd-nnwrd,lexdec -0.2611488 0.02790764 43  -9.358  <.0001

They obviously are. As a reminder, the interaction is testing exactly this, the difference of the difference. And we can actually recover the F-value of the interaction using lsmeans alone by invoking yet another of its functions, test(..., joint=TRUE).

test(pairs(update(pairs(ls1), by=NULL)), joint=TRUE)
# df1 df2      F p.value
#   1  43 87.565  <.0001

These last two examples were perhaps not particularly interesting from a statistical point of view, but they show an important ability of lsmeans. Any set of estimated marginal means produced by lsmeans, including any sort of (custom) contrasts, can be used again for further tests or for calculating new sets of marginal means. And with test() we can even obtain joint F-tests over several parameters using joint=TRUE. lsmeans is extremely powerful and one of my most frequently used packages that basically performs all tests following an omnibus test (and in its latest version it directly interfaces with rstanarm, so it can now also be used for a lot of Bayesian stuff, but this is the topic of another blog post).

Finally, lsmeans can also be used directly for plotting by invoking lsmip:

lsmip(a1, task ~ stimulus)

Note that lsmip does not add error bars to the estimated marginal means, but only plots the point estimates. There are mainly two reasons for this. First, as soon as repeated-measures factors are involved, it is difficult to decide which error bars to plot. Standard error bars based on the standard error of the mean are not appropriate for within-subjects comparisons. For those, one would need to use within-subject intervals (see also here or here). Especially for plots such as the current one with both independent-samples and repeated-measures factors (i.e., mixed within-between designs or split-plot designs) no error bar will allow comparisons across both dimensions. Second, only ‘if the SE [i.e., standard error] of the mean is exactly 1/2 the SE of the difference of two means — which is almost never the case — it would be appropriate to use overlapping confidence intervals to test comparisons of means’ (lsmeans author Russell Lenth; the link provides an alternative).

We can also use lsmeans in combination with lattice to plot the results on the unconstrained scale (i.e., after back-transforming the data from the log scale to the original scale of response times in seconds). The plot is not shown here.

lsm1 <- summary(lsmeans(a1, c("stimulus","task")))
lsm1$lsmean <- exp(lsm1$lsmean)
require(lattice)
xyplot(lsmean ~ stimulus, lsm1, group = task, type = "b", 
       auto.key = list(space = "right"))

 

Summary

  • afex provides a set of functions that make specifying standard ANOVAs for an arbitrary number of between-subjects (i.e., independent-sample) or within-subjects (i.e., repeated-measures) factors easy: aov_car(), aov_ez(), and aov_4().
  • In its default settings, the afex ANOVA functions replicate the results of commercial statistical packages such as SPSS or SAS (using orthogonal contrasts and Type III sums of squares).
  • Fitted ANOVA models can be passed to lsmeans for follow-up tests, custom contrast tests, and plotting.
  • For specific questions visit the new afex support forum: afex.singmann.science (I think we just need someone to ask the first ANOVA question to get the ball rolling).
  • For more examples see the vignette or here (blog post by Ulf Mertens) or download the full example R script used here.

As a caveat, let me end this post with some cautionary remarks from Douglas Bates (fortunes::fortune(184)) who explains why ANOVA in R is supposed to not be the same as in other software packages (i.e., he justifies why it ‘sucks’):

You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape when fitting a single linear model may take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of “give me every possible statistic that could be calculated from this model, whether or not it makes sense”. The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it.
— Douglas Bates (in reply to the suggestion to include type III sums of squares and lsmeans in base R to make it more similar to SAS or SPSS)
R-help (March 2007)

Maybe he is right, but maybe what I have described here is useful to some degree.

Mixed models for ANOVA designs with one observation per unit of observation and cell of the design

Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part obviously focuses on R, specifically on lme4 and afex.

One part of the chapter was supposed to deal with designs that cannot be estimated with the maximal random-effects structure justified by the design because there is only one observation per participant and cell of the design. Such designs correspond to the classical repeated-measures ANOVA design, as ANOVA cannot deal with replicates at the cell level (i.e., replicates are usually aggregated to yield one observation per cell and unit of observation). Based on my previous thoughts, which turned out to be wrong, we wrote the following:

Random Effects Structures for Traditional ANOVA Designs

The estimation of the maximal model is not possible when there is only one observation per participant and cell of a repeated-measures design. These designs are typically analyzed using a repeated-measures ANOVA. Currently, there are no clear guidelines on how to proceed in such situations, but we will try to provide some advice. If there is only a single random effects grouping factor, for example participants, we feel that instead of a mixed model, it is appropriate to use a standard repeated-measures ANOVA that addresses sphericity violations via the Greenhouse-Geisser correction.

One alternative strategy that employs mixed models and that we do not recommend consists of using the random-intercept-only model or removing the random slopes for the highest within-subject interaction. The resulting model assumes invariance of the omitted random effects across participants. If this assumption is violated, such a model produces results that cannot be trusted. […]

Fortunately, we asked Jake Westfall to take a look at the chapter and Jake responded:

I don’t think I agree with this. In the situation you describe, where we have a single random factor in a balanced ANOVA-like design with 1 observation per unit per cell, personally I am a proponent of the omit-the-highest-level-random-interaction approach. In this kind of design, the random slopes for the highest-level interaction are perfectly confounded with the trial-level error term (in more technical language, the model is only identifiable up to the sum of these two variance components), which is what causes the identifiability problems when one tries to estimate the full maximal model there. (You know all of this of course.) So two equivalent ways to make the model identifiable are to (1) omit the error term, i.e., force the residual variance to be 0, or (2) omit the random slopes for the highest-level interaction. Both of these approaches should (AFAIK) result in a statistically equivalent model, but lme4 does not provide an easy way to do (1), so I generally recommend (2). The important point here is that the standard errors should still be correct in either case — because these two variance components are confounded, omitting e.g. the random interaction slopes simply causes that omitted variance component to be implicitly added to the residual variance, where it is still incorporated into the standard errors of the fixed effects in the appropriate way (because the standard error of the fixed interaction looks roughly like sqrt[(var_error + var_interaction)/n_subjects]). I think one could pretty easily put together a little simulation that would demonstrate this.

Hmm, that sounds very reasonable, but can my intuition on the random effects structure and mixed models really be that wrong? To investigate this I followed Jake’s advice and coded a short simulation to test it, and as it turns out, Jake is right and I was wrong.

In the simulation we will simulate a simple repeated-measures design with a single factor with three levels. Importantly, each unit of observation will only have one observation per factor level. We will then fit the simulated data with both a repeated-measures ANOVA and a random-intercept-only mixed model and compare their p-values. Note again that for such a design we cannot estimate random slopes for the condition effect.

First, we need a few packages and set some parameters for our simulation:

require(afex)
set_sum_contrasts() # for orthogonal sum-to-zero contrasts
require(MASS) 

NSIM <- 1e4  # number of simulated data sets
NPAR <- 30  # number of participants per cell
NCELLS <- 3  # number of cells (i.e., levels of the within-subjects factor)

Now we need to generate the data. For this I employed an approach that is clearly not the most parsimonious, but one that most clearly follows the formulation of a mixed model with random variability in the condition effect and, on top of this, residual variance (i.e., the two confounded variance components).

We first create a bare-bones data.frame with a participant id and a condition column, and a corresponding model.matrix. Then we create the three random parameters (i.e., the intercept and the two parameters coding the three conditions) using a zero-centered multivariate normal with a specified variance-covariance matrix. We then loop over participants and compute the predictions derived from the three random-effects parameters. Only then do we add uncorrelated residual variance to the observations for each simulated data set.

dat <- expand.grid(condition = factor(letters[seq_len(NCELLS)]),
                   id = factor(seq_len(NPAR)))
head(dat)
#   condition id
# 1         a  1
# 2         b  1
# 3         c  1
# 4         a  2
# 5         b  2
# 6         c  2

mm <- model.matrix(~condition, dat)
head(mm)
#   (Intercept) condition1 condition2
# 1           1          1          0
# 2           1          0          1
# 3           1         -1         -1
# 4           1          1          0
# 5           1          0          1
# 6           1         -1         -1

Sigma_c_1 <- matrix(0.6, NCELLS,NCELLS)
diag(Sigma_c_1) <- 1
d_c_1 <- replicate(NSIM, mvrnorm(NPAR, rep(0, NCELLS), Sigma_c_1), simplify = FALSE)

gen_dat <- vector("list", NSIM)
for(i in seq_len(NSIM)) {
  gen_dat[[i]] <- dat
  gen_dat[[i]]$dv <- NA_real_
  for (j in seq_len(NPAR)) {
    # predictions for participant j from their random parameters (intercept + condition effects)
    gen_dat[[i]][(j-1)*3+(1:3),"dv"] <- mm[1:3,] %*% d_c_1[[i]][j,]
  }
  # add uncorrelated (trial-level) residual variance
  gen_dat[[i]]$dv <- gen_dat[[i]]$dv+rnorm(nrow(mm), 0, 1)
}

Now we only need a function that estimates the ANOVA and the mixed model for each data set and returns the p-values, and then loop over the simulated data sets.

## functions returning p-value for ANOVA and mixed model
within_anova <- function(data) {
  suppressWarnings(suppressMessages(
  a <- aov_ez(id = "id", dv = "dv", data, within = "condition", return = "univariate", anova_table = list(es = "none"))
  ))
  c(without = a[["univariate.tests"]][2,6],
    gg = a[["pval.adjustments"]][1,2],
    hf = a[["pval.adjustments"]][1,4])
}

within_mixed <- function(data) {
  suppressWarnings(
    m <- mixed(dv~condition+(1|id),data, progress = FALSE)  
  )
  c(mixed=anova(m)$`Pr(>F)`)
}

p_c1_within <- vapply(gen_dat, within_anova, rep(0.0, 3))
m_c1_within <- vapply(gen_dat, within_mixed, 0.0)

The following graph shows the results (GG denotes the results using the Greenhouse-Geisser adjustment for sphericity violations).

ylim <- c(0, 700)
par(mfrow = c(1,3))
hist(p_c1_within[1,], breaks = 20, main = "ANOVA (default)", xlab = "p-value", ylim=ylim)
hist(p_c1_within[2,], breaks = 20, main = "ANOVA (GG)", xlab = "p-value", ylim=ylim)
hist(m_c1_within, breaks = 20, main = "Random-Intercept Model", xlab = "p-value", ylim=ylim)

These graphs clearly show that the p-value distributions for the standard repeated-measures ANOVA and the random-intercept mixed model are virtually identical. In other words, my intuition was wrong and Jake was right.

We also see that, for both the ANOVA and the mixed model, the rate of significant findings with p < .05 is slightly above the nominal level. More specifically:

mean(p_c1_within[1,] < 0.05) # ANOVA default
# [1] 0.0684
mean(p_c1_within[2,] < 0.05) # ANOVA GG
# [1] 0.0529
mean(p_c1_within[3,] < 0.05) # ANOVA HF
# [1] 0.0549
mean(m_c1_within < 0.05)     # random-intercept mixed model
# [1] 0.0701

These additional results indicate that one may also need to adjust the degrees of freedom of mixed models for violations of sphericity. But this is not the topic of today’s post.

To sum this up, this simulation shows that removing the highest-order random slope seems to be the right decision if one wants to use a mixed model with an otherwise maximal random effects structure for a design with only one observation per participant and cell of the design. A hypothetical sketch of what this looks like in lme4/afex syntax is given below.
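
As a purely hypothetical sketch of what this looks like in afex/lme4 syntax (the factor names A and B and the data frame d are made up; the commented-out line shows the non-identifiable maximal model for comparison):

# maximal model, not identifiable with one observation per participant and cell:
# m_max <- mixed(dv ~ A * B + (A * B | id), d)
# drop only the random slope for the highest-order interaction:
m_red <- mixed(dv ~ A * B + (A + B | id), d)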

One more thing to note: Ben Bolker raised the same issue and pointed us to one of his example analyses of the starling data that is relevant to the current question (alternatively, see the more up-to-date Rmd file). We are very grateful that Jake and Ben took the time to go through our chapter!

You can also download the RMarkdown file of the simulation.

rtdists 0.7-2: response time distributions now with Rcpp and faster

It took us quite a while but we have finally released a new version of rtdists to CRAN which provides a few significant improvements. As a reminder, rtdists

[p]rovides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA)  with different distributions underlying the drift rate.

The main reason it took us relatively long to push the new version was that we had a problem with the C code for the diffusion model that we needed to sort out first. Specifically, the CDF (i.e., pdiffusion) in versions prior to 0.5-2 did not produce correct results in many cases (one consequence of this is that the model predictions given in the previous blog post are wrong). As a temporary fix, we resorted to the correct but slow numerical integration of the PDF (i.e., ddiffusion) to obtain the CDF in version 0.5-2 and later. Importantly, it appears as if the error was not present in fastdm, which is the source of the C code we use. Matthew Gretton carefully investigated the original C code, changed it such that it connects to R via Rcpp, and realized that there are two different variants of the CDF, a fast variant and a precise variant. Up to this point we had only used the fast variant and, as it turns out, this was responsible for our incorrect results. We now use the precise variant by default (it only seems to be marginally slower), as it produces the correct results for all cases we have tested (and we have tested quite a few).

In addition to a few more minor changes (see the NEWS file for the full list), we made two more noteworthy changes. First, all diffusion functions as well as rLBA received a major performance update, mainly in situations with trial-wise parameters. It should now almost always be fastest to call the diffusion functions (e.g., ddiffusion) only once with vectorized parameters instead of calling them several times for different sets of parameters. The speed-up with the new version depends on the number of unique parameter sets, but even with only a few different sets it should be clearly noticeable. For completely trial-wise parameters the speed-up should be quite dramatic.
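
As a rough illustration of the recommended vectorized usage (the parameter values below are made up, not taken from any fit), one call with trial-wise parameters replaces a loop over parameter sets:

require(rtdists)
rts <- c(0.6, 0.7, 0.8, 0.9)
responses <- c("upper", "lower", "upper", "upper")
# trial-wise drift rates; scalar parameters are recycled across trials
ddiffusion(rts, response = responses,
           a = 1.2, v = c(2.5, 2.5, 1.0, 1.0), t0 = 0.3)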

Second, I also updated the vignette, which now uses the tidyverse in, I believe, a somewhat more reasonable manner. Specifically, it is now built on nested data (via tidyr::nest) and purrr::map instead of relying heavily on dplyr::do. The problem I had with dplyr::do is that it often leads to somewhat ugly syntax. The changes in the vignette are mainly due to me reading Chapter 25 of the great R for Data Science book by Wickham and Grolemund. However, I still prefer lattice over ggplot2.

Example Analysis

To show the now correct behavior of the diffusion CDF, let me repeat the example from the last post. As a reminder, we somewhat randomly pick one participant from the speed_acc data set and fit both the diffusion model and the LBA to the data.

require(rtdists)

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & 
                   speed_acc$condition == "accuracy" & 
                   speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 


ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, 
            A = pars["A"], 
            b = pars["A"]+pars["b"], 
            t0 = pars["t0"], 
            mean_v = pars[c("v1", "v2")], 
            sd_v = c(1, pars["sv"]), 
            silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), 
                   rt=p11$rt, response=p11$corr)
p11_norm[1:3]
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182940 -2.7409230  1.0449963  0.4513604  0.1243441  0.2609968 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0


ll_diffusion <- function(pars, rt, response) 
{
  densities <- ddiffusion(rt, response=response, 
                          a=pars["a"], 
                          v=pars["v"], 
                          t0=pars["t0"], 
                          sz=pars["sz"], 
                          st0=pars["st0"],
                          sv=pars["sv"])
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_diff <- nlminb(start, ll_diffusion, lower = 0, 
                   rt=p11$rt, response=p11$corr)
p11_diff[1:3]
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727202 0.3385602 0.4621645 0.2017950 1.0551706 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0

As is common, we pass the negative summed log-likelihood to the optimization algorithm (here trusty nlminb) and hence lower values of objective indicate a better fit. Results show that the LBA provides a somewhat better account. The interesting question is whether this somewhat better fit translates into a visibly better fit when comparing observed and predicted quantiles.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, 
                               A = p11_norm$par["A"], 
                               b = p11_norm$par["A"]+p11_norm$par["b"], 
                               t0 = p11_norm$par["t0"], 
                               mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                               sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, 
                          A = p11_norm$par["A"], 
                          b = p11_norm$par["A"]+p11_norm$par["b"], 
                          t0 = p11_norm$par["t0"], 
                          mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                          sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871710 0.5510265 0.6081855 0.6809796 0.8301286
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, 
                        A = p11_norm$par["A"], 
                        b = p11_norm$par["A"]+p11_norm$par["b"], 
                        t0 = p11_norm$par["t0"], 
                        mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                        sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684374 0.5529575 0.6273737 0.7233961 0.9314820


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- pdiffusion(rt = 20,  response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))  
# [1] 0.964723

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, 
                                      response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))
# [1] 0.4748271 0.5489903 0.6081182 0.6821927 0.8444566
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), 
                                    response = "lower",
                                    a=p11_diff$par["a"], 
                                    v=p11_diff$par["v"], 
                                    t0=p11_diff$par["t0"], 
                                    sz=p11_diff$par["sz"], 
                                    st0=p11_diff$par["st0"], 
                                    sv=p11_diff$par["sv"]))
# [1] 0.4776565 0.5598018 0.6305120 0.7336275 0.9770047


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

The fit plot compares observed quantiles (triangles) with predicted quantiles (circles connected by lines). Here we plot the 10%, 30%, 50%, 70%, and 90% quantiles. In each panel, the x-axis shows RTs and the y-axis cumulative probabilities. Consequently, the upper line and points correspond to the correct trials (which are common) and the lower line and points to the incorrect trials (which are uncommon). For both models the fit looks pretty good, especially for the correct trials. However, the LBA appears to do a slightly better job of predicting the very fast and very slow trials here, which may be responsible for its better fit in terms of summed log-likelihood. In contrast, the diffusion model seems somewhat better at predicting the long tail of the error trials.

Checking the CDF

Finally, we can also check whether the analytical CDF does in fact correspond to the empirical CDF of the data. For this we can compare the analytical CDF function pdiffusion to the empirical CDF obtained from random deviates. One thing to be careful about is that pdiffusion provides the ‘defective’ CDF, which only approaches one if one adds the CDFs for both response boundaries. Consequently, to compare the empirical CDF for one response with the analytical CDF, we need to scale the latter so that it also goes from 0 to 1 (simply by dividing it by its maximal value). Here we will use the parameter values obtained in the previous fit.

rand_rts <- rdiffusion(1e5, a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"])
plot(ecdf(rand_rts[rand_rts$response == "upper","rt"]))

normalised_pdiffusion <- function(rt, ...) pdiffusion(rt, ...)/pdiffusion(rt = Inf, ...)
curve(normalised_pdiffusion(x, response = "upper",
                            a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"]), 
      add=TRUE, col = "yellow", lty = 2)

This figure shows that the analytical CDF (in yellow) lies perfectly on top of the empirical CDF (in black). If it does not for you, you are still using an old version of rtdists. We have also added a series of unit tests to rtdists that compare the empirical CDF to the analytical CDF (using ks.test) for a variety of parameter values, to catch such a problem if it ever occurs again.
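
The following is only a rough sketch of the kind of check described here (it is not the package’s actual unit test; the parameter values are made up, and it assumes the normalised_pdiffusion helper defined above):

set.seed(1)
samp <- rdiffusion(1e4, a = 1, v = 2, t0 = 0.3)
# compare upper-boundary RTs against the scaled analytical CDF
ks.test(samp[samp$response == "upper", "rt"],
        function(q) normalised_pdiffusion(q, response = "upper",
                                          a = 1, v = 2, t0 = 0.3))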

New Version of rtdists on CRAN (v. 0.4-9): Accumulator Models for Response Time Distributions

I have just submitted a new version of rtdists to CRAN (v. 0.4-9). As I haven’t mentioned rtdists on here yet, let me simply copy its description as a short introduction; a longer introduction follows below:

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.

Cognitive models of response time distributions are (usually) bivariate distributions that simultaneously account for choices and corresponding response latencies. The arguably most prominent of these models are the Ratcliff diffusion model and the linear ballistic accumulator (LBA). The main assumption of both is the idea of an internal evidence-accumulation process. As soon as the accumulated evidence reaches a specific threshold, the corresponding response is invariably given. To predict errors, the evidence-accumulation process in each model can reach the wrong threshold (because it is noisy or because of variability in its direction). The central parameters of both models are the quality of the evidence-accumulation process (the drift rate) and the position of the threshold. The latter can be voluntarily set by the decision maker, for example to trade off speed and accuracy. Additionally, the models can account for an initial bias towards one response (via the position of the start point) and for non-decision processes. To account for differences between the distributions beyond their differential weighting (e.g., fast or slow errors), the models allow trial-by-trial variability of most parameters.
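
To make the accumulation idea a bit more concrete, here is a toy sketch of a single LBA-like trial (purely illustrative with made-up values, not rtdists code):

set.seed(1)
b  <- 1                                  # response threshold
A  <- 0.5                                # maximum start point
t0 <- 0.2                                # non-decision time
mean_v <- c(correct = 1.2, error = 0.6)  # mean drift rates of the two accumulators
starts <- runif(2, 0, A)                 # trial-specific start points
rates  <- rnorm(2, mean_v, 0.3)          # trial-specific drift rates
finish <- (b - starts) / rates           # time each accumulator needs to reach b
finish[rates <= 0] <- Inf                # a non-positive rate never reaches the threshold
response <- names(mean_v)[which.min(finish)]  # first accumulator to finish wins
rt <- min(finish) + t0                        # its finishing time plus t0 is the RT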

The new version of rtdists provides a completely new interface for the LBA and a considerably overhauled interface for the diffusion model. In addition, the package now provides quantile functions for both models. In line with the usual R naming convention for distribution functions, the density starts with d (dLBA & ddiffusion), the distribution function with p (pLBA & pdiffusion), the quantile function with q (qLBA & qdiffusion), and the random generation with r (rLBA & rdiffusion). All main functions are now fully vectorized across all parameters and also across the response (i.e., boundary or accumulator).
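
As a quick illustration of this naming scheme for the LBA (the parameter values are made up):

require(rtdists)
dLBA(rt = 0.8, response = 1, A = 0.5, b = 1, t0 = 0.2,
     mean_v = c(1.2, 0.8), sd_v = c(1, 1))   # density
pLBA(rt = 0.8, response = 1, A = 0.5, b = 1, t0 = 0.2,
     mean_v = c(1.2, 0.8), sd_v = c(1, 1))   # distribution function
qLBA(p = 0.1, response = 1, A = 0.5, b = 1, t0 = 0.2,
     mean_v = c(1.2, 0.8), sd_v = c(1, 1))   # 10% quantile (of the defective distribution)
rLBA(n = 5, A = 0.5, b = 1, t0 = 0.2,
     mean_v = c(1.2, 0.8), sd_v = c(1, 1))   # random generation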

As an example, I will show how to estimate both models for the data set of a single individual using trial-wise maximum likelihood estimation (in contrast to the often-used binned chi-square estimation). We will be using one (somewhat randomly picked) participant from the data set that comes as an example with rtdists, speed_acc. Thanks to EJ Wagenmakers for providing these data and allowing them to be published on CRAN. We first prepare the data and plot the response time distribution.

require(rtdists)

require(lattice) # for plotting
lattice.options(default.theme = standard.theme(color = FALSE))
lattice.options(default.args = list(as.table = TRUE))

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & speed_acc$condition == "accuracy" & speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 

densityplot(~rt, p11, group = corr, auto.key=TRUE, plot.points=FALSE, weights = rep(1/nrow(p11), nrow(p11)), ylab = "Density")

[Figure: p11_nonwords_online – defective density plot of correct and error RTs for participant 11]

The plot obviously does not show the true densities of the two response time distributions (which can also be inferred from the warning messages produced by the call to densityplot) but rather the defective densities, for which only the sum of the two integrals is one. This shows that there are indeed a lot more correct responses (around 96% of the data) and that the error RTs have quite a long tail.
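As a quick numerical illustration of the defective-density idea (again with arbitrary parameter values, not estimates from these data), the two defective distribution functions evaluated at infinity should sum to one:

pdiffusion(Inf, boundary = "upper", a = 1, v = 2, t0 = 0.3) +
  pdiffusion(Inf, boundary = "lower", a = 1, v = 2, t0 = 0.3)
# should be 1 (up to numerical precision)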

To estimate the LBA for these data we simply need a wrapper function to which we can pass the RTs and responses and which will return the summed log-likelihood of all data points (actually the negative of that value, because most optimizers minimize by default). This function and the data then only need to be passed to our optimizer of choice (I like nlminb). To make the model identifiable we fix the SD of the drift rate for error RTs to 1 (other choices would be possible). The model converges with a maximum log-likelihood of 211.42 and parameters that look reasonable (note that the boundary b is parameterized as A + b). One might wonder about the negative mean drift rate for error RTs, but the default for the LBA is a normal distribution truncated at zero, so even though the mean is negative, it only produces positive drift rates (negative drift rates could produce undefined RTs).

ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, A = pars["A"], b = pars["A"]+pars["b"], t0 = pars["t0"], mean_v = pars[c("v1", "v2")], sd_v = c(1, pars["sv"]), silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), rt=p11$rt, response=p11$corr)
p11_norm
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182951 -2.7409929  1.0449789  0.4513499  0.1243456  0.2609930 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 57
# 
# $evaluations
# function gradient 
#       76      395 
# 
# $message
# [1] "relative convergence (4)"

We also might want to fit the diffusion model to these data. For this we need a similar wrapper. However, as the diffusion model can fail for certain parameter combinations, the safest way is to wrap the ddiffusion call in a tryCatch call. Note that the diffusion model is already identified because the diffusion constant is set to 1 internally. Obtaining this fit can take longer than for the LBA and might require a few tries with different random starting values to reach the maximum log-likelihood, which is at 207.55. This lower value indicates that the diffusion model provides a somewhat worse account of this data set, but the parameters look reasonable.

ll_diffusion <- function(pars, rt, boundary) 
{
  densities <- tryCatch(ddiffusion(rt, boundary=boundary, a=pars[1], v=pars[2], t0=pars[3], z=0.5, sz=pars[4], st0=pars[5], sv=pars[6]), error = function(e) 0)
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_fit <- nlminb(start, ll_diffusion, lower = 0, rt=p11$rt, boundary=p11$corr)
p11_fit
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727201 0.3385602 0.3499652 0.2017950 1.0551704 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 31
# 
# $evaluations
# function gradient 
#       50      214 
# 
# $message
# [1] "relative convergence (4)"

Finally, we might be interested in assessing the fit of the models graphically, in addition to simply comparing their maximum log-likelihoods. Specifically, we will produce a version of a quantile probability plot in which we plot, for the .1, .3, .5, .7, and .9 quantiles, both the RTs and the corresponding cumulative probabilities and compare the model predictions with the values from the data. For this we need both the CDFs and the quantile functions. The cumulative probabilities are simply the quantiles scaled by the response proportions; for example, the cumulative probability corresponding to the .1 quantile of the error RTs is .1 times the overall error rate (which is .04166667). Therefore, the first step in obtaining the model predictions is to obtain the predicted error rate by evaluating the CDF at infinity (or a high value). We then use this predicted rate to get the cumulative probabilities for each response, which are in turn used to obtain the corresponding predicted RTs via the quantile functions. Finally, we plot predictions and observed data separately for both models.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871709 0.5510265 0.6081855 0.6809797 0.8301290
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684367 0.5529569 0.6273732 0.7233959 0.9314825


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- do.call(pdiffusion, args = c(rt = 20, as.list(p11_fit$par), boundary = "upper")))  
# [1] 0.938958

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "upper"))
# [1] 0.4963608 0.5737010 0.6361651 0.7148225 0.8817063
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "lower"))
# [1] 0.4483908 0.5226722 0.5828972 0.6671577 0.8833553


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

[Figure: p11_predictions_online – quantile probability plots for the LBA and the diffusion model]

The plot confirms the somewhat better fit for the LBA compared to the diffusion model for this data set; while the LBA provides a basically perfect fit for the correct RTs, the diffusion model is somewhat off, especially for the higher quantiles. However, both models have similar problems predicting the long tail for the error RTs.

Many thanks to my package coauthors, Andrew Heathcote, Scott Brown, and Matthew Gretton, for developing rtdists with me. And also many thanks to Andreas and Jochen Voss for releasing their C code of the diffusion model under the GPL.

Hierarchical MPT in Stan I: Dealing with Divergent Transitions via Control Arguments http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/ http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/#comments Sat, 05 Mar 2016 12:54:12 +0000 http://singmann.org/?p=337 I have recently restarted working with Stan and unfortunately ran into the problem that my (hierarchical) Bayesian models often produced divergent transitions. And when this happens, the warning basically only suggests increasing adapt_delta:

Warning messages:
1: There were X divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
2: Examine the pairs() plot to diagnose sampling problems

However, increasing adapt_delta often does not help, even if one uses values such as .99. Also, I never found pairs() especially illuminating. This is the first of two blog posts dealing with this issue. In this (the first) post I will show which Stan settings need to be changed to remove the divergent transitions (to foreshadow, these are adapt_delta, stepsize, and max_treedepth). In the next blog post I will show how reparameterizing the model following the Stan recommendations can often remove divergent transitions without the need to extensively fiddle with the sampler settings, while at the same time dramatically improving the fitting speed.

My model had some similarities to the multinomial processing tree (MPT) example in the Lee and Wagenmakers cognitive modeling book. As I am a big fan of both MPTs and the book, I investigated the issue of divergent transitions using this example. Luckily, a first Stan implementation of all the examples from Lee and Wagenmakers has been provided by Martin Šmíra (who is now working on his PhD in Birmingham) and is part of the Stan example models. I submitted a pull request with the changes to the model discussed here, so they are now also part of the example models (including a README file discussing those changes).

The example uses the pair-clustering model that is also discussed in the paper that introduced MPTs formally. The model has three parameters: c for cluster storage, r for cluster retrieval, and u for unique storage-retrieval. For the hierarchical structure the model employs the latent-trait approach: the group-level (i.e., hyper-) parameters are estimated on the unconstrained space from minus to plus infinity. Individual-level parameters are added to the group means as displacements estimated from a multivariate normal with mean zero and a freely estimated variance-covariance matrix. Only then is the unconstrained space mapped onto the unit range (i.e., 0 to 1), which represents the parameter space, via the probit transformation. This allows one to freely estimate the correlations among the individual parameters on the unconstrained space and at the same time constrains the parameters to the allowed range after transformation.
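A minimal R sketch of this construction (not the actual Stan code; all numerical values are made up purely for illustration) may help:

# group-level means for c, r, and u on the unconstrained (probit) scale
mu <- c(c = 0.4, r = -0.2, u = 0.1)
# freely estimated variance-covariance matrix of the individual displacements
Sigma <- matrix(c(1.0, 0.3, 0.2,
                  0.3, 0.8, 0.1,
                  0.2, 0.1, 0.6), nrow = 3)
# displacement of one participant, drawn from a multivariate normal with mean zero
delta <- MASS::mvrnorm(1, mu = rep(0, 3), Sigma = Sigma)
# probit transformation maps the unconstrained values onto the unit range
theta <- pnorm(mu + delta)
theta  # individual-level c, r, and u, all between 0 and 1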

The original implementation employed two features that are particularly useful for models estimated via Gibbs sampling (as implemented in JAGS), but not so much for the NUTS sampler implemented in Stan: (a) a scaled inverse Wishart prior for the covariance matrix, chosen for its computational convenience, and (b) parameter expansion to move the scale parameters of the variance-covariance matrix away from zero and ensure reasonable priors.

The original implementation of the model in Stan is simply a literal translation of the JAGS code given in Lee and Wagenmakers. Consequently, it retains the Gibbs-specific features. When fitting this model it seems to produce stable estimates, but Stan reports several divergent transitions after warmup. Given that the estimates seem stable and the results basically replicate what is reported in Lee and Wagenmakers (Figures 14.5 and 14.6), one may wonder why not to trust these results. I can give no full explanation, so let me copy the relevant part from the shinystan help. The important part is the last section: it clearly says not to use the results if there are any divergent transitions.

n_divergent

Quick definition The number of leapfrog transitions with diverging error. Because NUTS terminates at the first divergence this will be either 0 or 1 for each iteration. The average value of n_divergent over all iterations is therefore the proportion of iterations with diverging error.

More details

Stan uses a symplectic integrator to approximate the exact solution of the Hamiltonian dynamics and when stepsize is too large relative to the curvature of the log posterior this approximation can diverge and threaten the validity of the sampler. n_divergent counts the number of iterations within a given sample that have diverged and any non-zero value suggests that the samples may be biased in which case the step size needs to be decreased. Note that, because sampling is immediately terminated once a divergence is encountered, n_divergent should be only 0 or 1.

If there are any post-warmup iterations for which n_divergent = 1 then the results may be biased and should not be used. You should try rerunning the model with a higher target acceptance probability (which will decrease the step size) until n_divergent = 0 for all post-warmup iterations.

My first step in trying to get rid of the divergent transitions was to increase adapt_delta as suggested by the warning. But as said initially, this did not help in this case, even when using quite high values such as .99 or .999. Fortunately, the quote above tells us that divergent transitions are related to the stepsize with which the sampler traverses the posterior. stepsize is also one of the control arguments one can pass to Stan in addition to adapt_delta. Unfortunately, the stan help page is relatively uninformative with respect to the stepsize argument and does not even provide its default value; it simply says stepsize (double, positive). Bob Carpenter clarified on the Stan mailing list that the default value is 1 (referring to the CmdStan documentation). He goes on:

The step size is just the initial step size.  It lets the first few iterations move around a bit and set relative scales on the parameters.  It’ll also reduce numerical issues. On the negative side, it will also be slower because it’ll take more steps at a smaller step size before hitting a U-turn.

The adapt_delta (target acceptance rate) determines what the step size will be during sampling — the higher the accept rate, the lower the step size has to be.  The lower the step size, the less likely there are to be divergent (numerically unstable) transitions.

Taken together, this means that divergent transitions can be dealt with by increasing adapt_delta above the default value of .8 while at the same time decreasing the initial stepsize below the default value of 1. As this may increase the necessary number of steps, one might also need to increase max_treedepth above the default value of 10. After trying out various values, the following set of control arguments seems to remove all divergent transitions in the example model (at the cost of prolonging the fitting process quite considerably):

control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20)

As this uses rstan, the R interface to Stan, here is the full call:

samples_1 <- stan(model_code=model,   
                  data=data, 
                  init=myinits,  # If not specified, gives random inits
                  pars=parameters,
                  iter=myiterations, 
                  chains=3, 
                  thin=1,
                  warmup=mywarmup,  # Stands for burn-in; Default = iter/2
                  control = list(adapt_delta = 0.999, stepsize = 0.01, max_treedepth = 15)
)

With these values, the traceplots of the post-warmup samples look pretty good, even for the sigma parameters, which occasionally have problems moving away from 0. As you can see from these nice plots, rstan uses ggplot2 for its traceplots.

traceplot(samples_1, pars = c("muc", "mur", "muu", "Omega", "sigma", "lp__"))

[Figure: traceplots_orig – traceplots of the post-warmup samples]

Install R without support for long doubles (noLD) on Ubuntu http://singmann.org/install-r-without-support-for-long-doubles/ http://singmann.org/install-r-without-support-for-long-doubles/#comments Mon, 22 Jun 2020 20:08:19 +0000 http://singmann.org/?p=894

The rest of this post gives a list of all the packages I needed to install on a fresh Ubuntu installation (e.g., from here) to successfully compile R in this way. This set of packages should of course also suffice for compiling regular R versions. I hope I did not forget too many packages, but this should cover most of them. Feel free to post a comment if something is missing and I will try to update the list.

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install gfortran
sudo apt-get install gcc-multilib
sudo apt-get install gobjc++
sudo apt-get install libpcre2-dev
sudo apt-get install xorg-dev
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libblas-dev
sudo apt-get install texlive-fonts-extra
sudo apt-get install default-jdk
sudo apt-get install aptitude
sudo aptitude install libreadline-dev
sudo apt-get install curl

In addition to the necessary packages, the following packages probably lead to a better R user experience (after installing these a restart may help):

sudo apt-get install xfonts-100dpi 
sudo apt-get install xfonts-75dpi
sudo apt-get install qpdf
sudo apt-get install pandoc
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt-get install git
sudo apt-get install gdebi-core
sudo apt-get install libcairo2-dev
sudo apt-get install libtiff-dev

The last two packages (libcairo2-dev and libtiff-dev) should allow you to add --with-cairo=yes to the configure call below. The gdebi-core package might be needed for installing RStudio.

After this, we should be able to build R. For this, I followed the RStudio instructions for installing multiple R versions in parallel. We begin by setting an environment variable and downloading R.

export R_VERSION=4.0.1

curl -O https://cran.rstudio.com/src/base/R-4/R-${R_VERSION}.tar.gz
tar -xzvf R-${R_VERSION}.tar.gz
cd R-${R_VERSION}

We can then install R (here I set the options for disabling long doubles):

./configure \
    --prefix=/opt/R/${R_VERSION} \
    --enable-R-shlib \
    --with-blas \
    --with-lapack \
    --disable-long-double \
    --enable-long-double=no

make 
sudo make install

To test the installation we can use:

/opt/R/${R_VERSION}/bin/R --version

Finally, we need to create a symbolic link:

sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

We can then run R and check the capabilities of the installation:

> capabilities()
       jpeg         png        tiff       tcltk         X11 
      FALSE        TRUE       FALSE       FALSE        TRUE 
       aqua    http/ftp     sockets      libxml        fifo 
      FALSE        TRUE        TRUE        TRUE        TRUE 
     cledit       iconv         NLS     profmem       cairo 
       TRUE        TRUE        TRUE       FALSE       FALSE 
        ICU long.double     libcurl 
       TRUE       FALSE        TRUE

Or shorter:

> capabilities()[["long.double"]]
[1] FALSE

 

 

 

 

afex_plot(): Publication-Ready Plots for Factorial Designs http://singmann.org/afex_plot/ http://singmann.org/afex_plot/#respond Tue, 25 Sep 2018 17:44:35 +0000 http://singmann.org/?p=744 I am happy to announce that a new version of afex (version 0.22-1) has appeared on CRAN. This version comes with two major changes; for more, see the NEWS file. To get the new version, including all packages used in the examples, run:

install.packages("afex", dependencies = TRUE)

First, afex does not load or attach package emmeans automatically anymore. This reduces the package footprint and makes it more lightweight. If you want to use afex without using emmeans, you can do this now. The consequence of this is that you have to attach emmeans explicitly if you want to continue using emmeans() et al. in the same manner. Simply add library("emmeans") to the top of your script just below library("afex") and things remain unchanged. Alternatively, you can use emmeans::emmeans() without attaching the package.
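In code, the required change is just one extra line (a minimal sketch of the two options described above):

library("afex")
library("emmeans")  # now needs to be attached explicitly to keep using emmeans() et al.
# alternatively, skip the library() call and use the package without attaching it:
# emmeans::emmeans(...)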

Second and more importantly, I have added a new plotting function to afex. afex_plot() visualizes results from factorial experiments, combining estimated marginal means and associated uncertainties (i.e., error bars) in the foreground with a depiction of the raw data in the background. Currently, afex_plot() supports ANOVAs and mixed models fitted with afex as well as mixed models fitted with lme4 (support for more models will come in the next version). As shown in the example below, afex_plot() makes it easy to produce nice-looking plots that are ready to be incorporated into publications. Importantly, afex_plot() allows different types of error bars, including within-subjects confidence intervals, which makes it particularly useful for fields where such designs are very common (e.g., psychology). Furthermore, afex_plot() is built on ggplot2 and designed in a modular manner, making it easy to customize the plot to one's personal preferences.

afex_plot() requires the fitted model object as its first argument and then has three arguments determining which factor or factors are displayed and how (a minimal sketch follows the list):
x is necessary and specifies the factor(s) plotted on the x-axis.
trace is optional and specifies the factor(s) plotted as separate lines (i.e., with each factor level present at each x-axis tick).
panel is optional and specifies the factor(s) which separate the plot into different panels.
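As a minimal sketch of these three arguments (using the same example data and model as in the full example below; all other arguments are left at their defaults):

library("afex")
data(md_12.1)
aw <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

afex_plot(aw, x = "angle", trace = "noise")   # noise shown as separate lines
afex_plot(aw, x = "angle", panel = "noise")   # noise shown as separate panels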

The further arguments make it easy to customize the plot in various ways. A comprehensive overview is provided in the new vignette; further details, specifically regarding which types of error bars are supported, are given on its help page (which also has many more examples).

Let us look at an example. We take data from a 3 by 2 within-subject experiment that also features prominently in the vignette. Note that we plot within-subjects confidence intervals (by setting error = "within") and then customize the plot quite a bit by changing the theme, using nicer labels, removing some y-axis ticks, adding colour, and using a customized geom (geom_boxjitter from the ggpol package) for displaying the data in the background.

library("afex") 
library("ggplot2") 
data(md_12.1)
aw <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

afex_plot(aw, x = "angle", trace = "noise", error = "within",
          mapping = c("shape", "fill"), dodge = 0.7,
          data_geom = ggpol::geom_boxjitter, 
          data_arg = list(
            width = 0.5, 
            jitter.width = 0,
            jitter.height = 10,
            outlier.intersect = TRUE),
          point_arg = list(size = 2.5), 
          error_arg = list(size = 1.5, width = 0),
          factor_levels = list(angle = c("0°", "4°", "8°"),
                               noise = c("Absent", "Present")), 
          legend_title = "Noise") +
  labs(y = "RTs (in ms)", x = "Angle (in degrees)") +
  scale_y_continuous(breaks=seq(400, 900, length.out = 3)) +
  theme_bw(base_size = 15) + 
  theme(legend.position="bottom", panel.grid.major.x = element_blank())

ggsave("afex_plot.png", device = "png", dpi = 600,
       width = 8.5, height = 8, units = "cm") 

In the plot, the black dots are the means and the thick black lines the 95% within-subject confidence intervals. The raw data is displayed in the background with a half box plot showing the median and upper and lower quartile as well as the raw data. The raw data is jittered on the y-axis to avoid perfect overlap.


One final thing to note: in the vignette on CRAN as well as on the help page there is an error in the code. The name of the argument for changing the labels of the factor levels is factor_levels and not new_levels. The vignette linked above and here uses the correct argument name. This is already corrected on github and will be corrected on CRAN with the next release.

Diffusion/Wiener Model Analysis with brms – Part III: Hypothesis Tests of Parameter Estimates http://singmann.org/wiener-model-analysis-with-brms-part-iii/ http://singmann.org/wiener-model-analysis-with-brms-part-iii/#comments Thu, 06 Sep 2018 15:58:49 +0000 http://singmann.org/?p=708 This is the third part of my blog series on fitting the 4-parameter Wiener model with brms. The first part discussed how to set up the data and model. The second part was concerned with (mostly graphical) model diagnostics and the assessment of the adequacy (i.e., the fit) of the model. This third part will inspect the parameter estimates of the model with the goal of determining whether there is any evidence for differences between the conditions. As before, this part is completely self-sufficient and can be run without running the code of Parts I or II.

As I promised in the second part of this series of blog posts, the third part did not take another two months to appear. No, this time it took almost eight months. I apologize for this, but we all know the planning fallacy, and a lot of more important things got in the way (e.g., teaching).

As this part is relatively long, I will provide a brief overview. The next section contains a short explanation of the way in which we will perform hypothesis testing. This is followed by a short section loading some packages and the fitted model object and giving a small recap of the model. After this comes one relatively long section looking at the drift rate parameters in various ways. Then we will take a look at each of the other three parameters in turn. Of special importance will be the subsection on the non-decision time. As described in more detail below, I believe that this parameter cannot be interpreted. In the end, I give a brief overview of some of the limits of the present model and how it could be improved upon.

Bayesian Hypothesis Testing

The goal of this post is to provide evidence for differences in parameter estimates between conditions. This post will present different ways to do so. Importantly, "different ways" of producing such evidence is meant only in a technical sense. In statistical terms, we will always do basically the same thing: inspect difference distributions resulting from linear combinations of cell-wise posterior distributions of the group-level model parameter estimates. The somewhat technical phrase "linear combinations of cell-wise posterior distributions" often simply means the difference between two distributions, for example, the difference distribution resulting from subtracting the posterior of the speed condition from the posterior of the accuracy condition.

As a reminder, a posterior distribution is the probability distribution of a parameter conditional on data and model (where the latter includes the parameter priors). It answers the question of which parameter values are likely given our prior knowledge and the data. Therefore, the posterior distribution of a difference answers, for example, which difference values between two conditions are likely or not. With such a difference distribution we can then do two things.

First, we can check whether the x% highest posterior density (HPD) or credibility interval of this difference distribution includes 0. If 0 is within the 95% HPD interval, it could be seen as a plausible value. If 0 is outside the 95% interval, we could regard it as not plausible enough and would conclude that there is evidence for a difference.

Second, we can evaluate how much of the difference distribution is on one side of 0. If this value is considerably away from 50%, this constitutes evidence for a difference. For example, if all of the posterior samples for a specific difference are larger than zero, this provides considerable evidence that the difference is above 0.
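To make this concrete, here is a hypothetical sketch of both checks in R. The object name (fit_wiener) and the column names are assumptions based on the summary output shown further below; inspect names(post) to get the actual names.

post <- as.data.frame(fit_wiener)  # one column per parameter, one row per posterior sample
diff_drift <- post[["b_conditionaccuracy:frequencyhigh"]] -
  post[["b_conditionspeed:frequencyhigh"]]
quantile(diff_drift, probs = c(0.025, 0.975))  # does the 95% interval include 0?
mean(diff_drift > 0)                           # proportion of the difference distribution above 0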

The approach of investigating posterior distributions to gauge differences between conditions is only one approach to hypothesis testing in a Bayesian setting. And, at least in the psychological literature, it is not the most popular one. More specifically, many of the more vocal proponents of Bayesian statistics in the psychological literature advocate hypothesis testing using Bayes factors. One prominent exception to this rule in psychology is maybe John Kruschke. However, he proposes yet another approach to inference that, like the one used here, is based on posterior distributions. In general, I agree with many of the arguments in favor of Bayes factors, especially in cases such as the current one in which all relevant hypotheses or competing models are nested within one large (super) model.

The main difficulty when using Bayes factors is their extreme sensitivity to the parameter priors. In a situation with nested models this is, in principle, not such a big problem, because one could use the default prior approach in the tradition of Jeffreys. This approach has been extended to general ANOVA designs (its proponents were surely not the first to have this idea, but they were at least the first to popularize it in psychology). Quentin Gronau and colleagues have applied it to accumulator models, including the diffusion model. The general idea is to reparameterize the model using effect parameters which are normalized using, for example, the residual variance. For example, for a two-sample design one would parameterize the model using a standardized difference such as Cohen's d. Then it is comparatively easy and uncontroversial to put a prior on the standardized effect size measure. In the present case, in which the model does not contain a residual variance parameter, one could use the variance estimate of the group-level distribution for each parameter for such a normalization.

Unfortunately, to the best of my knowledge, brms does not offer the ability to specify a parameterization and prior distribution in line with such default Bayes factors. And as far as I remember a discussion I had on this topic with Paul Bürkner some time ago, it is also unlikely that brms will ever get this ability. Consequently, I feel that brms is not the right tool for model selection using Bayes factors. Whereas it now offers the computation of Bayes factors from a technical side (using our bridgesampling package), it only allows models with an unnormalized parameterization. I believe that such a parameterization is in most cases not appropriate for Bayes factor based model selection, as the priors cannot be specified in a 'default' manner. Thus, I cannot recommend brms for Bayes factor based model selection at the moment. In sum, the reason for basing our inferences solely on posterior distributions in the present case is practical constraints and not philosophical considerations.

One final word of caution for the psychological readership. Whereas Bayes factors are clearly extremely popular in psychology, this is not the case in many other scientific disciplines. For example, the patron saint of applied Bayesian statistics, Andrew Gelman, is a self-declared opponent of Bayes factors: “I generally hate Bayes factors myself”. As far as I can see, this disagreement comes from the different types of data different people work with. When working with observational (or correlational) data, as Andrew Gelman usually does, tests of the presence of effects (or of nullity) are either a big no-no (e.g., when wanting to do causal inference) or simply not interesting. We know that the real world is full of relationships, especially small ones, between arbitrary things. So getting effects simply by increasing N is just not interesting, and estimation is the more interesting approach. In contrast, for experimental data we often have true null hypotheses, and testing those makes a lot of sense. For example, if Bem was right and there truly were PSI, we could surely exploit this somehow. But as far as we can tell, the effect is truly null. In this case we really need hypothesis testing.

Getting Started

We start by loading some packages for analyzing the posterior. Since the beginning of this series, I have become more and more of a fan of the whole tidyverse, so we import it completely. We of course also need brms. As shown below, we will need a few more packages (especially emmeans and tidybayes), but these are only loaded when needed.

library("brms")
library("tidyverse")
theme_set(theme_classic()) # theme for ggplot2
options(digits = 3)

We also need the posterior samples, which we can load in the same way as before from my GitHub page. Note that we need neither the data nor the posterior predictive distribution this time.

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda",
file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))

We begin by looking at the group-level posteriors (in brms terminology, the population-level or “fixed” effects). An overview of their posterior distributions can be obtained using the summary function.

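The corresponding call is simply the following (only the population-level part of the output is reproduced below):

summary(fit_wiener)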
#                                    Estimate Est.Error l-95% CI u-95% CI
# conditionaccuracy:frequencyhigh      -2.944    0.1971   -3.345   -2.562
# conditionspeed:frequencyhigh         -2.716    0.2135   -3.125   -2.299
# conditionaccuracy:frequencynw_high    2.238    0.1429    1.965    2.511
# conditionspeed:frequencynw_high       1.989    0.1785    1.626    2.332
# bs_conditionaccuracy                  1.898    0.1448    1.610    2.186
# bs_conditionspeed                     1.357    0.0813    1.200    1.525
# ndt_conditionaccuracy                 0.323    0.0173    0.289    0.358
# ndt_conditionspeed                    0.262    0.0154    0.232    0.293
# bias_conditionaccuracy                0.471    0.0107    0.449    0.491
# bias_conditionspeed                   0.499    0.0127    0.474    0.524
# Warning message:
# There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
# See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

As a reminder, we have data from a lexical decision task (i.e., participants have to decide whether presented strings are a word or not) and frequency is the factor determining the true status of a string, with high referring to words and nw_high to non-words. Consequently, for the drift rate (the first four rows in the results table) the frequency factor determines the sign of the parameter estimates: the drift rate for words (rows 1 and 2) is clearly negative (i.e., those trials mostly hit the lower boundary for the word decision) and the drift rate for non-words (rows 3 and 4) is clearly positive (i.e., those trials mostly hit the upper boundary for non-word decisions). Furthermore, there could be differences between the drift rates in the accuracy and speed conditions. Specifically, in the speed condition drift rates seem to be less extreme (i.e., nearer to 0) than in the accuracy condition.

The other three parameters only differ across the levels of the condition factor. Given the experimental manipulation of accuracy versus speed conditions, we expect differences in the boundary separation, the parameters starting with bs_. For the non-decision time, the parameters starting with ndt_, there also appears to be a small effect, as the 95% intervals overlap only slightly. However, as discussed in detail below, we should be careful in interpreting this difference. Finally, for the bias, the parameters starting with bias_, there may or may not be a difference. Furthermore, at least in the accuracy condition there appears to be a bias for “word” responses.

One way to test differences between conditions is using the hypothesis function in brms. However, I was not able to get it to work with the current model. I suspect the reason for this is the somewhat unconventional parameterization where each cell gets its own parameter (in some sense each cell has its own intercept, but there is no overall intercept). This contrasts with a more “standard” parameterization in which there is one intercept (for either the unweighted means or one of the cells) and the remaining parameters capture the differences between the intercept and the cell means. As a reminder, I chose this unconventional parameterization in the first place to make the specification of the parameter priors easier. Additionally, this is a common parameterization when programming cognitive models by hand.

emmeans and tidybayes: Differences in the Drift Rate

An alternative is to use the great emmeans package by Russell Lenth. I am a huge fan of emmeans and use it all the time when using “normal” statistical models (e.g., ANOVAs, mixed models), independent of whether I use frequentist methods (e.g., via afex) or Bayesian methods (e.g., rstanarm or brms). Unfortunately, it appears as if emmeans at the moment only allows an analysis of the main parameter of the response distribution for models estimated with brms, which in our case is the drift rate. If someone were to extend emmeans to allow using brms models with all parameters, I would be very happy and thankful. In any case, I highly recommend checking out the emmeans vignettes to get an overview of the types of follow-up tests that are possible with this great package.

As I recently learned, emmeans works quite nicely together with tidybayes, a package that enables working with posterior draws within the tidyverse. tidybayes has a surprisingly large package footprint (i.e., it imports quite a lot of other packages) for a package with a comparatively small amount of functionality. I guess this is a consequence of being embedded within the tidyverse. In any case, many of the imported packages are already in the search path thanks to loading the tidyverse above, so attaching should not take that long here.

library("emmeans")
library("tidybayes")

We begin with emmeans only to assure ourselves that it works as expected. For this, we get the estimated marginal means plus 95% highest posterior density (HPD) intervals; the point estimates match the fixed-effects output above, as both report the median of the posterior samples as the estimate of central tendency. As a reminder, the fact that the cell estimates match the parameter estimates is of course a consequence of the unusual parameterization, which is picked up correctly by emmeans. The lower and upper bounds of the intervals differ slightly between the summary output from brms and emmeans, a consequence of using different ways of calculating the intervals (i.e., quantiles versus HPD intervals).

fit_wiener %>%
  emmeans( ~ condition*frequency) 
#  condition frequency emmean lower.HPD upper.HPD
#  accuracy  high       -2.94     -3.34     -2.56
#  speed     high       -2.72     -3.10     -2.28
#  accuracy  nw_high     2.24      1.96      2.50
#  speed     nw_high     1.99      1.64      2.34
# 
# HPD interval probability: 0.95

Using HPD Intervals And Histograms

As a first test, we are interested in assessing whether there is evidence for a difference between speed and accuracy conditions for both words (i.e., frequency = high) and non-words (i.e., frequency = nw_high). There are many ways to do this with emmeans; one of them is via the by argument and the pairs function.


fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs
# frequency = high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed   -0.225   -0.6964     0.256
# 
# frequency = nw_high:
#  contrast         estimate lower.HPD upper.HPD
#  accuracy - speed    0.249   -0.0647     0.550
# 
# HPD interval probability: 0.95

Here, we do not have a lot of evidence that there is a difference for either stimulus type, as both HPD intervals include 0.

Instead of getting the summary of the distribution via emmeans, we can also use the capabilities of tidybayes and extract the samples in a tidy way. Then we use one of the convenient aggregation functions coming with tidybayes and aggregate the samples based on the same conditioning variable. After trying a few different options, I have the feeling that the hpd.summary() function in emmeans uses the same approach for calculating HPD intervals as tidybayes, as both results match.

samp1 <- fit_wiener %>%
  emmeans("condition", by = "frequency") %>% 
  pairs %>% 
  gather_emmeans_draws()
samp1 %>% 
  median_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.225 -0.696   0.256   0.95 median hdi      
# 2 accuracy - speed nw_high    0.249 -0.0647  0.550   0.95 median hdi

Instead of the median, we can also use the mode as our point estimate. In the present case the differences between the two are not large, but they are noticeable for the word stimuli.

samp1 %>% 
  mode_hdi()
# # A tibble: 2 x 8
# # Groups:   contrast [1]
#   contrast         frequency .value  .lower .upper .width .point .interval
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy - speed high      -0.190 -0.696   0.256   0.95 mode   hdi      
# 2 accuracy - speed nw_high    0.252 -0.0647  0.550   0.95 mode   hdi

Further, we might use a different way of calculating HPD intervals. I have the feeling that Rob Hyndman's hdrcde package provides the most elaborate set of functions for estimating highest density intervals. Consequently, this is what we use next. Note that the package needs to be installed for this.

To use it in a tidy way, we write a short function returning a data.frame in a list. Thus, when called within summarise we get a list-column. Consequently, we have to call unnest to get a nice output.

get_hdi <- function(x, level = 95) {
  tmp <- hdrcde::hdr(x, prob = level)
  list(data.frame(mode = tmp$mode[1], lower = tmp$hdr[1,1], upper = tmp$hdr[1,2]))
}
samp1 %>% 
  summarise(hdi = get_hdi(.value)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower upper
#   <fct>            <fct>      <dbl>   <dbl> <dbl>
# 1 accuracy - speed high      -0.227 -0.712  0.247
# 2 accuracy - speed nw_high    0.249 -0.0616 0.558

The results differ again slightly, but not too much. Perhaps more importantly, there is still no real evidence for a difference in the drift rate between conditions. Even when looking only at 80% HPD intervals there is only evidence for a difference for the non-word stimuli.

samp1 %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 5
# # Groups:   contrast [1]
#   contrast         frequency   mode   lower  upper
#   <fct>            <fct>      <dbl>   <dbl>  <dbl>
# 1 accuracy - speed high      -0.212 -0.540  0.0768
# 2 accuracy - speed nw_high    0.246  0.0547 0.442

Because we have the samples in a convenient form, we could now evaluate whether there is any evidence for a drift rate difference between conditions across both word and non-word stimuli. One problem for this is, however, that the direction of the effect differs between words and non-words. This is a consequence of the fact that word stimuli require a response at the lower decision boundary and non-words a response at the upper boundary. Consequently, we need to multiply the effect by -1 for one of the conditions. After that, we can take the mean of both conditions. We do this via tidyverse magic and also add the number of values that are aggregated in this way to the table. This is just a precaution to make sure that our logic is correct and we always aggregate exactly two values. As the final check shows, this is the case.

samp2 <- samp1 %>% 
  mutate(val2 = if_else(frequency == "high", -1*.value, .value)) %>% 
  group_by(contrast, .draw) %>% 
  summarise(value = mean(val2),
            n = n())
all(samp2$n == 2)
# [1] TRUE

We can then investigate the resulting difference distribution. One way to do so is graphically via a histogram. As recommended by Hadley Wickham, it makes sense to play around with the number of bins a bit until the figure looks good. Given we have quite a large number of samples, 75 bins seemed good to me. With fewer bins there was not enough granularity; with more bins I felt there were too many small peaks.

ggplot(samp2, aes(value)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

This shows that, whereas quite a bit of the posterior mass is to the right of 0, a non-negligible part is still to the left. So there is some evidence for a difference, but it is still not very strong, even when looking at words and non-words together.

We can also investigate this difference distribution via the HPD intervals. To get a better overview we now look at several intervals sizes:

hdrcde::hdr(samp2$value, prob = c(99, 95, 90, 80, 85, 50))
# $`hdr`
#        [,1]  [,2]
# 99% -0.1825 0.669
# 95% -0.0669 0.554
# 90% -0.0209 0.505
# 85%  0.0104 0.471
# 80%  0.0333 0.445
# 50%  0.1214 0.340
# 
# $mode
# [1] 0.225
# 
# $falpha
#    1%    5%   10%   15%   20%   50% 
# 0.116 0.476 0.757 0.984 1.161 1.857 

This shows that only for the 85% interval and smaller intervals is 0 excluded. Note, you can use hdrcde::hdr.den instead of hdrcde::hdr to get a graphical overview of the output.
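
For example, a graphical counterpart of the output above can be obtained with:

hdrcde::hdr.den(samp2$value, prob = c(99, 95, 80, 50))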

Using Bayesian p-values

An approach that requires fewer arbitrary cutoffs than HPD intervals (for which we have to define the width) is to calculate the actual proportion of samples below 0:

mean(samp2$value < 0)
# [1] 0.0665

As explained above, if this proportion were small, this would constitute evidence for a difference. Here, the proportion of samples below 0 is .067. Unfortunately, .067 is a bit above the magical cutoff of .05, which is universally accepted as delineating small from big numbers, or perhaps more appropriately, likely from unlikely probabilities.

Let us look at such a proportion a bit more in depth. If two posterior distributions lie exactly on top of each other, the resulting difference distribution is centered on 0 and exactly 50% of the difference distribution is on either side of 0. Thus, a proportion of 50% corresponds to the least evidence for a difference, or alternatively, to the strongest evidence for an absence of a difference. One further consequence is that both values near 0 and values near 1 are indicative of a difference, albeit in different directions. To make interpretation of these proportions easier, I suggest always calculating them in such a way that small values represent evidence for a difference (e.g., by subtracting the proportion from 1 if it is above .5).

But what does this proportion tell us exactly? It represents the probability that there is a difference in a specific direction. Thus, it represents one-sided evidence for a difference. In contrast, for a 95% HPD interval we remove 2.5% from each side of the difference distribution. To ensure this proportion has the same two-sided property as our HPD intervals, we need to multiply it by 2. A further benefit of this multiplication is that it stretches the range to the whole probability scale (i.e., from 0 to 1).

Thus, the resulting value is a probability (i.e., ranging from 0 to 1), with values near zero denoting evidence for a difference and values near one providing some evidence against a difference. Thus, in contrast to a classical p-value, it is a continuous measure of evidence for (when near 0) or against (when near 1) a difference between the parameter estimates. Given its superficial similarity with classical p-values (i.e., low values are seen as evidence for a difference), we could call it a version of a Bayesian p-value, or pB for short. In the present case we could say: The pB value for a difference between speed and accuracy conditions in drift rate across word and non-word stimuli is .13, indicating that the evidence for a difference is at best weak.
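
As a small convenience, this recipe (take the smaller of the two tail proportions and multiply it by 2) can be wrapped into a helper function; the function name is of course arbitrary, and applied to the drift rate difference from above it reproduces the value just reported:

p_b <- function(x) 2 * min(mean(x < 0), mean(x > 0))
p_b(samp2$value)
# [1] 0.133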

Bayesian p-values of course allow us to misuse them in the same way that we can misuse classical p-values, for example, by introducing arbitrary cutoff values, such as at .05. Imagine for a second that we are interested in testing whether there are differences in the absolute amount of evidence, as measured via drift rate, between any of the four cells of the design (I am not suggesting this is particularly sensible). For this, we would have to transform the posteriors for all drift rates onto the same side (note, we do not want to take the absolute values, as we still want to retain the information of a switch from positive to negative drift rates or the other way around), for example, by multiplying the drift rates for words by -1. We do so and then inspect the cell means.

samp3 <- fit_wiener %>%
  emmeans( ~ condition*frequency) %>% 
  gather_emmeans_draws() %>% 
  mutate(.value = if_else(frequency == "high", -1 * .value, .value),
         intera = paste(condition, frequency, sep = ".")) 
samp3 %>% 
  mode_hdi(.value)
# # A tibble: 4 x 8
# # Groups:   condition [2]
#   condition frequency .value .lower .upper .width .point .interval
#   <fct>     <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy  high        2.97   2.56   3.34   0.95 mode   hdi      
# 2 accuracy  nw_high     2.25   1.96   2.50   0.95 mode   hdi      
# 3 speed     high        2.76   2.28   3.10   0.95 mode   hdi      
# 4 speed     nw_high     2.00   1.64   2.34   0.95 mode   hdi

Inspection of the four cell means suggests that the drift rate values for words are larger than the values for non-words.

To get an overview of all pairwise differences using an arbitrary cut-off value, I have written two functions that return a compact letter display of all pairwise comparisons. The functions require the data in the wide format, with each column representing the draws for one parameter. Note that the compact letter display is calculated via another package, multcompView, which needs to be installed before using these functions.

get_p_matrix <- function(df, only_low = TRUE) {
  # df: one column of posterior draws per parameter/condition
  # pre-define matrix of pairwise proportions
  out <- matrix(-1, nrow = ncol(df), ncol = ncol(df), 
                dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(ncol(df))) {
    for (j in seq_len(ncol(df))) {
      out[i, j] <- mean(df[,i] < df[,j]) 
    }
  }
  # if requested, report the smaller tail so that all values are <= .5
  if (only_low) out[out > .5] <- 1 - out[out > .5]
  out
}

cld_pmatrix <- function(df, level = 0.05) {
  p_matrix <- get_p_matrix(df)
  # flag pairs whose (two-sided) Bayesian p-value falls below the cutoff
  lp_matrix <- (p_matrix < (level/2) | p_matrix > (1 - (level/2)))
  cld <- multcompView::multcompLetters(lp_matrix)$Letters
  cld
}
samp3 %>% 
  ungroup() %>% ## to get rid of unneeded columns
  select(.value, intera, .draw) %>% 
  spread(intera, .value) %>% 
  select(-.draw) %>% ## we need to get rid of all columns not containing draws
  cld_pmatrix()
# accuracy.high accuracy.nw_high       speed.high    speed.nw_high 
#           "a"              "b"              "a"              "b"

In a compact letter display, conditions that share a common letter do not differ according to the criterion; conditions that do not share a common letter do differ according to the criterion. Here, the compact letter display is not super informative and just recovers what we have seen above: the drift rates for the words form one group and the drift rates for the non-words form another group. In cases with more conditions or more complicated difference patterns, compact letter displays can be quite informative.

We could have also used the functionality of tidybayes to inspect all pairwise comparisons. Note that it is important to use ungroup before invoking the compare_levels function. Otherwise we get an error that is difficult to understand (the grouping appears to be a consequence of using emmeans).

samp3 %>% 
  ungroup %>% 
  compare_levels(.value, by = intera) %>% 
  mode_hdi()
# # A tibble: 6 x 7
#   intera                           .value  .lower  .upper .width .point .interval
#   <fct>                             <dbl>   <dbl>   <dbl>  <dbl> <chr>  <chr>    
# 1 accuracy.nw_high - accuracy.high -0.715 -1.09   -0.351    0.95 mode   hdi      
# 2 speed.high - accuracy.high       -0.190 -0.696   0.256    0.95 mode   hdi      
# 3 speed.nw_high - accuracy.high    -0.946 -1.46   -0.526    0.95 mode   hdi      
# 4 speed.high - accuracy.nw_high     0.488  0.0879  0.876    0.95 mode   hdi      
# 5 speed.nw_high - accuracy.nw_high -0.252 -0.550   0.0647   0.95 mode   hdi      
# 6 speed.nw_high - speed.high       -0.741 -1.12   -0.309    0.95 mode   hdi

Differences in Other Parameters

As discussed above, to look at the differences in the other parameters we apparently cannot use emmeans. Luckily, tidybayes still offers the possibility to extract the posterior samples in a tidy way using either gather_draws or spread_draws. It appears that for either of those you need to pass the specific variable names you want to extract. We get them via get_variables:

get_variables(fit_wiener)[1:10]
# [1] "b_conditionaccuracy:frequencyhigh"    "b_conditionspeed:frequencyhigh"      
# [3] "b_conditionaccuracy:frequencynw_high" "b_conditionspeed:frequencynw_high"   
# [5] "b_bs_conditionaccuracy"               "b_bs_conditionspeed"                 
# [7] "b_ndt_conditionaccuracy"              "b_ndt_conditionspeed"                
# [9] "b_bias_conditionaccuracy"             "b_bias_conditionspeed"

Boundary Separation

We will use spread_draws to analyze the boundary separation. First we extract the draws and then immediately calculate the difference distribution between both.

samp_bs <- fit_wiener %>%
  spread_draws(b_bs_conditionaccuracy, b_bs_conditionspeed) %>% 
  mutate(bs_diff = b_bs_conditionaccuracy - b_bs_conditionspeed)
samp_bs
# # A tibble: 2,000 x 6
#    .chain .iteration .draw b_bs_conditionaccuracy b_bs_conditionspeed bs_diff
#     <int>      <int> <int>                  <dbl>               <dbl>   <dbl>
#  1      1          1     1                   1.73                1.48   0.250
#  2      1          2     2                   1.82                1.41   0.411
#  3      1          3     3                   1.80                1.28   0.514
#  4      1          4     4                   1.85                1.42   0.424
#  5      1          5     5                   1.86                1.37   0.493
#  6      1          6     6                   1.81                1.36   0.450
#  7      1          7     7                   1.67                1.34   0.322
#  8      1          8     8                   1.90                1.47   0.424
#  9      1          9     9                   1.99                1.20   0.790
# 10      1         10    10                   1.76                1.19   0.569
# # ... with 1,990 more rows

Now we can of course use the same tools as above. For example, look at the histogram. Here, I again chose 75 bins.

samp_bs %>% 
  ggplot(aes(bs_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

The histogram reveals pretty convincing evidence for a difference. It appears as if only two samples are below 0. We confirm this suspicion and then calculate the Bayesian p-value. As it turns out, it is also extremely small.

sum(samp_bs$bs_diff < 0)
# [1] 2
mean(samp_bs$bs_diff < 0) *2
# [1] 0.002

All in all we can be pretty confident that manipulating speed versus accuracy conditions affects the boundary separation in the current data set. Exactly as expected.

Non-Decision Time

For assessing differences in the non-decision time, we use gather_draws. One benefit of this function compared to spread_draws is that it makes it easy to obtain the marginal estimates. As already said above, the HPD intervals overlap only very little, suggesting that there is a difference between the conditions. We save the resulting marginal estimates for later in a new data.frame called ndt_mean.

samp_ndt <- fit_wiener %>%
  gather_draws(b_ndt_conditionaccuracy, b_ndt_conditionspeed) 
(ndt_mean <- samp_ndt %>% 
  median_hdi())
# # A tibble: 2 x 7
#   .variable               .value .lower .upper .width .point .interval
#   <chr>                    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 b_ndt_conditionaccuracy  0.323  0.293  0.362   0.95 median hdi      
# 2 b_ndt_conditionspeed     0.262  0.235  0.295   0.95 median hdi

To evaluate the difference, the easiest approach to me seems again to spread the two variables into separate columns and then calculate the difference (i.e., similar to starting with spread_draws in the first place). We can then again plot the resulting difference distribution.

samp_ndt2 <- samp_ndt %>% 
  spread(.variable, .value) %>% 
  mutate(ndt_diff = b_ndt_conditionaccuracy - b_ndt_conditionspeed)  

samp_ndt2 %>% 
  ggplot(aes(ndt_diff)) +
  geom_histogram(bins = 75) +
  geom_vline(xintercept = 0)

As previously speculated, there appears to be strong evidence for a difference. We can further confirm this via the Bayesian p-value:

mean(samp_ndt2$ndt_diff < 0) * 2
# [1] 0.005

So far this looks as if we found another clear difference in parameter estimates due to the manipulation. But this conclusion would be premature. In fact, investigating the non-decision time from the 4-parameter Wiener model estimated in this way is completely misleading. Instead of capturing a meaningful feature of the response time distribution, the non-decision time parameter is only sensitive to very few data points. Specifically, the non-decision time basically only reflects a specific feature of the distribution of minimum response times per participant and per condition or cell for which it is estimated. I will demonstrate this in the following for our example data.

We first need to load the data in the same manner as in the previous posts. We then calculate the minimum RTs per participant and condition.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
min_val <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = min(rt))

To investigate the problem, we want to graphically compare the distribution of minimum RTs with the estimates of the non-decision time. For this, we need to add a condition column with matching condition names to the ndt_mean data.frame created above. Then, we can plot both into the same plot. We also add several summary statistics regarding the distribution of individual minimum RTs. Specifically, the black points show the individual minimum RTs for each of the two conditions; the blue + shows the median and the blue x the mean of the individual minimum RTs; the blue circle shows the midpoint between the largest and smallest value of the minimum RT distributions; the red square shows the point estimate of the non-decision time parameter with corresponding 95% HPD intervals.

ndt_mean$condition <- c("accuracy", "speed")

ggplot(min_val, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  geom_pointrange(data = ndt_mean, 
                  aes(y = .value, ymin = .lower, ymax = .upper), 
                  shape = 15, size = 1, color = "red") +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point")

What this graph rather impressively shows is that the estimate of the non-decision time almost perfectly matches the midpoint between the largest and smallest minimum RT (i.e., the blue circle). Let us put this in perspective by comparing the number of minimum data points (i.e., the number of participants) to the total number of data points.

speed_acc %>% 
  group_by(condition) %>% 
  summarise(n())
# # A tibble: 2 x 2
#   condition `n()`
#   <fct>     <int>
# 1 accuracy   5221
# 2 speed      5241

length(unique(speed_acc$id))
# [1] 17

17 / 5000
# [1] 0.0034

This shows that the non-decision time parameter, one of only four model parameters, is essentially completely determined by less than .5% of the data. If any of these minimum RTs is an outlier (which at least in the accuracy condition seems likely) a single response time can have an immense influence on the parameter estimate. In other words, it can hardly be assumed that with the current implementation the non-decision time parameter reflects an actual latent process. Instead, it simply reflects the midpoint between smallest and largest minimum RT per participant and condition, slightly weighted toward the mass of the distribution of minimum RTs. This parameter estimate should not be used to draw substantive conclusions.

In the present case, this confound does not appear to be too consequential. If only one of the data points in the accuracy condition is an outlier and the other data points are faithful representatives of the leading edge of the response time distribution (which is essentially what the non-decision time is supposed to capture), the current parameter estimates underestimate the true difference. Using a more robust ad-hoc measure of the leading edge, specifically the 10% trimmed mean of the 40 fastest RTs per participant and condition plotted below, further supports this conclusion. This graph also does not contain any clear outliers anymore. For reference, the non-decision time estimates are still included. Nevertheless, having a parameter be essentially driven by very few data points seems completely at odds with the general idea of cognitive modeling and the interpretation of non-decision times obtained with such a model cannot be recommended.

min_val2 <- speed_acc %>% 
  group_by(condition, id) %>% 
  summarise(min = mean(sort(rt)[1:40], trim = 0.1))

ggplot(min_val2, aes(x = condition, y = min)) +
  geom_jitter(width = 0.1) +
  stat_summary(col = "blue", size = 3.5, shape = 3, 
               fun.y = "median", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 4, 
               fun.y = "mean", geom = "point") +
  stat_summary(col = "blue", size = 3.5, shape = 16, 
               fun.y = function(x) (min(x) + max(x))/2, 
               geom = "point") +
  geom_point(data = ndt_mean, aes(y = .value), shape = 15, 
             size = 2, color = "red")

It is important to note that this confound does not hold for all implementations of the diffusion model, but is specific to the 4-parameter Wiener model as implemented here. There are solutions for avoiding this problem, two of which I want to list here. First, one could add across-trial variability in the non-decision time. This variability is often assumed to come from a uniform distribution, which can capture outliers at the leading edge of the response time distribution. Second, instead of fitting only a diffusion model one could assume that some of the responses are contaminants coming from a different process, for example random responses from a uniform distribution ranging from the absolute minimum to maximum RT. Technically, this would constitute a mixture model between the diffusion process and a uniform distribution with either a free or fixed mixture/contamination rate (e.g., ). It should be relatively easy to implement such a mixture model via a custom_family in brms and I hope to find the time to do that at some later point.
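
To make the mixture idea a bit more concrete, here is a minimal sketch of such a contaminant mixture density written directly in R using rtdists::ddiffusion; all names and parameter values below are made up for illustration, z is the absolute starting point as used by rtdists, and the uniform contaminant density is split evenly over the two response boundaries:

# density of a mixture of a Wiener/diffusion process and uniform contaminants
dwiener_mix <- function(rt, response, a, v, t0, z, p_cont, rt_min, rt_max) {
  (1 - p_cont) * rtdists::ddiffusion(rt, response = response,
                                     a = a, v = v, t0 = t0, z = z) +
    p_cont * dunif(rt, min = rt_min, max = rt_max) / 2  # /2: half per boundary
}
# evaluate at a single RT with made-up parameter values
dwiener_mix(rt = 0.6, response = "upper", a = 1.5, v = 2, t0 = 0.3, z = 0.75,
            p_cont = 0.02, rt_min = 0.25, rt_max = 1.5)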

I am of course not the first one to discover this behavior of the 4-parameter Wiener model (see e.g., ). However, this problem seems especially prevalent in a Bayesian setting as the 4-parameter model variant is readily available and model variants appropriately dealing with this problem are not. Some time ago I asked Chris Donkin and Greg Cox what they thought would be the best way to address this issue and the one thing I remember from this discussion was Chris’ remark that, when he uses the 4-parameter Wiener model, he simply ignores the non-decision time parameter. That still seems like the best course of action to me.

I hope there are not too many papers out there that use the 4-parameter model in such a way and interpret differences in the non-decision time parameter. If you know of one, I would be interested to learn about it. Either write me a mail or post it in the comments below.

Starting Point / Bias

Finally, we can take a look at the starting point or bias. We do this again using spread_draws and then plot the resulting difference distribution.

samp_bias <- fit_wiener %>%
  spread_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  mutate(bias_diff = b_bias_conditionaccuracy - b_bias_conditionspeed)
samp_bias %>% 
  ggplot(aes(bias_diff)) +
  geom_histogram(bins = 100) +
  geom_vline(xintercept = 0)

The difference distribution suggests there might be a difference. Consequently, we calculate the Bayesian p-value next. Note that we calculate the difference in the other direction this time so that evidence for a difference is represented by small values.

mean(samp_bias$bias_diff > 0) *2
# [1] 0.046

We get lucky and our Bayesian p-value is just below .05, encouraging us to believe that the difference is real. To round this up, we again take a look at the estimates:

fit_wiener %>%
  gather_draws(b_bias_conditionaccuracy, b_bias_conditionspeed) %>% 
  summarise(hdi = get_hdi(.value, level = 80)) %>% 
  unnest
# # A tibble: 2 x 4
#   .variable                 mode lower upper
#   <chr>                    <dbl> <dbl> <dbl>
# 1 b_bias_conditionaccuracy 0.470 0.457 0.484
# 2 b_bias_conditionspeed    0.498 0.484 0.516

Together with the evidence for a difference we can now postulate in a more confident manner that for the accuracy condition there is a bias toward the lower boundary and the “word” responses, whereas evidence accumulation starts unbiased in the speed condition.

Closing Words

This third part wraps up the most important steps of a diffusion model analysis with brms. Part I shows how to set up the model, Part II shows how to evaluate the adequacy of the model, and the present Part III shows how to inspect the parameters and test hypotheses about them.

As I have mentioned quite a bit throughout these parts, the model used here is not the full diffusion model, but the 4-parameter Wiener model. Whereas this makes estimation possible in the first place, it comes with a few problems. One of them was discussed at length in the present part. The estimate of the non-decision time parameter essentially captures a feature of the distribution of minimum RTs. If these are contaminated by responses that cannot be assumed to come from the same process as the other responses (which I believe a priori to be quite likely), the estimate becomes rather meaningless. My take away from this is that I would not interpret these estimates at all. I feel that the dangers outweigh the benefits by far.

Another feature of the 4-parameter Wiener model is that, in the absence of a bias for any of the response options, it predicts equal mean response times for correct and error responses. This is perhaps the main theoretical constraint which has led to the development of many of the more highly parameterized model variants, such as the full (i.e., 7-parameter) diffusion model. An overview of this issue can, for example, be found in . They write (p. 335):

Depending on the experimental manipulation, RTs for errors are sometimes shorter than RTs for correct responses, sometimes longer, and sometimes there is a crossover in which errors are slower than correct responses when accuracy is low and faster than correct responses when accuracy is high. The models must be capable of capturing all these aspects of a data set.

For the present data we find a specific pattern that is often seen as typical. As shown below, error RTs are quite a bit slower than correct RTs in the accuracy condition. This effect cannot be found in the speed condition where, if anything, error RTs are faster than correct RTs.

speed_acc %>% 
  mutate(correct = stim_cat == response) %>% 
  group_by(condition, correct, id) %>% 
  summarise(mean = mean(rt), 
            se = mean(rt)/sqrt(n())) %>% 
  summarise(mean = mean(mean),
            se = mean(se))
# # A tibble: 4 x 4
# # Groups:   condition [?]
#   condition correct  mean     se
#   <fct>     <lgl>   <dbl>  <dbl>
# 1 accuracy  FALSE   0.751 0.339 
# 2 accuracy  TRUE    0.693 0.0409
# 3 speed     FALSE   0.491 0.103 
# 4 speed     TRUE    0.513 0.0314

Given this difference in the relative speeds of correct and error responses in the accuracy condition, it may seem unsurprising that the accuracy condition is also the one in which we have a measurable bias, specifically a bias toward the word responses. However, as can be seen by adding stim_cat to the group_by call above, the difference in the relative speed of errors is particularly strong for non-words, where a bias toward words should lead to faster errors. Thus, it appears that some of the more subtle effects in the data are not fully accounted for in the current model variant.

The canonical way of dealing with differences in the relative speed of errors in diffusion modeling is via across-trial variabilities in the model parameters (see ). Variability in the starting point (introduced by Laming, 1968) allows error RTs to be faster than correct RTs. Variability in the drift rate (introduced by ) allows error RTs to be slower than correct RTs. (As discussed above, variability in the non-decision time allows its parameter estimates to be less influenced by contaminants or individual outliers.) However, as described below, introducing these variabilities in a Bayesian framework comes with its own problems. Furthermore, there is a recent discussion of the value of these variabilities from a measurement standpoint.

Possible Future Extensions

Whereas this series comes to an end here, there are a few further things that seem either important, interesting, or viable. Maybe I will have some time in the future to talk about these as well, but I suggest not expecting them soon.

  • One important thing we have not yet looked at is the estimates of the group-level parameters (i.e., standard deviations and correlations). They may contain important information about the specific data set and research question, but also about the tradeoffs of the model parameters.

  • Replacing the pure Wiener process with a mixture between a Wiener and a uniform distribution to be able to interpret the non-decision time. As written above, this should be doable with a custom_family in brms.

  • As described above, one of the driving forces for modern response time models, such as the 7-parameter diffusion model, were differences in the relative speed of error and correct RTs. These are usually explained via variabilities in the model parameters. One relatively straightforward way to implement these variabilities in a Bayesian setting would be via the hierarchical structure. For example, each participant gets a by-trial random intercept for the drift rate, + (0+id||trial) (the double-bar notation should ensure that these are uncorrelated across participants). Whereas this sounds conceptually simple, I doubt such a model would converge in a reasonable timeframe. Furthermore, as shown by , a model in which the shape of the variability distribution is essentially unconstrained (as is the case when only constraining it via the prior, as suggested here) is not testable. The model becomes unfalsifiable as it can predict any data pattern. Given the importance of this approach from a theoretical point of view, it nevertheless seems to be an extremely important angle to explore.

  • Fitting the Wiener model takes quite a lot of time. It would be interesting to compare the fit using full Bayesian inference (i.e., sampling as done here) with variational Bayes (i.e., parametric approximation of the posterior), which is also implemented in Stan. I expect that it does not work that well, but the comparison would still be interesting. Recently, diagnostics for variational Bayes were introduced.

  • The diffusion model is of course only one model for response time data. A popular alternative is the LBA. I know there are some implementations in Stan out there, so if they could be accessed via brms, this would be quite interesting.

The RMarkdown file for this post is available here.

Diffusion/Wiener Model Analysis with brms – Part II: Model Diagnostics and Model Fit

This is the considerably belated second part of my blog series on fitting diffusion models (or better, the 4-parameter Wiener model) with brms. The first part discusses how to set up the data and model. This second part is concerned with perhaps the most important steps in each model-based data analysis: model diagnostics and the assessment of model fit. Note, the code in this part is completely self-sufficient and can be run without running the code of part I.

Setup

First, we load quite a few packages that we will need along the way. Obviously brms, but also some of the packages from the tidyverse (i.e., dplyr, tidyr, tibble, and ggplot2). It took me a little time to jump on the tidyverse bandwagon, but now that I use it more and more I cannot deny its utility. If your data can be made ‘tidy’, the coherent set of tools offered by the tidyverse makes many seemingly complicated tasks pretty easy. A few examples of this will be shown below. If you need more of an introduction, I highly recommend the awesome ‘R for Data Science’ book by Grolemund and Wickham, which they made available for free! We also need gridExtra for combining plots and DescTools for the concordance correlation coefficient CCC used below.

library("brms")
library("dplyr")
library("tidyr")
library("tibble")    # for rownames_to_column
library("ggplot2")
library("gridExtra") # for grid.arrange
library("DescTools") # for CCC

As in part I, we need package rtdists for the data.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1

I have uploaded the binary R data files containing the fitted model object as well as the generated posterior predictive distributions to GitHub, from where we can download them directly into R. Note that I needed to go via a temporary folder; if there is a way without that, I would be happy to learn about it.

tmp <- tempdir()
download.file("https://singmann.github.io/files/brms_wiener_example_fit.rda", 
              file.path(tmp, "brms_wiener_example_fit.rda"))
download.file("https://singmann.github.io/files/brms_wiener_example_predictions.rda", 
              file.path(tmp, "brms_wiener_example_predictions.rda"))
load(file.path(tmp, "brms_wiener_example_fit.rda"))
load(file.path(tmp, "brms_wiener_example_predictions.rda"))

Model Diagnostics

We already know from part I that there are a few divergent transitions. If this were a real analysis, we would therefore not be satisfied with the current fit and would try to rerun brm with an increased adapt_delta in the hope that this removes the divergent transitions. The Stan warning guidelines clearly state that “the validity of the estimates is not guaranteed if there are post-warmup divergences”. However, it is unclear what the actual impact of the small number of divergent transitions (< 10) observed here is on the posterior. Also, it is unclear what one can do if adapt_delta cannot be increased any further and the model also cannot be reparameterized. Should all fits with any divergent transitions be completely disregarded? I hope the Stan team provides more guidelines on such questions in the future.
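
For reference, increasing adapt_delta does not require setting up the model again; it can be done via update(), which should reuse the already compiled model (shown here only as a sketch, it was not actually run for this post):

# not run: refit with a larger adapt_delta to try to get rid of the divergences
fit_wiener2 <- update(fit_wiener, control = list(adapt_delta = 0.99))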

Coming back to our fit, as a first step in our model diagnostics we check the R-hat statistic as well as the number of effective samples. Specifically, we look at the parameters with the highest R-hat and the lowest number of effective samples.

tail(sort(rstan::summary(fit_wiener$fit)$summary[,"Rhat"]))
#                      sd_id__conditionaccuracy:frequencyhigh 
#                                                        1.00 
#                              r_id__bs[15,conditionaccuracy] 
#                                                        1.00 
#                                    b_bias_conditionaccuracy 
#                                                        1.00 
# cor_id__conditionspeed:frequencyhigh__ndt_conditionaccuracy 
#                                                        1.00 
#                                   sd_id__ndt_conditionspeed 
#                                                        1.00 
#  cor_id__conditionspeed:frequencynw_high__bs_conditionspeed 
#                                                        1.01 
head(sort(rstan::summary(fit_wiener$fit)$summary[,"n_eff"]))
#                                     lp__ 
#                                      462 
#        b_conditionaccuracy:frequencyhigh 
#                                      588 
#                sd_id__ndt_conditionspeed 
#                                      601 
#      sd_id__conditionspeed:frequencyhigh 
#                                      646 
#           b_conditionspeed:frequencyhigh 
#                                      695 
# r_id[12,conditionaccuracy:frequencyhigh] 
#                                      712

Both are unproblematic (i.e., R-hat < 1.05 and n_eff > 100) and suggest that the sampler has converged on the stationary distribution. If anyone has a similar one-liner to return the number of divergent transitions, I would be happy to learn about it.
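
One possibility (an untested sketch based on rstan::get_sampler_params, not something from the original post) would be to sum the divergent__ indicator over all post-warmup iterations and chains:

sum(sapply(rstan::get_sampler_params(fit_wiener$fit, inc_warmup = FALSE),
           function(x) sum(x[, "divergent__"])))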

We also visually inspect the chain behavior of a few semi-randomly selected parameters.

pars <- parnames(fit_wiener)
pars_sel <- c(sample(pars[1:10], 3), sample(pars[-(1:10)], 3))
plot(fit_wiener, pars = pars_sel, N = 6, 
     ask = FALSE, exact_match = TRUE, newpage = TRUE, plot = TRUE)

This visual inspection confirms the earlier conclusion. For all parameters the posteriors look well-behaved and the chains appear to mix well.

Finally, in the literature there are some discussions about parameter trade-offs for the diffusion and related models. These trade-offs supposedly make fitting the diffusion model in a Bayesian setting particularly complicated. To investigate whether fitting the Wiener model with HMC as implemented in Stan (i.e., NUTS) also shows this pattern we take a look at the joint posterior of the fixed-effects of the main Wiener parameters for the accuracy condition. For this we use the stanfit method of the pairs function and set the condition to "divergent__". This plots the few divergent transitions above the diagonal and the remaining samples below the diagonal.

pairs(fit_wiener$fit, pars = pars[c(1, 3, 5, 7, 9)], condition = "divergent__")

This plot shows some correlations, but nothing too dramatic. HMC appears to sample quite efficiently from the Wiener model.

Next we also take a look at the correlations across all parameters (not only the fixed effects).

posterior <- as.mcmc(fit_wiener, combine_chains = TRUE)
cor_posterior <- cor(posterior)
cor_posterior[lower.tri(cor_posterior, diag = TRUE)] <- NA
cor_long <- as.data.frame(as.table(cor_posterior))
cor_long <- na.omit(cor_long)
tail(cor_long[order(abs(cor_long$Freq)),], 10)
#                              Var1                         Var2   Freq
# 43432        b_ndt_conditionspeed  r_id__ndt[1,conditionspeed] -0.980
# 45972 r_id__ndt[4,conditionspeed] r_id__ndt[11,conditionspeed]  0.982
# 46972        b_ndt_conditionspeed r_id__ndt[16,conditionspeed] -0.982
# 44612        b_ndt_conditionspeed  r_id__ndt[6,conditionspeed] -0.983
# 46264        b_ndt_conditionspeed r_id__ndt[13,conditionspeed] -0.983
# 45320        b_ndt_conditionspeed  r_id__ndt[9,conditionspeed] -0.984
# 45556        b_ndt_conditionspeed r_id__ndt[10,conditionspeed] -0.985
# 46736        b_ndt_conditionspeed r_id__ndt[15,conditionspeed] -0.985
# 44140        b_ndt_conditionspeed  r_id__ndt[4,conditionspeed] -0.990
# 45792        b_ndt_conditionspeed r_id__ndt[11,conditionspeed] -0.991

This table lists the ten largest absolute correlations among the posteriors for all pairwise combinations of parameters. The value in column Freq, somewhat unintuitively, is the observed correlation between the posteriors of the two parameters listed in the previous two columns. To create this table I used a trick from Stack Overflow based on as.table, which is responsible for labeling the column containing the correlation value Freq.

What the table shows is some extreme correlations for the individual-level deviations (the first index in the square brackets of the parameter names seems to be the participant number). Let us visualize these correlations as well.

pairs(fit_wiener$fit, pars = 
        c("b_ndt_conditionspeed", 
          "r_id__ndt[11,conditionspeed]",
          "r_id__ndt[4,conditionspeed]"), 
      condition = "divergent__")

This plot shows that some of the individual-level parameters are not well estimated.

However, overall these extreme correlations appear rather rarely.

hist(cor_long$Freq, breaks = 40)

Overall, the model diagnostics do not show any particularly worrying behavior (with the exception of the divergent transitions). We have learned that a few of the individual-level estimates for some of the parameters are not very trustworthy. However, this does not disqualify the overall fit. The main takeaway is that we would need to be careful in interpreting the individual-level estimates. Thus, we assume the fit is okay and continue with the next step of the analysis.

Assessing Model Fit

We will now investigate the model fit. That is, we will investigate whether the model provides an adequate description of the observed data, mostly via graphical checks. For this, we need to prepare the posterior predictive distribution and the data. As a first step, we combine the posterior predictive distributions with the data.

d_speed_acc <- as_tibble(cbind(speed_acc, as_tibble(t(pred_wiener))))

Then we calculate three important measures (or test statistics T()) on the individual level for each cell of the design (i.e., combination of condition and frequency factors):

  • Probability of giving an upper boundary response (i.e., responding “nonword”).
  • Median RT of responses at the upper boundary.
  • Median RT of responses at the lower boundary.

We first calculate this for each sample of the posterior predictive distribution. We then summarize these three measures by calculating the median and some additional quantiles across the posterior predictive distribution. We calculate all of this in one step using a somewhat long combination of dplyr and tidyr magic.

d_speed_acc_agg <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(prob.upper = mean(. > 0),
                    medrt.lower = median(abs(.[. < 0]) ),
                    medrt.upper = median(.[. > 0] )
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "measure"), sep = "_") %>% 
  spread(measure, value) %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise_at(.vars = vars(prob.upper, medrt.lower, medrt.upper), 
               .funs = funs(median = median(., na.rm = TRUE),
                            llll = quantile(., probs = 0.01,na.rm = TRUE),
                            lll = quantile(., probs = 0.025,na.rm = TRUE),
                            ll = quantile(., probs = 0.1,na.rm = TRUE),
                            l = quantile(., probs = 0.25,na.rm = TRUE),
                            h = quantile(., probs = 0.75,na.rm = TRUE),
                            hh = quantile(., probs = 0.9,na.rm = TRUE),
                            hhh = quantile(., probs = 0.975,na.rm = TRUE),
                            hhhh = quantile(., probs = 0.99,na.rm = TRUE)
               ))

Next, we calculate the three measures also for the data and combine them with the results from the posterior predictive distribution in one data.frame using left_join.

speed_acc_agg <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(prob.upper = mean(response == "nonword"),
            medrt.upper = median(rt[response == "nonword"]),
            medrt.lower = median(rt[response == "word"])
  ) %>% 
  ungroup %>% 
  left_join(d_speed_acc_agg)

Aggregated Model-Fit

The first important question is whether our model can adequately describe the overall patterns in the data aggregated across participants. For this we simply aggregate the results obtained in the previous step (i.e., the summary results from the posterior predictive distribution as well as the test statistics from the data) using mean.

d_speed_acc_agg2 <- speed_acc_agg %>% 
  group_by(condition, frequency) %>% 
  summarise_if(is.numeric, mean, na.rm = TRUE) %>% 
  ungroup

We then use these summaries and plot predictions (in grey and black) as well as data (in red) for the three measures. The inner (fat) error bars show the 80% credibility intervals (CIs), the outer (thin) error bars show the 95% CIs. The black circle shows the median of the posterior predictive distributions.

new_x <- with(d_speed_acc_agg2, 
              paste(rep(levels(condition), each = 2), 
                    levels(frequency), sep = "\n"))

p1 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  ggtitle("Response Probabilities") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

p2 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.upper_lll, ymax =  medrt.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.upper_ll, ymax =  medrt.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.upper_median), shape = 1) +
  geom_point(aes(y = medrt.upper), shape = 4, col = "red") +
  ggtitle("Median RTs upper") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

p3 <- ggplot(d_speed_acc_agg2, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  medrt.lower_lll, ymax =  medrt.lower_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  medrt.lower_ll, ymax =  medrt.lower_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = medrt.lower_median), shape = 1) +
  geom_point(aes(y = medrt.lower), shape = 4, col = "red") +
  ggtitle("Median RTs lower") + 
  ylab("RT (s)") + xlab("") +
  scale_x_discrete(labels = new_x)

grid.arrange(p1, p2, p3, ncol = 2)

 

Inspection of the plots shows no dramatic misfit. Overall, the model appears able to describe the general patterns in the data. Only the response probabilities for words (i.e., frequency = high) appear to be estimated too low. The red x's appear to be outside the 80% CIs and possibly also outside the 95% CIs.

The plots of the RTs show an interesting (but not surprising) pattern. The posterior predictive distributions for the rare responses (i.e., “word” responses to non-word stimuli and “nonword” responses to word stimuli) are relatively wide. In contrast, the posterior predictive distributions for the common responses are relatively narrow. In each case, the observed median is inside the 80% CI and also quite close to the predicted median.

Individual-Level Fit

To investigate the pattern of predicted response probabilities further, we take a look at them on the individual level. We again plot the response probabilities in the same way as above, but separated by participant id.

ggplot(speed_acc_agg, aes(x = condition:frequency)) +
  geom_linerange(aes(ymin =  prob.upper_lll, ymax =  prob.upper_hhh), 
                 col = "darkgrey") + 
  geom_linerange(aes(ymin =  prob.upper_ll, ymax =  prob.upper_hh), 
                 size = 2, col = "grey") + 
  geom_point(aes(y = prob.upper_median), shape = 1) +
  geom_point(aes(y = prob.upper), shape = 4, col = "red") +
  facet_wrap(~id, ncol = 3) +
  ggtitle("Prediced (in grey) and observed (red) response probabilities by ID") + 
  ylab("Probability of upper resonse") + xlab("") +
  scale_x_discrete(labels = new_x)

This plot shows a similar pattern as the aggregated data. For none of the participants do we observe dramatic misfit. Furthermore, response probabilities to non-word stimuli appear to be predicted rather well. In contrast, response probabilities for word-stimuli are overall predicted to be lower than observed. However, this misfit does not seem to be too strong.

As a next step we look at the coverage probabilities of our three measures across individuals. That is, we calculate for each of the measures, for each of the cells of the design, and for each of the CIs (i.e., 50%, 80%, 95%, and 99%), the proportion of participants for which the observed test statistics falls into the corresponding CI.

speed_acc_agg %>% 
  mutate(prob.upper_99 = (prob.upper >= prob.upper_llll) & 
           (prob.upper <= prob.upper_hhhh),
         prob.upper_95 = (prob.upper >= prob.upper_lll) & 
           (prob.upper <= prob.upper_hhh),
         prob.upper_80 = (prob.upper >= prob.upper_ll) & 
           (prob.upper <= prob.upper_hh),
         prob.upper_50 = (prob.upper >= prob.upper_l) & 
           (prob.upper <= prob.upper_h),
         medrt.upper_99 = (medrt.upper > medrt.upper_llll) & 
           (medrt.upper < medrt.upper_hhhh),
         medrt.upper_95 = (medrt.upper > medrt.upper_lll) & 
           (medrt.upper < medrt.upper_hhh),
         medrt.upper_80 = (medrt.upper > medrt.upper_ll) & 
           (medrt.upper < medrt.upper_hh),
         medrt.upper_50 = (medrt.upper > medrt.upper_l) & 
           (medrt.upper < medrt.upper_h),
         medrt.lower_99 = (medrt.lower > medrt.lower_llll) & 
           (medrt.lower < medrt.lower_hhhh),
         medrt.lower_95 = (medrt.lower > medrt.lower_lll) & 
           (medrt.lower < medrt.lower_hhh),
         medrt.lower_80 = (medrt.lower > medrt.lower_ll) & 
           (medrt.lower < medrt.lower_hh),
         medrt.lower_50 = (medrt.lower > medrt.lower_l) & 
           (medrt.lower < medrt.lower_h)
  ) %>% 
  group_by(condition, frequency) %>% ## grouping factors without id
  summarise_at(vars(matches("\\d")), mean, na.rm = TRUE) %>% 
  gather("key", "mean", -condition, -frequency) %>% 
  separate("key", c("measure", "ci"), "_") %>% 
  spread(ci, mean) %>% 
  as.data.frame()
#    condition frequency     measure    50     80    95    99
# 1   accuracy      high medrt.lower 0.706 0.8824 0.882 1.000
# 2   accuracy      high medrt.upper 0.500 0.8333 1.000 1.000
# 3   accuracy      high  prob.upper 0.529 0.7059 0.765 0.882
# 4   accuracy   nw_high medrt.lower 0.500 0.8125 0.938 0.938
# 5   accuracy   nw_high medrt.upper 0.529 0.8235 1.000 1.000
# 6   accuracy   nw_high  prob.upper 0.529 0.8235 0.941 0.941
# 7      speed      high medrt.lower 0.471 0.8824 0.941 1.000
# 8      speed      high medrt.upper 0.706 0.9412 1.000 1.000
# 9      speed      high  prob.upper 0.000 0.0588 0.588 0.647
# 10     speed   nw_high medrt.lower 0.706 0.8824 0.941 0.941
# 11     speed   nw_high medrt.upper 0.471 0.7647 1.000 1.000
# 12     speed   nw_high  prob.upper 0.235 0.6471 0.941 1.000

As can be seen, for the RTs, the coverage probability is generally in line with the width of the CIs or even above it. Furthermore, for the common response (i.e., upper for frequency = nw_high and lower for frequency = high), the coverage probability is 1 for the 99% CIs in all cases.

Unfortunately, for the response probabilities the coverage is not that great, especially in the speed condition and for tighter CIs. However, for the wide CIs the coverage probabilities are at least acceptable. Overall, the results so far suggest that the model provides an adequate account. There are some misfits that should be kept in mind if one is interested in extending the model or fitting it to new data, but overall it provides a satisfactory account.

QQ-plots: RTs

The final approach for assessing the fit of the model will be based on more quantiles of the RT distribution (so far we only looked at the .5 quantile, the median). We will then plot individual observed versus predicted (i.e., mean of the posterior predictive distribution) quantiles across conditions. For this we first calculate the quantiles per sample from the posterior predictive distribution and then aggregate across the samples. This is achieved via dplyr::summarise_at using a list column and tidyr::unnest to unstack the columns (see section 25.3 in “R for Data Science”). We then combine the aggregated predicted RT quantiles with the observed RT quantiles.

quantiles <- c(0.1, 0.25, 0.5, 0.75, 0.9)

pp2 <- d_speed_acc %>% 
  group_by(id, condition, frequency) %>%  # select grouping vars
  summarise_at(.vars = vars(starts_with("V")), 
               funs(lower = list(rownames_to_column(
                 data.frame(q = quantile(abs(.[. < 0]), probs = quantiles)))),
                    upper = list(rownames_to_column(
                      data.frame(q = quantile(.[. > 0], probs = quantiles ))))
               )) %>% 
  ungroup %>% 
  gather("key", "value", -id, -condition, -frequency) %>% # remove grouping vars
  separate("key", c("rep", "boundary"), sep = "_") %>% 
  unnest(value) %>% 
  group_by(id, condition, frequency, boundary, rowname) %>% # grouping vars + new vars
  summarise(predicted = mean(q, na.rm = TRUE))

rt_pp <- speed_acc %>% 
  group_by(id, condition, frequency) %>% # select grouping vars
  summarise(lower = list(rownames_to_column(
    data.frame(observed = quantile(rt[response == "word"], probs = quantiles)))),
    upper = list(rownames_to_column(
      data.frame(observed = quantile(rt[response == "nonword"], probs = quantiles ))))
  ) %>% 
  ungroup %>% 
  gather("boundary", "value", -id, -condition, -frequency) %>%
  unnest(value) %>% 
  left_join(pp2)

To evaluate the agreement between observed and predicted quantiles we calculate, for each cell and quantile, the concordance correlation coefficient (CCC; e.g., Barchard, 2012, Psych. Methods). The CCC is a measure of absolute agreement between two values and thus better suited than a simple correlation. It is scaled from -1 to 1, where 1 represents perfect agreement, 0 no relationship, and -1 a correlation of -1 with the same mean and variance of the two variables.
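
As a quick illustration of the difference between the CCC and an ordinary correlation, consider the following toy example (hypothetical data, not part of the original analysis): a constant shift between two variables leaves the Pearson correlation at 1 but lowers the CCC.

x <- 1:10
cor(x, x + 2)                        # Pearson correlation: exactly 1
DescTools::CCC(x, x + 2)$rho.c$est   # CCC below 1, the shift is penalized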

The following code produces QQ-plots for each condition and quantile, separately for responses at the upper and lower boundary. The value in the upper left of each plot gives the CCC measure of absolute agreement.

plot_text <- rt_pp %>% 
  group_by(condition, frequency, rowname, boundary) %>% 
  summarise(ccc = format(
    CCC(observed, predicted, na.rm = TRUE)$rho.c$est, 
    digits = 2))

p_upper <- rt_pp %>% 
  filter(boundary == "upper") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "upper", ],
            aes(x = 0.5, y = 1.8, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Upper responses") +
  theme_bw()

p_lower <- rt_pp %>% 
  filter(boundary == "lower") %>% 
  ggplot(aes(x = observed, predicted)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  facet_grid(condition+frequency~ rowname) + 
  geom_text(data=plot_text[ plot_text$boundary == "lower", ],
            aes(x = 0.5, y = 1.6, label=ccc), 
            parse = TRUE, inherit.aes=FALSE) +
  coord_fixed() +
  ggtitle("Lower responses") +
  theme_bw()

grid.arrange(p_upper, p_lower, ncol = 1)

Results show that overall the fit is better for the accuracy than the speed conditions. Furthermore, fit is better for the common response (i.e., nw_high for upper and high for lower responses). This latter observation is again not too surprising.

When comparing the fit for the different quantiles it appears that at least the median (i.e., 50%) shows acceptable values for the common response. However, especially in the speed condition the account of the other quantiles is not great. Nevertheless, dramatic misfit is only observed for the rare responses.

One explanation for some of the low CCCs in the speed conditions may be the comparatively low variances in some of the cells. For example, for both common responses in the speed condition (i.e., speed & nw_high for upper responses and speed & high for lower responses) visual inspection of the plot suggests an acceptable account while at the same time some CCC values are low (i.e., < .5). Only for the 90% quantile in the speed conditions (and somewhat less so the 75% quantile) do we see some systematic deviations: the model predicts slower RTs than observed.

Taken together, the model appears to provide an at least acceptable account. The only slightly worrying patterns are (a) that the model predicts slightly better performance for the word stimuli than observed (i.e., a lower predicted rate of non-word responses to word stimuli than observed) and (b) that in the speed conditions the model predicts somewhat longer RTs for the 75% and 90% quantiles than observed.

The next step will be to look at differences between parameters as a function of the speed-accuracy condition. This is the topic of the third blog post. I am hopeful it will not take two months this time.

 

Diffusion/Wiener Model Analysis with brms – Part I: Introduction and Estimation

Stan is probably the most interesting development in computational statistics in the last few years, at least for me. The version of Hamiltonian Monte-Carlo (HMC) implemented in Stan (NUTS) is extremely efficient, and the range of probability distributions implemented in the Stan language makes it possible to fit an extremely wide range of models. Stan has considerably changed which models I think can be realistically estimated, both in terms of model complexity and data size. It is not an overstatement to say that Stan (and particularly rstan) has considerably changed the way I analyze data.

One of the R packages that allows one to implement Stan models in a very convenient manner, and which has created a lot of buzz recently, is brms. It allows one to specify a wide range of models using the R formula interface. Based on the formula and a specification of the model family, it generates the model code, compiles it, and then passes it together with the data to rstan for sampling. Because I usually program my models by hand (thanks to the great Stan documentation), I have so far stayed away from brms.

However, I recently learned that brms also allows the estimation of the Wiener model (i.e., the 4-parameter diffusion model) for simultaneously accounting for responses and the corresponding response times in data from two-choice tasks. Such data are quite common in psychology, and the diffusion model is one of the more popular cognitive models out there. In a series of (probably 3) posts I provide an example of applying the Wiener model to some published data using brms. This first part shows how to set up and estimate the model. The second part gives an overview of model diagnostics and an assessment of model fit via posterior predictive distributions. The third part shows how to inspect and compare the posterior distributions of the parameters.

In addition to brms and a working C++ compiler, this first part also needs package RWiener for generating the posterior predictive distribution within brms and package rtdists for the data.

library("brms")

Data and Model

A graphical illustration of the Wiener diffusion model for two-choice reaction times. An evidence counter starts at value `alpha`*`beta` and evolves with random increments. The mean increment is `delta`. The process terminates as soon as the accrued evidence exceeds `alpha` or deceeds 0. The decision process starts at time `tau` from the stimulus presentation and terminates at the reaction time. [This figure and caption are taken from Wabersich and Vandekerckhove (2014, The R Journal, CC-BY license).]

I expect the reader to already be familiar with the Wiener model and will only provide a very brief introduction here. The Wiener model is a continuous-time evidence accumulation model for binary choice tasks. It assumes that in each trial evidence is accumulated in a noisy (diffusion) process by a single accumulator. Evidence accumulation starts at the start point and continues until the accumulator hits one of the two decision bounds, in which case the corresponding response is given. The total response time is the sum of the decision time from the accumulation process plus non-decisional components. In sum, the Wiener model allows one to decompose responses in a binary choice task and the corresponding response times into four latent processes (a short simulation sketch follows the list below):

  • The drift rate (delta) is the average slope of the accumulation process towards the boundaries. The larger the (absolute value of the) drift rate, the stronger the evidence for the corresponding response option.
  • The boundary separation (alpha) is the distance between the two decision bounds and interpreted as a measure of response caution.
  • The starting point (beta) of the accumulation process is a measure of response bias towards one of the two response boundaries.
  • The non-decision time (tau) captures all non-decisional process such as stimulus encoding and response processes.
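
To get a feel for these four parameters, one can simulate data from the Wiener model, for example with the rwiener function from the RWiener package. The parameter values below are made up purely for illustration and are not related to the analysis that follows:

library("RWiener")
set.seed(1)
# 100 trials: boundary separation 1.5, non-decision time 0.3 s,
# unbiased start point (0.5), and a positive drift towards the upper boundary
sim <- rwiener(n = 100, alpha = 1.5, tau = 0.3, beta = 0.5, delta = 1)
head(sim)  # columns q (RT in seconds) and resp ("upper"/"lower")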

We will analyze part of the data from Experiment 1 of Wagenmakers, Ratcliff, Gomez, and McKoon (2008). The data come from 17 participants performing a lexical decision task in which they have to decide whether a presented string is a word or non-word. Participants made decisions either under speed or accuracy emphasis instructions in different experimental blocks. This data set comes with the rtdists package (which provides the PDF, CDF, and RNG for the full 7-parameter diffusion model). After removing some extreme RTs, we restrict the analysis to high-frequency words (frequency = high) and the corresponding high-frequency non-words (frequency = nw_high) to reduce estimation time. To set up the model we also need a numeric response variable in which 0 corresponds to responses at the lower response boundary and 1 corresponds to responses at the upper boundary. For this we transform the categorical response variable response to numeric and subtract 1, such that a word response corresponds to the lower response boundary and a nonword response to the upper boundary.

data(speed_acc, package = "rtdists")
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) # remove extreme RTs
speed_acc <- droplevels(speed_acc[ speed_acc$frequency %in% 
                                     c("high", "nw_high"),])
speed_acc$response2 <- as.numeric(speed_acc$response)-1
str(speed_acc)
'data.frame':    10462 obs. of  10 variables:
 $ id       : Factor w/ 17 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ block    : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ condition: Factor w/ 2 levels "accuracy","speed": 2 2 2 2 2 2 2 2 2 2 ...
 $ stim     : Factor w/ 1611 levels "1001","1002",..: 1271 46 110 666 422 ...
 $ stim_cat : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 2 1 1 2 ...
 $ frequency: Factor w/ 2 levels "high","nw_high": 2 1 1 1 1 1 2 1 1 2 ...
 $ response : Factor w/ 2 levels "word","nonword": 2 1 1 1 1 1 1 1 1 1 ...
 $ rt       : num  0.773 0.39 0.435 0.427 0.622 0.441 0.308 0.436 0.412 ...
 $ censor   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ response2: num  1 0 0 0 0 0 0 0 0 0 ...

Model Formula

The important decision that has to be made before setting up a model is which parameters are allowed to differ between which conditions (i.e., factor levels). One common constraint of the Wiener model (and other evidence-accumulation models) is that the parameters that are set before the evidence accumulation process starts (i.e., boundary separation, starting point, and non-decision time) cannot change based on stimulus characteristics that are not known to the participant before the start of the trial. Thus, the item-type, in the present case word versus non-word, is usually only allowed to affect the drift rate. We follow this constraint. Furthermore, all four parameters are allowed to vary between speed and accuracy condition as this is manipulated between blocks of trials. Also note that all relevant variables are manipulated within-subjects. Thus, the maximal random-effects structure entails corresponding random-effects parameters for each fixed-effect. To set up the model we need to invoke the bf() function and construct one formula for each of the four parameters of the Wiener model.

formula <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                (0 + condition:frequency|p|id), 
               bs ~ 0 + condition + (0 + condition|p|id), 
               ndt ~ 0 + condition + (0 + condition|p|id),
               bias ~ 0 + condition + (0 + condition|p|id))

The first formula is for the drift rate and is also used for specifying the column containing the RTs (rt) and the response or decision (response2) on the left hand side. On the right hand side one can specify fixed effects as well as random effects in a way similar to lme4. The drift rate is allowed to vary with both variables, condition and frequency (stim_cat would be equivalent), so we estimate fixed effects as well as random effects for both factors as well as their interaction.

We then also need to set up one formula for each of the other three parameters (which are only allowed to vary by condition). For these formulas, the left hand side denotes the parameter names:

  • bs: boundary separation (alpha)
  • ndt: non-decision time (tau)
  • bias: starting point (beta)

The right hand side again specifies the fixed and random effects. Note that one common approach for setting up evidence accumulation models is to specify that one response boundary represents correct responses and the other represents incorrect responses (in contrast to the current approach in which the response boundaries represent the two actual response options). In such a situation one cannot estimate the starting point and it needs to be fixed to 0.5 (i.e., replace the formula with bias = 0.5).
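
For example, a variant of the model formula with the starting point fixed to 0.5 could look like the following sketch (shown only for illustration; it is not used in this analysis):

formula_fixed_bias <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                           (0 + condition:frequency|p|id), 
                         bs ~ 0 + condition + (0 + condition|p|id), 
                         ndt ~ 0 + condition + (0 + condition|p|id),
                         bias = 0.5)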

Two further points are relevant in the formulas. First, I have used a somewhat uncommon parameterization and suppressed the intercept (e.g., ~ 0 + condition instead of ~ condition). The reason for this is that when an intercept is present, categorical variables (i.e., factors) with k levels are coded with k-1 deviation variables that represent deviations from the intercept. Thus, in a Bayesian setting one needs to consider the choice of prior for these deviation variables. In contrast, when suppressing the intercept the model can be set up such that each factor level (or design cell in case more than one factor is involved) receives its own parameter, as done here. This essentially allows the same prior for each parameter (as long as one does not expect the parameters to vary dramatically). Furthermore, when programming a model oneself this is a common parameterization. To see the differences between the parameterizations compare the following two calls (model.matrix is the function that creates the parameterization internally). Only the first creates a separate parameter for each condition.

unique(model.matrix(~0+condition, speed_acc))
##     conditionaccuracy conditionspeed
## 36                  0              1
## 128                 1              0
unique(model.matrix(~condition, speed_acc))
##     (Intercept) conditionspeed
## 36            1              1
## 128           1              0

Note that when more than one factor is involved and one wants to use this parameterization, one needs to combine the factors using the : and not *. This can be seen when running the code below. Also note that when combining the factors with : without suppressing the intercept, the resulting model has one parameter more than can be estimated (i.e., the model-matrix is rank deficient). So care needs to be taken at this step.

unique(model.matrix(~ 0 + condition:frequency, speed_acc))
unique(model.matrix(~ 0 + condition*frequency, speed_acc))
unique(model.matrix(~ condition:frequency, speed_acc))

Second, brms formulas provide a way to estimate correlations among random-effects parameters of different formulas. To achieve this, one can place an identifier in the middle of the random-effects formula, separated by | on both sides. Correlations among random effects will then be estimated for all random-effects formulas that share the same identifier. In our case, we want to estimate the full random-effects matrix with correlations among all model parameters, following the “latent-trait approach”. We therefore place the same identifier (p) in all formulas. Thus, correlations will be estimated among all individual-level deviations across all four Wiener parameters. To estimate correlations only among the random-effects parameters of each formula, simply omit the identifier (e.g., (0 + condition|id)). Furthermore, note that brms, similar to afex, supports suppressing the correlations among categorical random-effects parameters via || (e.g., (0 + condition||id)).
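
Purely for illustration, a variant of the formula that estimates correlations only within the drift-rate random-effects term (and, for the other three parameters, suppresses them entirely via ||) might look like this sketch; it is not used in this analysis:

formula_uncorrelated <- bf(rt | dec(response2) ~ 0 + condition:frequency + 
                             (0 + condition:frequency | id), 
                           bs ~ 0 + condition + (0 + condition || id), 
                           ndt ~ 0 + condition + (0 + condition || id),
                           bias ~ 0 + condition + (0 + condition || id))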

Family, Link-Functions, and Priors

The next step is to set up the priors. For this we can invoke the get_prior function. This function requires one to specify the formula, the data, as well as the family of the model. family is the argument where we tell brms that we want to use the Wiener model. We also use it to specify the link function for the four Wiener parameters. Because the drift rate can take on any value (i.e., from -Inf to Inf), the default link function is "identity" (i.e., no transformation), which we retain. The other three parameters all have a restricted range. The boundary separation needs to be larger than 0, the non-decision time needs to be larger than 0 and smaller than the smallest RT, and the starting point needs to be between 0 and 1. The default link functions respect these constraints and use "log" for the first two parameters and "logit" for the bias. This certainly is a possibility, but it has a number of drawbacks that lead me to use the "identity" link function for all parameters. First, when parameters are transformed, the priors need to be specified on the untransformed scale. Second, the individual-level deviations (i.e., the random-effects estimates) are assumed to come from a multivariate normal distribution. Parameter transformations would entail that these individual-level deviations are only normally distributed on the untransformed scale. Likewise, the correlations of parameter deviations across parameters would also be on the untransformed scale. Both make the interpretation of the random effects difficult.

When specifying the parameters without transformation (i.e., link = "identity") care must be taken that the priors place most of their mass on values inside the allowed range. Likewise, starting values need to be inside the allowed range. Using the identity link function also comes with drawbacks discussed at the end. However, as long as parameter values outside the allowed range occur only rarely, such a model can converge successfully and it makes the interpretation easier.

The get_prior function returns a data.frame containing all parameters of the model. If parameters have default priors, these are listed as well. One needs to define priors either for individual parameters (coef), parameter classes (class), parameter classes for specific groups (group), or distributional parameters (dpar). Note that all parameters that do not have a default prior should receive a specific prior.

get_prior(formula,
          data = speed_acc, 
          family = wiener(link_bs = "identity", 
                          link_ndt = "identity", 
                          link_bias = "identity"))

[Two empty columns to the right were removed from the following output.]

                 prior class                               coef group resp dpar 
1                          b                                                    
2                          b    conditionaccuracy:frequencyhigh                 
3                          b conditionaccuracy:frequencynw_high                 
4                          b       conditionspeed:frequencyhigh                 
5                          b    conditionspeed:frequencynw_high                 
6               lkj(1)   cor                                                    
7                        cor                                       id           
8  student_t(3, 0, 10)    sd                                                    
9                         sd                                       id           
10                        sd    conditionaccuracy:frequencyhigh    id           
11                        sd conditionaccuracy:frequencynw_high    id           
12                        sd       conditionspeed:frequencyhigh    id           
13                        sd    conditionspeed:frequencynw_high    id           
14                         b                                               bias 
15                         b                  conditionaccuracy            bias 
16                         b                     conditionspeed            bias 
17 student_t(3, 0, 10)    sd                                               bias 
18                        sd                                       id      bias 
19                        sd                  conditionaccuracy    id      bias 
20                        sd                     conditionspeed    id      bias 
21                         b                                                 bs 
22                         b                  conditionaccuracy              bs 
23                         b                     conditionspeed              bs 
24 student_t(3, 0, 10)    sd                                                 bs 
25                        sd                                       id        bs 
26                        sd                  conditionaccuracy    id        bs 
27                        sd                     conditionspeed    id        bs 
28                         b                                                ndt 
29                         b                  conditionaccuracy             ndt 
30                         b                     conditionspeed             ndt 
31 student_t(3, 0, 10)    sd                                                ndt 
32                        sd                                       id       ndt 
33                        sd                  conditionaccuracy    id       ndt 
34                        sd                     conditionspeed    id       ndt

Priors can be defined with the prior or set_prior function, allowing different levels of control. One benefit of the way the model is parameterized is that we only need to specify priors for one set of parameters per Wiener parameter (i.e., class b) and do not have to distinguish between the intercept and other parameters.

We base our choice of the priors on prior knowledge of likely parameter values for the Wiener model, but otherwise try to specify them in a weakly informative manner. That is, they should restrict the range to likely values but not affect the estimation any further. For the drift rate we use a Cauchy distribution with location 0 and scale 5, so that roughly 70% of the prior mass is between -10 and 10. For the boundary separation we use a normal prior with mean 1.5 and standard deviation 1, for the non-decision time a normal prior with mean 0.2 and standard deviation 0.1, and for the bias a normal prior with mean 0.5 (i.e., no bias) and standard deviation 0.2.

prior <- c(
 prior("cauchy(0, 5)", class = "b"),
 set_prior("normal(1.5, 1)", class = "b", dpar = "bs"),
 set_prior("normal(0.2, 0.1)", class = "b", dpar = "ndt"),
 set_prior("normal(0.5, 0.2)", class = "b", dpar = "bias")
)
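
As a quick sanity check (not part of the original post), one can verify numerically that these priors indeed place most of their mass inside the allowed ranges and that the 70% figure for the drift-rate prior holds:

diff(pcauchy(c(-10, 10), location = 0, scale = 5))  # drift prior mass between -10 and 10: ~0.70
pnorm(0, mean = 1.5, sd = 1)   # bs prior mass below 0: ~0.07
pnorm(0, mean = 0.2, sd = 0.1) # ndt prior mass below 0: ~0.02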

With this information we can use the make_stancode function and inspect the full model code. The important thing is to make sure that all parameters listed in the parameters block have a prior listed in the model block. We can also see, at the beginning of the model block, that none of our parameters is transformed, just as desired (a bug in a previous version of brms prevented anything but the default links for the Wiener model parameters).

make_stancode(formula, 
              family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
              data = speed_acc, 
              prior = prior)

 

// generated with brms 1.10.2
functions { 

  /* Wiener diffusion log-PDF for a single response
   * Args: 
   *   y: reaction time data
   *   dec: decision data (0 or 1)
   *   alpha: boundary separation parameter > 0
   *   tau: non-decision time parameter > 0
   *   beta: initial bias parameter in [0, 1]
   *   delta: drift rate parameter
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real wiener_diffusion_lpdf(real y, int dec, real alpha, 
                              real tau, real beta, real delta) { 
     if (dec == 1) {
       return wiener_lpdf(y | alpha, tau, beta, delta);
     } else {
       return wiener_lpdf(y | alpha, tau, 1 - beta, - delta);
     }
   }
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  int<lower=1> K_bs;  // number of population-level effects 
  matrix[N, K_bs] X_bs;  // population-level design matrix 
  int<lower=1> K_ndt;  // number of population-level effects 
  matrix[N, K_ndt] X_ndt;  // population-level design matrix 
  int<lower=1> K_bias;  // number of population-level effects 
  matrix[N, K_bias] X_bias;  // population-level design matrix 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  vector[N] Z_1_2; 
  vector[N] Z_1_3; 
  vector[N] Z_1_4; 
  vector[N] Z_1_bs_5; 
  vector[N] Z_1_bs_6; 
  vector[N] Z_1_ndt_7; 
  vector[N] Z_1_ndt_8; 
  vector[N] Z_1_bias_9; 
  vector[N] Z_1_bias_10; 
  int<lower=1> NC_1; 
  int<lower=0,upper=1> dec[N];  // decisions 
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  real min_Y = min(Y); 
} 
parameters { 
  vector[K] b;  // population-level effects 
  vector[K_bs] b_bs;  // population-level effects 
  vector[K_ndt] b_ndt;  // population-level effects 
  vector[K_bias] b_bias;  // population-level effects 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations 
  matrix[M_1, N_1] z_1;  // unscaled group-level effects 
  // cholesky factor of correlation matrix 
  cholesky_factor_corr[M_1] L_1; 
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)'; 
  vector[N_1] r_1_1 = r_1[, 1]; 
  vector[N_1] r_1_2 = r_1[, 2]; 
  vector[N_1] r_1_3 = r_1[, 3]; 
  vector[N_1] r_1_4 = r_1[, 4]; 
  vector[N_1] r_1_bs_5 = r_1[, 5]; 
  vector[N_1] r_1_bs_6 = r_1[, 6]; 
  vector[N_1] r_1_ndt_7 = r_1[, 7]; 
  vector[N_1] r_1_ndt_8 = r_1[, 8]; 
  vector[N_1] r_1_bias_9 = r_1[, 9]; 
  vector[N_1] r_1_bias_10 = r_1[, 10]; 
} 
model { 
  vector[N] mu = X * b; 
  vector[N] bs = X_bs * b_bs; 
  vector[N] ndt = X_ndt * b_ndt; 
  vector[N] bias = X_bias * b_bias; 
  for (n in 1:N) { 
    mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_1_2[J_1[n]]) * Z_1_2[n] + (r_1_3[J_1[n]]) * Z_1_3[n] + (r_1_4[J_1[n]]) * Z_1_4[n]; 
    bs[n] = bs[n] + (r_1_bs_5[J_1[n]]) * Z_1_bs_5[n] + (r_1_bs_6[J_1[n]]) * Z_1_bs_6[n]; 
    ndt[n] = ndt[n] + (r_1_ndt_7[J_1[n]]) * Z_1_ndt_7[n] + (r_1_ndt_8[J_1[n]]) * Z_1_ndt_8[n]; 
    bias[n] = bias[n] + (r_1_bias_9[J_1[n]]) * Z_1_bias_9[n] + (r_1_bias_10[J_1[n]]) * Z_1_bias_10[n]; 
  } 
  // priors including all constants 
  target += cauchy_lpdf(b | 0, 5); 
  target += normal_lpdf(b_bs | 1.5, 1); 
  target += normal_lpdf(b_ndt | 0.2, 0.1); 
  target += normal_lpdf(b_bias | 0.5, 0.2); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 10 * student_t_lccdf(0 | 3, 0, 10); 
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  target += normal_lpdf(to_vector(z_1) | 0, 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    for (n in 1:N) { 
      target += wiener_diffusion_lpdf(Y[n] | dec[n], bs[n], ndt[n], bias[n], mu[n]); 
    } 
  } 
} 
generated quantities { 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1); 
  vector<lower=-1,upper=1>[NC_1] cor_1; 
  // take only relevant parts of correlation matrix 
  cor_1[1] = Cor_1[1,2]; 
  [...]
  cor_1[45] = Cor_1[9,10]; 
}

[The output was slightly modified.]

The last piece we need, before we can finally estimate the model, is a function that generates initial values. Without initial values that yield a valid (i.e., finite) likelihood for all data points, estimation will not start. The function needs to provide initial values for all parameters listed in the parameters block of the model. Note that many of those parameters have at least one dimension with a parameterized extent (e.g., K). We can use make_standata to create the data set that brms passes to Stan and obtain the necessary dimensions from it. We then use this data object (i.e., a list) for generating correctly sized initial values in the function initfun (note that initfun relies on the fact that tmp_dat is in the global environment, which is something of a code smell).

tmp_dat <- make_standata(formula, 
                         family = wiener(link_bs = "identity", 
                              link_ndt = "identity",
                              link_bias = "identity"),
                            data = speed_acc, prior = prior)
str(tmp_dat, 1, give.attr = FALSE)
## List of 26
##  $ N          : int 10462
##  $ Y          : num [1:10462(1d)] 0.773 0.39 0.435  ...
##  $ K          : int 4
##  $ X          : num [1:10462, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_1      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_2      : num [1:10462(1d)] 0 1 1 1 1 1 0 1 1 0 ...
##  $ Z_1_3      : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_4      : num [1:10462(1d)] 1 0 0 0 0 0 1 0 0 1 ...
##  $ K_bs       : int 2
##  $ X_bs       : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_5   : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bs_6   : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_ndt      : int 2
##  $ X_ndt      : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_7  : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_ndt_8  : num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ K_bias     : int 2
##  $ X_bias     : num [1:10462, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_9 : num [1:10462(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_1_bias_10: num [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ J_1        : int [1:10462(1d)] 1 1 1 1 1 1 1 1 1 1 ...
##  $ N_1        : int 17
##  $ M_1        : int 10
##  $ NC_1       : num 45
##  $ dec        : num [1:10462(1d)] 1 0 0 0 0 0 0 0 0 0 ...
##  $ prior_only : int 0

initfun <- function() {
  list(
    b = rnorm(tmp_dat$K),
    b_bs = runif(tmp_dat$K_bs, 1, 2),
    b_ndt = runif(tmp_dat$K_ndt, 0.1, 0.15),
    b_bias = rnorm(tmp_dat$K_bias, 0.5, 0.1),
    sd_1 = runif(tmp_dat$M_1, 0.5, 1),
    z_1 = matrix(rnorm(tmp_dat$M_1*tmp_dat$N_1, 0, 0.01),
                 tmp_dat$M_1, tmp_dat$N_1),
    L_1 = diag(tmp_dat$M_1)
  )
}
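
As an aside, one way to avoid the reliance on the global tmp_dat (the code smell mentioned above) is to build the init function via a closure. This is just a hypothetical variant of the function above, not what was used in the original analysis:

make_initfun <- function(sdata) {
  # returns a function of no arguments (as expected by brm),
  # with the Stan data bound in the enclosing environment
  function() {
    list(
      b = rnorm(sdata$K),
      b_bs = runif(sdata$K_bs, 1, 2),
      b_ndt = runif(sdata$K_ndt, 0.1, 0.15),
      b_bias = rnorm(sdata$K_bias, 0.5, 0.1),
      sd_1 = runif(sdata$M_1, 0.5, 1),
      z_1 = matrix(rnorm(sdata$M_1 * sdata$N_1, 0, 0.01),
                   sdata$M_1, sdata$N_1),
      L_1 = diag(sdata$M_1)
    )
  }
}
initfun2 <- make_initfun(tmp_dat)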

Estimation (i.e., Sampling)

Finally, we have all the pieces together and can estimate the Wiener model using the brm function. Note that this will take roughly a full day, or even longer depending on the speed of your PC. We also already increase the maximal treedepth to 15. We probably should have also increased adapt_delta above the default value of .8, as there are a few divergent transitions, but this is left as an exercise for the reader.

After estimation is finished, we see that there are a few (< 10) divergent transitions. If this were a real analysis and not only an example, we would need to increase adapt_delta to a larger value (e.g., .95 or .99) and rerun the estimation. In this case, however, we immediately continue with the second step and obtain samples from the posterior predictive distribution using predict. For this it is important to specify the number of posterior samples (here we use 500). In addition, it is important to set summary = FALSE (to obtain the actual posterior predictive samples rather than a summary of them) and negative_rt = TRUE. The latter ensures that predicted responses at the lower boundary receive a negative sign whereas predicted responses at the upper boundary receive a positive sign.

fit_wiener <- brm(formula, 
                  data = speed_acc,
                  family = wiener(link_bs = "identity", 
                                  link_ndt = "identity",
                                  link_bias = "identity"),
                  prior = prior, inits = initfun,
                  iter = 1000, warmup = 500, 
                  chains = 4, cores = 4, 
                  control = list(max_treedepth = 15))
NPRED <- 500
pred_wiener <- predict(fit_wiener, 
                       summary = FALSE, 
                       negative_rt = TRUE, 
                       nsamples = NPRED)
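
A quick way to convince oneself of the orientation and sign convention of the resulting object is to inspect its dimensions and range (a small sketch, assuming the objects created above):

dim(pred_wiener)    # 500 posterior draws (rows) by 10462 observations (columns)
range(pred_wiener)  # negative values: lower-boundary responses; positive: upper-boundary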

Because both steps are quite time-intensive (estimation roughly one day, obtaining the posterior predictives a few hours), we save the results of both. Given the comparatively large size of both objects, using 'xz' compression (i.e., the strongest available in R) seems like a good idea.

save(fit_wiener, file = "brms_wiener_example_fit.rda", 
     compress = "xz")
save(pred_wiener, file = "brms_wiener_example_predictions.rda", 
     compress = "xz")

The second part shows how to perform model diagnostics and how to assess the model fit. The third part shows how to test for differences in parameters between conditions.

ANOVA in R: afex may be the solution you are looking for

Prelude: When you start with R and try to estimate a standard ANOVA, which is relatively simple in commercial software like SPSS, R kind of sucks. Especially for unbalanced designs or designs with repeated measures, replicating the results from such software in base R may require considerable effort. For a newcomer (and even an old timer) this can be somewhat off-putting. After I had gained experience developing my first package and was once again struggling with R and ANOVA, I had enough and decided to develop afex. If you know this feeling, afex is also for you.


A new version of afex (0.18-0) was accepted on CRAN a few days ago. This version only fixes a small bug that was introduced in the last version: aov_ez did not work with more than one covariate (thanks to tkerwin for reporting this bug).

I want to use this opportunity to introduce one of the main functionalities of afex. It provides a set of functions that make calculating ANOVAs easy. In the default settings, afex automatically uses appropriate orthogonal contrasts for factors, transforms numerical variables into factors, uses so-called Type III sums of squares, and allows for any number of factors including repeated-measures (or within-subjects) factors and mixed/split-plot designs. Together, this guarantees that the ANOVA results correspond to the results obtained from commercial statistical packages such as SPSS or SAS. On top of this, the ANOVA object returned by afex (of class afex_aov) can be directly used for follow-up or post-hoc tests/contrasts using the lsmeans package.

Example Data

Let me illustrate how to calculate an ANOVA with a simple example. We use data courtesy of Andrew Heathcote and colleagues. The data are lexical decision and word naming latencies for 300 words and 300 nonwords from 45 participants. Here we only look at three factors:

  • task is a between-subjects (or independent-samples) factor: 25 participants worked on the lexical decision task (lexdec; i.e., participants had to make a binary decision whether the presented string was a word or a nonword) and 20 participants on the naming task (naming; i.e., participants had to say the presented string out loud).
  • stimulus is a repeated-measures or within-subjects factor that codes whether a presented string was a word or nonword.
  • length is also a repeated-measures factor that gives the number of characters of the presented strings with three levels: 4, 5, and 6.

The dependent variable is the response latency or response time for each presented string. More specifically, as is common in the literature, we analyze the log of the response times, log_rt. After excluding erroneous responses, each participant responded to between 135 and 150 words and between 124 and 150 nonwords. To use these data in an ANOVA, one needs to aggregate them such that only one observation per participant and cell of the design (i.e., combination of all factors) remains. As we will see, afex does this automatically for us (this is one of the features I blatantly stole from ez).

library(afex)
data("fhch2010") # load data (comes with afex) 

mean(!fhch2010$correct) # error rate
# [1] 0.01981546
fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors

str(fhch2010) # structure of the data
# 'data.frame': 13222 obs. of  10 variables:
#  $ id       : Factor w/ 45 levels "N1","N12","N13",..: 1 1 1 1 1 1 1 1 ...
#  $ task     : Factor w/ 2 levels "naming","lexdec": 1 1 1 1 1 1 1 1 1 1 ...
#  $ stimulus : Factor w/ 2 levels "word","nonword": 1 1 1 2 2 1 2 2 1 2 ...
#  $ density  : Factor w/ 2 levels "low","high": 2 1 1 2 1 2 1 1 1 1 ...
#  $ frequency: Factor w/ 2 levels "low","high": 1 2 2 2 2 2 1 2 1 2 ...
#  $ length   : Factor w/ 3 levels "4","5","6": 3 3 2 2 1 1 3 2 1 3 ...
#  $ item     : Factor w/ 600 levels "abide","acts",..: 363 121 ...
#  $ rt       : num  1.091 0.876 0.71 1.21 0.843 ...
#  $ log_rt   : num  0.0871 -0.1324 -0.3425 0.1906 -0.1708 ...
#  $ correct  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

We first load the data and remove the roughly 2% errors. The structure of the data.frame (obtained via str()) shows us that the data have a few more factors than discussed here. To specify our ANOVA, we first use the function aov_car(), which works very similarly to base R aov() but, like all afex ANOVA functions, uses car::Anova() (read: function Anova() from package car) as the backend for calculating the ANOVA.

Specifying an ANOVA

(a1 <- aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), fhch))
# Contrasts set to contr.sum for the following variables: task
# Anova Table (Type 3 tests)
# 
# Response: log_rt
#                 Effect          df  MSE          F   ges p.value
# 1                 task       1, 43 0.23  13.38 ***   .22   .0007
# 2               length 1.83, 78.64 0.00  18.55 ***  .008  <.0001
# 3          task:length 1.83, 78.64 0.00       1.02 .0004     .36
# 4             stimulus       1, 43 0.01 173.25 ***   .17  <.0001
# 5        task:stimulus       1, 43 0.01  87.56 ***   .10  <.0001
# 6      length:stimulus 1.70, 72.97 0.00       1.91 .0007     .16
# 7 task:length:stimulus 1.70, 72.97 0.00       1.21 .0005     .30
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
# 
# Sphericity correction method: GG 
# Warning message:
# More than one observation per cell, aggregating the data using mean (i.e, fun_aggregate = mean)!

The printed output is an ANOVA table that could basically be copied to a manuscript as is. One sees the terms in column Effect, the degrees of freedom (df), the mean-squared error (MSE, I would probably remove this column in a manuscript), the F-value (F, which also contains the significance stars), and the p-value (p.value). The only somewhat uncommon column is ges, which provides generalized eta-squared, ‘the recommended effect size statistics for repeated measures designs’. The standard output also reports Greenhouse-Geisser (GG) corrected df for repeated-measures factors with more than two levels (to account for possible violations of sphericity). Note that these corrected df are not integers.
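
As an aside, if a different sphericity correction is preferred in the printed table, the anova() method for afex_aov objects has, as far as I know, a correction argument. A quick sketch:

anova(a1, correction = "HF")    # Huynh-Feldt instead of Greenhouse-Geisser
anova(a1, correction = "none")  # uncorrected (integer) df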

We can also see a warning notifying us that afex has detected that each participant and cell of the design provides more than one observation; these are then automatically aggregated using the mean. The warning serves to notify the user in case this was not intended (i.e., when there should be only one observation per participant and cell of the design). The warning can be suppressed via specifying fun_aggregate = mean explicitly in the call to aov_car.
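
For example, the following call (the same model as above, just with the aggregation function given explicitly) should produce the identical table without the warning:

a1 <- aov_car(log_rt ~ task*length*stimulus + Error(id/(length*stimulus)), fhch,
              fun_aggregate = mean)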

The formula passed to aov_car basically needs to be the same as for standard aov with a few differences:

  • It must have an Error term specifying the column containing the participant (or unit of observation) identifier (e.g., minimally +Error(id)). This is necessary to allow the automatic aggregation even in designs without repeated-measures factor.
  • Repeated-measures factors only need to be defined in the Error term and do not need to be enclosed by parentheses. Consequently, the following call produces the same ANOVA:
    aov_car(log_rt ~ task + Error(id/length*stimulus), fhch)

     

In addition to aov_car, afex provides two further functions for calculating ANOVAs. These functions produce the same output but differ in how the ANOVA is specified.

  • aov_ez allows the ANOVA specification not via a formula but via character vectors (and is similar to ez::ezANOVA()):
    aov_ez(id = "id", dv = "log_rt", fhch, between = "task", within = c("length", "stimulus"))
  • aov_4 requires a formula for which the id and repeated-measures factors need to be specified as in lme4::lmer() (with the same simplification that repeated-measures factors only need to be specified in the random part):
    aov_4(log_rt ~ task + (length*stimulus|id), fhch)
    aov_4(log_rt ~ task*length*stimulus + (length*stimulus|id), fhch)
    

Follow-up Tests

A common requirement after the omnibus test provided by the ANOVA is some sort of follow-up analysis. For this purpose, afex is fully integrated with lsmeans.

For example, assume we are interested in the significant task:stimulus interaction. As a first step we might want to investigate the marginal means of these two factors:

lsmeans(a1, c("stimulus","task"))
# NOTE: Results may be misleading due to involvement in interactions
#  stimulus task        lsmean         SE    df    lower.CL    upper.CL
#  word     naming -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  naming -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
#  word     lexdec  0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword  lexdec  0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 

From this we can see that naming trials seem to be generally slower (as a reminder, the dv is log-transformed RT in seconds, so values below 0 correspond to RTs between 0 and 1 second). It also appears that the difference between word and nonword trials is larger in the naming task than in the lexdec task. We test this with the following code using a few different lsmeans functions. We first use lsmeans again, but this time with task as the conditioning variable specified in by. Then we use pairs() to obtain all pairwise comparisons within each conditioning stratum (i.e., level of task). This already provides us with the correct tests, but does not control the family-wise error rate across both tests. To get that, we simply update() the returned results and remove the conditioning by setting by=NULL. In the call to update() we can also specify the method for error control; we specify 'holm' because it is uniformly more powerful than Bonferroni.

# set up conditional marginal means:
(ls1 <- lsmeans(a1, c("stimulus"), by="task"))
# task = naming:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word     -0.34111656 0.04250050 48.46 -0.42654877 -0.25568435
#  nonword  -0.02687619 0.04250050 48.46 -0.11230839  0.05855602
# 
# task = lexdec:
#  stimulus      lsmean         SE    df    lower.CL    upper.CL
#  word      0.00331642 0.04224522 47.37 -0.08165241  0.08828525
#  nonword   0.05640801 0.04224522 47.37 -0.02856083  0.14137684
# 
# Results are averaged over the levels of: length 
# Confidence level used: 0.95 
update(pairs(ls1), by=NULL, adjust = "holm")
#  contrast       task      estimate         SE df t.ratio p.value
#  word - nonword naming -0.31424037 0.02080113 43 -15.107  <.0001
#  word - nonword lexdec -0.05309159 0.01860509 43  -2.854  0.0066
# 
# Results are averaged over the levels of: length 
# P value adjustment: holm method for 2 tests

Hmm. These results show that the stimulus effects in both task conditions are independently significant. Obviously, the difference between them must also be significant then, right?

pairs(update(pairs(ls1), by=NULL))
# contrast                              estimate         SE df t.ratio p.value
# wrd-nnwrd,naming - wrd-nnwrd,lexdec -0.2611488 0.02790764 43  -9.358  <.0001

It obviously is. As a reminder, the interaction tests exactly this: the difference of the differences. And we can actually recover the F-value of the interaction using lsmeans alone by invoking yet another of its functions, test(..., joint=TRUE).

test(pairs(update(pairs(ls1), by=NULL)), joint=TRUE)
# df1 df2      F p.value
#   1  43 87.565  <.0001

These last two examples were perhaps not particularly interesting from a statistical point of view, but they show an important ability of lsmeans. Any set of estimated marginal means produced by lsmeans, including any sort of (custom) contrasts, can be used again for further tests or for calculating new sets of marginal means. And with test() we can even obtain joint F-tests over several parameters using joint=TRUE. lsmeans is extremely powerful and one of my most frequently used packages that basically performs all tests following an omnibus test (and in its latest version it directly interfaces with rstanarm so it can now also be used for a lot of Bayesian stuff, but this is the topic of another blog post).
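
As a small illustration of custom contrasts (not part of the original analysis), one could, for example, test words in the naming task against the average of the other three cells; the weights below follow the row order of the marginal means printed above:

ls2 <- lsmeans(a1, c("stimulus", "task"))
contrast(ls2, list(word_naming_vs_rest = c(3, -1, -1, -1)/3))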

Finally, lsmeans can also be used directly for plotting by invoking lsmip:

lsmip(a1, task ~ stimulus)

Note that lsmip does not add error bars to the estimated marginal means, but only plots the point estimates. There are mainly two reasons for this. First, as soon as repeated-measures factors are involved, it is difficult to decide which error bars to plot. Standard error bars based on the standard error of the mean are not appropriate for within-subjects comparisons. For those, one would need to use within-subject intervals (see also here or here). Especially for plots such as the current one with both independent-samples and repeated-measures factors (i.e., mixed within-between designs or split-plot designs), no error bar will allow comparisons across both dimensions. Second, only ‘if the SE [i.e., standard error] of the mean is exactly 1/2 the SE of the difference of two means — which is almost never the case — it would be appropriate to use overlapping confidence intervals to test comparisons of means’ (lsmeans author Russel Lenth, the link provides an alternative).

We can also use lsmeans in combination with lattice to plot the results on the original response scale (i.e., after back-transforming the data from the log scale to response times in seconds). The plot is not shown here.

lsm1 <- summary(lsmeans(a1, c("stimulus","task")))
lsm1$lsmean <- exp(lsm1$lsmean)
require(lattice)
xyplot(lsmean ~ stimulus, lsm1, group = task, type = "b", 
       auto.key = list(space = "right"))

 

Summary

  • afex provides a set of functions that make specifying standard ANOVAs for an arbitrary number of between-subjects (i.e., independent-sample) or within-subjects (i.e., repeated-measures) factors easy: aov_car(), aov_ez(), and aov_4().
  • In its default settings, the afex ANOVA functions replicate the results of commercial statistical packages such as SPSS or SAS (using orthogonal contrasts and Type III sums of squares).
  • Fitted ANOVA models can be passed to lsmeans for follow-up tests, custom contrast tests, and plotting.
  • For specific questions visit the new afex support forum: afex.singmann.science (I think we just need someone to ask the first ANOVA question to get the ball rolling).
  • For more examples see the vignette or here (blog post by Ulf Mertens) or download the full example R script used here.

As a caveat, let me end this post with some cautionary remarks from Douglas Bates (fortunes::fortune(184)), who explains why ANOVA in R is not supposed to be the same as in other software packages (i.e., he justifies why it ‘sucks’):

You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape when fitting a single linear model may take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of “give me every possible statistic that could be calculated from this model, whether or not it makes sense”. The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it.
— Douglas Bates (in reply to the suggestion to include type III sums of squares and lsmeans in base R to make it more similar to SAS or SPSS)
R-help (March 2007)

Maybe he is right, but maybe what I have described here is useful to some degree.

Mixed models for ANOVA designs with one observation per unit of observation and cell of the design

Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part focuses obviously on R, specifically on lme4 and afex.

One part of the chapter was supposed to deal with designs that cannot be estimated with the maximal random effects structure justified by the design because there is only one observation per participant and cell of the design. Such designs are the classical repeated-measures ANOVA designs, as ANOVA cannot deal with replicates at the cell level (i.e., those are usually aggregated to yield one observation per cell and unit of observation). Based on my previous thoughts, which turned out to be wrong, we wrote the following:

Random Effects Structures for Traditional ANOVA Designs

The estimation of the maximal model is not possible when there is only one observation per participant and cell of a repeated-measures design. These designs are typically analyzed using a repeated-measures ANOVA. Currently, there are no clear guidelines on how to proceed in such situations, but we will try to provide some advice. If there is only a single random effects grouping factor, for example participants, we feel that instead of a mixed model, it is appropriate to use a standard repeated-measures ANOVA that addresses sphericity violations via the Greenhouse-Geisser correction.

One alternative strategy that employs mixed models and that we do not recommend consists of using the random-intercept only model or removing the random slopes for the highest within-subject interaction. The resulting model assumes invariance of the omitted random effects across participants. If this assumption is violated, such a model produces results that cannot be trusted. […]

Fortunately, we asked Jake Westfall to take a look at the chapter and Jake responded:

I don’t think I agree with this. In the situation you describe, where we have a single random factor in a balanced ANOVA-like design with 1 observation per unit per cell, personally I am a proponent of the omit-the-highest-level-random-interaction approach. In this kind of design, the random slopes for the highest-level interaction are perfectly confounded with the trial-level error term (in more technical language, the model is only identifiable up to the sum of these two variance components), which is what causes the identifiability problems when one tries to estimate the full maximal model there. (You know all of this of course.) So two equivalent ways to make the model identifiable are to (1) omit the error term, i.e., force the residual variance to be 0, or (2) omit the random slopes for the highest-level interaction. Both of these approaches should (AFAIK) result in a statistically equivalent model, but lme4 does not provide an easy way to do (1), so I generally recommend (2). The important point here is that the standard errors should still be correct in either case — because these two variance components are confounded, omitting e.g. the random interaction slopes simply causes that omitted variance component to be implicitly added to the residual variance, where it is still incorporated into the standard errors of the fixed effects in the appropriate way (because the standard error of the fixed interaction looks roughly like sqrt[(var_error + var_interaction)/n_subjects]). I think one could pretty easily put together a little simulation that would demonstrate this.

Hmm, that sounds very reasonable, but can my intuition on the random effects structure and mixed models really be that wrong? To investigate this, I followed Jake’s advice and coded a short simulation that tested this, and as it turns out, Jake is right and I was wrong.

In the simulation we will simulate a simple repeated-measures design with one factor with three levels. Importantly, each unit of observation will only have one observation per factor level. We will then fit the simulated data with both a repeated-measures ANOVA and a random-intercept-only mixed model and compare their p-values. Note again that for such a design we cannot estimate random slopes for the condition effect.

First, we need a few packages and set some parameters for our simulation:

require(afex)
set_sum_contrasts() # for orthogonal sum-to-zero contrasts
require(MASS) 

NSIM <- 1e4  # number of simulated data sets
NPAR <- 30  # number of participants per cell
NCELLS <- 3  # number of cells (i.e., groups)

Now we need to generate the data. For this I employed an approach that is clearly not the most parsimonious, but one that most closely follows the formulation of a mixed model with random variability in the condition effect and, on top of this, residual variance (i.e., the two confounded variance components).

We first create a bare-bones data.frame with participant id and condition columns and a corresponding model.matrix. Then we create the three random parameters (i.e., intercept and the two parameters for the three conditions) using a zero-centered multivariate normal with a specified variance-covariance matrix. We then loop over the participants and compute the predictions derived from the three random-effects parameters. Only after this do we add uncorrelated residual variance to the observations for each simulated data set.

dat <- expand.grid(condition = factor(letters[seq_len(NCELLS)]),
                   id = factor(seq_len(NPAR)))
head(dat)
#   condition id
# 1         a  1
# 2         b  1
# 3         c  1
# 4         a  2
# 5         b  2
# 6         c  2

mm <- model.matrix(~condition, dat)
head(mm)
#   (Intercept) condition1 condition2
# 1           1          1          0
# 2           1          0          1
# 3           1         -1         -1
# 4           1          1          0
# 5           1          0          1
# 6           1         -1         -1

Sigma_c_1 <- matrix(0.6, NCELLS,NCELLS)
diag(Sigma_c_1) <- 1
d_c_1 <- replicate(NSIM, mvrnorm(NPAR, rep(0, NCELLS), Sigma_c_1), simplify = FALSE)

gen_dat <- vector("list", NSIM)
for(i in seq_len(NSIM)) {
  gen_dat[[i]] <- dat
  gen_dat[[i]]$dv <- NA_real_
  for (j in seq_len(NPAR)) {
    gen_dat[[i]][(j-1)*3+(1:3),"dv"] <- mm[1:3,] %*% d_c_1[[i]][j,]
  }
  gen_dat[[i]]$dv <- gen_dat[[i]]$dv+rnorm(nrow(mm), 0, 1)
}

Now we only need a function that estimates the ANOVA and the mixed model for each data set and returns the p-values, and then loop over the simulated data sets.

## functions returning p-value for ANOVA and mixed model
within_anova <- function(data) {
  suppressWarnings(suppressMessages(
  a <- aov_ez(id = "id", dv = "dv", data, within = "condition", return = "univariate", anova_table = list(es = "none"))
  ))
  c(without = a[["univariate.tests"]][2,6],
    gg = a[["pval.adjustments"]][1,2],
    hf = a[["pval.adjustments"]][1,4])
}

within_mixed <- function(data) {
  suppressWarnings(
    m <- mixed(dv~condition+(1|id),data, progress = FALSE)  
  )
  c(mixed=anova(m)$`Pr(>F)`)
}

p_c1_within <- vapply(gen_dat, within_anova, rep(0.0, 3))
m_c1_within <- vapply(gen_dat, within_mixed, 0.0)

The following graphs show the results (GG denotes the results using the Greenhouse-Geisser adjustment for sphericity violations).

ylim <- c(0, 700)
par(mfrow = c(1,3))
hist(p_c1_within[1,], breaks = 20, main = "ANOVA (default)", xlab = "p-value", ylim=ylim)
hist(p_c1_within[2,], breaks = 20, main = "ANOVA (GG)", xlab = "p-value", ylim=ylim)
hist(m_c1_within, breaks = 20, main = "Random-Intercept Model", xlab = "p-value", ylim=ylim)

What these graphs clearly show is that the p-value distributions for the standard repeated-measures ANOVA and the random-intercept mixed model are virtually identical. This demonstrates that my intuition was wrong and Jake was right.

We also see that for ANOVA and mixed model the rate of significant findings with p < .05 is slightly above the nominal level. More specifically:

mean(p_c1_within[1,] < 0.05) # ANOVA default
# [1] 0.0684
mean(p_c1_within[2,] < 0.05) # ANOVA GG
# [1] 0.0529
mean(p_c1_within[3,] < 0.05) # ANOVA HF
# [1] 0.0549
mean(m_c1_within < 0.05)     # random-intercept mixed model
# [1] 0.0701

These additional results indicate that maybe one also needs to adjust the degrees of freedom for mixed models for violations of sphericity. But this is not the topic of today’s post.

To sum this up, this simulation shows that removing the highest-order random slope seems to be the right decision if one wants to use a mixed model for a design with only one observation per participant and cell of the design while still implementing something as close as possible to the ‘maximal random effects structure’.
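
To make the recommendation a bit more concrete, here is a hedged sketch for a hypothetical design with two within-subjects factors A and B and one observation per participant and cell (all names and data below are placeholders, not from the chapter or the simulation above):

d_hyp <- expand.grid(A = factor(1:2), B = factor(1:3), id = factor(1:30))
d_hyp$dv <- rnorm(nrow(d_hyp))  # placeholder data, for illustration only
# maximal model: not identifiable here, as the A:B random slope is confounded
# with the residual variance
# m_max <- mixed(dv ~ A * B + (A * B | id), d_hyp)
# recommended reduction: drop only the random slope for the highest-order interaction
m_red <- mixed(dv ~ A * B + (A + B | id), d_hyp, progress = FALSE)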

One more thing to note. Ben Bolker raised the same issue and pointed us to one of his example analyses of the starling data that is relevant to the current question (alternatively, the more up to date Rmd file). We are very grateful that Jake and Ben took the time to go through our chapter!

You can also download the RMarkdown file of the simulation.

rtdists 0.7-2: response time distributions now with Rcpp and faster

It took us quite a while but we have finally released a new version of rtdists to CRAN which provides a few significant improvements. As a reminder, rtdists

[p]rovides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA)  with different distributions underlying the drift rate.
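
For orientation, here is a minimal sketch of these four function types for the LBA; the parameter values are arbitrary and purely illustrative:

require(rtdists)
rLBA(5, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(0.3, 0.3))                  # random generation
dLBA(0.8, response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(0.3, 0.3))  # density/PDF
pLBA(0.8, response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(0.3, 0.3))  # distribution/CDF
qLBA(0.3, response = 1, A = 0.5, b = 1, t0 = 0.2, mean_v = c(1.5, 1), sd_v = c(0.3, 0.3))  # quantile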

The main reason it took us relatively long to push the new version was that we had a problem with the C code for the diffusion model that we needed to sort out first. Specifically, the CDF (i.e., pdiffusion) in versions prior to 0.5-2 did not produce correct results in many cases (one consequence of this is that the model predictions given in the previous blog post are wrong). As a temporary fix, we resorted to the correct but slow numerical integration of the PDF (i.e., ddiffusion) to obtain the CDF in version 0.5-2 and later. Importantly, it appears as if the error was not present in fastdm, which is the source of the C code we use. Matthew Gretton carefully investigated the original C code, changed it such that it connects to R via Rcpp, and realized that there are two different variants of the CDF, a fast variant and a precise variant. Up to this point we had only used the fast variant and, as it turns out, this was responsible for our incorrect results. We now use the precise variant per default (which only seems to be marginally slower), as it produces the correct results for all cases we have tested (and we have tested quite a few).
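
As a quick sanity check of the corrected CDF, one can compare pdiffusion() against numerical integration of ddiffusion(); this sketch is not from the original post and the parameter values are arbitrary:

require(rtdists)
pdiffusion(rt = 0.6, response = "upper", a = 1.3, v = 3.3, t0 = 0.3)
integrate(function(x) ddiffusion(x, response = "upper", a = 1.3, v = 3.3, t0 = 0.3),
          lower = 0, upper = 0.6)$value
# both values should agree up to numerical integration error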

In addition to a few more minor changes (see NEWS for the full list), we made two more noteworthy changes. First, all diffusion functions as well as rLBA received a major performance update, mainly in situations with trial-wise parameters. Now it should almost always be fastest to call the diffusion functions (e.g., ddiffusion) only once with vectorized parameters instead of calling them several times for different sets of parameters. The speed-up with the new version depends on the number of unique parameter sets, but even with only a few different sets the speed-up should be clearly noticeable. For completely trial-wise parameters the speed-up should be quite dramatic.
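
To illustrate what is meant by a single vectorized call with trial-wise parameters, here is a minimal sketch (not from the original post; all values are arbitrary):

rt <- runif(1000, 0.4, 1.2)
v_trial <- rnorm(1000, 3, 0.5)  # a different drift rate for every trial
# one vectorized call (fast):
d1 <- ddiffusion(rt, response = "upper", a = 1.3, v = v_trial, t0 = 0.3)
# many separate calls (slow):
d2 <- vapply(seq_along(rt), function(i) {
  ddiffusion(rt[i], response = "upper", a = 1.3, v = v_trial[i], t0 = 0.3)
}, numeric(1))
all.equal(d1, d2)  # should be TRUE; only the runtime differs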

Second, I also updated the vignette, which now uses the tidyverse in, I believe, a somewhat more reasonable manner. Specifically, it is now built on nested data (via tidyr::nest) and purrr::map instead of relying heavily on dplyr::do. The problem I had with dplyr::do is that it often leads to somewhat ugly syntax. The changes in the vignette are mainly due to me reading Chapter 25 of the great R for Data Science book by Wickham and Grolemund. However, I still prefer lattice over ggplot2.
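
For readers unfamiliar with that pattern, here is a minimal sketch of the nest/map idiom; it is not taken from the vignette, and fit_one() below is just a stand-in for a real per-participant fitting function:

require(dplyr)
require(tidyr)
require(purrr)
data(speed_acc, package = "rtdists")
fit_one <- function(d) mean(d$rt)    # stand-in for an actual model-fitting function
fits <- speed_acc %>%
  group_by(id) %>%
  nest() %>%                         # one row per participant, data in a list-column
  mutate(fit = map(data, fit_one))   # replaces the older dplyr::do() idiom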

Example Analysis

To show the now-correct behavior of the diffusion CDF, let me repeat the example from the last post. As a reminder, we somewhat randomly pick one participant from the speed_acc data set and fit both the diffusion model and the LBA to the data.

require(rtdists)

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & 
                   speed_acc$condition == "accuracy" & 
                   speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 


ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, 
            A = pars["A"], 
            b = pars["A"]+pars["b"], 
            t0 = pars["t0"], 
            mean_v = pars[c("v1", "v2")], 
            sd_v = c(1, pars["sv"]), 
            silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), 
                   rt=p11$rt, response=p11$corr)
p11_norm[1:3]
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182940 -2.7409230  1.0449963  0.4513604  0.1243441  0.2609968 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0


ll_diffusion <- function(pars, rt, response) 
{
  densities <- ddiffusion(rt, response=response, 
                          a=pars["a"], 
                          v=pars["v"], 
                          t0=pars["t0"], 
                          sz=pars["sz"], 
                          st0=pars["st0"],
                          sv=pars["sv"])
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_diff <- nlminb(start, ll_diffusion, lower = 0, 
                   rt=p11$rt, response=p11$corr)
p11_diff[1:3]
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727202 0.3385602 0.4621645 0.2017950 1.0551706 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0

As is common, we pass the negative summed log-likelihood to the optimization algorithm (here trusty nlminb) and hence lower values of objective indicate a better fit. Results show that the LBA provides a somewhat better account. The interesting question is whether this somewhat better fit translates into a visibly better fit when comparing observed and predicted quantiles.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, 
                               A = p11_norm$par["A"], 
                               b = p11_norm$par["A"]+p11_norm$par["b"], 
                               t0 = p11_norm$par["t0"], 
                               mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                               sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, 
                          A = p11_norm$par["A"], 
                          b = p11_norm$par["A"]+p11_norm$par["b"], 
                          t0 = p11_norm$par["t0"], 
                          mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                          sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871710 0.5510265 0.6081855 0.6809796 0.8301286
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, 
                        A = p11_norm$par["A"], 
                        b = p11_norm$par["A"]+p11_norm$par["b"], 
                        t0 = p11_norm$par["t0"], 
                        mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), 
                        sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684374 0.5529575 0.6273737 0.7233961 0.9314820


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- pdiffusion(rt = 20,  response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))  
# [1] 0.964723

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, 
                                      response = "upper",
                                      a=p11_diff$par["a"], 
                                      v=p11_diff$par["v"], 
                                      t0=p11_diff$par["t0"], 
                                      sz=p11_diff$par["sz"], 
                                      st0=p11_diff$par["st0"], 
                                      sv=p11_diff$par["sv"]))
# [1] 0.4748271 0.5489903 0.6081182 0.6821927 0.8444566
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), 
                                    response = "lower",
                                    a=p11_diff$par["a"], 
                                    v=p11_diff$par["v"], 
                                    t0=p11_diff$par["t0"], 
                                    sz=p11_diff$par["sz"], 
                                    st0=p11_diff$par["st0"], 
                                    sv=p11_diff$par["sv"]))
# [1] 0.4776565 0.5598018 0.6305120 0.7336275 0.9770047


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

The fit plot compares observed quantiles (triangles) with predicted quantiles (circles connected by lines). Here we plot the 10%, 30%, 50%, 70%, and 90% quantiles. In each panel, the x-axis shows RTs and the y-axis cumulative probabilities. Consequently, the upper line and points correspond to the correct trials (which are common) and the lower line and points to the incorrect trials (which are uncommon). For both models the fit looks pretty good, especially for the correct trials. However, the LBA appears to do a slightly better job of predicting the very fast and very slow trials here, which may be responsible for its better fit in terms of summed log-likelihood. In contrast, the diffusion model seems somewhat better at predicting the long tail of the error trials.

Checking the CDF

Finally, we can also check whether the analytical CDF does in fact correspond to the empirical CDF of the data. For this we compare the analytical CDF function pdiffusion to the empirical CDF obtained from random deviates. One thing to be careful about is that pdiffusion provides the ‘defective’ CDF, which only approaches one when the CDFs for both response boundaries are added. Consequently, to compare the empirical CDF for one response with the analytical CDF, we need to scale the latter so that it also goes from 0 to 1 (simply by dividing it by its maximal value). Here we use the parameter values obtained in the previous fit.

rand_rts <- rdiffusion(1e5, a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"])
plot(ecdf(rand_rts[rand_rts$response == "upper","rt"]))

normalised_pdiffusion = function(rt,...) pdiffusion(rt,...)/pdiffusion(rt=Inf,...) 
curve(normalised_pdiffusion(x, response = "upper",
                            a=p11_diff$par["a"], 
                            v=p11_diff$par["v"], 
                            t0=p11_diff$par["t0"], 
                            sz=p11_diff$par["sz"], 
                            st0=p11_diff$par["st0"], 
                            sv=p11_diff$par["sv"]), 
      add=TRUE, col = "yellow", lty = 2)

This figure shows that the analytical CDF (in yellow) lies perfectly on top of the empirical CDF (in black). If it does not for you, you are still using an old version of rtdists. We have also added a series of unit tests to rtdists that compare the empirical CDF to the analytical CDF (using ks.test) for a variety of parameter values, to catch such a problem should it ever occur again.
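A sketch of such a comparison (not the exact code used in the package's test suite) could look as follows, reusing rand_rts and normalised_pdiffusion from above; a large p-value indicates no detectable discrepancy between the empirical and the analytical CDF:

ks.test(rand_rts[rand_rts$response == "upper", "rt"],
        normalised_pdiffusion, response = "upper",
        a=p11_diff$par["a"], v=p11_diff$par["v"], t0=p11_diff$par["t0"],
        sz=p11_diff$par["sz"], st0=p11_diff$par["st0"], sv=p11_diff$par["sv"])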

New Version of rtdists on CRAN (v. 0.4-9): Accumulator Models for Response Time Distributions (http://singmann.org/new-version-of-rtdists-on-cran-v-0-4-9-accumulator-models-for-response-time-distributions/)

I have just submitted a new version of rtdists to CRAN (v. 0.4-9). As I haven't mentioned rtdists on here yet, let me simply copy its description as a short introduction; a longer introduction follows below:

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.

Cognitive models of response time distributions are (usually) bivariate distributions that simultaneously account for choices and corresponding response latencies. The arguably most prominent of these models are the Ratcliff diffusion model and the linear ballistic accumulator (LBA). The main assumption of both is the idea of an internal evidence accumulation process. As soon as the accumulated evidence reaches a specific threshold, the corresponding response is invariably given. To predict errors, the evidence accumulation process in each model can reach the wrong threshold (because it is noisy or because of variability in its direction). The central parameters of both models are the quality of the evidence accumulation process (the drift rate) and the position of the threshold. The latter can be voluntarily set by the decision maker, for example to trade off speed and accuracy. Additionally, the models can account for an initial bias towards one response (via the position of the start point) and for non-decision processes. To account for differences between the distributions beyond their differential weighting (e.g., fast or slow errors), the models allow trial-by-trial variability of most parameters.

The new version of rtdists provides a completely new interface for the LBA and a considerably overhauled interface for the diffusion model. In addition, the package now provides quantile functions for both models. In line with the general R naming scheme for distribution functions, the density functions start with d (dLBA & ddiffusion), the distribution functions with p (pLBA & pdiffusion), the quantile functions with q (qLBA & qdiffusion), and the random generation functions with r (rLBA & rdiffusion). All main functions are now fully vectorized across all parameters and also across response (i.e., boundary or accumulator).
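For example, a single call can evaluate the density for several RTs and responses at once. A minimal sketch with arbitrarily chosen parameter values (the argument names mirror the calls further below):

require(rtdists)

# diffusion density for one RT at the upper and one at the lower boundary:
ddiffusion(rt = c(0.6, 0.7), boundary = c("upper", "lower"),
           a = 1, v = 2, t0 = 0.3, z = 0.5, sz = 0, st0 = 0, sv = 0)

# LBA density for one RT per accumulator, with a shared set of parameters:
dLBA(rt = c(0.6, 0.7), response = c(1, 2),
     A = 0.5, b = 1, t0 = 0.3, mean_v = c(2, 1), sd_v = c(1, 1))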

As an example, I will show how to estimate both models for a single-participant data set using trial-wise maximum likelihood estimation (in contrast to the often-used binned chi-square estimation). We will be using one (somewhat randomly picked) participant from the data set that comes as an example with rtdists, speed_acc. Thanks to EJ Wagenmakers for providing this data and allowing it to be published on CRAN. We first prepare the data and plot the response time distribution.

require(rtdists)

require(lattice) # for plotting
lattice.options(default.theme = standard.theme(color = FALSE))
lattice.options(default.args = list(as.table = TRUE))

# Exp. 1; Wagenmakers, Ratcliff, Gomez, & McKoon (2008, JML)
data(speed_acc)   
# remove excluded trials:
speed_acc <- droplevels(speed_acc[!speed_acc$censor,]) 
# create numeric response variable where 1 is an error and 2 a correct response: 
speed_acc$corr <- with(speed_acc, as.numeric(stim_cat == response))+1 
# select data from participant 11, accuracy condition, non-word trials only
p11 <- speed_acc[speed_acc$id == 11 & speed_acc$condition == "accuracy" & speed_acc$stim_cat == "nonword",] 
prop.table(table(p11$corr))
#          1          2 
# 0.04166667 0.95833333 

densityplot(~rt, p11, group = corr, auto.key=TRUE, plot.points=FALSE, weights = rep(1/nrow(p11), nrow(p11)), ylab = "Density")

[Figure: defective density plot of response times for participant 11 (accuracy condition, nonword trials), separately for correct and error responses]

The plot obviously does not show the true density of both response time distributions (which can also be inferred from the warning messages produced by the call to densityplot) but rather the defective density in which only the sum of both integrals is one. This shows that there are indeed a lot more correct responses (around 96% of the data) and that the error RTs have quite a long tail.

To estimate the LBA for these data we simply need a wrapper function to which we can pass the RTs and responses and which will return the summed log-likelihood of all data points (actually the negative of that value, because most optimizers minimize by default). This function and the data then only need to be passed to our optimizer of choice (I like nlminb). To make the model identifiable we fix the SD of the drift rate for error RTs to 1 (other choices would be possible). The model converges at a maximum log-likelihood of 211.42 with parameters that look reasonable (note that the boundary b is parametrized as A + b). One might wonder about the negative mean drift rate for error RTs, but the default drift rate distribution for the LBA is a normal truncated at zero, so even though the mean is negative, it only produces positive drift rates (negative drift rates could produce undefined RTs).

ll_lba <- function(pars, rt, response) {
  d <- dLBA(rt = rt, response = response, A = pars["A"], b = pars["A"]+pars["b"], t0 = pars["t0"], mean_v = pars[c("v1", "v2")], sd_v = c(1, pars["sv"]), silent=TRUE)
  if (any(d == 0)) return(1e6)
  else return(-sum(log(d)))
}

start <- c(runif(3, 0.5, 3), runif(2, 0, 0.2), runif(1))
names(start) <- c("A", "v1", "v2", "b", "t0", "sv")
p11_norm <- nlminb(start, ll_lba, lower = c(0, -Inf, 0, 0, 0, 0), rt=p11$rt, response=p11$corr)
p11_norm
# $par
#          A         v1         v2          b         t0         sv 
#  0.1182951 -2.7409929  1.0449789  0.4513499  0.1243456  0.2609930 
# 
# $objective
# [1] -211.4202
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 57
# 
# $evaluations
# function gradient 
#       76      395 
# 
# $message
# [1] "relative convergence (4)"

We might also want to fit the diffusion model to these data. For this we need a similar wrapper. However, as the diffusion model can fail for certain parameter combinations, the safest way is to wrap the ddiffusion call in a tryCatch call. Note that the diffusion model is already identifiable as the diffusion constant is set to 1 internally. Also note that obtaining this fit can take longer than for the LBA and may need a few tries with different random starting values to reach the maximum, which lies at a log-likelihood of 207.55. The lower maximum log-likelihood indicates that the diffusion model provides a somewhat worse account of this data set, but the parameters look reasonable.

ll_diffusion <- function(pars, rt, boundary) 
{
  densities <- tryCatch(ddiffusion(rt, boundary=boundary, a=pars[1], v=pars[2], t0=pars[3], z=0.5, sz=pars[4], st0=pars[5], sv=pars[6]), error = function(e) 0)
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
p11_fit <- nlminb(start, ll_diffusion, lower = 0, rt=p11$rt, boundary=p11$corr)
p11_fit
# $par
#         a         v        t0        sz       st0        sv 
# 1.3206011 3.2727201 0.3385602 0.3499652 0.2017950 1.0551704 
# 
# $objective
# [1] -207.5487
# 
# $convergence
# [1] 0
# 
# $iterations
# [1] 31
# 
# $evaluations
# function gradient 
#       50      214 
# 
# $message
# [1] "relative convergence (4)"

Finally, we might be interested in assessing the fit of the models graphically, in addition to simply comparing their maximum log-likelihoods. Specifically, we will produce a version of a quantile probability plot in which we plot, for the .1, .3, .5, .7, and .9 quantiles, both the RTs and the cumulative probabilities, and compare the model predictions with the corresponding values from the data. For this we need both the CDFs and the quantile functions. The cumulative probabilities are simply the quantiles scaled by the corresponding response proportion; for example, the .1 quantile for the error RTs is .1 times the overall error rate (which is .04166667). Therefore, the first step in obtaining the model predictions is to obtain the predicted response proportions by evaluating the CDF at infinity (or a sufficiently high value). We then use these proportions to get the actual quantiles for each response, which are in turn used to obtain the corresponding predicted RTs via the quantile functions. Finally, we plot predictions and observed data separately for both models.

# quantiles:
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)

## observed data:
(p11_q_c <- quantile(p11[p11$corr == 2, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4900 0.5557 0.6060 0.6773 0.8231 
(p11_q_e <- quantile(p11[p11$corr == 1, "rt"], probs = q))
#    10%    30%    50%    70%    90% 
# 0.4908 0.5391 0.5905 0.6413 1.0653 

### LBA:
# predicted error rate  
(pred_prop_correct_lba <- pLBA(Inf, 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.9581342

(pred_correct_lba <- qLBA(q*pred_prop_correct_lba, response = 2, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4871709 0.5510265 0.6081855 0.6809797 0.8301290
(pred_error_lba <- qLBA(q*(1-pred_prop_correct_lba), response = 1, A = p11_norm$par["A"], b = p11_norm$par["A"]+p11_norm$par["b"], t0 = p11_norm$par["t0"], mean_v = c(p11_norm$par["v1"], p11_norm$par["v2"]), sd_v = c(1, p11_norm$par["sv"])))
# [1] 0.4684367 0.5529569 0.6273732 0.7233959 0.9314825


### diffusion:
# same result as when using Inf, but faster:
(pred_prop_correct_diffusion <- do.call(pdiffusion, args = c(rt = 20, as.list(p11_fit$par), boundary = "upper")))  
# [1] 0.938958

(pred_correct_diffusion <- qdiffusion(q*pred_prop_correct_diffusion, a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "upper"))
# [1] 0.4963608 0.5737010 0.6361651 0.7148225 0.8817063
(pred_error_diffusion <- qdiffusion(q*(1-pred_prop_correct_diffusion), a=p11_fit$par["a"], v=p11_fit$par["v"], t0=p11_fit$par["t0"], sz=p11_fit$par["sz"], st0=p11_fit$par["st0"], sv=p11_fit$par["sv"], boundary = "lower"))
# [1] 0.4483908 0.5226722 0.5828972 0.6671577 0.8833553


### plot predictions

par(mfrow=c(1,2), cex=1.2)
plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "LBA")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_lba, q*pred_prop_correct_lba, type = "b")
lines(pred_error_lba, q*(1-pred_prop_correct_lba), type = "b")
legend("right", legend = c("data", "predictions"), pch = c(2, 1), lty = c(0, 1))

plot(p11_q_c, q*prop.table(table(p11$corr))[2], pch = 2, ylim=c(0, 1), xlim = c(0.4, 1.3), ylab = "Cumulative Probability", xlab = "Response Time (sec)", main = "Diffusion")
points(p11_q_e, q*prop.table(table(p11$corr))[1], pch = 2)
lines(pred_correct_diffusion, q*pred_prop_correct_diffusion, type = "b")
lines(pred_error_diffusion, q*(1-pred_prop_correct_diffusion), type = "b")

[Figure: observed versus predicted quantiles for participant 11; left panel LBA, right panel diffusion model]

The plot confirms the somewhat better fit for the LBA compared to the diffusion model for this data set; while the LBA provides a basically perfect fit for the correct RTs, the diffusion model is somewhat off, especially for the higher quantiles. However, both models have similar problems predicting the long tail for the error RTs.

Many thanks to my package coauthors, Andrew Heathcote, Scott Brown, and Matthew Gretton, for developing rtdists with me. And also many thanks to Andreas and Jochen Voss for releasing their C code of the diffusion model under the GPL.

Hierarchical MPT in Stan I: Dealing with Divergent Transitions via Control Arguments (http://singmann.org/hierarchical-mpt-in-stan-i-dealing-with-convergent-transitions-via-control-arguments/)

I have recently restarted working with Stan and unfortunately ran into the problem that my (hierarchical) Bayesian models often produced divergent transitions. When this happens, the warning basically only suggests increasing adapt_delta:

Warning messages:
1: There were X divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
2: Examine the pairs() plot to diagnose sampling problems

However, increasing adapt_delta often does not help, even with values such as .99. Also, I never found pairs() especially illuminating. This is the first of two blog posts dealing with this issue. In this (the first) post I will show which Stan settings need to be changed to remove the divergent transitions (to foreshadow, these are adapt_delta, stepsize, and max_treedepth). In the next blog post I will show how reparameterizations of the model, following Stan recommendations, can remove divergent transitions, often without the need to extensively fiddle with the sampler settings, while at the same time dramatically improving the fitting speed.

My model had some similarities to the multinomial processing tree (MPT) example in the Lee and Wagenmakers cognitive modeling book. As I am a big fan of both MPTs and the book, I investigated the issue of divergent transitions using this example. Luckily, a first implementation of all the examples from Lee and Wagenmakers in Stan has been provided by Martin Šmíra (who is now working on his PhD in Birmingham) and is part of the Stan example models. I submitted a pull request with the changes to the model discussed here, so they are now also part of the example models (which now also contain a README file discussing those changes).

The example uses the pair-clustering model, which is also discussed in the paper that formally introduced MPTs. The model has three parameters: c for cluster storage, r for cluster retrieval, and u for unique storage-retrieval. For the hierarchical structure the model employs the latent trait approach: the group-level (i.e., hyper-) parameters are estimated separately on the unconstrained space from -infinity to +infinity. Individual-level parameters are added to the group means as displacements estimated from a multivariate normal with mean zero and a freely estimated variance-covariance matrix. Only then is the unconstrained space mapped onto the unit range (i.e., 0 to 1), which represents the parameter space, via the probit transformation. This allows one to freely estimate the correlations among the individual parameters on the unconstrained space while constraining the parameters, after transformation, to the allowed range.
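The construction can be sketched in R for a single parameter (a schematic illustration with made-up values, ignoring the multivariate part; the actual model does this jointly for c, r, and u in Stan):

set.seed(1)
n_subj  <- 25
mu_c    <- 0.5                        # group-level mean of c on the unconstrained (probit) scale
sigma_c <- 0.8                        # group-level SD of the individual displacements
delta_c <- rnorm(n_subj, 0, sigma_c)  # individual displacements around the group mean
c_i     <- pnorm(mu_c + delta_c)      # probit transformation maps onto the unit interval
range(c_i)                            # all individual-level c parameters lie between 0 and 1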

The original implementation employed two features that are particularly useful for models estimated via Gibbs sampling (as implemented in JAGS), but not so much for the NUTS sampler implemented in Stan: (a) a scaled inverse Wishart as prior for the covariance matrix, due to its computational convenience, and (b) parameter expansion to move the scale parameters of the variance-covariance matrix away from zero and ensure reasonable priors.

The original implementation of the model in Stan is simply a literal translation of the JAGS code given in Lee and Wagenmakers. Consequently, it retains the Gibbs-specific features. When fitting this model it seems to produce stable estimates, but Stan reports several divergent transitions after warm-up. Given that the estimates seem stable and the results basically replicate what is reported in Lee and Wagenmakers (Figures 14.5 and 14.6), one may wonder why not to trust these results. I can give no full explanation, so let me copy the relevant part from the shinystan help. The important part is the last section: it clearly says not to use the results if there are any divergent transitions.

n_divergent

Quick definition The number of leapfrog transitions with diverging error. Because NUTS terminates at the first divergence this will be either 0 or 1 for each iteration. The average value of n_divergent over all iterations is therefore the proportion of iterations with diverging error.

More details

Stan uses a symplectic integrator to approximate the exact solution of the Hamiltonian dynamics and when stepsize is too large relative to the curvature of the log posterior this approximation can diverge and threaten the validity of the sampler. n_divergent counts the number of iterations within a given sample that have diverged and any non-zero value suggests that the samples may be biased in which case the step size needs to be decreased. Note that, because sampling is immediately terminated once a divergence is encountered, n_divergent should be only 0 or 1.

If there are any post-warmup iterations for which n_divergent = 1 then the results may be biased and should not be used. You should try rerunning the model with a higher target acceptance probability (which will decrease the step size) until n_divergent = 0 for all post-warmup iterations.

My first step in trying to get rid of the divergent transitions was to increase adapt_delta, as suggested by the warning. But as said initially, this did not help in this case, even when using quite high values such as .99 or .999. Fortunately, the quote above tells us that divergent transitions are related to the stepsize with which the sampler traverses the posterior. stepsize is also one of the control arguments one can pass to Stan, in addition to adapt_delta. Unfortunately, the stan help page is relatively uninformative with respect to the stepsize argument and does not even provide its default value; it simply says stepsize (double, positive). Bob Carpenter clarified on the Stan mailing list that the default value is 1 (referring to the CmdStan documentation). He goes on:

The step size is just the initial step size.  It lets the first few iterations move around a bit and set relative scales on the parameters.  It’ll also reduce numerical issues. On the negative side, it will also be slower because it’ll take more steps at a smaller step size before hitting a U-turn.

The adapt_delta (target acceptance rate) determines what the step size will be during sampling — the higher the accept rate, the lower the step size has to be.  The lower the step size, the less likely there are to be divergent (numerically unstable) transitions.

Taken together, this means that divergent transitions can be dealt with by increasing adapt_delta above the default value of .8 while at the same time decreasing the initial stepsize below the default value of 1. As this may increase the necessary number of steps, one might also need to increase max_treedepth above the default value of 10. After trying out various values, the following set of control arguments seems to remove all divergent transitions in the example model (at the cost of prolonging the fitting process quite considerably):

control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20)

As this uses rstan, the R interface to Stan, here is the full call:

samples_1 <- stan(model_code=model,   
                  data=data, 
                  init=myinits,  # If not specified, gives random inits
                  pars=parameters,
                  iter=myiterations, 
                  chains=3, 
                  thin=1,
                  warmup=mywarmup,  # Stands for burn-in; Default = iter/2
                  control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20)
)

With these values the traceplots of the post-warmup samples look pretty good, even for the sigma parameters, which occasionally have problems moving away from 0. As you can see from these nice plots, rstan uses ggplot2.

traceplot(samples_1, pars = c("muc", "mur", "muu", "Omega", "sigma", "lp__"))

[Figure: traceplots of the post-warmup samples for muc, mur, muu, Omega, sigma, and lp__]
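To confirm programmatically that no post-warmup iteration diverged, one can also inspect the sampler diagnostics directly. A minimal sketch using rstan's get_sampler_params (assuming the samples_1 object from above; the relevant column is called "divergent__" in recent rstan versions, "n_divergent__" in older ones):

sampler_params <- get_sampler_params(samples_1, inc_warmup = FALSE)
# number of divergent transitions per chain (should be all zeros):
sapply(sampler_params, function(x) sum(x[, "divergent__"]))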
