More on Biontech/Pfizer’s Covid-19 vaccine trial: Adjusting for interim testing in the Bayesian analysis
Here is one more post about Biontech/Pfizer's vaccine trial. That is because my previous two posts (here and here) have so far ignored one interesting topic: how do Biontech/Pfizer statistically account for interim analyses? Studying this topic also gives us general insights into how adaptive trial designs with multiple success conditions can be properly evaluated.

According to their study plan, sufficient vaccine efficacy is established if one can infer that the efficacy is at least 30% with a type I error (wrongly declaring sufficient efficacy) of no more than 2.5%. While the final evaluation shall take place once there are 164 confirmed Covid-19 cases among the 43,538 study participants, the plan states that also at intermediate stages of 32, 62, 92, and 120 cases overwhelming efficacy or, if outcomes are bad, futility can be declared.

Here is a screenshot of Table 5 on p. 103 of the study plan, which specifies the exact success and futility thresholds for the interim and final efficacy analyses:
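To relate the tabulated boundaries to the Bayesian analysis, recall from the previous posts that theta denotes the probability that a randomly drawn Covid-19 case stems from the (equally sized) vaccinated group. Efficacy and theta are linked by VE = 1 - theta/(1-theta), or equivalently theta = (1-VE)/(2-VE), so VE > 30% corresponds to theta < 0.7/1.7 ≈ 0.4118. Here is a minimal sketch of this mapping (the helper function names are my own):

# Mapping between vaccine efficacy VE and the probability theta
# that a random Covid-19 case stems from the vaccinated group
VE.to.theta = function(VE) (1-VE)/(2-VE)
theta.to.VE = function(theta) 1 - theta/(1-theta)
VE.to.theta(0.3)              # 0.4117647
theta.to.VE(VE.to.theta(0.3)) # recovers 0.3

For example, suppose that of the first 32 Covid-19 cases only 6 stem from vaccinated subjects. The following code computes the posterior probability that VE exceeds 30%, given Biontech/Pfizer's Beta(0.700102, 1) prior on theta: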
library(ggplot2)
# Parameters of prior beta distribution
a0 = 0.700102; b0 = 1
# Covid cases in treatment and control group
mv = 6; mc = 26
# Compute posterior density of theta
theta.seq = seq(0,1,by=0.01)
density = dbeta(theta.seq, a0+mv, b0+mc)
# Threshold: theta.max is the theta value corresponding to VE = 30%
VE.min = 0.3
theta.max = (1-VE.min)/(2-VE.min) # 0.41176
ggplot(data.frame(theta=theta.seq, density=density), aes(x=theta, y=density)) +
  geom_area(col="blue", fill="blue", alpha=0.5) +
  geom_vline(xintercept=theta.max) +
  ggtitle("Posterior if 6 of 32 Covid-19 cases were vaccinated")
# P(theta <= theta.max) equals P(VE >= 30%)
prob.VE.above.30 = pbeta(theta.max, a0+mv, b0+mc)
round(prob.VE.above.30*100, 3)
## [1] 99.648
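Table 5 requires a posterior probability above 99.5% for an interim efficacy claim (more on these thresholds below). Under this criterion we can also recover the success boundary at the first interim analysis, i.e. the largest number of vaccinated cases among the first 32 for which the criterion still holds. This is a small sketch of my own, not code from the study plan:

# Success boundary at the first interim analysis (32 cases):
# largest number of vaccinated cases mv with P(VE > 30% | data) > 99.5%
mv.seq = 0:32
post = pbeta(theta.max, a0 + mv.seq, b0 + (32 - mv.seq))
max(mv.seq[post > 0.995])

If I computed correctly, this should yield 6 vaccinated cases, consistent with the 99.648% we just found for mv = 6.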
The footnote of Table 5 states the success threshold for the interim analyses explicitly:

Interim efficacy claim: P(VE > 30% given data) > 0.995

For the final analysis the footnote also implies an error bound below 2.5%:

Success at the final analysis: P(VE > 30% given data) > 0.986

The crucial point is that the 2.5% bound on the type I error shall hold for the complete analysis, which allows sufficient efficacy to be declared on 5 different occasions: at one of the 4 interim analyses or at the final analysis. Similar to controlling for multiple testing, we have to adjust the individual error thresholds of each of the 5 analyses to guarantee an overall 2.5% error bound. The Bayesian framework does not relieve us from such a "multiple testing correction".

So how does one come up with error bounds for each separate analysis that guarantee a total error bound of 2.5%? For an overview and links to relevant literature, see e.g. the article "Do we need to adjust for interim analyses in a Bayesian adaptive trial design?" by Ryan et al. (2020). The article states that in practice the overall error bound of a Bayesian trial design with interim stopping opportunities is usually computed via simulations.

While I want to reiterate that I am no biostatistician, I would like to make an educated guess at what such simulations could have looked like. The following code simulates a trial that runs until m.max = 164 Covid-19 cases are observed, assuming a true vaccine efficacy of only VE.true = 30%:
simulate.trial = function(runid=1, m.max=164, VE.true = 0.3,
    VE.min=VE.true, a0 = 0.700102, b0 = 1, m.analyse = 1:m.max) {
  # theta values corresponding to the true and the minimum relevant efficacy
  theta.true = (1-VE.true)/(2-VE.true)
  theta.max = (1-VE.min)/(2-VE.min)
  # Draw for each Covid-19 case whether it stems from a vaccinated subject
  is.vaccinated = ifelse(runif(m.max) >= theta.true, 0, 1)
  # Cumulative cases in treatment and control group at each analysed case count
  mv = cumsum(is.vaccinated)[m.analyse]
  mc = m.analyse - mv
  # Posterior probability that VE > VE.min at each analysed case count
  prob.above.VE.min = pbeta(theta.max, shape1 = a0+mv, shape2 = b0+mc, lower.tail=TRUE)
  # Returning results as matrix is faster than as data frame
  cbind(runid=runid, m=m.analyse, mv=mv, mc=mc, prob.above.VE.min)
}
library(dplyr) # for %>% and as_tibble
set.seed(42)
dat = simulate.trial() %>% as_tibble
dat
## # A tibble: 164 x 5
## runid m mv mc prob.above.VE.min
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0 1 0.759
## 2 1 2 0 2 0.869
## 3 1 3 1 2 0.618
## 4 1 4 1 3 0.746
## 5 1 5 1 4 0.834
## 6 1 6 1 5 0.893
## 7 1 7 1 6 0.932
## 8 1 8 2 6 0.828
## 9 1 9 2 7 0.881
## 10 1 10 2 8 0.919
## # ... with 154 more rows
Each row shows the state of the simulated trial once m Covid-19 cases have been observed. The corresponding numbers of cases from vaccinated subjects mv and from control group subjects mc are random. The column prob.above.VE.min denotes the posterior probability that the vaccine efficacy is better than VE.min = 30%, given the simulated interim data and Biontech/Pfizer's assumed prior characterized by the arguments a0 and b0. You can see how every additional Covid-19 case of a vaccinated subject reduces prob.above.VE.min, while every additional Covid-19 case of a control group subject increases it.

Let us plot these posterior probabilities of a vaccine efficacy above 30% for all possible interim analyses and the final analysis of our simulated trial:

ggplot(dat, aes(x=m, y=prob.above.VE.min*100)) +
  geom_line() + geom_hline(yintercept = 97.5)
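The posterior probability can well wander above 97.5% at some intermediate point even though the true efficacy is only 30%. To get a feeling for how severe this peeking problem is, here is a small sketch of my own that checks for 1000 simulated trials how often a naive 97.5% criterion would be crossed at least once if we looked at the data after every single case:

# Share of 1000 simulated trials (true VE = 30%) in which the posterior
# probability of VE > 30% exceeds 97.5% at least once across all 164 looks
set.seed(123)
sims = do.call(rbind, lapply(1:1000, simulate.trial))
crossed = tapply(sims[, "prob.above.VE.min"] > 0.975, sims[, "runid"], any)
mean(crossed)

This share should come out far above 2.5%, which is exactly why the thresholds of the individual analyses must be tightened.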
The simulate.trial function can also simulate the relevant values for just a smaller set of interim analyses, e.g. the five analyses planned by Biontech/Pfizer:

simulate.trial(m.analyse = c(32,64,90,120,164)) %>% as_tibble
## # A tibble: 5 x 5
## runid m mv mc prob.above.VE.min
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 32 14 18 0.393
## 2 1 64 25 39 0.639
## 3 1 90 40 50 0.270
## 4 1 120 54 66 0.202
## 5 1 164 79 85 0.0364
Let us now simulate 100,000 trials with a true efficacy of exactly VE.true = 30% and compute the posterior probabilities at these five analyses:

set.seed(1)
dat = do.call(rbind, lapply(1:100000, simulate.trial, m.analyse = c(32,64,90,120,164))) %>%
  as_tibble
agg = dat %>%
group_by(runid) %>%
summarize(
highest.prob = max(prob.above.VE.min),
final.prob = last(prob.above.VE.min)
)
# Share of simulation runs in which the highest posterior probability
# of VE > 30% across all 5 analyses exceeds 97.5%
mean(agg$highest.prob > 0.975)
## [1] 0.07179
# Corresponding share looking only at the final analysis
mean(agg$final.prob > 0.975)
## [1] 0.02656
quantile(agg$highest.prob, 0.975)
## 97.5%
## 0.9922456
So if one required the same posterior probability threshold at each of the five analyses, it would have to be roughly 99.2% to keep the overall type I error at 2.5%. Biontech/Pfizer instead specified a stricter threshold of 99.5% for the four interim analyses. Given that interim threshold, we can compute the final-analysis threshold that exhausts the remaining error budget:

agg = dat %>%
  group_by(runid) %>%
  summarize(
    # TRUE if any of the 4 interim analyses crosses the 99.5% threshold
    interim.success = max(prob.above.VE.min * (m < 164)) > 0.995,
    final.prob = last(prob.above.VE.min)
  )
# Fraction of trials where we have an interim success
share.interim.success = mean(agg$interim.success)
share.interim.success
## [1] 0.01516
# Remaining trials without interim success
agg.remain = filter(agg, !interim.success)
# Maximal share of success in final analysis in remaining trials
# to guarantee 2.5% error rate
max.share.final.success = (0.025-share.interim.success)*
NROW(agg)/NROW(agg.remain)
quantile(agg.remain$final.prob, 1-max.share.final.success)
## 99.00085%
## 0.9852906
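The computed cutoff of roughly 0.9853 is remarkably close to the final-analysis threshold of 0.986 from the study plan, so Biontech/Pfizer may well have calibrated their thresholds with similar simulations. As a sanity check of my own, we can plug the plan's actual thresholds into the simulated trials; since 0.986 lies slightly above the computed cutoff, the overall type I error should come out just below 2.5%:

# Overall type I error when applying the study plan's actual thresholds
# of 0.995 (interim) and 0.986 (final) to all simulated trials
mean(agg$interim.success | agg$final.prob > 0.986)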
Such simulations also allow one to assess more flexible designs. For example, assume a study plans its final analysis at m=100 cases and earlier wants to have exactly one interim analysis, which shall take place at a specific date no matter how many cases mi have accrued at that date. One could then possibly make a valid study plan that just specifies the probability threshold for the interim analysis, like 99.5%, while the threshold for the final analysis will be a function of the number of cases mi at the interim analysis, set such that for every possible value of mi a total type I error bound of 2.5% is guaranteed.

As long as one can verify that one has not yet looked at the data, one can possibly also adapt the analysis plan in a running study. E.g. Biontech/Pfizer's press release states:

"After discussion with the FDA, the companies recently elected to drop the 32-case interim analysis and conduct the first interim analysis at a minimum of 62 cases. Upon the conclusion of those discussions, the evaluable case count reached 94 and the DMC [independent Data Monitoring Committee] performed its first analysis on all cases."

While we don't know the chosen thresholds of the modified study plan, it seems obvious from the thresholds of the original design that sufficient vaccine efficacy was already established at the interim analysis with 94 cases. The press release also states that the clinical trial continues to the final analysis at 164 confirmed cases. However, this shall be done in order to collect further data and characterize the vaccine candidate's performance against other study endpoints. Submission for Emergency Use Authorization still seems to require that a safety milestone be achieved, but it doesn't look like proving efficacy is an issue anymore.

As a personal summary, I had not thought that I would end up writing three blog posts when I started with the idea to understand a bit better how this famous vaccine trial is evaluated. But with each post I learned new, interesting things about study design and analysis. I hope you also enjoyed reading the posts a bit.

UPDATE (2020-11-16 14:00): I just read the good news from Moderna's vaccine trial and skimmed over the study plan that Google located. Interestingly, they seem to use a pure frequentist approach in the statistical analysis, and the study plan contains terms like Cox proportional hazard regression and O'Brien-Fleming boundaries (for dealing with interim analyses).
This article is originally published at http://skranz.github.io/