Dear Ohlbe,
❝ […] you should have a look at question 4.1 in the Q&A...
THX for reminding me, I missed it.
Apart from Mittyri’s stuff above, calculating any confidence interval in the pooled analysis is statistically deeply flawed. All studies were confirmatory and thus the entire \(\small{\alpha}\) was already spent in each of them. Therefore, even using an ‘adjusted \(\small{\alpha}\)’ – like in group-sequential designs – is outright wrong. Only the T/R-ratio (like Health Canada for Cmax) would make sense in a pooled analysis. If the software insists on specifying a CI-level, use 50%. Of course, that results in a noninformative zero-width CI.
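A minimal sketch with PowerTOST illustrates the last point (the PE, CV, and sample size are hypothetical and serve only to show the effect of the ‘CI level’): with \(\small{\alpha=0.5}\) the \(\small{t}\)-quantile is zero and the interval collapses to the PE.
library(PowerTOST)
# hypothetical PE, CV, and N – only to show the effect of the ‘CI level’
CI.BE(alpha = 0.05, pe = 0.92, CV = 0.20, n = 54) # conventional 90% CI
CI.BE(alpha = 0.50, pe = 0.92, CV = 0.20, n = 54) # 50% ‘CI’: lower = upper = PE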
Does the almighty oracle mean by ‘coverage probability’ \(\small{1-2\times p(\text{TOST})}\), where a failed study shows a value < 0.90 and a passing one ≥ 0.90? What’s the point of it? See above. Or is the ugly post hoc power creeping in through the back door (< 50% counts as a failure and ≥ 50% as a success)?
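If that is what is meant, both numbers are easily obtained with PowerTOST; with the observed values of the failed study of the first example below the calls are:
library(PowerTOST)
pe <- 0.9025; CV <- 0.22; n <- 18            # observed values of the failed Study 1
1 - 2 * pvalue.TOST(pe = pe, CV = CV, n = n) # the ‘coverage probability’?
power.TOST(theta0 = pe, CV = CV, n = n)      # post hoc ‘power’ at the PE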
I also saw deficiency letters asking for a ‘tipping point analysis’. No idea how that could be done here. Usually it is used to assess the impact of the imputation of missing data.
Lengthy R-scripts for the examples at the end.
First example:
We planned a study based on assumptions (CV 0.20, T/R-ratio 0.95), targeting the usual power of ≥ 80%. However, the outcome was unfavorable: the CV was higher and the PE worse than assumed, and we had two dropouts → the study failed.
Study 1 planned on assumptions (CV, T/R-ratio)
values CV T/R PE n power lower upper BE width post.hoc
planned 0.200 0.9500 – 20 0.8347 – – – – –
observed 0.220 – 90.25 18 – 79.52 102.42 fail 22.90 47.521%
‘coverage probability’ 88.43% < 90%
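Both rows can be reproduced with PowerTOST (the full script is at the end):
library(PowerTOST)
sampleN.TOST(CV = 0.20, theta0 = 0.95)                # planned: n and achieved power
round(100 * CI.BE(pe = 0.9025, CV = 0.22, n = 18), 2) # observed 90% CI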
We planned the second study based on the first (for simplicity by the ‘Carved-in-Stone’ approach). Of course, we needed a substantially larger sample size. We were lucky: the CV was lower and the PE better than assumed. Although we had a higher dropout rate than in the first study, the second one passed with flying colors.
Study 2 planned on observed values of Study 1
values CV T/R PE n power lower upper BE width post.hoc
planned 0.220 0.9025 – 42 0.8032 – – – – –
observed 0.198 – 92.56 36 – 85.61 100.09 pass 14.48 92.611%
‘coverage probability’ 99.67% >= 90%
What happens if we combine the studies (acceptable according to the guideline because one of them passed)?
Pooled analysis: CV = 0.205 (50 df), N = 54
PE alpha CI lower upper BE width post.hoc
91.79 0.0294 94.12%: wrong (Pocock’s superiority) 87.23 96.60 “pass” 9.37 99.934%
91.79 0.0304 93.92%: wrong (Pocock’s equivalence) 87.26 96.56 “pass” 9.30 99.938%
91.79 0.0500 90.00%: wrong (naïve) 87.81 95.95 “pass” 8.14 99.974%
91.79 0.5000 50% : correct (but noninformative) 91.79 91.79 “pass” 0.00 100.000%
‘coverage probability’ 99.9999% >= 90%
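For reference, the pooled CV above comes from CVpooled() and the pooled PE is the n-weighted mean of the two point estimates (as in the script at the end); a minimal sketch with the observed values of the two studies:
library(PowerTOST)
CVs <- data.frame(CV = c(0.22, 0.198), n = c(18, 36), design = "2x2")
CVpooled(CVs)                           # pooled CV with 50 degrees of freedom
(0.9025 * 18 + 0.9256 * 36) / (18 + 36) # n-weighted pooled PE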
I don’t get the idea behind it. Can someone enlighten me?
If the same batches of the test and the comparator were used in the passing study as in the failed one, I would definitely »discuss the results and justify the BE claim« instead of performing a wacky pooled analysis.
I would be wary if a study were repeated with the same sample size as the first (The Guy in the Armani Suit [© ElMaestro] said: “The study was planned for 80% power, so there was a 20% chance of failure. Let’s cross our fingers and try it again!” The biostatistician shook his head in despair and mused about getting a new job…).
Second example with the same conditions as above but random CV, PE, and number of dropouts in the studies. I simulated a failed study and then repeated studies until one passed.
Study n PE lower upper CV BE width ‘coverage’ post.hoc
Pivotal 18 82.59 75.40 90.48 0.1566 fail 15.08 0.4526 14.552%
Repeated 1 20 89.66 78.88 101.90 0.2368 fail 23.03 0.8600 43.301%
Repeated 2 19 86.61 77.92 96.26 0.1886 fail 18.34 0.7913 34.822%
Repeated 3 19 90.22 83.07 97.99 0.1469 pass 14.92 0.9785 78.359%
Pooled analysis: CV = 0.1866 (68 df), N = 76
PE alpha CI lower upper BE width ‘coverage’ post.hoc
87.36 0.0294 94.12%: wrong (Pocock’s superiority) 82.47 92.54 “pass” 10.07 0.9955 84.348%
87.36 0.0304 93.92%: wrong (Pocock’s equivalence) 82.51 92.50 “pass” 9.99 0.9955 84.712%
87.36 0.0500 90.00%: wrong (naïve) 83.10 91.84 “pass” 8.74 0.9955 89.658%
87.36 0.5000 50% : correct (but noninformative) 87.36 87.36 “pass” 0.00 0.9955 99.833%
Since the sample size wasn’t increased, how often would one repeat studies? In some of my simulations it took four studies until one passed. What would they present? Three failed ones and the one passing? What about the whole body of evidence? Why should an assessor trust the passing study more than the failed ones?
The Guy in the Armani Suit feels confirmed because there is indeed a high probability that already the first repeated study will pass. Sure, it was again planned for 80% power.
100,000 simulations; studies until passing
repeated probability cumulative
1 81.205% 81.205%
2 15.330% 96.535%
3 2.819% 99.354%
4 0.515% 99.869%
5 0.108% 99.977%
6 0.019% 99.996%
7 0.003% 99.999%
8 0.001% 100.000%
But what if not? Again we face a producer’s risk of 20%… Will he sack the deplorable biostatistician or listen next time? The biostatistician mutters “geometric series…” and walks away.
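His ‘geometric series’ is easily verified. Taking the planned power of 80% as the probability that a single study passes gives essentially the simulated table above:
p <- 0.80                                     # assumed per-study passing probability
k <- 1:8                                      # number of repeated studies
data.frame(repeated    = k,
           probability = (1 - p)^(k - 1) * p, # geometric distribution
           cumulative  = 1 - (1 - p)^k)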
Hearsay:
Regulators are concerned that if the first study failed (pointing towards a ‘bad’ T/R-ratio), the applicant starts hunting for a comparator’s batch, which – based on the measured content – is expected to give a ‘better’ T/R-ratio. Resources allowing, the applicant might even produce a set of biobatches. Finally, a favorable combination is selected for the next study.
First example
library(PowerTOST)
# assumptions for the first study
CV.1 <- 0.20
theta0.1 <- 0.95
outcome <- rep(FALSE, 2)
study.1 <- data.frame(values = c("planned", "observed"),
CV = c(CV.1, NA_real_),
theta0 = c(theta0.1, NA_real_),
PE = rep(NA_real_, 2), n = NA_integer_,
power = NA_real_, lower = NA_real_,
upper = NA_real_, BE = NA_character_,
width = NA_character_, post.hoc = NA_real_)
study.1[1, 5:6] <- sampleN.TOST(CV = CV.1, theta0 = theta0.1,
print = FALSE)[c("Sample size",
"Achieved power")]
# higher CV, worse T/R-ratio, 2 dropouts
study.1[2, c(2, 4:5)] <- c(CV.1 * 1.1, theta0.1 * 0.95, study.1$n[1] - 2)
study.1[2, 7:8] <- round(100 * CI.BE(CV = study.1$CV[2], pe = study.1$PE[2],
n = study.1$n[2]), 2)
if (study.1$lower[2] >= 80 & study.1$upper[2] <= 125) {
study.1$BE[2] <- "pass"
outcome[1] <- TRUE
} else {
study.1$BE[2] <- "fail"
}
study.1[2, 10] <- sprintf("%5.2f", study.1$upper[2] - study.1$lower[2])
study.1[2, 11] <- sprintf("%7.3f%%",
100 * power.TOST(CV = study.1$CV[2],
theta0 = study.1$PE[2],
n = study.1$n[2]))
# is this what is meant by ‘coverage probability’?
coverage.1 <- 1 - 2 * pvalue.TOST(CV = study.1$CV[2], pe = study.1$PE[2],
n = study.1$n[2])
if (coverage.1 < 0.90) {
cov.1 <- sprintf("‘coverage probability’ %.2f%% < 90%s", 100 * coverage.1, "%\n")
} else {
cov.1 <- sprintf("‘coverage probability’ %.2f%% >= 90%s", 100 * coverage.1, "%\n")
}
# ‘Carved-in-Stone’ approach, i.e., plan based on the outcome of study 1
study.2 <- data.frame(values = c("planned", "observed"),
CV = c(study.1$CV[2], NA_real_),
theta0 = c(study.1$PE[2], NA_real_),
PE = rep(NA_real_, 2), n = NA_integer_,
power = NA_real_, lower = NA_real_,
upper = NA_real_, BE = NA_character_,
width = NA_character_, post.hoc = NA_real_)
study.2[1, 5:6] <- sampleN.TOST(CV = study.2$CV[1], theta0 = study.2$theta0[1],
print = FALSE)[c("Sample size",
"Achieved power")]
# lower CV, better T/R-ratio, 6 dropouts
study.2[2, c(2, 4:5)] <- c(study.2$CV[1] * 0.9, study.2$theta0[1] / 0.975, study.2$n[1] - 6)
study.2[2, 7:8] <- round(100 * CI.BE(CV = study.2$CV[2], pe = study.2$PE[2],
n = study.2$n[2]), 2)
if (study.2$lower[2] >= 80 & study.2$upper[2] <= 125) {
study.2$BE[2] <- "pass"
outcome[2] <- TRUE
} else {
study.2$BE[2] <- "fail"
}
study.2[2, 10] <- sprintf("%5.2f", study.2$upper[2] - study.2$lower[2])
study.2[2, 11] <- sprintf("%7.3f%%",
100 * power.TOST(CV = study.2$CV[2],
theta0 = study.2$PE[2],
n = study.2$n[2]))
coverage.2 <- 1 - 2 * pvalue.TOST(CV = study.2$CV[2], pe = study.2$PE[2],
n = study.2$n[2])
if (coverage.2 < 0.90) {
cov.2 <- sprintf("‘coverage probability’ %.2f%% < 90%s", 100 * coverage.2, "%\n")
} else {
cov.2 <- sprintf("‘coverage probability’ %.2f%% >= 90%s", 100 * coverage.2, "%\n")
}
# pooled analysis of the two pivotal studies only if at least one passed
if (sum(outcome) >= 1) {
CVdata <- data.frame(source = c("Study 1", "Study 2"),
CV = c(study.1$CV[2], study.2$CV[2]),
n = c(study.1$n[2], study.2$n[2]), design = "2x2")
tmp <- CVpooled(CVdata)
CV.pooled <- tmp$CV
df.pooled <- tmp$df
PE.pooled <- (study.1$PE[2] * study.1$n[2] + study.2$PE[2] * study.2$n[2]) /
(study.1$n[2] + study.2$n[2])
alpha <- c(0.0294, 0.0304, 0.05, 0.5)
pooled <- data.frame(PE = PE.pooled, alpha = alpha,
CI = c("94.12%: wrong (Pocock’s superiority)",
"93.92%: wrong (Pocock’s equivalence)",
"90.00%: wrong (naïve)",
"50% : correct (but noninformative)"),
lower = NA_real_, upper = NA_real_, BE = "“fail”",
width = NA_character_, post.hoc = NA_real_)
for (j in 1:4) {
pooled[j, 4:5] <- round(100 * CI.BE(alpha = alpha[j], CV = CV.pooled,
pe = PE.pooled,
n = study.1$n + study.2$n), 2)
if (pooled$lower[j] >= 80 & pooled$upper[j] <= 125) pooled$BE[j] <- "“pass”"
pooled[j, 7] <- sprintf("%5.2f", pooled[j, 5] - pooled[j, 4])
pooled[j, 8] <- sprintf("%7.3f%%",
100 * power.TOST(alpha = alpha[j],
CV = CV.pooled,
theta0 = PE.pooled,
n = study.1$n + study.2$n))
}
coverage.3 <- 1 - 2 * pvalue.TOST(CV = CV.pooled, pe = PE.pooled,
n = study.1$n + study.2$n)
if (coverage.3 < 0.90) {
cov.3 <- sprintf("‘coverage probability’ %.4f%% < 90%s",
100 * coverage.3, "%\n")
} else {
cov.3 <- sprintf("‘coverage probability’ %.4f%% >= 90%s",
100 * coverage.3, "%\n")
}
pooled[, 1] <- sprintf("%.2f", 100 * pooled[, 1])
names(pooled)[c(1, 3, 6)] <- paste0(" ", names(pooled)[c(1, 3, 6)])
}
# pure cosmetics
study.1[, 2] <- sprintf("%.3f", study.1[, 2])
study.1[, 3:6] <- signif(study.1[, 3:6], 4)
study.1[, 4] <- sprintf("%.2f", 100 * study.1[, 4])
study.1[, 3] <- sprintf("%.4f", study.1[, 3])
study.1[is.na(study.1)] <- study.1[1, 4] <- study.1[2, 3] <- "– "
study.1[1, 11] <- "– "
names(study.1)[c(2, 9)] <- paste0(names(study.1)[c(2, 9)], " ")
study.2[, 2] <- sprintf("%.3f", study.2[, 2])
study.2[, 3:6] <- signif(study.2[, 3:6], 4)
study.2[, 4] <- sprintf("%.2f", 100 * study.2[, 4])
study.2[is.na(study.2)] <- study.2[1, 4] <- study.2[2, 3] <- "– "
study.2[1, 11] <- "– "
names(study.2)[c(2, 9)] <- paste0(names(study.2)[c(2, 9)], " ")
names(study.1)[3] <- names(study.2)[3] <- "T/R "
names(study.1)[4] <- names(study.2)[4] <- names(pooled)[1] <- "PE "
t <- c("\nStudy 1 planned on assumptions (CV, T/R-ratio)\n",
"\nStudy 2 planned on observed values of Study 1\n")
if (sum(outcome) == 1) {
t <- c(t, paste0("\nPooled analysis: CV = ", signif(CV.pooled, 3),
" (", df.pooled, " df)",
", N = ", study.1$n[2] + study.2$n[2], "\n"))
cat(t[1]); print(study.1, row.names = FALSE); cat(cov.1)
cat(t[2]); print(study.2, row.names = FALSE); cat(cov.2)
cat(t[3]); print(pooled, row.names = FALSE, right = FALSE); cat(cov.3)
} else {
cat(t[1]); print(study.1, row.names = FALSE); cat(cov.1)
cat(t[2]); print(study.2, row.names = FALSE); cat(cov.2)
}
Second example (with every execution of the script you get a different number of repeats)
library(PowerTOST)
sim.study <- function(alpha = 0.05, theta0 = 0.95, CV, CVb, n, eligible,
per.effect = c(0, 0), carryover = c(0, 0),
setseed = TRUE) {
if (length(per.effect) == 1) per.effect <- c(0, per.effect)
# carryover: first element R → T, second element T → R
if (setseed) set.seed(123456)
if (missing(eligible)) eligible <- n
if (missing(CVb)) CVb <- CV * 1.5 # arbitrary
sd <- CV2se(CV)
sd.b <- CV2se(CVb)
subj <- 1:n
# within subjects
T <- rnorm(n = n, mean = log(theta0), sd = sd)
R <- rnorm(n = n, mean = 0, sd = sd)
# between subjects
TR <- rnorm(n = n, mean = 0, sd = sd.b)
T <- T + TR
R <- R + TR
TR.sim <- exp(mean(T) - mean(R))
data <- data.frame(subject = rep(subj, each = 2),
period = 1:2L,
sequence = c(rep("RT", n),
rep("TR", n)),
treatment = c(rep(c("R", "T"), n/2),
rep(c("T", "R"), n/2)),
logPK = NA_real_)
subj.T <- subj.R <- 0L # subject counters
for (j in 1:nrow(data)) { # clumsy but transparent
if (data$treatment[j] == "T") {
subj.T <- subj.T + 1L
if (data$period[j] == 1L) {
data$logPK[j] <- T[subj.T] + per.effect[1]
} else {
data$logPK[j] <- T[subj.T] + per.effect[2] + carryover[1]
}
} else {
subj.R <- subj.R + 1L
if (data$period[j] == 1L) {
data$logPK[j] <- R[subj.R] + per.effect[1]
} else {
data$logPK[j] <- R[subj.R] + per.effect[2] + carryover[2]
}
}
}
data <- data[data$subject <= eligible, ]
f <- c("subject", "period", "sequence", "treatment")
data[f] <- lapply(data[f], factor)
# bogus nested model acc. to the GLs
muddle <- lm(logPK ~ sequence + subject %in% sequence +
period + treatment, data = data)
PE <- 100 * exp(coef(muddle)[["treatmentT"]])
CI <- 100 * as.numeric(exp(confint(muddle, "treatmentT",
level = 1 - 2 * alpha)))
MSE <- anova(muddle)[["Residuals", "Mean Sq"]]
CV.hat <- mse2CV(MSE)
if (round(CI[1], 2) >= 80 & round(CI[2], 2) <= 125) {
BE <- "pass"
} else {
BE <- "fail"
}
res <- data.frame(n = eligible, PE = PE, lower = CI[1], upper = CI[2],
CV = CV.hat, BE = BE)
return(res)
}
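# A single call of sim.study() can be used to inspect one simulated 2x2x2 study,
# e.g., with 20 dosed and 18 eligible subjects (illustration only, not executed):
# print(sim.study(CV = 0.20, theta0 = 0.95, n = 20, eligible = 18))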
CV <- 0.20 # assumed within-subject CV
theta0 <- 0.95 # assumed T/R-ratio
do.rate <- 0.1 # anticipated dropout rate (10%)
n <- sampleN.TOST(CV = CV, theta0 = theta0, print = FALSE)[["Sample size"]]
runs.1 <- 0L
passed.1 <- 0L
failed.1 <- 0L
CVdata <- data.frame(source = 1L, CV = NA_real_,
n = NA_integer_, PE = NA_real_, design = "2x2")
repeat { # until the first study fails by chance
runs.1 <- runs.1 + 1L
eligible <- round(runif(1, min = n * (1 - do.rate), max = n), 0)
study.1 <- sim.study(theta0 = theta0, CV = CV, n = n, eligible = eligible,
setseed = FALSE)
if (study.1$BE == "pass") passed.1 <- passed.1 + 1L
if (study.1$BE == "fail") failed.1 <- failed.1 + 1L
if (study.1$BE == "fail") {
study.1 <- cbind(study = runs.1 + 1L, study.1)
study.1$width <- study.1$upper - study.1$lower
study.1$coverage <- 1 - 2 *
suppressMessages(
pvalue.TOST(CV = study.1$CV,
pe = study.1$PE / 100,
n = study.1$n))
study.1$post.hoc <- sprintf("%7.3f%%",
100 * suppressMessages(
power.TOST(CV = study.1$CV,
theta0 = study.1$PE / 100,
n = study.1$n)))
study.1[, c(3:6, 8:9)] <- signif(study.1[, c(3:6, 8:9)], 4)
break
}
}
CVdata[failed.1, 2:4] <- study.1[c(6, 2:3)]
study.rep <- data.frame(n = integer(), PE = numeric(), lower = numeric(),
upper = numeric(), CV = numeric(), BE = character(),
width = numeric(), coverage = numeric(),
post.hoc = character())
runs.2 <- 0L
passed.2 <- 0L
failed.2 <- 0L
repeat { # until a repeated study passes by chance
runs.2 <- runs.2 + 1L
eligible <- round(runif(1, min = n * (1 - do.rate), max = n), 0)
study.rep[runs.2, ] <- sim.study(theta0 = theta0, CV = CV, n = n,
eligible = eligible, setseed = FALSE)
tmp <- data.frame(source = runs.2 + 1L, CV = study.rep[runs.2, 5],
n = study.rep[runs.2, 1], PE = study.rep[runs.2, 2],
design = "2x2")
CVdata <- rbind(CVdata, tmp)
if (study.rep$BE[runs.2] == "pass") passed.2 <- passed.2 + 1L
if (study.rep$BE[runs.2] == "fail") failed.2 <- failed.2 + 1L
study.2 <- cbind(study = runs.2 + 1, study.rep[runs.2, ])
study.2$width <- study.2$upper - study.2$lower
study.2$coverage <- 1 - 2 *
suppressMessages(
pvalue.TOST(CV = study.2$CV,
pe = study.2$PE / 100,
n = study.2$n))
study.2$post.hoc <- sprintf("%7.3f%%",
100 * suppressMessages(
power.TOST(CV = study.2$CV,
theta0 = study.2$PE / 100,
n = study.2$n)))
study.rep[runs.2, c(2:5, 7:9)] <- signif(study.2[, c(3:6, 8:9)], 4)
study.rep[runs.2, 9] <- study.2$post.hoc
if (study.rep$BE[runs.2] == "pass") break
}
studies <- rbind(study.1[, 2:10], study.rep)
studies$coverage <- sprintf("%.4f ", studies$coverage)
names(studies)[c(3, 6:7)] <- paste0(names(studies)[c(3, 6:7)], " ")
names(studies)[8] <- "‘coverage’"
studies <- cbind(Study = c("Pivotal", paste0("Repeated ", 1:runs.2)), studies)
tmp <- CVpooled(CVdata)
CV.pooled <- tmp$CV
df.pooled <- tmp$df
PE.pooled <- sum(CVdata$PE * CVdata$n) / sum(CVdata$n)
alpha <- c(0.0294, 0.0304, 0.05, 0.5)
pooled <- data.frame(PE = PE.pooled, alpha = alpha,
CI = c("94.12%: wrong (Pocock’s superiority)",
"93.92%: wrong (Pocock’s equivalence)",
"90.00%: wrong (naïve)",
"50% : correct (but noninformative)"),
lower = NA_real_, upper = NA_real_, BE = "“fail”",
width = NA_character_, coverage = NA_real_,
post.hoc = NA_character_)
for (j in 1:4) {
pooled[j, 4:5] <- round(100 * suppressMessages(
CI.BE(alpha = alpha[j], CV = CV.pooled,
pe = PE.pooled / 100,
n = sum(CVdata$n))), 2)
if (pooled$lower[j] >= 80 & pooled$upper[j] <= 125) pooled$BE[j] <- "“pass”"
pooled[j, 7] <- sprintf("%5.2f", pooled[j, 5] - pooled[j, 4])
pooled[j, 9] <- sprintf("%7.3f%%",
100 * suppressMessages(
power.TOST(alpha = alpha[j],
CV = CV.pooled,
theta0 = PE.pooled / 100,
n = sum(CVdata$n))))
}
pooled$coverage <- 1 - 2 *
suppressMessages(
pvalue.TOST(CV = CV.pooled, pe = PE.pooled / 100,
n = sum(CVdata$n)))
pooled$PE <- round(pooled$PE, 2)
pooled$coverage <- sprintf(" %.4f", pooled$coverage)
names(pooled)[c(1, 3, 6, 8)] <- c(" PE", " CI", " BE", "‘coverage’")
t1 <- paste0("Simulation target: A failed pivotal and repeated ",
"studies until one passed.",
sprintf("\nPivotals: %2i (%2i failed = target, %2i passed)",
runs.1, failed.1, passed.1),
sprintf("\nRepeats : %2i (%2i passed = target, %2i failed)",
runs.2, passed.2, failed.2), "\n\n")
t2 <- paste0("\nPooled analysis: CV = ", sprintf("%.4f", CV.pooled), " (",
df.pooled, " df)", ", N = ", sum(CVdata$n), "\n")
cat(t1); print(studies, row.names = FALSE)
cat(t2); print(pooled, row.names = FALSE, right = FALSE)