Dear all,
some more stuff related to
this post.
Burton
et al.1 recommend that the random number generator to use and the starting seeds are specified. Furthermore:
»A random number generator must be able to reproduce the identical set of random numbers when the same starting value, known as a seed, is specified. This is also essential when performing simulation studies to enable the generated data sets and hence results to be reproduced, if necessary, for monitoring purposes.«
Although I appreciate the article
2 it should have been published in the
Journal of Irreproducible Results in parallel.
»A different randomly selected seed was used for each scenario.«
Inspect the plots below at 10
6 simulations to assess the effect of random seeds. Given, one can
almost reproduce the results.
Empiric Type I Errors, shifted
t-distribution,
random seeds, 10
6 simulations (part of Potvin
et al.2 Table I)$$\small{\begin{array}{cccc}
& & \text{Method} &\\\hline
CV\,20\% & \text{B} & \text{C} & \text{D}\\
n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline
12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0510} & \phantom{0}0.0499\phantom{0}\\
24 & \phantom{0}0.0320\phantom{0} & 0.0490 & \phantom{0}0.0493\phantom{0}\\
36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\
48 & \phantom{0}0.0292\phantom{0} & 0.0495 & \phantom{0}0.0497\phantom{0}\\
60 & \phantom{0}0.0297\phantom{0} & 0.0500 & \phantom{0}0.0500\phantom{0}\\\hline
\end{array}}$$Empiric Type I Errors,
noncentral t-distribution,
fixed seeds, 10
6 simulations (
-package
Power2Stage
)$$\small{\begin{array}{cccc}
& & \text{Method} &\\\hline
CV\,20\% & \text{B} & \text{C} & \text{D}\\
n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline
12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0511} & \phantom{0}0.0499\phantom{0}\\
24 & \phantom{0}0.0315\phantom{0} & 0.0492 & \phantom{0}0.0492\phantom{0}\\
36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\
48 & \phantom{0}0.0293\phantom{0} & 0.0498 & \phantom{0}0.0498\phantom{0}\\
60 & \phantom{0}0.0293\phantom{0} & 0.0496 & \phantom{0}0.0496\phantom{0}\\\hline
\end{array}}$$ Regulators on this this of the pond were/are concerned about an inflated Type I Error in ‘Method C’ with a low \(\small{CV}\) and small sample size \(\small{n_1}\) in the first stage. The authors reported an even larger – but
irreproducibleA – inflation of the Type I Error with 0.0504 for \(\small{CV=10\%}\) and 60 (‼) subjects in the first stage. Who on earth would design such a study?
Let’s dive into the matter.
Five simulations at every number of simulations. Filled circles = fixed seeds, empty
circles = random seeds, magenta line = mean of empiric Type I Error, dashed curves =
significance limits of the binomial test for \(\small{\alpha=0.05}\) and number of simulations.
≈2.43 billion simulations in total.
We see that the Type I Error is controlled with 12 subjects in the first stage (≈0.0463 with one and 100 million simulations).
Although within the limit of a ‘negligible inflation’
2 (0.052), the Type I Error is indeed
significantly inflated (≈0.0511 with one million simulations and ≈0.0513 with 100 millions).
However, already with 16 subjects in the first stage the Type I Error is no more inflated (≈0.0496 with one million simulations and ≈0.0495 with 100 millions). The same holds for any larger first stage. Therefore,
in practice there is no problem in applying ‘Method C’ and why it is recommended by the FDA and Health Canada.
3,4
Don’t be tempted to start with a small first stage. Unless the \(\small{CV}\) is low, there will be high chance to have to proceed to the second stage, which takes time. Instead, plan the first stage in such a way that there is a reasonably high chance to show BE already in the interim. Rule of thumb: 80–90% of a fixed sample design will give a power of 60–70% in the first stage. Then the second stage can be understood as a kind of ‘safety net’ – something European assessors seemingly prefer.
It is a mystery to me why the authors recommended ‘Method C’, while
no inflated Type I Error has been observed in ‘Method D’ with slightly more adjustment (\(\small{\alpha=0.0280}\)) in
any of the \(\small{CV}\)-\(\small{n_1}\) combinations.
No significant inflation of the Type I Error (≈0.0500 with one million simulations and ≈0.0499 with 100 millions).
We could have avoided the endless and fruitless discussions if they had thrown ‘Method C’ in the garbage. Further, the loss in power compared to ‘Method C’ is negligible.
BEstimated Type I Errors and Powers for Methods B, C, and D.
Shifted central t-distribution, one million simulations, fixed seeds.
n1 CV(%) TIE(B) TIE(C) TIE(D) Pwr(B) Pwr(C) Pwr(D)
12 10 0.0294 0.0499 0.0499 0.9772 0.9890 0.9890
24 10 0.0292 0.0496 0.0496 0.9999 1.0000 1.0000
36 10 0.0293 0.0499 0.0499 1.0000 1.0000 1.0000
48 10 0.0293 0.0498 0.0498 1.0000 1.0000 1.0000
60 10 0.0293 0.0496 0.0496 1.0000 1.0000 1.0000
12 20 0.0464 0.0512 0.0499 0.8442 0.8486 0.8478
24 20 0.0316 0.0492 0.0491 0.8816 0.9097 0.9098
36 20 0.0294 0.0499 0.0499 0.9553 0.9752 0.9752
48 20 0.0293 0.0498 0.0498 0.9887 0.9946 0.9946
60 20 0.0293 0.0496 0.0496 0.9974 0.9989 0.9989
12 30 0.0436 0.0440 0.0416 0.7868 0.7872 0.7850
24 30 0.0475 0.0490 0.0472 0.8309 0.8320 0.8315
36 30 0.0397 0.0476 0.0467 0.8385 0.8483 0.8481
48 30 0.0321 0.0493 0.0492 0.8550 0.8873 0.8873
60 30 0.0295 0.0496 0.0496 0.8999 0.9363 0.9363
12 40 0.0343 0.0342 0.0326 0.7513 0.7513 0.7499
24 40 0.0430 0.0431 0.0407 0.8037 0.8033 0.8026
36 40 0.0486 0.0488 0.0466 0.8238 0.8244 0.8229
48 40 0.0457 0.0470 0.0453 0.8297 0.8308 0.8302
60 40 0.0406 0.0464 0.0455 0.8310 0.8369 0.8368
12 50 0.0313 0.0310 0.0296 0.7360 0.7363 0.7352
24 50 0.0336 0.0335 0.0319 0.7830 0.7830 0.7822
36 50 0.0418 0.0417 0.0392 0.8054 0.8046 0.8042
48 50 0.0483 0.0480 0.0458 0.8190 0.8193 0.8179
60 50 0.0478 0.0481 0.0461 0.8257 0.8253 0.8251
12 60 0.0299 0.0297 0.0284 0.7288 0.7295 0.7278
24 60 0.0309 0.0310 0.0294 0.7749 0.7744 0.7733
36 60 0.0332 0.0332 0.0312 0.7910 0.7908 0.7904
48 60 0.0399 0.0398 0.0372 0.8048 0.8045 0.8031
60 60 0.0466 0.0468 0.0442 0.8156 0.8153 0.8146
12 70 0.0293 0.0294 0.0280 0.7253 0.7254 0.7248
24 70 0.0305 0.0305 0.0287 0.7707 0.7707 0.7702
36 70 0.0305 0.0303 0.0290 0.7850 0.7850 0.7845
48 70 0.0325 0.0325 0.0306 0.7933 0.7933 0.7933
60 70 0.0378 0.0380 0.0356 0.8026 0.8027 0.8017
12 80 0.0293 0.0293 0.0278 0.7234 0.7232 0.7225
24 80 0.0300 0.0300 0.0285 0.7683 0.7683 0.7679
36 80 0.0300 0.0300 0.0284 0.7833 0.7833 0.7822
48 80 0.0300 0.0300 0.0286 0.7895 0.7895 0.7890
60 80 0.0318 0.0318 0.0299 0.7950 0.7950 0.7943
I must confess that I performed most of my studies according to ‘Method C’ but all passed BE already in the interim. This might explain why I never received a deficiency letter from European assessors.
In the published methods
2,5–10 there is an inflation of letters (B, C, C/D, E, F, …) – Detlew called it ‘
letteritis’. To avoid ambiguities we prefer ‘Type 1’ for ones where an adjusted \(\small{\alpha}\) is used both in the interim and the final analysis (
e.g., ‘Method B’). In ‘Type 2’ \(\small{\alpha=0.05}\)
or – conditional on power – an adjusted \(\small{\alpha}\) is used in the interim and an adjusted \(\small{\alpha}\) in the final analysis (
e.g., ‘Method C’).
11,12 Note that in some of the methods
9,10 the adjusted \(\small{\alpha}\) in the interim and the final analysis is not the same.
- Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Statist Med. 2006; 25: 4279–92. doi:10.1002/sim.2673.
- Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2007; 7(4): 245—262.
- FDA (CDER). Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Guidance for Industry. Draft. Silver Spring. August 2021.
- Health Canada. Guidance Document. Conduct and Analysis of Comparative Bioavailability Studies. Ottawa. Adopted 2012/02/08, Revised 2023/01/30.
- Montague TH, Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ. Additional results for ‘Sequential design approaches for bioequivalence studies with crossover designs’. Pharm Stat. 2011; 11(1): 8–13. doi:10.1002/pst.483
- Fuglsang A. Sequential Bioequivalence Trial Designs with Increased Power and Controlled Type I Error Rates. AAPS J. 2013; 15(3): 659–61. doi:10.1208/s12248-013-9475-5
- Karalis V, Macheras P. An Insight into the Properties of a Two-Stage Design in Bioequivalence Studies. Pharm Res. 2013; 30(7): 1824–35. doi:10.1007/s11095-013-1026-3
- Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164
- Zheng Ch, Zhao L, Wang J. Modifications of sequential designs in bioequivalence trials. Pharm Stat. 2015; 14(3): 180–8. doi:10.1002/pst.1672
- Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr TH, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721
- Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2
- Molins E, Labes D, Schütz H, Cobo E, Ocaña J. An iterative method to protect the type I error rate in bioequivalence studies under two-stage adaptive 2×2 crossover designs. Biom J. 2021; 63(1): 122–33. doi:10.1002/bimj.201900388
- Possibly Mr. Murphy hit:
library(Power2Stage)
runs <- 25
comp <- data.frame(run = 1:runs, TIE = NA_real_)
for (j in 1:runs) {
ifelse (comp$run[j] == 1, setseed <- TRUE, setseed <- FALSE)
comp$TIE[j] <- round(power.tsd(method = "C", CV = .1, n1 = 60, theta0 = 1.25, setseed = setseed,
pmethod = "shifted")$pBE, 4) # one million sim’s by default
}
cat("Empiric Type I Error in", runs, "repeated simulations\n"); summary(comp$TIE)
Empiric Type I Error in 25 repeated simulations
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.04940 0.04980 0.04990 0.04993 0.05000 0.05040
- Comparison of empiric Type I Errors and Powers:
# Assess the operating characteristics in the grid of Potvin et al.:
# Use the defaults of the function: GMR = 0.95 and targetpower = 0.80.
# For power use 1E6 sim’s for comparison with the article (default are 1E5).
# Use theta0 = 1.25 for the Type I Error (1E6 sim’s by default).
# Use a fixed seed to support reproducibility (default setseed = TRUE) and by
# the shifted central t-distribution (instead of by the noncentral t-distribution,
# the function’s default).
n1 <- seq(12L, 60L, 12L)
CV <- seq(10L, 80L, 10L) / 100
x <- data.frame(n1 = n1, CV = rep(CV, each = length(n1)),
B.TIE = NA_real_, C.TIE = NA_real_, D.TIE = NA_real_,
B.pwr = NA_real_, C.pwr = NA_real_, D.pwr = NA_real_)
B <- C <- rep(0.0294, 2); D <- rep(0.0280, 2) # adjusted alphas
pm <- "shifted" # like in the paper
pbc <- 0L
pb <- txtProgressBar(style = 3)
for (j in 1L:nrow(x)) {
pbc <- pbc + 1L
CV.j <- x$CV[j]
n1.j <- x$n1[j]
x$B.TIE[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
pmethod = pm, theta0 = 1.25)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
x$C.TIE[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
pmethod = pm, theta0 = 1.25)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
x$D.TIE[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
pmethod = pm, theta0 = 1.25)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
x$B.pwr[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
pmethod = pm, nsims = 1E6)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
x$C.pwr[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
pmethod = pm, nsims = 1E6)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
x$D.pwr[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
pmethod = pm, nsims = 1E6)$pBE
setTxtProgressBar(pb, pbc / nrow(x))
}
close(pb)
x$CV <- x$CV * 100
x[, 3:8] <- round(x[, 3:8], 4)
x$CV <- sprintf("%i ", x$CV)
names(x)[2:8] <- c("CV(%)", sprintf("TIE(%s)", LETTERS[2:4]),
sprintf("Pwr(%s)", LETTERS[2:4]))
txt <- paste0("Estimated Type I Errors and Powers for Methods B, C, and D.\n",
"Shifted central t-distribution, one million simulations, ",
"fixed seeds.\n")
cat(txt); print(x, row.names = FALSE)