## Simulation-based 2×2×2 TSDs – a deep dive [Two-Stage / GS Designs]

Dear all,

some more stuff related to this post.

Burton et al.1 recommend specifying which random number generator is used and its starting seeds. Furthermore:

»A random number generator must be able to reproduce the identical set of random numbers when the same starting value, known as a seed, is specified. This is also essential when performing simulation studies to enable the generated data sets and hence results to be reproduced, if necessary, for monitoring purposes.«
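This is easily demonstrated in base R (a minimal sketch; the seed values are arbitrary):

```r
# Minimal base-R sketch: the same seed reproduces the identical stream
# of random numbers; a different seed (almost surely) does not.
set.seed(123456)
x <- rnorm(5)
set.seed(123456)    # same seed ...
y <- rnorm(5)       # ... identical draws
set.seed(654321)    # different seed ...
z <- rnorm(5)       # ... different draws
identical(x, y)     # TRUE
identical(x, z)     # FALSE
```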

Although I appreciate the article,2 it should have been published in the Journal of Irreproducible Results in parallel.

»A different randomly selected seed was used for each scenario.«

Inspect the plots below at 10⁶ simulations to assess the effect of random seeds. Granted, one can almost reproduce the results.

Empiric Type I Errors, shifted t-distribution, random seeds, 10⁶ simulations (part of Potvin et al.2 Table I)

$$\small{\begin{array}{cccc} & & \text{Method} &\\\hline CV\,20\% & \text{B} & \text{C} & \text{D}\\ n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline 12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0510} & \phantom{0}0.0499\phantom{0}\\ 24 & \phantom{0}0.0320\phantom{0} & 0.0490 & \phantom{0}0.0493\phantom{0}\\ 36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\ 48 & \phantom{0}0.0292\phantom{0} & 0.0495 & \phantom{0}0.0497\phantom{0}\\ 60 & \phantom{0}0.0297\phantom{0} & 0.0500 & \phantom{0}0.0500\phantom{0}\\\hline \end{array}}$$

Empiric Type I Errors, noncentral t-distribution, fixed seeds, 10⁶ simulations (R-package Power2Stage)

$$\small{\begin{array}{cccc} & & \text{Method} &\\\hline CV\,20\% & \text{B} & \text{C} & \text{D}\\ n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline 12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0511} & \phantom{0}0.0499\phantom{0}\\ 24 & \phantom{0}0.0315\phantom{0} & 0.0492 & \phantom{0}0.0492\phantom{0}\\ 36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\ 48 & \phantom{0}0.0293\phantom{0} & 0.0498 & \phantom{0}0.0498\phantom{0}\\ 60 & \phantom{0}0.0293\phantom{0} & 0.0496 & \phantom{0}0.0496\phantom{0}\\\hline \end{array}}$$

Regulators on this side of the pond were/are concerned about an inflated Type I Error in ‘Method C’ with a low $$\small{CV}$$ and a small sample size $$\small{n_1}$$ in the first stage. The authors reported an even larger – but irreproducibleA – inflation of the Type I Error of 0.0504 for $$\small{CV=10\%}$$ and 60 (‼) subjects in the first stage. Who on earth would design such a study?

Let’s dive into the matter.

Five simulations at every number of simulations. Filled circles = fixed seeds, empty circles = random seeds, magenta line = mean of empiric Type I Error, dashed curves = significance limits of the binomial test for $$\small{\alpha=0.05}$$ and the number of simulations. ≈2.43 billion simulations in total.
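The dashed significance limits can be computed from the quantiles of the binomial distribution; a minimal base-R sketch (the helper’s name is mine):

```r
# Two-sided significance limits of the binomial test for a true rate of
# alpha = 0.05, given the number of simulations (helper name is mine).
sig.limits <- function(nsims, alpha = 0.05, level = 0.95) {
  p <- c((1 - level) / 2, (1 + level) / 2)
  qbinom(p, size = nsims, prob = alpha) / nsims
}
sig.limits(1e6) # approx. 0.0496 and 0.0504
```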

We see that the Type I Error is controlled with 12 subjects in the first stage (≈0.0463 with both one million and 100 million simulations).

Although within the limit of a ‘negligible inflation’2 (0.052), the Type I Error is indeed significantly inflated (≈0.0511 with one million simulations and ≈0.0513 with 100 million).
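Whether an observed rate exceeds the nominal 0.05 significantly can be checked with a one-sided binomial test in base R – e.g., for ≈0.0511 in one million simulations:

```r
# 0.0511 empiric Type I Error in 1e6 simulations vs the nominal 0.05:
# one-sided binomial test of the 51,100 'significant' outcomes.
binom.test(x = 51100, n = 1e6, p = 0.05,
           alternative = "greater")$p.value # far below 0.05
```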

However, already with 16 subjects in the first stage the Type I Error is no longer inflated (≈0.0496 with one million simulations and ≈0.0495 with 100 million). The same holds for any larger first stage. Therefore, in practice there is no problem in applying ‘Method C’, which is why it is recommended by the FDA and Health Canada.3,4
Don’t be tempted to start with a small first stage. Unless the $$\small{CV}$$ is low, there will be a high chance of having to proceed to the second stage, which takes time. Instead, plan the first stage in such a way that there is a reasonably high chance of showing BE already in the interim. Rule of thumb: 80–90% of the sample size of a fixed-sample design will give a power of 60–70% in the first stage. Then the second stage can be understood as a kind of ‘safety net’ – something European assessors seemingly prefer.
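The rule of thumb can be sketched in base R via the common noncentral-t approximation of TOST power (exact methods, e.g., Owen’s Q as implemented in the package PowerTOST, differ only marginally; the helper function and the example numbers are mine):

```r
# Noncentral-t approximation of the power of TOST in a 2x2 crossover
# (log-scale, BE limits 0.80-1.25); helper name and numbers are mine.
power.2x2 <- function(CV, n, theta0 = 0.95, alpha = 0.05) {
  se   <- sqrt(2 * log(1 + CV^2) / n) # SE of the difference of means
  df   <- n - 2
  tval <- qt(1 - alpha, df)
  ncp1 <- (log(theta0) - log(0.80)) / se
  ncp2 <- (log(theta0) - log(1.25)) / se
  max(0, pt(-tval, df, ncp2) - pt(tval, df, ncp1))
}
n  <- 28                        # fixed-sample size for CV = 25% (power ~0.81)
n1 <- 2 * ceiling(0.85 * n / 2) # ~85% of n, rounded up to an even number
power.2x2(CV = 0.25, n = n)     # power of the fixed-sample design
power.2x2(CV = 0.25, n = n1)    # power of the first stage alone
```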

It is a mystery to me why the authors recommended ‘Method C’, whereas no inflated Type I Error was observed in ‘Method D’ – with its slightly larger adjustment ($$\small{\alpha=0.0280}$$) – in any of the $$\small{CV}$$-$$\small{n_1}$$ combinations.

No significant inflation of the Type I Error (≈0.0500 with one million simulations and ≈0.0499 with 100 million).

We could have avoided the endless and fruitless discussions if they had thrown ‘Method C’ in the garbage. Furthermore, the loss in power of ‘Method D’ compared to ‘Method C’ is negligible.B

Estimated Type I Errors and Powers for Methods B, C, and D.
Shifted central t-distribution, one million simulations, fixed seeds.
n1 CV(%) TIE(B) TIE(C) TIE(D) Pwr(B) Pwr(C) Pwr(D)
12  10   0.0294 0.0499 0.0499 0.9772 0.9890 0.9890
24  10   0.0292 0.0496 0.0496 0.9999 1.0000 1.0000
36  10   0.0293 0.0499 0.0499 1.0000 1.0000 1.0000
48  10   0.0293 0.0498 0.0498 1.0000 1.0000 1.0000
60  10   0.0293 0.0496 0.0496 1.0000 1.0000 1.0000
12  20   0.0464 0.0512 0.0499 0.8442 0.8486 0.8478
24  20   0.0316 0.0492 0.0491 0.8816 0.9097 0.9098
36  20   0.0294 0.0499 0.0499 0.9553 0.9752 0.9752
48  20   0.0293 0.0498 0.0498 0.9887 0.9946 0.9946
60  20   0.0293 0.0496 0.0496 0.9974 0.9989 0.9989
12  30   0.0436 0.0440 0.0416 0.7868 0.7872 0.7850
24  30   0.0475 0.0490 0.0472 0.8309 0.8320 0.8315
36  30   0.0397 0.0476 0.0467 0.8385 0.8483 0.8481
48  30   0.0321 0.0493 0.0492 0.8550 0.8873 0.8873
60  30   0.0295 0.0496 0.0496 0.8999 0.9363 0.9363
12  40   0.0343 0.0342 0.0326 0.7513 0.7513 0.7499
24  40   0.0430 0.0431 0.0407 0.8037 0.8033 0.8026
36  40   0.0486 0.0488 0.0466 0.8238 0.8244 0.8229
48  40   0.0457 0.0470 0.0453 0.8297 0.8308 0.8302
60  40   0.0406 0.0464 0.0455 0.8310 0.8369 0.8368
12  50   0.0313 0.0310 0.0296 0.7360 0.7363 0.7352
24  50   0.0336 0.0335 0.0319 0.7830 0.7830 0.7822
36  50   0.0418 0.0417 0.0392 0.8054 0.8046 0.8042
48  50   0.0483 0.0480 0.0458 0.8190 0.8193 0.8179
60  50   0.0478 0.0481 0.0461 0.8257 0.8253 0.8251
12  60   0.0299 0.0297 0.0284 0.7288 0.7295 0.7278
24  60   0.0309 0.0310 0.0294 0.7749 0.7744 0.7733
36  60   0.0332 0.0332 0.0312 0.7910 0.7908 0.7904
48  60   0.0399 0.0398 0.0372 0.8048 0.8045 0.8031
60  60   0.0466 0.0468 0.0442 0.8156 0.8153 0.8146
12  70   0.0293 0.0294 0.0280 0.7253 0.7254 0.7248
24  70   0.0305 0.0305 0.0287 0.7707 0.7707 0.7702
36  70   0.0305 0.0303 0.0290 0.7850 0.7850 0.7845
48  70   0.0325 0.0325 0.0306 0.7933 0.7933 0.7933
60  70   0.0378 0.0380 0.0356 0.8026 0.8027 0.8017
12  80   0.0293 0.0293 0.0278 0.7234 0.7232 0.7225
24  80   0.0300 0.0300 0.0285 0.7683 0.7683 0.7679
36  80   0.0300 0.0300 0.0284 0.7833 0.7833 0.7822
48  80   0.0300 0.0300 0.0286 0.7895 0.7895 0.7890
60  80   0.0318 0.0318 0.0299 0.7950 0.7950 0.7943

I must confess that I performed most of my studies according to ‘Method C’ but all passed BE already in the interim. This might explain why I never received a deficiency letter from European assessors.

In the published methods2,5–10 there is an inflation of letters (B, C, C/D, E, F, …) – Detlew called it ‘letteritis’. To avoid ambiguities we prefer ‘Type 1’ for designs where an adjusted $$\small{\alpha}$$ is used both in the interim and in the final analysis (e.g., ‘Method B’). In ‘Type 2’ $$\small{\alpha=0.05}$$ or – conditional on power – an adjusted $$\small{\alpha}$$ is used in the interim, and an adjusted $$\small{\alpha}$$ in the final analysis (e.g., ‘Method C’).11,12 Note that in some of the methods9,10 the adjusted $$\small{\alpha}$$ in the interim and in the final analysis is not the same.

1. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Statist Med. 2006; 25: 4279–92. doi:10.1002/sim.2673.
2. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294
3. FDA (CDER). Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Guidance for Industry. Draft. Silver Spring. August 2021.
4. Health Canada. Guidance Document. Conduct and Analysis of Comparative Bioavailability Studies. Ottawa. Adopted 2012/02/08, Revised 2023/01/30.
5. Montague TH, Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ. Additional results for ‘Sequential design approaches for bio­equivalence studies with crossover designs’. Pharm Stat. 2011; 11(1): 8–13. doi:10.1002/pst.483
6. Fuglsang A. Sequential Bioequivalence Trial Designs with Increased Power and Controlled Type I Error Rates. AAPS J. 2013; 15(3): 659–61. doi:10.1208/s12248-013-9475-5
7. Karalis V, Macheras P. An Insight into the Properties of a Two-Stage Design in Bioequivalence Studies. Pharm Res. 2013; 30(7): 1824–35. doi:10.1007/s11095-013-1026-3
8. Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164
9. Zheng Ch, Zhao L, Wang J. Modifications of sequential designs in bioequivalence trials. Pharm Stat. 2015; 14(3): 180–8. doi:10.1002/pst.1672
10. Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr TH, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721
11. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2
12. Molins E, Labes D, Schütz H, Cobo E, Ocaña J. An iterative method to protect the type I error rate in bioequivalence studies under two-stage adaptive 2×2 crossover designs. Biom J. 2021; 63(1): 122–33. doi:10.1002/bimj.201900388

A. Possibly Mr. Murphy hit:
library(Power2Stage)
runs <- 25
comp <- data.frame(run = 1:runs, TIE = NA_real_)
for (j in 1:runs) {
  setseed     <- comp$run[j] == 1 # fixed seed in the first run, random ones later
  comp$TIE[j] <- round(power.tsd(method = "C", CV = 0.1, n1 = 60, theta0 = 1.25,
                                 setseed = setseed,
                                 pmethod = "shifted")$pBE, 4) # one million sim’s by default
}
cat("Empiric Type I Error in", runs, "repeated simulations\n")
summary(comp$TIE)

Empiric Type I Error in 25 repeated simulations
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.04940 0.04980 0.04990 0.04993 0.05000 0.05040

B. Comparison of empiric Type I Errors and Powers:
# Assess the operating characteristics in the grid of Potvin et al.:
# Use the defaults of the function: GMR = 0.95 and targetpower = 0.80.
# For power use 1E6 sim’s for comparison with the article (default is 1E5).
# Use theta0 = 1.25 for the Type I Error (1E6 sim’s by default).
# Use a fixed seed to support reproducibility (default setseed = TRUE) and
# simulate by the shifted central t-distribution (instead of the noncentral
# t-distribution, the function’s default).

library(Power2Stage)
n1  <- seq(12L, 60L, 12L)
CV  <- seq(10L, 80L, 10L) / 100
x   <- data.frame(n1 = n1, CV = rep(CV, each = length(n1)),
                  B.TIE = NA_real_, C.TIE = NA_real_, D.TIE = NA_real_,
                  B.pwr = NA_real_, C.pwr = NA_real_, D.pwr = NA_real_)
B   <- C <- rep(0.0294, 2); D <- rep(0.0280, 2) # adjusted alphas
pm  <- "shifted"                                # like in the paper
pb  <- txtProgressBar(style = 3)
for (j in 1L:nrow(x)) {
  CV.j <- x$CV[j]
  n1.j <- x$n1[j]
  # 'Method D' is 'Method C' with the larger adjustment
  x$B.TIE[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$C.TIE[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$D.TIE[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$B.pwr[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  x$C.pwr[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  x$D.pwr[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  setTxtProgressBar(pb, j / nrow(x))
}
close(pb)
x[, 3:8]      <- round(x[, 3:8], 4)
x$CV          <- sprintf("%i ", as.integer(x$CV * 100)) # %i needs an integer
names(x)[2:8] <- c("CV(%)", sprintf("TIE(%s)", LETTERS[2:4]),
                   sprintf("Pwr(%s)", LETTERS[2:4]))
txt <- paste0("Estimated Type I Errors and Powers for Methods B, C, and D.\n",
              "Shifted central t-distribution, one million simulations, ",
              "fixed seeds.\n")
cat(txt); print(x, row.names = FALSE)

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
