## Simulation-based 2×2×2 TSDs – a deep dive [Two-Stage / GS Designs]

Dear all,

some more stuff related to this post.

Burton et al.1 recommend specifying which random number generator is used and its starting seeds. Furthermore:

»A random number generator must be able to reproduce the identical set of random numbers when the same starting value, known as a seed, is specified. This is also essential when performing simulation studies to enable the generated data sets and hence results to be reproduced, if necessary, for monitoring purposes.«
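This is easily demonstrated in base R (a minimal sketch; the seed values are arbitrary):

```r
# Minimal base-R sketch: the same seed reproduces the identical stream
# of random numbers; a different seed (almost surely) does not.
set.seed(123456)
x <- rnorm(5)
set.seed(123456)    # same seed ...
y <- rnorm(5)       # ... identical draws
set.seed(654321)    # different seed ...
z <- rnorm(5)       # ... different draws
identical(x, y)     # TRUE
identical(x, z)     # FALSE
```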

Although I appreciate the article,2 it should have been published in the Journal of Irreproducible Results in parallel.

»A different randomly selected seed was used for each scenario.«

Inspect the plots below at 10⁶ simulations to assess the effect of random seeds. Granted, one can almost reproduce the results.

Empiric Type I Errors, shifted t-distribution, random seeds, 10⁶ simulations (part of Potvin et al.2 Table I)

$$\small{\begin{array}{cccc} & & \text{Method} &\\\hline CV\,20\% & \text{B} & \text{C} & \text{D}\\ n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline 12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0510} & \phantom{0}0.0499\phantom{0}\\ 24 & \phantom{0}0.0320\phantom{0} & 0.0490 & \phantom{0}0.0493\phantom{0}\\ 36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\ 48 & \phantom{0}0.0292\phantom{0} & 0.0495 & \phantom{0}0.0497\phantom{0}\\ 60 & \phantom{0}0.0297\phantom{0} & 0.0500 & \phantom{0}0.0500\phantom{0}\\\hline \end{array}}$$

Empiric Type I Errors, noncentral t-distribution, fixed seeds, 10⁶ simulations (R-package Power2Stage)

$$\small{\begin{array}{cccc} & & \text{Method} &\\\hline CV\,20\% & \text{B} & \text{C} & \text{D}\\ n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline 12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0511} & \phantom{0}0.0499\phantom{0}\\ 24 & \phantom{0}0.0315\phantom{0} & 0.0492 & \phantom{0}0.0492\phantom{0}\\ 36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\ 48 & \phantom{0}0.0293\phantom{0} & 0.0498 & \phantom{0}0.0498\phantom{0}\\ 60 & \phantom{0}0.0293\phantom{0} & 0.0496 & \phantom{0}0.0496\phantom{0}\\\hline \end{array}}$$

Regulators on this side of the pond were/are concerned about an inflated Type I Error in ‘Method C’ with a low $$\small{CV}$$ and a small sample size $$\small{n_1}$$ in the first stage. The authors reported an even larger – but irreproducibleA – inflation of the Type I Error of 0.0504 for $$\small{CV=10\%}$$ and 60 (‼) subjects in the first stage. Who on earth would design such a study?

Let’s dive into the matter.

Five simulations at every number of simulations. Filled circles = fixed seeds, empty circles = random seeds, magenta line = mean of empiric Type I Error, dashed curves = significance limits of the binomial test for $$\small{\alpha=0.05}$$ and the number of simulations. ≈2.43 billion simulations in total.
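The dashed significance limits can be computed from the quantiles of the binomial distribution; a minimal base-R sketch (the helper’s name is mine):

```r
# Two-sided significance limits of the binomial test for a true rate of
# alpha = 0.05, given the number of simulations (helper name is mine).
sig.limits <- function(nsims, alpha = 0.05, level = 0.95) {
  p <- c((1 - level) / 2, (1 + level) / 2)
  qbinom(p, size = nsims, prob = alpha) / nsims
}
sig.limits(1e6) # approx. 0.0496 and 0.0504
```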

We see that the Type I Error is controlled with 12 subjects in the first stage (≈0.0463 with both one million and 100 million simulations).

Although within the limit of a ‘negligible inflation’2 (0.052), the Type I Error is indeed significantly inflated (≈0.0511 with one million simulations and ≈0.0513 with 100 million).
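Whether an observed rate exceeds the nominal 0.05 significantly can be checked with a one-sided binomial test in base R – e.g., for ≈0.0511 in one million simulations:

```r
# 0.0511 empiric Type I Error in 1e6 simulations vs the nominal 0.05:
# one-sided binomial test of the 51,100 'significant' outcomes.
binom.test(x = 51100, n = 1e6, p = 0.05,
           alternative = "greater")$p.value # far below 0.05
```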

However, already with 16 subjects in the first stage the Type I Error is no longer inflated (≈0.0496 with one million simulations and ≈0.0495 with 100 million). The same holds for any larger first stage. Therefore, in practice there is no problem in applying ‘Method C’, which is why it is recommended by the FDA and Health Canada.3,4
Don’t be tempted to start with a small first stage. Unless the $$\small{CV}$$ is low, there will be a high chance of having to proceed to the second stage, which takes time. Instead, plan the first stage in such a way that there is a reasonably high chance of showing BE already in the interim. Rule of thumb: 80–90% of the sample size of a fixed-sample design will give a power of 60–70% in the first stage. Then the second stage can be understood as a kind of ‘safety net’ – something European assessors seemingly prefer.
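The rule of thumb can be sketched in base R via the common noncentral-t approximation of TOST power (exact methods, e.g., Owen’s Q as implemented in the package PowerTOST, differ only marginally; the helper function and the example numbers are mine):

```r
# Noncentral-t approximation of the power of TOST in a 2x2 crossover
# (log-scale, BE limits 0.80-1.25); helper name and numbers are mine.
power.2x2 <- function(CV, n, theta0 = 0.95, alpha = 0.05) {
  se   <- sqrt(2 * log(1 + CV^2) / n) # SE of the difference of means
  df   <- n - 2
  tval <- qt(1 - alpha, df)
  ncp1 <- (log(theta0) - log(0.80)) / se
  ncp2 <- (log(theta0) - log(1.25)) / se
  max(0, pt(-tval, df, ncp2) - pt(tval, df, ncp1))
}
n  <- 28                        # fixed-sample size for CV = 25% (power ~0.81)
n1 <- 2 * ceiling(0.85 * n / 2) # ~85% of n, rounded up to an even number
power.2x2(CV = 0.25, n = n)     # power of the fixed-sample design
power.2x2(CV = 0.25, n = n1)    # power of the first stage alone
```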

It is a mystery to me why the authors recommended ‘Method C’, whereas no inflated Type I Error was observed in ‘Method D’ – with its slightly larger adjustment ($$\small{\alpha=0.0280}$$) – in any of the $$\small{CV}$$-$$\small{n_1}$$ combinations.

No significant inflation of the Type I Error (≈0.0500 with one million simulations and ≈0.0499 with 100 million).

We could have avoided the endless and fruitless discussions if they had thrown ‘Method C’ in the garbage. Furthermore, the loss in power of ‘Method D’ compared to ‘Method C’ is negligible.B

Estimated Type I Errors and Powers for Methods B, C, and D.
Shifted central t-distribution, one million simulations, fixed seeds.
n1 CV(%) TIE(B) TIE(C) TIE(D) Pwr(B) Pwr(C) Pwr(D)
12  10   0.0294 0.0499 0.0499 0.9772 0.9890 0.9890
24  10   0.0292 0.0496 0.0496 0.9999 1.0000 1.0000
36  10   0.0293 0.0499 0.0499 1.0000 1.0000 1.0000
48  10   0.0293 0.0498 0.0498 1.0000 1.0000 1.0000
60  10   0.0293 0.0496 0.0496 1.0000 1.0000 1.0000
12  20   0.0464 0.0512 0.0499 0.8442 0.8486 0.8478
24  20   0.0316 0.0492 0.0491 0.8816 0.9097 0.9098
36  20   0.0294 0.0499 0.0499 0.9553 0.9752 0.9752
48  20   0.0293 0.0498 0.0498 0.9887 0.9946 0.9946
60  20   0.0293 0.0496 0.0496 0.9974 0.9989 0.9989
12  30   0.0436 0.0440 0.0416 0.7868 0.7872 0.7850
24  30   0.0475 0.0490 0.0472 0.8309 0.8320 0.8315
36  30   0.0397 0.0476 0.0467 0.8385 0.8483 0.8481
48  30   0.0321 0.0493 0.0492 0.8550 0.8873 0.8873
60  30   0.0295 0.0496 0.0496 0.8999 0.9363 0.9363
12  40   0.0343 0.0342 0.0326 0.7513 0.7513 0.7499
24  40   0.0430 0.0431 0.0407 0.8037 0.8033 0.8026
36  40   0.0486 0.0488 0.0466 0.8238 0.8244 0.8229
48  40   0.0457 0.0470 0.0453 0.8297 0.8308 0.8302
60  40   0.0406 0.0464 0.0455 0.8310 0.8369 0.8368
12  50   0.0313 0.0310 0.0296 0.7360 0.7363 0.7352
24  50   0.0336 0.0335 0.0319 0.7830 0.7830 0.7822
36  50   0.0418 0.0417 0.0392 0.8054 0.8046 0.8042
48  50   0.0483 0.0480 0.0458 0.8190 0.8193 0.8179
60  50   0.0478 0.0481 0.0461 0.8257 0.8253 0.8251
12  60   0.0299 0.0297 0.0284 0.7288 0.7295 0.7278
24  60   0.0309 0.0310 0.0294 0.7749 0.7744 0.7733
36  60   0.0332 0.0332 0.0312 0.7910 0.7908 0.7904
48  60   0.0399 0.0398 0.0372 0.8048 0.8045 0.8031
60  60   0.0466 0.0468 0.0442 0.8156 0.8153 0.8146
12  70   0.0293 0.0294 0.0280 0.7253 0.7254 0.7248
24  70   0.0305 0.0305 0.0287 0.7707 0.7707 0.7702
36  70   0.0305 0.0303 0.0290 0.7850 0.7850 0.7845
48  70   0.0325 0.0325 0.0306 0.7933 0.7933 0.7933
60  70   0.0378 0.0380 0.0356 0.8026 0.8027 0.8017
12  80   0.0293 0.0293 0.0278 0.7234 0.7232 0.7225
24  80   0.0300 0.0300 0.0285 0.7683 0.7683 0.7679
36  80   0.0300 0.0300 0.0284 0.7833 0.7833 0.7822
48  80   0.0300 0.0300 0.0286 0.7895 0.7895 0.7890
60  80   0.0318 0.0318 0.0299 0.7950 0.7950 0.7943

I must confess that I performed most of my studies according to ‘Method C’ but all passed BE already in the interim. This might explain why I never received a deficiency letter from European assessors.

In the published methods2,5–10 there is an inflation of letters (B, C, C/D, E, F, …) – Detlew called it ‘letteritis’. To avoid ambiguities we prefer ‘Type 1’ for designs where an adjusted $$\small{\alpha}$$ is used both in the interim and in the final analysis (e.g., ‘Method B’). In ‘Type 2’ $$\small{\alpha=0.05}$$ or – conditional on power – an adjusted $$\small{\alpha}$$ is used in the interim, and an adjusted $$\small{\alpha}$$ in the final analysis (e.g., ‘Method C’).11,12 Note that in some of the methods9,10 the adjusted $$\small{\alpha}$$ in the interim and in the final analysis is not the same.

1. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Statist Med. 2006; 25: 4279–92. doi:10.1002/sim.2673.
2. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294
3. FDA (CDER). Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Guidance for Industry. Draft. Silver Spring. August 2021.
4. Health Canada. Guidance Document. Conduct and Analysis of Comparative Bioavailability Studies. Ottawa. Adopted 2012/02/08, Revised 2023/01/30.
5. Montague TH, Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ. Additional results for ‘Sequential design approaches for bio­equivalence studies with crossover designs’. Pharm Stat. 2011; 11(1): 8–13. doi:10.1002/pst.483
6. Fuglsang A. Sequential Bioequivalence Trial Designs with Increased Power and Controlled Type I Error Rates. AAPS J. 2013; 15(3): 659–61. doi:10.1208/s12248-013-9475-5
7. Karalis V, Macheras P. An Insight into the Properties of a Two-Stage Design in Bioequivalence Studies. Pharm Res. 2013; 30(7): 1824–35. doi:10.1007/s11095-013-1026-3
8. Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164
9. Zheng Ch, Zhao L, Wang J. Modifications of sequential designs in bioequivalence trials. Pharm Stat. 2015; 14(3): 180–8. doi:10.1002/pst.1672
10. Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr TH, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721
11. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2
12. Molins E, Labes D, Schütz H, Cobo E, Ocaña J. An iterative method to protect the type I error rate in bioequivalence studies under two-stage adaptive 2×2 crossover designs. Biom J. 2021; 63(1): 122–33. doi:10.1002/bimj.201900388

A. Possibly Mr. Murphy hit:
library(Power2Stage)
runs <- 25
comp <- data.frame(run = 1:runs, TIE = NA_real_)
for (j in 1:runs) {
  setseed     <- comp$run[j] == 1 # fixed seed in the first run, random ones later
  comp$TIE[j] <- round(power.tsd(method = "C", CV = 0.1, n1 = 60, theta0 = 1.25,
                                 setseed = setseed,
                                 pmethod = "shifted")$pBE, 4) # one million sim’s by default
}
cat("Empiric Type I Error in", runs, "repeated simulations\n")
summary(comp$TIE)

Empiric Type I Error in 25 repeated simulations
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.04940 0.04980 0.04990 0.04993 0.05000 0.05040

B. Comparison of empiric Type I Errors and Powers:
# Assess the operating characteristics in the grid of Potvin et al.:
# Use the defaults of the function: GMR = 0.95 and targetpower = 0.80.
# For power use 1E6 sim’s for comparison with the article (default is 1E5).
# Use theta0 = 1.25 for the Type I Error (1E6 sim’s by default).
# Use a fixed seed to support reproducibility (default setseed = TRUE) and
# simulate by the shifted central t-distribution (instead of the noncentral
# t-distribution, the function’s default).

library(Power2Stage)
n1  <- seq(12L, 60L, 12L)
CV  <- seq(10L, 80L, 10L) / 100
x   <- data.frame(n1 = n1, CV = rep(CV, each = length(n1)),
                  B.TIE = NA_real_, C.TIE = NA_real_, D.TIE = NA_real_,
                  B.pwr = NA_real_, C.pwr = NA_real_, D.pwr = NA_real_)
B   <- C <- rep(0.0294, 2); D <- rep(0.0280, 2) # adjusted alphas
pm  <- "shifted"                                # like in the paper
pb  <- txtProgressBar(style = 3)
for (j in 1L:nrow(x)) {
  CV.j <- x$CV[j]
  n1.j <- x$n1[j]
  # 'Method D' is 'Method C' with the larger adjustment
  x$B.TIE[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$C.TIE[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$D.TIE[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
                          pmethod = pm, theta0 = 1.25)$pBE
  x$B.pwr[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  x$C.pwr[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  x$D.pwr[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
                          pmethod = pm, nsims = 1E6)$pBE
  setTxtProgressBar(pb, j / nrow(x))
}
close(pb)
x[, 3:8]      <- round(x[, 3:8], 4)
x$CV          <- sprintf("%i ", as.integer(x$CV * 100)) # %i needs an integer
names(x)[2:8] <- c("CV(%)", sprintf("TIE(%s)", LETTERS[2:4]),
                   sprintf("Pwr(%s)", LETTERS[2:4]))
txt <- paste0("Estimated Type I Errors and Powers for Methods B, C, and D.\n",
              "Shifted central t-distribution, one million simulations, ",
              "fixed seeds.\n")
cat(txt); print(x, row.names = FALSE)

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
