Helmut
Vienna, Austria
2024-09-03 15:18
Posting: # 24179

 Simulation-based 2×2×2 TSDs – a deep dive [Two-Stage / GS Designs]

Dear all,

some more stuff related to this post.

Burton et al.1 recommend that the random number generator and the starting seeds be specified. Furthermore:

»A random number generator must be able to reproduce the identical set of random numbers when the same starting value, known as a seed, is specified. This is also essential when performing simulation studies to enable the generated data sets and hence results to be reproduced, if necessary, for monitoring purposes.«
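In R this boils down to re-initializing the generator: the same seed reproduces the identical stream of pseudo-random numbers (a minimal sketch; 123 is an arbitrary example seed).

```r
# Reproducibility of a pseudo-random number generator: re-seeding with the
# same value yields the identical sequence of random numbers
set.seed(123)      # arbitrary example seed
x <- rnorm(5)
set.seed(123)      # re-initialize with the same seed
y <- rnorm(5)
identical(x, y)    # TRUE: the two streams are identical
```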


Although I appreciate the article,2 it should have been published in the Journal of Irreproducible Results in parallel.

»A different randomly selected seed was used for each scenario.«

Inspect the plots below at 10⁶ simulations to assess the effect of random seeds. Granted, one can almost reproduce the results.

Empiric Type I Errors, shifted t-distribution, random seeds, 10⁶ simulations (part of Potvin et al.2 Table I)$$\small{\begin{array}{cccc}
& & \text{Method} &\\\hline
CV\,20\% & \text{B} & \text{C} & \text{D}\\
n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline
12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0510} & \phantom{0}0.0499\phantom{0}\\
24 & \phantom{0}0.0320\phantom{0} & 0.0490 & \phantom{0}0.0493\phantom{0}\\
36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\
48 & \phantom{0}0.0292\phantom{0} & 0.0495 & \phantom{0}0.0497\phantom{0}\\
60 & \phantom{0}0.0297\phantom{0} & 0.0500 & \phantom{0}0.0500\phantom{0}\\\hline
\end{array}}$$Empiric Type I Errors, noncentral t-distribution, fixed seeds, 10⁶ simulations (R package Power2Stage)$$\small{\begin{array}{cccc}
& & \text{Method} &\\\hline
CV\,20\% & \text{B} & \text{C} & \text{D}\\
n_1 & \alpha=0.0294 & \alpha=0.0294 & \alpha=0.0280\\\hline
12 & \phantom{0}0.0463\phantom{0} & \color{Red}{0.0511} & \phantom{0}0.0499\phantom{0}\\
24 & \phantom{0}0.0315\phantom{0} & 0.0492 & \phantom{0}0.0492\phantom{0}\\
36 & \phantom{0}0.0294\phantom{0} & 0.0499 & \phantom{0}0.0499\phantom{0}\\
48 & \phantom{0}0.0293\phantom{0} & 0.0498 & \phantom{0}0.0498\phantom{0}\\
60 & \phantom{0}0.0293\phantom{0} & 0.0496 & \phantom{0}0.0496\phantom{0}\\\hline
\end{array}}$$ Regulators on this side of the pond were/are concerned about an inflated Type I Error in ‘Method C’ with a low \(\small{CV}\) and a small sample size \(\small{n_1}\) in the first stage. The authors reported an even larger – but irreproducibleᴬ – inflation of the Type I Error of 0.0504 for \(\small{CV=10\%}\) and 60 (‼) subjects in the first stage. Who on earth would design such a study?

Let’s dive into the matter.

[image]
Five simulations at every number of simulations. Filled circles = fixed seeds, empty
circles = random seeds, magenta line = mean of empiric Type I Error, dashed curves =
significance limits of the binomial test for \(\small{\alpha=0.05}\) and number of simulations.
≈2.43 billion simulations in total.
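The dashed significance limits can be reproduced from the quantiles of the binomial distribution – a minimal sketch in base R:

```r
# Two-sided significance limits of the binomial test for a true alpha = 0.05:
# empiric Type I Errors outside these limits differ significantly from 0.05
nsims  <- c(1e4, 1e5, 1e6)
limits <- sapply(nsims, function(n)
                 qbinom(c(0.025, 0.975), size = n, prob = 0.05) / n)
dimnames(limits) <- list(c("lower", "upper"),
                         format(nsims, scientific = FALSE))
limits   # the limits tighten with an increasing number of simulations
```

With one million simulations the limits are roughly 0.0496–0.0504, i.e., narrow enough to detect even a small inflation.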

We see that the Type I Error is controlled with 12 subjects in the first stage (≈0.0463 with both one million and 100 million simulations).

[image]

Although within the limit of a ‘negligible inflation’2 (0.052), the Type I Error is indeed significantly inflated (≈0.0511 with one million simulations and ≈0.0513 with 100 million).
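The significance of the inflation can be confirmed with a one-sided binomial test (a quick sketch for the ≈0.0511 observed in one million simulations):

```r
# One-sided binomial test: is an observed empiric TIE of 0.0511 in one
# million simulations compatible with a true Type I Error of 0.05?
observed <- round(0.0511 * 1e6)   # 51,100 'significant' runs
bt <- binom.test(observed, n = 1e6, p = 0.05, alternative = "greater")
bt$p.value                        # far below 0.05: the inflation is significant
```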

[image]

However, already with 16 subjects in the first stage the Type I Error is no longer inflated (≈0.0496 with one million simulations and ≈0.0495 with 100 million). The same holds for any larger first stage. Therefore, in practice there is no problem in applying ‘Method C’, which is why it is recommended by the FDA and Health Canada.3,4
Don’t be tempted to start with a small first stage. Unless the \(\small{CV}\) is low, there is a high chance of having to proceed to the second stage, which takes time. Instead, plan the first stage in such a way that there is a reasonably high chance of showing BE already in the interim. Rule of thumb: 80–90% of a fixed-sample design’s size gives a power of 60–70% in the first stage. Then the second stage can be understood as a kind of ‘safety net’ – something European assessors seemingly prefer.
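The rule of thumb can be checked with an approximation of TOST power in base R. The helper `power.approx()` below is a hypothetical sketch based on the noncentral-t approximation; exact values (via Owen’s Q, as implemented in the PowerTOST package) differ slightly.

```r
# Approximate power of TOST in a 2x2x2 crossover via the noncentral
# t-distribution (a sketch; exact calculation uses Owen's Q)
power.approx <- function(alpha = 0.05, CV, n, theta0 = 0.95,
                         theta1 = 0.80, theta2 = 1.25) {
  sd   <- sqrt(log(CV^2 + 1))      # within-subject SD on the log scale
  se   <- sd * sqrt(2 / n)         # SE of the difference of LSMeans
  df   <- n - 2
  tval <- qt(1 - alpha, df)
  nc1  <- (log(theta0) - log(theta1)) / se
  nc2  <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tval, df, ncp = nc2) - pt(tval, df, ncp = nc1))
}
# A fixed-sample design for CV = 20%, GMR = 0.95 needs n = 20 for >=80% power;
# starting a TSD with ~80% of that gives roughly 70% interim power
power.approx(CV = 0.20, n = 20)   # roughly 0.83
power.approx(CV = 0.20, n = 16)   # roughly 0.73
```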

It is a mystery to me why the authors recommended ‘Method C’, while no inflated Type I Error has been observed in ‘Method D’ with slightly more adjustment (\(\small{\alpha=0.0280}\)) in any of the \(\small{CV}\)-\(\small{n_1}\) combinations.

[image]

No significant inflation of the Type I Error (≈0.0500 with one million simulations and ≈0.0499 with 100 million).

We could have avoided the endless and fruitless discussions if they had thrown ‘Method C’ in the garbage. Further, the loss in power of ‘Method D’ compared to ‘Method C’ is negligible.ᴮ

Estimated Type I Errors and Powers for Methods B, C, and D.
Shifted central t-distribution, one million simulations, fixed seeds.
 n1 CV(%) TIE(B) TIE(C) TIE(D) Pwr(B) Pwr(C) Pwr(D)
 12  10   0.0294 0.0499 0.0499 0.9772 0.9890 0.9890
 24  10   0.0292 0.0496 0.0496 0.9999 1.0000 1.0000
 36  10   0.0293 0.0499 0.0499 1.0000 1.0000 1.0000
 48  10   0.0293 0.0498 0.0498 1.0000 1.0000 1.0000
 60  10   0.0293 0.0496 0.0496 1.0000 1.0000 1.0000
 12  20   0.0464 0.0512 0.0499 0.8442 0.8486 0.8478
 24  20   0.0316 0.0492 0.0491 0.8816 0.9097 0.9098
 36  20   0.0294 0.0499 0.0499 0.9553 0.9752 0.9752
 48  20   0.0293 0.0498 0.0498 0.9887 0.9946 0.9946
 60  20   0.0293 0.0496 0.0496 0.9974 0.9989 0.9989
 12  30   0.0436 0.0440 0.0416 0.7868 0.7872 0.7850
 24  30   0.0475 0.0490 0.0472 0.8309 0.8320 0.8315
 36  30   0.0397 0.0476 0.0467 0.8385 0.8483 0.8481
 48  30   0.0321 0.0493 0.0492 0.8550 0.8873 0.8873
 60  30   0.0295 0.0496 0.0496 0.8999 0.9363 0.9363
 12  40   0.0343 0.0342 0.0326 0.7513 0.7513 0.7499
 24  40   0.0430 0.0431 0.0407 0.8037 0.8033 0.8026
 36  40   0.0486 0.0488 0.0466 0.8238 0.8244 0.8229
 48  40   0.0457 0.0470 0.0453 0.8297 0.8308 0.8302
 60  40   0.0406 0.0464 0.0455 0.8310 0.8369 0.8368
 12  50   0.0313 0.0310 0.0296 0.7360 0.7363 0.7352
 24  50   0.0336 0.0335 0.0319 0.7830 0.7830 0.7822
 36  50   0.0418 0.0417 0.0392 0.8054 0.8046 0.8042
 48  50   0.0483 0.0480 0.0458 0.8190 0.8193 0.8179
 60  50   0.0478 0.0481 0.0461 0.8257 0.8253 0.8251
 12  60   0.0299 0.0297 0.0284 0.7288 0.7295 0.7278
 24  60   0.0309 0.0310 0.0294 0.7749 0.7744 0.7733
 36  60   0.0332 0.0332 0.0312 0.7910 0.7908 0.7904
 48  60   0.0399 0.0398 0.0372 0.8048 0.8045 0.8031
 60  60   0.0466 0.0468 0.0442 0.8156 0.8153 0.8146
 12  70   0.0293 0.0294 0.0280 0.7253 0.7254 0.7248
 24  70   0.0305 0.0305 0.0287 0.7707 0.7707 0.7702
 36  70   0.0305 0.0303 0.0290 0.7850 0.7850 0.7845
 48  70   0.0325 0.0325 0.0306 0.7933 0.7933 0.7933
 60  70   0.0378 0.0380 0.0356 0.8026 0.8027 0.8017
 12  80   0.0293 0.0293 0.0278 0.7234 0.7232 0.7225
 24  80   0.0300 0.0300 0.0285 0.7683 0.7683 0.7679
 36  80   0.0300 0.0300 0.0284 0.7833 0.7833 0.7822
 48  80   0.0300 0.0300 0.0286 0.7895 0.7895 0.7890
 60  80   0.0318 0.0318 0.0299 0.7950 0.7950 0.7943


I must confess that I performed most of my studies according to ‘Method C’ but all passed BE already in the interim. This might explain why I never received a deficiency letter from European assessors.

In the published methods2,5–10 there is an inflation of letters (B, C, C/D, E, F, …) – Detlew called it ‘letteritis’. To avoid ambiguities we prefer ‘Type 1’ for methods where an adjusted \(\small{\alpha}\) is used both in the interim and in the final analysis (e.g., ‘Method B’). In ‘Type 2’ methods, \(\small{\alpha=0.05}\) or – conditional on interim power – an adjusted \(\small{\alpha}\) is used in the interim, and an adjusted \(\small{\alpha}\) in the final analysis (e.g., ‘Method C’).11,12 Note that in some of the methods9,10 the adjusted \(\small{\alpha}\) in the interim and the final analysis is not the same.
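The ‘Type 1’/‘Type 2’ distinction for the interim can be sketched as a small decision helper. `stage1.alpha()` is a hypothetical illustration only – it ignores sample-size re-estimation and futility rules, which power.tsd() of the Power2Stage package handles in full.

```r
# Simplified interim-alpha logic of the two method families described above
# (illustration only, not the complete decision schemes)
stage1.alpha <- function(type, interim.power, adj = 0.0294, target = 0.80) {
  if (type == 1) return(adj)                  # Type 1: always adjusted ('Method B')
  if (interim.power >= target) return(0.05)   # Type 2: unadjusted if power suffices
  adj                                         # otherwise adjusted ('Method C')
}
stage1.alpha(type = 1, interim.power = 0.90)  # 0.0294
stage1.alpha(type = 2, interim.power = 0.90)  # 0.05
stage1.alpha(type = 2, interim.power = 0.50)  # 0.0294
```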


  1. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Statist Med. 2006; 25: 4279–92. doi:10.1002/sim.2673.
  2. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62.
  3. FDA (CDER). Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Guidance for Industry. Draft. Silver Spring. August 2021.
  4. Health Canada. Guidance Document. Conduct and Analysis of Comparative Bioavailability Studies. Ottawa. Adopted 2012/02/08, Revised 2023/01/30.
  5. Montague TH, Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ. Additional results for ‘Sequential design approaches for bio­equivalence studies with crossover designs’. Pharm Stat. 2011; 11(1): 8–13. doi:10.1002/pst.483
  6. Fuglsang A. Sequential Bioequivalence Trial Designs with Increased Power and Controlled Type I Error Rates. AAPS J. 2013; 15(3): 659–61. doi:10.1208/s12248-013-9475-5
  7. Karalis V, Macheras P. An Insight into the Properties of a Two-Stage Design in Bioequivalence Studies. Pharm Res. 2013; 30(7): 1824–35. doi:10.1007/s11095-013-1026-3
  8. Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164
  9. Zheng Ch, Zhao L, Wang J. Modifications of sequential designs in bioequivalence trials. Pharm Stat. 2015; 14(3): 180–8. doi:10.1002/pst.1672
  10. Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr TH, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721
  11. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2
  12. Molins E, Labes D, Schütz H, Cobo E, Ocaña J. An iterative method to protect the type I error rate in bioequivalence studies under two-stage adaptive 2×2 crossover designs. Biom J. 2021; 63(1): 122–33. doi:10.1002/bimj.201900388

  A. Possibly Mr. Murphy hit:
    library(Power2Stage)
    runs <- 25
    comp <- data.frame(run = 1:runs, TIE = NA_real_)
    for (j in 1:runs) {
      setseed <- comp$run[j] == 1 # fixed seed in the first run, random seeds afterwards
      comp$TIE[j] <- round(power.tsd(method = "C", CV = 0.1, n1 = 60, theta0 = 1.25,
                                     setseed = setseed, pmethod = "shifted")$pBE,
                           4) # one million sim’s by default
    }
    cat("Empiric Type I Error in", runs, "repeated simulations\n"); summary(comp$TIE)

    Empiric Type I Error in 25 repeated simulations
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.04940 0.04980 0.04990 0.04993 0.05000 0.05040


  B. Comparison of empiric Type I Errors and Powers:
    # Assess the operating characteristics in the grid of Potvin et al.:
    # Use the defaults of the function: GMR = 0.95 and targetpower = 0.80.
    # For power use 1E6 sim’s for comparison with the article (default is 1E5).
    # Use theta0 = 1.25 for the Type I Error (1E6 sim’s by default).
    # Use a fixed seed to support reproducibility (default setseed = TRUE) and
    # simulate via the shifted central t-distribution (instead of the noncentral
    # t-distribution, the function’s default).

    n1  <- seq(12L, 60L, 12L)
    CV  <- seq(10L, 80L, 10L) / 100
    x   <- data.frame(n1 = n1, CV = rep(CV, each = length(n1)),
                      B.TIE = NA_real_, C.TIE = NA_real_, D.TIE = NA_real_,
                      B.pwr = NA_real_, C.pwr = NA_real_, D.pwr = NA_real_)
    B   <- C <- rep(0.0294, 2); D <- rep(0.0280, 2) # adjusted alphas
    pm  <- "shifted"                                # like in the paper
    pbc <- 0L
    pb  <- txtProgressBar(style = 3)
    for (j in 1L:nrow(x)) {
      pbc  <- pbc + 1L
      CV.j <- x$CV[j]
      n1.j <- x$n1[j]
      x$B.TIE[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                              pmethod = pm, theta0 = 1.25)$pBE
      setTxtProgressBar(pb, pbc / nrow(x))
      x$C.TIE[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                              pmethod = pm, theta0 = 1.25)$pBE
      setTxtProgressBar(pb, pbc / nrow(x))
      x$D.TIE[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j, # 'Method D' is
                              pmethod = pm, theta0 = 1.25)$pBE               # 'C' with 0.0280
      setTxtProgressBar(pb, pbc / nrow(x))
      x$B.pwr[j] <- power.tsd(method = "B", alpha = B, CV = CV.j, n1 = n1.j,
                              pmethod = pm, nsims = 1E6)$pBE
      setTxtProgressBar(pb, pbc / nrow(x))
      x$C.pwr[j] <- power.tsd(method = "C", alpha = C, CV = CV.j, n1 = n1.j,
                              pmethod = pm, nsims = 1E6)$pBE
      setTxtProgressBar(pb, pbc / nrow(x))
      x$D.pwr[j] <- power.tsd(method = "C", alpha = D, CV = CV.j, n1 = n1.j,
                              pmethod = pm, nsims = 1E6)$pBE
      setTxtProgressBar(pb, pbc / nrow(x))
    }
    close(pb)
    x$CV          <- x$CV * 100
    x[, 3:8]      <- round(x[, 3:8], 4)
    x$CV          <- sprintf("%i  ", x$CV)
    names(x)[2:8] <- c("CV(%)", sprintf("TIE(%s)", LETTERS[2:4]),
                       sprintf("Pwr(%s)", LETTERS[2:4]))
    txt <- paste0("Estimated Type I Errors and Powers for Methods B, C, and D.\n",
                  "Shifted central t-distribution, one million simulations, ",
                  "fixed seeds.\n")
    cat(txt); print(x, row.names = FALSE)

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz