Parallel (replicate?) [Two-Stage / GS Designs]

posted by Helmut – Vienna, Austria, 2024-08-28 12:54 – Posting: # 24168

Hi Achievwin,

❝ I feel your pain…

Thank you for the sympathy.

❝ Any example how two stage adaptive design is applied/validated for parallel study or replicate design?

For the parallel design see Anders’ goody [16] in the post above (methods B and C validated for CV 10–100%, n1 48–120; GMR 0.95 and 80% power).
However, if you want something else, you need your own simulations to find a suitable adjusted α. Optionally you can also explore a futility criterion for the maximum total sample size. Not complicated with the R package Power2Stage.
Hint: In all TSDs the maximum inflation of the Type I Error occurs at a combination of low CV and small n1. Therefore, explore this area first. Once you have found a suitable adjusted α, simulate power and the empiric Type I Error for the entire grid. Regulators will ask you for that.

For parallel TSDs there are two functions in Power2Stage, namely power.tsd.pAF() and power.tsd.p(). I suggest opting for the second because – due to dropouts – unequal group sizes are common. A rather lengthy example at the end.
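
A minimal sketch of why power.tsd.p() is the more flexible one (the numbers below are assumptions for illustration, not recommendations): the first-stage sample size n1 can be given either as the total or, after dropouts, as a vector of the two group sizes.

library(Power2Stage)
# balanced first stage: n1 given as the total sample size
power.tsd.p(method = "B", alpha = rep(0.0294, 2), n1 = 96,
            GMR = 0.95, CV = 0.30, targetpower = 0.80, test = "welch")
# unbalanced first stage after dropouts: n1 given as the two group sizes
power.tsd.p(method = "B", alpha = rep(0.0294, 2), n1 = c(50, 46),
            GMR = 0.95, CV = 0.30, targetpower = 0.80, test = "welch")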

❝ In case of parallel design what kind of statistical factor we need to use Potvin …


Anders used α = 0.0294 for the analogues of Potvin’s methods B and C. As usual, there is a slight inflation of the Type I Error in method C with CV ≤ 20% – which is unlikely in parallel designs anyway. Evaluation is by the Welch–Satterthwaite test (for unequal variances and group sizes).
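
If you want to check that yourself at the critical corner of Anders’ grid (CV 10%, n1 48, GMR 0.95, 80% power; see above), something along these lines should do – it mirrors the lengthy example below, though whether it reproduces his published numbers exactly depends on implementation details. One million simulations under the Null take a while.

library(Power2Stage)
# empiric Type I Error of the method B analogue at the lower corner of the grid
power.tsd.p(method = "B", alpha = rep(0.0294, 2), n1 = 48,
            GMR = 0.95, CV = 0.10, targetpower = 0.80,
            test = "welch", theta0 = 1.25, nsims = 1e6)$pBE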

If someone knows what might be meant in Section 2.2.3.4 of the ICH M13A …

The use of stratification in the randomisation procedure based on a limited number of known relevant factors is therefore recommended. Those factors are also recommended to be accounted for […]

… please enlighten me.

❝ … or Bonferroni …


I think (‼) that it will be an acceptable alternative because it is the most conservative one (strictly speaking, it is not correct in a TSD because the hypotheses are not independent).
Assessors love Signore Bonferroni.
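
A rough illustration of how conservative it is (CV, n1, and GMR below are assumptions, not from a real study): simulate the empiric Type I Error with Bonferroni’s α = 0.025 in both stages and compare it to 0.05.

library(Power2Stage)
# Bonferroni's 0.025 in both stages of a parallel TSD, method B analogue
power.tsd.p(method = "B", alpha = rep(0.025, 2), n1 = 48,
            GMR = 0.95, CV = 0.25, targetpower = 0.80,
            test = "welch", theta0 = 1.25, nsims = 1e6)$pBE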

❝ … and same for 4-period or 3-period replicate design?


If you mean reference-scaling, no idea. You can try Bonferroni as well. Recently something was published by the FDA but it is wacky (see this post for why I think so). I’m not convinced that it is worth the effort.
Plan the study for the assumed CVwR (and the CVwT if you have the information). In reference-scaling the observed CVwR is taken into account anyway. If the variability is higher than assumed, you can scale more and will gain power. If it is lower than assumed, bad luck. However, the crucial point is – as always – the GMR…
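
For the planning part, sampleN.scABEL() of PowerTOST does the job for the EMA’s ABEL; the assumed CVwR 45%, GMR 0.90, and 80% power below are only placeholders.

library(PowerTOST)
# 4-period full replicate (TRTR|RTRT); the default regulator is the EMA
sampleN.scABEL(CV = 0.45, theta0 = 0.90, targetpower = 0.80,
               design = "2x2x4")
# if you have information about CVwT, CV can be given as a vector
# (see ?sampleN.scABEL for the order of CVwT and CVwR)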

If you mean by ‘3-period replicate design’ the partial replicate (TRR|RTR|RRT) and want to use the FDA’s RSABE, please don’t (see this article for why). It is fine for the EMA’s ABEL. If you want a 3-period replicate for the FDA, please opt for one of the full replicates (TRT|RTR, TTR|RRT, or TRR|RTT). Otherwise, you might be in deep shit.
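
If you target the FDA with one of the full replicates, sampleN.RSABE() of PowerTOST gives the sample size for the RSABE approach; again, CVwR 50%, GMR 0.90, and 80% power are assumptions for illustration.

library(PowerTOST)
# 3-period full replicate TRT|RTR (design "2x2x3") for the FDA's RSABE
sampleN.RSABE(CV = 0.50, theta0 = 0.90, targetpower = 0.80,
              design = "2x2x3")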


library(PowerTOST)
library(Power2Stage)
# start with a fixed sample design
CV     <- 0.2 # max. inflation of the Type I Error with small CV
GMR    <- 0.9 # realistic with large CV in parallel designs
target <- 0.9 # well…
n      <- sampleN.TOST(CV = CV, theta0 = GMR, targetpower = target,
                       design = "parallel", print = FALSE)[["Sample size"]]
# first stage n1
n1     <- n / 2
# assess the empiric Type I Error at one of the BE-limits (under the Null)
# always one mio simulations (very time consuming…)
# try a range of adjusted alphas

alpha  <- seq(0.0292, 0.0306, 0.0001)
sig    <- binom.test(0.05 * 1e6, 1e6, alternative = "less",
                     conf.level = 0.95)$conf.int[2]
res    <- data.frame(alpha = alpha, TIE = NA_real_, TIE.05 = FALSE,
                     signif = FALSE, TIE.052 = FALSE)
# TIE.05  checks whether the TIE > 0.05
# signif  checks whether the TIE > the limit of the binomial test for 1 mio sim’s
# TIE.052 checks whether the TIE > 0.052 (Potvin’s acceptable inflation)

pb     <- txtProgressBar(style = 3)
for (j in seq_along(alpha)) {
  res$TIE[j] <- power.tsd.p(method = "B", alpha = rep(alpha[j], 2), n1 = n1,
                            GMR = GMR, CV = CV, targetpower = target,
                            test = "welch", theta0 = 1.25, nsims = 1e6)$pBE
  if (res$TIE[j] > 0.05)  res$TIE.05[j]  <- TRUE
  if (res$TIE[j] > sig)   res$signif[j]  <- TRUE
  if (res$TIE[j] > 0.052) res$TIE.052[j] <- TRUE
  setTxtProgressBar(pb, j / length(alpha))
}
close(pb)
wary   <- which(res$TIE.05 & !res$TIE.052) # TIE > 0.05 but within Potvin’s 0.052;
                                           # one alpha below = ‘belt plus suspenders’ (EMA?)
res    <- res[(head(wary, 1) - 1):(tail(wary, 1) + 1), ]   # keep this zone plus one alpha below and above
names(res)[3:5] <- c(">0.05", "* >0.05", ">0.052")         # cosmetics
print(res, row.names = FALSE)

Gives:

  alpha      TIE >0.05 * >0.05 >0.052
 0.0293 0.049518 FALSE   FALSE  FALSE
 0.0294 0.050004  TRUE   FALSE  FALSE
 0.0295 0.050178  TRUE   FALSE  FALSE
 0.0296 0.050182  TRUE   FALSE  FALSE
 0.0297 0.050486  TRUE    TRUE  FALSE
 0.0298 0.050777  TRUE    TRUE  FALSE
 0.0299 0.050772  TRUE    TRUE  FALSE
 0.0300 0.050806  TRUE    TRUE  FALSE
 0.0301 0.050974  TRUE    TRUE  FALSE
 0.0302 0.050890  TRUE    TRUE  FALSE
 0.0303 0.051308  TRUE    TRUE  FALSE
 0.0304 0.051535  TRUE    TRUE  FALSE
 0.0305 0.051616  TRUE    TRUE  FALSE
 0.0306 0.052007  TRUE    TRUE   TRUE

If you think that a nonsignificant inflation is fine (makes sense, IMHO), use 0.0296 (Type I Error 0.050182 < 0.050360). If you are a disciple of Madame Potvin, even 0.0305 would be OK (0.051616 < 0.052). Say you opted for ‘belt plus suspenders’ with 0.0293 (0.049518 < 0.05), planned the first stage with 300 subjects, and observed a CV of 40%. You had some dropouts (15 in one group and 20 in the other). Therefore, instead of n1 = 300, specify n1 = c(135, 130). What can you expect?

power.tsd.p(method = "B", alpha = rep(0.0293, 2), n1 = c(135, 130),
            GMR = 0.9, CV = 0.4, targetpower = 0.9,
            npct = c(0.05, 0.25, 0.5, 0.75, 0.95))

Gives:

TSD with 2 parallel groups
Method B: alpha (s1/s2) = 0.0293 0.0293
CIs based on Welch's t-test
Target power in power monitoring and sample size est. = 0.9
Power calculation via non-central t approx.
CV1 and GMR = 0.9 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25

CV = 0.4; ntot(stage 1) = 265 (nT, nR = 135, 130); GMR = 0.9

1e+05 sims at theta0 = 0.9 (p(BE) = 'power').
p(BE)    = 0.91405
p(BE) s1 = 0.72275
Studies in stage 2 = 27.73%

Distribution of n(total)
- mean (range) = 312.8 (265 ... 628)
- percentiles
 5% 25% 50% 75% 95%
265 265 265 390 472

You have a chance of ≈72% to show BE already in the first stage; if needed, the study proceeds to the second stage (chance ≈28%) and the overall chance to show BE is ≈91%. As in Potvin’s methods, there is no futility criterion on the total sample size.
However, in this method you can specify one. Say, you don’t want more than 450 subjects:

power.tsd.p(method = "B", alpha = rep(0.0293, 2), n1 = c(135, 130),
            GMR = 0.9, CV = 0.4, targetpower = 0.9,
            npct = c(0.05, 0.25, 0.5, 0.75, 0.95), Nmax = 450)

Gives:

TSD with 2 parallel groups
Method B: alpha (s1/s2) = 0.0293 0.0293
CIs based on Welch's t-test
Target power in power monitoring and sample size est. = 0.9
Power calculation via non-central t approx.
CV1 and GMR = 0.9 in sample size est. used
Futility criterion Nmax = 450
BE acceptance range = 0.8 ... 1.25

CV = 0.4; ntot(stage 1) = 265 (nT, nR = 135, 130); GMR = 0.9

1e+05 sims at theta0 = 0.9 (p(BE) = 'power').
p(BE)    = 0.83875
p(BE) s1 = 0.72275
Studies in stage 2 = 17.91%

Distribution of n(total)
- mean (range) = 292 (265 ... 450)
- percentiles
 5% 25% 50% 75% 95%
265 265 265 265 434

Of course, you have the same chance to pass in the first stage as before. But since studies with a total sample size > 450 are considered a failure, fewer studies proceed to the second stage (≈18% vs ≈28%) and the overall power is lower than without the futility criterion (≈84% vs ≈91%).
Let’s compare now the empiric Type I Errors for both.

sig  <- binom.test(0.05 * 1e6, 1e6, alternative = "less",
                   conf.level = 0.95)$conf.int[2]
comp <- data.frame(study = c("no futility", "with futility"),
                   TIE = NA_real_, TIE.05 = FALSE,
                   signif = FALSE, TIE.052 = FALSE)
for (j in 1:2) {
  if (comp$study[j] == "no futility") {
    comp$TIE[j] <- power.tsd.p(method = "B", alpha = rep(0.0293, 2),
                               n1 = c(135, 130), GMR = 0.9, CV = 0.4,
                               targetpower = 0.9, test = "welch",
                               theta0 = 1.25, nsims = 1e6)$pBE
  } else {
    comp$TIE[j] <- power.tsd.p(method = "B", alpha = rep(0.0293, 2),
                               n1 = c(135, 130), GMR = 0.9, CV = 0.4,
                               targetpower = 0.9, test = "welch",
                               theta0 = 1.25, nsims = 1e6, Nmax = 450)$pBE
  }
  if (comp$TIE[j] > 0.05)  comp$TIE.05[j]  <- TRUE
  if (comp$TIE[j] > sig)   comp$signif[j]  <- TRUE
  if (comp$TIE[j] > 0.052) comp$TIE.052[j] <- TRUE
}
names(comp)[3:5] <- c(">0.05", "* >0.05", ">0.052")
print(comp, row.names = FALSE)

Gives:

         study      TIE >0.05 * >0.05 >0.052
   no futility 0.045936 FALSE   FALSE  FALSE
 with futility 0.040638 FALSE   FALSE  FALSE

Lessons learned: We obtained the adjusted α for a CV of 20%. For a larger one (here 40%), the Type I Error will be similar or even lower. If we introduce a futility criterion, the Type I Error will always decrease because fewer studies proceed to the second stage. This also holds for any published method. Therefore, you don’t have to repeat the simulations – the argument is trivial and can be used as a justification.

A caveat: Actually it is not that simple. In practice you have to repeat this exercise for a range of unequal variances and group sizes in the first stage. It might be that you have to adjust more, based on the worst-case combination. I did that some time ago. It took me a week, four simultaneous R sessions, CPU load close to 90%…
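
A minimal sketch of such a grid, exploring only unbalanced first-stage group sizes (the sizes and imbalance ratios are arbitrary; in a real validation you would cross them with the relevant CVs and use one million simulations):

library(Power2Stage)
grid      <- expand.grid(n1.A = c(48, 60, 72), imbalance = c(1, 1.5, 2))
grid$n1.B <- round(grid$n1.A / grid$imbalance)
grid$TIE  <- NA_real_
for (j in seq_len(nrow(grid))) { # empiric Type I Error for each combination
  grid$TIE[j] <- power.tsd.p(method = "B", alpha = rep(0.0293, 2),
                             n1 = c(grid$n1.A[j], grid$n1.B[j]),
                             GMR = 0.9, CV = 0.2, targetpower = 0.9,
                             test = "welch", theta0 = 1.25, nsims = 1e5)$pBE
}
print(grid, row.names = FALSE)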

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
