## Exact TSD methods: Example [Two-Stage / GS Designs]

Dear all,

answering my own post in order to keep it short.
Here is an example. We have a guesstimate of the CV (0.20), assume a GMR of 0.95, and aim at a power of 0.80. There are no futility criteria. Some regulatory statisticians advised me to prefer a first stage sized as for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).
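As a rough cross-check of the fixed-sample size, here is a minimal base-R sketch of TOST power for a balanced 2×2 crossover via the non-central t approximation. The function name `power_tost_approx` and the simple search loop are my own illustration, not part of PowerTOST, which has its own (more refined) routines.

```r
# Approximate TOST power (log scale) for a balanced 2x2 crossover.
# n is the total sample size; hypothetical helper for illustration only.
power_tost_approx <- function(n, cv, gmr = 0.95, alpha = 0.05,
                              theta1 = 0.80, theta2 = 1.25) {
  se   <- sqrt(log(1 + cv^2) * 2 / n)        # SE of the T-R difference
  df   <- n - 2
  tcr  <- qt(1 - alpha, df)                  # critical t of each one-sided test
  ncp1 <- (log(gmr) - log(theta1)) / se      # non-centrality, lower limit
  ncp2 <- (log(gmr) - log(theta2)) / se      # non-centrality, upper limit
  max(0, pt(-tcr, df, ncp = ncp2) - pt(tcr, df, ncp = ncp1))
}

# Smallest even n reaching the target power
n_fixed <- 4
while (power_tost_approx(n_fixed, cv = 0.20) < 0.80) n_fixed <- n_fixed + 2
n_fixed
```

With these inputs the loop should stop at the fixed-sample size quoted further down in this post.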

    library(PowerTOST)
    library(Power2Stage)
    CV0   <- 0.20
    n0    <- sampleN.TOST(CV=CV0, details=FALSE, print=FALSE)[["Sample size"]]
    n.tot <- power.tsd.in(CV=CV0, n1=n0, fCrit="No", npct=0.5)$nperc[["50%"]]
    w     <- c(n0, n.tot - n0) / n.tot
    # weights must lie strictly between 0 and 1; test only the first element
    if (w[1] == 1) w <- w + c(-1, +1) * 1e-6

In the method the weights have to be pre-specified, stated in the SAP, and used throughout subsequent steps (irrespective of the re-estimated n2). In the fixed-sample design we would need 20 subjects. How to set the weights? An intuitive way is to use the median (x̃ = 20) of the total sample size based on simulations. This would give us weights of [1, 0]. Great. But weights have to be >0 and <1. Hence, I tweaked them a little to [0.999999, 0.000001]. What can we expect if we run the study with n1 = 20?

    power.tsd.in(CV=CV0, n1=n0, fCrit="No", weight=w,
                 npct=c(0.05, 0.25, 0.50, 0.75, 0.95))

    TSD with 2x2 crossover
    Inverse Normal approach
     - maximum combination test (weights = 0.999999 1e-06)
     - alpha (s1/s2) = 0.02531 0.02531
     - critical value (s1/s2) = 1.95463 1.95463
     - with conditional error rates and conditional power
    Overall target power = 0.8
    Threshold in power monitoring step for futility = 0.8
    Power calculation via non-central t approx.
    CV1 and GMR = 0.95 in sample size est. used
    No futility criterion regarding PE, CI or Nmax
    Minimum sample size in stage 2 = 4
    BE acceptance range = 0.8 ... 1.25

    CV = 0.2; n(stage 1) = 20; GMR = 0.95

    1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
    p(BE)    = 0.84868
    p(BE) s1 = 0.72513
    Studies in stage 2 = 21.76%

    Distribution of n(total)
    - mean (range) = 23.4 (20 ... 86)
    - percentiles
     5% 25% 50% 75% 95%
     20  20  20  20  42

Fine. If everything turns out as expected we have to be unlucky to need a second stage. Power in the first is already 0.73 and stage 2 sample sizes are not shocking. As is common in TSDs, the overall power is generally higher than in a fixed-sample design.

We perform the first stage and get GMR 0.91 and CV 0.25. Oops! Both are worse than assumed.
Especially the GMR is painful.

    n1   <- n0
    GMR1 <- 0.91
    CV1  <- 0.25
    res  <- interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w)
    res

    TSD with 2x2 crossover
    Inverse Normal approach
     - maximum combination test with weights for stage 1 = 1 0
     - significance levels (s1/s2) = 0.02531 0.02531
     - critical values (s1/s2) = 1.95463 1.95463
     - BE acceptance range = 0.8 ... 1.25
     - Observed point estimate from stage 1 is not used for SSR
     - with conditional error rates and conditional (estimated target) power

    Interim analysis of first stage
    - Derived key statistics:
      z1 = 1.57468, z2 = 3.38674,
      Repeated CI = (0.77306, 1.07120)
    - No futility criterion met
    - Test for BE not positive (not considering any futility rule)
    - Calculated n2 = 24
    - Decision: Continue to stage 2 with 24 subjects

We fail to show BE (lower CL 77.31%) and should initiate the second stage with 24 subjects. How would a ‘Type 1’ TSD perform?

    Interim analysis (specified α1 0.0294)
    ───────────────────────────────────────────────────
    94.12% CI: 77.77–106.48% (failed to demonstrate BE)
    Power    : 0.5092 (approx. via non-central t)
    Second stage with 14 subjects (N=34) is justified.

Pretty similar though a lower n2 is suggested.

OK, we perform the second stage and get GMR 0.93 and CV 0.21. Both are slightly better than what we got in the first stage but again worse than assumed.

    n2   <- res$n2
    GMR2 <- 0.93
    CV2  <- 0.21
    final.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1,
                 GMR2=GMR2, CV2=CV2, n2=n2, weight=w)

    TSD with 2x2 crossover
    Inverse Normal approach
     - maximum combination test with weights for stage 1 = 1 0
     - significance levels (s1/s2) = 0.02531 0.02531
     - critical values (s1/s2) = 1.95463 1.95463
     - BE acceptance range = 0.8 ... 1.25

    Final analysis of second stage
    - Derived key statistics:
      z1 = 2.32999, z2 = 4.00748,
      Repeated CI = (0.82162, 1.05264)
      Median unbiased estimate = 0.8997
    - Decision: BE achieved
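For those who want to see where the z-statistics come from, a base-R sketch follows. It assumes, as the printed ‘weights for stage 1’ line suggests, that both entries of `w` act as alternative stage 1 weights in the maximum combination test, and that each stage is evaluated on its own with n − 2 degrees of freedom. The helpers `stage_z` and `max_comb` are my own names, not functions of Power2Stage.

```r
# Stage-wise z-statistics of the two one-sided hypotheses (log scale),
# balanced 2x2 crossover with n subjects in the respective stage.
stage_z <- function(gmr, cv, n, theta1 = 0.80, theta2 = 1.25) {
  se <- sqrt(log(1 + cv^2) * 2 / n)
  df <- n - 2
  t  <- c((log(gmr) - log(theta1)) / se,     # H01: GMR <= theta1
          (log(theta2) - log(gmr)) / se)     # H02: GMR >= theta2
  # convert one-sided p-values of the t-tests to standard normal quantiles
  qnorm(pt(t, df, lower.tail = FALSE), lower.tail = FALSE)
}

# Maximum combination: for each stage 1 weight w combine
# sqrt(w) * z(stage 1) + sqrt(1 - w) * z(stage 2), keep the larger value.
max_comb <- function(z_s1, z_s2, w) {
  comb <- sapply(w, function(wi) sqrt(wi) * z_s1 + sqrt(1 - wi) * z_s2)
  apply(comb, 1, max)                        # rows = hypotheses
}

w     <- c(0.999999, 0.000001)
crit  <- qnorm(1 - 0.02531)                  # critical value from the output
z_s1  <- stage_z(gmr = 0.91, cv = 0.25, n = 20)   # interim
z_s2  <- stage_z(gmr = 0.93, cv = 0.21, n = 24)   # second stage alone
z_fin <- max_comb(z_s1, z_s2, w)

all(z_s1  > crit)   # BE at the interim?
all(z_fin > crit)   # BE in the final analysis?
```

Under these assumptions `z_s1` and `z_fin` should come out close to the z1/z2 values printed above. BE requires both statistics to exceed the critical value, which happens only in the final analysis.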

We survived.
In a ‘Type 1’ TSD we would get:

    Final analysis of pooled data (specified α2 0.0294)
    ═══════════════════════════════════════════════════
    94.12% CI: 83.86–101.12% (BE concluded)

Pretty similar again.
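The 94.12% in the ‘Type 1’ results is simply 100(1 − 2α) with the specified α of 0.0294. As a sketch in base R, the interim interval quoted earlier can be reproduced from the stage 1 figures (GMR 0.91, CV 0.25, n1 20), assuming a balanced 2×2 crossover. The pooled final interval cannot be reproduced this way because it needs the pooled data. Variable names are mine.

```r
alpha_adj <- 0.0294
level     <- 1 - 2 * alpha_adj               # confidence level of the CI

gmr1 <- 0.91
cv1  <- 0.25
n1   <- 20
se1  <- sqrt(log(1 + cv1^2) * 2 / n1)        # SE of the T-R difference (log scale)
ci1  <- exp(log(gmr1) + c(-1, 1) * qt(1 - alpha_adj, n1 - 2) * se1)

100 * level          # confidence level in percent
round(100 * ci1, 2)  # interim CI in percent
```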

If we state it in the protocol, we could also aim for higher power in the second stage if the GMR in the first doesn’t look nice. If we switch to 0.90 we would run the second stage with 36 subjects.

    Final analysis of second stage
    - Derived key statistics:
      z1 = 2.86939, z2 = 4.94730,
      Repeated CI = (0.84220, 1.02693)
      Median unbiased estimate = 0.9053
    - Decision: BE achieved

Helps. Another option would be to adjust for GMR1 by using the argument usePE=TRUE in interim.tsd.in(). For power 0.80 that would mean 40 subjects in the second stage and for 0.90 already 62…

Cheers,
Helmut Schütz