But what is the real problem? [Two-Stage / GS Designs]
❝ We have a drug and do not know its CV. We take an arbitrary sample …
Well, I always try to make an educated guess of the CV. If you aim too low in the first stage, the sample size penalty in the second stage will be larger.
Example: “guesstimated” CV 25%, Potvin ‘Method B’, n1 = 12 or 24.
library(Power2Stage)
power.tsd(CV=0.25, n1=12)
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25
CV = 0.25; n(stage 1) = 12; GMR = 0.95
1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE) = 0.81438
p(BE) s1 = 0.17895
Studies in stage 2 = 81.38%
Distribution of n(total)
- mean (range) = 32.4 (12 ... 126)
- percentiles
5% 50% 95%
12 32 60
power.tsd(CV=0.25, n1=24)
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25
CV = 0.25; n(stage 1) = 24; GMR = 0.95
1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE) = 0.84244
p(BE) s1 = 0.63203
Studies in stage 2 = 33.56%
Distribution of n(total)
- mean (range) = 29 (24 ... 86)
- percentiles
5% 50% 95%
24 24 48
With 12 subjects in the first stage you will have on average a total sample size of 32.4 (median 32), and with 24 only 29 (median 24). In the latter case you already have a 63% chance to show BE in the first stage, in the former only 18%.
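What if the guesstimate was too low? Say the true CV turns out to be 30% instead of 25%: the penalty hits the small first stage harder. A minimal sketch, assuming a true CV of 0.30 for illustration (not part of the example above):
library(Power2Stage)
power.tsd(CV=0.30, n1=12) # small first stage: expect most studies to proceed to stage 2
power.tsd(CV=0.30, n1=24) # larger first stage absorbs the misspecification better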
❝ … and then calculate the CV and real GMR in the first stage.
Yep. But in ‘Type 1’ TSDs you generally ignore the observed GMR in the sample size re-estimation and work with a fixed (assumed) T/R-ratio.
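The fixed GMR is just an assumption entering the sample size re-estimation. A minimal sketch of its impact (the value 0.90 is assumed for illustration):
library(Power2Stage)
power.tsd(CV=0.25, n1=24, GMR=0.95) # fixed (assumed) T/R-ratio of 0.95
power.tsd(CV=0.25, n1=24, GMR=0.90) # more conservative assumption; expect a larger n(total)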
❝ In the second stage, if I understand correctly, we should calculate post-hoc power and recalculate the sample size with the data (GMR and CV) obtained in the first stage, if bioequivalence has not been achieved in the first stage.
Nope. You calculate interim power after the first stage. If you want to use the GMR of the first stage as well (i.e., go fully adaptive), you might shoot yourself in the foot. Practically you need two futility criteria:
- Stop if the GMR is outside [0.80, 1.25].
- Stop if the re-estimated sample size is above a pre-specified limit (U); see the sketch below.
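In Power2Stage the limit U can be simulated with power.tsd()’s Nmax argument; a minimal sketch, assuming U = 120. Both criteria together are implemented in power.tsd.in() – see the example further below.
library(Power2Stage)
power.tsd(CV=0.25, n1=12, Nmax=120) # stop for futility if the re-estimated n(total) exceeds 120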
❝ But what to do if we have got a bad GMR (0.83), whatever CV, and low power (around 30%) in the first stage?
You are free to include futility criteria for early stopping in the method. You don’t have to worry about the adjusted α because any futility criterion decreases the patient’s risk.
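You can check this by simulating at the upper BE limit (theta0 = 1.25), where p(BE) is the empiric Type I Error. A minimal sketch, assuming an Nmax futility of 120 and one million simulations:
library(Power2Stage)
power.tsd(CV=0.25, n1=24, theta0=1.25, nsims=1e6) # empiric TIE without futility
power.tsd(CV=0.25, n1=24, theta0=1.25, nsims=1e6, Nmax=120) # with futility the TIE can only decrease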
❝ In this case, as I understand, we should recruit many more subjects according to our protocol.
In general you should not give a total sample size in the protocol – unless it is part of the framework (simulations recommended: have an eye on power).
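A sketch of such simulations, scanning a range of plausible CVs to confirm that power of the framework holds (the grid, n1, and the cap are assumed; pBE and nmean are component names of the object returned by power.tsd()):
library(Power2Stage)
for (CV in c(0.20, 0.25, 0.30, 0.35)) {
  res <- power.tsd(CV = CV, n1 = 24, Nmax = 120)
  cat("CV =", CV, " p(BE) =", res$pBE, " E[n(total)] =", res$nmean, "\n")
}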
If you are courageous, try the Inverse-Normal Combination Method with the Maximum Combination Test (Maurer et al. 2018).
Pro: Proven to preserve the Type I Error (makes regulatory statisticians happy).
Con: Might be the first time they have ever seen sumfink like this. Expect questions.
Example: Like above, but with two futility criteria: GMR within [0.80, 1.25] and a maximum total sample size of 120. The GMR observed in the first stage is used in the sample size re-estimation (fully adaptive).
library(Power2Stage)
power.tsd.in(CV=0.25, n1=24, usePE=TRUE, fClower=0.8, fCupper=1.25,
fCNmax=120, fCrit=c("PE", "Nmax"))
TSD with 2x2 crossover
Inverse Normal approach
- maximum combination test (weights = 0.5 0.25)
- alpha (s1/s2) = 0.02635 0.02635
- critical value (s1/s2) = 1.93741 1.93741
- with conditional error rates and conditional power
Overall target power = 0.8
Threshold in power monitoring step for futility = 0.8
Power calculation via non-central t approx.
CV1 and PE1 in sample size est. used
Futility criterion Nmax = 120
Futility criterion PE outside 0.8 ... 1.25
Minimum sample size in stage 2 = 4
BE acceptance range = 0.8 ... 1.25
CV = 0.25; n(stage 1) = 24; GMR = 0.95
1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE) = 0.84312
p(BE) s1 = 0.60784
Studies in stage 2 = 30.03%
Distribution of n(total)
- mean (range) = 31.3 (24 ... 120)
- percentiles
5% 50% 95%
24 24 68
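To see what a tighter cap would cost, vary fCNmax (the value 80 is assumed for illustration):
power.tsd.in(CV=0.25, n1=24, usePE=TRUE, fClower=0.8, fCupper=1.25,
             fCNmax=80, fCrit=c("PE", "Nmax")) # tighter cap: more futility stops, some loss in p(BE)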
❝ The question is: how can we avoid slipping into “forced bioequivalence”? Or should we go straight away and recruit this large number of subjects? What should be put into the protocol?
As ElMaestro wrote above, I don’t see how you could run into “forced BE”.
- No defined total sample size. You perform the second stage with the re-estimated sample size.
- If at the end the CV is lower and/or the GMR “better” than assumed, be happy. Post hoc power is irrelevant. Open a bottle of champagne.
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮