But what is the real problem? [Two-Stage / GS Designs]

posted by Helmut – Vienna, Austria, 2018-06-07 17:33 – Posting: # 18866

Hi Mikalai,

» We have a drug and do not know its CV. We take an arbitrary sample …

Well, I always try to make an educated guess of the CV. If you aim too low in the first stage, the sample size penalty in the second stage will be larger.
Example: “guesstimated” CV 25%, Potvin’s Method B, n1 = 12 or 24.

library(Power2Stage)
power.tsd(CV=0.25, n1=12)

TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 12; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.81438
p(BE) s1 = 0.17895
Studies in stage 2 = 81.38%

Distribution of n(total)
- mean (range) = 32.4 (12 ... 126)
- percentiles
 5% 50% 95%
 12  32  60


power.tsd(CV=0.25, n1=24)

TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.84244
p(BE) s1 = 0.63203
Studies in stage 2 = 33.56%

Distribution of n(total)
- mean (range) = 29 (24 ... 86)
- percentiles
 5% 50% 95%
 24  24  48


With 12 subjects in the first stage you will, on average, end up with a total sample size of 32.4 (median 32); with 24, only 29 (median 24). In the latter case you already have a 63% chance to show BE in the first stage; in the former, only 18%.
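Why a larger n1 buys a better chance of passing already in the first stage can be seen from plain TOST power at the adjusted α of Method B (0.0294). A minimal sketch in Python, assuming a large-sample normal approximation instead of the non-central t that Power2Stage uses and ignoring the power-monitoring step, so it comes out somewhat higher than the simulated p(BE) s1; the function name is hypothetical:

```python
from math import log, sqrt
from statistics import NormalDist


def tost_power_normal(n, cv, gmr=0.95, alpha=0.0294, theta1=0.80, theta2=1.25):
    """Approximate TOST power for a 2x2 crossover with n subjects in total.

    Normal approximation (NOT the non-central t of Power2Stage); it
    overestimates power for small n.
    """
    z = NormalDist()
    sigma_w = sqrt(log(cv**2 + 1))   # within-subject SD on the log scale
    se = sigma_w * sqrt(2 / n)       # SE of the T-R difference, balanced sequences
    z_alpha = z.inv_cdf(1 - alpha)   # one-sided critical value at the adjusted alpha
    upper = (log(theta2) - log(gmr)) / se - z_alpha
    lower = (log(gmr) - log(theta1)) / se - z_alpha
    return max(0.0, z.cdf(upper) + z.cdf(lower) - 1)


for n1 in (12, 24):
    print(n1, round(tost_power_normal(n1, cv=0.25), 3))
```

With CV 25% this gives roughly 0.23 for n1 = 12 and 0.68 for n1 = 24: the same ordering as the simulations (0.18 vs 0.63), somewhat optimistic as expected from the normal approximation.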

» … and then calculate the CV and real GMR in the first stage.

Yep. But in ‘Type 1’ TSDs you generally ignore the observed GMR and work with a fixed (assumed) T/R-ratio.

» In the second stage, we, if I understand correctly, should calculate post-hoc power and recalculate the sample size with the data (GMR and CV) obtained in the first stage, if bioequivalence has not been achieved in the first stage.

Nope. You calculate interim power after the first stage. If you want to use the GMR of the first stage as well (go fully adaptive) you might shoot yourself in the foot. In practice you then need two futility criteria:
  1. Stop if the GMR is outside [0.80, 1.25].
  2. Stop if the re-estimated sample size is above a pre-specified limit (U).
The methods of Karalis & Macheras might (!) have terrible power, which makes their application ethically questionable. Here it works with U = 120 if n1 = 24, but not with n1 = 12 (power only ~73%). There are alternatives where you don’t stop if n1 + n2 > U but perform the second stage in U – n1 subjects. No problem with the Type I Error, but this might compromise power; I suggest simulations.
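The two futility criteria, and the capped alternative that does not stop, can be written as a small interim decision rule applied when BE was not shown in stage 1. A sketch in Python; the function and field names are hypothetical, `u_max` stands for U, and the re-estimated total sample size is taken as an input rather than computed (how it is obtained, fixed or observed GMR, is outside this sketch):

```python
from dataclasses import dataclass


@dataclass
class InterimResult:
    action: str  # what to do after stage 1
    n2: int      # subjects in stage 2 (0 if stopped)


def interim_decision(pe, n1, n_total_est, u_max,
                     pe_lo=0.80, pe_hi=1.25, cap=False):
    """Apply two futility criteria after a failed stage 1 (hypothetical helper)."""
    # Criterion 1: observed GMR (point estimate) outside the acceptance range
    if not (pe_lo <= pe <= pe_hi):
        return InterimResult("stop: futility (PE)", 0)
    n2 = max(n_total_est - n1, 0)
    # Criterion 2: re-estimated total sample size above the pre-specified limit U
    if n1 + n2 > u_max:
        if cap:
            # Alternative: do not stop, run stage 2 with the remaining U - n1 subjects
            return InterimResult("stage 2 (capped)", u_max - n1)
        return InterimResult("stop: futility (Nmax)", 0)
    return InterimResult("stage 2", n2)


print(interim_decision(pe=0.83, n1=24, n_total_est=140, u_max=120))
```

Whether the capped variant keeps acceptable power has to be checked by simulations, as noted above.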

» But what to do if we get a bad GMR (0.83), whatever the CV, and low power (around 30%) in the first stage?

You are free to include futility criteria for early stopping in the method. You don’t have to worry about the adjusted α because any futility criterion decreases the patient’s risk.
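The argument is a subset one: with the same data, every study that shows BE with a futility rule would also have shown it without the rule, so the rejection rate (the patient’s risk) can only go down. A toy Monte Carlo sketch in Python; it assumes a simplified one-sided two-stage z-test with an equal-weight inverse-normal combination, not the TOST-based methods discussed here, and the critical value 1.96 and futility bound 0 are arbitrary:

```python
import random
from math import sqrt


def simulate(n_sims=100_000, c=1.96, futility_z=0.0, seed=42):
    """Rejection rates under H0 without and with a futility stop (toy model)."""
    rng = random.Random(seed)
    rej_plain = rej_fut = 0
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)  # stage-1 statistic under H0
        z2 = rng.gauss(0.0, 1.0)  # independent stage-2 statistic under H0
        stage1_reject = z1 > c
        stage2_reject = (z1 + z2) / sqrt(2) > c
        # Without futility: always proceed to stage 2 if stage 1 fails
        rej_plain += stage1_reject or stage2_reject
        # With futility: proceed to stage 2 only if z1 reached the bound
        rej_fut += stage1_reject or (z1 >= futility_z and stage2_reject)
    return rej_plain / n_sims, rej_fut / n_sims


plain, with_futility = simulate()
print(plain, with_futility)
```

Since the same draws are used for both rules, the second rate can never exceed the first, whatever the futility bound.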

» In this case, as I understand, we should recruit much more subjects according to our protocol.

In general you should not give a total sample size in the protocol – unless it is part of the framework (simulations recommended; keep an eye on power).

If you are courageous, try the Inverse-Normal Combination Method / Maximum Combination Test (Maurer et al. 2018).
Pro: Proven to preserve the Type I Error (makes regulatory statisticians happy).
Con: Might be the first time they have ever seen sumfink like this. Expect questions.
Example: As above, but with two futility criteria (GMR within [0.80, 1.25] and a maximum total sample size of 120); the GMR observed in the first stage is used (fully adaptive).

library(Power2Stage)
power.tsd.in(CV=0.25, n1=24, usePE=TRUE, fClower=0.8, fCupper=1.25,
             fCNmax=120, fCrit=c("PE", "Nmax"))

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test (weights = 0.5 0.25)
 - alpha (s1/s2) = 0.02635 0.02635
 - critical value (s1/s2) = 1.93741 1.93741
 - with conditional error rates and conditional power
Overall target power = 0.8
Threshold in power monitoring step for futility = 0.8
Power calculation via non-central t approx.
CV1 and PE1 in sample size est. used
Futility criterion Nmax = 120
Futility criterion PE outside 0.8 ... 1.25
Minimum sample size in stage 2 = 4
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.84312
p(BE) s1 = 0.60784
Studies in stage 2 = 30.03%

Distribution of n(total)
- mean (range) = 31.3 (24 ... 120)
- percentiles
 5% 50% 95%
 24  24  68

Not that bad.
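The core of the combination step can be sketched in a few lines. A minimal Python sketch, assuming the Maurer et al. convention that the stage-1 weight w is an information fraction entering as √w (and √(1 – w) for stage 2), that the stage-wise one-sided p-values are combined for each of the two one-sided TOST hypotheses separately, and that the maximum over the two weights 0.5 and 0.25 is compared with the critical value 1.93741 from the output above; function names are hypothetical, and the conditional error rates and conditional power used by Power2Stage are not reproduced:

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()


def inverse_normal(p1, p2, w):
    """Combined z-statistic; w is the stage-1 weight as an information fraction."""
    return sqrt(w) * N.inv_cdf(1 - p1) + sqrt(1 - w) * N.inv_cdf(1 - p2)


def max_combination_rejects(p1, p2, weights=(0.5, 0.25), crit=1.93741):
    """Maximum combination test: reject if the larger combined z exceeds crit."""
    return max(inverse_normal(p1, p2, w) for w in weights) > crit


# Nominal one-sided level implied by the critical value: 1 - Phi(1.93741)
alpha_nominal = 1 - N.cdf(1.93741)
print(round(alpha_nominal, 5))
```

This reproduces the reported relation between the critical value 1.93741 and α = 0.02635.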

» The questions: how can we avoid slipping into “forced bioequivalence”? Or should we go straight away and recruit this large number of subjects? What should be put into the protocol?

As ElMaestro wrote above, I don’t see how you could run into “forced BE”.

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz