Sims w.o. intermediate power [Two-Stage / GS Designs]

posted by Helmut – Vienna, Austria, 2012-07-26 18:20 – Posting: # 8983

Dear Detlew & all!

❝ I've heard some rumour from regulators that you should not stop but recruit 2 more subjects for stage 2 in that case. But this lacks any scientific justification IMHO.


SCNR. Implemented your “rumour scheme”. In contrast to Potvin B, where a study might already fail in stage 1 (not BE at α 0.0294 and power ≥80%), here we must continue to stage 2. Oh, only a few more subjects, presumably not an ethical concern to (some?) regulators… :not really:


Example 1: CV 20%, T/R 0.95, α 0.0294, expected power with n1 24 is 83.6%. Sample size for a fixed design, 80% power, α 0.05 would be 20. Run on two machines (R 2.15.1, PowerTOST 0.9-10), 10⁶ simulations each.
Potvin B
            Ratio 1.25               │             Ratio 0.95             
─────────────────────────────────────┼─────────────────────────────────────
                      % in   empiric │                       % in   empiric
  n  ( 5%, 50%, 95%) stage 2    α    │   n  ( 5%, 50%, 95%) stage 2   1-β 
26.0  24   24   34    34.5   0.0320  │ 24.6  24   24   28     8.6    0.8810

“rumour scheme”
            Ratio 1.25               │             Ratio 0.95             
─────────────────────────────────────┼─────────────────────────────────────
                      % in   empiric │                       % in   empiric
  n  ( 5%, 50%, 95%) stage 2    α    │   n  ( 5%, 50%, 95%) stage 2   1-β 
27.2  26   26   34    97.1   0.0379  │ 24.7  24   24   28    16.3    0.9037
27.2  26   26   34    97.1   0.0379  │ 24.7  24   24   28    16.4    0.9030




Example 2: CV 30%, T/R 0.95, α 0.0294, expected power with n1 24 is 41.1%. Sample size for a fixed design, 80% power, α 0.05 would be 40. Sims as above.
Potvin B
            Ratio 1.25               │             Ratio 0.95             
─────────────────────────────────────┼─────────────────────────────────────
                      % in   empiric │                       % in   empiric
  n  ( 5%, 50%, 95%) stage 2    α    │   n  ( 5%, 50%, 95%) stage 2   1-β 
46.9  24   46   72    95.0   0.0475  │ 39.9  24   38   70    58.3    0.8305

“rumour scheme”
            Ratio 1.25               │             Ratio 0.95             
─────────────────────────────────────┼─────────────────────────────────────
                      % in   empiric │                       % in   empiric
  n  ( 5%, 50%, 95%) stage 2    α    │   n  ( 5%, 50%, 95%) stage 2   1-β 
46.8  26   46   72    97.1   0.0480  │ 39.8  24   36   70    58.8    0.8303
46.8  26   46   70    97.1   0.0482  │ 39.9  24   36   70    58.9    0.8303
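
The expected powers and fixed-design sample sizes quoted for both examples can be cross-checked in base R. This is only a sketch: it uses the noncentral-t approximation for the power of TOST in a balanced 2×2 crossover instead of the exact method in PowerTOST, and the helper names `power.nct` and `n.fixed` are made up for this illustration.

```r
# Approximate power of TOST in a balanced 2x2 crossover (noncentral t).
# power.nct() and n.fixed() are hypothetical helpers, not PowerTOST functions.
power.nct <- function(alpha, CV, theta0, n, theta1 = 0.80, theta2 = 1.25) {
  df   <- n - 2                               # degrees of freedom
  se   <- sqrt(log(CV^2 + 1)) * sqrt(2 / n)   # SE of the difference of log-means
  tc   <- qt(1 - alpha, df)                   # critical t-value
  ncp1 <- (log(theta0) - log(theta1)) / se    # non-centrality, lower limit
  ncp2 <- (log(theta0) - log(theta2)) / se    # non-centrality, upper limit
  max(0, pt(-tc, df, ncp2) - pt(tc, df, ncp1))
}

# Smallest even n reaching the target power in a fixed design
n.fixed <- function(CV, theta0 = 0.95, alpha = 0.05, target = 0.80) {
  n <- 4
  while (power.nct(alpha, CV, theta0, n) < target) n <- n + 2
  n
}

for (CV in c(0.20, 0.30)) {
  cat(sprintf("CV %2.0f%%: power (n1 24, alpha 0.0294) %.3f, fixed n %.0f\n",
              100 * CV, power.nct(0.0294, CV, 0.95, 24), n.fixed(CV)))
}
```

If the approximation behaves as expected, the output should come close to the 83.6% / 41.1% and n 20 / 40 quoted in the two examples.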



Good news: In both examples the patient’s risk is preserved. But note that the empiric α is slightly lower for Method B than for the ‘powerless method’.¹ In other words, EMA’s ‘method’ is more liberal. In the first example about twice as many studies are sent to the second stage at T/R 0.95 (8.6% vs. 16.3%; caused by the ‘n2=n1+2 rule’). Studies which would already fail in stage 1 according to Method B are forced into the second stage. :angry:

Differences in the second example are less pronounced, because n1 is likely too small to claim BE already in the first stage.

Lesson learned: I always thought of Two-Stage designs as an opportunity to ‘save’ a study if assumptions about the variance turn out to be incorrect. I was never a friend of taking chances and going with an interim analysis at a sample size where you would presumably fail and have to proceed to the second stage anyway.* That doubles the study time, and even if your assumptions were correct you have to pay the penalty in the total sample size (10–20% more subjects than in a fixed design). Since exhaustive simulations of the ‘rumour scheme’ are not published (has EMA performed any at all, or is this just a ‘gut-feeling approach’?), IMHO it would require simulations for every single study [sic] to demonstrate that the empiric α is ≤0.05.²

P.S.: Thanks to Detlew for finding a bug in my code. ;-)


  1. Instead of ‘powerless’ can we introduce the term ‘weak’?
  2. For 10⁶ simulations the one-sided significance limit based on the exact binomial test is 0.05035995. Only an empiric α larger than this critical value is significantly >0.05. This is also mentioned in the note below Table I of Potvin’s paper. If you are patient and opt for 10⁷ sims: 0.05011351… For R-freaks:
    sims <- 1E6   # number of simulated studies
    x    <- 0.05  # nominal level of the test
    # the upper limit of the reported one-sided 95% CI is the critical value
    binom.test(x*sims, sims, alternative='less')
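
The same limit can be wrapped in a small function, which makes it easy to hold the empiric α values from the tables above against it. A sketch only: `sig.limit` is a made-up helper name, and it relies on the upper limit reported by `binom.test(..., alternative='less')` being the Clopper-Pearson limit, i.e. a beta quantile.

```r
# Hypothetical helper: limit above which an empiric alpha is significantly > nominal
sig.limit <- function(sims, nominal = 0.05, level = 0.95) {
  x <- round(nominal * sims)        # expected number of false positives
  qbeta(level, x + 1, sims - x)     # upper one-sided Clopper-Pearson limit
}

# empiric alphas reported in the tables above (ratio 1.25)
emp <- c(B.ex1 = 0.0320, rumour.ex1 = 0.0379,
         B.ex2 = 0.0475, rumour.ex2a = 0.0480, rumour.ex2b = 0.0482)

limit <- sig.limit(1e6)
print(limit, digits = 8)
print(emp > limit)                  # TRUE would flag a significant inflation
```

For 10⁷ sims call `sig.limit(1e7)`; the limit moves closer to 0.05 as the number of simulations grows.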


Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz