Sims w.o. intermediate power [Two-Stage / GS Designs]
Dear Detlew & all!
❝ I've heard some rumour from regulators that you should not stop but recruit 2 more subjects for stage 2 in that case. But this lacks any scientific justification IMHO.
SCNR. Implemented your “rumour scheme”. In contrast to Potvin B, where a study might already fail in stage 1 (not BE at α 0.0294 and power ≥80%) we must continue to stage 2. Oh, only a few more subjects – presumably not an ethical concern to (some?) regulators…
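For readers who prefer code to flowcharts, here is a minimal sketch of how the two schemes differ after the stage 1 analysis. It is not the simulation code used for the results below. The function `stage1.decision()` and its arguments are hypothetical names, and I assume that the re-estimated total sample size `n.est` has already been computed from the stage 1 CV.

```r
# Hypothetical helper: what happens after the stage 1 analysis (alpha 0.0294)?
#   be1    : TRUE if BE was shown in stage 1
#   pwr1   : power estimated from the stage 1 data
#   n1     : stage 1 sample size
#   n.est  : re-estimated total sample size (stage 1 CV, 80% power)
#   scheme : "B" (Potvin Method B) or "rumour"
stage1.decision <- function(be1, pwr1, n1, n.est, scheme = c("B", "rumour")) {
  scheme <- match.arg(scheme)
  if (be1) return(list(action = "stop: BE", n2 = 0))
  # Only Method B may stop for failure when power was already sufficient
  if (scheme == "B" && pwr1 >= 0.80)
    return(list(action = "stop: not BE", n2 = 0))
  n2 <- n.est - n1
  # 'rumour scheme': never stop, recruit at least two more subjects
  if (scheme == "rumour") n2 <- max(n2, 2)
  list(action = "stage 2", n2 = n2)
}

# Same (made-up) stage 1 outcome, two different consequences
stage1.decision(FALSE, 0.836, n1 = 24, n.est = 24, scheme = "B")
stage1.decision(FALSE, 0.836, n1 = 24, n.est = 24, scheme = "rumour")
```

With sufficient power in stage 1 and BE not shown, Method B stops, whereas the 'rumour scheme' continues with two more subjects.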

![[image]](img/uploaded/image111.png)
Example 1: CV 20%, T/R 0.95, α 0.0294, expected power with n1 24 is 83.6%. Sample size for a fixed design, 80% power, α 0.05 would be 20. Run on two machines (R 2.15.1, PowerTOST 0.9-10), 10⁶ simulations each.
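If you want to check the planning figures without any package: a rough sketch in base R, using the noncentral t approximation of TOST power for a balanced 2×2 crossover. This is an approximation, not the exact method; PowerTOST's `power.TOST()` and `sampleN.TOST()` are the proper tools. The helper names `power.approx()` and `n.fixed()` are made up for this illustration.

```r
# Approximate power of TOST in a balanced 2x2 crossover (log-scale),
# based on the noncentral t-distribution. Not the exact method.
power.approx <- function(alpha, CV, theta0, n, theta1 = 0.80, theta2 = 1.25) {
  se   <- sqrt(log(CV^2 + 1)) * sqrt(2 / n)  # SE of the T-R difference
  df   <- n - 2
  tc   <- qt(1 - alpha, df)                  # one-sided critical t-value
  ncp1 <- (log(theta0) - log(theta1)) / se
  ncp2 <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tc, df, ncp2) - pt(tc, df, ncp1))
}

# Smallest even n reaching the target power in a fixed-sample design.
n.fixed <- function(alpha, CV, theta0, target = 0.80) {
  n <- 4
  while (power.approx(alpha, CV, theta0, n) < target) n <- n + 2
  n
}

power.approx(alpha = 0.0294, CV = 0.20, theta0 = 0.95, n = 24)  # stage 1 power
n.fixed(alpha = 0.05, CV = 0.20, theta0 = 0.95)                 # fixed design
```

Both results should land close to the 83.6% and the 20 subjects quoted above; small deviations from the exact values are to be expected with the approximation.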
Potvin B
             Ratio 1.25               │              Ratio 0.95
──────────────────────────────────────┼──────────────────────────────────────
                         % in empiric │                          % in empiric
    n ( 5%, 50%, 95%) stage 2    α    │     n ( 5%, 50%, 95%) stage 2   1-β
 26.0    24   24   34    34.5  0.0320 │  24.6    24   24   28     8.6  0.8810
“rumour scheme”
             Ratio 1.25               │              Ratio 0.95
──────────────────────────────────────┼──────────────────────────────────────
                         % in empiric │                          % in empiric
    n ( 5%, 50%, 95%) stage 2    α    │     n ( 5%, 50%, 95%) stage 2   1-β
 27.2    26   26   34    97.1  0.0379 │  24.7    24   24   28    16.3  0.9037
 27.2    26   26   34    97.1  0.0379 │  24.7    24   24   28    16.4  0.9030
![[image]](img/uploaded/image112.png)
Example 2: CV 30%, T/R 0.95, α 0.0294, expected power with n1 24 is 41.1%. Sample size for a fixed design, 80% power, α 0.05 would be 40. Sims as above.
Potvin B
             Ratio 1.25               │              Ratio 0.95
──────────────────────────────────────┼──────────────────────────────────────
                         % in empiric │                          % in empiric
    n ( 5%, 50%, 95%) stage 2    α    │     n ( 5%, 50%, 95%) stage 2   1-β
 46.9    24   46   72    95.0  0.0475 │  39.9    24   38   70    58.3  0.8305
“rumour scheme”
             Ratio 1.25               │              Ratio 0.95
──────────────────────────────────────┼──────────────────────────────────────
                         % in empiric │                          % in empiric
    n ( 5%, 50%, 95%) stage 2    α    │     n ( 5%, 50%, 95%) stage 2   1-β
 46.8    26   46   72    97.1  0.0480 │  39.8    24   36   70    58.8  0.8303
 46.8    26   46   70    97.1  0.0482 │  39.9    24   36   70    58.9  0.8303
![[image]](img/uploaded/image113.png)
Good news: In both examples the patient’s risk is preserved. But note that the empiric α is slightly lower for Method B than for the ‘powerless method’.¹ In other words, EMA’s ‘method’ is more liberal. In the first example twice as many studies are sent to the second stage (16.3% vs. 8.6% at a T/R ratio of 0.95), caused by the ‘n2=n1+2 rule’, i.e., a total of at least n1+2 subjects. Studies which would already fail in stage 1 according to Method B are forced into the second stage.

Differences in the second example are less pronounced, because n1 is likely too small to claim BE already in the first stage.
Lesson learned: I always thought of Two-Stage designs as an opportunity to ‘save’ a study if assumptions about the variance turn out to be incorrect. I was never a friend of gambling and going for an interim analysis at a sample size where you would presumably fail and have to proceed to the second stage anyway.* This doubles the study time, and even if your assumptions were correct you have to pay the penalty in the total sample size (10–20% more subjects as compared to a fixed design). Since exhaustive simulations of the ‘rumour scheme’ are not published (has EMA performed any at all, or is this just a ‘gut-feeling approach’?), IMHO it would require simulations for every single study [sic] to demonstrate that the empiric α is ≤0.05.²
P.S.: Thanks to Detlew for finding a bug in my code.

- Instead of ‘powerless’ can we introduce the term ‘weak’?
- For 10⁶ simulations the one-sided critical value (5% level, exact binomial test) is 0.05035995. Only an empiric α larger than this critical value is significantly >0.05. This is also mentioned in the note below Table I of Potvin’s paper. If you are patient and opt for 10⁷ sims: 0.05011351… For R-freaks:
sims <- 1E6  # number of simulated studies
x <- 0.05  # nominal α
binom.test(round(x*sims), sims, alternative='less')$conf.int[2]  # upper limit of the one-sided 95% CI
- For similar reasons I fail to understand the application of O’Brien/Fleming in BE.
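As a quick sketch of how the critical value from the second footnote can be applied: the empiric α values below are copied from the tables above (Ratio 1.25); the vector names are my own labels, nothing official.

```r
sims <- 1e6
# Upper limit of the one-sided 95% CI of a proportion of exactly 0.05
crit <- binom.test(round(0.05 * sims), sims,
                   alternative = "less")$conf.int[2]

# Empiric alpha from the simulations (Ratio 1.25)
alpha.emp <- c(B.ex1        = 0.0320,
               rumour.ex1   = 0.0379,
               B.ex2        = 0.0475,
               rumour.ex2.a = 0.0480,
               rumour.ex2.b = 0.0482)

# TRUE would mean: significant inflation of the type I error
alpha.emp > crit
```

None of the simulated values comes anywhere near the critical value, which is in line with the ‘good news’ above.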
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!![[image]](https://static.bebac.at/pics/Blue_and_yellow_ribbon_UA.png)
Helmut Schütz
![[image]](https://static.bebac.at/img/CC by.png)
The quality of responses received is directly proportional to the quality of the question asked. 🚮
- Sims w.o. intermediate power, Helmut, 2012-07-26 16:20