## Now what? w & w* examples [Two-Stage / GS Designs]

Dear All,

Sorry for my rather late reply. A lot of very good comments have been made about the new function power.tsd.in. I hope there will be applications and further investigations of it in the future. As Detlew already mentioned: the type I error is controlled regardless of the scenario (we know this from theory, no simulations needed). In my opinion this makes it very valuable.

I will try to comment on some of the points made.

» What about alpha0 for method C? Is it deprecated?

I hope you (mittyri) had a look at the references and found some hints on it. alpha0 simply does not exist in this method. For the inverse normal method we always need (only) two adjusted alpha values, one for stage 1 and another for stage 2. The function also allows you to specify only one value, which is for your convenience: it will then calculate the adjusted ones internally.
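To give an idea of what "calculate the adjusted ones internally" can mean, here is a minimal base-R sketch. It is not taken from the package code. It assumes the standard combination test with weight w and the same adjusted level at both stages, and the function names `reject_prob` and `adjusted_alpha` are made up for this illustration.

```r
# Sketch only: adjusted one-sided level for a two-stage inverse normal test,
# ASSUMING equal adjusted levels at stage 1 and stage 2 (hypothetical helpers,
# not Power2Stage functions).

# Probability under the null of rejecting at stage 1 (Z1 > crit) or at the
# final analysis (sqrt(w) * Z1 + sqrt(1 - w) * Z2 > crit), Z1 and Z2 iid N(0, 1).
reject_prob <- function(crit, w) {
  # P(no rejection) = integral over z1 <= crit of
  # dnorm(z1) * P(Z2 <= (crit - sqrt(w) * z1) / sqrt(1 - w))
  integrand <- function(z1) {
    dnorm(z1) * pnorm((crit - sqrt(w) * z1) / sqrt(1 - w))
  }
  1 - integrate(integrand, lower = -Inf, upper = crit)$value
}

# Find the common critical value that spends exactly 'alpha' overall and
# convert it back to a nominal one-sided level.
adjusted_alpha <- function(alpha = 0.05, w = 0.5) {
  crit <- uniroot(function(x) reject_prob(x, w) - alpha,
                  interval = c(1, 4), tol = 1e-10)$root
  pnorm(crit, lower.tail = FALSE)
}

adjusted_alpha(alpha = 0.05, w = 0.5)
```

With w = 0.5 and an overall one-sided level of 0.05 this should give a value close to 0.0304, i.e. clearly above the naive 0.025 but below 0.05.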

» isn't it possible that we get some value lower than 4?

It is actually never possible to get a sample size smaller than 4. All sample size functions used in this R package return at least 4, and thus this criterion implicitly applies to all functions within Power2Stage.

Comment on the weights:
Detlew already made some important remarks. I can only highlight again that the standard combination test for the inverse normal method already uses 2 weights, but only one (w) needs to be specified because the second is just 1-w. For the maximum combination test we have 2 pairs of weights (so 4 in total), but again only the first weight of each pair is relevant. Those two first weights need to be specified in the argument weight.
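To make the role of the weights concrete, here is a small base-R sketch of the two combination statistics. The function names `inverse_normal` and `max_combination` and the p-values are invented for illustration; p1 and p2 stand for the one-sided stage-wise p-values.

```r
# Sketch only (hypothetical helper names, made-up p-values).

# Standard inverse normal combination: only w is needed, the second weight is 1 - w.
inverse_normal <- function(p1, p2, w) {
  z1 <- qnorm(p1, lower.tail = FALSE)  # stage 1 z-score
  z2 <- qnorm(p2, lower.tail = FALSE)  # stage 2 z-score
  sqrt(w) * z1 + sqrt(1 - w) * z2
}

# Maximum combination test: two pairs (w, 1 - w) and (w_star, 1 - w_star),
# the test statistic is the larger of the two combined z-scores.
max_combination <- function(p1, p2, w, w_star) {
  max(inverse_normal(p1, p2, w), inverse_normal(p1, p2, w_star))
}

inverse_normal(p1 = 0.04, p2 = 0.03, w = 0.5)
max_combination(p1 = 0.04, p2 = 0.03, w = 0.5, w_star = 0.25)
```

So although four weights are involved in the maximum combination test, only w and w* ever have to be chosen.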

» Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).

Sounds interesting. At first glance this seems nice, but I am a bit puzzled by it. "Safety net" sounds as if we have a rather good understanding of the CV, but in case we observe some unforeseen value we have the possibility to add some extra subjects. However, in such a case we could just go with a fixed design and adapt the power. In a TSD setting we typically have no good understanding of the CV... Am I missing something here? Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will end up. For n1 I would then rather use the lower end of this range. Comments?

Usual practice when dealing with adaptive designs has been to define not just n1 but also n2 (the theory and examples were introduced for superiority designs). One way of doing that is to calculate the sample size of a fixed design and then ask after what fraction of it we want to take a look at the data+. This is what Maurer et al. do, and they choose 50% for the interim look, so n1 equals n2. If we assume that all subjects are evaluable, this gives a weight w of 0.5 for the first pair of weights. For superiority trials it is common practice not to go below n2 (in case the second stage is performed). Thus, if we want to apply the maximum combination test, a second weight w* greater than w does not make sense.

For the BE setting this all seems to be different. Here, n2 is flexible and can also be lower than initially planned. In fact, the initially planned stage 2 sample size is not really formally defined (although it theoretically exists, at least if you calculate n1 from some fixed design sample size). This makes the decision on how to define the two pairs of weights even harder. There is no unique way of defining the weights. One could, for example, perform some optimization procedure (with a side condition that fixes either power or sample size). Unfortunately I currently don't have an ideal solution to this either.
+ Note that the intention for the interim look may be very different for superiority trials than for BE trials.
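The link between planned sample sizes and weights can be written down in a few lines of base R. This is only a sketch: the helper `stage1_weight` and the value of `n_fix` are invented, and all subjects are assumed to be evaluable.

```r
# Sketch only: weight of stage 1 as the fraction of subjects in stage 1
# (hypothetical helper; assumes all subjects are evaluable).
stage1_weight <- function(n1, n2) n1 / (n1 + n2)

n_fix <- 40                  # made-up fixed design sample size
n1 <- n_fix / 2              # interim look after 50%
n2_planned <- n_fix - n1     # planned stage 2, so n1 equals n2

w <- stage1_weight(n1, n2_planned)             # first weight of the first pair

# Candidate second weights for the maximum combination test:
w_star_larger_n2  <- stage1_weight(n1, 3 * n1) # stage 2 larger than planned, w* < w
w_star_smaller_n2 <- stage1_weight(n1, n1 / 2) # stage 2 smaller than planned, w* > w

c(w = w, larger_n2 = w_star_larger_n2, smaller_n2 = w_star_smaller_n2)
```

A stage 2 that is smaller than planned corresponds to w* > w, which is exactly the case that is ruled out for superiority trials but is conceivable in the BE setting.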

» Take nfix/2 as sample size in stage 1 (Maurer et al.)
» ...

» Too optimistic 0.25 0.95 0.5 0.4 0.822 37.1 34 78

I can't reproduce the numbers. For ASN I get 35 and for power 0.79468.

» ...
» Too pessimistic specifications result in higher power and lower expected sample size (!) (at least for CVs around 0.2)

See also my comment above on the safety net. I am wondering: would we actually plan n1 such that we are (too) pessimistic? I would say: no.

» Too optimistic specifications may result in lower power and higher expected sample size (!)

This brings us back to: we should plan with a realistic/slightly optimistic scenario.

Best regards,
Ben.