Sequential designs (history and future) [Power / Sample Size]

posted by Helmut – Vienna, Austria, 2010-03-29 22:43 – Posting: # 4990

Dear bears!

❝ So in the case of sequential designs, when we get non BE results,...


Almost. It is important in any sequential design not to ‘consume’ the entire alpha-risk at the interim looks. In other words, you must not evaluate the study for BE at the interim (i.e., calculate the 90% CI and check for inclusion in the acceptance range). If you do that, your entire alpha-risk is gone - nothing is left for further looks. This was actually the problem with the Canadian and Japanese methods. Sequential designs have a long tradition in phases II/III of clinical development; the alpha level at each look is selected in such a way that the overall alpha-risk is maintained at ≤0.05.

❝ … the first thing is to check if the power is o.k. (defined as equal to or greater than 80%).


Yes, according to Potvin et al. It’s important to note that the value calculated here is not some kind of a posteriori (post hoc) power, but merely an instrument for deciding whether the study is stopped after Stage 1 (evaluated at alpha 0.05: pass/fail) or enters the right branch of the flow chart.
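Just to make that decision step concrete, a minimal sketch with power.TOST() from the PowerTOST package - the CV and n below are made-up Stage 1 results, not values from Potvin’s paper:

```r
library(PowerTOST)                      # power.TOST() for the 2x2 cross-over

CV1 <- 0.25                             # CV estimated in Stage 1 (made-up)
n1  <- 12                               # subjects in Stage 1 (made-up)
# 'Power' is calculated at alpha 0.05 with the Stage-1 CV and the originally
# assumed T/R ratio (here 0.95) - it only steers the decision tree.
pwr <- power.TOST(alpha = 0.05, CV = CV1, n = n1, theta0 = 0.95,
                  theta1 = 0.80, theta2 = 1.25, design = "2x2")
if (pwr >= 0.80) {
  cat(sprintf("Power %.1f%%: evaluate Stage 1 at alpha 0.05 (pass/fail).\n", 100*pwr))
} else {
  cat(sprintf("Power %.1f%%: evaluate Stage 1 at alpha 0.0294.\n", 100*pwr))
}
```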

❝ If not, then we can go to Stage 2 by recruiting more subjects.


You are too fast. If power <80% we evaluate Stage 1 at alpha 0.0294 (instead of 0.05). In my workshops many people are scared by this alpha-value (i.e., a 94.12% CI instead of a 90.00% CI). In the ol’ days of BE testing a 95% CI was applied all the time. In reality there is not a big difference in sample sizes: for CV 20%, an expected deviation of ±5% (T/R 0.95), the 80-125% range, and 80% power, the sample sizes are 19 (alpha 0.05) and 23 (alpha 0.0294). Or the other way ’round: if you just miss the 80% (say one drop-out brings power at alpha 0.05 down to 79.1%), power at alpha 0.0294 is still 69.3% - a pretty good chance of showing BE at Stage 1 and stopping. Only if we fail here do we advance to Stage 2.
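If you want to play with such numbers yourself, a small PowerTOST sketch (note that sampleN.TOST() rounds up to balanced sequences, so it may report the next even number rather than 19 or 23):

```r
library(PowerTOST)

# Fixed-sample design vs. Stage 1 at the adjusted alpha:
# CV 20%, expected ratio 95%, limits 80-125%, target power 80%.
sampleN.TOST(alpha = 0.05,   CV = 0.20, theta0 = 0.95, targetpower = 0.80,
             theta1 = 0.80, theta2 = 1.25, design = "2x2")
sampleN.TOST(alpha = 0.0294, CV = 0.20, theta0 = 0.95, targetpower = 0.80,
             theta1 = 0.80, theta2 = 1.25, design = "2x2")

# Power with 18 subjects (one drop-out) at both alpha levels:
power.TOST(alpha = 0.05,   CV = 0.20, theta0 = 0.95, n = 18, design = "2x2")
power.TOST(alpha = 0.0294, CV = 0.20, theta0 = 0.95, n = 18, design = "2x2")
```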
Another argument I’ve heard a couple of times is: ‘Costs! We will need more subjects in a two-stage design.’ Maybe. Simulations show a penalty of ~10% compared with fixed-sample designs - but only if one assumes that (s)he knows the ‘correct’ variance beforehand. If one has a lot of experience with a formulation (own studies, same analytical method) and the CV is ‘stable’, fine. But with an uncertain estimate of the CV (literature data only or a small pilot study), would you really want to ‘save’ 10% of the budget and end up with an upper CI of 125.94%? ;-)
My suggestion is to power the first stage of the study as if it were a fixed-sample design. If expectations come true - business as usual: conventional statistical model, 90% CI, everybody is happy. If not, you get a second chance!

❝ According to Potvin D, et al., 2008, there seems to be only one chance (Methods A-D).


Yes. The methods were validated only for a single interim analysis. There are other methods which would allow more than one look at the data (e.g., Gould 1995). However, in the EU only a two-stage design is acceptable.

❝ If we like to add the sequential designs into bear, we have to implement the methods: to evaluate BE at Stage 1 (alpha = 0.0294) and to calculate sample size based on Stage 1 and alpha = 0.0294, and finally to evaluate BE at Stage 2 using data from both stages (alpha = 0.0294).


In principle, yes. It is important that you don’t fall into the trap of calculating the sample size based on the point estimate of Stage 1. Potvin’s method is not a fully adaptive design; it was not validated to adjust for the effect size, but only for the unknown variance estimated in Stage 1. If you planned the first stage for an expected point estimate of 95% and get only 90% (or an even more tempting 100%), you must not use this value in the sample-size estimation, but the original one! You may simply be caught by a random walk.
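A sketch of that sample-size step (the Stage-1 CV and n1 below are hypothetical; the crucial detail is that theta0 stays at the originally assumed 0.95, while only the CV comes from Stage 1):

```r
library(PowerTOST)

CV1    <- 0.28    # CV estimated in Stage 1 (hypothetical)
n1     <- 20      # subjects in Stage 1 (hypothetical)
theta0 <- 0.95    # originally assumed ratio - NOT the Stage-1 point estimate!

# Total sample size at alpha 0.0294, based only on the Stage-1 CV ...
ntot <- sampleN.TOST(alpha = 0.0294, CV = CV1, theta0 = theta0,
                     theta1 = 0.80, theta2 = 1.25, targetpower = 0.80,
                     design = "2x2", print = FALSE)[["Sample size"]]
# ... of which Stage 2 recruits the remainder.
n2 <- ntot - n1
cat("Subjects to be recruited in Stage 2:", n2, "\n")
```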
Another important point: there’s no futility rule in the method (the sample size estimation may come up with 3142 subjects for Stage 2). Ethics committees may be surprised by that, because they are familiar with ‘early stopping rules’ in clinical trials. Again, the method was not validated for such a rule. There’s only the possibility for the sponsor to pull the ripcord on non-statistical grounds.

❝ The alpha level is not commonly used in statistics. However, it seems feasible, I guess.


Oh yes, it is quite common in clinical trials. Actually 0.0294 was first proposed by Armitage (1975) and Pocock (1977), I guess. This value is the one with the longest tradition, but there are many, many others as well. For some links, see this post.

If you plan to implement the method in bear, don’t forget to modify the statistical model if the study is evaluated after pooling (see this thread). See also the last sentence of the Two-stage design section of EMA’s guideline:

When analysing the combined data from the two stages, a term for stage should be included in the ANOVA model.
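In R (e.g., within bear) the pooled analysis could look roughly like the fixed-effects model below. Treat it as a sketch of the sentence quoted above, not as bear’s actual code - the data-frame and factor names are assumptions:

```r
# Pooled data of both stages: one row per subject and period, with factors
# subj, per, seq, trt, stage and the log-transformed response logPK (assumed names).
m <- lm(logPK ~ seq + subj %in% seq + stage + per %in% stage + trt,
        data = pooled)

# Two-sided 1 - 2*alpha CI of T vs. R, back-transformed from the log scale
# (coefficient name 'trtT' assumes a factor trt with levels R and T).
alpha <- 0.0294
exp(confint(m, "trtT", level = 1 - 2 * alpha))
```

The level 1 - 2×0.0294 gives the 94.12% CI used when the data of both stages are pooled, as described above.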


Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked.
