## Science fiction [Two-Stage / GS Designs]

Hi Mauricio,

» Are there any problem if, in the same protocol, replicated 4x2 and two-stage-design are considered?

A lot of problems. I agree with ElMaestro.

» For example, in first group were used 96 subjects on replicated (4x2) design with IC 95%. The result wasn't bioequivalent and the power was less than 80%.

Wow. I guess by “IC” you mean the GMR or T/R-ratio, right?* For FDA’s RSABE that would imply a CV of >361% and for EMA’s ABEL still a CV of >166%. What a nasty drug/formulation! BTW, for HVDs / HVDPs assuming a GMR of 95% is not a good idea. The two Lászlós recommend 90% – even if a “better” one was observed in a previous study.
You cannot simply assess the study for BE (α 0.05 or 90% CI) – that’s an add-on design, which was shown to inflate the TIE.<sup>1,2</sup> In TSDs you have to use an adjusted α (at least if you proceed to the second stage).

» Therefore, a second group was added with 48 subjects, on replicated (4x2) design with the same IC 95%. In the end, first and second stage were combined and the result was bioequivalent …

… with a completely unknown type I error. If you used 0.05 already in the first stage you are dead.

» I am to consider this strategy because I don't know how much variability of drug is that!

Reference-scaling was developed to deal with the CV. If the CV turns out to be higher than expected you are allowed to scale more – and don’t lose too much power.
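To make “scale more” concrete, here is a minimal Python sketch of EMA’s expanded limits (ABEL). It assumes the regulatory constant k = 0.760, no scaling at CVwR ≤ 30%, and the cap at CVwR = 50%; the function name `abel_limits` is only illustrative. The additional GMR-restriction (point estimate within 0.80–1.25) is not part of this function.

```python
import math

def abel_limits(cv_wr, k=0.760, cv_switch=0.30, cv_cap=0.50):
    """Acceptance limits for EMA's ABEL given the reference's within-subject CV.

    cv_wr is a fraction (0.40 = 40%). Defaults are assumptions taken from
    the EMA approach: k = 0.760, scaling only above 30%, cap at 50%.
    """
    if cv_wr <= cv_switch:
        # No scaling: conventional limits apply.
        return 0.80, 1.25
    # CVs above the cap are treated as if CV were equal to the cap.
    cv = min(cv_wr, cv_cap)
    # Within-subject standard deviation on the log scale.
    s_wr = math.sqrt(math.log(cv**2 + 1.0))
    return math.exp(-k * s_wr), math.exp(k * s_wr)

for cv in (0.25, 0.30, 0.40, 0.50, 0.80):
    lower, upper = abel_limits(cv)
    print(f"CVwR {cv:.0%}: {lower:.4f} to {upper:.4f}")
```

The limits widen with the CV between 30% and 50% and stay constant beyond that, which is why a higher-than-expected CV costs less power than in a conventional design.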

Problems arise not from the CV but from the GMR! BTW, most TSDs assume a fixed GMR. Fully adaptive ones (i.e., adjusting for the observed GMR in the first stage) require a futility criterion and quite often lack power.<sup>2,3,4</sup>

» Is it possible?

Not yet – unless you have access to a massively parallel supercomputer. You would have to find a suitable adjusted α and demonstrate beforehand that the overall type I error is maintained. Unlike in conventional (crossover, parallel) designs, due to the mixed strategy (GMR-restriction of 0.80–1.25, no scaling at CV <30%; CVs >50% treated as if CV=50% for EMA) the power/sample-size estimation needs 10<sup>5</sup> simulations. Combine that with the 10<sup>6</sup> (slow convergence) needed to simulate the TIE in an entire grid of possible n<sub>1</sub>/CV-combinations. You’ll end up with 10<sup>13</sup>–10<sup>14</sup> simulations…
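A back-of-the-envelope check of these numbers, as a minimal Python sketch. Two things here are my assumptions and not part of the estimate above: the grid size (100 to 1,000 combinations, which is what the quoted totals imply) and the throughput of 5,000 simulations per second, which is a purely hypothetical placeholder and not a measured value.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def total_simulations(grid_points, sims_power=10**5, sims_tie=10**6):
    """Simulations needed if every TIE simulation at every grid point
    requires its own simulation-based power/sample-size estimation."""
    return grid_points * sims_power * sims_tie

def runtime_years(total, sims_per_second):
    """Wall-clock years when running 24/7 at a given (assumed) throughput."""
    return total / (sims_per_second * SECONDS_PER_YEAR)

for grid_points in (100, 1000):  # assumed grid sizes
    total = total_simulations(grid_points)
    years = runtime_years(total, sims_per_second=5000)  # hypothetical speed
    print(f"{grid_points:>5} grid points: {total:.0e} simulations, "
          f"about {years:.0f} years")
```

Whatever throughput you plug in, the product of the three factors is what hurts: speeding up a single simulation tenfold still leaves years of run-time at the upper end of the grid.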

Recently I faced an example where the sponsor (despite serious warnings from the CRO) insisted on a similar design. The sponsor is always right. A regulator asked for justification of the chosen α. I made a quick estimation (I have a very fast workstation): ~60 years running 24/7…
You don’t want to go there.

1. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015;32(1):135–43. doi:10.1007/s11095-014-1450-z
2. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015;71(3):271-81. doi:10.1007/s00228-015-1806-2
3. Fuglsang A. Futility rules in bioequivalence trials with sequential designs. AAPS J. 2014;16(1):79–82. doi:10.1208/s12248-013-9540-0
4. Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015;34(16):2403–16. doi:10.1002/sim.6487

• After reading your post again, I think I was in error. By IC you mean the confidence interval (<span lang="pt">intervalo de confiança</span>). Forget what I wrote about the GMR. So it seems that you applied Bonferroni’s α 0.025 (95% CI). Will this control the TIE? Nobody knows. Reference-scaling itself might lead to an inflated TIE (see Wonnemann’s paper and this thread). Anyhow, you would have to prove that the TIE is controlled.
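For readers not used to the α/CI bookkeeping: BE is assessed with two one-sided tests, so α corresponds to a 100(1 − 2α)% confidence interval. A minimal Python sketch of the Bonferroni split over two stages (function names are illustrative only). It shows the arithmetic and nothing more; it says nothing about whether the TIE is actually controlled, least of all with reference-scaling.

```python
def bonferroni_alpha(overall_alpha=0.05, n_looks=2):
    """Split the overall alpha equally across the planned analyses."""
    return overall_alpha / n_looks

def ci_level(alpha):
    """Two-sided CI level matching two one-sided tests at level alpha."""
    return 1.0 - 2.0 * alpha

alpha_adj = bonferroni_alpha()  # 0.05 split over two stages
print(f"conventional: alpha 0.05 -> {ci_level(0.05):.0%} CI")
print(f"Bonferroni:   alpha {alpha_adj} -> {ci_level(alpha_adj):.0%} CI")
```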

Dif-tor heh smusma 🖖
Helmut Schütz
