## Explained crap remains crap [Two-Stage / GS Designs]

Dear Detlew!

❝ I'm quite sure: This is because of the simulation error. The differences of the TIE without and with min.n2 are so small.

❝ Any try with a different seed of the random number generator may and will change the comparison.

As usual you are right.
The standard error of a single estimate from 1 million simulations is $$\small{\sqrt{0.5\alpha/10^6}\approx 0.00016}$$. With random seeds the results spread, but the trend is obvious:

25 replicates; blue dots fixed seeds, light blue dots random seeds. Model fits, 95% and 99% prediction intervals.
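
To put the simulation error into numbers, here is a minimal sketch that evaluates the standard error formula given above and, as a rough yardstick, the noise to expect in the difference of two TIE estimates. The independence of the two estimates and the normal approximation are my assumptions, not part of the simulations shown in the plot; with a common fixed seed the estimates are correlated and the difference is less noisy.

```python
import math

ALPHA = 0.05       # nominal significance level
N_SIMS = 10**6     # simulated studies per TIE estimate

# Standard error of a single TIE estimate, using the formula from the post
se_single = math.sqrt(0.5 * ALPHA / N_SIMS)

# Assumption: the two estimates (without / with min.n2) are independent,
# so the standard error of their difference is sqrt(2) times larger
se_diff = math.sqrt(2) * se_single

# Assumption: normal approximation, approximate 95% noise band
noise_band = 1.96 * se_diff

print(f"SE of one estimate:   {se_single:.5f}")
print(f"SE of a difference:   {se_diff:.5f}")
print(f"Approx. 95% band:  +/-{noise_band:.5f}")
```

Differences in the TIE smaller than this band cannot be told apart from simulation noise in a single pair of runs, which is why the replicates are needed to see the trend.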

Walking in the footsteps of zizou and trying an argument:
• In all methods the sample size of the second stage is estimated based on the adjusted α, the – generally – fixed GMR, the target power, and the CV observed in the interim. The respective adjusted α was validated only for these conditions (none of the published methods used a minimum n2). [a]
• Arbitrarily increasing n2 is a relevant change of the framework, which would force it outside its validated boundaries. The chance to demonstrate BE (i.e., falsely rejecting the true Null) increases as well, and hence so does the Type I Error.
Since the adjusted α of 0.0294 in Potvin’s [1] Method B is overly conservative, the Type I Error fortunately remains controlled under ANVISA’s requirement (see the first plot above). However, this might not be the case with other methods, where the adjusted α gives a TIE closer to the nominal 0.05.

Consequences for the Consulta Pública N° 760:
• I don’t see why one should treat more subjects than necessary. IMHO, that’s not ethical. [b]
• If it is implemented in its current form, one is bound to Potvin’s Method B.
That’s stupid because the method is only applicable for a GMR of 0.95 and 80% power. It is not fully adaptive (i.e., it does not use the PE of the interim) and has no futility rules (maximum sample size, early stopping due to an extreme PE, etc.).
• Type 2 TSDs are seemingly not acceptable. Why? Fine for the FDA and Health Canada…
• Hopefully we can convince ANVISA that other methods are valid as well. However, if ANVISA insists on n2 ≥ 50% n1, simulations are mandatory to find a suitable – potentially lower – adjusted α.
• When dealing with crossover designs nowadays, I would leave the simulation-based methods aside and recommend Maurer’s [2] approach instead. It is the most flexible one and allows specifying a minimum n2 (≥ 4) whilst controlling the TIE in the strict sense.
NB, the minimum n2 of the method is 4 (required for the ANOVA of the second stage).
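
To illustrate what a minimum n2 does to the estimated second-stage sample size, here is a small sketch. The function name `apply_min_n2` and its parameters are hypothetical; the 50% fraction is the requirement discussed above, the absolute floor of 4 is the minimum of Maurer’s method, and rounding up to an even number assumes a 2×2 crossover with balanced sequences. The estimated n2 itself is taken as given.

```python
import math

def apply_min_n2(n2_est: int, n1: int, min_fraction: float = 0.5,
                 floor: int = 4) -> int:
    """Second-stage sample size after enforcing a minimum n2.

    n2_est       -- n2 estimated from the interim (assumed to be given)
    n1           -- sample size of the first stage
    min_fraction -- required minimum of n2 as a fraction of n1
    floor        -- absolute minimum of n2 (hypothetical default: 4)
    """
    required = max(n2_est, math.ceil(min_fraction * n1), floor)
    # Assumption: round up to the next even number (balanced sequences)
    return required + (required % 2)

# Estimated n2 below 50% of n1: the minimum takes over
print(apply_min_n2(n2_est=6, n1=24))
# Estimated n2 already above the minimum: unchanged
print(apply_min_n2(n2_est=20, n1=24))
```

In the first call six subjects were estimated but twelve are dosed, i.e., subjects beyond what the validated framework asks for. This is exactly the situation in which the adjusted α would have to be re-checked by simulations.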

a. The minimum n2 of two subjects given in the EMA’s Q&A document is nonsense for obvious reasons: If a second stage can be initiated (study failed in stage 1 and interim power below target), any software will come up with balanced sequences. What’s the minimum? Guess.
b. Sponsors will like the increased power (see the second plot above). However, regulators should be interested in protecting the public health and not the profits of the industry.

1. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294.
2. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–607. doi:10.1002/sim.7614.

