## On the contrary, my dear Dr Watson! [Two-Stage / GS Designs]

Hi ElMaestro,

» I remember having heard EU regulators mention preference for method C out of consideration for the type I error.

What? Where?

» But I can't seem to find a presentation from anyone saying so.

Would surprise me if there is any.

» Do you […] have a link or a presentation by a regulator where this was stated?

Nope. The collaborative work about the type I error was removed from the work plan last year (see Paola Coppola’s presentation at BioBridges 2018).

No work plans were published this year for both parties due to Brexit. However, there is an unequivocal preference for methods which show analytically strict control of the TIE.[1,2,3] In my experience, European regulatory statisticians hate simulation-based methods.

At Wednesday’s workshop I endured a frustrating chat with a statistician from the Austrian agency AGES. A collection of errors and misconceptions:
• Simulation-based ‘type 2’ methods (e.g., Potvin C) lead to an inflated TIE.
Wrong. Even with the original adjusted α of 0.0294 the TIE is inflated only for n1 12–16 and CV 16–26%. That could easily be counteracted by a more conservative adjusted α of 0.0282.
• Kieser and Rauch[4] showed that 0.0294 is not correct.
Wrong. The authors didn’t show anything (in the sense of a proof) but lamented that 0.0294 is Pocock’s adjusted α for a one-sided test and that for equivalence the correct one is 0.0304. Right, but both are for a group-sequential design with a fixed sample size and one interim at exactly ½N.[a] That’s not what we have in a TSD with sample size re-estimation at the interim. When you inspect the electronic supplementary material of the paper (or better, perform simulations with a narrower grid) you will find a slight inflation of the TIE. In TSDs the adjustment depends on the ranges of n1 and CV, the fixed T/R-ratio, and the desired power. Incidentally, in Method B 0.0294 turns out to be conservative.[b] That’s the reason why regulators prefer B over C. For an example where Method C was not accepted see there. If you want to go with Method B, you could use an adjusted α of 0.0301. Quoting the GL: “… the choice of how much alpha to spend at the interim analysis is at the company’s discretion.”
The NLYW had a PhD in biostatistics and believed [sic] that 0.0304 is suitable in all settings. Jesus fucking Christ!
• Simulation-based methods are basically to be rejected in principle, since there are exact methods that control the TIE.
Well roared, lion! Such methods exist for 2×2 crossovers only, and only since our posters.[1,2] Given the rudimentary information in them, I have strong doubts that anybody ever successfully used the method. The R-code given by Maurer et al.[3] is almost useless. Practically, the method couldn’t be applied until we implemented it (THX to Ben!) in Power2Stage. In other words, that someone could have used the method before mid-2018 is wishful thinking.
An analogous version for repeated confidence intervals in parallel designs doesn’t exist at all. I don’t know anybody working on it. Not trivial for unequal group sizes and/or variances. Reply: “Doesn’t matter because parallel designs are rarely used in BE.” Wake up, girlie!
Was like talking to a brick wall.
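For the record, the 0.0304 from the second point can be reproduced in a few lines of base R. This is only a minimal sketch under a normal approximation: two looks with a common critical value, the interim at exactly ½N, and an overall one-sided α of 0.05. The function names `p_no_cross` and `pocock_alpha` are my own and not part of any package.

```r
# Probability that the common critical value 'crit' is crossed at neither look.
# Assumption: Z1 ~ N(0,1) at the interim (exactly N/2) and
# Z2 = (Z1 + W) / sqrt(2) at the end, with W ~ N(0,1) independent of Z1.
p_no_cross <- function(crit) {
  integrate(function(z) dnorm(z) * pnorm(sqrt(2) * crit - z),
            lower = -Inf, upper = crit)$value
}

# Pocock-type nominal one-sided level for two looks at overall level 'alpha'
pocock_alpha <- function(alpha) {
  crit <- uniroot(function(x) 1 - p_no_cross(x) - alpha,
                  interval = c(0.5, 4), tol = 1e-10)$root
  pnorm(crit, lower.tail = FALSE)
}

adj <- pocock_alpha(0.05)
print(round(adj, 4))
```

The result should land close to 0.0304. Note what goes in: a fixed total sample size and an interim at exactly half of it. Neither holds in a TSD with sample size re-estimation, which is why this number cannot be “suitable in all settings”.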

Yesterday I sent a clarification ~~e-mail~~ rant to Thomas Lang (AGES, member of the BSWP). Don’t expect to get a reply.

1. König F, Wolfsegger M, Jaki T, Schütz H, Wassmer G. Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation. 2014. doi:10.13140/RG.2.1.5190.0967.
2. König F, Wolfsegger M, Jaki T, Schütz H, Wassmer G. Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation. Trials. 2015; 16(Suppl 2): P218. doi:10.1186/1745-6215-16-S2-P218.
3. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–1607. doi:10.1002/sim.7614.
4. Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015; 34(16): 2403–16. doi:10.1002/sim.6487.

a. An all too often overlooked detail: If the interim is at <½N (due to dropouts) one has to use an error-spending function (e.g., Lan and DeMets, Jennison and Turnbull) to control the TIE.
b. One million simulations of a narrow grid (step size 2); TIEmax at n1 12 and CV 24%. Approximations by the shifted central t and the noncentral t, exact by Owen’s Q. Go for a cup of coffee. The exact method is very slow.
```r
library(Power2Stage)
pmethod <- c("shifted", "nct", "exact")
res     <- data.frame(method = pmethod, TIE = NA, speed = NA)
for (j in seq_along(pmethod)) {
  start        <- proc.time()[[3]]
  # theta0 = 1.25: simulate at the upper BE limit, so pBE estimates the TIE
  res$TIE[j]   <- power.tsd(method = "B", alpha = rep(0.0294, 2), n1 = 12,
                            CV = 0.24, GMR = 0.95, targetpower = 0.80,
                            theta0 = 1.25, pmethod = pmethod[j],
                            nsims = 1e6)$pBE
  res$speed[j] <- proc.time()[[3]] - start
}
# run times relative to the first method ("shifted")
res$speed <- signif(res$speed / res$speed[1], 3)
print(res, row.names = FALSE)
```

```
 method      TIE speed
shifted 0.048959  1.00
    nct 0.048762  1.44
  exact 0.048924 28.40
```

With alpha = rep(0.0301, 2):

```
 method      TIE speed
shifted 0.050004  1.00
    nct 0.049790  1.44
  exact 0.049693 28.50
```
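
As for the error-spending remark in the first footnote, here is a rough sketch of what that means when the interim ends up at less than ½N. It uses the Pocock-type spending function of Lan and DeMets, α(t) = α·ln(1 + (e − 1)·t), and again a normal approximation for a one-sided test, so treat it as an illustration and not as something validated for TOST. The names `spend_pocock` and `nominal_levels` are made up for this example.

```r
# Lan-DeMets spending function of the Pocock type:
# cumulative one-sided alpha spent at information fraction t (0 < t <= 1)
spend_pocock <- function(t, alpha = 0.05) alpha * log(1 + (exp(1) - 1) * t)

# Nominal one-sided levels for an interim at fraction t and a final analysis.
# Assumption: normal test statistics Z1, Z2 with correlation sqrt(t).
nominal_levels <- function(t, alpha = 0.05) {
  a1 <- spend_pocock(t, alpha)          # alpha spent at the interim
  c1 <- qnorm(1 - a1)                   # critical value at the interim
  # probability of not crossing c1 at the interim but crossing c2 at the end
  cross2 <- function(c2) {
    integrate(function(z) dnorm(z) *
                pnorm((c2 - sqrt(t) * z) / sqrt(1 - t), lower.tail = FALSE),
              lower = -Inf, upper = c1)$value
  }
  # choose c2 so that the final look spends exactly what is left
  c2 <- uniroot(function(x) cross2(x) - (alpha - a1),
                interval = c(0, 5), tol = 1e-10)$root
  c(alpha1 = a1, alpha2 = pnorm(c2, lower.tail = FALSE))
}

lv <- nominal_levels(t = 0.4)           # interim at 40% of N, e.g., after dropouts
print(round(lv, 4))
```

The point is simply that the nominal levels change with the information fraction. Plugging in the levels of an interim at exactly ½N when it actually took place earlier is not the same design.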

Cheers,
Helmut Schütz
