Dear all,
since we published our paper^{1} about the Type I Error, I have received questions about its application, regulatory acceptability, etc. In the following I summarize my answers.
- Q Is the method accepted by regulatory agencies?
A I don’t know. Since the paper was published quite recently, I don’t think that any study – even if some were performed accordingly in the meantime – has made it through regulatory assessment so far.
- Q Is there a reason why an agency should not accept the method?
A Not at all! Even if an agency (following the BSWP’s opinion of “We don’t like simulations!”) believes [sic] that the entire approach is nonsense, this is not a sufficient reason to reject it. The adjusted α is always smaller than the nominal α of 0.05. In other words, you care more about the patient’s risk than they do (the method is more conservative than the one recommended in the guideline). You can’t be punished for being cautious (i.e., using a wider CI). Nobody can reasonably argue against an α <0.05. Remember that in the early days of BE we routinely used a 95% CI and not a 90% CI!
- Q How to deal with the sample size penalty?
A First of all, one has to increase the sample size (in order to maintain the desired power) only in the critical region around CV_{wR} 30%. The sample size penalty drops quickly and vanishes completely for “true” HVD(P)s (say CV_{wR} >40%); see Fig. 3b of the paper.
Example: RTRT|TRTR, CV_{wR} 35%, assumed T/R-ratio 0.90, target power 80%. This calls for a study in 34 subjects (81.2% power). With the nominal α 0.05 the Type I Error would be 0.0656. If one evaluates the study with the adjusted α 0.03630 (92.74% CI), power will drop to 77.3%. It is up to the sponsor to decide whether this is still acceptable. Of course one could increase the sample size to 38 (81.0% power). Note that in this case another (slightly smaller) adjusted α is calculated.
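The example can be retraced in R. What follows is only a sketch: `ci.level()` is a hypothetical helper introduced here for illustration, and the PowerTOST part runs only if the package is installed. The calls use the package's `sampleN.scABEL()`, `scABEL()`, `power.scABEL()`, and `scABEL.ad()` with their default regulator setting (EMA); the Type I Error is obtained as "power" with the true ratio placed at the upper scaled limit.

```r
# Hypothetical helper: confidence level (in %) that belongs to a given alpha
ci.level <- function(alpha) 100 * (1 - 2 * alpha)
print(ci.level(0.05))    # nominal alpha, i.e., the conventional 90% CI
print(ci.level(0.0363))  # adjusted alpha of the example

if (requireNamespace("PowerTOST", quietly = TRUE)) {
  library(PowerTOST)
  # Sample size for RTRT|TRTR ("2x2x4"), CVwR 35%, T/R-ratio 0.90, 80% power
  sampleN.scABEL(CV = 0.35, theta0 = 0.90, targetpower = 0.80,
                 design = "2x2x4")
  # Type I Error at the nominal alpha: true ratio at the upper scaled limit
  U <- scABEL(CV = 0.35)[["upper"]]
  print(power.scABEL(alpha = 0.05, CV = 0.35, theta0 = U, n = 34,
                     design = "2x2x4", nsims = 1e6))
  # Iteratively adjusted alpha and the power which goes with it
  scABEL.ad(CV = 0.35, theta0 = 0.90, n = 34, design = "2x2x4")
}
```

The results should land close to the figures quoted in the example (34 subjects, TIE 0.0656, adjusted α 0.03630); small differences are possible with other package versions or numbers of simulations.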
- Q Should I specify the adjusted α in the protocol?
A No. The actual study will require an adjusted α which likely is different from what was planned (different CV_{wR}, smaller sample size due to dropouts, unbalanced sequences). If the CV_{wR} turns out to be closer to 30% than expected and one evaluates the study with what was stated in the protocol, the TIE would be inflated. On the other hand, if the CV_{wR} turns out to be farther away from the critical region, one would have to adjust less and therefore gain power. To quote the conclusions of the paper:
“It should be sufficient for regulatory acceptance to unambiguously specify the method in the study protocol, which was shown to be more conservative than the current recommendations in any case.”
Hence, one should state in the protocol what was assumed (CV_{wR}, n), what is expected based on it (adjusted α), and that the actual adjusted α will be derived from the study’s data. It will not hurt to make clear that the adjusted α will be ≤0.05.
- Q I do not understand how the underlying statistical distributions are used in the simulations. Is that correct at all?
A Good point. A similar question was raised by one of the reviewers – and it took us two months* to compare the method introduced by Zheng et al.^{2} with simulations of subject data. For the results see the Supplementary material 1 and 2. In short: The agreement is fine for the full replicate designs (RTRT|TRTR and RTR|TRT) and sufficient for the partial replicate (RRT|RTR|TRR). If you don’t trust it: since v1.4-4 PowerTOST contains the new function power.scABEL.sdsims(). A comparison:
power.scABEL(CV=0.3, design="2x2x4", n=34, theta0=1.25, nsims=1e6)
# [1] 0.081626
power.scABEL.sdsims(CV=0.3, design="2x2x4", n=34, theta0=1.25, nsims=1e6)
# [1] 0.081602
You have to endure runtimes which are 50–300× longer than those of power.scABEL(). On GitHub I updated the functions scABEL.ad() and sampleN.scABEL.ad() accordingly. Expect them on CRAN in April.
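Why theta0=1.25 in the comparison above? At CV_{wR} 30% the scaled limits coincide with the conventional 80.00–125.00%, so "power" at a true ratio of 1.25 is the empiric Type I Error. A minimal base-R sketch of the EMA's expanded limits, assuming the guideline's regulatory constant 0.760, the switching CV_{wR} of 30%, and the cap at CV_{wR} 50%; the helper name `abel.limits()` is hypothetical and not part of PowerTOST.

```r
# Hypothetical helper: EMA expanded limits (ABEL) from CVwR given as a ratio
abel.limits <- function(CVwR, k = 0.760, CVswitch = 0.30, CVcap = 0.50) {
  # At or below the switching CV the conventional limits apply
  if (CVwR <= CVswitch) return(c(lower = 0.80, upper = 1.25))
  # Within-subject SD of the reference; expansion is capped at CVcap
  swR <- sqrt(log(min(CVwR, CVcap)^2 + 1))
  c(lower = exp(-k * swR), upper = exp(k * swR))
}

print(sqrt(log(0.30^2 + 1)))  # sigma_wR at CVwR 30%
print(abel.limits(0.30))      # conventional limits
print(abel.limits(0.35))      # expanded limits of the 35% example
print(abel.limits(0.60))      # capped at the limits for CVwR 50%
```

For any CV_{wR} above 30% the Type I Error has to be assessed at the respective expanded limit, not at 1.25.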
- Q The functions in PowerTOST cover balanced and unbalanced studies. How to deal with incomplete data (e.g., data set I of the Q&A)?
A That’s a valid point. First of all, we could only use the slow subject data simulations. We would have to eliminate missing periods from the simulated data sets as in the study. There are some problems: How to unambiguously (read: foolproof) specify in the function calls which data are missing? The degrees of freedom have to be modified, which would require a major rewrite of the code. Doable but cumbersome. We are thinking about it; don’t expect anything in the near future.
Answers to questions not asked:
- If the study was performed in a full replicate design you could (or better: should) take the additional information about CV_{wT} into account. Example: RTRT|TRTR, CV_{wR} = CV_{wT} 35%, T/R-ratio 0.90, 34 subjects. Adjusted α 0.03630 (post hoc power 77.3%). Now: CV_{wR} 35%, CV_{wT} 30%. Adjusted α 0.03512 (post hoc power 81.4%). Although you adjust more (due to the lower CV_{wT} the chance of passing BE increases and hence, the TIE) power increases as well – despite a wider CI.
- The partial replicate design, again. The ‘crippled model’ (© Jiří Hofmann) recommended by the EMA in the Q&A prevents the disaster we sometimes observe with the FDA’s over-specified mixed-effects model for ABE (i.e., if σ_{wR} <0.294). However, assuming equal variances of T and R might be (and likely is) simply wrong! Essentially we are fishing in the dark. Hence, all simulations should be taken with a grain of salt. If ever possible, avoid this design.
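For the first of the two points above (different variabilities of T and R in a full replicate design), a hedged sketch in R. PowerTOST's scaled-ABE functions accept a vector of two CVs, with the CV of T first and the CV of R second. `CV2sd()` is a hypothetical helper introduced here for illustration, and the PowerTOST part runs only if the package is installed.

```r
# Hypothetical helper: within-subject SD (log-scale) from a CV given as ratio
CV2sd <- function(CV) sqrt(log(CV^2 + 1))
print(CV2sd(c(CVwT = 0.30, CVwR = 0.35)))

if (requireNamespace("PowerTOST", quietly = TRUE)) {
  library(PowerTOST)
  # Equal variabilities: CVwT = CVwR = 35%
  scABEL.ad(CV = 0.35, theta0 = 0.90, n = 34, design = "2x2x4")
  # Unequal variabilities: CV[1] = CVwT 30%, CV[2] = CVwR 35%
  scABEL.ad(CV = c(0.30, 0.35), theta0 = 0.90, n = 34, design = "2x2x4")
}
```

The two calls should come out near the adjusted α values quoted in the example (0.03630 and 0.03512); exact figures may differ slightly between package versions.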
@Detlew: Something wrong or did I forget anything?
1. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016;33(11):2805–14. doi:10.1007/s11095-016-2006-1.
2. Zheng C, Wang J, Zhao L. Testing bioequivalence for multiple formulations with power and sample size calculations. Pharm Stat. 2012;11(4):334–41. doi:10.1002/pst.1522.
* Up to eight simultaneous R-sessions on two machines running 24/7. Detlew and Ben did a hell of a job to tweak the code. Simulations of 34 subjects RTRT|TRTR originally took more than two hours to complete. Now we are down to 30 seconds!
—
Cheers,
Helmut Schütz