## Probability to pass multiple studies? [Power / Sample Size]

Hi ElMaestro and all,

sorry for excavating an old story.

❝ ❝ say, we have $$\small{n}$$ studies, each powered at 90%. What is the probability (i.e., power) that all of them pass BE?

❝ ❝ Let’s keep it simple: T/R-ratios and CVs are identical in studies $$\small{1\ldots n}$$. Hence, $$\small{p_{\,1}=\ldots=p_{\,n}}$$. If the outcomes of studies are independent, is $$\small{p_{\,\text{pass all}}=\prod_{i=1}^{i=n}p_{\,i}}$$, e.g., for $$\small{p_{\,1}=\ldots=p_{\,6}=0.90\rightarrow 0.90^6\approx0.53}$$?

❝ ❝ Or does each study stand on its own and we don’t have to care?

❝ Yes to 0.53.

❝ The risk is up to you or your client. I think there is no general awareness, …

❝ "Have to care" really involves the fine print. I think in the absence of further info it is difficult to tell if you should care and/or from which perspective care is necessary.

I’m pretty sure that we were wrong:

We want to demonstrate BE in all studies. Otherwise, the product would not get an approval (based on multiple studies in the dossier). That means we have an ‘AND-composition’. Hence, the Intersection-Union Test (IUT) principle applies [1,2] and each study indeed stands on its own. Therefore, any kind of ‘power adjustment’ I mused about before is not necessary.
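A compact way to write down the ‘AND-composition’ is sketched below. The symbols $$\small{H_{0,i}}$$, $$\small{H_{1,i}}$$, and $$\small{\alpha}$$ are introduced here for illustration only and do not appear in the earlier posts; see the references for the formal treatment.

$$\small{H_0=\bigcup_{i=1}^{n}H_{0,i}\quad\text{vs.}\quad H_1=\bigcap_{i=1}^{n}H_{1,i}}$$

Here $$\small{H_{0,i}}$$ denotes bioinequivalence in study $$\small{i}$$ and $$\small{H_{1,i}}$$ bioequivalence in study $$\small{i}$$. If every $$\small{H_{0,i}}$$ is tested at level $$\small{\alpha}$$ and $$\small{H_0}$$ is rejected only when all $$\small{H_{0,i}}$$ are rejected, the level of the combined test does not exceed $$\small{\alpha}$$. No multiplicity adjustment of $$\small{\alpha}$$ is needed.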

In my example above one would have to power each of the studies to $$\small{\sqrt[6]{0.90}\approx98.26\%}$$ to achieve ≥ 90% overall power. I cannot imagine that this was ever done.
Detlew and I have some empirical evidence. The largest number of confirmatory studies in a dossier I have seen so far was 12, powered to 80–90% (there were more in the package, but only exploratory ones like comparing types of food or sprinkle studies). If overall power in multiple studies had really been that low (say, $$\small{0.85^{12}\approx14\%}$$), I should have seen many more failures, which I didn’t.
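The numbers quoted in this thread can be reproduced with a few lines of Python. This is only a sketch of the naive product rule, assuming independent studies with identical power; the function names `overall_power` and `required_per_study_power` are made up for illustration.

```python
def overall_power(per_study_power: float, n_studies: int) -> float:
    """Probability that all n independent studies pass,
    assuming every study has the same power (naive product rule)."""
    return per_study_power ** n_studies


def required_per_study_power(target_overall: float, n_studies: int) -> float:
    """Per-study power needed so that the product over n independent,
    identically powered studies reaches the target overall power."""
    return target_overall ** (1.0 / n_studies)


# Six studies, each powered at 90%
print(f"{overall_power(0.90, 6):.4f}")             # 0.5314
# Per-study power needed for 90% overall in six studies
print(f"{required_per_study_power(0.90, 6):.4f}")  # 0.9826
# Twelve studies, each powered at 85%
print(f"{overall_power(0.85, 12):.4f}")            # 0.1422
```

Under the product rule, twelve studies at 85% would all pass in only about one dossier out of seven, which is hard to reconcile with the rarity of failures observed in practice.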

❝ … but my real worry is the type I error, as I have indicated elsewhere.

We discussed that above.
Agencies accept repeating an inconclusive [3,4] study with a larger sample size. I agree with your alter ego [5] that such an approach may indeed inflate the Type I Error. I guess regulators trust more in the repeated study believing [sic] that its outcome is more ‘reliable’ due to the larger sample size. But that’s – apart from the inflated Type I Error – a fallacy.
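A crude back-of-the-envelope bound shows why repeating can inflate the Type I Error. The sketch below assumes, purely for illustration, that the true ratio sits exactly at an acceptance limit, that each attempt passes with probability α = 0.05 independently of the others, and that every failed study is repeated. In reality only inconclusive studies are repeated, so the true inflation is smaller; this is an upper bound, not the simulation approach of the cited paper. The function name is hypothetical.

```python
def pass_probability_with_repeats(alpha: float, max_attempts: int) -> float:
    """Upper bound on the chance that at least one of up to `max_attempts`
    independent studies passes when each passes with probability `alpha`.
    Assumption (not from the thread): every failed study is repeated."""
    return 1.0 - (1.0 - alpha) ** max_attempts


# Single study: nominal Type I Error
print(f"{pass_probability_with_repeats(0.05, 1):.4f}")  # 0.0500
# One repeat allowed: up to almost twice the nominal level
print(f"{pass_probability_with_repeats(0.05, 2):.4f}")  # 0.0975
```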

1. Berger RL, Hsu JC. Bioequivalence Trials, Intersection-Union Tests and Equivalence Confidence Sets. Stat Sci. 1996; 11(4): 283–302. JSTOR:2246021. free resource.
2. Wellek S. Testing statistical hypotheses of equivalence. Boca Raton: Chapman & Hall/CRC; 2010. Chapter 7. p. 161–176.
3. A study is inconclusive if at least one of the confidence limits lies outside of the acceptance limits. That is distinct from a bioinequivalent study, where the confidence interval lies entirely outside the acceptance limits, i.e., the Null hypothesis is not rejected. That calls for a reformulation and starting over from scratch.
4. García-Arieta A. The failure to show bioequivalence is not evidence against generics. Br J Clin Pharmacol. 2010; 70(3): 452–3. doi:10.1111/j.1365-2125.2010.03684.x. Open access.
5. Fuglsang A. Pilot and Repeat Trials as Development Tools Associated with Demonstration of Bioequivalence. AAPS J. 2015; 17(3): 678–83. doi:10.1208/s12248-015-9744-6. Free Full text.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
