Martynkf
☆    

Budapest, Hungary,
2026-05-12 15:23
(22 d 11:23 ago)

Posting: # 24616
Views: 1,208
 

 Sample size with theta0 < 0.95: any regulatory pushback? [Regulatives / Guidelines]

Dear all,

Long time lurker, first time questioner here! Hi everybody!

A regulatory-reception question rather than a statistical one, but I hope it would still be welcome.

The convention is to drive the BE sample size with an assumed T/R-ratio of 0.95 (lower side, since 1/1.05 = 0.9524 ofc). My read of ICH M13A is that no particular value of theta0 is prescribed, only that the assumption be justified.

The adjacent thread already covers most of the statistical ground, with ElMaestro's view that one simply plugs in the GMR considered realistic plus a worst-case buffer, and d_labes' observation that he has never met a sponsor-desired sample size which failed to find a 'scientific' justification (Armani suit optional). Helmut's own articles default to theta0 = 0.90 for RSABE/ABEL without apology (reminiscent of the two Lászlós' recommendations). Among practitioners the matter seems quite settled.

The question is the assessors' side. Has anyone here run into pushback from EU agencies — EMA via DCP/MRP, or national authorities — when an ABE protocol used an assumed deviation larger than 5%, say theta0 = 0.90, outside the HVD/NTID setting? Statistically that is the conservative direction (n inflated, power preserved against a less flattering outcome), and a deficiency on principle would be hard to justify. But principle and practice may differ.

The context, and the reason I am asking: some Central European authorities I deal with routinely question the sample size assumptions in ways that suggest they are overly strict in analysing the a priori assumptions post hoc (regardless of outcome). Rounded ISCVs are contested, back-calculated ISCVs are contested, and occasionally one wonders whether anything would not be contested (dropout rate, anyone?). We had a recent pivotal that failed with an observed GMR around 0.88, and we are now wondering whether reaching for theta0 = 0.90 in the repeat study is a sensible pre-emptive move or merely an invitation to a different flavour of deficiency letter. EAEU experience also welcome, though I suspect the relevant adjective there is 'creative' rather than 'consistent'.

Thanks,
Marty
Helmut
★★★
Vienna, Austria,
2026-05-12 16:55
(22 d 09:52 ago)

@ Martynkf
Posting: # 24617
Views: 1,154
 

 Sample size with theta0 < 0.95: Why not?

Hi Marty,

❝ Long time lurker, first time questioner here! Hi everybody!

Welcome to the club!

❝ A regulatory-reception question rather than a statistical one, but I hope it would still be welcome.

Of course.

❝ […] My read of ICH M13A is that no particular value of theta0 is prescribed, only that the assumption be justified.

Correct. I’m not even sure it has to be justified; the text says only that there should be “an appropriate sample size determination”.
This rubber clause was pasted 1:1 from the EMA’s 2010 guideline, which in turn took it from ICH E9 of 1998.

❝ The adjacent thread already covers most of the statistical ground […] Among practitioners the matter seems quite settled.

Correct again.

❝ […] Has anyone here run into pushback from EU agencies — EMA via DCP/MRP, or national authorities — when an ABE protocol used an assumed deviation larger than 5%, say theta0 = 0.90, outside the HVD/NTID setting? Statistically that is the conservative direction (n inflated, power preserved against a less flattering outcome), and a deficiency on principle would be hard to justify. But principle and practice may differ.

Not me.

❝ […] some Central European authorities I deal with routinely question the sample size assumptions in ways that suggest they are overly strict in analysing the a priori assumptions post hoc (regardless of outcome). Rounded ISCVs are contested, back-calculated ISCVs are contested, occasionally one wonders whether anything would not be contested (dropout rate anyone?)

The dreadful post hoc (a posteriori, retrospective) power entering through the backdoor? Excuse my French – WTF?
In BE – according to all global guidelines – there is no place for Bayesian priors / posteriors. Frequentists¹ define hypotheses (H0 = inequivalence, H1 = equivalence), test them with a pre-defined \(\alpha\), and get a dichotomous outcome (pass | fail).² That’s all, end of the story.

❝ We had a recent pivotal that failed with an observed GMR around 0.88, and we are now wondering whether reaching for theta0 = 0.90 in the repeat study is a sensible pre-emptive move or merely an invitation to a different flavour of a deficiency letter.

Did you plan the sample size of the failed study for an assumed T/R-ratio of 0.95, or for something else? This 0.88 is the ‘best’ estimate you have right now. Planning the next study for 0.90 is somewhat risky. You know that power curves (and thus the sample sizes) are most sensitive to the T/R-ratio. The n for 0.88 will be substantially larger than the n for 0.90. If you have to deal with an agency already notorious for questioning your assumptions, even with a passing study you may open a can of worms. Maybe of interest a sneak-preview of a presentation about repeating studies for the upcoming BioBridges. Talk to the guy in the Armani suit.
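To put rough numbers on that sensitivity, here is a minimal base-R sketch. It uses the noncentral-t approximation of TOST power for a balanced 2×2 crossover, not the exact method, and the helper names `power_tost_approx()` and `n_tost_approx()` as well as the CV of 0.20 are assumptions made for illustration only. For real planning, `sampleN.TOST()` of PowerTOST is the tool to use.

```r
# Approximate TOST power for a balanced 2x2 crossover via the noncentral
# t-distribution (an approximation, not the exact Owen's Q computation).
power_tost_approx <- function(n, CV, theta0, alpha = 0.05,
                              theta1 = 0.80, theta2 = 1.25) {
  df    <- n - 2                              # residual df of the 2x2 design
  sem   <- sqrt(2 / n) * sqrt(log(CV^2 + 1))  # SE of the log-difference
  tcrit <- qt(1 - alpha, df)
  ncp1  <- (log(theta0) - log(theta1)) / sem  # against the lower margin
  ncp2  <- (log(theta0) - log(theta2)) / sem  # against the upper margin
  max(0, pt(-tcrit, df, ncp = ncp2) - pt(tcrit, df, ncp = ncp1))
}

# Smallest even total sample size reaching the target power
n_tost_approx <- function(CV, theta0, target = 0.80) {
  n <- 4
  while (power_tost_approx(n, CV, theta0) < target) n <- n + 2
  n
}

CV      <- 0.20                   # illustrative assumption, not from a study
theta0s <- c(0.95, 0.90, 0.88)    # assumed T/R-ratios to compare
n_req   <- sapply(theta0s, function(th) n_tost_approx(CV, th))
print(data.frame(theta0 = theta0s, n = n_req), row.names = FALSE)
```

The required n grows steeply as theta0 moves toward the lower margin of 0.80, which is why a planning value between the observed 0.88 and the conventional 0.95 changes the budget so much.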

❝ EAEU experience also welcome, though I suspect the relevant adjective there is 'creative' rather than 'consistent'.

No practical experience, but I’ve heard that they have some funky points of view.


  1. Everybody is a Bayesian.
    It’s just that some know it, and some don’t.
        Trivellore Raghunathan


  2. Hardcore statisticians don’t care about the alternative hypothesis H1 (equivalence).
    If H0 is not rejected → fail and if it is rejected → pass. Although the latter implies that H1 is accepted, they would not necessarily state it as such.
    Using the confidence inclusion approach according to the guidelines (i.e., by assessing the \(\small{100\left(1-2\,\alpha\right)}\) CI of the PE and the pre-specified BE margins), we get more information:
    1. CI entirely within the margins
      pass and equivalence proven with \(\small{\alpha}\)
    2. At least one confidence limit outside the margins
      fail (inconclusive, underpowered)
    3. CI entirely outside the margins
      fail and inequivalence proven with \(\small{\alpha}\)
    Only in the second case could you consider repeating the study with a larger sample size.
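
The three outcomes of the confidence inclusion approach can be sketched as a small helper. This is only an illustration: the function name `classify_be()` and the example confidence limits are hypothetical, and the conventional margins of 0.80 and 1.25 are assumed.

```r
# Classify a 90% CI of the T/R-ratio against the BE margins.
classify_be <- function(lower, upper, theta1 = 0.80, theta2 = 1.25) {
  if (lower >= theta1 && upper <= theta2) {
    "pass: equivalence shown"          # case 1: CI entirely within the margins
  } else if (upper < theta1 || lower > theta2) {
    "fail: inequivalence shown"        # case 3: CI entirely outside the margins
  } else {
    "fail: inconclusive"               # case 2: CI overlaps a margin
  }
}

# Hypothetical confidence limits, one per case
classify_be(0.86, 1.02)
classify_be(0.78, 0.99)
classify_be(0.65, 0.79)
```

Only the inconclusive outcome leaves room for a repeat study with more subjects; in the third case a larger sample size would merely confirm the inequivalence.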

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
zizou
★    

Plzeň, Czech Republic,
2026-05-22 18:11
(12 d 08:35 ago)

@ Helmut
Posting: # 24629
Views: 598
 

 Repeating Studies

Dear Helmut,

❝ Maybe of interest a sneak-preview of a presentation about repeating studies for the upcoming BioBridges.

I sneaked a look at the preview of the presentation about repeating studies. Interesting topic.

After looking at slide 11, I also became interested in the case of formulations that are not bioequivalent (related to the Type I Error: incorrectly rejecting a true null hypothesis of bioinequivalence, i.e., incorrectly concluding bioequivalence for formulations that are not bioequivalent). Take a purely theoretical case with a poor test formulation that is not bioequivalent to the reference, with an assumed true ratio of 0.8, so that power is not more than 5% (equal to the TIE).
The (not recommended) scenario is to repeat the bioequivalence study until it succeeds and to see how the probability of at least one "success" grows with the number of tries.
    n      p
    1   0.0500
    2   0.0975
    3   0.1426
    4   0.1855
    5   0.2262
    6   0.2649
    7   0.3017
    8   0.3366
    9   0.3698
   10   0.4013
   11   0.4312
   12   0.4596
   13   0.4867
   14   0.5123
   15   0.5367
   16   0.5599
   17   0.5819
   18   0.6028
   19   0.6226
   20   0.6415
   21   0.6594
   22   0.6765
   23   0.6926
   24   0.7080
   25   0.7226
   26   0.7365
   27   0.7497
   28   0.7622
   29   0.7741
   30   0.7854
   31   0.7961
   32   0.8063
n ... number of tries
p ... probability that BE is concluded at least once
Note: p = 1 - 0.95^n (the calculation is the same as for a die: to roll at least one six in several tries, each try has probability 5/6 of not showing a six; in the case above, the probability of not concluding BE in a single try is 0.95).
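
For anyone who wants to reproduce the table, a short base-R sketch follows. It assumes that every repeat is an independent study with exactly a 5% chance of falsely concluding BE; the variable names are arbitrary.

```r
alpha <- 0.05                     # chance of a false 'pass' in a single study
tries <- 1:32                     # number of studies performed
p     <- 1 - (1 - alpha)^tries    # probability of at least one false 'pass'

res <- data.frame(n = tries, p = round(p, 4))
print(res, row.names = FALSE)

# First number of tries at which a false 'pass' is more likely than not
first_majority <- min(tries[p > 0.5])
first_majority
```

Under these assumptions a false 'pass' becomes more likely than not from the 14th try onward, in line with the table above.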

Best regards,
zizou
Helmut
★★★
Vienna, Austria,
2026-05-23 10:30
(11 d 16:17 ago)

@ zizou
Posting: # 24631
Views: 569
 

 Buongiorno, signor Bonferroni!

Hi zizou,

I didn’t think about the Type I error! THX for pointing it out.

ElMaestro
★★★

Denmark,
2026-05-25 19:13
(9 d 07:33 ago)

@ Helmut
Posting: # 24632
Views: 465
 

 Buongiorno, signor Bonferroni!

Well, if you see Šidák, pass on my sincere greetings while you're at it. :-)

Pass or fail!
ElMaestro
ElMaestro
★★★

Denmark,
2026-05-13 03:26
(21 d 23:21 ago)

@ Martynkf
Posting: # 24618
Views: 1,083
 

 Sample size with theta0 < 0.95: any regulatory pushback?

Hi Martynkf,

❝ The question is the assessors' side. Has anyone here run into pushback from EU agencies — EMA via DCP/MRP, or national authorities — when an ABE protocol used an assumed deviation larger than 5%, say theta0 = 0.90, outside the HVD/NTID setting? Statistically that is the conservative direction (n inflated, power preserved against a less flattering outcome), and a deficiency on principle would be hard to justify. But principle and practice may differ.



Great question. And wise words from Hötzi.
I have not had sample size calcs questioned that way. Whatever is assumed can be justified, in my empirical experience. For a tyrosine kinase inhibitor known to misbehave, a deviation of 12.5% on the GMR was accepted as far as I recall.

I'd like to bring one perspective into this: Potency correction is allowed by M13A and other guidelines. The only reason it is there is that regulators are fully aware that sometimes products differ by (much) more than 5%. Obviously no IVIVC is ever a perfect 1:1, so an x% difference on the CoAs can translate into an in vivo deviation (obs GMR) of more than or less than x%. The whole reason we conduct in vivo studies is that we can't predict in vivo behaviour well. So, I'd say it would be almost unethical to have to always assume a deviation of max 5%.

Add to this that in BE, by convention we use a parametric model. We do not test for normality, we just assume it. This has unknown consequences for the evaluation. It can go both ways, depending on how the data is actually distributed (for the same reason I am not a huge fan of saying that testing for XYZ in BE is useless because this or that estimator is biased). So, we have a bunch of assumptions, plus a bunch of uncertainty. A deviation of more than 5% is certainly justified in many cases.

And in real life: The guy in the Armani suit dictates that the head of clin ops can use up to -certainly not more than- $XYZ on the trial. Now sample size derivation becomes a matter of identifying the worst GMR that keeps the sample size within the budget, given some kind of CV. That is unfortunately how it often works in practice. And that part of the story is never disclosed to regulators in a dossier.

Pass or fail!
ElMaestro
Martynkf
☆    

Budapest, Hungary,
2026-05-13 10:27
(21 d 16:20 ago)

@ ElMaestro
Posting: # 24619
Views: 1,091
 

 Sample size with theta0 < 0.95: any regulatory pushback?

Thank you both for wise counsel!

To add some spice to the situation, this failed pivotal came after a pitch-perfect pilot with GMR ~1, ISCV ~18% (same as in the literature) on a BCS III molecule (although with nice two-compartment kinetics).

The failed pivotal reported a ~27% ISCV, with this bad GMR of ~0.88. You can show outliers with Cook's distance, DFFITS, QQplot etc., but in my practice you can show these things in most studies (even ones which pass) and they should be accepted rather than relied upon in any meaningful way unless you have something damning on the subjects themselves.

The CMC people of course swear that the products from the pilot and the pivotal are practically the same.

My thought is that there is a kind of survivorship bias at play here. The study was powered at 80%, and we wouldn't be discussing it if it had passed. If a study fails, the reported GMR and/or ISCV tends to be bad. Sorry, this may slightly muddle the frequentist and Bayesian realms :-(
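
A small simulation can illustrate that selection effect. This is a sketch under loud assumptions: a balanced 2×2 crossover with n = 16, and true values equal to the planning values discussed in this thread (CV 0.18, theta0 0.95), so that roughly every fifth study fails by chance alone. None of this is a reconstruction of the actual studies.

```r
set.seed(123456)
nsims  <- 1e5
n      <- 16          # assumed balanced 2x2 crossover
CV     <- 0.18        # assumed true within-subject CV
theta0 <- 0.95        # assumed true T/R-ratio
df     <- n - 2
sigma2 <- log(CV^2 + 1)

# Under normality the point estimate and the residual variance are independent
logPE  <- rnorm(nsims, mean = log(theta0), sd = sqrt(2 * sigma2 / n))
s2     <- sigma2 * rchisq(nsims, df) / df
hw     <- qt(0.95, df) * sqrt(2 * s2 / n)     # half-width of the 90% CI
pass   <- (logPE - hw >= log(0.80)) & (logPE + hw <= log(1.25))
cv_obs <- sqrt(exp(s2) - 1)

by_outcome <- data.frame(
  outcome   = c("passed", "failed"),
  studies   = c(sum(pass), sum(!pass)),
  median_PE = c(exp(median(logPE[pass])), exp(median(logPE[!pass]))),
  median_CV = c(median(cv_obs[pass]), median(cv_obs[!pass]))
)
print(by_outcome, row.names = FALSE)
```

Even though every simulated study shares the same truth, the failed ones report a lower point estimate and a higher CV on average, purely because failure selects for unlucky draws.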

The Armani pressure goes the other way in this case… and if I take the ISCV and the GMR as read from the failed study, I'm looking at 100+ subjects for said molecule, and even I start to get nervous :lookaround:

I intend to take the originally reported ISCVs, treat the failed study's ISCV as an outlier and dismiss it (I guess Helmut would advise me to pool them based on the chi-squared distribution), power the new study at 90% with the standard theta0 = 0.95, and show the GMR sensitivity plot to the Armani people.
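
For comparison, this is roughly what pooling would look like in base R. It is a sketch with a hypothetical pilot size: the pilot's 10 degrees of freedom (12 subjects in a 2×2 crossover) are invented for illustration, the pivotal's 25 follow from the 27 eligible subjects, and the 80% one-sided upper confidence limit (alpha = 0.2) is likewise an assumption. PowerTOST offers `CVpooled()` for the real thing.

```r
# Pool within-subject CVs by weighting the log-scale variances with their df
pool_cv <- function(CV, df, alpha = 0.2) {
  s2        <- log(CV^2 + 1)               # variances on the log scale
  df_tot    <- sum(df)
  s2_pooled <- sum(df * s2) / df_tot
  # One-sided upper confidence limit based on the chi-squared distribution
  s2_upper  <- s2_pooled * df_tot / qchisq(alpha, df_tot)
  c(CV_pooled = sqrt(exp(s2_pooled) - 1),
    CV_upper  = sqrt(exp(s2_upper) - 1))
}

CV <- c(pilot = 0.18, pivotal = 0.27)
df <- c(pilot = 10,   pivotal = 25)        # pilot df is hypothetical
pooled <- pool_cv(CV, df)
round(pooled, 4)
```

Because the pivotal carries more degrees of freedom, the pooled CV lands closer to 0.27 than to 0.18, so dismissing the pivotal's ISCV and pooling it lead to quite different planning values.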

Thanks again, didn't want to leave you guys without an update!
Helmut
★★★
Vienna, Austria,
2026-05-13 10:53
(21 d 15:54 ago)

@ Martynkf
Posting: # 24620
Views: 1,090
 

 More information, please

Hi Marty,

before diving into the details of your post, can you give further information?
  1. theta0 you used in planning the pivotal study, target power (CV 0.18, right?)
  2. Number of eligible subjects in the failed study (CV 0.27, PE 0.88)
Before talking to the guy in the Armani suit see the Bayesian stuff for sample size estimation based on a previous study and statistical assurance.*


  • Ring A, Lang B, Kazaroho C, Labes D, Schall R, Schütz H. Sample size determination in bioequivalence studies using statistical assurance. Br J Clin Pharmacol. 2019; 85(10): 2369–77. doi:10.1111/bcp.14055. Open access.

Martynkf
☆    

Budapest, Hungary,
2026-05-13 12:27
(21 d 14:20 ago)

@ Helmut
Posting: # 24621
Views: 1,031
 

 More information, please

❝ before diving into the details of your post, can you give further information?


❝ 1. theta0 you used in planning the pivotal study, target power (CV 0.18, right?)


theta0 = .95, confirmed. targetpower = .8, dropout rate 10% (PowerTOST_output/0.9)

❝ 2. Number of eligible subjects in the failed study (CV 0.27, PE 0.88)


27 (this is its own can of worms right?)

❝ Before talking to the guy in the Armani suit see the Bayesian stuff for sample size estimation based on a previous study and statistical assurance.*


Thanks! I don't routinely use the Bayesian stuff, and I like the assurance framework!
Helmut
★★★
Vienna, Austria,
2026-05-13 15:25
(21 d 11:21 ago)

@ Martynkf
Posting: # 24622
Views: 1,034
 

 Still 🤔

Hi Marty,

sorry, I’m not sure whether I do understand your values.

❝ theta0 = .95, confirmed. targetpower = .8, dropout rate 10% (PowerTOST_output/0.9)


library(PowerTOST)
nadj <- function(n, do.rate, nseq = 2) { # adjusted sample size (balanced sequences)
  x  <- n / (1 - do.rate)
  return(as.integer(nseq * (x %/% nseq + as.logical(x %% nseq))))
}
CV      <- 0.18 # assumed
theta0  <- 0.95 # assumed
target  <- 0.80 # target (desired) power
do.rate <- 0.10 # anticipated dropout-rate 10%
n       <- sampleN.TOST(CV = CV, theta0 = theta0, targetpower = target,
                        design = "2x2")[["Sample size"]]

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design: 2x2 crossover
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.8
BE margins = 0.8 ... 1.25
True ratio = 0.95,  CV = 0.18

Sample size (total)
 n     power
16   0.820357


cat("adjusted sample size =", nadj(n, do.rate), "\n")

adjusted sample size = 18


❝ ❝ 2. Number of eligible subjects in the failed study (CV 0.27, PE 0.88)


❝ 27 (this is its own can of worms right?)


Given the sample size estimation above, how did you end up with 27 eligible in the study? Even if you would have targeted 90% power, we get n = 22 and nadj = 26.

Martynkf
☆    

Budapest, Hungary,
2026-05-15 10:02
(19 d 16:45 ago)

@ Helmut
Posting: # 24623
Views: 965
 

 Still 🤔

Hi Helmut!

❝ ❝ sorry, I’m not sure whether I do understand your values.


Integrity, curiosity and openness mostly :flower:

❝ Given the sample size estimation above, how did you end up with 27 eligible in the study? Even if you would have targeted 90% power, we get n = 22 and nadj = 26.


There were some divergent literature values and the samp.size calculation was based on that bigger value which ended up being ~30 subjects -> 27 evaluable. I was not involved in that stage.

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz