Bioequivalence and Bioavailability Forum

Helmut
★★★

Vienna, Austria,
2021-01-26 11:47
(1179 d 08:09 ago)

Posting: # 22190
Views: 2,818

Deficiencies ?? [Study Assessment]

Dear all,

I recently came across a deficiency letter of the Polish agency.

Clinical documentation
Pharmacokinetics

1. […]
2. Lack of a posteriori data on the power of statistical inference and, at the same time, lack of detailed criteria for estimating the sample size excludes the possibility of assessing whether the 90% confidence interval in the range of 80 – 125% for log-transformed pharmacokinetic parameters C_max and AUC_(0-t) of ██████ was designated with at least 80% power.
3. ANOVA analysis of variance showed statistically significant (at a 5% significance level) differences in AUC_(0-t) between investigational products, which further exacerbated the uncertainty about fulfillment of the bioequivalence criteria.

WTF‽

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro
★★★

Denmark,
2021-01-26 13:26
(1179 d 06:31 ago)

@ Helmut
Posting: # 22191
Views: 2,431

Deficiencies ??

Hi Hötzi,

I understand your frustration.
Here's a good basis for answering, just remember:
1. They do not know all the stuff that you know.
2. You cannot educate them.
The rest is easy.

—
Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2021-01-26 19:52
(1179 d 00:05 ago)

@ ElMaestro
Posting: # 22192
Views: 2,400

Deficiencies ??

Hi ElMaestro,

❝ I understand your frustration.

I’m not frustrated. Just extremely surprised. :surprised:

❝ Here's a good basis for answering, just remember:

❝ 1. They do not know all the stuff that you know.

Given.

❝ 2. You cannot educate them.

Sooo sad!

❝ The rest is easy. :-)

Define ‘the rest’.
Crude R-script at the end. The example’s output:

Assumed CV    : 25.00%
Assumed PE    : 95.00%
Target power  : 90.00%
Sample size   : 38
Achieved power: 90.89%
Dosed         : 44 (anticipated dropout-rate of 10%)
  100,000 simulated 2×2×2 studies
    CV: 13.87 –  40.66% (geom. mean 24.62%)
    PE: 83.65 – 107.59% (geom. mean 94.99%)
    n : 28 – 37 (median 34)

passed BE (90% CI within 80.00 – 125.00%): 98.61%
percentages of passing studies
  with ‘post hoc’ power of <50%:  0.03%
                     ≥50 – <60%:  3.11%
                     ≥60 – <70%:  7.99%
                     ≥70 – <80%: 16.74%
                     ≥80 – <90%: 30.51%
                           >90%: 41.63%
  100% not within CI           :  5.76% (∆ stat. significant)

Although the target power was 90%, 58.38% of passing studies did so with ‘post hoc’ power of <90% (and 27.87% with less than 80%). So what?

To quote the WHO:

The a posteriori power of the study does not need to be calculated.

library(PowerTOST)
balance <- function(n, sequences) {
  # round up to get balanced sequences for potentially unbalanced case
  return (as.integer(sequences * (n %/% sequences + as.logical(n %% sequences))))
}
adjust.dropouts <- function(n, do.rate) {
  # to be dosed subjects which should result in n eligible subjects based on the
  # anticipated dropout-rate
  return (as.integer(balance(n / (1 - do.rate), sequences = 2)))
}
set.seed(123456)
nsims   <- 1e5L # number of simulations
target  <- 0.90 # target power
CV      <- 25   # assumed CV
PE      <- 95   # assumed PE
do.rate <- 0.1  # anticipated dropout-rate
CV.do   <- 0.25 # assumed CV of the dropout-rate
tmp     <- sampleN.TOST(CV = CV/100, theta0 = PE/100, targetpower = target,
                        details = FALSE, print = FALSE)
n.des   <- tmp[["Sample size"]]
if (n.des >= 12) {
  power <- tmp[["Achieved power"]]
} else { # acc. to GL
  n.des <- 12
  power <- power.TOST(CV = CV/100, theta0 = PE/100, n = n.des)
}
n.adj   <- adjust.dropouts(n = n.des, do.rate = do.rate)
res     <- data.frame(CV = rep(NA, nsims), n = NA, PE = NA, lower = NA,
                      upper = NA, BE = FALSE, power = NA, signif = FALSE)
post    <- data.frame(sim = 1:nsims, pwr.50minus = FALSE, pwr.60 = FALSE,
                      pwr.70 = FALSE, pwr.80 = FALSE, pwr.90 = FALSE,
                      pwr.90plus = FALSE)
pb      <- txtProgressBar(0, 1, 0, char = "\u2588", width = NA, style = 3)
for (j in 1:nsims) {
  # simulated dropout-rate, number of eligible subjects, CV, and PE
  do         <- rlnorm(1, meanlog = log(do.rate) - 0.5*CV2mse(CV.do),
                       sdlog = sqrt(CV2mse(CV.do)))
  res$n[j]   <- as.integer(round(n.des * (1 - do)))
  res$CV[j]  <- 100*mse2CV(CV2mse(CV/100) *
                           rchisq(1, df = res$n[j] - 2)/(res$n[j] - 2))
  res$PE[j]  <- 100*exp(rnorm(1, mean = log(PE/100),
                              sd = sqrt(0.5 / res$n[j]) * sqrt(CV2mse(CV/100))))
  res[j, 4:5] <- round(100*CI.BE(CV = res$CV[j]/100, pe = res$PE[j]/100,
                                 n = res$n[j]), 2)
  res$power[j] <- suppressMessages(
                    signif(power.TOST(CV = res$CV[j]/100,
                                      theta0 = res$PE[j]/100,
                                      n = res$n[j]), 5))
  if (res$lower[j] >= 80 & res$upper[j] <= 125) { # only the ones which pass
    res$BE[j] <- TRUE
    if (res$power[j] < 0.5) post$pwr.50minus[j] <- TRUE
    if (res$power[j] >= 0.5 & res$power[j] < 0.6) post$pwr.60[j] <- TRUE
    if (res$power[j] >= 0.6 & res$power[j] < 0.7) post$pwr.70[j] <- TRUE
    if (res$power[j] >= 0.7 & res$power[j] < 0.8) post$pwr.80[j] <- TRUE
    if (res$power[j] >= 0.8 & res$power[j] < 0.9) post$pwr.90[j] <- TRUE
    if (res$power[j] >= 0.9) post$pwr.90plus[j] <- TRUE
    if (res$lower[j] > 100 | res$upper[j] < 100) res$signif[j] <- TRUE
  }
  setTxtProgressBar(pb, j/nsims)
}
close(pb)
passed <- sum(res$BE)
cat("\nAssumed CV    :", sprintf("%.2f%%", CV),
    "\nAssumed PE    :", sprintf("%.2f%%", PE),
    "\nTarget power  :", sprintf("%.2f%%", 100*target),
    "\nSample size   :", n.des,
    "\nAchieved power:", sprintf("%.2f%%", 100*power),
    "\nDosed         :", n.adj,
    sprintf("(anticipated dropout-rate of %g%%)", 100*do.rate),
    "\n ", formatC(nsims, format = "d", big.mark = ","),
    "simulated 2\u00D72\u00D72 studies",
    "\n    CV:", sprintf("%5.2f \u2013 %6.2f%%", min(res$CV), max(res$CV)),
    sprintf("(geom. mean %.2f%%)", exp(mean(log(res$CV)))),
    "\n    PE:", sprintf("%5.2f \u2013 %6.2f%%", min(res$PE), max(res$PE)),
    sprintf("(geom. mean %.2f%%)", exp(mean(log(res$PE)))),
    "\n    n :", min(res$n), "\u2013", max(res$n),
    sprintf("(median %g)", median(res$n)),
    "\n\npassed BE (90% CI within 80.00 \u2013 125.00%):",
    sprintf("%5.2f%%", 100*passed/nsims),
    "\npercentages of passing studies",
    "\n  with \u2018post hoc\u2019 power of <50%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.50minus)/passed),
    "\n                     \u226550 \u2013 <60%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.60)/passed),
    "\n                     \u226560 \u2013 <70%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.70)/passed),
    "\n                     \u226570 \u2013 <80%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.80)/passed),
    "\n                     \u226580 \u2013 <90%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.90)/passed),
    "\n                           >90%:",
    sprintf("%5.2f%%", 100*sum(post$pwr.90plus)/passed),
    "\n  100% not within CI           :",
    sprintf("%5.2f%%", 100*sum(res$signif)/passed),
    "(\u2206 stat. significant)\n\n")


ElMaestro
★★★

Denmark,
2021-01-26 23:34
(1178 d 20:22 ago)

@ Helmut
Posting: # 22193
Views: 2,405

Deficiencies ??

Hi Hötzi,

❝ Define ‘the rest’.

Just explain what you think, without in any way suggesting that your way of thinking is the only way of thinking, the right way of thinking or a better way of thinking than theirs.
Just present it as you see it. Briefly. Express your point without too much technical detail; for the Polish agency I would not emphasize my point with simulations, I don't think this will change much.

I am sure you will get approval. As I read it this is a minor thing to the assessor anyway.

—
Pass or fail!
ElMaestro

zizou
★

Plzeň, Czech Republic,
2021-01-31 02:07
(1174 d 17:49 ago)

@ Helmut
Posting: # 22195
Views: 2,532

Deficiencies or not

Dear Helmut.

❝ 3. ANOVA analysis of variance showed statistically significant (at a 5% significance level) differences in AUC_(0-t) between investigational products, which further exacerbated the uncertainty about fulfillment of the bioequivalence criteria.

A statistically significant formulation effect for AUC_(0-t) is quite common. The sample size is usually estimated from the intra-subject CV of C_max (as it is usually higher than that of AUC_(0-t)).

To continue with your example:
Assumed intra-subject CV of C_max = 25% -> with assumed PE of 95% for target power 90%: n = 38
Assumed intra-subject CV of AUC_(0-t) e.g. 10%
If the parameters assumed for the sample size estimation were exactly observed in the study, i.e., an observed GMR of 95% and an observed intra-subject CV of 10%, we would get a 90% CI of 91.40-98.74% and a statistically significant formulation effect (at a 10% significance level).
If the 90% CI does not contain 100%, the p value will always be <0.1, but how could that further exacerbate the uncertainty about fulfillment of the bioequivalence criteria? (Question for regulators.)
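
The arithmetic behind that interval can be checked in a few lines of base R. This is only a minimal sketch, assuming a balanced 2×2×2 crossover with all 38 subjects evaluable; the helper `ci_2x2` and its arguments are made up for this illustration and are not part of PowerTOST.

```r
# 90% CI and p-value of the formulation effect in a balanced 2x2x2 crossover.
# Hypothetical helper for illustration; assumes equal sequence sizes.
ci_2x2 <- function(cv, gmr, n, alpha = 0.05) {
  mse <- log(cv^2 + 1)        # residual variance on the log scale
  se  <- sqrt(2 * mse / n)    # standard error of the log-difference
  df  <- n - 2                # residual degrees of freedom
  ci  <- 100 * exp(log(gmr) + c(-1, 1) * qt(1 - alpha, df) * se)
  p   <- 2 * pt(-abs(log(gmr) / se), df)  # two-sided test of GMR = 100%
  list(lower = ci[1], upper = ci[2], p = p)
}

res_auc <- ci_2x2(cv = 0.10, gmr = 0.95, n = 38)
round(c(res_auc$lower, res_auc$upper), 2)  # 91.40 98.74, as quoted above
res_auc$p < 0.10                           # TRUE: 100% lies outside the 90% CI
```

The last line is just the duality between the 90% CI and the two-sided test at the 10% level; it says nothing about whether the CI lies within 80.00-125.00%, which is the only thing the BE decision depends on.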

❝ 100% not within CI : 5.76% (∆ stat. significant)

I just want to point out that this is observed more often for AUC_(0-t) than for C_max, and it would be interesting to know that percentage also for AUC_(0-t) with lower variability. ;)

❝ 2. Lack of a posteriori data on the power...

The only good thing about this point is that the regulators believe that test and reference IMPs are bioequivalent. (As they deal with power.)
A posteriori power is a never-ending story. The question about power should be raised on the protocol; if it is addressed to the report, it is just a suggestion for future projects.
Additionally, you can remind them what low power means: with lower power, the type II error (sponsor's risk) is higher, i.e., bioequivalent preparations are more likely to be wrongly assessed as not bioequivalent. The regulators should sleep well with a higher type II error (unless the study was designed for e.g. 50% power from the start - but such protocols should be rejected).

I wish the regulators were also interested in the type I error - it would mean that regulators do not believe that test and reference IMPs are bioequivalent. The probability of approving a non-bioequivalent test product should be up to 5%. I noticed several studies which surprised me more than this deficiency letter. E.g. by using:

an approach - if a study fails to conclude bioequivalence, perform a bigger one.
As e.g. here in PAR DE/H/5934/003/DC (formerly UK/H/5815/003/DC). (PAR UK/H/5815/003/DC was also mentioned in this post, but the link to it isn't working anymore and I failed to find it elsewhere on the internet.)
STUDY 1 - "was not considered suitable"
STUDY 2 - N=19, Cmax GMR = 84.24, 90% CI 78.00 - 90.98 %, ISCV=13.72%
"Therefore the results failed to demonstrate that the test product desloratadine is bioequivalent to the reference product. This could probably be due to the number of drop-outs which was higher than expected, especially for desloratadine."
Assuming a low ISCV (the observed ISCV was 13.72%), N=12 (for statistical analysis) could be sufficient for at least 80% power with the commonly assumed GMR of 95%. So the reason for the failure was the observed GMR. Nevertheless (as usual) the stated reason for the repetition is the low sample size, resulting in a bigger repeated study regardless of the TIE inflation.
STUDY 3 - N=32, Cmax GMR = 104.24, 90% CI 98.45 - 110.36 %, ISCV=13.53%
GMR in the interval 95-105, ISCV lower than in the previous study. BE is concluded.
If I were a patient, I would like to know how many bioequivalence studies were performed before bioequivalence was achieved. (Or just report the TIE (the probability that it is a non-bioequivalent treatment) if it is higher than 5%.)
or another approach - scheme:

Image according to: PAR NL/H/4422/001/DC, or the same in PAR Melatonin 3 mg film-coated tablets PL 39936/0006 (MHRA Public Assessment Report, National Procedure, Arriello s.r.o). (Again, I failed to find it on the internet; I downloaded it in the past. In the PAR PL 39936/0006 the results are reported with two more decimal places, which is really better in this case.)
There is an interesting pooling of a 2-period pilot study with a 3-period partial replicate pivotal study, evaluated in a way similar to the one described by the FDA for groups. Nevertheless, period 3 existed only in the pivotal study, so there could be some incomplete blocks? Moreover, the pilot study is a pilot study! The design was obviously changed from pilot to pivotal. Sampling times seem to be the same, but the washout was changed from 3 to 6 days. Does it mean that the washout was insufficient in the pilot study?
The pilot study only provides information for conducting the subsequent pivotal study. It neither concluded BE nor failed to conclude BE, i.e., no alpha was spent in the pilot study. So the pilot study simply doesn't demonstrate bioequivalence. If bioequivalence had been demonstrated in the pilot study, why would they continue with the pivotal study? So I think there is no study which demonstrated bioequivalence but only one study which failed.
On top of that, the 90% CI of the "pooled Cmax" (76–95%) is not within the standard range of 80.00-125.00%. (The widened limits are based on one of the pooled studies; a clinical justification for the widening is not reported in the PAR, nor is a justification that the calculated intra-subject CV is a reliable estimate and not the result of outliers.)

Btw., I am also sure you will get approval, as even studies where the 90% CI was (partly) outside 80-125% were approved in the end.

Best regards,
zizou

PT Frustration - not related to vaccine

Helmut
★★★

Vienna, Austria,
2021-01-31 16:46
(1174 d 03:10 ago)

@ zizou
Posting: # 22196
Views: 2,287

Deficiencies or not

Hi zizou,

❝ A statistically significant formulation effect for AUC_(0-t) is quite common. The sample size is usually estimated from the intra-subject CV of C_max (as it is usually higher than that of AUC_(0-t)).

Correct.

❝ To continue with your example:

❝ Assumed intra-subject CV of C_max = 25% -> with assumed PE of 95% for target power 90%: n = 38

❝ Assumed intra-subject CV of AUC_(0-t) e.g. 10%

❝ If the parameters assumed for the sample size estimation were exactly observed in the study, i.e., an observed GMR of 95% and an observed intra-subject CV of 10%, we would get a 90% CI of 91.40-98.74% and a statistically significant formulation effect (at a 10% significance level).

Correct as well. A study should be powered for the worst case combination (assumed CV, PE) of PK metrics. Naturally, PK metrics with ‘better combinations’ will have higher power.

❝ If the 90% CI does not contain 100%, the p value will always be <0.1, but how could that further exacerbate the uncertainty about fulfillment of the bioequivalence criteria? (Question for regulators.)

Sorry, I can’t answer. ;-)

❝ ❝ 100% not within CI : 5.76% (∆ stat. significant)

❝ I just want to point out that this is observed more often for AUC_(0-t) than for C_max, and it would be interesting to know that percentage also for AUC_(0-t) with lower variability. ;)

Your wish is my command^I (R-script upon request).

Assumed CV (Cmax)    : 25.00%
Assumed CVs (AUC)    : 25.00%, 20.00%, 15.00%, 10.00%
Assumed PE           : 95.00%
Target power         : 90.00%
Sample size          : 38 (based on Cmax)
Achieved power (Cmax): 90.89%
Achieved powers (AUC): 90.89%, 98.05%, 99.95%, 100.00%
Dosed                : 44 (anticipated dropout-rate of 10%)
  100,000 simulated 2×2×2 studies
  n: 28 – 37 (median 34)

Cmax (25.00%)
  CV : 13.87 –  40.66% (geom. mean 24.62%)
  PE : 83.65 – 107.59% (geom. mean 94.99%)
  passed BE: 98.61% (‘empiric power’)
  passing studies with ‘post hoc’ power of
          <50%:   0.03%
    ≥50 – <60%:   3.11%
    ≥60 – <70%:   7.99%
    ≥70 – <80%:  16.74%
    ≥80 – <90%:  30.51%
          >90%:  41.63%
  100% not within CI (∆ stat. significant) :  5.76%

AUC (25.00%)
  CV : 12.22 –  41.01% (geom. mean 24.60%)
  PE : 82.68 – 107.20% (geom. mean 95.00%)
  passed BE: 98.63% (‘empiric power’)
  passing studies with ‘post hoc’ power of
          <50%:   0.03%
    ≥50 – <60%:   3.10%
    ≥60 – <70%:   7.74%
    ≥70 – <80%:  16.78%
    ≥80 – <90%:  30.30%
          >90%:  42.05%
  100% not within CI (∆ stat. significant) :  5.82%

AUC (20.00%)
  CV :  9.77 –  31.70% (geom. mean 19.68%)
  PE : 84.80 – 105.86% (geom. mean 95.00%)
  passed BE: 99.98% (‘empiric power’)
  passing studies with ‘post hoc’ power of
          <50%:   0.00%
    ≥50 – <60%:   0.13%
    ≥60 – <70%:   0.62%
    ≥70 – <80%:   2.87%
    ≥80 – <90%:  12.47%
          >90%:  83.92%
  100% not within CI (∆ stat. significant) : 13.06%

AUC (15.00%)
  CV :  7.24 –  24.83% (geom. mean 14.77%)
  PE : 88.31 – 102.73% (geom. mean 95.01%)
  passed BE: 100.00% (‘empiric power’)
  passing studies with ‘post hoc’ power of
          <50%:   0.00%
    ≥50 – <60%:   0.00%
    ≥60 – <70%:   0.00%
    ≥70 – <80%:   0.01%
    ≥80 – <90%:   0.23%
          >90%:  99.75%
  100% not within CI (∆ stat. significant) : 31.35%

AUC (10.00%)
  CV :  4.77 –  15.32% (geom. mean 9.85%)
  PE : 90.25 – 100.03% (geom. mean 95.00%)
  passed BE: 100.00% (‘empiric power’)
  passing studies with ‘post hoc’ power of
          <50%:   0.00%
    ≥50 – <60%:   0.00%
    ≥60 – <70%:   0.00%
    ≥70 – <80%:   0.00%
    ≥80 – <90%:   0.00%
          >90%: 100.00%
  100% not within CI (∆ stat. significant) : 79.43%
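
The trend in the last line of each block (the lower the CV, the more often 100% falls outside the 90% CI) can also be seen without simulations. The following is a rough base-R sketch, assuming a balanced 2×2×2 crossover with a fixed n = 38 and a true GMR of 95%; the helper `p_excl_100` is hypothetical. Its numbers are not expected to reproduce the percentages above, because the simulations vary n through dropouts and count only studies that passed BE.

```r
# Probability that the 90% CI excludes 100%, i.e. that the formulation effect
# is 'significant' at the 10% level. Hypothetical helper for illustration;
# fixed n, balanced sequences, not conditioned on passing BE.
p_excl_100 <- function(cv, gmr = 0.95, n = 38, alpha = 0.05) {
  se    <- sqrt(2 * log(cv^2 + 1) / n)  # standard error of the log-difference
  df    <- n - 2
  ncp   <- log(gmr) / se                # noncentrality of the t-statistic
  tcrit <- qt(1 - alpha, df)
  # both tails of the noncentral t-distribution
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

cvs  <- c(0.25, 0.20, 0.15, 0.10)
prob <- sapply(cvs, p_excl_100)
data.frame(CV = 100 * cvs, excl.100 = round(100 * prob, 1))
```

With the GMR held at 95%, a smaller CV means a smaller standard error and hence a larger noncentrality in absolute terms, so a 'significant' difference becomes more likely exactly when the study is most precise.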

❝ ❝ 2. Lack of a posteriori data on the power...

❝ The only good thing on this point is that the regulators believe that test and reference IMPs are bioequivalent. (As they deal with power.)

❝ A posteriori power is a never-ending story. The question about power should be raised on the protocol; if it is addressed to the report, it is just a suggestion for future projects.

❝ Additionally, you can remind them what low power means: with lower power, the type II error (sponsor's risk) is higher, i.e., bioequivalent preparations are more likely to be wrongly assessed as not bioequivalent. The regulators should sleep well with a higher type II error (unless the study was designed for e.g. 50% power from the start - but such protocols should be rejected).

Agree, though in the statistical sense we assume (i.e., believe) that products are not bioequivalent (that’s the Null) and hope that it will be rejected.

❝ I wish the regulators were also interested in the type I error - …

So do I. Inflated Type I Error in reference-scaling – another issue generally ignored.

❝ … it would mean that regulators do not believe that test and reference IMPs are bioequivalent. The probability of approving a non-bioequivalent test product should be up to 5%.

Correct.
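
To put a rough number on why repeating a failed study matters for that 5% limit, here is a back-of-the-envelope sketch in base R. It assumes independent studies, each with a type I error of exactly α = 0.05 (true GMR sitting on a BE limit), and approval after the first study that passes. The function name `tie_repeat` is made up for this illustration, and the result is a bound under these assumptions, not a statement about any particular PAR.

```r
# Overall type I error if up to k independent studies are performed and the
# product is approved as soon as one of them passes.
# Assumes each study has a type I error of exactly alpha; no adjustment.
tie_repeat <- function(k, alpha = 0.05) 1 - (1 - alpha)^k

tie <- sapply(1:3, tie_repeat)
round(100 * tie, 2)  # 5.00 9.75 14.26
```

So under these assumptions a single repetition already pushes the consumer risk to almost twice the nominal level.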

❝ I noticed several studies which surprised me more than this deficiency letter…

Funny stories! Assessors of the MHRA are a strange bunch.
The most bizarre I have seen was this one:

1. First formulation
  a. Pilot:
    Not particularly nice PE though BE seems to be possible in a pivotal study.
  b. Pivotal:
    CV similar to the pilot study but the PE moved further away from 100% than in the pilot. Study failed and repeating it with a larger sample size (like in your example) was considered futile. Product reformulated and →
2. Second formulation
  a. Pilot:
    CV similar to the others, PE promising this time.
  b. Pivotal:
    Passed BE with flying colors.

Submitted #2.b. to the MHRA. Synopses of the others as well to document the product development.

The MHRA wanted to see a pooled (pooled ‼) analysis of all four studies. What the heck?

The applicant replied that the first formulation went into the waste bin and hence, only market authorization of the second one was sought. The purpose of #2.a. was just to design #2.b. – which stands on its own. The applicant refused to pool any of the studies and also pointed out that the studies were evaluated with α 0.05 and that by pooling the consumer risk cannot be controlled by any means.

Then the MHRA insisted on getting a pooled analysis of #2 (with a 95% CI^II). Passed AUC, failed C_max (by a small margin).
Accepted and market authorization granted…

❝ PT Frustration - not related to vaccine

Yep.

I. Homework (not for an initiate like you but for interested readers): Why do more studies pass in the simulations than we expect for the sample size, or – in other words – why is the ‘empiric power’ higher than planned?
II. Oh dear, Bonferroni misused post mortem!
Nonsense, because the entire α was already spent.

