Bioequivalence and Bioavailability Forum • Inflation type one error

Mikalai
★

Belarus,
2019-11-05 19:59
(1627 d 07:24 ago)

Posting: # 20750
Views: 4,741

Inflation type one error [RSABE / ABEL]

Dear all,

I am not a statistician and struggle to grasp the concept of type one error (TIE) inflation in the reference-scaled approach. We basically do the same things as in the usual average bioequivalence where we are able to preserve TIE at 5%, but when we expand the acceptance limits, we get TIE inflation. What is behind this inflation, philosophically and mathematically? I also bumped into this discussion where some even argue about whether this concept exists at all.
https://daniellakens.blogspot.com/2016/12/why-type-1-errors-are-more-important.html
I assume this is not related to multiple testing. Also, we have a statistical concept, but do we have any real proof of this concept? Does anyone know products that were initially registered and then withdrawn from the market because their initial bioequivalence had been due to the inflated TIE?
Maybe this relates to two-stage designs rather than SABE, but anyway: What prevents us from using the Bonferroni correction in a two-stage adaptive design instead of other, rather complicated statistical approaches?

Thanks in advance

Helmut
★★★

Vienna, Austria,
2019-11-05 23:50
(1627 d 03:33 ago)

@ Mikalai
Posting: # 20751
Views: 4,087

Inflation type one error

Hi Mikalai,

❝ I am not a statistician […]

Neither am I.

❝ We basically do the same things as in the usual average bioequivalence where we are able to preserve TIE at 5%, […]

No, we aren’t. In ABE we have fixed limits of the acceptance range, i.e., a pre-specified Null Hypothesis. In ABEL the limits are random variables or, in other words, the Null is generated ‘in face of the data’. That means that each study sets its own standards and, if we have a couple of HVDPs, each of them was approved according to different rules.

❝ What is behind this inflation, philosophically and mathematically?

Maybe this presentation helps. In short: Reference-scaling is based on the true population parameters (hence the Greek letters $\theta_s,\,\mu_T,\,\mu_R,\,\sigma_{wR}$). The true standard deviation $\sigma_{wR}$ of the reference is unknown. We have only its estimate $s_{wR}$ from the study. Imagine: The true within-subject CV of the reference is 27%. Hence, it is not an HVD(P) and we should use the conventional limits of 80.00–125.00%. However, by chance we get an estimate of 35% in our study and expand the limits. Since the PE and the 90% CI are not affected, the chance of passing BE increases. The chance of falsely rejecting the Null (of bioinequivalence) increases, and this is the inflated type I error.
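To put numbers on the 27% vs. 35% example, here is a minimal base-R sketch. It assumes the EMA's expansion formula exp(∓0.760·$s_{wR}$) and the usual log-normal conversion $s_{wR}=\sqrt{\ln(CV_{wR}^2+1)}$. The helper names `cv_to_s` and `abel_limits` are made up for illustration, and the upper cap at CV_wR 50% is ignored.

```r
# Hypothetical helpers for illustration (not taken from any package).
cv_to_s <- function(cv) sqrt(log(cv^2 + 1))    # CV -> SD on the log scale

abel_limits <- function(cv_obs, k = 0.760) {   # k: EMA's regulatory constant
  # conventional limits unless the observed CV_wR exceeds 30%
  if (cv_obs <= 0.30) return(c(lower = 0.80, upper = 1.25))
  s <- cv_to_s(cv_obs)
  c(lower = exp(-k * s), upper = exp(k * s))
}

lim_true <- abel_limits(0.27)  # limits the true CV_wR would call for
lim_obs  <- abel_limits(0.35)  # limits actually used with the observed CV_wR
print(round(100 * rbind(true = lim_true, observed = lim_obs), 2))
```

With the observed 35% the study is judged against roughly 77.23–129.48% instead of 80.00–125.00%, although nothing else in the data has changed.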

❝ I also bumped into this discussion where some even argue about whether this concept exists at all.

❝ https://daniellakens.blogspot.com/2016/12/why-type-1-errors-are-more-important.html

❝ I assume this is not related to multiple testing.

Nice one. Your assumption is correct.

❝ Also, we have a statistical concept, but do we have any real proof of this concept? Does anyone know products that were initially registered and then withdrawn from the market because their initial bioequivalence had been due to the inflated TIE?

No (twice). But these questions deserve a detailed discussion. More when I’m back from Athens.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Mikalai
★

Belarus,
2019-11-06 16:13
(1626 d 11:10 ago)

@ Helmut
Posting: # 20756
Views: 3,951

Inflation type one error

❝ Maybe this presentation helps. In short: Reference-scaling is based on the true population parameters (hence the Greek letters $\theta_s,\,\mu_T,\,\mu_R,\,\sigma_{wR}$). The true standard deviation $\sigma_{wR}$ of the reference is unknown. We have only its estimate $s_{wR}$ from the study. Imagine: The true within-subject CV of the reference is 27%. Hence, it is not an HVD(P) and we should use the conventional limits of 80.00–125.00%. However, by chance we get an estimate of 35% in our study and expand the limits. Since the PE and the 90% CI are not affected, the chance of passing BE increases. The chance of falsely rejecting the Null (of bioinequivalence) increases, and this is the inflated type I error.

Dear Helmut,
I may be wrong but I cannot see how we can get a true within-subject CV of any drug. I may be wrong, but even with simulations (I suppose with simulations some assumptions regarding variance should be made), it is very difficult or even impossible. Usually, we have very scarce data on within-subject CVs. How can we control the TIE in this situation, and what do regulators say on this subject? I do not remember any reflection on this matter in official documents (EMA, FDA)?
Regards,
Mikalai

Helmut
★★★

Vienna, Austria,
2019-11-08 15:52
(1624 d 11:31 ago)

@ Mikalai
Posting: # 20766
Views: 3,903

Inflation type one error

Hi Mikalai,

❝ I may be wrong but I cannot see how we can get a true within-subject CV of any drug.

Of course, you are right. The true CV_wR is unknown. But reference-scaling should be done for true HVD(P)s, i.e., where the population’s CV_wR >30%. However, $s_{wR}$ is the best unbiased estimate of $\sigma_{wR}$. The former is used in the expansion formula, i.e., treating $s_{wR}$ as a true value.
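As a side note on the two scales used in this thread, a small base-R sketch of the conversion between CV_wR and $s_{wR}$, assuming log-normally distributed data. The function names are hypothetical; PowerTOST offers `CV2se()` and `se2CV()` for the same purpose.

```r
# Hypothetical helper names, base R only.
cv_to_s <- function(cv) sqrt(log(cv^2 + 1))  # CV -> SD on the log scale
s_to_cv <- function(s)  sqrt(exp(s^2) - 1)   # SD on the log scale -> CV

s_30   <- cv_to_s(0.30)    # s_wR corresponding to a CV_wR of exactly 30%
cv_294 <- s_to_cv(0.294)   # CV_wR corresponding to the rounded cutoff 0.294
print(signif(c(s_wR.at.CV30 = s_30, CV.at.s0.294 = cv_294), 4))
```

A CV_wR of exactly 30% corresponds to $s_{wR}\approx 0.2936$, whereas the rounded switching value 0.294 corresponds to a CV_wR of about 30.05%.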

❝ I may be wrong, but even with simulations (I suppose with simulations some assumptions regarding variance should be made), it is very difficult or even impossible.

There are essentially two options:

1. Our ad hoc solution¹ by simulating under the assumption $s_{wR}=\sigma_{wR}$ to iteratively adjust $\alpha$. That’s in the “spirit of the guideline” where the observed CV_wR is used for expanding the limits.
2. Muñoz et al.² suggested to “assume the worst” and – since the true value is unknown – always adjust $\alpha$ as if CV_wR = 30%. That’s in any case the most conservative approach but might negatively impact power in case of high CVs (where the upper cap of scaling and the GMR-restriction already effectively control the TIE). For examples see there.

Note that in both approaches the GMR of the Null is specified according to the expanded limits.
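The idea of iteratively adjusting $\alpha$ can be sketched in base R. This is a crude illustration under stated assumptions, not the published algorithm and not what PowerTOST does internally: a 2x2x4 design with a hypothetical n = 34, $\sigma_{wT}=\sigma_{wR}$, the PE, the residual variance and $s^2_{wR}$ simulated independently, and EMA-type rules (regulatory constant 0.760, cap at CV_wR 50%, PE within 80–125%). All function and variable names are made up.

```r
# Crude sketch under the assumptions named above; names are hypothetical.
cv_to_s <- function(cv) sqrt(log(cv^2 + 1))

tie_abel <- function(alpha, cv = 0.30, n = 34, nsims = 1e5, seed = 123) {
  set.seed(seed)                     # same random numbers for every alpha
  s_cut <- cv_to_s(0.30)             # switching point (CV_wR 30%)
  s_cap <- cv_to_s(0.50)             # upper cap of scaling (CV_wR 50%)
  s_w   <- cv_to_s(cv)               # assumed true s_wR (= s_wT)
  df    <- 3 * n - 4                 # residual df assumed for the 2x2x4 design
  df_rr <- n - 2                     # df assumed for the estimate of s2_wR
  s_obs <- s_w * sqrt(rchisq(nsims, df_rr) / df_rr)   # observed s_wR
  mse   <- s_w^2 * rchisq(nsims, df) / df             # residual variance
  # acceptance limits (log scale) are set by the observed s_wR
  lim   <- ifelse(s_obs > s_cut, 0.760 * pmin(s_obs, s_cap), log(1.25))
  # Null: true GMR at the upper limit belonging to the assumed true s_wR
  mu0   <- if (s_w > s_cut) 0.760 * min(s_w, s_cap) else log(1.25)
  pe    <- rnorm(nsims, mean = mu0, sd = s_w / sqrt(n))
  hw    <- qt(1 - alpha, df) * sqrt(mse / n)          # CI half-width
  # pass: CI within the limits and PE within 80-125%
  mean(pe - hw >= -lim & pe + hw <= lim & abs(pe) <= log(1.25))
}

tie_nominal  <- tie_abel(0.05)
# search for the alpha which gives an empirical TIE of 0.05
alpha_adj    <- uniroot(function(a) tie_abel(a) - 0.05,
                        interval = c(0.01, 0.05), tol = 1e-6)$root
tie_adjusted <- tie_abel(alpha_adj)
print(signif(c(TIE.nominal = tie_nominal, alpha.adj = alpha_adj,
               TIE.adjusted = tie_adjusted), 4))
```

Fixing the seed inside `tie_abel()` makes the empirical TIE a deterministic step function of $\alpha$, which is what allows a root finder to be used. The example is evaluated at CV_wR = 30%, where the two options listed above happen to coincide.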

Still: The expansion is based on the observed CV_wR. We once had the crazy idea of using a very conservative (99.9%) CI instead. Doesn’t work because then we would practically never be allowed to scale…

❝ […] what do regulators say on this subject?

Nothing. I raised this issue at numerous conferences. Dead silence. Armin Koch (co-author of one of the papers³ noting the inflated TIE) is a member of the EMA’s Biostatistical Working Party. Sent him an e-mail in 2016. No answer.

❝ I do not remember any reflection on this matter in official documents (EMA, FDA)?

EMA = zero. At the 2nd GBHI conference (Sep 2016, Rockville) László Endrenyi gave a presentation “Features, Constraints, and Extensions of the Scaling Approach” where he showed examples of the TIE, both for the EMA’s and the FDA’s approaches. Donald Schuirmann said “There is a recent paper in Pharm Res. showing how to deal with the inflation of the type I error. This is an excellent and applicable approach.” and told me in a coffee-break “… if this is correct, we have to modify our method”. Didn’t happen. Will ask him again next month at the 4th GBHI in Bethesda.

1. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016: 33(11); 2805–14. doi:10.1007/s11095-016-2006-1.
2. Muñoz J, Alcaide D, Ocaña J. Consumer’s risk in the EMA and FDA regulatory approaches for bioequivalence in highly variable drugs. Stat Med. 2016: 35(12); 1933–43. doi:10.1002/sim.6834.
3. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015: 32(1); 135–43. doi:10.1007/s11095-014-1450-z.

Helmut
★★★

Vienna, Austria,
2019-11-10 12:33
(1622 d 14:50 ago)

@ Mikalai
Posting: # 20780
Views: 3,966

Inflation type one error: FDA

Hi Mikalai,

❝ […] what do regulators say on this subject? I do not remember any reflection on this matter in official documents (EMA, FDA)?

Some slides of Terry Hyslop (Director, Division of Biostatistics) of his presentation “Bioequivalence (BE) for Highly Variable Drugs” at the AAPS Workshop (New Orleans, Nov 2010) – after the progesterone guidance was published…

[image]

[image]
Aka the ‘implied limits’ (see this post and slide 23 below).

[image]

[image]
Well roared, lion! Trouble starts because we use $s_{WR}$ instead of the unknown $\sigma_{WR}$.
Nasty but $s_{WR}$ is all we have.
Typo, should read
… use scaled average BE if s_WR > cutoff.

[image]

[image]
Illegible text (white with grey shadowing):
assuming no subject-by-formulation interaction, σWT = σWR,
true GMR = max (1.25, implied scaled BE limit)

What‽
library(PowerTOST)
swR  <- sort(c(CV2se(0.3), 0.294, seq(0.2, 0.3, 0.01)))
CVwR <- se2CV(swR)
reg  <- reg_const("FDA")
reg$CVswitch  <- se2CV(0.294)
reg$pe_constr <- FALSE # pure RSABE (without PE constraint)
GMRs <- data.frame(GMR = scABEL(CV = CVwR, regulator = reg)[, "upper"],
                   GMR.Terry = exp(reg$r_const * swR))
res  <- data.frame(swR = swR, CVwR = CVwR, GMR = GMRs$GMR, TIE = NA_real_,
                   GMR.max = NA_real_, TIE.Terry = NA_real_)
for (j in seq_along(swR)) {
  # Cheating: That is not implemented in the guidance!
  res$GMR.max[j]   <- max(c(1.25, GMRs$GMR.Terry[j]))
  res$TIE.Terry[j] <- power.RSABE(CV = CVwR[j], theta0 = res$GMR.max[j],
                                  design = "2x3x3", n = 36, nsims = 1e6)
  # That is correct!
  res$TIE[j]       <- power.RSABE(CV = CVwR[j], theta0 = res$GMR[j],
                                  design = "2x3x3", n = 36, nsims = 1e6)
}
reg; print(res, digits = 4, row.names = FALSE)

FDA regulatory settings
- CVswitch            = 0.3004689
- no cap on scABEL
- regulatory constant = 0.8925742
- no pe constraint

    swR   CVwR   GMR     TIE GMR.max TIE.Terry
 0.2000 0.2020 1.250 0.04988   1.250   0.04988
 0.2100 0.2123 1.250 0.04997   1.250   0.04997
 0.2200 0.2227 1.250 0.05037   1.250   0.05037
 0.2300 0.2331 1.250 0.05166   1.250   0.05166
 0.2400 0.2435 1.250 0.05460   1.250   0.05460
 0.2500 0.2540 1.250 0.06027   1.250   0.06027
 0.2600 0.2645 1.250 0.06985   1.261   0.05152
 0.2700 0.2750 1.250 0.08389   1.273   0.04749
 0.2800 0.2856 1.250 0.10211   1.284   0.04621
 0.2900 0.2962 1.250 0.12375   1.295   0.04612
 0.2936 0.3000 1.250 0.13233   1.300   0.04621
 0.2940 0.3005 1.250 0.13341   1.300   0.04621
 0.3000 0.3069 1.307 0.04628   1.307   0.04628

[image]

[image]

Check:

library(PowerTOST)
res <- data.frame(method = c("ABE", "RSABE"), TIE = NA)
res$TIE[1] <- power.TOST(CV = 0.3, n = 36, theta0 = 1.25,
                         design = "2x3x3")
res$TIE[2] <- power.RSABE(CV = 0.3, n = 36, theta0 = 1.25,
                          design = "2x3x3", nsims = 1e6)
res$TIE    <- signif(res$TIE, 4)
print(res, row.names = FALSE)
# method    TIE
#    ABE 0.0500
#  RSABE 0.1323

Hence, the FDA was well aware of the inflated type I error and decided to ignore it.

PharmCat
★

Russia,
2019-11-06 00:06
(1627 d 03:17 ago)

@ Mikalai
Posting: # 20752
Views: 4,393

Inflation type one error

❝ Dear all,

❝ Thanks in advance

Dear Mikalai, the Bonferroni correction is applied when two or more independent tests are done, and it is a very crude correction (the Šidák correction is more delicate). But in an adaptive design we have one test and then another one using part of the same data. The comparisons are not independent, and we have to spend our alpha: one part in the first test, another in the second. We can spend alpha in any proportion we wish, but the overall alpha should not be greater than, for example, 0.05. We should apply an alpha-spending function. Pocock boundary, Haybittle–Peto boundary, O'Brien–Fleming boundary: there are many approaches to working with interim analyses.
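For orientation, the two simple corrections mentioned above can be compared in a few lines of base R. This is only a sketch for k = 2 tests at an overall α of 0.05. Bonferroni bounds the overall error under any dependence between the tests, whereas Šidák's formula is exact only if the tests are independent, which the two stages of an adaptive design are not.

```r
alpha <- 0.05                 # overall (familywise) level
k     <- 2                    # number of tests, e.g., two stages

bonferroni <- alpha / k                 # per-test level, Bonferroni
sidak      <- 1 - (1 - alpha)^(1 / k)   # per-test level, Sidak
print(signif(c(Bonferroni = bonferroni, Sidak = sidak), 4))
```

The Šidák level is only marginally larger than the Bonferroni level. Boundaries derived from alpha-spending functions take the correlation between the stages into account and can therefore spend more per stage.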

The range of the CI itself does not influence the TIE; it is only a convention. But when the CI is dynamically changing, I think there is no good definition of the TIE. For a fixed CI the TIE means that the real GMR may be outside the permissible range with this chance. It is a very touching assumption to consider that the TIE for RSABE is the chance that the GMR is outside 0.8–1.25 and, with this understanding, to make the CI range wider. Really, in this situation the TIE is not the same as in the fixed case. But people want to do ABE for highly variable drugs and try to do this :cool:

It's like an attempt to trick statistics...

But I could be wrong...

Helmut
★★★

Vienna, Austria,
2019-11-08 16:21
(1624 d 11:02 ago)

@ PharmCat
Posting: # 20767
Views: 3,883

Inflation type one error

Hi PharmCat,

❝ But when the CI is dynamically changing, I think there is no good definition of the TIE.

I guess you mean that the acceptance range changes (depending on the $s_{wR}$). The CI is not affected.

❝ For a fixed CI the TIE means that the real GMR may be outside the permissible range with this chance.

Yep. For fixed limits the TIE is defined based on the Null of bioinequivalence. It is directly accessible as the power for a GMR exactly at one of the limits.

library(PowerTOST)
CV     <- 0.3
n      <- 34
design <- "2x2x4"
GMR    <- 1.25
# exact
power.TOST(CV = CV, n = n, theta0 = GMR, design = design)
# [1] 0.05
# simulations
power.TOST.sim(CV = CV, n = n, theta0 = GMR, design = design, nsims = 1e6)
# [1] 0.050097

You can plug in any CV, n, design and the TIE will never exceed nominal α.

❝ It is a very touching assumption to consider that the TIE for RSABE is the chance that the GMR is outside 0.8–1.25 and, with this understanding, to make the CI range wider. Really, in this situation the TIE is not the same as in the fixed case.

Here the trouble starts (see what I wrote above). What we are doing here is actually HARKing (Hypothesizing After the Results are Known). Not exactly but we definitely generate the Null from the data. Apart from the TIE-issues every product approved by RSABE/ABEL followed its own rules. From a consumer’s perspective this is not fortunate.

❝ But people want to do ABE for highly variable drugs and try to do this :cool: It's like an attempt to trick statistics...

Not sure what you mean here. Can you elaborate?

PharmCat
★

Russia,
2019-11-08 19:15
(1624 d 08:08 ago)

@ Helmut
Posting: # 20769
Views: 3,884

Inflation type one error

Hello!

Sorry for my bad English!

❝ Here the trouble starts (see what I wrote above). What we are doing here is actually HARKing (Hypothesizing After the Results are Known). Not exactly but we definitely generate the Null from the data. Apart from the TIE-issues every product approved by RSABE/ABEL followed its own rules. From a consumer’s perspective this is not fortunate.

Yes, we generate a hypothesis, but we lose any link between the TIE and reality. We form the hypothesis from a variance estimate, but it is only an estimate; we don't know the real variance. I can't imagine how to define the TIE in this case.

❝ Not sure what you mean here. Can you elaborate?

Of course. It is my description of the situation :-D

I think that "HARKing" came to bioequivalence because it is very expensive to do BE with a big sample size or to show therapeutic equivalence, and it is a compromise between regulators and industry. And from my side HARKing is bad statistics (yes: from a consumer’s perspective this is not fortunate), but it is discussible ... some persons recommend that HARKing not be taught by educators, encouraged by reviewers or editors, or practiced by authors.

Helmut
★★★

Vienna, Austria,
2019-11-08 21:26
(1624 d 05:57 ago)

@ PharmCat
Posting: # 20770
Views: 4,009

TIE = chance of passing at the border(s)

Hi PharmCat,

❝ Sorry for my bad English!

No worries, mine is hardly better.

❝ Yes, we generate a hypothesis, but we lose any link between the TIE and reality. We form the hypothesis from a variance estimate, but it is only an estimate; we don't know the real variance.

Well, the expansion according to the guideline(s) uses the estimate as well. Try this one:

library(PowerTOST)
theta0 <- seq(0.75, 1, 0.01)
theta0 <- sort(unique(c(theta0, 1/theta0)))
CV     <- 0.30
design <- "2x2x4"
n      <- sampleN.scABEL(CV = CV, design = design, theta0 = 0.90,
                         print = FALSE, details = FALSE)[["Sample size"]]
powerRSABE.ad <- powerRSABE <- powerABEL.ad <- powerABEL <- powerABE <- numeric()
# adjusted alphas for ABEL and RSABE
ABEL.ad  <- scABEL.ad(CV = CV, n = n, design = design,
                      print = FALSE)$alpha.adj
RSABE.ad <- scABEL.ad(CV = CV, n = n, design = design,
                      regulator = "FDA", print = FALSE)$alpha.adj
for (j in seq_along(theta0)) {
  # more simulations at the borders of the conventional acceptance range
  if (theta0[j] == 0.80 | theta0[j] == 1.25) nsims <- 1e6 else nsims <- 1e5
  powerABE[j]      <- power.TOST(CV = CV, theta0 = theta0[j],
                                 n = n, design = design)
  powerABEL[j]     <- power.scABEL(CV = CV, theta0 = theta0[j],
                                   n = n, design = design, nsims = nsims)
  powerABEL.ad[j]  <- power.scABEL(alpha = ABEL.ad, CV = CV,
                                   theta0 = theta0[j], n = n,
                                   design = design, nsims = nsims)
  powerRSABE[j]    <- power.RSABE(CV = CV, theta0 = theta0[j],
                                  n = n, design = design, nsims = nsims)
  powerRSABE.ad[j] <- power.RSABE(alpha = RSABE.ad, CV = CV,
                                  theta0 = theta0[j], n = n,
                                  design = design, nsims = nsims)
}
plot(theta0, powerABE, type = "n", log = "x", lwd = 2, las = 1,
     ylab = "chance of passing")
grid()
col <- c("#00AA00", "red", "blue", "magenta", "grey25")
abline(v = c(0.80, 1.25), col = "grey75")
abline(h = 0.05, lty = 2, col = "red")
lines(theta0, powerABE, lwd = 2, col = col[1])
lines(theta0, powerABEL, lwd = 2, col = col[2])
lines(theta0, powerABEL.ad, lwd = 2, col = col[3])   # was powerABEL.adj (undefined)
lines(theta0, powerRSABE, lwd = 2, col = col[4])
lines(theta0, powerRSABE.ad, lwd = 2, col = col[5])
legend("center", bg = "white", box.lty = 0, text.col = col,
       legend = c("ABE", "ABEL (\u03B1 0.05)",
                  paste0("ABEL (\u03B1 ", signif(ABEL.ad, 3), ")"),
                  "RSABE (\u03B1 0.05)",
                  paste0("RSABE (\u03B1 ", signif(RSABE.ad, 3), ")")))
powerABE[which(theta0 == 0.80 | theta0 == 1.25)]
# [1] 0.05 0.05
powerABEL[which(theta0 == 0.80 | theta0 == 1.25)]
# [1] 0.081285 0.081626
powerABEL.ad[which(theta0 == 0.80 | theta0 == 1.25)]
# [1] 0.049751 0.050000

With a true CV of 30% we are not allowed to scale but the chance of passing with ABEL is higher than with ABE.
In ~50% of studies we will observe a CV of >30% ($s_{wR} >0.294$) and expand the limits although the drug is not highly variable in the population ($\sigma_{wR} \leq0.294$). The fact that more than 5% pass at each of the borders of the acceptance range is a nasty side effect.
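How often a study ends up with expanded limits although the drug is not highly variable can be approximated in base R. The sketch assumes normally distributed log-data, so that $df\cdot s^2_{wR}/\sigma^2_{wR}$ follows a χ² distribution, and $df = n-2$ for a 2x2x4 design with a hypothetical n = 34. The helper name `cv_to_s` is made up.

```r
cv_to_s <- function(cv) sqrt(log(cv^2 + 1))  # hypothetical helper

n     <- 34                      # hypothetical sample size
df_rr <- n - 2                   # assumed df of s2_wR in a 2x2x4 design
cv    <- seq(0.25, 0.30, 0.01)   # true CV_wR (not highly variable)

# P(observed s_wR > cutoff | true sigma_wR), from
# df * s2_wR / sigma2_wR ~ chi-square(df)
p_expand <- pchisq(df_rr * cv_to_s(0.30)^2 / cv_to_s(cv)^2,
                   df = df_rr, lower.tail = FALSE)
print(data.frame(CVwR = cv, p.expand = signif(p_expand, 3)),
      row.names = FALSE)
```

At a true CV_wR of 30% the probability is somewhat below one half because the χ² distribution is skewed, which is in line with the ‘~50%’ above. It drops quickly for lower true CVs but remains far from negligible.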

❝ I can't imagine how to define the TIE in this case.

In analogy to ABE (where the TIE is the chance of passing at the borders of the acceptance range) all authors (with one exception¹) of papers dealing with RSABE/ABEL employed the borders of expanded limits. IMHO, that’s a natural choice.
Davit et al.¹ distinguished between the ‘implied limits’ and the limits of the ‘desired consumer risk model’. The FDA assessed the TIE at the border of the latter, which decreases the TIE. I do believe that the FDA desires something, but in actual studies one has to follow the guidance, ending up with the former…

res <- data.frame(CV = sort(c(seq(0.25, 0.32, 0.01), se2CV(0.25))),
                  impl.L = NA, impl.U = NA, impl.TIE = NA,
                  des.L = 0.80, des.U = 1.25, des.TIE = NA)
for (j in 1:nrow(res)) {
  res[j, 2:3] <- scABEL(CV = res$CV[j], regulator = "FDA")
  if (CV2se(res$CV[j]) > 0.25) { # Hey presto, hocus-pocus!
    res[j, 5:6] <- exp(c(-1, +1)*(log(1.25)/0.25)*CV2se(res$CV[j]))
  }
  res[j, 4] <- power.RSABE(CV = res$CV[j], theta0 = res[j, 3],
                           design = "2x2x4", n = 32, nsims = 1e6)
  res[j, 7] <- power.RSABE(CV = res$CV[j], theta0 = res[j, 5],
                           design = "2x2x4", n = 32, nsims = 1e6)
}
print(signif(res, 4), row.names = FALSE)
#    CV impl.L impl.U impl.TIE  des.L des.U des.TIE
# 0.250 0.8000  1.250  0.06068 0.8000 1.250 0.06068
# 0.254 0.8000  1.250  0.06396 0.8000 1.250 0.06396
# 0.260 0.8000  1.250  0.07008 0.7959 1.256 0.05731
# 0.270 0.8000  1.250  0.08352 0.7892 1.267 0.05098
# 0.280 0.8000  1.250  0.10130 0.7825 1.278 0.04810
# 0.290 0.8000  1.250  0.12290 0.7760 1.289 0.04685
# 0.300 0.8000  1.250  0.14710 0.7695 1.300 0.04611
# 0.310 0.7631  1.310  0.04515 0.7631 1.310 0.04515
# 0.320 0.7568  1.321  0.04373 0.7568 1.321 0.04373

❝ […] because it is very expensive to do BE with a big sample size or to show therapeutic equivalence, and it is a compromise between regulators and industry.

Yep. That was the original idea of SABE – avoiding extreme sample sizes whilst preserving power. Discussions started already at the first BioInternational conference.² Heck, thirty years ago!

❝ And from my side HARKing is bad statistics (yes: from a consumer’s perspective this is not fortunate), …

Agree.

❝ … but it is discussible ...

I’m not sure whether HARKing is the correct term. Granted, the Null is constructed post hoc, but at least according to a pre-specified procedure.

❝ … some persons recommend that HARKing not be taught by educators, encouraged by reviewers or editors, or practiced by authors.

Agree.

1. Davit BM, Chen ML, Conner DP, Haidar SH, Kim S, Lee CH, Lionberger RA, Makhlouf FT, Nwakama PE, Patel DT, Schuirmann DJ, Yu LX. Implementation of a Reference-Scaled Average Bioequivalence Approach for Highly Variable Generic Drug Products by the US Food and Drug Administration. AAPS J. 2012: 14(4); 915–24. doi:10.1208/s12248-012-9406-x.
2. McGilveray IJ. An Overview of Problems and Progress at Bio-Internationals ‘89 and ‘92. In: Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Blume HH, Midha KK, editors. Stuttgart: Medpharm Scientific Publishers; 1995. p. 109–15.
