Bioequivalence and Bioavailability Forum • Likely it does not work (potentially inflated Type I Error)

Likely it does not work (potentially inflated Type I Error) [Two-Stage / GS Designs]

posted by Helmut – Vienna, Austria, 2023-12-19 12:10 – Posting: # 23796

Dear all,

I could not resist and had a closer look (a lengthy R-script at the end).

Say, we have a 2-sequence 4-period (full) replicate design and start the study with 16 subjects (n₁).
We observe a CV_wR of 0.30. Since s_wR < 0.294, we have to go with ABE (no scaling). Power based on a fixed GMR of 0.95 is below the target of 0.80. Hence, we initiate a second stage. With Pocock’s adjusted α of 0.0294 we recruit 6 subjects (n₂). In the final analysis of the pooled data we observe a CV_wR of 0.30 again and a GMR of 0.92.
Power will be only 0.7093, since the GMR is worse than assumed. Moreover, the Type I Error will be significantly inflated (0.0861 > α = 0.05). We would have needed an adjusted α of 0.0149 or less (substantially lower than the one we used) to control the Type I Error.
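Why does a CV_wR of exactly 0.30 end up in the ABE-branch? A minimal base-R sketch, assuming only the switching condition quoted above (ABE if s_wR < 0.294, RSABE otherwise) and the usual conversion s_wR = √ln(CV_wR² + 1). The helpers `cv2swR()`, `swR2cv()` and `branch()` are hypothetical and not part of PowerTOST.

```r
# Hypothetical helpers (not PowerTOST functions):
# convert between CV_wR and s_wR and apply the FDA's switching condition
cv2swR <- function(CVwR) sqrt(log(CVwR^2 + 1))   # CV -> within-subject SD
swR2cv <- function(swR) sqrt(exp(swR^2) - 1)     # within-subject SD -> CV

# ABE if s_wR < 0.294, otherwise RSABE (switching condition from the text)
branch <- function(CVwR, switch = 0.294) {
  ifelse(cv2swR(CVwR) < switch, "ABE", "RSABE")
}

cat(sprintf("CV_wR 0.30 -> s_wR = %.4f -> %s\n", cv2swR(0.30), branch(0.30)))
cat(sprintf("s_wR 0.294 corresponds to CV_wR = %.4f\n", swR2cv(0.294)))
```

With CV_wR = 0.30 we get s_wR ≈ 0.2936, just below the switching condition, which itself corresponds to a CV_wR of ≈ 30.05%.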

Call the script with the example’s data:

```
RSABE.TSD(adj = 0.0294, design = "2x2x4", n1 = 16, CVwR = 0.3, GMR = 0.95,
          target = 0.8, CVwR.2 = 0.3, GMR.2 = 0.92)
adjusted alpha : 0.0294 (Pocock’s for superiority)
design : 2x2x4
n1 : 16
futility on N : none
CVwR : 0.3000 (observed)
theta1 : 0.8000 (lower implied limit)
theta2 : 1.2500 (upper implied limit)
power : 0.6812 (estimated)
Stage 2 initiated (insufficient power in stage 1)
GMR : 0.9500 (fixed)
target power : 0.8000 (fixed)
n2 : 6
N : 22; less than the FDA’s minimum of 24 subjects!
CVwR : 0.3000 (observed)
GMR : 0.9200 (observed)
theta1 : 0.8000 (lower implied limit)
theta2 : 1.2500 (upper implied limit)
power : 0.7093 (estimated, may pass RSABE)
empirical TIE : 0.0861 (all publications; significantly inflated)
An adjusted alpha of 0.0149 (or less)
would be needed to control the Type I Error.
```

Note that the Type I Error in RSABE depends strongly on the sample size. Hence, even if we re-estimate the sample size with an adjusted α of 0.0149, we would still see an inflated Type I Error of 0.0546 due to the larger stage 2 sample size of 10 subjects. For the resulting total sample size (N) of 26 subjects we would need an even smaller adjusted α of 0.01333…
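‘Significantly inflated’ is decided by a binomial test of the simulated rejection rate against the nominal α = 0.05. The sketch below mirrors the limit computed in the script (one-sided, 95%, 10⁶ simulations) with base R’s `binom.test()` and checks the empirical TIEs quoted in this post against it. `is_inflated()` is a hypothetical helper.

```r
# Significance limit for an empirical Type I Error from 1e6 simulations:
# upper bound of the one-sided 95% Clopper-Pearson interval at alpha = 0.05
nsims <- 1e6
sig   <- binom.test(0.05 * nsims, nsims, alternative = "less",
                    conf.level = 0.95)$conf.int[2]

# Hypothetical helper: TRUE if an empirical TIE exceeds the limit
is_inflated <- function(TIE, limit = sig) TIE > limit

TIEs <- c(stage.2 = 0.0861, re.estimated = 0.0546, CV.above.30 = 0.0294)
print(round(sig, 5))
print(is_inflated(TIEs))
```

The limit is only slightly above 0.05 (≈ 0.0504), hence 0.0861 and 0.0546 are flagged, whereas 0.0294 is not.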
Try a fixed GMR of 0.90 – which is more realistic for HVD(P)s – and you will be surprised.
Note also that in the RSABE-branch (s_wR ≥ 0.294) the empirical Type I Error drops to the adjusted α for a CV_wR infinitesimally greater than 30%. The example above changed to:

```
CVwR.2 = 0.30 + 1e-9
```

Part of the output:

```
theta1 : 0.7695 (lower implied limit)
theta2 : 1.2996 (upper implied limit)
empirical TIE : 0.0294 (all publications)
```

For any larger CV_wR the empirical Type I Error will be lower than the adjusted α.
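Why does an infinitesimal change of CV_wR matter at all? The ‘implied limits’ jump discontinuously at 30%. A minimal sketch of that step, assuming the FDA’s regulatory constant θs = ln(1.25)/0.25; `implied_limits()` is a hypothetical stand-in for what the script obtains from PowerTOST’s `scABEL(CV, regulator = "FDA")`.

```r
# Hypothetical helper: 'implied limits' (conventional ABE limits up to
# CV_wR = 30%, widened by the FDA's regulatory constant above)
implied_limits <- function(CVwR) {
  theta.s <- log(1.25) / 0.25           # FDA's regulatory constant
  swR     <- sqrt(log(CVwR^2 + 1))      # within-subject SD of R
  if (CVwR <= 0.30) {
    c(lower = 0.80, upper = 1.25)
  } else {
    exp(c(lower = -1, upper = +1) * theta.s * swR)
  }
}

print(round(implied_limits(0.30), 4))          # conventional limits
print(round(implied_limits(0.30 + 1e-9), 4))   # jump to the widened limits
```

Since the TIE is simulated at the upper limit, moving it from 1.2500 to 1.2996 changes the assessed null hypothesis itself, not only the CV_wR.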

Of course, increasing the sample size in stage 1 does not help in the ABE-branch (s_wR < 0.294).

```
RSABE.TSD(adj = 0.0304, design = "2x2x4", n1 = 38, ...)
adjusted alpha : 0.0304 (Pocock’s for equivalence)
design : 2x2x4
n1 : 38
futility on N : none
CVwR : 0.3000 (observed)
theta1 : 0.8000 (lower implied limit)
theta2 : 1.2500 (upper implied limit)
power : 0.9708 (estimated)
Study stopped in stage 1 (sufficient power)
empirical TIE : 0.1122 (all publications; significantly inflated)
An adjusted alpha of 0.0100 (or less)
would be needed to control the Type I Error.
```

Let’s go fully adaptive, i.e., use the observed stage 1 GMR rather than a fixed one. In the final analysis the GMR is worse than in the first stage and the CV_wR is lower. We use some of the defaults. We have to increase our target power (here to 0.90). That’s a guessing game, because in the interim we don’t know what will happen in the second stage.

```
RSABE.TSD(adj = 0.0304, design = "2x2x4", n1 = 24, CVwR = 0.35, GMR = 0.9,
          usePE = TRUE, target = 0.9, CVwR.2 = 0.32, GMR.2 = 0.88)
adjusted alpha : 0.0304 (Pocock’s for equivalence)
design : 2x2x4
n1 : 24
futility on N : none
CVwR : 0.3500 (observed)
theta1 : 0.7383 (lower implied limit)
theta2 : 1.3545 (upper implied limit)
power : 0.6906 (estimated)
Stage 2 initiated (insufficient power in stage 1)
GMR : 0.9000 (observed)
target power : 0.9000 (fixed)
n2 : 20
N : 44
CVwR : 0.3200 (observed)
GMR : 0.8800 (observed)
theta1 : 0.7568 (lower implied limit)
theta2 : 1.3214 (upper implied limit)
power : 0.7762 (estimated, may pass RSABE)
empirical TIE : 0.0271 (all publications)
```
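In the fully adaptive example the widened limits depend on the CV_wR observed in each analysis, so the limits move between the interim and the final analysis. A sketch with the example’s numbers, assuming CV_wR > 30% in both analyses (true here); `scaled_limits()` and the log-scale ‘margin’ are hypothetical illustrations, not part of any published method.

```r
# Sketch: widened limits for CV_wR > 30% (FDA's regulatory constant);
# scaled_limits() is a hypothetical helper, not a PowerTOST function
scaled_limits <- function(CVwR) {
  theta.s <- log(1.25) / 0.25            # regulatory constant
  swR     <- sqrt(log(CVwR^2 + 1))       # within-subject SD of R
  exp(c(lower = -1, upper = +1) * theta.s * swR)
}

# observed values of the fully adaptive example (interim vs. final)
stage <- data.frame(analysis = c("interim", "final"),
                    CVwR = c(0.35, 0.32), GMR = c(0.90, 0.88))
lims  <- round(t(sapply(stage$CVwR, scaled_limits)), 4)
stage <- cbind(stage, lims)
# illustrative margin: log-distance of the GMR from the (nearer) lower limit
stage$margin <- round(log(stage$GMR) - log(stage$lower), 4)
print(stage)
```

The lower limit rises from 0.7383 to 0.7568 while the GMR drops from 0.90 to 0.88, so the margin shrinks from both sides. That is one way to see why the target power has to be raised in the interim without knowing what the second stage will bring.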

Only if you are a devout follower of the FDA church and believe in the ‘desired consumer risk model’,¹ run the first example with

```
RSABE.TSD(adj = 0.0294, design = "2x2x4", n1 = 16, CVwR = 0.3, GMR = 0.95,
          target = 0.8, CVwR.2 = 0.3, GMR.2 = 0.92, risk = TRUE)
adjusted alpha : 0.0294 (Pocock’s for superiority)
design : 2x2x4
n1 : 16
futility on N : none
CVwR : 0.3000 (observed)
theta1 : 0.8000 (lower implied limit)
  0.7695 (lower limit of the ‘desired consumer risk model’)
theta2 : 1.2500 (upper implied limit)
  1.2996 (upper limit of the ‘desired consumer risk model’)
power : 0.6812 (estimated)
Stage 2 initiated (insufficient power in stage 1)
GMR : 0.9500 (fixed)
target power : 0.8000 (fixed)
n2 : 6
N : 22; less than the FDA’s minimum of 24 subjects!
CVwR : 0.3000 (observed)
GMR : 0.9200 (observed)
theta1 : 0.8000 (lower implied limit)
  0.7695 (lower limit of the ‘desired consumer risk model’)
theta2 : 1.2500 (upper implied limit)
  1.2996 (upper limit of the ‘desired consumer risk model’)
power : 0.8354 (estimated, may pass RSABE)
empirical TIE : 0.0861 (all publications; significantly inflated)
  0.0294 (‘desired consumer risk model’)
An adjusted alpha of 0.0149 (or less)
would be needed to control the Type I Error.
```

By means of Harry Potter’s magic wand, the inflation of the Type I Error apparently disappears, because the null hypothesis is assessed at wider limits.² If you don’t want to chew through reference #4 in the post above, the first four slides of a contribution to the discussion at the 5th GBHI International Workshop (Amsterdam, 28 September 2022) may help.

¹ Davit BM, Chen ML, Conner DP, Haidar SH, Kim S, Lee CH, Lionberger RA, Makhlouf FT, Nwakama PE, Patel DT, Schuirmann DJ, Yu LX. Implementation of a Reference-Scaled Average Bioequivalence Approach for Highly Variable Generic Drug Products by the US Food and Drug Administration. AAPS J. 2012; 14(4): 915–24. doi:10.1208/s12248-012-9406-x.
² The FDA’s regulatory constant is $\small{\theta_\text{s}=\log_{e}(1.25)/0.25\approx0.8925742\ldots}$
The null hypothesis of inequivalence $\small{\theta_0}$ is assessed in the ‘desired consumer risk model’:
- if $\small{s_\text{wR}\leq0.25}$: $\small{\theta_0=\left\{0.8000,1.2500\right\}}$
- if $\small{s_\text{wR}>0.25}$: $\small{\theta_0=\exp(\mp\theta_\text{s}\times s_\text{wR})}$
That’s different from the ‘implied limits’ assessed in all (‼) other publications:
- if $\small{CV_\text{wR}\leq30\%}$: $\small{\theta_0=\left\{0.8000,1.2500\right\}}$
- if $\small{CV_\text{wR}>30\%}$: $\small{\theta_0=\exp(\mp\theta_\text{s}\times s_\text{wR})}$
A simple script to calculate the null hypotheses:
```r
nulls <- function(CVwR, risk = FALSE) { # null hypotheses in RSABE
  theta.s <- log(1.25) / 0.25           # regulatory constant
  swR     <- sqrt(log(CVwR^2 + 1))      # within-subject standard deviation of R
  if (risk) {                           # ‘desired consumer risk model’
    if (swR <= 0.25) {
      thetas <- c(0.8, 1.25)
    } else {
      thetas <- exp(c(-1, +1) * theta.s * swR)
    }
  } else {                              # ‘implied limits’
    if (CVwR <= 0.3) {
      thetas <- c(0.8, 1.25)
    } else {
      thetas <- exp(c(-1, +1) * theta.s * swR)
    }
  }
  names(thetas) <- c("H0.1", "H0.2")
  return(thetas)
}
```
Examples (CV_wR = 0.27, s_wR ≈ 0.2652645…):
```
nulls(CVwR = 0.27, risk = TRUE)
     H0.1      H0.2
0.7891741 1.2671474
nulls(CVwR = 0.27, risk = FALSE)
H0.1 H0.2
0.80 1.25
```
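Putting the two definitions side by side shows where they disagree: only for CV_wR between the value at which s_wR = 0.25 and 30%. A base-R sketch over a hypothetical grid of CVs; `swR2cv()` is a hypothetical helper (the inverse of the conversion used in `nulls()`). The grid is built from integers so that 0.30 is represented exactly and the `CV > 0.30` comparison is not fooled by rounding.

```r
# Sketch: for which CV_wR do the two sets of null hypotheses differ?
theta.s <- log(1.25) / 0.25                   # regulatory constant
swR2cv  <- function(swR) sqrt(exp(swR^2) - 1) # hypothetical helper: s_wR -> CV
CV.025  <- swR2cv(0.25)                       # CV_wR where s_wR = 0.25

CV  <- (20:35) / 100                          # grid of CV_wR (exact at 0.30)
swR <- sqrt(log(CV^2 + 1))
upper.risk    <- ifelse(swR > 0.25, exp(theta.s * swR), 1.25) # ‘desired consumer risk model’
upper.implied <- ifelse(CV > 0.30,  exp(theta.s * swR), 1.25) # ‘implied limits’
differ <- CV[upper.risk != upper.implied]

cat(sprintf("s_wR = 0.25 at CV_wR = %.4f\n", CV.025))
print(differ)
```

Hence the two models disagree only for roughly 25.4% < CV_wR ≤ 30%, which is exactly where the example CV_wR = 0.27 lies.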

```r
RSABE.TSD <- function(adj = 0.0294, design = "2x2x4", n1, CVwR, GMR = 0.9,
                      target = 0.8, usePE = FALSE, nmax = Inf, final = TRUE,
                      CVwR.2, GMR.2 = 0.9, risk = FALSE, details = TRUE) {
  # adj    : adjusted alpha (stage 1 and final analysis) like Potvin ‘Method B’
  # design : "2x2x4": 2-sequence 4-period full replicate
  #          "2x2x3": 2-sequence 3-period full replicate
  #          "2x3x3": 3-sequence 3-period partial replicate
  # n1     : stage 1 sample size
  # CVwR   : within-subject CV of R in stage 1
  # GMR    : T/R-ratio in stage 1
  # usePE  : FALSE: use the fixed GMR, TRUE: use the observed GMR
  # nmax   : futility on total sample size
  # final  : TRUE : final analysis (requires CVwR.2 and GMR.2)
  #          FALSE: interim analysis only
  # risk   : FALSE: TIE acc. to all publications based on the ‘implied limits’
  #          TRUE : additionally TIE acc. to the ‘desired consumer risk model’
  # details: TRUE : output to the console
  #          FALSE: data.frame of results
  if (!design %in% c("2x2x4", "2x2x3", "2x3x3"))
    stop("design ", design, " not supported.")
  if (missing(n1)) stop("n1 must be given.")
  if (missing(CVwR)) stop("CVwR must be given.")
  if (GMR <= 0.8 || GMR >= 1.25) stop("GMR must be within 0.8 – 1.25.")
  if (nmax <= n1) stop("nmax <= n1 does not make sense.")
  if (target <= 0.5 || target >= 1) stop("target ", target, " does not make sense.")
  if (final) {
    if (missing(CVwR.2)) stop("CVwR.2 must be given.")
    if (missing(GMR.2)) stop("GMR.2 must be given.")
  }
  if (missing(CVwR.2)) CVwR.2 <- NA # not needed in an interim analysis only
  suppressMessages(require(PowerTOST)) # ≥1.5-4 (2022-02-21)
  limits <- function(CVwR, risk = FALSE) {
    thetas <- scABEL(CV = CVwR, regulator = "FDA") # implied limits
    if (risk) { # ‘desired consumer risk model’
      swR <- CV2se(CVwR)
      if (swR > 0.25) {
        thetas <- setNames(exp(c(-1, +1) * log(1.25) / 0.25 * swR), c("lower", "upper"))
      } else {
        thetas <- setNames(c(0.8, 1.25), c("lower", "upper"))
      }
    }
    return(thetas)
  }
  power <- function(alpha = 0.05, CVwR, GMR, n, design) {
    return(power.RSABE(alpha = alpha, CV = CVwR, theta0 = GMR, n = n, design = design))
  }
  TIE <- function(alpha = 0.05, CVwR, n, design, risk) { # simulated at the upper limit
    return(power.RSABE(alpha = alpha, CV = CVwR, n = n,
                       theta0 = limits(CVwR, risk)[["upper"]],
                       design = design, nsims = 1e6))
  }
  TIE.1.1 <- TIE.1.2 <- TIE.2.1 <- TIE.2.2 <- NA
  N <- pwr.2 <- req <- NA # defined on every path for the data.frame
  pwr.1 <- power(adj, CVwR, GMR, n = n1, design)
  # significance limit of the empirical TIE (nominal alpha 0.05, 1e6 simulations)
  sig <- binom.test(0.05 * 1e6, 1e6, alternative = "less", conf.level = 0.95)$conf.int[2]
  txt <- paste("adjusted alpha :", sprintf("%.4f", adj))
  if (adj == 0.0294) {
    txt <- paste(txt, "(Pocock’s for superiority)")
  } else if (adj == 0.0304) {
    txt <- paste(txt, "(Pocock’s for equivalence)")
  } else if (adj == 0.0250) {
    txt <- paste(txt, "(Bonferroni’s for two tests)")
  } else {
    txt <- paste(txt, "(custom)")
  }
  txt <- paste(txt, "\ndesign :", design, "\nn1 :", sprintf("%3.0f", n1))
  if (nmax < Inf) {
    txt <- paste(txt, "\nfutility on N :", sprintf("%3.0f", nmax))
  } else {
    txt <- paste(txt, "\nfutility on N : none")
  }
  txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR), "\ntheta1 :",
               sprintf("%.4f (lower implied limit)", limits(CVwR)[["lower"]]))
  if (risk) txt <- paste(txt, "\n ", sprintf("%.4f (lower limit of the ‘desired consumer risk model’)",
                                             limits(CVwR, risk)[["lower"]]))
  txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR)[["upper"]]))
  if (risk) txt <- paste(txt, "\n ", sprintf("%.4f (upper limit of the ‘desired consumer risk model’)",
                                             limits(CVwR, risk)[["upper"]]))
  txt <- paste(txt, "\npower :", sprintf("%.4f (estimated)", pwr.1))
  if (pwr.1 >= target) { # stop in the interim
    TIE.1.1 <- TIE(adj, CVwR, n1, design, risk = FALSE)
    txt <- paste0(txt, "\nStudy stopped in stage 1 (sufficient power)",
                  "\nempirical TIE :", sprintf(" %.4f", TIE.1.1), " (all publications",
                  if (TIE.1.1 > sig) "; significantly inflated)" else ")")
    if (risk) {
      TIE.1.2 <- TIE(adj, CVwR, n1, design, risk = TRUE)
      txt <- paste(txt, "\n ", sprintf("%.4f", TIE.1.2), "(‘desired consumer risk model’")
      txt <- paste0(txt, if (TIE.1.2 > sig) "; significantly inflated)" else ")")
    }
    if (TIE.1.1 > sig) {
      req <- scABEL.ad(alpha.pre = adj, theta0 = GMR, CV = CVwR, design = design,
                       regulator = "FDA", n = n1, print = FALSE, details = FALSE)[["alpha.adj"]]
      txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req),
                   "(or less)\nwould be needed to control the Type I Error.")
    }
  } else { # initiate stage 2
    N <- sampleN.RSABE(alpha = adj, CV = CVwR, theta0 = GMR, targetpower = target,
                       design = design, print = FALSE, details = FALSE)[["Sample size"]]
    if (N > nmax) {
      txt <- paste(txt, "\nStage 2 not initiated", "(insufficient power in stage 1\nbut total sample size",
                   N, "above futility limit)")
    } else {
      if (final) {
        pwr.2 <- power(adj, CVwR.2, GMR.2, n = N, design)
        final.est <- !(GMR.2 <= 0.8 || GMR.2 >= 1.25)
        if (final.est) {
          TIE.2.1 <- TIE(adj, CVwR.2, N, design, risk = FALSE)
          if (risk) TIE.2.2 <- TIE(adj, CVwR.2, N, design, risk = TRUE)
        }
      } else {
        CVwR.2 <- GMR.2 <- pwr.2 <- TIE.2.1 <- TIE.2.2 <- req <- NA
      }
      txt <- paste(txt, "\nStage 2 initiated (insufficient power in stage 1)", "\nGMR :",
                   sprintf("%.4f", GMR), if (usePE) "(observed)" else "(fixed)")
      txt <- paste(txt, "\ntarget power :", sprintf("%.4f (fixed)", target),
                   "\nn2 :", sprintf("%3.0f", N - n1), "\nN :", sprintf("%3.0f", N))
      if (N < 24) txt <- paste0(txt, "; less than the FDA’s minimum of 24 subjects!")
      if (final) {
        txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR.2),
                     "\nGMR :", sprintf("%.4f (observed)", GMR.2), "\ntheta1 :",
                     sprintf("%.4f (lower implied limit)", limits(CVwR.2)[["lower"]]))
        if (risk) txt <- paste(txt, "\n ", sprintf("%.4f (lower limit of the ‘desired consumer risk model’)",
                                                   limits(CVwR.2, risk)[["lower"]]))
        txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR.2)[["upper"]]))
        if (risk) txt <- paste(txt, "\n ", sprintf("%.4f (upper limit of the ‘desired consumer risk model’)",
                                                   limits(CVwR.2, risk)[["upper"]]))
        txt <- paste(txt, "\npower :", sprintf("%.4f (estimated,", pwr.2),
                     if (pwr.2 < 0.5) "fails RSABE)" else "may pass RSABE)")
        if (final.est) {
          txt <- paste0(txt, "\nempirical TIE :", sprintf(" %.4f", TIE.2.1), " (all publications",
                        if (TIE.2.1 > sig) "; significantly inflated)" else ")")
          if (risk) {
            txt <- paste(txt, "\n ", sprintf("%.4f", TIE.2.2), "(‘desired consumer risk model’")
            txt <- paste0(txt, if (TIE.2.2 > sig) "; significantly inflated)" else ")")
          }
          if (TIE.2.1 > sig) {
            req <- scABEL.ad(alpha.pre = adj, theta0 = GMR.2, CV = CVwR.2, design = design,
                             regulator = "FDA", n = N, print = FALSE, details = FALSE)[["alpha.adj"]]
            txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req),
                         "(or less)\nwould be needed to control the Type I Error.")
          }
        }
      }
    }
  }
  if (details) { # output to the console
    cat(txt, "\n")
  } else {       # data.frame of results
    L.1.1 <- limits(CVwR, FALSE)[["lower"]] # limits in stage 1
    U.1.1 <- limits(CVwR, FALSE)[["upper"]]
    L.1.2 <- U.1.2 <- L.2.1 <- U.2.1 <- L.2.2 <- U.2.2 <- NA
    if (risk) {
      L.1.2 <- limits(CVwR, TRUE)[["lower"]]
      U.1.2 <- limits(CVwR, TRUE)[["upper"]]
    }
    if (final) { # limits in the final analysis
      L.2.1 <- limits(CVwR.2, FALSE)[["lower"]]
      U.2.1 <- limits(CVwR.2, FALSE)[["upper"]]
      if (risk) {
        L.2.2 <- limits(CVwR.2, TRUE)[["lower"]]
        U.2.2 <- limits(CVwR.2, TRUE)[["upper"]]
      }
    }
    result <- data.frame(alpha.adj = adj, design = design, n1 = n1, CVwR = CVwR,
                         GMR = GMR, usePE = usePE, nmax = nmax, risk.model = risk,
                         L.1.1 = L.1.1, U.1.1 = U.1.1, L.1.2 = L.1.2, U.1.2 = U.1.2,
                         power.1 = pwr.1, TIE.1.1 = TIE.1.1, TIE.1.2 = TIE.1.2,
                         n2 = N - n1, N = N, CVwR.2 = CVwR.2, GMR.2 = GMR.2,
                         L.2.1 = L.2.1, U.2.1 = U.2.1, L.2.2 = L.2.2, U.2.2 = U.2.2,
                         power.2 = pwr.2, TIE.2.1 = TIE.2.1, TIE.2.2 = TIE.2.2,
                         alpha.req = req)
    # drop columns containing only NAs
    result <- result[, colSums(is.na(result)) < nrow(result), drop = FALSE]
    return(result)
  }
}
```

—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
