## Inflated type I error with fixed widened limits [RSABE / ABEL]

Dear Detlew & Osama,

» » ... how could this type I error inflation happen?

» Maybe Helmut has a better (pedagogical) explanation because he is an exceptionally gifted preacher.

Only you say so. I’ll try.

Let’s start with two power curves (first R-script at the end). The 2-sequence 4-period replicate study is designed for ABE with conventional limits and 80% target power. Since we are wary, we assume a $$\small{\theta_0}$$ (T/R-ratio) of 0.90 and estimate a sample size of 40. One power curve is for the conventional limits as planned (blue) and one for the widened limits (red). The probability of a type I error (i.e., the consumer risk of being treated with a formulation which is not BE, that is, one with a true $$\small{\theta_0}$$ outside the limits) is ≤5%. You see that the blue power curve intersects 0.05 at $$\small{\theta_0=0.80}$$ and $$\small{\theta_0=1.25}$$. That means the chance of falsely passing BE is at most 5%. The same holds for South Africa’s approach, where we could use pre-specified widened limits independent of the observed CV. Then we have a TIE of 0.05 as well (the red power curve intersects 0.05 at $$\small{\theta_0=0.75}$$ and $$\small{\theta_0=1.\dot{3}}$$).
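A minimal numeric check of those intersections (same setup as above; evaluating `power.TOST()` at the respective upper limit gives the chance of passing there, i.e., the TIE):

```r
library(PowerTOST)
# Chance of passing at the true upper limit = TIE. With limits fixed
# a priori it is capped at the nominal alpha 0.05 (n = 40 from above).
power.TOST(CV = 0.30, n = 40, theta0 = 1.25,   theta1 = 0.80,
           design = "2x2x4") # conventional limits: ~0.05
power.TOST(CV = 0.30, n = 40, theta0 = 1/0.75, theta1 = 0.75,
           design = "2x2x4") # pre-specified widened limits: ~0.05
```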

Imagine an even more extreme case than the one Detlew mentioned. We observe in the study a CVwR of 30.01%, although the true one is 30%. That means we will use the widened limits of 75.00–133.33% instead of the correct 80.00–125.00%. With the same $$\small{\theta_0=0.80}$$ and $$\small{\theta_0=1.25}$$ we jump to the red power curve and therefore have a much higher chance of passing (~40% instead of 5%).
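To see that jump in numbers (a sketch with the values from above): evaluate the true $$\small{\theta_0=1.25}$$ against the widened limits.

```r
library(PowerTOST)
# True theta0 at the conventional upper limit, but assessed against
# the widened limits of 0.75-1.3333 (n = 40 as above).
power.TOST(CV = 0.30, n = 40, theta0 = 1.25, theta1 = 0.75,
           design = "2x2x4") # ~0.40 instead of 0.05
```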

Now for the tricky part (sorry): In the previous posts we estimated an inflation of the TIE to ~20%. Why not 40%? Say the true CVwR is 30%; according to the convention the drug is considered not highly variable. Hence, we would have to apply the conventional limits. There is a ~50% chance that in the actual study we observe a CVwR >30%, misclassify the drug/formulation as highly variable, and apply the widened limits. But there is also a ~50% chance that we observe a CVwR ≤30% and use the conventional limits. Both together give the ~20% inflation (actually it is more complicated*).
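As a back-of-the-envelope sketch of that mixture (rough, because the observed $$\small{CV_\textrm{wR}}$$ and the confidence interval of the same study are correlated):

```r
# ~50% conventional limits (pass with ~0.05) +
# ~50% widened limits (pass with up to ~0.40)
0.5 * 0.05 + 0.5 * 0.40 # ~0.23: the order of magnitude of the ~20% above
```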

What if we estimate the sample size for the GCC’s approach? With 28 subjects it will be lower (second R-script). Since with a CV of 30% we have to use the conventional limits (power at $$\small{\theta_0=0.80}$$ and $$\small{\theta_0=1.25}$$ will be ~0.16), we fall into the trap of an inflated type I error. Note that the TIE depends also on the sample size; hence, it will be smaller than with 40 subjects.
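The ~0.16 can be simulated directly (a sketch using the same user-defined regulator constants as in the scripts at the end):

```r
library(PowerTOST)
# GCC conditions: fixed widened limits of 75.00-133.33% only if the
# observed CVwR > 30%, conventional limits otherwise
reg <- reg_const("user", r_const = log(1/0.75) / CV2se(0.3),
                 CVswitch = 0.3, CVcap = 0.3, pe_const = TRUE)
# TIE: simulated chance of passing at the conventional upper limit
power.scABEL(CV = 0.30, n = 28, theta0 = 1.25, design = "2x2x4",
             regulator = reg, nsims = 1e6) # ~0.165
```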

If you think about an iteratively adjusted α like for the reference-scaling methods (third R-script), here are examples for the 2x2x4 design (sample sizes estimated for the GCC’s approach):

```
CV = 0.150, n = 12 (power = 0.8496)
TIE = 0.050148: inflated (α-adjustment necessary).
 2 iterations: adjusted α = 0.049854 (90.029% CI), TIE = 0.05 (power = 0.8493).

CV = 0.200, n = 18 (power = 0.8015)
TIE = 0.050910: inflated (α-adjustment necessary).
 7 iterations: adjusted α = 0.049082 (90.184% CI), TIE = 0.05 (power = 0.7988).

CV = 0.250, n = 28 (power = 0.8220)
TIE = 0.072558: inflated (α-adjustment necessary).
 5 iterations: adjusted α = 0.032075 (93.585% CI), TIE = 0.05 (power = 0.7644).

CV = 0.300, n = 28 (power = 0.8079)
TIE = 0.164829: inflated (α-adjustment necessary).
 8 iterations: adjusted α = 0.009071 (98.186% CI), TIE = 0.05 (power = 0.5859).

CV = 0.301, n = 28 (power = 0.8085)
TIE = 0.022390: not inflated (no α-adjustment necessary).
```

However, we could face a substantial loss in power (for a CV of 30% and the adjusted α of ~0.0091 it would drop from 81% to 59%).
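The quoted drop can be verified with `power.TOST()` (values from the results above):

```r
library(PowerTOST)
# Power of the planned study (CV 30%, n 28, theta0 0.90) at the
# nominal and at the adjusted alpha reported above.
power.TOST(alpha = 0.05,     CV = 0.30, n = 28, theta0 = 0.90,
           design = "2x2x4") # ~0.81
power.TOST(alpha = 0.009071, CV = 0.30, n = 28, theta0 = 0.90,
           design = "2x2x4") # ~0.59
```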

1. Power curves for fixed limits

```r
library(PowerTOST)
design <- "2x2x4" # any replicate - see known.designs()
CV     <- 0.30
target <- 0.80
theta0 <- 0.90
n      <- sampleN.TOST(CV = CV, theta0 = theta0, targetpower = target,
                       design = design, details = FALSE,
                       print = FALSE)[["Sample size"]]
theta0 <- c(theta0, seq(0.75*0.96, 1, length.out = 60))
theta0 <- sort(unique(c(theta0, 0.8, 1, 1.25, 1/theta0)))
res    <- data.frame(theta0 = theta0)
for (j in seq_along(theta0)) {
  res$pwr1[j] <- power.TOST(CV = CV, n = n, theta0 = theta0[j],
                            theta1 = 0.80, design = design)
  res$pwr2[j] <- power.TOST(CV = CV, n = n, theta0 = theta0[j],
                            theta1 = 0.75, design = design)
}
if (is.null(attr(dev.list(), "names")))
  windows(width = 4.5, height = 3.3, record = TRUE)
op <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0, 0) + 0.1, cex.axis = 0.8)
plot(log(theta0), res$pwr1, type = "n", axes = FALSE, yaxs = "i",
     xlim = log(c(0.75, 1/0.75)), ylim = c(0, 1.04),
     xlab = expression(theta), ylab = "power")
axis(2, las = 1)
axis(2, at = 0.05, las = 1)
axis(1, at = log(c(0.75, seq(0.8, 1.2, 0.1), 1.25, 1/0.75)),
     labels = sprintf("%.2f", c(0.75, seq(0.8, 1.2, 0.1), 1.25, 1/0.75)))
abline(h = pretty(c(0, 1)), lty = 3, col = "lightgrey")
abline(h = 0.05, lty = 2)
abline(v = log(c(0.9, 1, 1.1, 1.2)), lty = 3, col = "lightgrey")
abline(v = log(c(0.75, 0.8, 1.25, 1/0.75)), lty = 2)
box()
lines(log(theta0), res$pwr1, lwd = 3, col = "blue")
lines(log(theta0), res$pwr2, lwd = 3, col = "red")
par(op)
```

2. Power curves for sample sizes acc. to the GCC’s approach

```r
library(PowerTOST)
design <- "2x2x4" # only "2x2x4", "2x2x3", "2x3x3" implemented!
CV     <- 0.30
target <- 0.80
theta0 <- 0.90
reg    <- reg_const("user", r_const = log(1/0.75)/CV2se(0.3),
                    CVswitch = 0.3, CVcap = 0.3, pe_const = TRUE)
n      <- sampleN.scABEL(CV = CV, theta0 = theta0, targetpower = target,
                         design = design, regulator = reg,
                         details = FALSE, print = FALSE)[["Sample size"]]
theta0 <- c(theta0, seq(0.75*0.96, 1, length.out = 60))
theta0 <- sort(unique(c(theta0, 0.8, 1, 1.25, 1/theta0)))
res    <- data.frame(theta0 = theta0)
for (j in seq_along(theta0)) {
  res$pwr[j] <- power.scABEL(CV = CV, n = n, theta0 = theta0[j],
                             theta1 = 0.80, design = design,
                             regulator = reg)
}
if (is.null(attr(dev.list(), "names")))
  windows(width = 4.5, height = 3.3, record = TRUE)
op <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0, 0) + 0.1, cex.axis = 0.8)
plot(log(theta0), res$pwr, type = "n", axes = FALSE, yaxs = "i",
     xlim = log(c(0.75, 1/0.75)), ylim = c(0, 1.04),
     xlab = expression(theta), ylab = "power")
axis(2, las = 1)
axis(2, at = 0.05, las = 1)
axis(1, at = log(c(0.75, seq(0.8, 1.2, 0.1), 1.25, 1/0.75)),
     labels = sprintf("%.2f", c(0.75, seq(0.8, 1.2, 0.1), 1.25, 1/0.75)))
abline(h = pretty(c(0, 1)), lty = 3, col = "lightgrey")
abline(h = 0.05, lty = 2)
abline(v = log(c(0.9, 1, 1.1, 1.2)), lty = 3, col = "lightgrey")
abline(v = log(c(0.8, 1.25)), lty = 2)
if (CV > 0.3) abline(v = log(c(0.75, 1/0.75)), lty = 2)
box()
lines(log(theta0), res$pwr, lwd = 3, col = "blue")
par(op)
```

3. Iteratively adjusted α for the GCC’s approach

```r
library(PowerTOST)
opt <- function(x) {
  if (sdsims) {
    power.scABEL.sds(alpha = x, CV = CV, n = n, theta0 = U, design = design,
                     regulator = reg, nsims = 1e6, setseed = setseed,
                     progress = FALSE) - alpha
  } else {
    power.scABEL(alpha = x, CV = CV, n = n, theta0 = U, design = design,
                 regulator = reg, nsims = 1e6, setseed = setseed) - alpha
  }
}
design  <- "2x2x4" # only "2x2x4", "2x2x3", "2x3x3" implemented!
alpha   <- 0.05    # nominal level of the test
CV      <- 0.30    # can be a 2-element vector: CV for T, CV for R
theta0  <- 0.90    # assumed T/R-ratio
target  <- 0.80    # target (desired) power
setseed <- TRUE    # for reproducibility
sdsims  <- FALSE   # set to TRUE for partial replicate (much slower)
reg     <- reg_const("user", r_const = log(1/0.75)/CV2se(0.3),
                     CVswitch = 0.3, CVcap = 0.3, pe_const = TRUE)
# power and sample size for the GCC’s approach
x       <- sampleN.scABEL(CV = CV, theta0 = theta0, design = design,
                          targetpower = target, regulator = reg,
                          details = FALSE, print = FALSE)
power   <- x[["Achieved power"]]
n       <- x[["Sample size"]]
U       <- scABEL(CV = CV, regulator = reg)[["upper"]]
power   <- power.TOST(CV = CV, n = n, theta0 = theta0, design = design)
if (!sdsims) { # simulate underlying statistics
  TIE <- power.scABEL(CV = CV, n = n, theta0 = U, design = design,
                      regulator = reg, nsims = 1e6)
} else {       # simulate subject data
  TIE <- power.scABEL.sds(CV = CV, n = n, theta0 = U, design = design,
                          regulator = reg, nsims = 1e6, progress = FALSE)
}
txt <- paste0("CV = ", sprintf("%.3f", CV), ", n = ", n,
              " (power = ", sprintf("%.4f)", power))
if (TIE <= alpha) {
  txt <- paste0(txt, "\nTIE = ", sprintf("%.6f", TIE), ": not inflated ",
                "(no \u03B1-adjustment necessary).\n")
} else {
  txt <- paste0(txt, "\nTIE = ", sprintf("%.6f", TIE), ": inflated ",
                "(\u03B1-adjustment necessary).")
  x         <- uniroot(opt, interval = c(0, alpha), tol = 1e-8)
  alpha.adj <- x$root
  TIE.adj   <- alpha + x$f.root
  power.adj <- power.TOST(alpha = alpha.adj, CV = CV, n = n, theta0 = theta0,
                          design = design)
  txt <- paste0(txt, "\n", sprintf("%2i", x$iter), " iterations: ",
                "adjusted \u03B1 = ", sprintf("%.6f", alpha.adj), " (",
                sprintf("%.3f%%", 100*(1-2*alpha.adj)), " CI), TIE = ",
                TIE.adj, " (power = ", sprintf("%.4f)", power.adj), ".\n")
}
cat(txt)
```

* $$\small{\theta_0}$$ follows a lognormal distribution and $$\small{CV_\textrm{wR}}$$ a $$\small{\chi^2}$$ distribution. Both distributions are not symmetric but skewed to the right. Hence, at a true $$\small{\theta_0}$$ of 1.25 and a true $$\small{CV_\textrm{wR}}$$ of 30%, in a particular study the chance of falsely classifying the drug as highly variable (based on the observed $$\small{CV_\textrm{wR}}$$) and proceeding with scaling is slightly higher than 50%.

Edit: We implemented regulator = "GCC" in version 1.5-3 (2021-01-18) of PowerTOST. Example:

```r
library(PowerTOST)
design <- "2x2x4"
CV     <- 0.30
theta0 <- 0.90
target <- 0.80
n      <- sampleN.scABEL(CV = CV, theta0 = theta0, design = design,
                         targetpower = target, regulator = "GCC",
                         details = FALSE, print = FALSE)[["Sample size"]]
scABEL.ad(CV = CV, theta0 = theta0, design = design, n = n,
          regulator = "GCC", details = TRUE)
```

```
+++++++++++ scaled (widened) ABEL ++++++++++++
         iteratively adjusted alpha
   (simulations based on ANOVA evaluation)
----------------------------------------------
Study design: 2x2x4 (4 period full replicate)
log-transformed data (multiplicative model)
1,000,000 studies in each iteration simulated.

CVwR 0.3, CVwT 0.3, n(i) 14|14 (N 28)
Nominal alpha                 : 0.05
True ratio                    : 0.9000
Regulatory settings           : GCC (ABE)
Switching CVwR                : 0.3
BE limits                     : 0.8000 ... 1.2500
PE constraints                : 0.8000 ... 1.2500
Empiric TIE for alpha 0.0500  : 0.16483 (rel. change of risk: +230%)
Power for theta0 0.9000       : 0.808
Iteratively adjusted alpha    : 0.00907
Empiric TIE for adjusted alpha: 0.05000
Power for theta0 0.9000       : 0.586 (rel. impact: -27.5%)

Runtime    : 6.6 seconds
Simulations: 8,100,000 (7 iterations)
```

The impact on power is massive. Which sample size would we need to maintain the target power?

```r
sampleN.scABEL.ad(CV = CV, theta0 = theta0, design = design,
                  targetpower = target, regulator = "GCC",
                  details = TRUE)
```

```
+++++++++++ scaled (widened) ABEL ++++++++++++
            Sample size estimation
        for iteratively adjusted alpha
   (simulations based on ANOVA evaluation)
----------------------------------------------
Study design: 2x2x4 (4 period full replicate)
log-transformed data (multiplicative model)
1,000,000 studies in each iteration simulated.

Assumed CVwR 0.3, CVwT 0.3
Nominal alpha      : 0.05
True ratio         : 0.9000
Target power       : 0.8
Regulatory settings: GCC (ABE)
Switching CVwR     : 0.3
BE limits          : 0.8000 ... 1.2500
PE constraints     : 0.8000 ... 1.2500

n  28, nomin. alpha: 0.05000 (power 0.8079), TIE: 0.1648

Sample size search and iteratively adjusting alpha
n  28,   adj. alpha: 0.00907 (power 0.5859), rel. impact on power: -27.48%
n  48,   adj. alpha: 0.00343 (power 0.7237)
n  46,   adj. alpha: 0.00376 (power 0.7136)
n  48,   adj. alpha: 0.00343 (power 0.7237)
n  50,   adj. alpha: 0.00313 (power 0.7330)
n  52,   adj. alpha: 0.00283 (power 0.7402)
n  54,   adj. alpha: 0.00258 (power 0.7490)
n  56,   adj. alpha: 0.00233 (power 0.7554)
n  58,   adj. alpha: 0.00215 (power 0.7641)
n  60,   adj. alpha: 0.00198 (power 0.7703)
n  62,   adj. alpha: 0.00180 (power 0.7789)
n  64,   adj. alpha: 0.00164 (power 0.7851)
n  66,   adj. alpha: 0.00152 (power 0.7909)
n  68,   adj. alpha: 0.00138 (power 0.7958)
n  70,   adj. alpha: 0.00126 (power 0.8010), TIE: 0.05000

Compared to nominal alpha's sample size increase of 150.0% (~study costs).

Runtime    : 96.4 seconds
Simulations: 120,700,000
```

Oops! Since the TIE depends on the sample size itself (see the plots in this post), we have to adjust more.
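A quick sketch of that dependence (with `regulator = "GCC"`, available since PowerTOST 1.5-3 as noted in the edit above):

```r
library(PowerTOST)
# Simulated TIE at the nominal alpha for two sample sizes
sapply(c(28, 70), function(n)
  power.scABEL(CV = 0.30, n = n, theta0 = 1.25, design = "2x2x4",
               regulator = "GCC", nsims = 1e6)) # ~0.165 at n = 28; larger at n = 70
```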

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮 (Science Quotes)