NK (India), 2023-12-27 13:15
Posting: # 23813

Go/No-go Decision for Pivotal Study [Power / Sample Size]

Dear All,

Greetings!

How can we simulate the outcome of a pivotal BE study from pilot study data?

For example, if pilot study data are available for 14 subjects (e.g., a T/R ratio of 84% and a 90% CI of 79–89%), I would like to know what the results would be if we perform the pivotal study with the same test formulation in a larger sample size (based on the intra-subject CV), e.g., in 36 or 48 subjects.

This will help us to take a go/no-go decision for the pivotal BE study.

Thanks
NK


Edit: Category changed; see also this post #1. [Helmut]
Helmut (Vienna, Austria), 2023-12-27 14:27
@ NK, Posting: # 23814

Problematic T/R-ratio…

Hi NK,

❝ For example, if pilot study data are available for 14 subjects (e.g., a T/R ratio of 84% and a 90% CI of 79–89%), I would like to know what the results would be if we perform the pivotal study with the same test formulation in a larger sample size (based on the intra-subject CV), e.g., in 36 or 48 subjects.

❝ This will help us to take a go/no-go decision for the pivotal BE study.


If you believe (‼) that the CV and T/R-ratio will be exactly realized in the pivotal study, use the ‘carved in stone approach’ (for details see this article). Easy in the R package PowerTOST:

library(PowerTOST)
m       <- 14          # sample size of the pilot study
GMR     <- 0.84        # observed T/R-ratio
lower   <- 0.79        # lower 90% CL
upper   <- 0.89        # upper 90% CL
tgt     <- c(0.8, 0.9) # target (desired) powers of the pivotal study
design  <- "2x2"       # guess
CV      <- signif(CI2CV(lower = lower, upper = upper, n = m), 3)
up2even <- function(x) 2 * (x %/% 2 + as.logical(x %% 2))
stoned1 <- sampleN.TOST(CV = CV, theta0 = GMR, design = design, targetpower = tgt[1],
                        print = FALSE)[["Sample size"]]
stoned2 <- sampleN.TOST(CV = CV, theta0 = GMR, design = design, targetpower = tgt[2],
                        print = FALSE)[["Sample size"]]
n       <- seq(up2even(stoned1 * 0.80), up2even(stoned2 * 1.09), 2)
res     <- data.frame(n = n, power = NA_real_, t1 = tgt[1], a1 = "", t2 = tgt[2], a2 = "")
for (j in seq_along(n)) {
  res$power[j] <- signif(power.TOST(CV = CV, theta0 = GMR,
                         design = design, n = res$n[j]), 4)
  if (n[j] == up2even(stoned1 * 0.80)) res$a1[j] <- "optimistic"
  if (n[j] == stoned1)                 res$a1[j] <- "carved in stone"
  if (n[j] == up2even(stoned1 * 1.09)) res$a1[j] <- "pessimistic"
  if (n[j] == up2even(stoned2 * 0.80)) res$a2[j] <- "optimistic"
  if (n[j] == stoned2)                 res$a2[j] <- "carved in stone"
  if (n[j] == up2even(stoned2 * 1.09)) res$a2[j] <- "pessimistic"
}
names(res)[3:6] <- rep(c("target", "approach"), 2)
txt     <- sprintf("Results for target powers of %.0f and %.0f%%:\n",
                   100 * tgt[1], 100 * tgt[2])
target  <- 0.8 # for the following scripts
cat(txt); print(res, row.names = FALSE, right = FALSE)
Results for target powers of 80 and 90%:
 n  power  target approach        target approach
 36 0.7419 0.8    optimistic      0.9
 38 0.7626 0.8                    0.9
 40 0.7818 0.8                    0.9
 42 0.7997 0.8                    0.9
 44 0.8162 0.8    carved in stone 0.9
 46 0.8315 0.8                    0.9
 48 0.8457 0.8    pessimistic     0.9    optimistic
 50 0.8587 0.8                    0.9
 52 0.8708 0.8                    0.9
 54 0.8819 0.8                    0.9
 56 0.8921 0.8                    0.9
 58 0.9015 0.8                    0.9    carved in stone
 60 0.9102 0.8                    0.9
 62 0.9181 0.8                    0.9
 64 0.9254 0.8                    0.9    pessimistic

Assuming a CV of 8.86% and a T/R-ratio of 0.84 you achieve at least 80% power with 44 subjects and at least 90% with 58. You could also perform bootstrapping (some ideas in this post and the following ones), though I’m not convinced that it is useful.
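If you want to see what such a simulation could look like, here is a minimal parametric sketch, assuming a planned pivotal sample size of 44 and taking the CV and GMR from above as fixed (it does not resample subject-level pilot data): it draws many hypothetical pivotal 2×2 studies and counts how often the 90% CI lies entirely within 80.00–125.00%. The number of simulations and the seed are arbitrary; the empirical passing rate should agree with the analytical power.

set.seed(123456)
nsims <- 1e5                # number of simulated pivotal studies
n.piv <- 44                 # assumed pivotal sample size
sdw   <- CV2se(CV)          # within-subject SD of log-differences (PowerTOST helper)
dfn   <- n.piv - 2          # degrees of freedom of the 2x2 crossover
pe    <- rnorm(nsims, mean = log(GMR), sd = sdw * sqrt(2 / n.piv)) # simulated log-PEs
s     <- sdw * sqrt(rchisq(nsims, df = dfn) / dfn)                 # simulated residual SDs
hw    <- qt(0.95, dfn) * s * sqrt(2 / n.piv)                       # half-widths of the 90% CIs
pass  <- exp(pe - hw) >= 0.80 & exp(pe + hw) <= 1.25
cat(sprintf("Empirical passing rate with n = %.0f: %.4f (analytical power: %.4f)\n",
            n.piv, mean(pass),
            power.TOST(CV = CV, theta0 = GMR, design = design, n = n.piv)))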

However, both the CV and the T/R-ratio are estimates, i.e., they are uncertain (the degree of uncertainty depends on the sample size of the pilot study). Power – and hence, the sample size – is less sensitive to the CV than to the T/R-ratio. The latter is a killer, especially in your case, where the estimate is so close to the lower BE limit:

f      <- function(x, obj) power.TOST(theta0 = x, CV = CV, design = design, n = n) - obj
stoned <- sampleN.TOST(CV = CV, theta0 = GMR, design = design, targetpower = target, print = FALSE)
n      <- stoned[["Sample size"]]
pwr    <- 100 * stoned[["Achieved power"]]
obj    <- c(50, 70)
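# find the GMRs at which power drops to 50% and to 70% (roots of power minus target)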
GMRmin <- uniroot(f, obj = obj[1] / 100, interval = c(0.8, 1), tol = 1e-12)$root
GMR0.7 <- uniroot(f, obj = obj[2] / 100, interval = c(0.8, 1), tol = 1e-12)$root
GMRs   <- sort(unique(c(GMRmin, GMR0.7, GMR, seq(0.8, 0.9, length.out = 201))))
power  <- numeric(length(GMRs))
for (j in seq_along(GMRs)) {
  power[j] <- 100 * power.TOST(CV = CV, theta0 = GMRs[j], design = design, n = n)
}
clr    <- c("red", "blue", "darkgreen")
plot(GMRs, power, type = "n", ylim = c(0, 100), xlab = "GMR", axes = FALSE,
     xaxs = "i", yaxs = "i", font.main = 1,
     main = sprintf("%s design, CV = %.3g%%: n = %.0f", design, 100 * CV, n))
x.axis <- seq(0.8, 0.9, 0.025)
y.axis <- 100 * c(0.05, 0.5, 0.7, seq(0.2, 1, 0.2))
abline(v = x.axis, h = y.axis, col = "lightgrey", lty = 3)
lines(x = c(rep(GMRmin, 2), 0), y = c(0, rep(obj[1], 2)), lwd = 2, lty = 3, col = clr[1])
lines(GMRs[GMRs <= GMRmin], power[GMRs <= GMRmin], col = clr[1], lwd = 3)
mtext(1, line = 2.1, at = GMRmin, text = sprintf("%.4g", GMRmin), cex = 0.75, col = clr[1])
lines(x = c(rep(GMR0.7, 2), 0), y = c(0, rep(obj[2], 2)), lwd = 2, lty = 2, col = clr[2])
lines(GMRs[GMRs >= GMRmin & power <= pwr], power[GMRs >= GMRmin & power <= pwr],
      col = clr[2], lwd = 3)
mtext(1, line = 2.1, at = GMR0.7, text = sprintf("%.4g", GMR0.7), cex = 0.75, col = clr[2])
lines(x = c(rep(GMR, 2), 0), y = c(0, rep(pwr, 2)), lwd = 2, col = clr[3])
lines(GMRs[power >= pwr], power[power >= pwr], col = clr[3], lwd = 3)
mtext(1, line = 2.1, at = GMR, text = sprintf("%.4g", GMR), cex = 0.75, col = clr[3])
axis(1, at = x.axis, labels = sprintf("%.3f", x.axis))
axis(1, at = c(GMRmin, GMR0.7, GMR), labels = FALSE)
axis(1, at = seq(0.8, 0.9, 0.005), labels = FALSE, tcl = -0.25)
axis(2, at = y.axis, labels = sprintf("%.0f%%", y.axis), las = 1)
axis(2, at = c(5, seq(10, 90, 10)), labels = FALSE, tcl = -0.25)
box()
cat("With", n, "subjects and", sprintf("GMR = %.4g", GMR0.7), "power will be",
    "only 70%;", sprintf("any GMR < %.4g", GMRmin), "will fail BE.\n")

[Figure: power of the 2×2 design (CV = 8.86%, n = 44) as a function of the GMR; the GMRs yielding 50% and 70% power are marked.]
With 44 subjects and GMR = 0.834 power will be only 70%; any GMR < 0.8256 will fail BE.

That’s why the ‘carved in stone approach’ is not a particularly good idea.

Let’s explore some combinations of CVs and T/R-ratios:

sampleN.TOST.vec <- function(CVs, GMRs, design, target) {
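  # sample sizes for all combinations of assumed CVs and GMRs at the given target power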
  n <- matrix(ncol = length(CVs), nrow = length(GMRs))
  for (j in seq_along(GMRs)) {
    for (k in seq_along(CVs)) {
      n[j, k] <- sampleN.TOST(CV = CVs[k], theta0 = GMRs[j], design = design, targetpower = target,
                              print = FALSE)[["Sample size"]]
    }
  }
  dec         <- function(x) match(TRUE, round(x, 1:15) == x)
  fmt.col     <- paste0("CV=%.",  max(sapply(100 * CVs,  dec), na.rm = TRUE), "f%%")
  fmt.row     <- paste0("GMR=%.", max(sapply(GMRs, dec), na.rm = TRUE), "f")
  colnames(n) <- sprintf(fmt.col, 100 * CVs)
  rownames(n) <- sprintf(fmt.row, GMRs)
  return(as.data.frame(n))
}
CVs  <- sort(unique(c(CV, seq(0.08, 0.1, 0.005))))
GMRs <- seq(0.82, 0.86, 0.01)
res  <- sampleN.TOST.vec(CVs, GMRs, design, target)
cat("Sample sizes to achieve at least", sprintf("%2g%% power:", 100 * target), "\n"); print(res)
Sample sizes to achieve at least 80% power:
         CV=8.00% CV=8.50% CV=8.86% CV=9.00% CV=9.50% CV=10.00%
GMR=0.82      132      148      160      166      184       204
GMR=0.83       60       68       74       76       84        94
GMR=0.84       36       40       44       44       50        54
GMR=0.85       24       26       28       30       32        36
GMR=0.86       18       20       20       22       24        26

If you assume an only slightly ‘worse’ T/R-ratio of 0.82, you would already need 160 subjects to achieve ≥80% power. For details see also the article about prospective power estimation.
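The flip side of the same sensitivity, as a quick illustrative check assuming the ‘carved in stone’ sample size of 44 is kept fixed: how power erodes if the pivotal T/R-ratio comes out slightly worse than in the pilot.

sapply(c(0.84, 0.83, 0.82), function(x)   # assumed pivotal T/R-ratios
       signif(power.TOST(CV = CV, theta0 = x, design = design, n = 44), 4))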

Bayesian methods based on expected power, which take the uncertainty of the estimates obtained in the pilot study into account, are implemented in PowerTOST:
  • Uncertain CV
    res1 <- expsampleN.TOST(CV = CV, theta0 = GMR, targetpower = target, design = design,
                            prior.parm = list(m = m, design = design), prior.type = "CV",
                            details = FALSE, print = FALSE)
    cat("Sample size estimation based on uncertain CV:",
        sprintf("\nExpected power of %.4f with %.0f subjects.\n",
                res1[["Achieved power"]], res1[["Sample size"]]))
    Sample size estimation based on uncertain CV:
    Expected power of 0.8016 with 48 subjects.

    9% more subjects than in the ‘carved in stone approach’. However, the CV is not the main problem.

  • Uncertain T/R-ratio
    res2 <- expsampleN.TOST(CV = CV, theta0 = GMR, targetpower = target, design = design,
                            prior.parm = list(m = m, design = design), prior.type = "theta0",
                            details = FALSE, print = FALSE)
    cat("Sample size estimation based on uncertain T/R-ratio:",
        sprintf("\nExpected power of %.4f with %.0f subjects.\n",
                res2[["Achieved power"]], res2[["Sample size"]]))
    Sample size estimation based on uncertain T/R-ratio:
    Expected power of 0.8013 with 120 subjects.

    That hurts! If you propose that to your boss, you will likely get fired.

  • Uncertain CV and T/R-ratio
    res3 <- expsampleN.TOST(CV = CV, theta0 = GMR, targetpower = target, design = design,
                            prior.parm = list(m = m, design = design), prior.type = "both",
                            details = FALSE, print = FALSE)
    cat("Sample size estimation based on uncertain CV and T/R-ratio:",
        sprintf("\nExpected power of %.4f with %.0f subjects.\n",
                res3[["Achieved power"]], res3[["Sample size"]]))
    Sample size estimation based on uncertain CV and T/R-ratio:
    Expected power of 0.8005 with 146 subjects.

    Ouch! Even with such an extreme sample size there is still a 20% chance of failure. If you want 90% power, you would need thousands (‼) of subjects…
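As a cross-check, using the same priors as above (a sketch, not a new recommendation): the expected power of the ‘carved in stone’ sample size of 44 when both estimates are treated as uncertain shows directly why so many more subjects are needed.

exppower.TOST(CV = CV, theta0 = GMR, n = 44, design = design,
              prior.type = "both", prior.parm = list(m = m, design = design))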
See also this presentation (BioBridges, Prague, September 2017). An alternative would be a fully adaptive two-stage design with certain futility rules (5th GBHI, Amsterdam, September 2022).

Helmut (Vienna, Austria), 2024-01-01 14:50
@ Helmut, Posting: # 23816

TSD?

Hi NK and all,

Let’s try a fully adaptive two-stage design. As usual, an R script is given at the end.

I used the results of your pilot study and, given the ‘bad’ T/R-ratio, started with 24 subjects. For the sample-size re-estimation I used the same T/R-ratio and targeted 80% power. I applied two futility criteria: a PE outside the acceptance range and a maximum total sample size of 60. In the hypothetical second stage I assumed the same CV as in the first.

Maurer(n1 = 24, CV1 = 0.0886, GMR1 = 0.84)
TSD with 2x2 crossover
Inverse Normal approach
 - Maximum combination test with weights for stage 1 = 0.5 0.25
 - Significance levels (s1/s2) = 0.02635 0.02635
 - Critical values (s1/s2) = 1.93741 1.93741
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is used for SSR
 - With conditional error rates and conditional estimated target power

Interim analysis after first stage
- Derived key statistics:
  z1 = 1.81793, z2 = 7.32955
  Repeated CI = (0.79722, 0.88508)
  Median unbiased estimate = NA
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 12
- Decision: Continue to stage 2 with 12 subjects

Results of the final analysis with adjusted alpha = 0.02635. CV
identical in both stages and various GMRs in the second stage.
 GMR1 n1  GMR2 n2     PE  lower  upper  pass
 0.84 24 0.820 12 0.8298 0.7965 0.8686 FALSE
 0.84 24 0.825 12 0.8319 0.7986 0.8706 FALSE
 0.84 24 0.830 12 0.8340 0.8006 0.8727  TRUE
 0.84 24 0.835 12 0.8360 0.8027 0.8748  TRUE
 0.84 24 0.840 12 0.8381 0.8046 0.8769  TRUE
 0.84 24 0.845 12 0.8406 0.8066 0.8791  TRUE
 0.84 24 0.850 12 0.8433 0.8085 0.8812  TRUE
 0.84 24 0.855 12 0.8460 0.8104 0.8834  TRUE
 0.84 24 0.860 12 0.8487 0.8122 0.8856  TRUE

Even with a T/R-ratio of 0.83 in the second stage (worse than the one observed in the first) you will pass due to the combined sample size. Well, it’s a close shave.

Let’s be optimistic and hope for a T/R-ratio of 0.85:

Maurer(n1 = 24, CV1 = 0.0886, GMR1 = 0.85)
TSD with 2x2 crossover
Inverse Normal approach
 - Maximum combination test with weights for stage 1 = 0.5 0.25
 - Significance levels (s1/s2) = 0.02635 0.02635
 - Critical values (s1/s2) = 1.93741 1.93741
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is used for SSR
 - With conditional error rates and conditional estimated target power

Interim analysis after first stage
- Derived key statistics:
  z1 = 2.21598, z2 = 7.24735
  Repeated CI = (0.80671, 0.89562)
  Median unbiased estimate = 0.8500
- No futility criterion met
- Test for BE positive (not considering any futility rule)
- Decision: Stop due to BE

Mission accomplished.


Maurer <- function(n1, CV1, GMR1, target = 0.8) {
  require(Power2Stage)
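  # interim analysis of stage 1: maximum combination test, SSR based on the
  # observed PE; futility if the PE is outside the acceptance range or the
  # total sample size would exceed 60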
  st1    <- interim.tsd.in(weight = c(0.5, 0.25), n1 = n1, CV1 = CV1, GMR1 = GMR1,
                           GMR = GMR1, targetpower = target, usePE = TRUE,
                           fCrit = c("PE", "Nmax"), fCNmax = 60)
  if (st1[["stop_BE"]]) {
    print(st1)
  } else {
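    # BE not shown in stage 1: run the final analysis for a range of
    # hypothetical stage 2 GMRs, assuming the same CV as in stage 1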
    GMR2s  <- seq(0.82, 0.86, 0.005) # GMRs observed in stage 2
    fin    <- data.frame(GMR1 = GMR1, n1 = n1, GMR2 = GMR2s, n2 = st1[["n2"]],
                         PE = NA_real_, lower = NA_real_, upper = NA_real_,
                         pass = FALSE)
    for (j in seq_along(GMR2s)) {
      tmp         <- final.tsd.in(weight = c(0.5, 0.25),
                                  GMR1 = GMR1, CV1 = CV1, n1 = n1,
                                  GMR2 = GMR2s[j], CV2 = CV1, n2 = st1[["n2"]])
      fin[j, 5]   <- tmp[["MEUE"]]
      fin[j, 6:7] <- tmp[["RCI"]]
      fin[j, 8]   <- tmp[["stop_BE"]]
    }
    txt   <- paste0("\nResults of the final analysis with adjusted alpha = ",
                    signif(st1[["alpha"]][[1]], 4), ". CV\nidentical in both ",
                    "stages and various GMRs in the second stage.\n")
    print(st1); cat(txt); print(fin, digits = 4, row.names = FALSE)
  }
}


mittyri (Russia), 2024-01-01 20:54
@ Helmut, Posting: # 23817

Sample size for TSD is better than for 1SD?

Hi Helmut,

A naïve question:

Maurer(n1 = 24, CV1 = 0.0886, GMR1 = 0.84)

<...>

- Calculated n2 = 12

- Decision: Continue to stage 2 with 12 subjects


Does it mean that a TSD is more attractive due to the lower sample size? That’s significantly lower than the ‘carved in stone’ approach! I did not expect that with the corrected Type I Error.
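A minimal sketch of one way to look at this (an illustration, not part of the original exchange; the futility rules of the script above are dropped for simplicity and the true ratio is assumed to equal the one used for the re-estimation): simulate the operating characteristics of the TSD with power.tsd.in() of Power2Stage and compare the expected total sample size – not a single realisation – with the single-stage 44 subjects.

library(Power2Stage)
power.tsd.in(weight = c(0.5, 0.25), n1 = 24, CV = 0.0886,
             theta0 = 0.84, GMR = 0.84, targetpower = 0.8)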

Kind regards,
Mittyri