Helmut
Vienna, Austria
2018-04-21 19:17
Posting: # 18714

 Finally: Exact TSD methods for 2×2 crossover designs [Two-Stage / GS Designs]

Dear all,

in version 0.5-1 of package Power2Stage exact methods [1,2,3] are implemented – after months of struggling (many THX to Ben). The methods are extremely flexible: arbitrary BE-limits and target power; futility criteria on the PE, its CI, and the maximum total sample size; adapting for the PE of stage 1.

I’ve heard in the past that regulatory statisticians in the EU prefer methods which strictly control the Type I Error (however, at the 3rd GBHI conference in Amsterdam last week it was clear that methods based on simulations are perfectly fine for the FDA) and that the inverse normal method with repeated confidence intervals would be the method of choice. Well roared, lion! I wasn’t aware of software which could do this job. That’s like saying “Fly to Mars but you are not allowed to use a rocket!” What else? Levitation? Witchcraft? Obtaining two p-values (like in TOST) is fairly easy, but converting them into a confidence interval (as required in all guidelines) is not trivial.
Although we showed this approach [4] a while ago, nothing was published in a peer-reviewed journal until very recently.
Now that we have a method which provably controls the TIE, I was curious how it performs in simulations (just to put it into perspective).
R-code at the end of the post (with small step sizes of CV and n1 expect runtimes of some hours; in large simulations I don’t recommend pmethod="exact" – about 10 times slower than pmethod="nct"). See the documentation of function power.tsd.in() about how to set futility criteria and make it fully adaptive. As usual, in the latter case say goodbye to power…
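
For instance, a minimal sketch of such settings (CV, n1, and the futility values below are illustrative assumptions, not recommendations; combining two futility criteria via c("CI", "Nmax") is my reading of the documentation):

library(Power2Stage)
# futility based on the 90% CI of stage 1 and on the maximum total
# sample size (illustrative values)
power.tsd.in(CV=0.25, n1=24, GMR=0.95, targetpower=0.80,
             fCrit=c("CI", "Nmax"), fClower=0.95, fCNmax=48)
# fully adaptive: use the observed PE of stage 1 in the SSR
power.tsd.in(CV=0.25, n1=24, GMR=0.95, targetpower=0.80,
             fCrit="No", usePE=TRUE)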

I explored seven scenarios with the maximum combination test:
  1. Conventional BE-limits, GMR 0.95, target power 0.80, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.21 56  0.05     0.05025     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.6 12    0.8        0.7902

  2. Conventional BE-limits, GMR 0.95, target power 0.90, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.38 48  0.05    0.050164     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.6 12    0.9       0.86941

  3. Conventional BE-limits, GMR 0.90, target power 0.80, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.32 44  0.05      0.0503     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
 0.6 12    0.8      0.78578

  4. Conventional BE-limits, GMR 0.90, target power 0.90, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.46 70  0.05    0.050246     no
    print(x$power.min, row.names=FALSE)
       CV n1 target minimum power
     0.56 12    0.9      0.86528

  5. Conventional BE-limits, GMR 0.95, target power 0.80, CV 0.10–0.30, n1 18–36, futility of 90% CI in stage 1 outside [0.9374, 1.0668], futility of Nmax 42 (similar to Xu et al. ‘Method E’)
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.22 18  0.05    0.029716     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.3 18    0.8        0.2187

  6. Narrow BE-limits, GMR 0.975, target power 0.90, CV 0.05–0.25, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.14 28  0.05    0.050164     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.2 12    0.9       0.88495

  7. HVD(P), conventional BE-limits (no clinical justification for ABEL), GMR 0.90, target power 0.80, CV 0.30–0.60, n1 60–288, no futility
    print(x$TIE.max, row.names=FALSE)
       CV  n1 alpha maximum TIE signif
     0.55 288  0.05     0.05022     no
    print(x$power.min, row.names=FALSE)
       CV  n1 target minimum power
     0.55 156    0.8        0.7996

As expected, in simulations we sometimes get slight inflations of the TIE, though they are never significantly >0.05. No news for initiates, but it may end the whinging of regulatory statisticians who have no clue about simulations. Contrary to the simulation methods, where the maximum TIE is observed at small n1 combined with high CV, here the maximum TIE can be anywhere in the area where ~50% of studies proceed to the second stage. Scenario #5 is overly conservative and lacks power for small n1 and high CV. Not a good idea. More about that later.
Plots of the first four scenarios:

[image]

When exploring the details it is also clear that the exact method keeps the desired power better than the simulation methods in extreme cases.

Power of scenario #5 (variant 1 below) and modifications:
  1. futility of 90% CI outside [0.9374, 1.0668], futility of Nmax 42: 0.2187
  2. futility of 90% CI outside [0.9500, 1.0526] (the code’s default): 0.80237
  3. futility of 90% CI outside [0.9374, 1.0668]: 0.81086
  4. futility of 90% CI outside [0.9500, 1.0526], futility of Nmax 42: 0.2168
  5. futility of Nmax 42: 0.22373
  6. futility of Nmax 64: 0.55658
  7. futility of Nmax 72: 0.66376
  8. futility of Nmax 96: 0.79136
Given that, it is clear that Nmax might be the bad boy. On the other hand, power collapses only if the chosen n1 was ‘too small’ for the assumed CV. Hence, even though we don’t have to worry about the TIE any more, simulations are still useful for exploring power.
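
To illustrate, a sketch of how two of these variants could be re-run (worst case of scenario #5: CV 0.30, n1 18; the maximum combination weights of the grid script at the end of the post are not reproduced here, so expect the pattern rather than the exact numbers):

library(Power2Stage)
# variant 1: futility of the 90% CI outside [0.9374, 1.0668] plus Nmax 42
power.tsd.in(CV=0.30, n1=18, GMR=0.95, targetpower=0.80,
             fCrit=c("CI", "Nmax"), fClower=0.9374, fCupper=1/0.9374,
             fCNmax=42)$pBE
# variant 5: futility of Nmax 42 only
power.tsd.in(CV=0.30, n1=18, GMR=0.95, targetpower=0.80,
             fCrit="Nmax", fCNmax=42)$pBE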


  1. Wassmer G, Brannath W. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Switzerland: Springer; 2016. doi:10.1007/978-3-319-32562-0.
  2. Patterson SD, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd edition 2017. ISBN 978-1-4665-8520-1.
  3. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614.
  4. König F, Wolfsegger M, Jaki T, Schütz H, Wassmer G. Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation. Vienna: 2014; 35th Annual Conference of the International Society for Clinical Biostatistics. Poster P1.2.88. doi:10.13140/RG.2.1.5190.0967.

R-code

library(PowerTOST)
library(Power2Stage)
checkMaxCombTest <- function(alpha=0.05, CV.from=0.2, CV.to=0.6,
                             CV.step=0.02, n1.from=12, n1.to=72, n1.step=2,
                             theta1=0.80, theta2=1.25, GMR=0.95, usePE=FALSE,
                             targetpower=0.80, fCrit="No", fClower, fCNmax,
                             pmethod="nct", setseed=TRUE, print=TRUE)
{
  if(packageVersion("Power2Stage") < "0.5.1") {
    txt <- paste0("Requires at least version 0.5-1 of Power2Stage!",
                  "\nPlease install/update from your preferred CRAN-mirror.\n")
    stop(txt)
  } else {
    CV     <- seq(CV.from, CV.to, CV.step)
    n1     <- seq(n1.from, n1.to, n1.step)
    grid   <- matrix(nrow=length(CV), ncol=length(n1), byrow=TRUE,
                     dimnames=list(CV, n1))
    pwr1   <- pct.2 <- pwr <- n.mean <- n.q1 <- n.med <- n.q3 <- grid
    TIE    <- costs.change <- grid
    n      <- integer(length(CV))
    cells  <- length(CV)*length(n1)
    cell   <- 0
    t.0    <- proc.time()[[3]]
    pb     <- txtProgressBar(min=0, max=1, char="\u2588", style=3)
    for (j in seq_along(CV)) {
      n[j] <- sampleN.TOST(alpha=alpha, CV=CV[j], theta0=GMR, theta1=theta1,
                           theta2=theta2, targetpower=targetpower,
                           print=FALSE, details=FALSE)[["Sample size"]]
      if (n[j] < 12) n[j] <- 12
      for (k in seq_along(n1)) {
        # median of expected total sample size as a 'best guess'
        n.tot <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
                              usePE=usePE, theta1=theta1, theta2=theta2,
                              targetpower=targetpower, fCrit=fCrit,
                              fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
                              npct=0.5)$nperc[["50%"]]
        w     <- c(n1[k], n.tot - n1[k]) / n.tot
        # force extreme weights if expected to stop in stage 1 with n1
        if (w[1] == 1) w <- w + c(-1, +1) * 1e-6
        res <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
                            usePE=usePE, theta1=theta1, theta2=theta2,
                            targetpower=targetpower, fCrit=fCrit,
                            fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
                            npct=c(0.25, 0.50, 0.75), weight=w,
                            setseed=setseed)
        pwr1[j, k]   <- res$pBE_s1
        pct.2[j, k]  <- res$pct_s2
        pwr[j, k]    <- res$pBE
        n.mean[j, k] <- res$nmean
        n.q1[j, k]   <- res$nperc[["25%"]]
        n.med[j, k]  <- res$nperc[["50%"]]
        n.q3[j, k]   <- res$nperc[["75%"]]
        TIE[j, k]    <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k],
                                     GMR=GMR, usePE=usePE, theta1=theta1,
                                     theta2=theta2, theta0=theta2,
                                     targetpower=targetpower, fCrit=fCrit,
                                     fClower=fClower, fCNmax=fCNmax,
                                     pmethod=pmethod, npct=1, weight=w,
                                     setseed=setseed)$pBE
      cell <- cell + 1
      setTxtProgressBar(pb, cell/cells)
      }
    }
    costs.change <- round(100*(n.mean-n)/n, 1)
    close(pb)
    t.1     <- proc.time()[[3]] - t.0
    sig     <- binom.test(alpha*1e6, 1e6, alternative="less")$conf.int[2]
    max.TIE <- max(TIE, na.rm=TRUE)
    pos.TIE <- which(TIE == max.TIE, arr.ind=TRUE)
    CV.TIE  <- as.numeric(rownames(pos.TIE))
    n1.TIE  <- as.integer(colnames(TIE)[as.numeric(pos.TIE[, 2])])
    min.pwr <- min(pwr, na.rm=TRUE)
    pos.pwr <- which(pwr == min.pwr, arr.ind=TRUE)
    CV.pwr  <- as.numeric(rownames(pos.pwr))
    n1.pwr  <- as.integer(colnames(pwr)[as.numeric(pos.pwr[, 2])])
    TIE.max <- data.frame(CV=CV.TIE, n1=n1.TIE, alpha,
                          TIE=rep(max.TIE, length(CV.TIE)))
    colnames(TIE.max)[4] <- "maximum TIE"
    TIE.max <- cbind(TIE.max, signif="no", stringsAsFactors=FALSE)
    TIE.max[["signif"]][TIE.max[["maximum TIE"]] > sig] <- "yes"
    power.min <- data.frame(CV=CV.pwr, n1=n1.pwr, target=targetpower,
                            pwr=rep(min.pwr, length(CV.pwr)))
    colnames(power.min)[4] <- "minimum power"
    if (print) {
      cat("\nEmpiric Type I Error\n"); print(TIE)
      cat("Maximum TIE", max.TIE, "at CV", CV.TIE, "and n1", n1.TIE,
          "\n\nEmpiric Power in Stage 1\n")
      print(round(pwr1, 4))
      cat("\n% of studies expected to proceed to Stage 2\n")
      print(pct.2)
      cat("\nEmpiric overall Power\n")
      print(round(pwr, 4))
      cat("\nMinimum Power", min.pwr, "at CV", CV.pwr, "and n1", n1.pwr,
          "\n\nAverage Total Sample Size E[N]\n")
      print(round(n.mean, 1))
      cat("\nQuartile I of Total Sample Size\n")
      print(n.q1)
      cat("\nMedian of Total Sample Size\n")
      print(n.med)
      cat("\nQuartile III of Total Sample Size\n")
      print(n.q3)
      cat("\n% rel. costs change compared to fixed-sample design\n")
      print(costs.change)
      cat("\nRuntime", signif(t.1/60, 3), "minutes\n")
    }
    res <- list(TIE=TIE, TIE.max=TIE.max, power.stage1=pwr1,
                pct.stage2=pct.2, power=pwr, power.min=power.min,
                n.mean=n.mean, n.quartile1=n.q1, n.median=n.med,
                n.quartile3=n.q3, costs.change=costs.change, runtime=t.1)
    return(res)
  }
}
#########################
# Your conditions below #
#########################
alpha       <- 0.05
CV.from     <- 0.1
CV.to       <- 0.6
CV.step     <- 0.02
n1.from     <- 12
n1.to       <- 72
n1.step     <- 2
theta1      <- 0.80
theta2      <- 1/theta1
GMR         <- 0.95
usePE       <- FALSE
targetpower <- 0.80
pmethod     <- "nct"
fCrit       <- "No"
fClower     <- 0
fCNmax      <- Inf
x <- checkMaxCombTest(alpha, CV.from, CV.to, CV.step, n1.from, n1.to, n1.step,
                      theta1, theta2, GMR, usePE, targetpower, fCrit,
                      fClower, fCNmax, pmethod)


In memory of Willi Maurer, Dr. sc. math. ETH,
who passed away on December 30, 2017.


Helmut
Vienna, Austria
2018-04-21 22:33
@ Helmut
Posting: # 18715

 Exact TSD methods: Example

Dear all,

answering my own post in order to keep the original one short.
Here is an example: we have a guesstimate of the CV (0.20), assume a GMR of 0.95, and aim at a power of 0.80. No futility criteria. Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).

library(PowerTOST)
library(Power2Stage)
CV0   <- 0.20
n0    <- sampleN.TOST(CV=CV0, details=FALSE, print=FALSE)[["Sample size"]]
n.tot <- power.tsd.in(CV=CV0, n1=n0, fCrit="No", npct=0.5)$nperc[["50%"]]
w     <- c(n0, n.tot - n0) / n.tot
if (w[1] == 1) w <- w + c(-1, +1) * 1e-6


In this method the weights have to be pre-specified, stated in the SAP, and used throughout all subsequent steps (irrespective of the re-estimated n2). In a fixed-sample design we would need 20 subjects. How to set the weights? An intuitive way is to use the median (x̃ = 20) of the total sample size based on simulations. This would give us weights of [1, 0]. Great. But weights have to be >0 and <1. Hence, I tweaked them a little to [0.999999, 0.000001]. What can we expect if we run the study with n1 20?

power.tsd.in(CV=CV0, n1=n0, fCrit="No", weight=w,
             npct=c(0.05, 0.25, 0.50, 0.75, 0.95))

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test (weights = 0.999999 1e-06)
 - alpha (s1/s2) = 0.02531 0.02531
 - critical value (s1/s2) = 1.95463 1.95463
 - with conditional error rates and conditional power
Overall target power = 0.8
Threshold in power monitoring step for futility = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion regarding PE, CI or Nmax
Minimum sample size in stage 2 = 4
BE acceptance range = 0.8 ... 1.25

CV = 0.2; n(stage 1) = 20; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').

p(BE)    = 0.84868
p(BE) s1 = 0.72513

Studies in stage 2 = 21.76%

Distribution of n(total)
- mean (range) = 23.4 (20 ... 86)
- percentiles
 5% 25% 50% 75% 95%
 20  20  20  20  42


Fine. If everything turns out as expected we would have to be unlucky to need a second stage. Power in the first stage is already 0.73 and the stage 2 sample sizes are not shocking. As is common in TSDs, the overall power is generally higher than in a fixed-sample design.
We perform the first stage and get GMR 0.91 and CV 0.25. Oops! Both are worse than assumed. Especially the GMR is painful.

n1    <- n0
GMR1  <- 0.91
CV1   <- 0.25
res   <- interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w)
res

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 1 0
 - significance levels (s1/s2) = 0.02531 0.02531
 - critical values (s1/s2) = 1.95463 1.95463
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is not used for SSR
 - with conditional error rates and conditional (estimated target) power

Interim analysis of first stage
- Derived key statistics:
  z1 = 1.57468, z2 = 3.38674,

  Repeated CI = (0.77306, 1.07120)
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 24
- Decision: Continue to stage 2 with 24 subjects


We fail to show BE (lower CL 77.31%) and should initiate the second stage with 24 subjects.
How would a ‘Type 1’ TSD perform?

Interim analysis (specified α1 0.0294)
───────────────────────────────────────────────────
94.12% CI:
77.77–106.48% (failed to demonstrate BE)
Power    : 0.5092 (approx. via non-central t)
Second stage with 14 subjects (N=34) is justified.


Pretty similar, though a lower n2 is suggested.
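
For the record, a sketch of how this ‘Type 1’ interim could be reproduced (assuming Potvin’s adjusted α 0.0294 throughout; CI.BE() and power.TOST() are from PowerTOST, sampleN2.TOST() from Power2Stage):

# 94.12% CI of stage 1 at alpha 0.0294
round(100*CI.BE(alpha=0.0294, pe=GMR1, CV=CV1, n=n1), 2) # 77.77 106.48
# interim power, approximated via the non-central t
power.TOST(alpha=0.0294, CV=CV1, theta0=0.95, n=n1, method="nct") # ~0.509
# sample size of the second stage (stages pooled in the end)
sampleN2.TOST(alpha=0.0294, CV=CV1, n1=n1, theta0=0.95) # n2 = 14 (N = 34)
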
OK, we perform the second stage and get GMR 0.93 and CV 0.21. Both are slightly better than what we got in the first stage but again worse than assumed.

n2    <- res$n2
GMR2  <- 0.93
CV2   <- 0.21
final.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1,
             GMR2=GMR2, CV2=CV2, n2=n2, weight=w)

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 1 0
 - significance levels (s1/s2) = 0.02531 0.02531
 - critical values (s1/s2) = 1.95463 1.95463
 - BE acceptance range = 0.8 ... 1.25

Final analysis of second stage
- Derived key statistics:
  z1 = 2.32999, z2 = 4.00748,

  Repeated CI = (0.82162, 1.05264)
  Median unbiased estimate = 0.8997
- Decision: BE achieved


We survived.
In a ‘Type 1’ TSD we would get:

Final analysis of pooled data (specified α2 0.0294)
═══════════════════════════════════════════════════
94.12% CI:
83.86–101.12% (BE concluded)


Pretty similar again.

If we state it in the protocol, we could also aim for higher power in the second stage if the GMR in the first one doesn’t look nice. If we switch to 0.90 we would run the second stage with 36 subjects, as sketched below.
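
A sketch of the corresponding calls (that the higher target can simply be handed over to the SSR via targetpower is my assumption; the output that follows is the final analysis):

res90 <- interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w,
                        targetpower=0.90) # suggests n2 = 36
final.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, GMR2=GMR2, CV2=CV2,
             n2=res90$n2, weight=w)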

Final analysis of second stage
- Derived key statistics:
  z1 = 2.86939, z2 = 4.94730,

  Repeated CI = (0.84220, 1.02693)
  Median unbiased estimate = 0.9053
- Decision: BE achieved


Helps. Another option would be to adjust for GMR1 by using the argument usePE=TRUE in interim.tsd.in(). For power 0.80 that would mean 40 subjects in the second stage and for 0.90 already 62…
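
Sketched, with the n2 values as stated above (usePE=TRUE switches to the fully adaptive SSR):

interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w,
               usePE=TRUE)                   # n2 = 40 (target power 0.80)
interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w,
               usePE=TRUE, targetpower=0.90) # n2 = 62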

ElMaestro
Denmark
2018-04-21 22:49
@ Helmut
Posting: # 18716

 Finally: Exact TSD methods for 2×2 crossover designs

Hi Hötzi,

thank you for this post. What is "the inverse normal method with repeated confidence intervals"?

Pass or fail!
ElMaestro
Helmut
Vienna, Austria
2018-04-21 23:41
@ ElMaestro
Posting: # 18717

 Flow chart (without details)

Hi ElMaestro,

flow chart (futility of the CI, unrestricted total sample size):

[image]

Details:

[image]

mittyri
Russia
2018-04-28 17:54
@ Helmut
Posting: # 18737

 naive questions regarding new functions in Power2Stage

Hi Helmut,

sorry for naive questions raised from my hazelnut brain

1. I'm trying to compare the old function
power.tsd(method = c("B", "C", "B0"), alpha0 = 0.05, alpha = c(0.0294, 0.0294),
          n1, GMR, CV, targetpower = 0.8, pmethod = c("nct", "exact", "shifted"),
          usePE = FALSE, Nmax = Inf, min.n2 = 0, theta0, theta1, theta2,
          npct = c(0.05, 0.5, 0.95), nsims, setseed = TRUE, details = FALSE)

with a new one
power.tsd.in(alpha, weight, max.comb.test = TRUE, n1, CV, targetpower = 0.8,
             theta0, theta1, theta2, GMR, usePE = FALSE, min.n2 = 4, max.n = Inf,
             fCpower = targetpower, fCrit = "CI", fClower, fCupper, fCNmax,
             ssr.conditional = c("error_power", "error", "no"),
             pmethod = c("nct", "exact", "shifted"), npct = c(0.05, 0.5, 0.95),
             nsims, setseed = TRUE, details = FALSE)


So the old function was nice since the user can choose the method or specify 3 alphas.
In the new one I see the comment regarding alpha
If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively.
If missing, defaults to 0.05.

What about alpha0 for method C? Is it deprecated?

2. Why did you decide to include CI futility rule by default?

3. Regarding your flowchart:
isn't it possible that we get some value lower than 4?
power.tsd.in(CV=0.13, n1=12)
<...>
p(BE)    = 0.91149
p(BE) s1 = 0.83803
Studies in stage 2 = 9.71%

Distribution of n(total)
- mean (range) = 12.5 (12 ... 42)
- percentiles
 5% 50% 95%
 12  12  16

for example, after the first stage with CV=15% and CI=[0.7991897, 1.0361745]:
sampleN2.TOST(CV=0.15, n1=12)
 Design  alpha   CV theta0 theta1 theta2 n1 Sample size Achieved power Target power
    2x2 0.0294 0.15   0.95    0.8   1.25 12           2        0.82711          0.8


4. Is it possible to update the docs attached to the library?

5. I was confused by "2stage" being 'aliased' with "tsd" and was looking for differences for some time.
Are there any reasons to duplicate these functions?

PS:
regarding 3rd point:
I tried
interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No")
TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 0.5 0.25
 - significance levels (s1/s2) = 0.02635 0.02635
 - critical values (s1/s2) = 1.93741 1.93741
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is not used for SSR
 - with conditional error rates and conditional (estimated target) power

Interim analysis of first stage
- Derived key statistics:
  z1 = 1.87734, z2 = 3.54417,
  Repeated CI = (0.79604, 1.04028)
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 4
- Decision: Continue to stage 2 with 4 subjects

oh, there's a default argument min.n2 = 4
OK, let's try to change that:
> interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No", min.n2 = 2)
Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15,  :
  min.n2 has to be at least 4.

Why couldn't I select a smaller one?

Kind regards,
Mittyri
Helmut
Vienna, Austria
2018-04-28 19:29
@ mittyri
Posting: # 18738

 Some answers

Hi Mittyri,

I’m in a hurry, so I’m answering only part of your questions (leaving the others to Detlew or Ben).

❝ 2. Why did you decide to include CI futility rule by default?


This applies only to the x.tsd.in functions (to be in accordance with the paper of Maurer et al.).

❝ 3. Regarding your flowchart:

❝ isn't it possible that we get some value lower than 4?

❝ for example and after first stage CV=15%, CI=[0.7991897 1.0361745]:

sampleN2.TOST(CV=0.15, n1=12)

❝  Design  alpha   CV theta0 theta1 theta2 n1 Sample size
❝     2x2 0.0294 0.15   0.95    0.8   1.25 12           2


sampleN2.TOST() is intended for the other methods, where at the end the stages are pooled.
In the inverse normal method the stages are evaluated separately (PE and MSE from ANOVAs of each stage). If you have fewer than 4 subjects in the second stage you will run out of steam (too few degrees of freedom). Well, 3 would work, but…
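
A quick check (a sketch; in a 2×2 crossover analyzed per stage the residual degrees of freedom are n − 2):

n2 <- 2:4
data.frame(n2 = n2, df = n2 - 2) # n2 = 2 leaves 0 df, i.e., no MSE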

❝ 5. I was confused by "2stage" being 'aliased' with "tsd" and was looking for differences for some time

❝ Are there any reasons to duplicate these functions?


Since this is a 0.x-release according to CRAN’s policy we can rename functions or even remove them without further notice. ;-) We decided to unify the function-names. In order not to break existing code we introduced the aliases. In the next release functions x.2stage.x() will be removed and only their counterparts x.tsd.x() kept.

❝ regarding 3rd point:

❝ I tried

interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No")

❝ […]


❝ - Calculated n2 = 4

❝ - Decision: Continue to stage 2 with 4 subjects

❝ oh, there's a default argument min.n2 = 4

❝ OK, let's try to change that:

interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No", min.n2 = 2)

Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15,  :

❝   min.n2 has to be at least 4.

❝ Why couldn't I select a smaller one?


See above. Doesn’t make sense with zero degrees of freedom (n2=2).

d_labes
Berlin, Germany
2018-04-29 23:11
@ mittyri
Posting: # 18741

 Some more "answers"

Dear Michael,

just my two cents.

❝ 1. I'm trying to compare the old function

❝ ...

❝ So the old function was nice since the user can choose the method or specify 3 alphas.

❝ In the new one I see the comment regarding alpha

❝ If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively.

❝ If missing, defaults to 0.05.

❝ What about alpha0 for method C? Is it deprecated?


Sorry for the confusion, but you definitely have to study the references (start with 1)) to get a clue what's going on with these new functions implementing a new method for evaluating TSDs. New in the sense that it was not implemented in Power2Stage and was not applied in the evaluation of TSDs up to now.
It's by no means a method adding to or amending the Potvin methods.
It is a new method with a different philosophy behind it.
And this method, combining the p-values of the TOST applied to the data of the two stages separately, is said to control the TIE rate at ≤0.05, regardless of what design changes are made at interim, e.g., re-estimation of the sample size in the interim analysis. And this is not demonstrated by simulations, but in theory, by proof. A feature which is demanded by EMA statisticians. Do you remember the statement "Potvin's methods are not valid / acceptable in Europe"?
Except Russia, which is at least to some extent also in Europe IIRC...

❝ 2. Why did you decide to include CI futility rule by default?


See Helmut's answer. Maurer et al. have included a CI futility rule in their paper.
And it's our habit to set defaults according to the (first) paper(s) describing a TSD evaluation method. OK, that may be sub-optimal for comparing methods, since you always have to remember the defaults and the differences between them for different functions.
But, ...
The re-calculation or verification of results comes first. And my laziness calls for defaults resembling the details of the paper(s) after which a function in Power2Stage was implemented.

❝ 3. Regarding your flowchart:

❝ isn't it possible that we get some value lower than 4?


See Helmut's answer.
Since min.n2 < 4 doesn't make sense, it is restricted to >=4, as described in the Maurer et al. paper.

❝ 4. Is it possible to update the docs attached to the library?


It's not quite clear to me what we should update. Could you please elaborate?

❝ 5. I was confused by "2stage" being 'aliased' with "tsd" and was looking for differences for some time

❝ Are there any reasons to duplicate these functions?


The real reason behind this change is laziness of mine (sic!). It saves me 3(!) keystrokes :cool:. Believe it or not ...

Don't hesitate to ask more "naive" questions. We all here, not least me, are naive with respect to this new method of evaluating TSDs.
If you feel more comfortable, ask me, Helmut, or Ben privately, i.e., write to the maintainer of Power2Stage ;-).


1) Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614.

Drop me a mail if you need that sheet of paper.

Regards,

Detlew
mittyri
Russia
2018-04-30 15:41
@ d_labes
Posting: # 18744

 clarification regarding user Power2Stage guides

Dear Detlew, Dear Helmut,

I'm very sorry for that post; it looks like I'm out of touch with the current state of TSDs...
OK, I need to review the paper since it's certainly a new standard, like Potvin's paper before it.

❝ ❝ 4. Is it possible to update the docs attached to the library?


❝ Not quite clear for me what we should update. Could you please elaborate?


I looked into Power2Stage/doc
and found that it was last updated in Jan 2016.

Kind regards,
Mittyri
d_labes
Berlin, Germany
2018-04-25 16:19
@ Helmut
Posting: # 18729

 Technicality: Weights for the inverse normal approach

Dear Helmut,

great post :clap:.

Only one remark about the weights you choose for the maximum combination test in your R code.

...

❝     for (j in seq_along(CV)) {
❝       n[j] <- sampleN.TOST(alpha=alpha, CV=CV[j], theta0=GMR, theta1=theta1,
❝                            theta2=theta2, targetpower=targetpower,
❝                            print=FALSE, details=FALSE)[["Sample size"]]
❝       if (n[j] < 12) n[j] <- 12
❝       for (k in seq_along(n1)) {
❝         # median of expected total sample size as a 'best guess'
❝         n.tot <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
❝                               usePE=usePE, theta1=theta1, theta2=theta2,
❝                               targetpower=targetpower, fCrit=fCrit,
❝                               fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
❝                               npct=0.5)$nperc[["50%"]]
❝         w     <- c(n1[k], n.tot - n1[k]) / n.tot
❝         # force extreme weights if expected to stop in stage 1 with n1
❝         if (w[1] == 1) w <- w + c(-1, +1) * 1e-6
❝     ...


Defining the weights that way is IMHO not what you intended. Or I don't understand what you intended.
It would be correct if you think in terms of the standard combination test and think further that you have to specify two weights for it. But since the two weights are connected as w and 1-w, the second one is calculated automatically within the function power.tsd.in(). You only need to define w[1] in the input argument.

The idea behind the maximum combination test now is:
If our first pair of weights w, 1-w (chosen anyhow) is not "optimal", choose a second pair of weights w*, 1-w* which is better adapted to the real situation.
If you were too optimistic in your planning of n2, i.e., have chosen n2 too low compared to what really happens in the sample size adaptation, it would be wise to define w* lower than w.
You do that, but your choice (w in w[1]=0.999999, w* in w[2]=1e-6) is too extreme, I think, and likely not your intention. The second pair of weights w*=1e-6, 1-w*=0.999999 is for a situation where the p-values from the second stage almost exclusively determine the overall outcome of the maximum combination test. The p-values from the first-stage data are down-weighted with w*=1e-6.

Hope this sermon is not too confusing.

BTW: Choosing the weights "optimally" is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand we have to predefine them to gain strict TIE control. The cat bites its own tail here.

Regards,

Detlew
Helmut
Vienna, Austria
2018-04-26 11:51
@ d_labes
Posting: # 18733

 Selection of w and w*

Dear Detlew,

❝ Defining the weights that way is IMHO not what you intended.


OK, I see!

❝ BTW: Choosing the weights "optimally" is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand we have to predefine them to gain strict TIE control. The cat bites its own tail here.


Using the median of n.tot from the sims to define the weights was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?
Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n1 ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95% percentile of n.tot is 12.
:confused:
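
A sketch of this example (fCrit="No" is my assumption; everything else at the defaults):

library(PowerTOST)
library(Power2Stage)
sampleN.TOST(CV=0.1, theta0=0.95, print=FALSE)[["Sample size"]] # 8, n1 = 12
x <- power.tsd.in(CV=0.1, n1=12, fCrit="No") # default weights c(0.5, 0.25)
x$nmean # 12
x$nperc # 5%, 50%, 95% percentiles: all 12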

d_labes
Berlin, Germany
2018-04-26 22:02
@ Helmut
Posting: # 18734

 Selection of w and w*

Dear Helmut,

❝ ...

❝ Using the median of n.tot from the sims to define the weights was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?


As I already said, dunno, really.

❝ Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n1 ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95% percentile of n.tot is 12.

:confused:


If you were pessimistic, then in the spirit of the MCT it would be wise to choose the second pair of weights with a decreased value. Or do I err here ("real" n2 lower than the pessimistic one)?
If I'm right, possible values could be:
w=0.999, w*=0.5 (or something like that)

Or we stay for that extremal case with the standard combination test?

But to state it again: For me it is a mystery how to choose the weights.
But I think it doesn't make much of a difference if we are not totally wrong with our chosen weights.
As far as I have seen so far for a small number of examples: the power is influenced only to a "minor" extent. The TIE is controlled, whatever weights we choose.

Regards,

Detlew
d_labes
Berlin, Germany
2018-05-09 15:53
@ Helmut
Posting: # 18757

 Now what? w & w* examples

Dear Helmut,

I have tried to demystify some aspects of choosing w and w* for the maximum combination test by looking into some examples:

Take nfix as sample size in stage 1 (Helmut’s proposal)
Guess:
CV=0.2, theta0=0.95 -> nfix = 20
Choose n1 = nfix = 20, i.e. w= 0.99, since w has to be <1.

Guess was too pessimistic:
e.g. true CV=0.15 -> nfix = 12
or theta0=0.975 -> nfix = 16
For both, the sample size for stage 1 exceeds the necessary total sample size of a fixed design. Thus a more realistic w* can’t be defined, or should be set to the same value as w.
This results in the standard combination test.

Guess was too optimistic:
e.g. true CV=0.25 -> nfix = 28
or theta0=0.925 -> nfix = 26
Both lead to a ‘more realistic’ w*= 0.71 or 0.77. Let's choose w* = 0.7 for simplicity.


Power & sample size of the scenarios
                                                               N
                                                       ------------------
                   CV   theta0  w      w*     power    ASN   Median  p95%
-------------------------------------------------------------------------
Guess             0.20  0.95   0.99   0.5*)   0.866    21.5    20     34
                               0.99   0.99    0.872    24.9    20     30
                               0.99   0.7     0.870    21.5    20     28
Too pessimistic   0.15  0.95   0.99   0.99    0.966    20.1    20     20
                  0.20  0.975  0.99   0.99    0.936    22.9    20     24
Too optimistic    0.25  0.95   0.99   0.7     0.842    29.1    20     64
                  0.20  0.925  0.99   0.7     0.760    22.6    20     36
-------------------------------------------------------------------------
*) w* = w/2 according to Maurer et al.
No futility criterion
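
For reference, a sketch of where these weights come from (w* = n1 divided by the fixed-design sample size under the alternative guess; the nfix/2 block below follows the same recipe with n1 = 10):

library(PowerTOST)
n1   <- sampleN.TOST(CV=0.20, theta0=0.95,  print=FALSE)[["Sample size"]] # 20
ntot <- sampleN.TOST(CV=0.25, theta0=0.95,  print=FALSE)[["Sample size"]] # 28
n1/ntot                                                                   # ~0.71
ntot <- sampleN.TOST(CV=0.20, theta0=0.925, print=FALSE)[["Sample size"]] # 26
n1/ntot                                                                   # ~0.77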



Take nfix/2 as sample size in stage 1 (Maurer et al.)
Guess:
CV=0.2, theta0=0.95 -> nfix = 20
Choose n1 = nfix/2 = 10, i.e. w= 0.5.

Guess was too pessimistic:
e.g. true CV=0.15 -> nfix = 12
or theta0=0.975 -> nfix = 16
This would lead to a ‘more realistic’ w* = 0.83 or 0.625, respectively. Let's take w* = 0.7 for simplicity.

Guess was too optimistic:
e.g. true CV=0.25 -> nfix = 28
or theta0=0.925 -> nfix = 26
Both lead to a ‘more realistic’ w*= 0.36 or 0.38. Let's take for simplicity w* = 0.4.


Power & sample size of the scenarios
                                                               N
                                                       ------------------
                   CV   theta0  w      w*     power    ASN   Median  p95%
-------------------------------------------------------------------------
Guess             0.20  0.95   0.5    0.25*)  0.838    22.7    20     46
                               0.5    0.7     0.844    22.6    18     50
                               0.5    0.4     0.841    22.5    20     48
Too pessimistic   0.15  0.95   0.5    0.7     0.881    13.0    10     24
                  0.20  0.975  0.5    0.7     0.896    21.4    18     48
Too optimistic    0.25  0.95   0.5    0.4     0.822    37.1    34     78
                  0.20  0.925  0.99   0.4     0.747    24.1    20     52
-------------------------------------------------------------------------
*) w* = w/2 according to Maurer et al.
No futility criterion


Confusion :-D:
  • Different weights w* don’t make a big difference, I think
  • Too pessimistic specifications result in higher power and lower expected sample size (!) :surprised: (at least for CVs around 0.2)
  • Too optimistic specifications may result in lower power and higher expected sample size (!)
  • Choosing the sample size for stage 1 as the sample size of a fixed design seems to have some advantages w.r.t. power and expected sample size compared to the ‘midterm’ sample size re-estimation – except for the too pessimistic settings row CV=0.15, theta0=0.95 ... of the ‘midterm’ SSR.

Regards,

Detlew
Ben
2018-06-10 22:12
@ d_labes
Posting: # 18880

 Now what? w & w* examples

Dear All,

Sorry for my rather late reply. A lot of very good comments have been made around this new function power.tsd.in. I hope there will be applications and more investigations in the future regarding this. As Detlew already mentioned: the type 1 error is controlled regardless of the scenario (we know it by theory, no simulations needed). This makes it very valuable in my opinion.

I try to comment on some points made.

❝ What about alpha0 for method C? Is it deprecated?


I hope you (mittyri) had a look into the references and found some hints on it. alpha0 just does not exist in this method. For the inverse normal method we always need (only) two adjusted alpha values, one for stage 1 and another for stage 2. The fact that the function also allows you to specify only one value is for your convenience, it will then calculate the adjusted ones internally.

❝ isn't it possible that we get some value lower than 4?


It actually is never possible to get a smaller sample size. All sample size functions used in this R package give at least 4 and thus this criterion implicitly applies to all functions within Power2Stage.

Comment on the weights:
Detlew already pointed out some important remarks. I can only highlight again that the standard combination test for the inverse normal method already uses 2 weights, but only one (w) needs to be specified because the second is just 1-w. For the maximum combination test we have 2 pairs of weights (so 4 in total), but again only the first ones of the two pairs are relevant. Those two first weights need to be specified in the argument weight.
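
In code (a sketch; CV and n1 are arbitrary):

library(Power2Stage)
# standard combination test: one pair of weights, only w specified
power.tsd.in(CV=0.2, n1=12, max.comb.test=FALSE, weight=0.5)
# maximum combination test: two pairs, only w and w* specified
power.tsd.in(CV=0.2, n1=12, max.comb.test=TRUE, weight=c(0.5, 0.25))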

❝ Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).


Sounds interesting. At first this sounds nice, but I am a bit puzzled about it. "Safety net" sounds like we have a rather good understanding about the CV but in case we observe some unforeseen value we have the possibility to add some extra subjects. However, in such a case we could just go with a fixed design and adapt the Power. In a TSD setting we typically have no good understanding about the CV... Do I miss something here? Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be. For n1 I would then rather use the lower end of this range. Comments?

More comments on the weights:
Usual practice when dealing with adaptive designs was to define not just n1 but also n2 (the theory and examples were introduced for superiority designs). One way of doing that is to calculate a fixed-design sample size and then ask yourself after what fraction we want to take a look at the data.+ This is done by Maurer et al. and they chose 50% for the interim look. So n1 equals n2. If we assume that all subjects are evaluable this would give us a weight w of 0.5 for the first pair of weights. For superiority trials it is common practice not to go below n2 (in case the second stage is performed). Thus: if we want to apply the maximum combination test, a second weight w* being greater than w does not make sense. For the BE setting it seems this is all different. Here, n2 is flexible and can also be lower than the initially planned one. In fact, the initially planned stage 2 sample size is not really formally defined (although it theoretically exists, at least in case you calculate n1 according to some fixed-design sample size). This makes the decision regarding how to define the two pairs of weights even harder. There is no unique way of defining the weights. One could for example perform some optimization procedure (with a side condition that fixes either power or sample size). Unfortunately I currently don't have an ideal solution to this either. :-|
+ Note that the intention for the interim look may be very different for superiority trials than for BE trials.


❝ Take nfix/2 as sample size in stage 1 (Maurer et al.)

❝ ...


❝ Too optimistic 0.25 0.95 0.5 0.4 0.822 37.1 34 78

I can't reproduce the numbers. For ASN I get 35 and for power 0.79468.

❝ ...

❝ Too pessimistic specifications result in higher power and lower expected sample size (!) :surprised: (at least for CVs around 0.2)

See also my comment above on the safety net. I am wondering: would we actually plan n1 such that we are (too) pessimistic? I would say: no.

❝ Too optimistic specifications may result in lower power and higher expected sample size (!)


Brings us back to: We should plan with a realistic/slightly optimistic scenario.


Best regards,
Ben.
Helmut
Vienna, Austria
2018-06-11 15:57
@ Ben
Posting: # 18883

 Now what? w & w* examples

Hi Ben,

❝ I hope there will be applications and more investigations in the future regarding this.


So do I – once we have solved the mystery of finding a “suitable” n1 and specifying “appropriate” weights.

❝ ❝ Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).


❝ Sounds interesting. At first this sounds nice, but I am a bit puzzled about it. "Safety net" sounds like we have a rather good understanding about the CV …


Not necessarily good but a “guesstimate”.

❝ … but in case we observe some unforeseen value we have the possibility to add some extra subjects.

❝ However, in such a case we could just go with a fixed design and adapt the Power.


I’m not sure what you mean here. In a fixed sample design I would rather work with the upper CL of the CV or – if not available – assume a reasonably higher CV than my original guess rather than fiddling around with power.

❝ In a TSD setting we typically have no good understanding about the CV... Do I miss something here?


(1) Yep and (2) no.

❝ Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be.


I was just quoting a regulatory statistician (don’t want to out him). Others didn’t contradict him. So likely he wasn’t alone with his point of view.

❝ For n1 I would then rather use the lower end of this range. Comments?


Very interesting. I expected that the sample size penalty (n2) will be higher if we use a low n1. Of course it all depends on which CV we observe in stage 1.

library(PowerTOST)
library(Power2Stage)
CVguess <- 0.2 # from a fixed design study with n=18
CL      <- CVCL(CV=CVguess, df=18-2, side="2-sided")
# Sample sizes of fixed design based on guesstimate CV and its CL
n1.fix  <- sampleN.TOST(CV=CVguess, print=FALSE)[["Sample size"]]
n1.75   <- floor(0.75*n1.fix) + as.integer(0.75*n1.fix) %%2 # 75%, rounded up to even
n1.lo   <- sampleN.TOST(CV=CL[["lower CL"]], print=FALSE)[["Sample size"]]
n1.hi   <- sampleN.TOST(CV=CL[["upper CL"]], print=FALSE)[["Sample size"]]
# In all variants use the guesstimate
x       <- power.tsd.in(CV=CVguess, n1=n1.fix)
ASN.fix <- x$nmean
med.fix <- x$nperc[["50%"]]
p.fix   <- as.numeric(x[c("pBE_s1", "pBE")])
pct.fix <- x$pct_s2
x       <- power.tsd.in(CV=CVguess, n1=n1.75)
ASN.75  <- x$nmean
med.75  <- x$nperc[["50%"]]
p.75    <- as.numeric(x[c("pBE_s1", "pBE")])
pct.75  <- x$pct_s2
x       <- power.tsd.in(CV=CVguess, n1=n1.lo)
ASN.lo  <- x$nmean
med.lo  <- x$nperc[["50%"]]
p.lo    <- as.numeric(x[c("pBE_s1", "pBE")])
pct.lo  <- x$pct_s2
x       <- power.tsd.in(CV=CVguess, n1=n1.hi)
ASN.hi  <- x$nmean
med.hi  <- x$nperc[["50%"]]
p.hi    <- as.numeric(x[c("pBE_s1", "pBE")])
pct.hi  <- x$pct_s2
result  <- data.frame(CV=c(rep(CVguess, 2), CL),
                      n1=c(n1.fix, n1.75, n1.lo, n1.hi),
                      CV.obs=rep(CVguess, 4),
                      ASN=c(ASN.fix, ASN.75, ASN.lo, ASN.hi),
                      median=c(med.fix, med.75, med.lo, med.hi),
                      pwr.stg1=c(p.fix[1], p.75[1], p.lo[1], p.hi[1]),
                      pwr=c(p.fix[2], p.75[2], p.lo[2], p.hi[2]),
                      pct.2=c(pct.fix, pct.75, pct.lo, pct.hi))
row.names(result) <- c("like fixed", "75% of fixed", "lower CL", "upper CL")
print(signif(result, 4))

                 CV n1 CV.obs   ASN median pwr.stg1    pwr  pct.2
like fixed   0.2000 20    0.2 21.59     20   0.7325 0.8514 18.840
75% of fixed 0.2000 16    0.2 19.90     16   0.5946 0.8371 33.780
lower CL     0.1483 12    0.2 20.66     16   0.3834 0.8261 55.630
upper CL     0.3084 42    0.2 42.00     42   0.9740 0.9740  0.006

If we base n1 on the lower end and the CV is close to the guesstimate, that’s the winner. On the other hand there is a ~56% chance of proceeding to the second stage, which is not desirable – and contradicts the concept of a “safety net”. ;-) A compromise would be 75% of the fixed sample design.
The pessimistic approach would be crazy.

❝ More comments on the weights:


Have to chew on that…

❝ Brings us back to: We should plan with a realistic/slightly optimistic scenario.


Seems so.

Ben
2018-06-12 21:14
@ Helmut
Posting: # 18892

 Now what? w & w* examples

Hi Helmut,

❝ Not necessarily good but a “guesstimate”.

Got it. Well... whatever a guesstimate is ;-)

❝ ❝ … but in case we observe some unforeseen value we have the possibility to add some extra subjects.

❝ ❝ However, in such a case we could just go with a fixed design and adapt the Power.


❝ I’m not sure what you mean here. In a fixed sample design I would rather work with the upper CL of the CV or – if not available – assume a reasonably higher CV than my original guess rather than fiddling around with power.

I try to explain. In case the argument is that a TSD approach should be performed not because of an uncertain CV per se (e.g. quite a big range observed so far) but because it is desired to safeguard against an unfavorable outcome of the CV (i.e. an extreme realization / random deviate of the CV), then: stop right there. To protect against such an outcome is exactly the definition of Power (type II error) and I would question whether a TSD is really the right tool - maybe a fixed design already suffices (with a proper Power).

❝ ❝ In a TSD setting we typically have no good understanding about the CV... Do I miss something here?

❝ (1) Yep and (2) no.

:ok:


❝ ❝ Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be.

❝ I was just quoting a regulatory statistician (don’t want to out him). Others didn’t contradict him. So likely he wasn’t alone with his point of view.

:cool:

❝ Very interesting. I expected that the sample size penalty (n2) will be higher if we use a low n1.

Me too.

❝ If we base n1 on the lower end and the CV is close to the guesstimate that’s the winner. One the other hand there is a ~56% chance of proceeding to the second stage which is not desirable – and contradicts the concept of a “safety net”. ;-) A compromise would be 75% of the fixed sample design.

❝ The pessimistic approach would be crazy.

I agree to all of it. :-D

Best regards,
Ben.
mittyri
Russia
2018-06-12 01:27
@ Ben
Posting: # 18884

 a bug in interim.tsd.in()?

Dear Ben,

Thank you for explanations!
As I mentioned above, I was not aware that the Maurer method is not just another set of alphas. After reading the manuscript I understood the concept.
I tried to simulate some data to show the safety-net approach with the inverse normal method vs a fixed design (Helmut is right, it has been very popular in Russia lately, as my colleagues said; of course the sponsors are using Potvin C :-D)

But my loop was interrupted:
interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38)
Error in tval[, 1] : incorrect number of dimensions
In addition: Warning messages:
1: In qnorm(p2) : NaNs produced
2: In min(df) : no non-missing arguments to min; returning Inf


What is going on here?
I thought the problem was with GMR1 < 0.9 and tried to add a condition to omit replicates with GMR1 < 0.9, but even then I got the same error for some replicates.

Kind regards,
Mittyri
Ben
2018-06-12 21:32
@ mittyri
Posting: # 18893

 a bug in interim.tsd.in()?

Dear mittyri,

❝ I tried to simulate some data to show the safety-net approach with the inverse normal method vs a fixed design (Helmut is right, it has been very popular in Russia lately, as my colleagues said; of course the sponsors are using Potvin C :-D)

I hope Sponsors will be using the Inverse Normal approach in near future :cool:

❝ But my loop was interrupted:

interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38)

Error in tval[, 1] : incorrect number of dimensions

❝ In addition: Warning messages:

❝ 1: In qnorm(p2) : NaNs produced

❝ 2: In min(df) : no non-missing arguments to min; returning Inf


❝ What is going on here?

Thank you for that. Indeed, a bug. The underlying cause is the fact that the power of stage 1 is > 0.8 (in your example it is 0.8226768). That means we actually fall into the futility criterion "BE at interim not achieved and power of stage 1 is > 0.8" (this criterion was carried over from the Potvin et al. decision tree). Instead of just stopping the procedure (due to futility), interim.tsd.in proceeded and still wanted to calculate n2. This is however not possible because the estimated conditional target power is only defined if the power of stage 1 is less than the overall power (argument targetpower). If you still try to calculate it, you will end up with a negative estimated conditional target power, which will then be put into the sample size routine as the input target power – which of course will fail.

I have corrected this bug on GitHub and it will be part of the next release.

General remark here: In your example we see that BE was missed only marginally. The Repeated CI is (0.79215, 0.99993). Even though the power of stage 1 is large enough so that we formally conclude futility, one could question whether it is really a good idea to stop the trial due to futility. On the other hand: If we want to have this futility criterion then we need a cut-off threshold, and at some point this cut-off will be met...

Best regards,
Ben.
d_labes
Berlin, Germany
2018-06-13 18:59
@ Ben
Posting: # 18900

 Nonbinding futility rule

Dear Ben,

❝ General remark here: In your example we see that BE has not been achieved only marginally. The Repeated CI is (0.79215, 0.99993). Even though the power of stage 1 is large enough so that we formally conclude futility, one could question whether it is really a good idea to stop the trial due to futility ...


If I see it correctly that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom)
How to obtain a sample size number for the second stage if we want to do so?

Regards,

Detlew
Helmut
Vienna, Austria
2018-06-13 21:23
@ d_labes
Posting: # 18901

 Bad weather?

Dear Detlew,

❝ If I see it correctly…


You do.

❝ … that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom)


Sounds to me like the statement of the FDA’s guidances “Contains Nonbinding Recommendations” or a similar one in the protocols of European Scientific Advices. :cool:

❝ How to obtain a sample size number for the second stage if we want to do so?


Introducing fuzzy logic to the AlGore Rhythm? Stop if it looks really terrible or continue if the weather is not that bad.*


  • Questions to the brave Irishman:
    “How are you doing?” – “Not that bad…”
    “How is the weather in Ireland?” – “It doesn’t rain all the time.”

d_labes
Berlin, Germany
2018-06-14 12:18
@ Helmut
Posting: # 18905

 NLYW?

Dear Helmut,

❝ ❝ How to obtain a sample size number for the second stage if we want to do so?


❝ Introducing fuzzy logic to the AlGore Rhythm? Stop if it looks really terrible or continue if the weather is not that bad.


Let the decision to use the futility or not to NLYW :-D?
"... there is a fundamental gender-based distinction in the functional system of this thinking apparatus: unlike male, female logic is based on fuzzy logic -- wherein each statement has got several values in such a way that if women say "No", this response doesn't mean absolute 'no-thing-ness', but implies some insensible and imperceptible features of the quite opposite response -- "Yes". The same is also true for a shift in the opposite direction of evaluation in female logic -- from "Yes" to "No". That is why it sometimes turns out to be a very difficult task to translate women's fuzzy logic to men's two-valued logic, that includes only "Yes" or "No", without a third value."
Elmar Hussein

Regards,

Detlew
Ben
2018-06-13 22:26
@ d_labes
Posting: # 18902

 Nonbinding futility rule

Dear Detlew,

❝ If I see it correctly that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom)

❝ How to obtain a sample size number for the second stage if we want to do so?


The reference at page 19 actually refers to the CI futility criterion, but nevertheless the same argument should apply to the 'power of stage 1' criterion. Well, as I said: Formula (15), i.e. the formula for the estimated conditional target power is only defined if the power of stage 1, P(R1), is less than the overall power 1 - beta. That's a key feature of the equation. Therefore, in my opinion the only possibility is: if you want to be able to handle it in a nonbinding manner, then you have to go with conditional error rates only (i.e. you cannot use the estimated conditional target power as target power for calculation of n2). So, we would need to select ssr.conditional = "error".

Best regards,
Ben.

PS: Please don't shoot the messenger ;-)
d_labes
Berlin, Germany
2018-06-14 12:47
@ Ben
Posting: # 18906

 Nonbinding futility rule

Dear Ben,

❝ The reference on page 19 actually refers to the CI futility criterion,


I know.

❝ ... in my opinion the only possibility is: if you want to be able to handle it in a nonbinding manner, then you have to go with conditional error rates only (i.e., you cannot use the estimated conditional target power as the target power for the calculation of n2). So we would need to select ssr.conditional = "error".


My first thought was: set fCpower = 1, which results in not applying the power futility criterion at all. This gives n2=16 for mittyri's example:
interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, fCpower=1)

Your suggestion
interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "error")
also gives n2=16. Astonishing or correct?

Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via
interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no")
gives n2=4. Ooops? Wow!

Helmut's caveat of how to decide in the case of a "nonbinding futility" needs to be considered scientifically, not via NLYW :-D.
IIRC the term "nonbinding" in the context of sequential designs is used for flexibility in stopping or continuing due to external reasons. Do we have such reasons here?

Binding, nonbinding - does it have an impact on the alpha control? I think not, but I am not totally sure.

Regards,

Detlew
Ben
★    

2018-06-15 19:58
(2309 d 01:30 ago)

@ d_labes
Posting: # 18908
Views: 16,394
 

 Nonbinding futility rule

Dear Detlew,

❝ My first thought was: set fCpower = 1, which results in not applying the power futility criterion at all. This gives n2=16 for mittyri's example:

interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, fCpower=1).


❝ Your suggestion

interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "error")

❝ also gives n2=16. Astonishing or correct?


This is correct. Please note that if fCpower = 1, then (as intended) the futility criterion regarding the power of stage 1 never applies. If you then encounter a scenario where the power of stage 1 is greater than targetpower (this need not happen, but it can), the conditional estimated target power will be negative. Thus, we would have a problem with this being the target power for the sample size calculation. To prevent this, the function automatically sets the target power for the recalculation to targetpower (which is equivalent to ssr.conditional = "error"). See 'Details' in the man page.
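To see where mittyri's example stands, the power of stage 1 can be checked with plain PowerTOST (a side calculation only, assuming the default planning GMR of 0.95; not necessarily the exact internal call of Power2Stage):

library(PowerTOST)
# power of stage 1 at the observed CV1 and the assumed GMR of 0.95;
# if this exceeds targetpower = 0.80, the conditional estimated target
# power would be negative and the fallback described above kicks in
power.TOST(CV = 0.2575165, theta0 = 0.95, n = 38)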

❝ Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via

interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no")

❝ gives n2=4. Ooops? Wow!


I have to think about that :confused:

❝ IIRC the term "nonbinding" in the context of sequential designs is used for flexibility in stopping or continuing due to external reasons. Do we have such reasons here?

For example?

❝ Binding, nonbinding - does it have an impact on the alpha control? I think not, but I am not totally sure.

Non-binding: Type 1 error is protected, even if the futility criterion is ignored.
Binding: Type 1 error is protected only if the futility criterion will be adhered to. ('Binding' is not common practice, authorities don't want this).

Best regards,
Ben.
d_labes
★★★

Berlin, Germany,
2018-06-16 21:42
(2307 d 23:46 ago)

@ Ben
Posting: # 18909
Views: 16,261
 

 Binding / Nonbinding futility rule - alpha control

Dear Ben,

❝ ❝ Binding, nonbinding - does it have an impact on the alpha control? I think not, but I am not totally sure.

❝ Non-binding: Type 1 error is protected, even if the futility criterion is ignored.


That was also my thought, because I didn't find any reference to a futility rule in the proof of alpha control in the paper of Maurer et al. Or do I err here?

❝ Binding: Type 1 error is protected only if the futility criterion will be adhered to. ('Binding' is not common practice, authorities don't want this).


Are you sure about the binding case?
I thought: if the TIE is controlled without adhering to any futility rule, then it is all the more controlled when a futility criterion is applied. Applying one can only turn 'BE' decisions into 'stop' decisions, so the probability of deciding BE is lowered, and with it the TIE.

Of course the power may be compromised.
Example (some sort of 'forced BE', whatever this is):
power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36)
gives pBE('empiric power')= 0.68452 (!). Increasing n1 doesn't help. Try it.
Empiric TIE (theta0=1.25) is: pBE= 0.034186.

Without the futility criterion w.r.t. the CI
power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36, fCrit="no")
you obtain pBE('empiric power')= 0.80815.
Power is raised even more if you also ignore the power futility rule:
power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36, fCrit="no", fCpower=1)
gives a pBE= 0.90658.
Empiric TIE (theta0=1.25) is: pBE= 0.050012. Nitpickers! Don't cry "alpha inflation"! The +0.000012 above 0.05 is simulation error. Try setseed=FALSE and you will get something like p(BE)= 0.049858 or, in the next run, p(BE)= 0.04982.
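For the nitpickers, the size of the simulation error is simple binomial arithmetic (assuming 1e6 simulations for the TIE runs; the standard error scales with 1/sqrt(nsims)):

# binomial standard error of a simulated rejection rate at alpha = 0.05
se <- sqrt(0.05 * 0.95 / 1e6)
se      # ~0.000218
2 * se  # ~0.000436: a deviation of +0.000012 is far inside the noise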

I think that your statement for the binding case is only valid if you make a further adaptation of the local alpha / critical values taking the futility rule into consideration. But I don't know how this could be done. The implementation in Power2Stage at any rate doesn't make such an adaptation, if I see it correctly.

Do you have any experience supporting your statement "'Binding' is not common practice, authorities don't want this"?
If yes, what reason(s) do authorities give for abandoning binding futility rule(s) or not 'liking' them?

Regards,

Detlew
Ben
★    

2019-03-30 10:52
(2021 d 09:36 ago)

@ d_labes
Posting: # 20105
Views: 13,286
 

 Binding / Nonbinding futility rule - alpha control

Dear Detlew,

Sorry, totally forgot about this post.

❝ Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via

❝ interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no")

❝ gives n2=4. Ooops? Wow!


OK, again: the recommendation here is to stop due to futility because the power of stage 1 is greater than the target power of 80%. The result of n2 = 4 is correct in this situation. The reason is that (i) we calculate n2 based on the GMR, which is 0.95, and (ii) we are not using conditional error rates, i.e., we ignore the magnitude of the p-values from stage 1.
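If in doubt, rerun the example and look at the complete interim output rather than just n2; the printed summary should also show the (nonbinding) recommendation to stop for futility:

library(Power2Stage)
# mittyri's example once more; inspect the whole returned object
res <- interim.tsd.in(GMR1 = 0.89, CV1 = 0.2575165, n1 = 38,
                      ssr.conditional = "no")
print(res)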

❝ ❝ ❝ Binding, nonbinding - does it have an impact on the alpha control? I think not, but I am not totally sure.

❝ ❝ Non-binding: Type 1 error is protected, even if the futility criterion is ignored.


❝ That was also my thought, because I didn't find any reference to a futility rule in the proof of alpha control in the paper of Maurer et al. Or do I err here?


You are correct.

❝ ❝ Binding: Type 1 error is protected only if the futility criterion will be adhered to. ('Binding' is not common practice, authorities don't want this).


❝ Are you sure about the binding case?


I believe so, yes.

❝ Of course the power may be compromised.


Agreed!

❝ I think that your statement for the binding case is only valid if you make a further adaptation of the local alpha / critical values taking the futility rule into consideration.


No, I don't think that a further adaptation needs to be made. This should be covered in, e.g., the book by Wassmer and Brannath. I will check when I have more time.

❝ Do you have any experience supporting your statement "'Binding' is not common practice, authorities don't want this"?

❝ If yes, what reason(s) do authorities give for abandoning binding futility rule(s) or not 'liking' them?


This should also be covered in the book, but I haven't checked; I learned it in a workshop. I think the problem is that people may not believe that you will always adhere to the stopping rule.

Best regards,
Ben.