Bioequivalence and Bioavailability Forum

Helmut
Hero
Homepage
Vienna, Austria,
2018-04-21 17:17

Posting: # 18714
Views: 1,411
 

 Finally: Exact TSD methods for 2×2 crossover designs [Two-Stage / GS Designs]

Dear all,

in version 0.5-1 of the package Power2Stage exact methods [1–3] are implemented – after months of struggling (many THX to Ben). The methods are extremely flexible (arbitrary BE-limits and target power; futility criteria on the PE, its CI, and the maximum total sample size; adapting for the PE of stage 1).

I’ve heard in the past that regulatory statisticians in the EU prefer methods which strictly control the Type I Error (however, at the 3rd GBHI conference in Amsterdam last week it was clear that methods based on simulations are perfectly fine for the FDA) and that the inverse normal method with repeated confidence intervals would be the method of choice. Well roared, lion – I wasn’t aware of any software which can do this job. That’s like saying “Fly to Mars, but you are not allowed to use a rocket!” What else? Levitation? Witchcraft? Obtaining two p-values (like in TOST) is fairly easy, but converting them into a confidence interval (as required in all guidelines) is not trivial.
Although we showed this approach [4] a while ago, nothing was published in a peer-reviewed journal until very recently.
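For illustration only – this is not the package’s internal code – the standard inverse-normal combination of stage-wise p-values works like this (the stage-1 weight w and the one-sided p-values for one of the two TOST hypotheses are hypothetical):

```r
# Standard inverse-normal combination test (sketch with hypothetical numbers):
# the stage-wise one-sided p-values are mapped to z-values and combined
# with pre-specified weights; the combined p-value is then compared with
# the adjusted significance level.
w  <- 0.5                # pre-specified stage-1 weight (assumed)
p1 <- 0.030              # hypothetical stage-1 TOST p-value
p2 <- 0.040              # hypothetical stage-2 TOST p-value
z  <- sqrt(w) * qnorm(1 - p1) + sqrt(1 - w) * qnorm(1 - p2)
p.comb <- 1 - pnorm(z)   # combined one-sided p-value
```

Obtaining the repeated confidence interval on top of such combined tests is the non-trivial part; that is what the new functions take care of.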
Now we have a method which is proven to control the TIE; still, I was curious how it performs in simulations (just to put it into perspective).
R-code at the end of the post (with small step sizes of CV and n1 expect runtimes of some hours; in large simulations I don’t recommend pmethod="exact", which is about ten times slower than pmethod="nct"). See the documentation of the function power.tsd.in() for how to set futility criteria and make the method fully adaptive. As usual, in the latter case say goodbye to power…

I explored the following scenarios with the maximum combination test:
  1. Conventional BE-limits, GMR 0.95, target power 0.80, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.21 56  0.05     0.05025     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.6 12    0.8        0.7902

  2. Conventional BE-limits, GMR 0.95, target power 0.90, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.38 48  0.05    0.050164     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.6 12    0.9       0.86941

  3. Conventional BE-limits, GMR 0.90, target power 0.80, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.32 44  0.05      0.0503     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
 0.6 12    0.8      0.78578

  4. Conventional BE-limits, GMR 0.90, target power 0.90, CV 0.10–0.60, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.46 70  0.05    0.050246     no
    print(x$power.min, row.names=FALSE)
       CV n1 target minimum power
     0.56 12    0.9      0.86528

  5. Conventional BE-limits, GMR 0.95, target power 0.80, CV 0.10–0.30, n1 18–36, futility of 90% CI in stage 1 outside [0.9374, 1.0668], futility of Nmax 42 (similar to Xu et al. ‘Method E’)
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.22 18  0.05    0.029716     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.3 18    0.8        0.2187

  6. Narrow BE-limits, GMR 0.975, target power 0.90, CV 0.05–0.25, n1 12–72, no futility
    print(x$TIE.max, row.names=FALSE)
       CV n1 alpha maximum TIE signif
     0.14 28  0.05    0.050164     no
    print(x$power.min, row.names=FALSE)
      CV n1 target minimum power
     0.2 12    0.9       0.88495

  7. HVD(P), conventional BE-limits (no clinical justification for ABEL), GMR 0.90, target power 0.80, CV 0.30–0.60, n1 60–276, no futility
    print(x$TIE.max, row.names=FALSE)
       CV  n1 alpha maximum TIE signif
     0.55 288  0.05     0.05022     no
    print(x$power.min, row.names=FALSE)
       CV  n1 target minimum power
     0.55 156    0.8        0.7996

As expected, in simulations we sometimes get slight inflations of the TIE, though they are never significantly >0.05. No news for initiates, but it may end the whining of regulatory statisticians who have no clue about simulations. Contrary to the simulation-based methods, where the maximum TIE is observed at small n1 combined with a high CV, here the maximum TIE can be anywhere in the area where ~50% of studies proceed to the second stage. Scenario #5 is overly conservative and lacks power for small n1 and high CV. Not a good idea. More about that later.
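The ‘signif’ flag in the tables is based on the upper limit of a one-sided binomial confidence interval for 10^6 simulations; a minimal sketch of that check:

```r
# An empiric TIE from 1e6 simulations counts as significantly > 0.05
# only if it exceeds the upper limit of the one-sided 95% binomial CI
# of alpha; TIEs up to 'sig' are compatible with alpha = 0.05.
alpha <- 0.05
nsims <- 1e6
sig   <- binom.test(alpha * nsims, nsims, alternative = "less")$conf.int[2]
sig   # slightly above 0.05
```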
Plots of the first four scenarios:

[image]

When exploring the details it is also clear that the exact method maintains the desired power better than the simulation-based methods in extreme cases.

Power of scenario #5 and modifications:
  1. futility of 90% CI outside [0.9374, 1.0668], futility of Nmax 42: 0.2187
  2. futility of 90% CI outside [0.9500, 1.0526] (the code’s default): 0.80237
  3. futility of 90% CI outside [0.9374, 1.0668]: 0.81086
  4. futility of 90% CI outside [0.9500, 1.0526], futility of Nmax 42: 0.2168
  5. futility of Nmax 42: 0.22373
  6. futility of Nmax 64: 0.55658
  7. futility of Nmax 72: 0.66376
  8. futility of Nmax 96: 0.79136
Given that, it is clear that Nmax might be the bad boy. On the other hand, power collapses only if the chosen n1 was ‘too small’ for the assumed CV. Hence, even though we no longer have to worry about the TIE, simulations are still useful for exploring power.


  1. Wassmer G, Brannath W. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Switzerland: Springer; 2016. doi:10.1007/978-3-319-32562-0.
  2. Patterson SD, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd edition 2017. ISBN 978-1-4665-8520-1.
  3. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614.
  4. König F, Wolfsegger M, Jaki T, Schütz H, Wassmer G. Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation. Vienna: 2014; 35th Annual Conference of the International Society for Clinical Biostatistics. Poster P1.2.88. doi:10.13140/RG.2.1.5190.0967.

R-code

library(PowerTOST)
library(Power2Stage)
checkMaxCombTest <- function(alpha=0.05, CV.from=0.2, CV.to=0.6,
                             CV.step=0.02, n1.from=12, n1.to=72, n1.step=2,
                             theta1=0.80, theta2=1.25, GMR=0.95, usePE=FALSE,
                             targetpower=0.80, fCrit="No", fClower, fCNmax,
                             pmethod="nct", setseed=TRUE, print=TRUE)
{
  if(packageVersion("Power2Stage") < "0.5.1") {
    txt <- paste0("Requires at least version 0.5-1 of Power2Stage!",
                  "\nPlease install/update from your preferred CRAN-mirror.\n")
    stop(txt)
  } else {
    CV     <- seq(CV.from, CV.to, CV.step)
    n1     <- seq(n1.from, n1.to, n1.step)
    grid   <- matrix(nrow=length(CV), ncol=length(n1), byrow=TRUE,
                     dimnames=list(CV, n1))
    pwr1   <- pct.2 <- pwr <- n.mean <- n.q1 <- n.med <- n.q3 <- grid
    TIE    <- costs.change <- grid
    n      <- integer(length(CV))
    cells  <- length(CV)*length(n1)
    cell   <- 0
    t.0    <- proc.time()[[3]]
    pb     <- txtProgressBar(min=0, max=1, char="\u2588", style=3)
    for (j in seq_along(CV)) {
      n[j] <- sampleN.TOST(alpha=alpha, CV=CV[j], theta0=GMR, theta1=theta1,
                           theta2=theta2, targetpower=targetpower,
                           print=FALSE, details=FALSE)[["Sample size"]]
      if (n[j] < 12) n[j] <- 12
      for (k in seq_along(n1)) {
        # median of expected total sample size as a 'best guess'
        n.tot <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
                              usePE=usePE, theta1=theta1, theta2=theta2,
                              targetpower=targetpower, fCrit=fCrit,
                              fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
                              npct=0.5)$nperc[["50%"]]
        w     <- c(n1[k], n.tot - n1[k]) / n.tot
        # force extreme weights if expected to stop in stage 1 with n1
        if (w[1] == 1) w <- w + c(-1, +1) * 1e-6
        res <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
                            usePE=usePE, theta1=theta1, theta2=theta2,
                            targetpower=targetpower, fCrit=fCrit,
                            fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
                            npct=c(0.25, 0.50, 0.75), weight=w,
                            setseed=setseed)
        pwr1[j, k]   <- res$pBE_s1
        pct.2[j, k]  <- res$pct_s2
        pwr[j, k]    <- res$pBE
        n.mean[j, k] <- res$nmean
        n.q1[j, k]   <- res$nperc[["25%"]]
        n.med[j, k]  <- res$nperc[["50%"]]
        n.q3[j, k]   <- res$nperc[["75%"]]
        TIE[j, k]    <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k],
                                     GMR=GMR, usePE=usePE, theta1=theta1,
                                     theta2=theta2, theta0=theta2,
                                     targetpower=targetpower, fCrit=fCrit,
                                     fClower=fClower, fCNmax=fCNmax,
                                     pmethod=pmethod, npct=1, weight=w,
                                     setseed=setseed)$pBE
      cell <- cell + 1
      setTxtProgressBar(pb, cell/cells)
      }
    }
    costs.change <- round(100*(n.mean-n)/n, 1)
    close(pb)
    t.1     <- proc.time()[[3]] - t.0
    sig     <- binom.test(alpha*1e6, 1e6, alternative="less")$conf.int[2]
    max.TIE <- max(TIE, na.rm=TRUE)
    pos.TIE <- which(TIE == max.TIE, arr.ind=TRUE)
    CV.TIE  <- as.numeric(rownames(pos.TIE))
    n1.TIE  <- as.integer(colnames(TIE)[as.numeric(pos.TIE[, 2])])
    min.pwr <- min(pwr, na.rm=TRUE)
    pos.pwr <- which(pwr == min.pwr, arr.ind=TRUE)
    CV.pwr  <- as.numeric(rownames(pos.pwr))
    n1.pwr  <- as.integer(colnames(pwr)[as.numeric(pos.pwr[, 2])])
    TIE.max <- data.frame(CV=CV.TIE, n1=n1.TIE, alpha,
                          TIE=rep(max.TIE, length(CV.TIE)))
    colnames(TIE.max)[4] <- "maximum TIE"
    TIE.max <- cbind(TIE.max, signif="no", stringsAsFactors=FALSE)
    TIE.max[["signif"]][TIE.max[["maximum TIE"]] > sig] <- "yes"
    power.min <- data.frame(CV=CV.pwr, n1=n1.pwr, target=targetpower,
                            pwr=rep(min.pwr, length(CV.pwr)))
    colnames(power.min)[4] <- "minimum power"
    if (print) {
      cat("\nEmpiric Type I Error\n"); print(TIE)
      cat("Maximum TIE", max.TIE, "at CV", CV.TIE, "and n1", n1.TIE,
          "\n\nEmpiric Power in Stage 1\n")
      print(round(pwr1, 4))
      cat("\n% of studies expected to proceed to Stage 2\n")
      print(pct.2)
      cat("\nEmpiric overall Power\n")
      print(round(pwr, 4))
      cat("\nMinimum Power", min.pwr, "at CV", CV.pwr, "and n1", n1.pwr,
          "\n\nAverage Total Sample Size E[N]\n")
      print(round(n.mean, 1))
      cat("\nQuartile I of Total Sample Size\n")
      print(n.q1)
      cat("\nMedian of Total Sample Size\n")
      print(n.med)
      cat("\nQuartile III of Total Sample Size\n")
      print(n.q3)
      cat("\n% rel. costs change compared to fixed-sample design\n")
      print(costs.change)
      cat("\nRuntime", signif(t.1/60, 3), "minutes\n")
    }
    res <- list(TIE=TIE, TIE.max=TIE.max, power.stage1=pwr1,
                pct.stage2=pct.2, power=pwr, power.min=power.min,
                n.mean=n.mean, n.quartile1=n.q1, n.median=n.med,
                n.quartile3=n.q3, costs.change=costs.change, runtime=t.1)
    return(res)
  }
}
#########################
# Your conditions below #
#########################
alpha       <- 0.05
CV.from     <- 0.1
CV.to       <- 0.6
CV.step     <- 0.02
n1.from     <- 12
n1.to       <- 72
n1.step     <- 2
theta1      <- 0.80
theta2      <- 1/theta1
GMR         <- 0.95
usePE       <- FALSE
targetpower <- 0.80
pmethod     <- "nct"
fCrit       <- "No"
fClower     <- 0
fCNmax      <- Inf
x <- checkMaxCombTest(alpha, CV.from, CV.to, CV.step, n1.from, n1.to, n1.step,
                      theta1, theta2, GMR, usePE, targetpower, fCrit,
                      fClower, fCNmax)


In memory of Willi Maurer, Dr. sc. math. ETH,
who passed away on December 30, 2017.


Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. ☼
Helmut
Hero
Homepage
Vienna, Austria,
2018-04-21 20:33

@ Helmut
Posting: # 18715
Views: 1,295
 

 Exact TSD methods: Example

Dear all,

answering my own post in order to keep it short.
Here is an example. We have a guesstimate of the CV (0.20), assume a GMR of 0.95, and aim at power 0.80. No futility criteria. Some regulatory statisticians told me they prefer a first stage sized as for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).

library(PowerTOST)
library(Power2Stage)
CV0   <- 0.20
n0    <- sampleN.TOST(CV=CV0, details=FALSE, print=FALSE)[["Sample size"]]
n.tot <- power.tsd.in(CV=CV0, n1=n0, fCrit="No", npct=0.5)$nperc[["50%"]]
w     <- c(n0, n.tot - n0) / n.tot
if (w[1] == 1) w <- w + c(-1, +1) * 1e-6


In this method the weights have to be pre-specified, stated in the SAP, and used throughout all subsequent steps (irrespective of the re-estimated n2). In the fixed-sample design we would need 20 subjects. How to set the weights? An intuitive way is to use the median (20) of the total sample size based on simulations. This would give us weights of [1, 0]. Great – but weights have to be >0 and <1. Hence, I tweaked them a little to [0.999999, 0.000001]. What can we expect if we run the study with n1 20?
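In numbers, the tweak described above (standalone base-R version of the snippet; the values mimic this example’s n1 and median total sample size):

```r
# Weights from n1 and the median total sample size of the simulations.
# If the median equals n1 the naive weights are c(1, 0); since weights
# must lie strictly within (0, 1) they are nudged by 1e-6.
n1    <- 20
n.tot <- 20                                 # median of n(total)
w     <- c(n1, n.tot - n1) / n.tot          # c(1, 0)
if (w[1] == 1) w <- w + c(-1, +1) * 1e-6    # c(0.999999, 1e-06)
```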

power.tsd.in(CV=CV0, n1=n0, fCrit="No", weight=w,
             npct=c(0.05, 0.25, 0.50, 0.75, 0.95))

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test (weights = 0.999999 1e-06)
 - alpha (s1/s2) = 0.02531 0.02531
 - critical value (s1/s2) = 1.95463 1.95463
 - with conditional error rates and conditional power
Overall target power = 0.8
Threshold in power monitoring step for futility = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion regarding PE, CI or Nmax
Minimum sample size in stage 2 = 4
BE acceptance range = 0.8 ... 1.25

CV = 0.2; n(stage 1) = 20; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').

p(BE)    = 0.84868
p(BE) s1 = 0.72513

Studies in stage 2 = 21.76%

Distribution of n(total)
- mean (range) = 23.4 (20 ... 86)
- percentiles
 5% 25% 50% 75% 95%
 20  20  20  20  42


Fine. If everything turns out as expected, we would have to be unlucky to need a second stage. Power in the first stage is already 0.73 and the stage-2 sample sizes are not shocking. As is common in TSDs, the overall power is generally higher than in a fixed-sample design.
We perform the first stage and get GMR 0.91 and CV 0.25. Oops! Both are worse than assumed; especially the GMR is painful.

n1    <- n0
GMR1  <- 0.91
CV1   <- 0.25
res   <- interim.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1, fCrit="No", weight=w)
res

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 1 0
 - significance levels (s1/s2) = 0.02531 0.02531
 - critical values (s1/s2) = 1.95463 1.95463
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is not used for SSR
 - with conditional error rates and conditional (estimated target) power

Interim analysis of first stage
- Derived key statistics:
  z1 = 1.57468, z2 = 3.38674,

  Repeated CI = (0.77306, 1.07120)
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 24
- Decision: Continue to stage 2 with 24 subjects


We fail to show BE (lower CL 77.31%) and should initiate the second stage with 24 subjects.
How would a ‘Type 1’ TSD perform?

Interim analysis (specified α1 0.0294)
───────────────────────────────────────────────────
94.12% CI:
77.77–106.48% (failed to demonstrate BE)
Power    : 0.5092 (approx. via non-central t)
Second stage with 14 subjects (N=34) is justified.


Pretty similar, though a lower n2 is suggested.
OK, we perform the second stage and get GMR 0.93 and CV 0.21. Both are slightly better than what we got in the first stage, but again worse than assumed.

n2    <- res$n2
GMR2  <- 0.93
CV2   <- 0.21
final.tsd.in(GMR1=GMR1, CV1=CV1, n1=n1,
             GMR2=GMR2, CV2=CV2, n2=n2, weight=w)

TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 1 0
 - significance levels (s1/s2) = 0.02531 0.02531
 - critical values (s1/s2) = 1.95463 1.95463
 - BE acceptance range = 0.8 ... 1.25

Final analysis of second stage
- Derived key statistics:
  z1 = 2.32999, z2 = 4.00748,

  Repeated CI = (0.82162, 1.05264)
  Median unbiased estimate = 0.8997
- Decision: BE achieved


We survived.
In a ‘Type 1’ TSD we would get:

Final analysis of pooled data (specified α2 0.0294)
═══════════════════════════════════════════════════
94.12% CI:
83.86–101.12% (BE concluded)


Pretty similar again.

If we state it in the protocol, we could also aim for higher power in the second stage if the GMR in the first doesn’t look nice. If we switch to 0.90 we would run the second stage with 36 subjects.

Final analysis of second stage
- Derived key statistics:
  z1 = 2.86939, z2 = 4.94730,

  Repeated CI = (0.84220, 1.02693)
  Median unbiased estimate = 0.9053
- Decision: BE achieved


That helps. Another option would be to adjust for GMR1 by using the argument usePE=TRUE in interim.tsd.in(). For power 0.80 that would mean 40 subjects in the second stage, and for 0.90 already 62…

ElMaestro
Hero

Denmark,
2018-04-21 20:49

@ Helmut
Posting: # 18716
Views: 1,279
 

 Finally: Exact TSD methods for 2×2 crossover designs

Hi Hötzi,

thank you for this post. What is “the inverse normal method with repeated confidence intervals”?

“A ten-year, double-blind study from the Mayo Clinic concluded that even in late stages of dementia, the last to go is the lobe of the brain in charge of cafeteria layout.” (Serge Storms/Tim Dorsey).


Best regards,
ElMaestro

- Bootstrapping is a relatively new hobby of mine. I am only 30 years late to the party.
Helmut
Hero
Homepage
Vienna, Austria,
2018-04-21 21:41

@ ElMaestro
Posting: # 18717
Views: 1,280
 

 Flow chart (without details)

Hi ElMaestro,

flow chart (futility of the CI, unrestricted total sample size):

[image]

Details:

[image]

mittyri
Senior

Russia,
2018-04-28 15:54
(edited by mittyri on 2018-04-28 16:08)

@ Helmut
Posting: # 18737
Views: 617
 

 naive questions regarding new functions in Power2Stage

Hi Helmut,

sorry for naive questions raised from my hazelnut brain

1. I'm trying to compare the old function
power.tsd(method = c("B", "C", "B0"), alpha0 = 0.05, alpha = c(0.0294, 0.0294),
          n1, GMR, CV, targetpower = 0.8, pmethod = c("nct", "exact", "shifted"),
          usePE = FALSE, Nmax = Inf, min.n2 = 0, theta0, theta1, theta2,
          npct = c(0.05, 0.5, 0.95), nsims, setseed = TRUE, details = FALSE)

with a new one
power.tsd.in(alpha, weight, max.comb.test = TRUE, n1, CV, targetpower = 0.8,
             theta0, theta1, theta2, GMR, usePE = FALSE, min.n2 = 4, max.n = Inf,
             fCpower = targetpower, fCrit = "CI", fClower, fCupper, fCNmax,
             ssr.conditional = c("error_power", "error", "no"),
             pmethod = c("nct", "exact", "shifted"), npct = c(0.05, 0.5, 0.95),
             nsims, setseed = TRUE, details = FALSE)


So the old function was nice since the user could choose the method or specify three alphas.
In the new one I see the comment regarding alpha
If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively.
If missing, defaults to 0.05.

What about alpha0 for method C? Is it deprecated?

2. Why did you decide to include CI futility rule by default?

3. Regarding your flowchart:
isn't it possible that we get some value lower than 4?
power.tsd.in(CV=0.13, n1=12)
<...>
p(BE)    = 0.91149
p(BE) s1 = 0.83803
Studies in stage 2 = 9.71%

Distribution of n(total)
- mean (range) = 12.5 (12 ... 42)
- percentiles
 5% 50% 95%
 12  12  16

for example and after first stage CV=15%, CI=[0.7991897 1.0361745]:
sampleN2.TOST(CV=0.15, n1=12)
 Design  alpha   CV theta0 theta1 theta2 n1 Sample size Achieved power Target power
    2x2 0.0294 0.15   0.95    0.8   1.25 12           2        0.82711          0.8


4. Is it possible to update the docs attached to the library?

5. I was confused by "2stage" being 'aliased' with "tsd" and was looking for the differences for some time.
Are there any reasons to duplicate these functions?

PS:
regarding 3rd point:
I tried
interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No")
TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test with weights for stage 1 = 0.5 0.25
 - significance levels (s1/s2) = 0.02635 0.02635
 - critical values (s1/s2) = 1.93741 1.93741
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is not used for SSR
 - with conditional error rates and conditional (estimated target) power

Interim analysis of first stage
- Derived key statistics:
  z1 = 1.87734, z2 = 3.54417,
  Repeated CI = (0.79604, 1.04028)
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 4
- Decision: Continue to stage 2 with 4 subjects

oh, there's a default argument min.n2 = 4
OK, let's try to change that:
> interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No", min.n2 = 2)
Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15,  :
  min.n2 has to be at least 4.

Why couldn't I select a smaller one?

Kind regards,
Mittyri
Helmut
Hero
Homepage
Vienna, Austria,
2018-04-28 17:29

@ mittyri
Posting: # 18738
Views: 611
 

 Some answers

Hi Mittyri,

I’m in a hurry; so answering only part of your questions (leave the others to Detlew or Ben).

» 2. Why did you decide to include CI futility rule by default?

This applies only to the x.tsd.in functions (to be in accordance with the paper of Maurer et al.).

» 3. Regarding your flowchart:
» isn't it possible that we get some value lower than 4?
» for example and after first stage CV=15%, CI=[0.7991897 1.0361745]:
» sampleN2.TOST(CV=0.15, n1=12)
»  Design  alpha   CV theta0 theta1 theta2 n1 Sample size
»     2x2 0.0294 0.15   0.95    0.8   1.25 12           2


sampleN2.TOST() is intended for the other methods, where at the end the stages are pooled.
In the inverse normal method the stages are evaluated separately (PE and MSE from the ANOVAs of each stage). With fewer than 4 subjects in the second stage you will run out of steam (too few degrees of freedom). Well, 3 would work, but…
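The df argument can be made concrete; in a 2×2 crossover the residual degrees of freedom of the stage-wise ANOVA are n − 2 (a trivial helper of my own, not a package function):

```r
# Residual df of a 2x2 crossover ANOVA: total (2n - 1) minus subjects
# (n - 1), period (1), and treatment (1) leaves n - 2.
df.resid <- function(n) n - 2
c(df.resid(2), df.resid(3), df.resid(4))   # 0, 1, 2
```

With n2 = 2 there is no residual df at all, hence no variance estimate from the second stage; min.n2 = 4 keeps at least two.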

» 5. I was confused with "2stage" 'aliased' with "tsd" and was looking for differences some time
» Are there any reasons to double that functions?

Since this is a 0.x-release according to CRAN’s policy we can rename functions or even remove them without further notice. ;-) We decided to unify the function-names. In order not to break existing code we introduced the aliases. In the next release functions x.2stage.x() will be removed and only their counterparts x.tsd.x() kept.

» PS:
» regarding 3rd point:
» I tried
» interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No")
» […]
» - Calculated n2 = 4
» - Decision: Continue to stage 2 with 4 subjects

» oh, there's a default argument min.n2 = 4
» OK, let's try to change that:
» interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No", min.n2 = 2)
» Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15,  :
»   min.n2 has to be at least 4.

» Why couldn't I select a smaller one?

See above. Doesn’t make sense with zero degrees of freedom (n2=2).

d_labes
Hero

Berlin, Germany,
2018-04-29 21:11

@ mittyri
Posting: # 18741
Views: 584
 

 Some more "answers"

Dear Michael,

just my two cents.

» 1. I'm trying to compare the old function
» ...
» So the old function was nice since the user can choose the method or specify 3 alphas.
» In the new one I see the comment regarding alpha
» If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively.
» If missing, defaults to 0.05.
» What about alpha0 for method C? Is it deprecated?

Sorry for the confusion, but you definitely have to study the references (start with 1)) to get a clue what’s going on with these new functions, which implement a new method for evaluating TSDs – new in the sense that it was not implemented in Power2Stage and not applied in the evaluation of TSDs up to now.
It is by no means a method adding to or amending the Potvin methods.
It is a new method with a different philosophy behind it.
And this method – a combination of the p-values obtained by applying the TOST to the data of the two stages separately – is said to control the TIE rate at ≤0.05, regardless of what design changes are made at interim, e.g. re-estimation of the sample size. And this is not shown by simulations but in theory, by proof. A feature which is demanded by EMA statisticians. Do you remember the statement "Potvin's methods are not valid / acceptable in Europe"?
Except Russia, which is at least to some extent also in Europe, IIRC…

» 2. Why did you decide to include CI futility rule by default?

See Helmut's answer. Maurer et al. included a CI futility rule in their paper.
And it is our habit to set defaults according to the (first) paper(s) describing a TSD evaluation method. OK, that may be sub-optimal when comparing methods, since you always have to remember the defaults and the differences between them across functions.
But…
The re-calculation or verification of results comes first, and my laziness calls for defaults resembling the details of the paper(s) after which a function in Power2Stage was implemented.

» 3. Regarding your flowchart:
» isn't it possible that we get some value lower than 4?
» ...

See Helmut's answer.
Since min.n2 < 4 doesn't make sense, it is restricted to ≥ 4, as described in the Maurer et al. paper.

» 4. Is it possible to update the docs attached to the library?

Not quite clear for me what we should update. Could you please elaborate?

» 5. I was confused with "2stage" 'aliased' with "tsd" and was looking for differences some time
» Are there any reasons to double that functions?

The real reason behind this change is laziness of mine (sic!). It saves me 3(!) keystrokes :cool:. Believe it or not…

Don't hesitate to ask more "naive" questions. All of us here, not least me, are naive with respect to this new method of evaluating TSDs.
If you feel more comfortable, ask me, Helmut, or Ben via the private way, i.e., write to the maintainer of Power2Stage ;-).


1)Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614.

Drop me a mail if you need that sheet of paper.

Regards,

Detlew
mittyri
Senior

Russia,
2018-04-30 13:41

@ d_labes
Posting: # 18744
Views: 559
 

 clarification regarding user Power2Stage guides

Dear Detlew, Dear Helmut,

I'm very sorry for that post; it looks like I'm out of touch with the current state of TSDs…
OK, I need to review the paper, since it is certainly a new standard, like Potvin's paper before it.

» » 4. Is it possible to update the docs attached to the library?
»
» Not quite clear for me what we should update. Could you please elaborate?

I looked into Power2Stage/doc
and found that it was last updated in Jan 2016.

Kind regards,
Mittyri
d_labes
Hero

Berlin, Germany,
2018-04-25 14:19

@ Helmut
Posting: # 18729
Views: 769
 

 Technicality: Weights for the inverse normal approach

Dear Helmut,

great post :clap:.

Only one remark about the weights you choose for the maximum combination test in your R code.

» ...
»     for (j in seq_along(CV)) {
»       n[j] <- sampleN.TOST(alpha=alpha, CV=CV[j], theta0=GMR, theta1=theta1,
»                            theta2=theta2, targetpower=targetpower,
»                            print=FALSE, details=FALSE)[["Sample size"]]
»       if (n[j] < 12) n[j] <- 12
»       for (k in seq_along(n1)) {
»         # median of expected total sample size as a 'best guess'
»         n.tot <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
»                               usePE=usePE, theta1=theta1, theta2=theta2,
»                               targetpower=targetpower, fCrit=fCrit,
»                               fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
»                               npct=0.5)$nperc[["50%"]]
»         w     <- c(n1[k], n.tot - n1[k]) / n.tot
»         # force extreme weights if expected to stop in stage 1 with n1
»         if (w[1] == 1) w <- w + c(-1, +1) * 1e-6
»     ...


Defining the weights that way is IMHO not what you intended – or I don't understand what you intended.
It would be correct if you think in terms of the standard combination test and think further that you have to specify two weights for it. But since the two weights are connected as w and 1-w, the second one is calculated automatically within the function power.tsd.in(). You only need to define w[1] in the input argument.

The idea behind the maximum combination test is:
if our first pair of weights w, 1-w (chosen anyhow) is not "optimal", choose a second pair of weights w*, 1-w* which is better adapted to the real situation.
If you were too optimistic in your planning of n2, i.e. chose n2 too low compared to what really happens in the sample size adaptation, it would be wise to define w* lower than w.
You do that, but your choice (w in w[1]=0.999999, w* in w[2]=1e-6) is, I think, too extreme and probably not your intention. The second pair of weights w*=1e-6, 1-w*=0.999999 is for a situation where the p-values from the second stage almost exclusively determine the overall outcome of the maximum combination test. The p-values from the first-stage data are down-weighted with w*=1e-6.
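A numeric sketch (hypothetical z-values, base R only) of what such an extreme second pair of weights does – the second combination then depends almost exclusively on stage 2:

```r
# Maximum combination test: the larger of two inverse-normal combinations,
# one per pre-specified pair of weights (w, 1 - w) and (w*, 1 - w*).
comb <- function(w, z1, z2) sqrt(w) * z1 + sqrt(1 - w) * z2
z1 <- 2.0; z2 <- 0.5        # hypothetical stage-wise z-statistics
w  <- 0.999999              # first pair: almost all weight on stage 1
ws <- 1e-6                  # second pair: stage 1 down-weighted to ~0
comb(w,  z1, z2)            # ~2.0005, driven by z1
comb(ws, z1, z2)            # ~0.5020, driven by z2
z.max <- max(comb(w, z1, z2), comb(ws, z1, z2))
```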

Hope this sermon is not too confusing.

BTW: choosing the weights "optimally" is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand, we have to predefine the weights to gain strict TIE control. The cat bites its own tail, i.e., a vicious circle.

Regards,

Detlew
Helmut
Hero
Homepage
Vienna, Austria,
2018-04-26 09:51

@ d_labes
Posting: # 18733
Views: 714
 

 Selection of w and w*

Dear Detlew,

» Defining the weights that way is IMHO not what you intended.

OK, I see!

» BTW: Choosing the weights "optimal" is for me a mystery. To do that, we had to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand we have to predefine them to gain strict TIE control. The cat bites its own tail, i.e., a vicious circle.

Using the median of n.tot to define the weights from the sim’s was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?
Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n is 8 (n1 ⇒ 12 acc. to the GLs’ minimum of 12 subjects). n.mean and median of n.tot are 12 with the default weights (0.5, 0.25). Even the 95% percentile of n.tot is 12.
:confused:
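A sketch of that heuristic (Python for illustration; the helper name is mine): taking w as the information fraction n1 / median(n.tot), capped below 1 as the combination test requires, reproduces the deadlock of the example — n1 equals the median of n.tot, so the "weight" would be 1.

```python
from statistics import median

def weight_from_sims(n1, n_tot_sims, cap=0.99):
    """Stage-1 weight as the information fraction n1 / median(n.tot),
    capped below 1 (the weights of the combination test must be < 1)."""
    return min(n1 / median(n_tot_sims), cap)

# Helmut's example: n1 = 12 and the simulated n.tot is 12 in at
# least 95% of the studies (illustrative values, not real sims):
w = weight_from_sims(12, [12, 12, 12, 12, 14])
# the information fraction is 1, so w collapses to the cap of 0.99
```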

Cheers,
Helmut Schütz 

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
d_labes
Hero

Berlin, Germany,
2018-04-26 20:02

@ Helmut
Posting: # 18734
Views: 680
 

 Selection of w and w*

Dear Helmut,

» ...
» Using the median of n.tot to define the weights from the sim’s was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?

As I already said: I really dunno.

» Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n1 ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95% percentile of n.tot is 12.
» :confused:

If you were pessimistic, then in the spirit of the MCT it would be wise to choose the second pair of weights with a decreased value. Or do I err here (the "real" n2 lower than the pessimistic one)?
If I'm right, possible values could be:
w=0.999, w*=0.5 (or something in that range)

Or do we stay with the standard combination test for that extreme case?

But to state it again: for me it is a mystery how to choose the weights.
However, I think it doesn't make much of a difference as long as we are not totally wrong with our chosen weights.
As far as I have seen so far for a small number of examples: the power is influenced only to a "minor" extent, and the TIE is controlled whatever weights we choose.

Regards,

Detlew
d_labes
Hero

Berlin, Germany,
2018-05-09 13:53
(edited by d_labes on 2018-05-09 14:25)

@ Helmut
Posting: # 18757
Views: 307
 

 Now what? w & w* examples

Dear Helmut,

I have tried to demystify some aspects of choosing w and w* for the maximum combination test by looking into some examples:

Take nfix as sample size in stage 1 (Helmut’s proposal)
Guess:
CV=0.2, theta0=0.95 -> nfix = 20
Choose n1 = nfix = 20, i.e. w = n1/nfix = 1, capped at w = 0.99 since w has to be <1.

Guess was too pessimistic:
e.g. true CV=0.15 -> nfix = 12
or theta0=0.975 -> nfix = 16
In both cases the sample size for stage 1 exceeds the necessary total sample size of a fixed design. Thus a ‘more realistic’ w* can’t be defined, or should be set to the same value as w.
This results in the standard combination test.

Guess was too optimistic:
e.g. true CV=0.25 -> nfix = 28
or theta0=0.925 -> nfix = 26
Both lead to a ‘more realistic’ w*= 0.71 or 0.77. Let's choose w* = 0.7 for simplicity.
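The w* values above follow the pattern w* = n1 / nfix of the true scenario. A quick check (Python sketch; the helper name is mine):

```python
def second_weight(n1, nfix_true):
    """Candidate second weight w* = n1 / nfix under the revised guess.
    A value >= 1 (too-pessimistic guess) cannot be used; then set
    w* = w, i.e. fall back to the standard combination test."""
    return n1 / nfix_true

# too-optimistic scenarios with n1 = nfix(guess) = 20:
w_cv  = round(second_weight(20, 28), 2)  # true CV = 0.25  -> 0.71
w_gmr = round(second_weight(20, 26), 2)  # theta0 = 0.925  -> 0.77
```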


Power & sample size of the scenarios
                                                               N
                                                       ------------------
                   CV   theta0  w      w*     power    ASN   Median  p95%
-------------------------------------------------------------------------
Guess             0.20  0.95   0.99   0.5*)   0.866    21.5    20     34
                               0.99   0.99    0.872    24.9    20     30
                               0.99   0.7     0.870    21.5    20     28
Too pessimistic   0.15  0.95   0.99   0.99    0.966    20.1    20     20
                  0.20  0.975  0.99   0.99    0.936    22.9    20     24
Too optimistic    0.25  0.95   0.99   0.7     0.842    29.1    20     64
                  0.20  0.925  0.99   0.7     0.760    22.6    20     36
-------------------------------------------------------------------------
*) w* = w/2 according to Maurer et al.
No futility criterion



Take nfix/2 as sample size in stage 1 (Maurer et al.)
Guess:
CV=0.2, theta0=0.95 -> nfix = 20
Choose n1 = nfix/2 = 10, i.e. w= 0.5.

Guess was too pessimistic:
e.g. true CV=0.15 -> nfix = 12
or theta0=0.975 -> nfix = 16
This would lead to a ‘more realistic’ w* = 0.83 or 0.625, respectively. Let's take w* = 0.7 for simplicity.

Guess was too optimistic:
e.g. true CV=0.25 -> nfix = 28
or theta0=0.925 -> nfix = 26
Both lead to a ‘more realistic’ w* = 0.36 or 0.38. Let's take w* = 0.4 for simplicity.
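The same information-fraction pattern, now with n1 = nfix/2 = 10 (Python sketch): in this scheme w* can move in either direction around w = 0.5, unlike the n1 = nfix case, where a too-pessimistic guess leaves no room above w.

```python
# w* = n1 / nfix(true scenario), with n1 = 10 and w = 10/20 = 0.5
n1 = 10

# too-pessimistic guesses: the true nfix is smaller, so w* > 0.5
too_pessimistic = [round(n1 / nfix, 3) for nfix in (12, 16)]

# too-optimistic guesses: the true nfix is larger, so w* < 0.5
too_optimistic = [round(n1 / nfix, 3) for nfix in (28, 26)]
```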


Power & sample size of the scenarios
                                                               N
                                                       ------------------
                   CV   theta0  w      w*     power    ASN   Median  p95%
-------------------------------------------------------------------------
Guess             0.20  0.95   0.5    0.25*)  0.838    22.7    20     46
                               0.5    0.7     0.844    22.6    18     50
                               0.5    0.4     0.841    22.5    20     48
Too pessimistic   0.15  0.95   0.5    0.7     0.881    13.0    10     24
                  0.20  0.975  0.5    0.7     0.896    21.4    18     48
Too optimistic    0.25  0.95   0.5    0.4     0.822    37.1    34     78
                  0.20  0.925  0.99   0.4     0.747    24.1    20     52
-------------------------------------------------------------------------
*) w* = w/2 according to Maurer et al.
No futility criterion


Confusion :-D:
  • Different weights w* don’t make a big difference, I think.
  • Too pessimistic specifications result in higher power and a lower expected sample size (!) :surprised: (at least for CVs around 0.2).
  • Too optimistic specifications may result in lower power and a higher expected sample size (!).
  • Choosing the stage-1 sample size as the sample size of a fixed design seems to have some advantages w.r.t. power and expected sample size compared to the ‘midterm’ sample size re-estimation, except for the too-pessimistic setting (row CV=0.15, theta0=0.95) of the ‘midterm’ SSR.

Regards,

Detlew