Helmut
★★★

Vienna, Austria,
2015-11-27 19:05

Posting: # 15680

## Adaptive TSD vs. “classical” GSD [Two-Stage / GS Designs]

Dear all,

I received a question and suggested that the sender register at the forum, which he didn’t do. However, I think the question is interesting and would like your opinions. The study is planned for a USFDA submission.

“BE study will be initiated with dosing for 50% subjects of protocol and samples will be analysed; if results with 50% subjects show bioequivalence, data will be submitted to regulatory. If results are not bioequivalent, study will be continued with dosing for remaining 50% subjects and samples will be analysed; The results with all subjects (100%) will be evaluated for BE and results show bio­equi­va­lence, data will be submitted to regulatory.”

OK, smells of a “classical” Group-Sequential Design with one interim at N/2. The best-guess CV is around 40% and the expected GMR 0.95¹:

library(PowerTOST)
sampleN.TOST(CV=0.4, theta0=0.95)

+++++++++++ Equivalence test - TOST +++++++++++
Sample size estimation
-----------------------------------------------
Study design:  2x2 crossover
log-transformed data (multiplicative model)

alpha = 0.05, target power = 0.8
BE margins        = 0.8 ... 1.25
Null (true) ratio = 0.95,  CV = 0.4

Sample size (total)
n     power
66   0.805252

In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design’s. So we would also start with 50 – the same number chosen for the GSD. Below are the results of my sims for CVs of 30–50% (R-code²). The ‘best guess’ CV is marked.

1. GSD
GMR CV%  n  alpha  pwr%  2nd%   N  alpha  pwr%     TIE
0.95  30 50 0.0310 84.05 15.95 100 0.0277 98.72 0.04839 ns
0.95  31 50 0.0310 81.66 18.34 100 0.0277 98.28 0.04839 ns
0.95  32 50 0.0310 79.19 20.82 100 0.0277 97.73 0.04839 ns
0.95  33 50 0.0310 76.59 23.41 100 0.0277 97.09 0.04847 ns
0.95  34 50 0.0310 73.98 26.02 100 0.0277 96.27 0.04856 ns
0.95  35 50 0.0310 71.32 28.68 100 0.0277 95.45 0.04856 ns
0.95  36 50 0.0310 68.67 31.33 100 0.0277 94.43 0.04848 ns
0.95  37 50 0.0310 65.89 34.11 100 0.0277 93.48 0.04855 ns
0.95  38 50 0.0310 63.05 36.95 100 0.0277 92.32 0.04831 ns
0.95  39 50 0.0310 60.27 39.73 100 0.0277 91.18 0.04839 ns
0.95  40 50 0.0310 57.48 42.52 100 0.0277 89.95 0.04848 ns
0.95  41 50 0.0310 54.76 45.24 100 0.0277 88.64 0.04825 ns
0.95  42 50 0.0310 52.03 47.97 100 0.0277 87.29 0.04825 ns
0.95  43 50 0.0310 49.28 50.72 100 0.0277 85.94 0.04849 ns
0.95  44 50 0.0310 46.51 53.49 100 0.0277 84.48 0.04826 ns
0.95  45 50 0.0310 43.72 56.28 100 0.0277 83.08 0.04799 ns
0.95  46 50 0.0310 41.00 59.00 100 0.0277 81.50 0.04813 ns
0.95  47 50 0.0310 38.44 61.56 100 0.0277 79.92 0.04777 ns
0.95  48 50 0.0310 35.86 64.14 100 0.0277 78.32 0.04766 ns
0.95  49 50 0.0310 33.34 66.66 100 0.0277 76.69 0.04741 ns
0.95  50 50 0.0310 30.77 69.23 100 0.0277 75.02 0.04712 ns

No inflation of the TIE if we use Pocock’s approach with Lan/DeMets α-spending. Power is pretty high and drops below 80% only for CV > 46%.

2. ‘Type 1’ TSD
GMR CV% n1  alpha  pwr%  2nd% E[N]  alpha  pwr%     TIE
0.95  30 50 0.0302 83.74  6.62   51 0.0302 86.14 0.03192 ns
0.95  31 50 0.0302 81.33  9.85   51 0.0302 85.20 0.03298 ns
0.95  32 50 0.0302 78.79 13.58   52 0.0302 84.42 0.03437 ns
0.95  33 50 0.0302 76.18 17.59   52 0.0302 83.90 0.03594 ns
0.95  34 50 0.0302 73.56 21.53   53 0.0302 83.55 0.03756 ns
0.95  35 50 0.0302 70.87 25.45   54 0.0302 83.31 0.03924 ns
0.95  36 50 0.0302 68.18 29.13   56 0.0302 83.13 0.04093 ns
0.95  37 50 0.0302 65.36 32.70   57 0.0302 83.07 0.04238 ns
0.95  38 50 0.0302 62.51 36.19   59 0.0302 82.98 0.04353 ns
0.95  39 50 0.0302 59.69 39.44   61 0.0302 82.87 0.04469 ns
0.95  40 50 0.0302 56.90 42.55   64 0.0302 82.89 0.04578 ns
0.95  41 50 0.0302 54.13 45.52   66 0.0302 82.83 0.04632 ns
0.95  42 50 0.0302 51.37 48.42   69 0.0302 82.68 0.04717 ns
0.95  43 50 0.0302 48.60 51.26   72 0.0302 82.64 0.04797 ns
0.95  44 50 0.0302 45.82 54.10   75 0.0302 82.55 0.04843 ns
0.95  45 50 0.0302 43.01 56.94   79 0.0302 82.48 0.04893 ns
0.95  46 50 0.0302 40.31 59.65   82 0.0302 82.45 0.04909 ns
0.95  47 50 0.0302 37.73 62.25   86 0.0302 82.33 0.04977 ns
0.95  48 50 0.0302 35.16 64.83   90 0.0302 82.23 0.04949 ns
0.95  49 50 0.0302 32.58 67.41   95 0.0302 82.13 0.04975 ns
0.95  50 50 0.0302 30.02 69.98   99 0.0302 82.03 0.04963 ns

No inflation of the TIE. Power in the first stage is similar to the GSD (since alphas are similar). Overall power is more consistent and doesn’t drop below the target 80%.

Now my questions (especially @Ben). If the CV is lower than the ‘best guess’, in the GSD we have to go full throttle with another 50 subjects. Compare the column “2nd%”, which gives the chance of proceeding to the second part. Not only is the chance higher in the GSD, we are also punished with another 50 subjects. Have a look at the TSD’s column “E[N]”, giving the expected average total sample size. Much lower. Sure, sometimes we need just a few more subjects and not another 50. Only for high CVs do the TSDs approach the GSDs. Nice side effect: if we start the TSD with 75% of the fixed sample design’s n, on average the total sample size will even be (slightly) lower (64 < 66).
Given all that: Why should one use a GSD instead of a TSD?

1. Edit: I misinterpreted the question. He was talking about 50% (regardless of the sample size) – not n = 50.
2. R-code
2.1. GSD (14 seconds on my machine)
library(ldbounds)
library(Power2Stage)
## More than one interim possible in GSDs. However, since
## not acceptable according to BE-GLs, only one interim
## (at arbitrary time) is implemented in Power2Stage.
GMR    <- 0.95
n      <- c(50, 50)
CV     <- seq(0.3, 0.5, 0.01)
cum    <- vector("numeric", length=length(n))
for (j in seq_along(n)) cum[j] <- max(cum) + n[j]
t      <- cum/max(cum)
alpha  <- rep(0.05/2, 2)
iuse   <- rep(2, 2)
bnds   <- bounds(t=t, iuse=iuse, alpha=alpha)
alpha  <- round(2*(1-pnorm(bnds$upper.bounds)), 4)
sig    <- binom.test(x=0.05*1e6, n=1e6, alternative='less',
                     conf.level=1-0.05)$conf.int[2]
res    <- matrix(nrow=length(CV), ncol=11, byrow=TRUE,
dimnames=list(NULL, c("GMR", "CV%",
"n", "alpha", "pwr%", "2nd%",
"N", "alpha", "pwr%", "TIE", " ")))
ptm    <- proc.time()
for (j in seq_along(CV)) {
tmp1 <- power.2stage.GS(alpha=alpha, n=n, CV=CV[j], theta0=GMR)
tmp2 <- power.2stage.GS(alpha=alpha, n=n, CV=CV[j], theta0=1.25, nsims=1e6)
res[j,  1] <- sprintf("%.2f", GMR)
res[j,  2] <- sprintf("%.0f", 100*CV[j])
res[j,  3] <- sprintf("%.0f", n[1])
res[j,  4] <- sprintf("%.4f", alpha[1])
res[j,  5] <- sprintf("%.2f", 100*tmp1$pBE_s1)
res[j,  6] <- sprintf("%.2f", tmp1$pct_s2)
res[j,  7] <- sprintf("%.0f", max(cum))
res[j,  8] <- sprintf("%.4f", alpha[2])
res[j,  9] <- sprintf("%.2f", 100*tmp1$pBE)
res[j, 10] <- sprintf("%.5f", tmp2$pBE)
res[j, 11] <- "ns"
if (tmp2$pBE > sig) res[j, 11] <- "*"
}
run.time <- proc.time()-ptm
cat("Runtime:", signif(run.time[3], 3), " seconds\n")
print(as.data.frame(res), row.names=F)
op <- par(no.readonly=TRUE)
par(mfrow=c(1, 2), oma=c(0, 0, 5, 0), mar=c(4, 4, 0, 1))
plot(res[, 2], res[, 10], xlab="CV (%)", ylab="empiric Type I Error",
pch=16, col="#0000FF")
abline(h=c(0.05, sig), lty=c(1, 3), col=c("#0000FF", "#FF0000"))
plot(res[, 2], res[, 9], xlab="CV (%)", ylab="power (%)",
ylim=c(70, 100), pch=16, col="#008000")
points(res[, 2], res[, 5], pch=16, col="#0000FF")
abline(h=80, col="#008000")
legend("topright", legend=c("at interim", "final"), pch=rep(16, 2),
col=c("#0000FF", "#008000"), bty="n")
main.txt <- paste0("Pocock-Lan/DeMets Group Sequential Design\n",
"with one interim analysis, expected GMR: ", GMR, ".\n",
"Cumulative sample sizes: ")
n.txt <- ""
for (j in seq_along(cum)) {
if (j < length(cum)) {
n.txt <- paste0(n.txt, cum[j], " (\u03b1 ", sprintf("%.4f", alpha[j]), "), ")
} else {
n.txt <- paste0(n.txt, cum[j], " (\u03b1 ", sprintf("%.4f", alpha[j]), ").\n")
}
}
main.txt <- paste0(main.txt, n.txt)
title(main.txt, outer=TRUE)
par(op)

2.2. TSD (be patient; 7 minutes on my machine)

library(Power2Stage)
GMR    <- 0.95
n1     <- 50
CV     <- seq(0.3, 0.5, 0.01)
alpha  <- rep(0.0302, 2)
sig    <- binom.test(x=0.05*1e6, n=1e6, alternative='less',
                     conf.level=1-0.05)$conf.int[2]
res    <- matrix(nrow=length(CV), ncol=11, byrow=TRUE,
dimnames=list(NULL, c("GMR", "CV%",
"n1", "alpha", "pwr%", "2nd%",
"E[N]", "alpha", "pwr%", "TIE", " ")))
ptm    <- proc.time()
for (j in seq_along(CV)) {
tmp1 <- power.2stage(alpha=alpha, n1=n1, CV=CV[j], theta0=GMR)
tmp2 <- power.2stage(alpha=alpha, n1=n1, CV=CV[j], theta0=1.25, nsims=1e6)
res[j,  1] <- sprintf("%.2f", GMR)
res[j,  2] <- sprintf("%.0f", 100*CV[j])
res[j,  3] <- sprintf("%.0f", n1)
res[j,  4] <- sprintf("%.4f", alpha[1])
res[j,  5] <- sprintf("%.2f", 100*tmp1$pBE_s1)
res[j,  6] <- sprintf("%.2f", tmp1$pct_s2)
res[j,  7] <- sprintf("%.0f", tmp1$nmean)
res[j,  8] <- sprintf("%.4f", alpha[2])
res[j,  9] <- sprintf("%.2f", 100*tmp1$pBE)
res[j, 10] <- sprintf("%.5f", tmp2$pBE)
res[j, 11] <- "ns"
if (tmp2$pBE > sig) res[j, 11] <- "*"
}
run.time <- proc.time()-ptm
cat("Runtime:", signif(run.time[3]/60, 3), " minutes\n")
print(as.data.frame(res), row.names=F)
op <- par(no.readonly=TRUE)
par(mfrow=c(1, 2),
oma=c(0, 0, 5, 0),
mar=c(4, 4, 0, 1))
plot(res[, 2], res[, 10], xlab="CV (%)", ylab="empiric Type I Error",
pch=16, col="#0000FF")
abline(h=c(0.05, sig), lty=c(1, 3), col=c("#0000FF", "#FF0000"))
plot(res[, 2], res[, 9], xlab="CV (%)", ylab="power (%)",
ylim=c(70, 100), pch=16, col="#008000")
points(res[, 2], res[, 5], pch=16, col="#0000FF")
abline(h=80, col="#008000")
legend("topright", legend=c("at interim", "final"), pch=rep(16, 2),
col=c("#0000FF", "#008000"), bty="n")
main.txt <- paste0("\u2018Type 1\u2019 Adaptive Two-Stage Sequential Design,",
"\nexpected GMR: ", GMR, ".\n",
"Stage 1 sample size: ", n1, ", \u03b1 in both stages: ",
sprintf("%.4f", alpha[1]), ".")
title(main.txt, outer=TRUE)
par(op)

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2015-11-27 19:54

@ Helmut
Posting: # 15681

## Adaptive TSD vs. “classical” GSD

Hi Hötzi,

» Given all that: Why should one use a GSD instead of a TSD?

It is a great question, and I will not offer a definitive answer, but I will volunteer an opinion.

You can look at it this way: A GSD is a kind of TSD where you make assumptions about both the GMR and the CV (you use the anticipated ones, not the observed ones) when you transit from stage 1 to stage 2. That anticipated pair of CV+GMR is exactly the (or a) combo that naïvely doubles the sample size. Simple but extremely rigid.

Thereby GSDs should be considered a relic from bygone ages, when computers were not fast enough to allow simulations to achieve what Potvin et al. have done.

I could be wrong, but...

Best regards,
ElMaestro

No, of course you do not need to audit your CRO if it was inspected in 1968 by the agency of Crabongostan.
d_labes
★★★

Berlin, Germany,
2015-11-30 11:15

@ Helmut
Posting: # 15684

## “classical” GSD - E[n]

Dear Helmut,

» Now my questions (especially @Ben). If the CV is lower than the ‘best guess’ in the GSD we have to go full throttle with another 50 subjects. Compare the column “2nd%” which gives the chance to proceed to the 2nd part. Not only the chance is higher in the GSD, we are punished with another 50 subjects. Have a look at the TSD’s column “E[N]” giving the expected average total sample size. Much lower.

Much lower than what?
Your presentation of the GSD results is a little bit unfair. It suggests that the expected N is 100.
But that’s not true:

E[N] = (1-pctS2/100)*n1 + (pctS2/100)*(n1+n2)

in case of a GSD with one interim. That gives f.i. E[N] = 71.3 for CV=40% and n1=n2=50. IMHO not that much higher compared to 64 for the adaptive TSD.
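To illustrate (a minimal sketch; the n1 = n2 = 50 design and the chance to proceed to the second part, pct_s2 = 42.52% for CV = 40%, are taken from the GSD table in the first post):

```r
# E[N] of a GSD with a single interim: stop at n1 with probability
# 1 - pctS2/100, otherwise dose all n1 + n2 subjects.
E.N <- function(n1, n2, pctS2) (1 - pctS2/100)*n1 + (pctS2/100)*(n1 + n2)
round(E.N(n1=50, n2=50, pctS2=42.52), 1) # 71.3
```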

The fact itself remains: E[N] of the GSD > E[N] of the TSD, at least for this example.

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2015-12-01 16:35

@ d_labes
Posting: # 15685

## Apples are pears by comparing the weight

Dear Detlew

» » […] Much lower.
» Much lower than what?

TSD’s E[N] than GSD’s N.

» Your presentation of the GSD results is a little bit unfair. It seems that the expected N is 100.

I see.

» But thats not true: […]

You are absolutely right. As (almost) always. THX!

The line
res[j,  7] <- sprintf("%.0f", max(cum))
should be replaced by
res[j,  7] <- sprintf("%.1f", (1-tmp1$pct_s2/100)*n[1] + (tmp1$pct_s2/100)*max(cum))

                E[N]
            ───────────
expected N   GSD    TSD
───────────────────────
    50      71.3   63.7
   100      84.6   99.2

Interesting.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2015-12-03 09:16

@ Helmut
Posting: # 15691

## Apples are pears by comparing the weight

Dear Helmut,

» The line
»   res[j,  7] <- sprintf("%.0f", max(cum))
» should be replaced by
»   res[j,  7] <- sprintf("%.1f", (1-tmp1$pct_s2/100)*n[1] + (tmp1$pct_s2/100)*max(cum))
»
»                 E[N]
»             ───────────
» expected N   GSD    TSD
» ───────────────────────
»     50      71.3   63.7
»    100      84.6   99.2

» Interesting.

BTW: ? cumsum.

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2015-12-03 13:10

@ d_labes
Posting: # 15693

## Apples are pears by comparing the weight

Dear Detlew,

» »                 E[N]
» »             ───────────
» » expected N   GSD    TSD
» » ───────────────────────
» »     50      71.3   63.7
» »    100      84.6   99.2
» » Interesting.

My interpretation of the question in the first post was 50. If I understood your post correctly this was a misinterpretation and it should be 100.

» BTW: ? cumsum.

I didn’t know that! Hence, my loop. Replace

cum    <- vector("numeric", length=length(n))
for (j in seq_along(n)) cum[j] <- max(cum) + n[j]

by

cum    <- cumsum(n)
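A quick check (a minimal sketch) that the one-liner reproduces the loop:

```r
n   <- c(50, 50)
# the original loop: running total of the group sizes
cum <- vector("numeric", length=length(n))
for (j in seq_along(n)) cum[j] <- max(cum) + n[j]
# cumsum() gives the same cumulative sample sizes
identical(cum, cumsum(n)) # TRUE
```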

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2015-12-03 13:56

@ Helmut
Posting: # 15694

## Oranges

Dear Helmut,

it seems I was speaking Swahili.

What I meant was:
You perform stage 1 with n1. Only if necessary do you perform stage 2 with n2. Thus N(total) isn’t always n1+n2 (= 100 in your example).
Then the ‘expected’ total sample size aka ‘mean’ sample size aka ASN can be calculated via my formula given above.

Whether it is reasonable to calculate a mean for a variable with only two values is left to you. That’s the reason why power.2stage.GS() doesn’t return components concerning the sample size ‘distribution’, unlike the other power.2stage.whatever() functions.

I hope my English is better Swahili this time.

Regards,

Detlew
Ben
★

2015-12-02 19:27

@ Helmut
Posting: # 15687

## Adaptive TSD vs. “classical” GSD

Dear Helmut / All,

You raised an interesting question, and yes, the TSD from Potvin et al. appears to have astonishing design features. The classical GSD or the adaptive two-stage design according to the inverse normal method rely on a formal statistical framework: mathematical theorems including proofs are available on why they work, what properties they have, and how they should be applied. This is nice. For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic, with further elaboration, can be found in the article by Kieser and Rauch (2015).

» In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design.

Reference? Some software packages give an inflation factor that helps determine the study size… Anyhow, I think such a rule of thumb is too strict and inflexible.

Consider for example two alternative scenarios:
• Pre-planned n1 = 52 and final N = 78 (i.e. n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high.
• Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.
Two further comments on the comparison:
• For some reason the variables alpha and iuse for bounds() should be one-dimensional to get the correct values. IMHO alpha is then 1-pnorm(bnds$upper.bounds), i.e. do not multiply by 2.
• As Detlew already pointed out, comparing the overall N of the classical GSD to the average sample number of the TSD is not fair; one should use the ASN in both cases.

Therefore, I think the GSD has some charm and can be useful in situations with uncertainty. Moreover, the advantage is that we do not have to rely only on simulation results from certain parameter settings (so I disagree with ElMaestro that GSDs are a relic from the past with inferior properties).

Best regards,
Ben

Ref: Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. (Epub ahead of print 24 March 2015). doi 10.1002/sim.6487

Helmut
★★★

Vienna, Austria,
2015-12-03 03:11

@ Ben
Posting: # 15689

## Adaptive TSD vs. “classical” GSD

Dear Ben et alii,

» […] the TSD from Potvin et al. appears to have astonishing design features. The classical GSD or the adaptive two-stage design according to the inverse normal method rely on a formal statistical framework: mathematical theorems including proofs are available on why they work, what properties they have, and how they should be applied. This is nice. For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic, with further elaboration, can be found in the article by Kieser and Rauch (2015).

I agree that the frameworks of Potvin etc. are purely empirical. To show whether a given α maintains the TIE for a desired range of n1/CV and target power takes 30 minutes in Power2Stage. I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof. IMHO, it smells more of a claim. At least Gernot Wassmer told me that it is not that easy.
» » In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design.
»
» Reference? Some software packages give an inflation factor that helps determine the study size… Anyhow, I think such a rule of thumb is too strict and inflexible.

See the discussion in my review, Table 3 in the Supplementary Material, and R-code at the end.

GMR              : 0.95
target power     : 0.8
‘best guess’ CV  : 0.2
Fixed sample size: 20
  power          : 0.835
‘Type 1’ TSD, n1 : 16 (80.0% of N)
  ASN=E[N]       : 20.0
  power (interim): 0.619
  power (final)  : 0.851

~75% is not a natural constant, but works pretty well in many cases. ASN ~ N. You can play this game for any scenario you like (look up in the papers which α is suitable for which combination of GMR and target power). Final power is always higher than in the fixed sample design and you already get a fair chance of passing in the first stage.

» Consider for example two alternative scenarios:
» ● Pre-planned n1 = 52 and final N = 78 (i.e. n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high.

Hhm. See the code at the end. I tried to implement your suggestions.

  CV%   method  alpha[1] alpha[2] ASN=E[N]  power     TIE
30.00      GSD   0.03817  0.02704     55.2 0.9631 0.05009 ns
30.00 Potvin B   0.02940  0.02940     52.4 0.8673 0.03062 ns
40.00      GSD   0.03817  0.02704     61.2 0.8236 0.05026 ns
40.00 Potvin B   0.02940  0.02940     64.5 0.8283 0.04396 ns
44.19      GSD   0.03817  0.02704     64.0 0.7492 0.04995 ns
44.19 Potvin B   0.02940  0.02940     76.2 0.8250 0.04725 ns
50.00      GSD   0.03817  0.02704     67.7 0.6365 0.04980 ns
50.00 Potvin B   0.02940  0.02940     99.1 0.8211 0.04841 ns
60.00      GSD   0.03817  0.02704     73.3 0.4224 0.04432 ns
60.00 Potvin B   0.02940  0.02940    151.4 0.8079 0.04253 ns

What I don’t understand in GSDs (lacking experience): How do you arrive at N? Is Detlew right when he said that this is the expected sample size?
Your example would translate to a fixed sample design with GMR 0.95, CV ~44%, and target power 0.8. So the only purpose of the interim is hoping for a lucky punch (i.e., ASN 64)? If the CV is just a little bit higher (50%), power is unacceptable. Type I Error?

» ● Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.

  CV%   method  alpha[1] alpha[2] ASN=E[N]  power     TIE
30.00      GSD   0.03101  0.02973     56.5 0.9858 0.05004 ns
30.00 Potvin B   0.02940  0.02940     48.9 0.8535 0.03202 ns
40.00      GSD   0.03101  0.02973     69.6 0.8927 0.04996 ns
40.00 Potvin B   0.02940  0.02940     64.0 0.8290 0.04540 ns
49.65      GSD   0.03101  0.02973     82.2 0.7451 0.04841 ns
49.65 Potvin B   0.02940  0.02940    100.4 0.8173 0.04821 ns
50.00      GSD   0.03101  0.02973     82.6 0.7382 0.04840 ns
50.00 Potvin B   0.02940  0.02940    102.0 0.8184 0.04811 ns
60.00      GSD   0.03101  0.02973     91.9 0.5492 0.03998 ns
60.00 Potvin B   0.02940  0.02940    154.6 0.8025 0.03988 ns

Well…

» Therefore, I think the GSD has some charm and can be useful in situations with uncertainty.

If (if!) you have some clue about the variability.

» Moreover, the advantage is that we do not have to rely only on simulation results from certain parameter settings.

30 minutes. I will again chew on the e-mail conversation we had last April.

R-codes
1. Find n1 for TSDs based on a ‘best guess’ CV.

library(PowerTOST)
library(Power2Stage)
stg1 <- function(x) {
  power.2stage(n1=x, method="B", CV=CV, alpha=rep(0.0294, 2),
               theta0=0.95, targetpower=0.8)$nmean
} # defaults to Potvin B
method      <- "B"
alpha       <- rep(0.0294, 2)
GMR         <- 0.95
targetpower <- 0.8
CV          <- 0.2
methods     <- c("B", "C")
types       <- c("\u2018Type 1\u2019", "\u2018Type 2\u2019")
fix         <- sampleN.TOST(CV=CV, targetpower=targetpower,
theta0=GMR, details=F, print=F)
N           <- fix[["Sample size"]]
pwr         <- fix[["Achieved power"]]
n1          <- round(optimize(stg1, interval=c(12, N),
                              tol=0.1)$minimum, 0)
n1          <- n1 + n1%%2
res         <- power.2stage(method=method, n1=n1, CV=CV, alpha=alpha,
                            theta0=GMR, targetpower=targetpower, details=F)
cat("\nGMR              :", GMR,
    "\ntarget power     :", targetpower,
    "\n\u2018best guess\u2019 CV  :", CV,
    "\nFixed sample size:", N,
    sprintf("%s %.3f", "\n  power          :", pwr),
    sprintf("%s %d %s%.1f%% %s",
            paste0("\n", types[match(method, methods)], " TSD, n1 :"),
            n1, "(", 100*n1/N, "of N)"),
    sprintf("%s %.1f", "\n  ASN=E[N]       :", res$nmean),
    sprintf("%s %.3f", "\n  power (interim):", res$pBE_s1),
    sprintf("%s %.3f", "\n  power (final)  :", res$pBE), "\n")

2. Comparison of GSD and TSD

library(ldbounds)
library(PowerTOST)
library(Power2Stage)
findCV <- function(x) power.TOST(CV=x, n=n1+n2)-0.8
n1    <- 52 # 48
n2    <- 26 # 48
t     <- c(n1/(n1+n2), 1)
bnds  <- bounds(t=t, iuse=2, alpha=0.05)
alpha <- 1-pnorm(bnds$upper.bounds)
CVest <- uniroot(findCV, interval=c(0.01, 3), tol=1e-7)$root
CV    <- sort(c(seq(0.3, 0.6, 0.1), CVest))
sig   <- binom.test(x=0.05*1e6, n=1e6, alternative="less",
                    conf.level=1-0.05)$conf.int[2]
res   <- matrix(data=NA, nrow=length(CV)*2, ncol=8, byrow=TRUE,
                dimnames=list(NULL, c("CV%", "method", "alpha[1]",
                                      "alpha[2]", "ASN=E[N]", "power",
                                      "TIE", " ")))
k     <- 0
for (j in seq_along(CV)) {
k <- k + 1
GSD.TIE <- power.2stage.GS(alpha=alpha, n=c(n1, n2), CV[j],
                           theta0=1.25, nsims=1e6, details=FALSE)
GSD.pwr <- power.2stage.GS(alpha=alpha, n=c(n1, n2), CV[j],
                           theta0=0.95, nsims=1e5, details=FALSE)
Pot.TIE <- power.2stage(alpha=rep(0.0294, 2), n1=n1, CV=CV[j],
                        theta0=1.25, nsims=1e6, details=FALSE)
Pot.pwr <- power.2stage(alpha=rep(0.0294, 2), n1=n1, CV=CV[j],
                        theta0=0.95, nsims=1e5, details=FALSE)
res[k, 1] <- sprintf("%.2f", CV[j]*100)
res[k, 2] <- "GSD"
res[k, 3] <- sprintf("%.5f", alpha[1])
res[k, 4] <- sprintf("%.5f", alpha[2])
res[k, 5] <- sprintf("%.1f", (1-GSD.pwr$pct_s2/100)*n1 +
                             (GSD.pwr$pct_s2/100)*(n1+n2))
res[k, 6] <- sprintf("%.4f", GSD.pwr$pBE)
res[k, 7] <- sprintf("%.5f", GSD.TIE$pBE)
if (GSD.TIE$pBE <= sig) res[k, 8] <- "ns" else res[k, 8] <- "*"
k <- k + 1
res[k, 1] <- sprintf("%.2f", CV[j]*100)
res[k, 2] <- "Potvin B"
res[k, 3:4] <- rep(sprintf("%.5f", 0.0294), 2)
res[k, 5] <- sprintf("%.1f", Pot.pwr$nmean)
res[k, 6] <- sprintf("%.4f", Pot.pwr$pBE)
res[k, 7] <- sprintf("%.5f", Pot.TIE$pBE)
if (Pot.TIE$pBE <= sig) res[k, 8] <- "ns" else res[k, 8] <- "*"
}
print(as.data.frame(res), row.names=FALSE)

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2015-12-03 09:47

(edited by d_labes on 2015-12-03 16:19)
@ Helmut
Posting: # 15692

## “classical” GSD alpha's

Dear Helmut, dear Ben!

Two-sided or not two-sided, that is the question!

library(ldbounds)
# two-sided, check via summary()
bds2.poc <- bounds(t=c(0.5,1), iuse=c(2,2), alpha=rep(0.025,2))
summary(bds2.poc)
2*(1-pnorm(bds2.poc$upper.bounds))

gives us:
[1] 0.03100573 0.02774015

# one-sided
bds1.poc <- bounds(t=c(0.5,1), iuse=2, alpha=0.05)
summary(bds1.poc)
1-pnorm(bds1.poc$upper.bounds)

gives us:
[1] 0.03100573 0.02972542
simsalabim, Ben’s preferred values.

I personally opt for two-sided.

BTW: the Lan/DeMets spending function is Pocock-like.
Nearer to the original Pocock is the mean of the critical values. Try

2*(1-pnorm(rep(mean(bds2.poc$upper.bounds), 2)))

simsalabim, Pocock’s natural constant! Nearly.

(1-pnorm(rep(mean(bds1.poc$upper.bounds), 2)))
hokus pokus fidibus, Ben's magical number!

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2015-12-03 14:56

@ d_labes
Posting: # 15695

## N sufficiently large‽

Dear Detlew & Ben,

» Two-sided or not two-sided, that is the question!

Yessir!

» 2*(1-pnorm(rep(mean(bds2.poc$upper.bounds),2)))
» simsalabim, Pocock’s natural constant!

mean(bds2.poc$upper.bounds)
[1] 2.17897

Therefore,

2*(1-pnorm(rep(mean(bds2.poc$upper.bounds), 2)))
[1] 0.02933386 0.02933386

Close! Actually:

rep(2*(1-pnorm(2.178)), 2)
[1] 0.02940604 0.02940604

2.178 from Jennison/Turnbull¹ Table 2.1.
‘Exact’:

library(mvtnorm)
mu    <- c(0, 0)
sigma <- diag(2); sigma[sigma == 0] <- 1/sqrt(2)
C     <- qmvnorm(1-0.05, tail="both.tails", mean=mu,
                 sigma=sigma)$quantile
C
[1] 2.178273
rep(2*(1-pnorm(C)), 2)
[1] 0.0293857 0.0293857

I think that Kieser/Rauch are correct in their lament about one- vs. two-sided Pocock’s limits. They argue for 0.0304 (which Jones/Kenward² used in chapter 13 as well). Jennison/Turnbull give Cp (K=2, α=0.10) 1.875:

rep(1-pnorm(1.875), 2)
[1] 0.03039636 0.03039636

Or

C <- qmvnorm(1-2*0.05, tail="both.tails", mean=mu,
             sigma=sigma)$quantile
C
[1] 1.875424
rep(1-pnorm(C), 2)
[1] 0.03036722 0.03036722

Furthermore:

library(ldbounds)
C <- mean(bounds(t=c(0.5, 1), iuse=c(2, 2), alpha=rep(0.05, 2))$upper.bounds)
C
[1] 1.875529
rep(1-pnorm(C), 2)
[1] 0.03035998 0.03035998

It’s a mess!

In chapter 12 Jones/Kenward (in the context of blinded sample re-estimation) report an inflation of the TIE. The degree of inflation depends on the timing of the interim (the earlier, the worse). They state:

“In the presence of Type I error rate inflation, the value of α used in the TOST must be reduced, so that the achieved Type I error rate is no larger than 0.05.”

(my emphasis)
They recommend an iterative algorithm [sic] by Golkowski et al.³ and conclude:

“[…] before using any of the methods […], their operating characteristics should be evalu­ated for a range of values of n1, CV and true ratio of means that are of interest, in order to decide if the Type I error rate is controlled, the power is adequate and the potential maxi­mum total sample size is not too great.”

Given all that, I’m not sure whether the discussion of proofs, exact values, etc. makes sense at all. This wonderful stuff is based solely on normal theory, and I’m getting bored of reading the phrase “when N is sufficiently large” below a series of fancy formulas. Unless someone comes up with a proof for small samples (many have tried, all have failed so far) I’d rather stick to simulations.

1. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton: Chapman & Hall/CRC; 1999.
2. Jones B, Kenward MG. Design and analysis of cross-over trials. Boca Raton: Chapman & Hall/CRC; 3rd ed 2014.
3. Golkowski D, Friede T, Kieser M. Blinded sample size reestimation in crossover bioequivalence trials. Pharm Stat. 2014;13(3):157–62. doi 10.1002/pst.1617

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2015-12-03 16:15

@ Helmut
Posting: # 15696

## An other one with 0.0304

Dear Helmut,

» ... I think that Kieser/Rauch are correct in their lament about one- vs. two-sided Pocock’s limits. They argue for 0.0304 (which Jones/Kenward2 used in chapter 13 as well). Jennison/Turnbull give Cp (K=2, α=0.10) 1.875:
» rep(1-pnorm(1.875), 2)
» [1] 0.03039636 0.03039636

I have another one:
Gould A. L.
"Group Sequential Extensions of a Standard Bioequivalence Testing Procedure"
Journal of Pharmacokinetics and Biopharmaceutics. Vol 23. No.1. 1995
Table I: critical value for n1=n2: 1.8753

Seems I have to change my personal preference stated in my post above.

That means, on the other hand: Potvin and company were much luckier than they should have been.
That’s great.

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2015-12-03 16:26

@ d_labes
Posting: # 15697

## An other one with 0.0304

Dear Detlew,

» I have another one:
» Gould A. L.
» "Group Sequential Extensions of a Standard Bioequivalence Testing Procedure"
» Journal of Pharmacokinetics and Biopharmaceutics. Vol 23. No.1. 1995
» Table I: critical value for n1=n2: 1.8753

How could I forget Mr Gould? He was the first to explore this stuff for BE studies.

rep(1-pnorm(1.8753), 2)
[1] 0.03037573 0.03037573

BTW, in the late 1990s I tried his method (scientific advice in France and Germany). Neither agency accepted it.

» That means, on the other hand: Potvin and company were much luckier than they should have been.

Yep. Lucky punch.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Ben
★

2016-01-10 12:43

@ Helmut
Posting: # 15808

## Adaptive TSD vs. “classical” GSD

Dear Helmut / All,

» I agree that the frameworks of Potvin etc. are purely empirical. To show whether a given α maintains the TIE for a desired range of n1/CV and target power takes 30 minutes in Power2Stage.

Well, yes, but this is again only empirical.

» I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof.

I actually meant the discussion of the decision scheme and the properties from Potvin et al. (not mathematical theorems and proofs – there are in fact none).

» What I don’t understand in GSDs (lacking experience): How do you arrive at N? Is Detlew right when he said that this is the expected sample size?

You can use the sample size from a fixed design and adapt it based on an inflation factor. Addplan, for example, provides such values (one should, however, keep in mind that everything in Addplan is based on the normal approximation). Of course, nothing keeps you from further playing around and checking some design properties (for example the resulting average sample size). A good idea may be to focus on a realistic best guess for the interim CV and so determine n1, and to cover a bad CV scenario via the second stage n2.
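A hypothetical sketch of that recipe (the fixed-design N = 66 is taken from the first post; the inflation factor 1.11 is only an assumed placeholder for a Pocock-type plan with one interim – Addplan or the Jennison/Turnbull tables would supply the exact value):

```r
N.fix <- 66                     # fixed-design total (CV 40%, GMR 0.95)
IF    <- 1.11                   # assumed inflation factor (placeholder!)
N.gsd <- 2*ceiling(N.fix*IF/2)  # inflate, round up to an even total
N.gsd # 74
```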

» Your example would translate to a fixed sample design with GMR 0.95, CV ~44%, and target power 0.8. So the only purpose of the interim is hoping for a lucky punch (i.e., ASN 64)? If the CV is just a little bit higher (50%), power is unacceptable.

In your case the CV is already pretty high and maybe the design properties do not behave so well in those regions? I have not investigated this thoroughly...

» If (if!) you have some clue about the variability.

Yes, but when is this not the case? You would not conduct a confirmatory BE study without having performed other PK studies with that substance, would you? You will always have a first in man trial and some bioavailability trials (or historical trials from a comparator).

Regarding the boundaries Detlew mentioned: they should be based on one-sided bounds. Using two-sided bounds directly can mess things up.

Best regards,
Ben