EEU-rules, TSD-methods (lengthy answer) [Two-Stage / GS Designs]
❝ We are planning to submit applications in Belarus (the country I'm from) and in Russia. Hope method C will be OK for them.
Thanks for the information. Sections 97/98 of the EEU regulations are a 1:1 translation of the corresponding section about TSDs in the EMA’s BE-GL.
It is difficult to predict how the EEU’s regulators will interpret their own guideline. Maybe other members from Belarus (4) and Russia (21) can share their experiences.
My ranking (not based on scientific value but on the likelihood of acceptance) follows. To explore the empiric type I error (TIE) I recommend the functions of the R package Power2Stage with one million simulations (nsims=1e6) at theta0=1.25. When I give locations of the maximum TIE, they are based on a much finer grid than in the publications (n1 12…72, step size 2, and CV 10…80%, step size 2%); a sketch of such a scan follows below.
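A minimal sketch of such a scan, assuming a coarser grid and fewer simulations per cell than stated above to keep the run time manageable (the step sizes and nsims here are illustrative choices; use nsims=1e6 for the final assessment):
library(Power2Stage)
# scan the empiric TIE of 'Method B' over an n1 x CV grid
grid     <- expand.grid(n1=seq(12, 72, 4), CV=seq(0.10, 0.80, 0.05))
grid$TIE <- NA_real_
for (j in seq_len(nrow(grid))) {
  grid$TIE[j] <- power.tsd(method="B", alpha=c(0.0294, 0.0294),
                           CV=grid$CV[j], n1=grid$n1[j],
                           theta0=1.25, nsims=1e5)[["pBE"]]
}
print(grid[which.max(grid$TIE), ], row.names=FALSE) # location of the maximum TIE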
1. Potvin et al. “Method B” [1]
According to the wording of the GL, “… both analyses [should be] conducted at adjusted significance levels …”. Maximum inflation of the TIE: 0.0490 (at n1 12 and CV 24%). Hence, the adjusted α 0.0294 is conservative.
power.tsd(method="B", alpha=c(0.0294, 0.0294), CV=0.24,
n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
# [1] 0.048762
However, there is no inflation of the TIE with a slightly more liberal α 0.0302.
power.tsd(method="B", alpha=c(0.0302, 0.0302), CV=0.24,
n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
# [1] 0.049987
2. Karalis “TSD-2” [2]
Futility rule on the total sample size (Nmax 150). No inflation of the TIE. Compare with Potvin’s “Method B” above (0.048762):
power.tsd.KM(method="B", alpha=c(0.0294, 0.0294), CV=0.24,
n1=12, theta0=1.25, Nmax=150, nsims=1e6)[["pBE"]]
# [1] 0.041874
However, power may be negatively affected [3,4] and total sample sizes are sometimes even larger. Comparison:
CV     <- 0.25
n1     <- 14
alpha  <- c(0.0294, 0.0294)
theta0 <- 0.95
res    <- data.frame(method=c("Potvin B", "KM TSD-2"), power=NA,
                     N.min=NA, perc.5=NA, N.med=NA, perc.95=NA,
                     N.max=NA, stringsAsFactors=FALSE)
for (j in 1:2) {
  if (j == 1) {
    x <- power.tsd(method="B", alpha=alpha, CV=CV, n1=n1, theta0=theta0,
                   Nmax=Inf)
  } else {
    x <- power.tsd.KM(method="B", alpha=alpha, CV=CV, n1=n1, theta0=theta0,
                      Nmax=150)
  }
  res[j, "power"] <- x[["pBE"]]
  res[j, "N.min"] <- x[["nrange"]][1]
  res[j, 4:6]     <- x[["nperc"]]
  res[j, "N.max"] <- x[["nrange"]][2]
}
names(res)[3:7] <- c("N min", "N 5%", "N med", "N 95%", "N max")
print(res, row.names=FALSE)
   method   power N min N 5% N med N 95% N max
 Potvin B 0.82372    14   14    30    58   110
 KM TSD-2 0.79893    14   14    32   106   150
3. Karalis “TSD-1” [2]
As above but with a decision scheme similar to Potvin’s “Method C” and α 0.0280.
power.tsd.KM(method="C", alpha=c(0.0280, 0.0280), CV=0.22,
n1=12, theta0=1.25, Nmax=150, nsims=1e6)[["pBE"]]
# [1] 0.041893
Compare with the TIE of “Method C” below.
4. Potvin et al. “Method C” [1]
Ignoring the sentence of the GL mentioned at #1 above and concentrating on “… there are many acceptable alternatives and the choice of how much alpha to spend at the interim analysis is at the company’s discretion.”
With the adjusted α 0.0294 there is a maximum inflation of the TIE of 0.0514 (at n1 12 and CV 22%).
power.tsd(method="C", alpha=c(0.0294, 0.0294), CV=0.22,
n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
# [1] 0.051426
However, there is no inflation of the TIE for any CV if n1 ≥ 18 (a spot check follows after the next code block). If you want to go with “Method C”, I suggest a more conservative adjusted α 0.0280.
power.tsd(method="C", alpha=c(0.0280, 0.0280), CV=0.22,
n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
# [1] 0.049669
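As a spot check of the “no inflation for n1 ≥ 18” statement above – a hedged sketch with the original adjusted α at the worst-case CV; run it yourself, the result is not reproduced here:
power.tsd(method="C", alpha=c(0.0294, 0.0294), CV=0.22,
          n1=18, theta0=1.25, nsims=1e6)[["pBE"]]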
5. Xu et al. “Method E” and “Method F” [5]
More powerful than the original methods of the same group of authors, since two CV ranges are considered. “Method E” is an extension of “Method B” and “Method F” of “Method C”. Both use different alphas in the two stages, a futility rule based on the 90% CI, and a maximum sample size (though not as a futility criterion). A slight mis-specification of the CV (say, you assumed a CV of 25% and it turns out to be 35%) still controls the TIE (a check follows after the code below).
Method  CV range  adjusted α (s1, s2)  min. n1  max. n  futility range (90% CI)
  E      10–30%     0.0249, 0.0363        18       42     0.9374 … 1.0667
  E      30–55%     0.0254, 0.0357        48      180     0.9305 … 1.0747
  F      10–30%     0.0248, 0.0364        18       42     0.9492 … 1.0535
  F      30–55%     0.0259, 0.0349        48      180     0.9350 … 1.0695
power.tsd.fC(method="B", alpha=c(0.0249, 0.0363), CV=0.30, n1=18,
fCrit="CI", fClower=0.9374, max.n=42, theta0=1.25,
nsims=1e6)[["pBE"]] # Method E (low CV)
# [1] 0.048916
power.tsd.fC(method="B", alpha=c(0.0254, 0.0357), CV=0.55, n1=48,
fCrit="CI", fClower=0.9305, max.n=180, theta0=1.25,
nsims=1e6)[["pBE"]] # Method E (high CV)
# [1] 0.045969
power.tsd.fC(method="C", alpha=c(0.0248, 0.0364), CV=0.30, n1=18,
fCrit="CI", fClower=0.9492, max.n=42, theta0=1.25,
nsims=1e6)[["pBE"]] # Method F (low CV)
# [1] 0.049194
power.tsd.fC(method="C", alpha=c(0.0259, 0.0349), CV=0.55, n1=48,
fCrit="CI", fClower=0.9350, max.n=180, theta0=1.25,
nsims=1e6)[["pBE"]] # Method F (high CV)
# [1] 0.045471
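To check the mis-specification claim above, a hedged sketch: the design is the one chosen for the low-CV range (n1 18, max.n 42) but the true CV turns out to be 35%. The TIE should stay controlled; the result is not reproduced here – run it yourself.
power.tsd.fC(method="B", alpha=c(0.0249, 0.0363), CV=0.35, n1=18,
             fCrit="CI", fClower=0.9374, max.n=42, theta0=1.25,
             nsims=1e6)[["pBE"]] # Method E planned for CV 10-30%, true CV 35%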
6. Maurer et al. [6]
The only approach not based on simulations and seemingly the one preferred by the EMA. It is also the most flexible method, because you can specify futility rules on the CI, on the achievable total power, and on the maximum total sample size. Furthermore, you can base the decision to proceed to the second stage on the PE observed in the first stage (this is supported by the functions of Power2Stage as well, but it is not part of the published method – you would have to perform your own simulations; a sketch follows after the example below). Example:
power.tsd.in(CV=0.24, n1=12, theta0=1.25, fCrit="No",
             ssr.conditional="no", nsims=1e6)[["pBE"]]
# [1] 0.04642
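If you want to base the decision to proceed on the PE of stage 1, a hedged sketch of such an own simulation: fCrit="PE" applies a futility rule on the stage 1 point estimate, and the limits 0.80…1.25 are an arbitrary illustrative choice, not part of the published method.
power.tsd.in(CV=0.24, n1=12, theta0=1.25, fCrit="PE",
             fClower=0.80, fCupper=1.25,
             ssr.conditional="no", nsims=1e6)[["pBE"]]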
Let us compare the method with the data of Example 2 given by Potvin et al. Note that in this method you perform separate ANOVAs, one in the interim and one in the final analysis. In Example 2 there were 12 subjects in stage 1 and, with both methods, a second stage with 8 subjects. The final PE was 101.45% with a 94.12% CI of 88.45–116.38%. I switched off the futility criteria and kept all other defaults.
interim.tsd.in(GMR1=1.0876, CV1=0.18213, n1=12,
fCrit="No", ssr.conditional="no")
TSD with 2x2 crossover
Inverse Normal approach
- Maximum combination test with weights for stage 1 = 0.5 0.25
- Significance levels (s1/s2) = 0.02635 0.02635
- Critical values (s1/s2) = 1.9374 1.9374
- BE acceptance range = 0.8 ... 1.25
- Observed point estimate from stage 1 is not used for SSR
- Without conditional error rates and conditional (estimated target) power
Interim analysis after first stage
- Derived key statistics:
z1 = 3.10000, z2 = 1.70344,
Repeated CI = (0.92491, 1.27891)
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 8
- Decision: Continue to stage 2 with 8 subjects
Similar outcome: not yet BE and a second stage with 8 subjects.
final.tsd.in(GMR1=1.0876, CV1=0.18213, n1=12,
GMR2=0.9141, CV2=0.25618, n2=8)
TSD with 2x2 crossover
Inverse Normal approach
- Maximum combination test with weights for stage 1 = 0.5 0.25
- Significance levels (s1/s2) = 0.02635 0.02635
- Critical values (s1/s2) = 1.93741 1.93741
- BE acceptance range = 0.8 ... 1.25
Final analysis after second stage
- Derived key statistics:
z1 = 2.87952, z2 = 2.60501,
Repeated CI = (0.87690, 1.17356)
Median unbiased estimate = 1.0135
- Decision: BE achieved
Passed BE as well: PE 101.35% with a 94.73% CI of 87.69–117.36%.
Acceptance in Belarus & Russia – no idea. It might well be that their experts have never seen such a study before.
Personally (‼) I would rank the methods:
1. Maurer et al.
2. Xu et al. “Method F”
3. Xu et al. “Method E”
4. Potvin et al. “Method C” (modified α 0.0280)
5. Potvin et al. “Method B” (modified α 0.0302)
6. Potvin et al. “Method C” (original α 0.0294)
7. Potvin et al. “Method B” (original α 0.0294)
8. Karalis “TSD-1”
9. Karalis “TSD-2”
Maybe the original “Method C” is risky when you proceed to the second stage (all my accepted studies were BE already in stage 1, and I have seen nasty deficiency letters in the past).
❝ Another 2 questions:
❝ 1. Is it a good idea to add the statement that, in case of conducting stage 2, data from both stages will be pooled by default, without evaluation of differences between stages?
If you performed the second stage, it’s mandatory to pool the data. None of the methods contains any kind of test between stages. Furthermore, a formulation-by-stage interaction term in the model is considered nonsense in the EMA’s Q&A.
❝ 2. Is there any established minimum for a number of subjects that should be included in stage 2 (e.g. at least 2, or at least 1 for each sequence (TR/RT))?
Nothing in the guidelines, but it is mentioned in the EMA’s Q&A document. However, that’s superfluous: if you perform a sample size estimation, in all software the minimum stage 2 sample size will be 2 anyhow (if odd, rounded up to the next even number to obtain balanced sequences). In the functions of Power2Stage you can use the argument min.n2=2 and will never see any difference; a quick check is sketched below.
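A hedged sketch of that check (CV and n1 are arbitrary; compare the chance of passing BE and the distribution of total sample sizes):
res0 <- Power2Stage::power.tsd(CV=0.24, n1=12)           # package default for min.n2
res2 <- Power2Stage::power.tsd(CV=0.24, n1=12, min.n2=2) # explicit minimum of 2
c(default=res0[["pBE"]], min.n2.2=res2[["pBE"]])
rbind(default=res0[["nperc"]], min.n2.2=res2[["nperc"]]) # percentiles of n(total)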
Only if you are a nerd, read on: the conventional sample size estimation does not take the stage term of the final analysis into account. If you prefer belt and suspenders, use the function sampleN2.TOST().
CV  <- 0.25
n1  <- 12
res <- data.frame(method=c("PowerTOST::sampleN.TOST()",
                           "Power2Stage::sampleN2.TOST()"),
                  n1=n1, n2=NA, power=NA, stringsAsFactors=FALSE)
for (j in 1:2) {
  if (j == 1) {
    x <- PowerTOST::sampleN.TOST(alpha=0.0294, CV=CV, print=FALSE)[7:8]
    res[j, 3] <- x[1] - n1
  } else {
    x <- Power2Stage::sampleN2.TOST(alpha=0.0294, CV=CV, n1=n1)[8:9]
    res[j, 3] <- x[1]
  }
  res[j, "power"] <- x[["Achieved power"]]
}
print(res, row.names=FALSE)
                       method n1 n2     power
    PowerTOST::sampleN.TOST() 12 22 0.8127230
 Power2Stage::sampleN2.TOST() 12 22 0.8141106
In practice it is unlikely to get a difference in sample sizes…
In #6 (Maurer et al.) the minimum is 4, because you perform a separate ANOVA in the second stage. One word of caution: if you have a nasty drug (dropouts due to AEs), take care that you don’t end up with <3 subjects in the second stage – otherwise that ANOVA would not be possible. A cushion via min.n2 is sketched below.
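A hedged sketch of such a precaution with the inverse-normal functions (min.n2=6 is an arbitrary cushion, and CV / n1 are arbitrary as well; compare the operating characteristics with the default minimum of 4):
res4 <- Power2Stage::power.tsd.in(CV=0.24, n1=12)           # package default (min.n2 = 4)
res6 <- Power2Stage::power.tsd.in(CV=0.24, n1=12, min.n2=6) # cushion against dropouts
c(default=res4[["pBE"]], cushion=res6[["pBE"]])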
When designing a study I recommend calling the functions with the arguments theta0 and CV set to your best guesses. Don’t confuse theta0 with the argument GMR, which is fixed in most methods. Then you get an impression of what might happen (chance to show BE in the first stage, probability to proceed to stage 2, average and range of total sample sizes, …). An n1 of ~80% of a fixed-sample design is a good compromise between the chance to show BE already in the first stage and keeping overall power. Example of finding a suitable futility rule on the total sample size:
CV <- 0.25
n1   <- 0.8*PowerTOST::sampleN.TOST(CV=CV, print=FALSE)[["Sample size"]]
n1   <- ceiling(n1 + ceiling(n1) %% 2)
lo   <- ceiling(1.5*n1 + ceiling(1.5*n1) %% 2)
hi   <- ceiling(3*n1 + ceiling(3*n1) %% 2)
Nmax <- c(seq(lo, hi, 4), Inf)
res  <- data.frame(Nmax=Nmax, power=NA)
for (j in seq_along(Nmax)) {
  res$power[j] <- Power2Stage::power.tsd(CV=CV, n1=n1,
                                         Nmax=Nmax[j])[["pBE"]]
}
print(res, row.names=FALSE)
 Nmax   power
   36 0.70564
   40 0.74360
   44 0.77688
   48 0.80214
   52 0.81854
   56 0.82976
   60 0.83596
   64 0.83957
   68 0.84156
   72 0.84153
  Inf 0.84244
A futility rule of 48 looks good. Let’s explore the details:
Power2Stage::power.tsd(CV=CV, n1=n1, Nmax=48)
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
Futility criterion Nmax = 48
BE acceptance range = 0.8 ... 1.25
CV = 0.25; n(stage 1) = 24; GMR = 0.95
1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE) = 0.80214
p(BE) s1 = 0.63203
Studies in stage 2 = 29.06%
Distribution of n(total)
- mean (range) = 27.6 (24 ... 48)
- percentiles
  5%  50%  95%
  24   24   44
Not that bad.
However, futility rules can be counterproductive because you have to come up with a “best guess” CV – which is actually against the “spirit” of TSDs. Homework:
Power2Stage::power.tsd(CV=0.30, n1=24, Nmax=48)
As ElMaestro wrote above, you have to perform your own simulations if you are outside the published methods (GMR, target power, n1/CV grid, futility rules). My basic algorithm is outlined by Molins et al. [7]
A final reminder: in the sample size estimation use the fixed GMR (not the observed PE), unless the method allows that; a quick check of what happens otherwise is sketched below.
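A hedged sketch of such a check: a sample size re-estimation with the observed PE (usePE=TRUE) is outside the validated conditions of “Method B”, so compare the empiric TIE with and without it before writing anything into a protocol (CV and n1 are arbitrary; results are not reproduced here).
TIE.fixed <- Power2Stage::power.tsd(method="B", alpha=c(0.0294, 0.0294),
                                    CV=0.25, n1=24, GMR=0.95,
                                    theta0=1.25, nsims=1e6)[["pBE"]]
TIE.obsPE <- Power2Stage::power.tsd(method="B", alpha=c(0.0294, 0.0294),
                                    CV=0.25, n1=24, GMR=0.95, usePE=TRUE,
                                    theta0=1.25, nsims=1e6)[["pBE"]]
c(fixed.GMR=TIE.fixed, observed.PE=TIE.obsPE)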
References:
1. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294.
2. Karalis V. The role of the upper sample size limit in two-stage bioequivalence designs. Int J Pharm. 2013; 456: 87–94. doi:10.1016/j.ijpharm.2013.08.013.
3. Fuglsang A. Futility Rules in Bioequivalence Trials with Sequential Designs. AAPS J. 2014; 16(1): 79–82. doi:10.1208/s12248-013-9540-0.
4. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2.
5. Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr AF, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721.
6. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–607. doi:10.1002/sim.7614.
7. Molins E, Cobo E, Ocaña J. Two-stage designs versus European scaled average designs in bioequivalence studies for highly variable drugs: Which to choose? Stat Med. 2017; 36(30): 4777–88. doi:10.1002/sim.7452.
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes