Elena777
☆

Belarus,
2019-09-09 19:34

Posting: # 20564
Views: 2,314

## Appropriate wording for a protocol [Two-Stage / GS Designs]

Dear all, I would be pleased to get your opinion on the following. We are planning to conduct several BE studies with adaptive design using the drugs with uncertain intraCV. We have decided to use method C described by Potvin and included the description of the model C in the protocols (the same as in the corresponding scheme presented in Potvin's article). But it seems it's not enough.
1. Should we include the information that evaluation after stage 1 completion should be performed assuming GMR=0.95?
2. Should we describe the maximum number of subjects who can be included in whole or in stage 2?
3. Any other information that should be clearly stated in order to be accurate and to satisfy regulatory authorities?
4. What if BE criteria are met after stage 1, but estimated power is too low (e.g. 30%)?

Post number 20,000.  [Helmut]
ElMaestro
★★★

Belgium?,
2019-09-09 21:39

@ Elena777
Posting: # 20565
Views: 2,185

## Appropriate wording for a protocol

Hello Elena777,

» 1. Should we include the information that evaluation after stage 1 completion should be performed assuming GMR=0.95?

I would do so.

» 2. Should we describe the maximum number of subjects who can be included in whole or in stage 2?

I would only put a cap on it if you can refer to simulations having done exactly so (having done so in exactly your way of capping).

» 3. Any other information that should be clearly stated in order to be accurate and to satisfy regulatory authorities?

Exact decision tree, and exact values for alphas, desired power level, and power being calculated using GMR=0.95.
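
A minimal sketch of how the interim part of such a decision tree could be written down, assuming the Method C scheme as published by Potvin et al. (power checked first with α 0.05, adjusted α 0.0294 otherwise). The function name and its arguments are hypothetical; the power estimate and the two BE assessments are assumed to be computed elsewhere.

```r
# Hypothetical helper: interim decision of a Potvin-type "Method C".
# power_s1 : stage 1 power, estimated with alpha 0.05, the fixed GMR 0.95
#            and the observed CV
# be_005   : TRUE if the 90% CI (alpha 0.05) lies within 80-125%
# be_adj   : TRUE if the 94.12% CI (alpha 0.0294) lies within 80-125%
method_c_interim <- function(power_s1, be_005, be_adj, target = 0.80) {
  if (power_s1 >= target) {
    # sufficient power: evaluate at alpha 0.05 and stop either way
    return(if (be_005) "stop: BE" else "stop: not BE")
  }
  # insufficient power: evaluate at the adjusted alpha
  if (be_adj) return("stop: BE")
  "continue: estimate n2 with adjusted alpha, then pool both stages"
}

method_c_interim(power_s1 = 0.30, be_005 = TRUE, be_adj = TRUE)
# [1] "stop: BE"
```

The call mirrors question 4: a study passing with the adjusted α at 30% estimated power simply stops in stage 1.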

» 4. What if BE criteria are met after stage 1, but estimated power is too low (e.g. 30%)?

"Too low"?
It is not a crime to be lucky. I don't see any issue. Regulators are generally not afraid of low power after results become available. This forum is a paradise for grumpy old men being averse to post-hoc power. I used to be a reasonably happy, cheerful bloke, but then I got a profile here and quickly I went very sour if not outright angry. I blend in nicely, I think?!?

I could be wrong, but...
Best regards,
ElMaestro
Helmut
★★★

Vienna, Austria,
2019-09-09 23:27

@ ElMaestro
Posting: # 20567
Views: 2,165

## Appropriate wording for a protocol

Hi ElMaestro,

» » 1. Should we include the information that evaluation after stage 1 completion should be performed assuming GMR=0.95?
»
» I would do so.

So would I.

» » 2. Should we describe the maximum number of subjects who can be included in whole or in stage 2?
»
» I would only put a cap on it if you can refer to simulations having done exactly so (having done so in exactly your way of capping).

From a regulatory perspective this is not necessary. Any futility rule (like max. n2) decreases the chance to show BE if compared to a published method without one. Hence, if the type I error was controlled in a method without a futility rule, the TIE will always be lower with a futility rule. However, if a futility rule is too strict, you may shoot yourself in the foot since power might be compromised. To check that, sim’s are a good idea indeed.

» » 3. Any other information that should be clearly stated in order to be accurate and to satisfy regulatory authorities?
»
» Exact decision tree, and exact values for alphas, desired power level, and power being calculated using GMR=0.95.

Yep.

» » 4. What if BE criteria are met after stage 1, but estimated power is too low (e.g. 30%)?
»
» It is not a crime to be lucky.

Absolutely. As one of the grumpy old men: Forget power, doesn’t matter.

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Ohlbe
★★★

France,
2019-09-10 10:27

@ ElMaestro
Posting: # 20570
Views: 2,118

## Appropriate wording for a protocol

Dear ElMaestro,

[off topic]

» This forum is a paradise for grumpy old men being averse to post-hoc power.

Hey, I'm not old !

[/off topic]

Regards
Ohlbe
Helmut
★★★

Vienna, Austria,
2019-09-09 23:17

@ Elena777
Posting: # 20566
Views: 2,160

## Which country?

Hi Elena,

» We are planning to conduct several BE studies with adaptive design using the drugs with uncertain intraCV. We have decided to use method C described by Potvin …

Whether Potvin’s Method C will be accepted depends on the jurisdiction you are bound to (see a and b).

a. Accepted by the FDA (Donald Schuirmann is a co-author of this paper and later ones) and Health Canada. Confirmed at the 2nd/3rd GBHI conferences (Rockville 2016, Amsterdam 2018) that any simulation-based method is acceptable.
b. For the EMA possible if BE already in stage 1, difficult if you proceeded to stage 2. Even Method B is tricky. The EMA dislikes (oh dear!) methods based on simulations and prefers ones which showed strict control of the type I error, i.e.,
   1. König F, Wolfsegger M, Jaki T, Schütz H, Wassmer G. Adaptive two-stage bioequivalence trials with early stopping and sample size re-estimation. Vienna: 2014; 35th Annual Conference of the International Society for Clinical Biostatistics. Poster P1.2.88. doi:10.13140/RG.2.1.5190.0967.
   2. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–607. doi:10.1002/sim.7614.

   BTW, implemented in the R-package Power2Stage: functions power.2stage.in(), interim.tsd.in(), final.tsd.in() since October 2017.
   The EMA’s Pharmacokinetics Working Party and the Biostatistics Working Party had two-stage design on their workplan for years (!) without any outcome. At last year’s BioBridges Paola Coppola (MHRA) showed this slide: [slide not reproduced]

Cheers,
Helmut Schütz

Elena777
☆

Belarus,
2019-09-11 20:24

@ Helmut
Posting: # 20587
Views: 2,044

## Which country?

Dear Helmut and ElMaestro,

Thank you very much for responding so quickly. We are planning to submit applications in Belarus (the country I'm from) and in Russia. Hope method C will be OK for them.

Another 2 questions:
1. Is it a good idea to add the statement that, in case of conducting stage 2, data from both stages will be pooled by default, without evaluation of differences between stages?
2. Is there any established minimum for a number of subjects that should be included in stage 2 (e.g. at least 2, or at least 1 for each sequence (TR/RT))?
Helmut
★★★

Vienna, Austria,
2019-09-12 01:31

@ Elena777
Posting: # 20589
Views: 2,046

Hi Elena,

» We are planning to submit applications in Belarus (the country I'm from) and in Russia. Hope method C will be OK for them.

THX for the information. Sections 97/98 of the EEU regulations are a 1:1 translation of the corresponding section about TSDs in the EMA’s BE-GL.
It is difficult to predict how regulators of the EEU interpret their own guideline. Maybe other members from Belarus (4) and Russia (21) can share their experiences.

My ranking (not based on scientific value but on likelihood of acceptance) in the following. To explore the empiric type I error (TIE) I recommend functions of the R-package Power2Stage with 1 mio simulations at theta0=1.25. When I give locations of the maximum TIE it is based on a much narrower grid than in the publications (n1 12…72, step size 2 and CV 10…80%, step size 2%).
1. Potvin et al. “Method B” 1
According to the wording of the GL “… both analyses [should be] conducted at adjusted significance levels …”
Maximum inflation of the TIE 0.0490 (with n1 12 and CV 24%). Hence, the adjusted α 0.0294 is conservative.
  power.tsd(method="B", alpha=c(0.0294, 0.0294), CV=0.24,
            n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
  # [1] 0.048762
However, no inflation of the TIE with a slightly more liberal α 0.0302.
  power.tsd(method="B", alpha=c(0.0302, 0.0302), CV=0.24,
            n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
  # [1] 0.049987

2. Karalis “TSD-2” 2
Futility rule for the total sample size of 150. No inflation of the TIE. Compare with the Potvin B above (0.048762):
  power.tsd.KM(method="B", alpha=c(0.0294, 0.0294), CV=0.24,
               n1=12, theta0=1.25, Nmax=150, nsims=1e6)[["pBE"]]
  # [1] 0.041874
However, power may be negatively affected 3,4 and total sample sizes sometimes even larger. Comparison:
  CV     <- 0.25
  n1     <- 14
  alpha  <- c(0.0294, 0.0294)
  theta0 <- 0.95
  res    <- data.frame(method=c("Potvin B", "KM TSD-2"), power=NA,
                       N.min=NA, perc.5=NA, N.med=NA, perc.95=NA,
                       N.max=NA, stringsAsFactors=FALSE)
  for (j in 1:2) {
    if (j == 1) {
      x <- power.tsd(method="B", alpha=alpha, CV=CV, n1=n1, theta0=theta0,
                     Nmax=Inf)
    } else {
      x <- power.tsd.KM(method="B", alpha=alpha, CV=CV, n1=n1, theta0=theta0,
                        Nmax=150)
    }
    res[j, "power"]  <- x[["pBE"]]
    res[j, "N.min"]  <- x[["nrange"]][1]
    res[j, 4:6]      <- x[["nperc"]]
    res[j, "N.max"]  <- x[["nrange"]][2]
  }
  names(res)[c(3:7)] <- c("N min", "N 5%", "N med", "N 95%", "N max")
  print(res, row.names=FALSE)

    method   power N min N 5% N med N 95% N max
  Potvin B 0.82372    14   14    30    58   110
  KM TSD-2 0.79893    14   14    32   106   150

3. Karalis “TSD-1” 2
As above but decision scheme similar to Potvin C and α 0.0280.
  power.tsd.KM(method="C", alpha=c(0.0280, 0.0280), CV=0.22,
               n1=12, theta0=1.25, Nmax=150, nsims=1e6)[["pBE"]]
  # [1] 0.041893
Compare to the TIE below.

4. Potvin et al. “Method C” 1
Ignoring the sentence of the GL mentioned at #1 above and concentrating on “… there are many acceptable alternatives and the choice of how much alpha to spend at the interim analysis is at the company’s discretion.”
With the adjusted α 0.0294 there is a maximum inflation of the TIE of 0.0514 (with n1 12 and CV 22%).
  power.tsd(method="C", alpha=c(0.0294, 0.0294), CV=0.22,
            n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
  # [1] 0.051426
However, there is no inflation of the TIE for any CV and n1 ≥18.
If you want to go with Method C, I suggest a more conservative adjusted α 0.0280.
  power.tsd(method="C", alpha=c(0.0280, 0.0280), CV=0.22,
            n1=12, theta0=1.25, nsims=1e6)[["pBE"]]
  # [1] 0.049669

5. Xu et al., “Method E”, “Method F” 5
More powerful than the original methods of the same group of authors since two CV-ranges are considered. “Method E” is an extension of “Method B” and “Method F” of “Method C”. Both have different alphas in the stages and a futility rule based on the 90% CI and a maximum sample size (though not as futility). Slight mis-specification of the CV (say, you assumed CV 25% and the CV turns out to be 35%) still controls the TIE.
• “Method E”
CV 10–30%:
adjusted α 0.0249, 0.0363, min. n1 18, max.n 42, CI within {0.9374, 1.0667}
CV 30–55%:
adjusted α 0.0254, 0.0357, min. n1 48, max.n 180, CI within {0.9305, 1.0747}
• “Method F”
CV 10–30%:
adjusted α 0.0248, 0.0364, min. n1 18, max.n 42, CI within {0.9492, 1.0535}
CV 30–55%:
adjusted α 0.0259, 0.0349, min. n1 48, max.n 180, CI within {0.9350, 1.0695}
Examples:
  power.tsd.fC(method="B", alpha=c(0.0249, 0.0363), CV=0.30, n1=18,
               fCrit="CI", fClower=0.9374, max.n=42, theta0=1.25,
               nsims=1e6)[["pBE"]] # Method E (low CV)
  # [1] 0.048916
  power.tsd.fC(method="B", alpha=c(0.0254, 0.0357), CV=0.55, n1=48,
               fCrit="CI", fClower=0.9305, max.n=180, theta0=1.25,
               nsims=1e6)[["pBE"]] # Method E (high CV)
  # [1] 0.045969
  power.tsd.fC(method="C", alpha=c(0.0248, 0.0364), CV=0.30, n1=18,
               fCrit="CI", fClower=0.9492, max.n=42, theta0=1.25,
               nsims=1e6)[["pBE"]] # Method F (low CV)
  # [1] 0.049194
  power.tsd.fC(method="C", alpha=c(0.0259, 0.0349), CV=0.55, n1=48,
               fCrit="CI", fClower=0.9350, max.n=180, theta0=1.25,
               nsims=1e6)[["pBE"]] # Method F (high CV)
  # [1] 0.045471

6. Maurer et al. 6
The only approach not based on simulations and seemingly preferred by the EMA.
That’s the most flexible method because you can specify futility rules on the CI, achievable total power, maximum total sample size. Furthermore, you can base the decision to proceed to the second stage on the PE observed in the first stage (OK, this is supported by the functions of Power2Stage as well but not in the published methods – you would have to perform own simulations). Example:
  power.tsd.in(CV=0.24, n1=12, theta0=1.25, fCrit="No",
               ssr.conditional="no", nsims=1e6)[["pBE"]]
  # [1] 0.04642
Let us compare the method with data of Example 2 given by Potvin et al. Note that in this method you perform separate ANOVAs, one in the interim and one in the final analysis. In Example 2 we had 12 subjects in stage 1 and with both methods a second stage with 8 subjects. The final PE was 101.45% with a 94.12% CI of 88.45–116.38%. I switched off futility criteria and kept all other defaults.
  interim.tsd.in(GMR1=1.0876, CV1=0.18213, n1=12,
                 fCrit="No", ssr.conditional="no")

  TSD with 2x2 crossover
  Inverse Normal approach
   - Maximum combination test with weights for stage 1 = 0.5 0.25
   - Significance levels (s1/s2) = 0.02635 0.02635
   - Critical values (s1/s2) = 1.9374 1.9374
   - BE acceptance range = 0.8 ... 1.25
   - Observed point estimate from stage 1 is not used for SSR
   - Without conditional error rates and conditional (estimated target) power

  Interim analysis after first stage
  - Derived key statistics:
    z1 = 3.10000, z2 = 1.70344,
    Repeated CI = (0.92491, 1.27891)
  - No futility criterion met
  - Test for BE not positive (not considering any futility rule)
  - Calculated n2 = 8
  - Decision: Continue to stage 2 with 8 subjects
Similar outcome. Not BE and second stage with 8 subjects.
  final.tsd.in(GMR1=1.0876, CV1=0.18213, n1=12,
               GMR2=0.9141, CV2=0.25618, n2=8)

  TSD with 2x2 crossover
  Inverse Normal approach
   - Maximum combination test with weights for stage 1 = 0.5 0.25
   - Significance levels (s1/s2) = 0.02635 0.02635
   - Critical values (s1/s2) = 1.93741 1.93741
   - BE acceptance range = 0.8 ... 1.25

  Final analysis after second stage
  - Derived key statistics:
    z1 = 2.87952, z2 = 2.60501,
    Repeated CI = (0.87690, 1.17356)
    Median unbiased estimate = 1.0135
  - Decision: BE achieved
Passed BE as well. PE 101.35 with a 94.73% CI of 87.69–117.36%.
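
The CI levels quoted in this list follow directly from the (adjusted) one-sided α of the two one-sided tests as 100(1 − 2α). A small sketch in base R; the helper name is made up for illustration:

```r
# Nominal two-sided CI level (in %) implied by a one-sided alpha in TOST
ci_level <- function(alpha) 100 * (1 - 2 * alpha)

round(ci_level(c(0.05, 0.0294, 0.02635)), 2)
# [1] 90.00 94.12 94.73
```

Hence the 94.12% CI for α 0.0294 (Potvin) and the 94.73% repeated CI for α 0.02635 (inverse normal approach).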

Acceptance in Belarus & Russia – no idea. Might well be that their experts never have seen such a study before.
Personally (‼) I would rank the methods
1. Maurer et al.
2. Xu et al. “Method F”
3. Xu et al. “Method E”
4. Potvin et al. “Method C” (modified α 0.0280)
5. Potvin et al. “Method B” (modified α 0.0302)
6. Potvin et al. “Method C” (original α 0.0294)
7. Potvin et al. “Method B” (original α 0.0294)
8. Karalis “TSD-1”
9. Karalis “TSD-2”
Maybe the original “Method C” is risky when you proceed to the second stage (all my accepted studies were BE already in stage 1 and I have seen nasty deficiency letters in the past).

» Another 2 questions:
» 1. Is it a good idea to add the statement that, in case of conducting stage 2, data from both stages will be pooled by default, without evaluation of differences between stages?

If you performed the second stage, it’s mandatory to pool the data. None of the methods contains any kind of test between stages. Furthermore, a formulation-by-stage interaction term in the model is considered nonsense in the EMA’s Q&A.

» 2. Is there any established minimum for a number of subjects that should be included in stage 2 (e.g. at least 2, or at least 1 for each sequence (TR/RT))?

Nothing in the guidelines, but mentioned in the EMA’s Q&A document. However, that’s superfluous. If you perform a sample size estimation, in all software the minimum stage 2 sample size will be 2 anyhow (if odd, rounded up to the next even to obtain balanced sequences). In the functions of Power2Stage you can use the argument min.n2=2 and will never see any difference.
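
The rounding to balanced sequences can be sketched in base R as follows. The helper name is hypothetical and not part of PowerTOST or Power2Stage; it only illustrates the rule stated above.

```r
# Round a sample size up to the next even integer (balanced TR/RT sequences)
round_up_even <- function(n) {
  n <- ceiling(n)   # whole subjects first
  n + n %% 2        # add one subject if the number is odd
}

round_up_even(c(1, 2, 7, 22.4))
# [1]  2  2  8 24
```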
Only if you are a nerd, read the next paragraph.

The conventional sample size estimation does not take the stage-term in the final analysis into account. If you prefer to use braces with suspenders, use the function sampleN2.TOST().
  CV  <- 0.25
  n1  <- 12
  res <- data.frame(method=c("PowerTOST::sampleN.TOST()",
                             "Power2Stage::sampleN2.TOST()"),
                    n1=n1, n2=NA, power=NA, stringsAsFactors=FALSE)
  for (j in 1:2) {
    if (j == 1) {
      x <- PowerTOST::sampleN.TOST(alpha=0.0294, CV=CV, print=FALSE)[7:8]
      res[j, 3] <- x[1] - n1
    } else {
      x <- Power2Stage::sampleN2.TOST(alpha=0.0294, CV=CV, n1=n1)[8:9]
      res[j, 3] <- x[1]
    }
    res[j, "power"] <- x[["Achieved power"]]
  }
  print(res, row.names=FALSE)

                        method n1 n2     power
     PowerTOST::sampleN.TOST() 12 22 0.8127230
  Power2Stage::sampleN2.TOST() 12 22 0.8141106
Practically it is unlikely to get a difference in sample sizes…

In #5 the minimum is 4 because you perform a separate ANOVA in the second stage. One word of caution: If you have a nasty drug (dropouts due to AEs) take care that you don’t end up with <3 subjects – otherwise the ANOVA would not be possible.
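
A rough illustration of why so few subjects are a problem, assuming the second stage is evaluated with the conventional 2×2 crossover model, whose residual degrees of freedom are n − 2. The helper name is hypothetical.

```r
# Residual degrees of freedom of a conventional 2x2 crossover ANOVA
# with n eligible subjects (model: sequence, subject(sequence), period, treatment)
resid_df <- function(n) n - 2

n2 <- 2:5
data.frame(n2 = n2, df = resid_df(n2), CI.possible = resid_df(n2) > 0)
```

With two eligible subjects no residual variance can be estimated, so no confidence interval can be calculated.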

In designing a study I recommend to call the functions with the arguments theta0 and CV, which are your best guesses. Don’t confuse that with the argument GMR, which is fixed in most methods. Then you get an impression what might happen (chance to show BE in the first stage, probability to proceed to stage 2, average & range of total sample sizes…). n1 which is ~80% of a fixed sample design is a good compromise between chances to show BE in the first stage whilst keeping overall power. Example of finding a suitable futility rule of the total sample size:
  CV   <- 0.25
  n1   <- 0.8*PowerTOST::sampleN.TOST(CV=CV, print=FALSE)[["Sample size"]]
  n1   <- ceiling(n1 + ceiling(n1) %% 2)
  lo   <- ceiling(1.5*n1 + ceiling(1.5*n1) %% 2)
  hi   <- ceiling(3*n1 + ceiling(3*n1) %% 2)
  Nmax <- c(seq(lo, hi, 4), Inf)
  res  <- data.frame(Nmax=Nmax, power=NA)
  for (j in seq_along(Nmax)) {
    res$power[j] <- Power2Stage::power.tsd(CV=CV, n1=n1, Nmax=Nmax[j])[["pBE"]]
  }
  print(res, row.names=FALSE)

   Nmax   power
     36 0.70564
     40 0.74360
     44 0.77688
     48 0.80214
     52 0.81854
     56 0.82976
     60 0.83596
     64 0.83957
     68 0.84156
     72 0.84153
    Inf 0.84244

A futility rule of 48 looks good. Let’s explore the details:

  Power2Stage::power.tsd(CV=CV, n1=n1, Nmax=48)

  TSD with 2x2 crossover
  Method B: alpha (s1/s2) = 0.0294 0.0294
  Target power in power monitoring and sample size est. = 0.8
  Power calculation via non-central t approx.
  CV1 and GMR = 0.95 in sample size est. used
  Futility criterion Nmax = 48
  BE acceptance range = 0.8 ... 1.25

  CV = 0.25; n(stage 1) = 24; GMR = 0.95

  1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
  p(BE)    = 0.80214
  p(BE) s1 = 0.63203
  Studies in stage 2 = 29.06%

  Distribution of n(total)
  - mean (range) = 27.6 (24 ... 48)
  - percentiles
   5% 50% 95%
   24  24  44

Not that bad. However, futility rules can be counterproductive because you have to come up with a “best guess” CV – which is actually against the “spirit” of TSDs. Homework:

  Power2Stage::power.tsd(CV=0.30, n1=24, Nmax=48)

As ElMaestro wrote above you have to perform own simulations if you are outside the published methods (GMR, target power, n1/CV-grid, futility rules). My basic algorithm is outlined by Molins et al. 7

A final reminder: In the sample size estimation use the fixed GMR (not the observed one), unless the method allows that.

1. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294.
2. Karalis V. The role of the upper sample size limit in two-stage bioequivalence designs. Int J Pharm. 2013; 456: 87–94. doi:10.1016/j.ijpharm.2013.08.013.
3. Fuglsang A. Futility Rules in Bioequivalence Trials with Sequential Designs. AAPS J. 2014; 16(1): 79–82. doi:10.1208/s12248-013-9540-0.
4. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71(3): 271–81. doi:10.1007/s00228-015-1806-2.
5. Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr AF, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016; 15(1): 15–27. doi:10.1002/pst.1721.
6. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–607. doi:10.1002/sim.7614.
7. Molins E, Cobo E, Ocaña J. Two-stage designs versus European scaled average designs in bioequivalence studies for highly variable drugs: Which to choose? Stat Med. 2017; 36(30): 4777–88. doi:10.1002/sim.7452.

Cheers,
Helmut Schütz
Astea
★

Russia,
2019-09-14 14:56

@ Helmut
Posting: # 20597
Views: 1,848

## EEU-rules, TSD-methods (lengthy question)

Dear all!

Elena777, I guess I misunderstood something. Did you mean a posteriori power or interim power? If interim is 30% go to the next step by the decision tree.

» It is difficult to predict how regulators of the EEU interpret their own guideline

So true! "Words are chameleons, which reflect the color of their environment" (Learned Hand)

Helmut, what were the final conclusions on the post? Are there any suggestions on how to deal with two metrics in adaptive trials?
For example: a). Let us consider Type II design: first step - estimated power is less than target (80%) for Cmax and more than target for AUC, besides 90%CI for AUC is OK.
1). We calculate 100(1-2αadj) CI for Cmax, should we also do it for AUC?
It can fail.
2). Suppose further we go to the 2nd stage. Should we use data from the 2nd stage to estimate CI for AUC the second time? If yes, it possibly can fail, if not - how to explain the fact that we do not use the data?

Another example: b). First step - estimated power is less than target (80%) for both metrics and adjusted level CI is outside the range. Should we use the largest observed CV to calculate the total sample size? Would the study be overpowered for the second PK metric? Would it affect the TIE?

To conclude: what is the best strategy to follow in this situation in order to avoid inflation of the TIE and the loss of power?

(Some mad idea: is it possible to make some hybrid monster to combine both Cmax and AUC in the same test for adaptive designs? Something like Cmax/AUC but with more powerful reflection of the situations (I dealt with plenty of studies (BE and not proven BE) with Cmax/AUC as an additional metric, only once it was outside the range))

» Furthermore, a formulation-by-stage interaction term in the model is considered nonsense in the EMA’s Q&A.

What ANOVA model should be used for the second stage?

By the way, what about the code on R for the full decision tree?

"We are such stuff as dreams are made on, and our little life, is rounded with a sleep"
Helmut
★★★

Vienna, Austria,
2019-09-16 11:50

@ Astea
Posting: # 20598
Views: 1,769

## n2 based on PK metric with higher CV

Hi Nastia,

» Helmut, what was the final conclusions on the post?

I was wrong and we shouldn’t worry. See Detlew’s simulations.

» Are there any suggestions on how to deal with two metrics in adaptive trials?
» For example: a). Let us consider Type II design: first step - estimated power is less than target (80%) for Cmax and more than target for AUC, besides 90%CI for AUC is OK.
» 1). We calculate 100(1-2αadj) CI for Cmax, should we also do it for AUC? It can fail.
» 2). Suppose further we go to the 2nd stage. Should we use data from the 2nd stage to estimate CI for AUC the second time? If yes, it possibly can fail, if not - how to explain the fact that we do not use the data?

Think about how we design a fixed sample design. Always based on the metric with the higher CV. I would go with your 2). How likely is it that AUC (which passed already in the first stage) will fail in the second? Let’s consider the example of the other post. I assumed the best, i.e., all studies in the ‘type II’ design passed with α 0.05. Now:

  library(PowerTOST)
  alpha <- c(0.05, 0.0294)
  ns    <- c(28, 28 + 20)
  CV    <- 0.20
  res   <- data.frame(analysis = c("interim", "final"),
                      alpha = c(0.05, 0.0294), n = ns,
                      df = NA, power = NA, beta = NA)
  for (j in 1:2) {
    if (j == 1) {
      n <- ns[j]
    } else { # workaround since we have 1 df less
      n <- ns[j] - 1
    }
    res[j, 4] <- n - 2
    res[j, 5] <- suppressMessages(
                   power.TOST(alpha = alpha[j], CV = CV, n = n))
    res[j, 6] <- 1 - res[j, 5]
  }
  res[, 5:6] <- signif(res[, 5:6], 4)
  print(res, row.names = FALSE)

   analysis  alpha  n df  power   beta
    interim 0.0500 28 26 0.9349 0.0651
      final 0.0294 48 45 0.9872 0.0128

» Another example: b). First step - estimated power is less than target (80%) for both metrics and adjusted level CI is outside the range. Should we use the largest observed CV to calculate the total sample size?

Yes.

» Would the study be overpowered for the second PK metric? Would it affect the TIE?

According to Detlew’s simulations, no. Given, only ‘type I’ implemented which is more conservative anyhow.

  Power2Stage:::power.tsd.2m(CV = c(0.3, 0.2), theta0 = rep(0.95, 2), n1 = 28)

  TSD with 2x2 crossover
  Method B2m: alpha (s1/s2) = 0.0294 0.0294
  Target power in power monitoring and sample size est. = 0.8
  Power calculation via non-central t approx.
  CV1 and GMR = 0.95 0.95 in sample size est. used
  BE acceptance range = 0.8 ... 1.25

  CVs = 0.3, 0.2; n(stage 1) = 28; GMR = 0.95, 0.95

  1e+05 sims at theta0 = 0.95, 0.95 (p(BE) = 'power').
  p(BE)    = 0.81339
  p(BE) s1 = 0.46224
  Studies in stage 2 = 52.48%

  Distribution of n(total)
  - mean (range) = 39.8 (28 ... 120)
  - percentiles
   5% 50% 95%
   28  34  68

» To conclude: what is the best strategy to follow in this situation in order to avoid inflation of the TIE and the loss of power?

Estimate the sample size based on the metric with the higher CV. No inflation of the TIE and a gain in power for the other metric.

» (Some mad idea: is it possible to make some hybrid monster to combine both Cmax and AUC in the same test for adaptive designs?

Take some Schützomycin?

» Something like Cmax/AUC but with more powerful reflection of the situations (I dealt with a plenty of studies (BE and not proven BE) with Cmax/AUC as an additional metric, only once it was outside the range)

As expected. Cmax/AUC is generally less variable than Cmax.

» » Furthermore, a formulation-by-stage interaction term in the model is considered nonsense in the EMA’s Q&A.
»
» What ANOVA model should be used for the second stage?

According to the Q&A:

stage, sequence, sequence × stage, subject(sequence × stage), period(stage), treatment.

As usual for the EMA, all effects fixed and the nested term subject(sequence × stage) superfluous. The simple model

stage, sequence, sequence × stage, subject, period(stage), treatment.

gives exactly the same result.
I once received a deficiency letter for a ‘type 2’ study passing in the first stage (α 0.05!) where I dared to model subjects as a random effect… Interesting that there were no questions to use an adjusted α (would have passed as well but I followed my SAP which was approved by the BfArM and for “educational reasons” I didn’t show the adjusted CI).

» By the way, what about the code on R for the full decision tree?

Ask Detlew or inspect the sources of power.tsd() and power.tsd.2m().

Cheers,
Helmut Schütz
Astea
★

Russia,
2019-09-16 18:28

@ Helmut
Posting: # 20599
Views: 1,705

## Q&A ref

Dear Helmut!

» I was wrong and we shouldn’t worry. See Detlew’s simulations.

That's good. Sorry, I didn't realize it at first.

» How likely is it that AUC (which passed already in the first stage) will fail in the second?

Thank you for the example! I've puzzled whether it will be reproduced for other cases. Let us consider the situation when CV of Cmax and AUC are very close to each other, like 21% and 20%, and for the first stage the number of subjects (n1=20) was sufficient for AUC, but not for Cmax. Calculation shows that even then the power for AUC for the second stage would be always enough.

  library(PowerTOST); library(Power2Stage)
  for(j in 5:100){nj1<-sampleN.TOST(CV=j/100,print=FALSE)[1,7]
  n02<-sampleN2.TOST(CV=(j+1)/100,n1=nj1)[1,8]
  nj2<-nj1+n02
  print(suppressMessages(power.TOST(CV=j/100,n=nj2-1,alpha=0.0294)))}

» Take some Schützomycin?

Did you patent that? I gonna make a generic

» According to the Q&A: stage, sequence, sequence × stage, subject(sequence × stage), period(stage), treatment.

Are there any documents to refer which mention this model (excepting the answer on the EMA's web page?)

» Ask Detlew or inspect the sources of power.tsd() and power.tsd.2m().

Ok, need more tea to dive to the source...
Helmut
★★★

Vienna, Austria,
2019-09-17 12:27

@ Astea
Posting: # 20606
Views: 1,623

## The omniscient oracle has spoken

Hi Nastia,

» Let us consider the situation when CV of Cmax and AUC are very close to each other, like 21% and 20%, and for the first stage the number of subjects (n1=20) was sufficient for AUC, but not for Cmax.
» […] even then the power for AUC for the second stage would be always enough. […]

Wow, you are a master of condensed R-code!
Here my version:

  library(PowerTOST)
  library(Power2Stage)
  delta <- 0.01
  CV.lo <- seq(0.1, 0.3, 0.01)
  CV.hi <- CV.lo + delta
  res   <- data.frame(CV.lo = CV.lo, CV.hi = CV.hi, n1 = NA, N = NA,
                      power.1 = NA, power.2 = NA)
  for (j in seq_along(CV.lo)) {
    res$n1[j]      <- sampleN.TOST(CV = CV.lo[j],
                                   print=FALSE)[["Sample size"]]
    if (res$n1[j] < 12) res$n1[j] <- 12 # acc. to guidelines
    temp           <- sampleN2.TOST(CV = CV.hi[j], n1 = res$n1[j])
    res$N[j]       <- sum(temp[7:8])     # N = n1 + n2
    res$power.1[j] <- signif(suppressMessages(
                        power.TOST(alpha = 0.0294, CV = CV.lo[j],
                                   n = res$N[j] - 1)), 4)
    res$power.2[j] <- signif(suppressMessages(
                        power.TOST(alpha = 0.0294, CV = CV.hi[j],
                                   n = res$N[j] - 1)), 4)
  }
  cat("delta", delta, "\n"); print(res, row.names = FALSE)

  delta 0.01
   CV.lo CV.hi n1  N power.1 power.2
    0.10  0.11 12 12  0.9561  0.9168
    0.11  0.12 12 12  0.9168  0.8664
    0.12  0.13 12 12  0.8664  0.8079
    0.13  0.14 12 14  0.8836  0.8334
    0.14  0.15 12 14  0.8334  0.7778
    0.15  0.16 12 16  0.8470  0.7974
    0.16  0.17 14 18  0.8539  0.8086
    0.17  0.18 14 20  0.8566  0.8146
    0.18  0.19 16 22  0.8567  0.8171
    0.19  0.20 18 24  0.8549  0.8173
    0.20  0.21 20 26  0.8517  0.8157
    0.21  0.22 22 28  0.8475  0.8129
    0.22  0.23 22 30  0.8426  0.8091
    0.23  0.24 24 32  0.8370  0.8045
    0.24  0.25 26 34  0.8310  0.7994
    0.25  0.26 28 36  0.8246  0.7937
    0.26  0.27 30 40  0.8392  0.8109
    0.27  0.28 32 42  0.8315  0.8038
    0.28  0.29 34 44  0.8238  0.7965
    0.29  0.30 38 48  0.8336  0.8081
    0.30  0.31 40 50  0.8253  0.8001

  delta 0.05
   CV.lo CV.hi n1  N power.1 power.2
    0.10  0.15 12 14  0.9826  0.7778
    0.11  0.16 12 16  0.9813  0.7974
    0.12  0.17 12 18  0.9789  0.8086
    0.13  0.18 12 20  0.9757  0.8146
    0.14  0.19 12 22  0.9718  0.8171
    0.15  0.20 12 24  0.9673  0.8173
    0.16  0.21 14 26  0.9621  0.8157
    0.17  0.22 14 28  0.9565  0.8129
    0.18  0.23 16 30  0.9503  0.8091
    0.19  0.24 18 32  0.9438  0.8045
    0.20  0.25 20 34  0.9368  0.7994
    0.21  0.26 22 36  0.9296  0.7937
    0.22  0.27 22 40  0.9348  0.8109
    0.23  0.28 24 42  0.9271  0.8038
    0.24  0.29 26 44  0.9192  0.7965
    0.25  0.30 28 48  0.9226  0.8081
    0.26  0.31 30 50  0.9144  0.8001
    0.27  0.32 32 52  0.9061  0.7922
    0.28  0.33 34 56  0.9084  0.8005
    0.29  0.34 38 60  0.9098  0.8071
    0.30  0.35 40 62  0.9014  0.7987

Is this what you mean?

» » Take some Schützomycin?
»
» Did you patent that? I gonna make a generic

Not mine. It was mentioned for the first time by ElMaestro back in 2010:

» » Let's say we want to develop a generic of Schützomycin. The product is available in one strength, posology is 1 tablet daily. […] Schützomycin is a nice drug with little safety concern.

My claim seems to be unfounded.

» » According to the Q&A:

stage, sequence, sequence × stage, subject(sequence × stage), period(stage), treatment.

» Are there any documents to refer to which mention this model (except the answer on the EMA's web page)?

Made up out of thin air by the EMA. To quote myself [1]:

In none of the published procedures, a test for poolability was part of the simulations. Although statistical tests could be constructed comparing variances of stages, their precision is poor in such designs and should be applied with caution. Nonetheless, in 2013, the European Medicines Agency introduced an additional term sequence×stage to the statistical model. Since both sequence and stage are between-subject effects, the residual error (hence, the CI) should not be affected – which was recently demonstrated [2].

Excerpt [2]:

Special emphasis was also given to the significance (P value) of the additional term ‘sequence × stage’ used in the ANOVA model proposed by EMA. […]
In almost all situations, the significance of the ‘sequence × stage’ term was found to be nonsignificant (i.e. values were greater than 5%). Only when GMR was close to the limit of 1.25 can the significance obtain lower values than the significance level 5%, and thus, the ‘sequence × stage’ effect was declared significant.
The overall performance in terms of percentage of BE acceptance of the TSD remains unaltered. Plausibly, no difference in both the df and the SS values is observed for the ‘residual error’; the ‘sequence × stage’ is in essence a between-subject factor, whereas BE assessment is based on CVw.
In any case, the EMA guideline does not clarify what the consequence would be if the ‘sequence × stage’ is statistically significant.

You don’t have to be a rocket scientist to understand that. Why the EMA introduced it remains a mystery. Maybe influenced by García-Arieta and Gordon [3]?

A term for the stage should be included in the ANOVA model. However, the guideline does not clarify what the consequence should be if it is statistically significant. In principle, the data sets of both stages could not be combined.

Well, stage is already a factor in all published methods (didn’t they read them?). Concerning “poolability” see above. Reminds me of Grizzle’s nonsense for crossovers “if the sequence-effect is significant, analyze data of the first period as a parallel design”.
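The point that the extra term cannot change the residual error can be checked numerically. What follows is only a minimal sketch in base R on simulated data: the stage sizes, the effects, and all object names (`m.pub`, `m.ema`, `ci90`) are made up for illustration, and subjects are fitted as a fixed factor. Because every subject belongs to exactly one sequence and one stage, sequence × stage is already contained in the subject term.

```r
# Simulated two-stage 2x2 crossover; all numbers are hypothetical.
set.seed(42)
n.stage <- c(12, 8)                       # subjects in stage 1 and stage 2
n.subj  <- sum(n.stage)
design  <- data.frame(
  subject  = seq_len(n.subj),
  stage    = rep(1:2, times = n.stage),
  sequence = rep_len(c("TR", "RT"), n.subj)
)
dat           <- design[rep(seq_len(n.subj), each = 2), ]  # two periods each
dat$period    <- rep(1:2, times = n.subj)
dat$treatment <- ifelse((dat$sequence == "TR") == (dat$period == 1), "T", "R")
subj.eff      <- rnorm(n.subj, sd = 0.4)  # between-subject variability
dat$logPK     <- subj.eff[dat$subject] +
                 log(0.95) * (dat$treatment == "T") +
                 rnorm(nrow(dat), sd = 0.2)
for (v in c("subject", "stage", "sequence", "period", "treatment")) {
  dat[[v]] <- factor(dat[[v]])
}

# Model without and with the additional sequence x stage term
m.pub <- lm(logPK ~ stage + sequence + subject + stage:period + treatment,
            data = dat)
m.ema <- lm(logPK ~ stage + sequence + sequence:stage + subject +
              stage:period + treatment, data = dat)

# 90% CI of T/R from the treatment coefficient and the residual df
ci90 <- function(m) {
  est <- unname(summary(m)$coefficients["treatmentT",
                                        c("Estimate", "Std. Error")])
  exp(est[1] + c(-1, 1) * qt(0.95, df.residual(m)) * est[2])
}

cat("residual df:", df.residual(m.pub), df.residual(m.ema), "\n")
cat("residual SS:", deviance(m.pub), deviance(m.ema), "\n")
print(rbind(without = ci90(m.pub), with = ci90(m.ema)))
```

Residual degrees of freedom, residual sum of squares, and therefore the confidence interval come out the same for both models, since the added term is purely between-subject.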

1. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015; 71:271–81. doi:10.1007/s00228-015-1806-2.
2. Karalis V, Macheras P. On the statistical model of the two-stage designs in bioequivalence assessment. J Pharm Pharmacol. 2013; 66:48–52. doi:10.1111/jphp.12164.
3. García-Arieta A, Gordon J. Bioequivalence requirements in the European Union: critical discussion. AAPS J. 2012; 14:738–48. doi:10.1208/s12248-012-9382-1.

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Astea
★

Russia,
2019-09-17 20:34

@ Helmut
Posting: # 20607
Views: 1,524

## The omniscient oracle has spoken

Dear Helmut!

Thanks for shedding light on the dark story of the ANOVA model!

» Is this what you mean?

Yes, my code gives power.1 for delta 0.01, but without the limitation of a minimum of 12 subjects. Note that for CV.lo≤14% the 2nd stage would not be started, because power for 12 subjects is already more than 80%.

» My claim seems to be unfounded.
Oh, it turns out that Schützomycin could be a secret ingredient of Azazello's cream?
Helmut
★★★

Vienna, Austria,
2019-09-18 12:12

@ Astea
Posting: # 20610
Views: 1,504

## OT: Булга́ков

Hi Nastia,

» Oh, it turns out that Schützomycin could be a secret ingredient of Azazello's cream?

Possible.

THX for pointing to The Master and Margarita.
I read it when I was (sweet?) little sixteen.

Cheers,
Helmut Schütz

Elena777
☆

Belarus,
2019-09-16 19:48

@ Helmut
Posting: # 20601
Views: 1,688

## n2 based on PK metric with higher CV

Dear Helmut,

Still need some clarifications on the question from Astea:

» a) first step - estimated power is less than target (80%) for Cmax and more than target for AUC, besides 90%CI for AUC is OK.

To be in compliance with method C of Potvin, we re-calculate CI for Cmax with α=0,0294. Let's assume that the BE criterion for Cmax is met after this step. Should we also re-calculate CI for AUC with α=0,0294? Otherwise, we will finally have the following in the CSR: BE criterion was met for AUC using α=0,05 (90% CI) and BE criterion was met for Cmax using α=0,0294 (94,12% CI). Does it look OK?
Helmut
★★★

Vienna, Austria,
2019-09-16 23:30

@ Elena777
Posting: # 20603
Views: 1,677

## AUC passes with 0.05 and Cmax with 0.0294

Hi Elena,

» To be in compliance with method C of Potvin, we re-calculate CI for Cmax with α=0,0294. Let's assume that the BE criterion for Cmax is met after this step. […] Otherwise, we will finally have the following in the CSR: BE criterion was met for AUC using α=0,05 (90% CI) and BE criterion was met for Cmax using α=0,0294 (94,12% CI). Does it look OK?

Absolutely. The ideas behind the different alphas in Potvin C are:
1. If interim power is ≥80%, essentially your assumptions about the CV were correct. Assess the study like a fixed sample design with α 0.05. You stop anyway (pass/fail).
2. If interim power is <80%, your assumptions about the CV were not correct. Assess the study with the adjusted α 0.0294.
   a. If you pass, stop.
   b. If you fail, initiate the second stage.

Hence, in your example you are in branch 2.a for Cmax and in branch 1 for AUC. All is good.
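The branches above can be written down as a small sketch in base R. The function names (`ci_level`, `within_limits`, `method_c_interim`) and all interim numbers are hypothetical, the conventional limits of 80.00–125.00% are assumed, and the interim power and both confidence intervals are taken as given inputs (in practice they would come from the stage 1 ANOVA and, e.g., PowerTOST's `power.TOST()` with GMR 0.95).

```r
alpha_adj <- 0.0294

# Confidence level (%) belonging to an alpha of the two one-sided tests
ci_level <- function(alpha) 100 * (1 - 2 * alpha)

# TRUE if the CI lies entirely within the acceptance limits
within_limits <- function(ci, limits = c(0.80, 1.25)) {
  ci[1] >= limits[1] && ci[2] <= limits[2]
}

# Interim decision for one PK metric
#   power:  interim power
#   ci_005: CI at alpha 0.05
#   ci_adj: CI at the adjusted alpha
method_c_interim <- function(power, ci_005, ci_adj, target = 0.80) {
  if (power >= target) {
    # branch 1: assess like a fixed sample design and stop in any case
    if (within_limits(ci_005)) {
      "1: pass at alpha 0.05, stop"
    } else {
      "1: fail at alpha 0.05, stop"
    }
  } else if (within_limits(ci_adj)) {
    "2.a: pass at adjusted alpha, stop"
  } else {
    "2.b: fail at adjusted alpha, initiate second stage"
  }
}

# Hypothetical interim results resembling the example in this thread
auc  <- method_c_interim(power = 0.92, ci_005 = c(0.91, 1.08),
                         ci_adj = c(0.90, 1.09))
cmax <- method_c_interim(power = 0.61, ci_005 = c(0.88, 1.19),
                         ci_adj = c(0.86, 1.22))
cat("AUC: ", auc,  "\n")
cat("Cmax:", cmax, "\n")
cat("CI level for adjusted alpha:", ci_level(alpha_adj), "%\n")
```

With these made-up numbers AUC ends in branch 1 and Cmax in branch 2.a, which is the constellation where the CSR reports a 90% CI for one metric and a 94.12% CI for the other.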

» Should we also re-calculate CI for AUC with α=0,0294?

You could, but please only “at home”. That’s against the method and against what you should have laid down in the protocol. See the end of this post. My study would have passed with the adjusted α as well.
But what if not? See this bizarre case study. Had the sponsor known beforehand that the agency would not accept Method C, they would have planned for Method B, initiated a second stage with 6 (six!) subjects, and happily walked away. Stupid.

What will you do with your “homework”?
• If the adjusted α passes as well, you sleep well and are prepared to answer a deficiency letter.
• If you fail, I promise you sleepless nights. What can you do? Nothing. Confess that you cherry-picked and think about coming up with an amendment switching from C to B? Too late. If you are religious, pray. If not, get drunk.
That’s why in my personal ranking Potvin C is just № 6.
I recommended it for years. Well, no more.
IMHO, the small gain in power claimed by the authors is not worth the trouble:

    library(PowerTOST)
    library(Power2Stage)
    CV  <- seq(0.15, 0.4, 0.05)
    res <- data.frame(CV = CV, fixed = NA, n1 = NA,
                      B = NA, C = NA, C.B = NA)
    n1  <- n <- numeric()
    for (j in seq_along(CV)) {
      # sample size of a fixed sample design
      res$fixed[j] <- sampleN.TOST(CV = CV[j], print = FALSE)[["Sample size"]]
      res$n1[j]    <- 0.8 * res$fixed[j] # my recommendation
      # round up to the next even number
      res$n1[j]    <- ceiling(res$n1[j] + ceiling(res$n1[j]) %% 2)
      if (res$n1[j] < 12) res$n1[j] <- 12
      # overall power (probability to show BE) of both methods
      res$B[j]     <- power.tsd(method = "B", CV = CV[j], n1 = res$n1[j])[["pBE"]]
      res$C[j]     <- power.tsd(method = "C", CV = CV[j], n1 = res$n1[j])[["pBE"]]
      # relative gain of C over B in percent
      res$C.B[j]   <- 100 * (res$C[j] - res$B[j]) / res$B[j]
    }
    res[, 4:6] <- signif(res[, 4:6], 4)
    names(res)[4:6] <- c("Method B", "Method C", "C/B (%)")
    cat("Power in the final analysis\n"); print(res, row.names = FALSE)

    Power in the final analysis
       CV fixed n1 Method B Method C C/B (%)
     0.15    12 12   0.8830   0.8994  1.8580
     0.20    20 16   0.8521   0.8624  1.2120
     0.25    28 24   0.8424   0.8508  0.9924
     0.30    40 32   0.8343   0.8411  0.8163
     0.35    52 42   0.8315   0.8350  0.4245
     0.40    66 54   0.8287   0.8321  0.4067

Cheers,
Helmut Schütz

Mikalai
☆

Belarus,
2019-09-18 16:56

@ Helmut
Posting: # 20611
Views: 1,473

## AUC passes with 0.05 and Cmax with 0.0294

Dear Helmut,

» 2. If interim power is <80%, your assumptions about the CV were not correct. Assess the study with the adjusted α 0.0294.
»    a. If you pass, stop.

What prevents us from evaluating bioequivalence at an α-level of 5% as the first step and stopping the trial if we pass the bioequivalence criteria? If we fail, we then evaluate the power. If power is more than 80% for the failed parameter, we stop the trial and we are done. If power is less than 80% for the failed parameter, we go to the next stage and adjust the α-level correspondingly to preserve the overall α-level at 0,05. Of course, this should be written in the protocol and is a deviation, maybe a big one, from the Potvin C method.
Regards,
Mikalai
Helmut
★★★

Vienna, Austria,
2019-09-18 17:09

@ Mikalai
Posting: # 20612
Views: 1,465

## Hybrid B/C

Dear Mikalai,

» What prevents us from evaluating bioequivalence at an α-level of 5% as the first step and stopping the trial if we pass the bioequivalence criteria? If we fail, we then evaluate the power. If power is more than 80% for the failed parameter, we stop the trial and we are done. If power is less than 80% for the failed parameter, we go to the next stage and adjust the α-level correspondingly to preserve the overall α-level at 0,05. Of course, this should be written in the protocol and is a deviation, maybe a big one, from the Potvin C method.

That’s more or less a hybrid of Method C (where you assess power first) and Method B (where you assess power after). You are free to develop such a method but have to validate it (i.e., find a suitable adjusted α which controls the type I error in every possible combination of n1/CV).
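To make the difference in ordering explicit, here is a minimal base R sketch of the proposed hybrid at the interim. It is not a validated method: the function name `hybrid_interim` and the example numbers are hypothetical, the conventional limits of 80.00–125.00% are assumed, and the adjusted α for the second stage is deliberately left open because it would first have to be found by simulations.

```r
# Interim decision of the proposed hybrid: BE at alpha 0.05 first, power second
#   ci_005: CI at alpha 0.05
#   power:  interim power, only needed if BE was not shown
hybrid_interim <- function(ci_005, power, target = 0.80,
                           limits = c(0.80, 1.25)) {
  pass <- ci_005[1] >= limits[1] && ci_005[2] <= limits[2]
  if (pass) {
    return("pass at alpha 0.05, stop")
  }
  if (power >= target) {
    return("fail with sufficient power, stop")
  }
  # the adjusted alpha is unknown until type I error control was shown
  "fail with insufficient power, second stage at an adjusted alpha"
}

# Hypothetical interim results
d1 <- hybrid_interim(ci_005 = c(0.88, 1.19), power = 0.61)
d2 <- hybrid_interim(ci_005 = c(0.78, 1.19), power = 0.61)
d3 <- hybrid_interim(ci_005 = c(0.78, 1.08), power = 0.92)
cat(d1, d2, d3, sep = "\n")
```

Unlike in Method C, a pass at α 0.05 is accepted here irrespective of the interim power, which is exactly why the type I error of this scheme would need to be checked over the whole n1/CV grid.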

Cheers,
Helmut Schütz

Elena777
☆

Belarus,
2019-09-19 08:34

@ Helmut
Posting: # 20615
Views: 1,410

## AUC passes with 0.05 and Cmax with 0.0294

Dear Helmut,

I'm sorry for being persistent on this topic.

» » First step - estimated power is less than target (80%) for Cmax and more than target for AUC, 90%CI for AUC is OK. To be in compliance with method C of Potvin, we re-calculate CI for Cmax with α=0,0294. And BE criterion for Cmax is met after this step. Finally we will have the following in a CSR: BE criterion was met for AUC using α=0,05 (90%CI) and BE criterion was met for Cmax using α=0,0294 (94,12% CI). Does it look OK?
»
» Absolutely. The ideas behind the different alphas in Potvin C are:
» 1. If interim power is ≥80%, essentially your assumptions about the CV were correct. Assess the study like a fixed sample design with α 0.05. You stop anyway (pass/fail).
» 2. If interim power is <80%, your assumptions about the CV were not correct. Assess the study with the adjusted α 0.0294.
»    a. If you pass, stop.
»    b. If you fail, initiate the second stage.
» Hence, in your example you are for Cmax in the branch 2.a. and for AUC in branch 1. All is good.

To be completely sure that I understood you properly, my final question is:

First step - estimated power is less than target (80%) for Cmax and more than target for AUCt. 90%CI for AUCt is OK. As per method C of Potvin, we re-calculate CI for Cmax with α=0,0294. And BE criterion for Cmax is NOT met after this step. Then we calculate CVintra for Cmax and proceed with the second stage. What combined data should be evaluated after stage 2 completion: ONLY for Cmax or for Cmax and AUCt?
Helmut
★★★

Vienna, Austria,
2019-09-19 15:16

@ Elena777
Posting: # 20617
Views: 1,382

## Use data of all dosed subjects

Hi Elena,

» […] What combined data should be evaluated after stage 2 completion: ONLY for Cmax or for Cmax and AUCt?

Whatever drives the second stage, you have to use all data (Cmax and AUC). Whilst from a statistical perspective there would be no need to assess AUC (already BE in the first stage), no regulator would accept that. Once you have dosed subjects, you have to use the data. No way out.

Cheers,
Helmut Schütz

Elena777
☆

Belarus,
2019-09-19 15:27

@ Helmut
Posting: # 20618
Views: 1,374

## Use data of all dosed subjects

Dear Helmut,

Thank you for responding so quickly. This is our first experience in conducting such studies, so we are quite excited.
Helmut
★★★

Vienna, Austria,
2019-09-19 16:15

@ Elena777
Posting: # 20622
Views: 1,375

## ‘Method C’ ⇒ risky

Hi Elena,

» This is our first experience in conducting such studies, so we are quite excited.

Keep in mind that it might also be the first experience for the experts of the agencies you are aiming at. Possibly they have heard about the skeptical attitude of European assessors towards ‘Method C’. Consider ‘Method B’ instead. See the end of this post for a comparison of power. What good is it to have (maybe) two subjects fewer in the second stage and a study which is not accepted? I warned you.

Cheers,
Helmut Schütz

Elena777
☆

Belarus,
2019-09-16 19:35

@ Astea
Posting: # 20600
Views: 1,698

## EEU-rules, TSD-methods (lengthy question)

Dear Astea,

» Elena777, I guess I misunderstood something. Did you mean a posteriori power or interim power? If interim is 30% go to the next step by the decision tree.

tree?

I meant interim power (power that we calculate after stage 1 completion).
Astea
★

Russia,
2019-09-16 20:39

@ Elena777
Posting: # 20602
Views: 1,694

## apple tree for two-stage

Dear Elena777!

» tree?

I meant the decision scheme (in graph theory, mathematicians call connected undirected graphs without cycles "trees"). See also this message.

» To be in compliance with method C of Potvin, we re-calculate CI for Cmax with α=0,0294. Let's assume that the BE criterion for Cmax is met after this step. Should we also re-calculate CI for AUC with α=0,0294? Otherwise, we will finally have the following in the CSR: BE criterion was met for AUC using α=0,05 (90% CI) and BE criterion was met for Cmax using α=0,0294 (94,12% CI). Does it look OK?

In this situation we still have to go to the next stage. As was shown before, a failure for AUC (if it is less variable than Cmax) in the second stage is very unlikely. So it doesn't matter what the CI for AUC was after the first stage; the second stage for Cmax is what is crucial.

"We are such stuff as dreams are made on, and our little life, is rounded with a sleep"
Helmut
★★★

Vienna, Austria,
2019-09-16 23:37

@ Astea
Posting: # 20604
Views: 1,668

## overripe apples

Hi Nastia,

» » […] BE criterion was met for AUC using α=0,05 (90%CI) and BE criterion was met for Cmax using α=0,0294 (94,12% CI). Does it look OK?
»
» In this situation we still have to go to the next stage.

Why (see above)? Both AUC and Cmax passed already in the first stage.

Cheers,
Helmut Schütz

Astea
★

Russia,
2019-09-17 06:14

@ Helmut
Posting: # 20605
Views: 1,641

## override apples

Dear Helmut and Elena777!

» Why (see above)? Both AUC and Cmax passed already in the first stage.

Oops, sorry. I was wrong (I was thinking of the case where Cmax fails). Thank you!