Bioequivalence and Bioavailability Forum • Two Stage Design: ANOVA

earlybird

2009-10-16 13:25

Posting: # 4367

Two Stage Design: ANOVA [Two-Stage / GS Designs]

Dear all,

We are planning a two-stage design according to Method B of Diane Potvin. Does anybody know whether the ANOVA model statement has to be adapted, i.e., by adding stage and the stage-by-treatment interaction as fixed effects and subject nested within (stage × sequence) as a random effect?

Earlybird

--
The answer: see page 9 of the Potvin paper. Thanks to Berlin!

earlybird

Edit: I restored your original post – otherwise my reply might be confusing. ;-)

[Helmut]

Helmut

Vienna, Austria,
2009-10-16 15:32

@ earlybird
Posting: # 4368

Potvin et al: effects in stage 2


Dear Earlybird!

❝ We are planning a two-stage design according to Method B of Diane Potvin.

Method B = “Gürtel mit Hosenträgern”. SCNR; this German saying for over-cautiousness translates roughly as “belt plus suspenders”.

❝ Does anybody know whether the ANOVA model statement has to be adapted, i.e., by adding stage and the stage-by-treatment interaction as fixed effects and subject nested within (stage × sequence) as a random effect?

At least if you proceed to the second stage (see the last page of the methods section of Potvin’s paper). I just checked it in one of my studies (Method C, which was accepted by the BfArM): the difference in the CI was 0.15% (with/without effects)… However, even if you find significant effects, no poolability criteria are to be applied anyhow.

If you have time, go and meet Diane Potvin in Ottawa.

—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

d_stat

Slovenia,
2021-07-19 18:26

@ Helmut
Posting: # 22478

Potvin et al: effects in stage 2


Dear Helmut,

I have a question regarding the analysis of two-stage Potvin C data, specifically the stage*formulation interaction mentioned here:

❝ At least if you proceed to the second stage (see the last page of the methods-section of Potvin’s paper). Just checked it in one of my studies (Method C, which was accepted by the BfArM) – difference in the CI was 0.15% (with/without effects)… However, even if you find significant effects, no poolability criteria are to be applied anyhow.

Are you aware of any literature or guidance suggesting that the stages may still be pooled even in the case of a significant formulation*stage interaction?

Thank you.

Regards

Helmut

Vienna, Austria,
2021-07-19 19:22

@ d_stat
Posting: # 22479

Forget simulation-based TSDs for 2×2×2 in Europe


Hi d_stat,

❝ I have a question regarding the analysis of two-stage Potvin C data, specifically the stage*formulation interaction mentioned here:


❝ Are you aware of any literature or guidance suggesting that the stages may still be pooled even in the case of a significant formulation*stage interaction?

I know of only one,¹ stating:

A term for the stage should be included in the ANOVA model. However, the guideline does not clarify what the consequence should be if it is statistically significant. In principle, the data sets of both stages could not be combined.
Although the guideline is not explicit, even if the final sample size is going to be decided based on the intra-subject variability estimated in the interim analysis, a proposal for a final sample size must be included in the protocol so that a significant number of subjects (e.g., 12) is added to the interim sample size to avoid looking twice at almost identical samples. This proposed final sample size should be recruited even if the estimation obtained from the interim analysis is lower than the one pre-defined in the protocol in order to maintain the consumer risk.

This statement led to heated debates and to a compromise in the Q&A document.² Correct:

A model which also includes a term for a formulation*stage interaction would give equal weight to the two stages, even if the number of subjects in each stage is very different. The results can be very misleading hence such a model is not considered acceptable. Furthermore, this model assumes that the formulation effect is truly different in each stage. If such an assumption were true there is no single formulation effect that can be applied to the general population, and the estimate from the study has no real meaning.

Furthermore, none [sic] of the published methods contains a sequence(stage) term and a poolability criterion – combining is always allowed, even if a significant difference between stages is observed.
BTW, the EMA’s modification of the model was shown to be irrelevant.³

Nowadays, trying ‘Method C’ in Europe is a recipe for disaster. Even ‘Method B’ is risky. For – a bit outdated – background see here and there. If you aim at a 2×2×2 crossover nowadays, opt for the exact method, which controls the Type I Error in the strict sense (without requiring simulations).⁴ It has been implemented in the R package Power2Stage since April 2018.
Recently I received a deficiency letter from a European agency in which a study (passing BE with ‘Method B’ already in the first stage) was not accepted. It passed BE with the exact method as well, and even with Bonferroni’s 0.025. Oh dear!

If you insist on a simulation-based method, consider a recent one.⁵

1. García-Arieta A, Gordon J. Bioequivalence Requirements in the European Union: Critical Discussion. AAPS J. 2012; 14(4): 738–48. doi:10.1208/s12248-012-9382-1.
2. EMA, CHMP. Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party (PKWP). London. 19 November 2015. EMA/618604/2008 Rev. 13.
3. Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164.
4. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 1–21. doi:10.1002/sim.7614.
5. Molins E, Labes D, Schütz H, Cobo E, Ocaña J. An iterative method to protect the type I error rate in bioequivalence studies under two-stage adaptive 2×2 crossover designs. Biom J. 2021; 63(1): 122–33. doi:10.1002/bimj.201900388.


d_stat

Slovenia,
2021-07-20 15:58

@ Helmut
Posting: # 22482

TSD statistical model - with multiple sites


Dear Helmut,

Thank you for sharing these references and valuable comments.

❝ BTW, the EMA’s modification of the model was shown to be irrelevant.

And if I deduced correctly, this means that, at least for the FDA, we can omit the interaction term from the statistical model for the TSD and always combine the data of both stages :-)

As per Karalis, the model looks like this (1):
Stage 1: ‘sequence’, ‘period’, ‘treatment’, and ‘subject(sequence)’, all fixed
Stage 2 (EX): ‘sequence’, ‘treatment’, ‘stage’, ‘period(stage)’, and ‘subject(sequence × stage)’, all fixed; subject(sequence × stage) can also be random, since it gives the same result.
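A minimal base-R sketch of the pooled ‘Stage 2 (EX)’ model above, assuming simulated data (12 subjects in stage 1, 8 in stage 2, sequences balanced within each stage) and subjects as a fixed factor. The data frame, its column names, and the helper `make_stage()` are hypothetical. Because the subject effects absorb sequence and stage, and period(stage) costs one degree of freedom per stage, the residual df come out as n - 3, one less than the n - 2 of a single-stage 2×2×2 crossover.

```r
# Pooled analysis of a two-stage 2x2x2 crossover, all effects fixed
# (Karalis-type 'Stage 2' model). Simulated data; names are illustrative.
set.seed(123)

make_stage <- function(stage, n, first_id) {
  # one row per subject and period; sequences alternate TR, RT, ...
  seqs <- rep(c("TR", "RT"), length.out = n)
  d <- data.frame(subject  = rep(first_id + seq_len(n) - 1, each = 2),
                  stage    = stage,
                  sequence = rep(seqs, each = 2),
                  period   = rep(1:2, times = n))
  # TR: T in period 1 and R in period 2; RT: the reverse
  d$treatment <- ifelse((d$sequence == "TR") == (d$period == 1), "T", "R")
  d
}

n1 <- 12
n2 <- 8
pk <- rbind(make_stage(1, n1, 1), make_stage(2, n2, n1 + 1))
for (v in c("subject", "stage", "sequence", "period", "treatment")) {
  pk[[v]] <- factor(pk[[v]])
}
pk$treatment <- relevel(pk$treatment, ref = "R")

# log-scale response: subject effect + true T/R ratio of 0.95 + noise
subj_eff <- rnorm(n1 + n2, sd = 0.3)
pk$logPK <- subj_eff[as.integer(pk$subject)] +
  log(0.95) * (pk$treatment == "T") + rnorm(nrow(pk), sd = 0.2)

# sequence, treatment, stage, period(stage), subject(sequence x stage)
fit <- lm(logPK ~ sequence + treatment + stage + stage:period + subject,
          data = pk)
df.residual(fit)  # n1 + n2 - 3 = 17

# point estimate and 90% CI of the T/R ratio
exp(coef(fit)["treatmentT"])
exp(confint(fit, "treatmentT", level = 0.90))
```

`lm()` reports the between-subject terms that are aliased with the subjects as `NA`; that is expected and does not affect the treatment estimate or the residual df.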

We will conduct the study at multiple sites, which adds complexity to the statistical models to be used:
Stage 1 model: Sequence, Treatment, Site, Period (Site), Sequence*Site, Treatment*Site as fixed and Subject (Sequence*Site) as random
Stage 2 model: Sequence, Treatment, Site, Stage, Period (Site*Stage), Sequence*Site, Treatment*Site, Stage*Site, Sequence*Site*Stage as fixed effects and Subject (Sequence*Site*Stage) as random; based on (1), we can omit the Treatment*Site*Stage term.

❝ It has been implemented in the R package Power2Stage since April 2018.

Indeed, we used calculations from the R package Power2Stage when discussing our approach with the FDA. These packages are lifesavers :clap:

Regardless, the FDA still requires us to submit simulations on the validated model to justify our "specific" TSD approach. We still need to figure out what this means.

❝ I recently received a deficiency letter from a European agency in which a study (passing BE with ‘Method B’ already in the first stage) was not accepted. Passed BE with the exact method as well…

But 'Method B' success in Stage 1 means you were already within the BE limits with even wider intervals (i.e., even smaller patient risk)! I cannot imagine why anyone would reject this. :confused:

Regards
d_stat

Karalis V, Macheras P. On the Statistical Model of the Two-Stage Designs in Bioequivalence Assessment. J Pharm Pharmacol. 2014; 66(1): 48–52. doi:10.1111/jphp.12164.

Helmut

Vienna, Austria,
2021-07-20 23:52

@ d_stat
Posting: # 22483

TSD statistical model - with multiple sites


Hi d_stat,

❝ And if I deduced correctly, this means that, at least for the FDA, we can omit the interaction term from the statistical model for the TSD and always combine the data of both stages :-)


❝ We will conduct the study at multiple sites, which adds complexity to the statistical models to be used:

Confirmed. I had a ‘Type A’ meeting with the FDA last March. They agreed that the stupid site-by-treatment interaction can be dropped (like any pre-test, it inflates the Type I Error¹). The model was like yours:

site,
sequence,
treatment,
subject (nested within site × sequence),
period (nested within site), and
site-by-sequence interaction,

where subject (nested within site × sequence) is a random effect and all other effects are fixed.

Of course, we proposed Maurer’s method. Note that there is no stage term in the model because the interim (IA) and final analyses (FA) are evaluated separately (though the entire information is used in the FA via the repeated confidence intervals).
In practice, run the mixed model in both stages. You need the actual values of n₁, CV₁, GMR₁, df₁, and SEM₁ and, in the optional FA, additionally n₂, CV₂, GMR₂, df₂, and SEM₂.
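For a conventional 2×2×2 crossover, CV and SEM are tied together through the residual mean square: MSE = log(CV² + 1) and SEM = √(MSE/2 · (1/n_TR + 1/n_RT)). A base-R sketch, assuming a balanced split of n₁ = 76 into 38 subjects per sequence (the split is not stated in the post); under that assumption it reproduces SEM1 of the example below. The helpers `sem_2x2()` and `cv_from_sem()` are hypothetical. In a multi-site mixed model, take df and SEM from the model output rather than from this formula.

```r
# SEM of the T-R difference (log scale) in a conventional 2x2x2 crossover,
# derived from the CV and the number of subjects in each sequence.
sem_2x2 <- function(CV, n_seq) {
  mse <- log(CV^2 + 1)            # residual mean square on the log scale
  sqrt(mse / 2 * sum(1 / n_seq))  # n_seq: subjects in sequences TR and RT
}

# reverse direction: CV from an SEM and the sequence sizes
cv_from_sem <- function(SEM, n_seq) {
  mse <- SEM^2 / (sum(1 / n_seq) / 2)
  sqrt(exp(mse) - 1)
}

CV1 <- 0.4237714285
sem_2x2(CV1, n_seq = c(38, 38))               # close to SEM1 = 0.06592665941
cv_from_sem(0.06592665941, n_seq = c(38, 38)) # close to CV1
```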

❝ ❝ It has been implemented in the R package Power2Stage since April 2018.


❝ Indeed, we used calculations from the R package Power2Stage when discussing our approach with the FDA. These packages are lifesavers :clap:

THX especially to Detlew Labes and Benjamin Lang.

❝ Regardless, the FDA still requires us to submit simulations on the validated model to justify our "specific" TSD approach. We still need to figure out what this means.

An example (simulated data of a study which proceeds to the second stage):

library(Power2Stage)
# defaults used:
#   alpha       = 0.05
#   theta1      = 0.80
#   theta2      = 1.25
#   targetpower = 0.80
n1   <- 76
CV1  <- 0.4237714285
GMR1 <- 0.8818736281
df1  <- 65
SEM1 <- 0.06592665941
# values which are not the defaults
interim.tsd.in(weight = 0.80, max.comb.test = FALSE,
               GMR = 0.95, usePE = TRUE, min.n2 = 6, max.n = 140,
               n1 = n1, GMR1 = GMR1, CV1 = CV1, df1 = df1, SEM1 = SEM1,
               fCrit = "PE", ssr.conditional = "error_power",
               pmethod = "exact")
# TSD with 2x2 crossover
# Inverse Normal approach
#  - Standard combination test with weight for stage 1 = 0.8
#  - Significance levels (s1/s2) = 0.03585 0.03585
#  - Critical values (s1/s2) = 1.80107 1.80107
#  - BE acceptance range = 0.8 ... 1.25
#  - Observed point estimate from stage 1 is used for SSR
#  - With conditional error rates and conditional estimated target power
#
# Interim analysis after first stage
# - Derived key statistics:
#   z1 = 1.46015, z2 = 4.80735
#   Repeated CI = (0.78160, 0.99501)
#   Median unbiased estimate = NA
# - No futility criterion met
# - Test for BE not positive (not considering any futility rule)
# - Calculated n2 = 6
# - Decision: Continue to stage 2 with 6 subjects
n2   <- c(3, 2) # six dosed, one dropout
CV2  <- 0.5761171133
GMR2 <- 1.302483215
df2  <- 3
SEM2 <- 0.2319825004
final.tsd.in(weight = 0.80, max.comb.test = FALSE,
             n1 = n1, GMR1 = GMR1, CV1 = CV1, df1 = df1, SEM1 = SEM1,
             n2 = n2, GMR2 = GMR2, CV2 = CV2, df2 = df2, SEM2 = SEM2)
# TSD with 2x2 crossover
# Inverse Normal approach
#  - Standard combination test with weight for stage 1 = 0.8
#  - Significance levels (s1/s2) = 0.03585 0.03585
#  - Critical values (s1/s2) = 1.80107 1.80107
#  - BE acceptance range = 0.8 ... 1.25
#
# Final analysis after second stage
# - Derived key statistics:
#   z1 = 1.98949, z2 = 4.22696
#   Repeated CI = (0.81071, 1.03975)
#   Median unbiased estimate = 0.9179
# - Decision: BE achieved

This was an HVD, hence the large n₁. Due to the nature of the drug, reference-scaling was not an option. It was a formulation change and we had pilot data; therefore, we assumed a GMR of 0.95 (and not 0.90 as usual for HVDs). We opted for the Standard Combination Test with a weight of 0.80 because it was expected to give us the highest power already in the IA. We went all in (fully adaptive: sample size re-estimation based on CV₁ and GMR₁). We set a minimum stage 2 sample size of six (the method’s default is four, and it still ‘works’ with three if not all are in the same sequence) because we didn’t want the model to collapse. We also set a maximum total sample size of 140 and a futility criterion on the PE in the IA.
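The significance level 0.03585 and the critical value 1.80107 in the output above can be reproduced under one assumption about the method (not taken from the Power2Stage source): the standard combination test uses one common one-sided critical value c in both stages, chosen so that under H0 P(Z1 > c or √w·Z1 + √(1 − w)·Z2 > c) = α, with independent standard normal stage-wise statistics Z1 and Z2. A base-R sketch using `integrate()` and `uniroot()`; the function name `sct_crit()` is hypothetical.

```r
# Common critical value of the two-stage standard (weighted inverse normal)
# combination test, one-sided alpha per TOST hypothesis.
# Zc = sqrt(w) * Z1 + sqrt(1 - w) * Z2, hence corr(Z1, Zc) = sqrt(w).
sct_crit <- function(w, alpha = 0.05) {
  rho <- sqrt(w)      # correlation of Z1 with the combined statistic
  s   <- sqrt(1 - w)  # weight of the independent stage 2 statistic Z2
  reject_prob <- function(cval) {
    # P(Z1 <= c, Zc > c): integrate over Z1, using Zc | Z1 ~ N(rho * z1, s^2)
    stage2 <- integrate(function(z1) dnorm(z1) *
                          pnorm((cval - rho * z1) / s, lower.tail = FALSE),
                        lower = -Inf, upper = cval, rel.tol = 1e-8)$value
    pnorm(cval, lower.tail = FALSE) + stage2  # add P(Z1 > c)
  }
  uniroot(function(cval) reject_prob(cval) - alpha,
          interval = c(qnorm(1 - alpha), 4), tol = 1e-10)$root
}

crit <- sct_crit(w = 0.80)
crit                             # compare: critical value 1.80107 above
pnorm(crit, lower.tail = FALSE)  # compare: significance level 0.03585
```

As w approaches 1, c approaches the single-look value qnorm(0.95) ≈ 1.645, which is one way to see why a high weight on stage 1 favors early success in the IA.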

Yes, we performed lots of simulations to show that our setup is reasonable… To give you an idea:

Compare the Maximum Combination Test (weights 0.50|0.20–0.80) and the Standard Combination Test (weight 0.20–0.80) in terms of stopping for futility and power in the IA and FA. Based on that we opted for the SCT with a weight of 0.80.
Impact of dropouts in the first stage. We wanted to dose 96 and expected to have 76 eligible. Hence, we assessed zero (n₁ = 96) to 26 dropouts (n₁ = 70).
Reproducibility of simulations (20 runs with random seeds).
Power in the first stage depending on the number of sites. With every additional site you lose one degree of freedom. In our case this was not an issue (relatively high n₁ and ≤16 sites).
Probability that the FA is not possible due to more dropouts than anticipated. With our setup it was ~0.2%. However, the protocol stipulated that in such a case more subjects have to be recruited. The probability decreased exponentially with the number of subjects dosed in the second stage: with 12 it was just 0.001%.
Interim and final power for n₁ 70–96 and CV₁ 0.25–0.75 (based on the pilot we expected 0.40).
Although the TIE is strictly controlled, simulations of the empiric TIE for CV₁ 0.25–0.75, each with n₁ 70–96.

❝ ❝ […] a deficiency letter from a European agency in which a study (passing BE with ‘Method B’ already in the first stage) was not accepted. Passed BE with the exact method as well…


❝ But 'Method B' success in Stage 1 means you were already within the BE limits with even wider intervals …

Yep.

❝ … (i.e. even smaller patient risk)!

Not necessarily. If you accept that ‘Method B’ is the only one (before Maurer’s paper I preferred ‘Method C’), the patient’s risk depends on n₁ and CV₁. In some cases (early stopping for success in the IA or in the FA with a high n₂) it can be as low as α_adj. In cases with a ~50% chance to proceed to stage 2 it can approach (though not exceed) nominal α. The maximum empiric TIE is generally observed at combinations of small n₁ and low to moderate CV₁.

library(Power2Stage)
n1  <- 12   # location of the
CV  <- 0.24 # maximum TIE
TIE <- power.tsd(method = "B", alpha = rep(0.0294, 2), CV = CV, n1 = n1,
                 theta0 = 1.25, pmethod = "exact",
                 nsims = 1e6)$pBE # takes a couple of minutes!
cat(paste0("Maximum empiric TIE (1,116 scenarios: n1 12\u201372, ",
           "CV 10\u201380%)", "\nat n1 = ", n1, " and CV = ",
           100 * CV, "%: ", TIE, "\n"))
# Maximum empiric TIE (1,116 scenarios: n1 12–72, CV 10–80%)
# at n1 = 12 and CV = 24%: 0.048925

❝ I cannot imagine why anyone would reject this. :confused:

See there. Just bullshit. The α_adj = 0.0294 selected by Potvin et al. was arbitrary and not ‘derived’ from Pocock’s Group-Sequential Design for superiority [sic] testing (fixed N and IA at N/2). That’s a widespread misconception. It was no more than a lucky punch. It can be shown that α_adj = 0.0301 controls the TIE as well. Comparison of the evaluations of that study:
$$\small{\begin{array}{llrcc}
\hline
\text{Evaluation} & \text{PK metric} & \alpha_\textrm{adj} & CI & TIE_\textrm{ emp} \\
\hline
\text{Method B} & C_\text{max} & 0.02940 & 91.54-124.84\% & 0.04478 \\
& AUC_\text{0-t} & 0.02940 & 95.38-118.06\% & 0.03017 \\
\text{modif. Method B} & C_\text{max} & 0.03010 & 91.62-124.72\% & 0.04573 \\
& AUC_\text{0-t} & 0.03010 & 91.62-117.99\% & 0.03080 \\
\text{Standard Comb. Test} & C_\text{max} & \sim0.03037 & 91.65-124.68\% & 0.04816 \\
& AUC_\text{0-t} & \sim0.03037 & 94.46-117.96\% & 0.03322 \\
\hline
\end{array}}$$

The confidence intervals with the modified ‘Method B’ are similar to the ones obtained by the Inverse Normal Combination Method / SCT, thus confirming that the original ‘Method B’ is already overly conservative. Even in ‘borderline’ cases like this one, the patient’s risk is not compromised if the study is evaluated by ‘Method B’. So what?
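For comparison, the origin of the number 0.0294 itself: in Pocock’s group-sequential design for a two-sided superiority test with two equally spaced looks (IA at N/2, hence corr(Z1, Z2) = √0.5), one common critical value c keeps the overall α at 0.05, and the nominal level per look is 2·(1 − Φ(c)). A base-R sketch of that calculation under exactly these assumptions; the function name `pocock2()` is hypothetical. It illustrates the superiority setting only and says nothing about the TIE of ‘Method B’ in BE, which has to be assessed by simulations as above.

```r
# Pocock's constant for K = 2 equally spaced looks, two-sided alpha.
# Z1, Z2: cumulative z-statistics at N/2 and N, corr(Z1, Z2) = sqrt(0.5).
pocock2 <- function(alpha = 0.05) {
  rho <- sqrt(0.5)
  s   <- sqrt(1 - rho^2)
  # P(|Z1| < c and |Z2| < c) under H0, using Z2 | Z1 = z1 ~ N(rho * z1, s^2)
  p_continue <- function(cval) {
    integrate(function(z1) dnorm(z1) *
                (pnorm((cval - rho * z1) / s) - pnorm((-cval - rho * z1) / s)),
              lower = -cval, upper = cval, rel.tol = 1e-8)$value
  }
  uniroot(function(cval) (1 - p_continue(cval)) - alpha,
          interval = c(qnorm(1 - alpha / 2), 4), tol = 1e-10)$root
}

cval <- pocock2()
cval                                 # Pocock's constant, about 2.178
2 * pnorm(cval, lower.tail = FALSE)  # nominal level per look, about 0.0294
```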

Edit (a couple of hours later): Perhaps it’s my fault that the FDA asked you for simulations. Backstory: Originally we wanted to go with a variant of ‘Method C’ because it’s slightly more powerful (esp. when you expect to stop in the IA with BE) and it is preferred by the FDA.²,³ However, that meant a lot of simulations to find a suitable α_adj (implementing futility criteria which don’t compromise power is not that easy in simulation-based methods). Then I discovered a goody by FDA authors.⁴ Hey, they know Maurer’s paper! It was a game-changer.
However, in the meeting I got the impression that nobody had ever submitted such a protocol to the FDA. They were happy with what I presented, though it ended in a nightmare. The study was in patients, and recruitment was difficult even in a country with 1.38 billion people. The standard treatment regimen had to be followed, and we expected 15% of patients to be excluded due to pre-dose concentrations >5% of C_max. That would have been our problem (loss of power, increased producer’s risk). Reply: ‘A washout of less than 5 times t½ in any of the patients is not acceptable. Use a parallel design.’ That means roughly 200 patients per arm. My client is still trying to recover from this shock.

1. European Medicines Agency, CHMP. Guideline on adjustment for baseline covariates in clinical trials. London. 26 February 2015. EMA/CHMP/295050/2013.
2. Davit B, Braddy AC, Conner DP, Yu LX. International Guidelines for Bioequivalence of Systemically Available Orally Administered Generic Drug Products: A Survey of Similarities and Differences. AAPS J. 2013; 15(4): 974–90. doi:10.1208/s12248-013-9499-x.
3. Tsang YC, Brandt A (moderators). Session III: Scaling Procedure and Adaptive Design(s) in BE Assessment of Highly Variable Drugs. EUFEPS/AAPS 2nd International Conference of the Global Bioequivalence Harmonization Initiative. Rockville, MD. 14–16 September 2016.
4. Lee J, Feng K, Xu M, Gong X, Sun W, Kim J, Zhang Z, Wang M, Fang L, Zhao L. Applications of Adaptive Designs in Generic Drug Development. Clin Pharmacol Ther. 2020; 110(1): 32–5. doi:10.1002/cpt.2050.


d_stat

Slovenia,
2021-10-11 19:48

@ Helmut
Posting: # 22624

TSD statistical model - with multiple sites


Hi Helmut

Thank you for sharing this additional information and insight into Maurer's method.

❝ Perhaps it’s my fault that the FDA asked you for simulations. Backstory: Originally we wanted to go with a variant of ‘Method C’ because it’s slightly more powerful (esp. when you expect to stop in the IA with BE) and it is preferred by the FDA.

Thus, we are sticking with Potvin. To do the validations for the multi-site nature of the study, we are thinking of amending the code in Power2Stage, since this is not part of the package yet. Am I correct to assume that the amendment concerns decreasing the df of the error term with the number of planned sites, or is there any other part that I am missing?
Of course, the alternative is always to encourage (pray to) the holy trinity to update the package ;-)

❝ Our problem (loss of power, increased producer’s risk). Reply: ‘A washout of less than 5 times t½ in any of the patients is not acceptable. Use a parallel design.’ Roughly 200 patients / arm. My client is still trying to recover from this shock.

I can imagine the pain, since I understand the effort it takes to execute such a study.

Best Regards, d_stat

Helmut

Vienna, Austria,
2021-10-12 13:27

@ d_stat
Posting: # 22625

TSD statistical model - with multiple sites


Hi d_stat,

❝ ❝ […] Originally we wanted to go with a variant of ‘Method C’ because it’s slightly more powerful (esp. when you expect to stop in the IA with BE) and it is preferred by the FDA.


❝ Thus, we are sticking to Potvin.

I wouldn’t. It might be that the site model mentioned above is not stable. When I tried to add ‘sites’ to Potvin’s ‘Example 2’ (subjects 1–4: site 1, 5–8: site 2, 9–12: site 3), Phoenix/WinNonlin showed me the finger.

❝ To do the validations for the multi-site nature of the study, we are thinking of amending the code in Power2Stage, since this is not part of the package yet. Am I correct to assume that the amendment concerns decreasing the df of the error term with the number of planned sites, …

That’s correct.
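For the model agreed with the FDA above (site, sequence, site-by-sequence, subject(site × sequence), period(site), and treatment, without a site-by-treatment term), n subjects at s sites leave n - s - 1 residual degrees of freedom, i.e., one df less per additional site than the conventional n - 2. A base-R sketch that checks this count with simulated, hypothetical data (40 subjects at 4 sites) and subjects as a fixed factor, which for complete data should give the same residual df as the mixed model with random subjects. `df_sites()` is a hypothetical helper, not a Power2Stage function.

```r
# Residual df of the multi-site 2x2x2 model: site, sequence, site:sequence,
# subject(site x sequence), period(site), treatment (no site:treatment).
df_sites <- function(n, sites) n - sites - 1

# Check with simulated (hypothetical) data: 40 subjects, 4 sites,
# 10 subjects per site, sequences alternating within each site.
set.seed(2021)
n     <- 40
sites <- 4
d <- data.frame(
  subject  = factor(rep(seq_len(n), each = 2)),
  site     = factor(rep(rep(seq_len(sites), each = n / sites), each = 2)),
  sequence = factor(rep(rep(c("TR", "RT"), length.out = n), each = 2)),
  period   = factor(rep(1:2, times = n))
)
# TR: T in period 1 and R in period 2; RT: the reverse
d$treatment <- factor(ifelse((d$sequence == "TR") == (d$period == "1"),
                             "T", "R"))
d$logPK <- rnorm(nrow(d))  # response values do not affect the df

fit <- lm(logPK ~ site + sequence + site:sequence + site:period +
            subject + treatment, data = d)
df.residual(fit)    # 35
df_sites(n, sites)  # 35
```

The same count would replace n - 2 wherever the degrees of freedom enter, including the sample-size re-estimation mentioned in the reply below.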

❝ … or is there any other part that I am missing?

You would also have to modify the degrees of freedom in the function to re-estimate the sample size (sampsiz2.R).

❝ Of course, the alternative is always to encourage (pray to) the holy trinity to update the package ;-)

Sorry. I'm afraid your prayers will not be answered.

Maurer’s method as implemented in Power2Stage already allows you to provide the degrees of freedom and the SEM of a model more complex than the conventional 2×2×2 crossover.

Modifying Potvin’s method(s) would not be a trivial job, and you would have to demonstrate control of the Type I Error.
Maurer’s controls the TIE in the strict sense and is accepted by the FDA as well.

Hence, why the efforts?
