## TSD statistical model - with multiple sites [Two-Stage / GS Designs]

» And if I deduced correctly, this helps in that, at least for the **FDA** statistical model for TSD, we can omit the interaction term and always combine stage data.

»

» We will conduct the study at multiple sites, which adds complexity to the statistical models to be used:

Confirmed. I had a ‘Type A’ meeting with the FDA last March. They agreed that the stupid site-by-treatment interaction can be dropped (like any pre-test, it inflates the Type I Error<sup>1</sup>). The model was like yours:

- *site*,
- *sequence*,
- *treatment*,
- *subject (nested within site × sequence)*,
- *period (nested within site)*, and
- *site-by-sequence interaction*,

where *subject (nested within site × sequence)* is a random effect and all other effects are fixed.
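Written as an equation, the model above could be sketched as follows. The notation is mine and only for illustration: *i* indexes sites, *j* sequences, *k* periods, *l* subjects, and *d*[*j*,*k*] is the treatment given in period *k* of sequence *j*; the response is the log-transformed PK metric.

$$
\log Y_{ijkl} = \mu + \gamma_i + \beta_j + (\gamma\beta)_{ij} + \pi_{k(i)} + \tau_{d[j,k]} + s_{l(ij)} + \varepsilon_{ijkl},
\qquad s_{l(ij)} \sim N(0,\sigma_s^2),\quad \varepsilon_{ijkl} \sim N(0,\sigma_w^2)
$$

There is deliberately no site-by-treatment term $(\gamma\tau)_{i\,d[j,k]}$.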

Of course, we proposed Maurer’s method. Note that there is no stage term in the model because the interim (IA) and final analysis (FA) are evaluated separately (though all information is used in the FA via the repeated confidence intervals).

In practice, run the mixed model in both stages. You need the actual values of *n*<sub>1</sub>, *CV*<sub>1</sub>, *GMR*<sub>1</sub>, *df*<sub>1</sub>, and *SEM*<sub>1</sub>, and in the (optional) FA additionally *n*<sub>2</sub>, *CV*<sub>2</sub>, *GMR*<sub>2</sub>, *df*<sub>2</sub>, and *SEM*<sub>2</sub>.
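How these stage-wise quantities hang together can be sketched in base R with the textbook relations of a 2×2 crossover: MSE = log(*CV*² + 1) on the log scale and SEM = √(MSE/2 · (1/*n*<sub>a</sub> + 1/*n*<sub>b</sub>)) for the two sequence group sizes. The balanced 38/38 split of the 76 subjects is my assumption. In the multi-site mixed model take *SEM* and *df* directly from the fitted model, since the site terms change the degrees of freedom (in the example further down *df*<sub>1</sub> = 65, not 74).

```r
# Textbook 2x2 crossover relations (assumption: balanced sequences, log scale)
CV1   <- 0.4237714285                    # stage 1 CV from the example
n_seq <- c(38, 38)                       # hypothetical split of n1 = 76 into sequences
mse   <- log(CV1^2 + 1)                  # residual (within-subject) variance, log scale
SEM1  <- sqrt(mse / 2 * sum(1 / n_seq))  # SE of the T-R difference in log means
CV_back <- sqrt(exp(mse) - 1)            # inverse relation: CV from MSE
cat(sprintf("MSE = %.6f, SEM1 = %.8f, CV (back) = %.6f\n", mse, SEM1, CV_back))
```

With these inputs the computed SEM agrees with the *SEM*<sub>1</sub> = 0.06592665941 of the example to at least six decimals, which suggests that the simulated first stage was balanced.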

» » It has been implemented in the R package `Power2Stage` since April 2018.

»

» Indeed, we used the R package `Power2Stage` for the calculations when discussing the approach with the FDA. These packages are a lifesaver, THX especially to Detlew Labes and Benjamin Lang.

» Regardless, the FDA still requires us to submit simulations on the validated model to justify our "specific" TSD approach. We still need to figure out what this means.

An example (simulated data of a study which proceeds to the second stage):

```r
library(Power2Stage)
# defaults used:
#   alpha       = 0.05
#   theta1      = 0.80
#   theta2      = 1.25
#   targetpower = 0.80
n1   <- 76
CV1  <- 0.4237714285
GMR1 <- 0.8818736281
df1  <- 65
SEM1 <- 0.06592665941
# values which are not the defaults
interim.tsd.in(weight = 0.80,
               max.comb.test = FALSE,
               GMR = 0.95, usePE = TRUE,
               min.n2 = 6, max.n = 140,
               n1 = n1, GMR1 = GMR1, CV1 = CV1,
               df1 = df1, SEM1 = SEM1, fCrit = "PE",
               ssr.conditional = "error_power",
               pmethod = "exact")
```

```
TSD with 2x2 crossover
Inverse Normal approach
 - Standard combination test with weight for stage 1 = 0.8
 - Significance levels (s1/s2) = 0.03585 0.03585
 - Critical values (s1/s2) = 1.80107 1.80107
 - BE acceptance range = 0.8 ... 1.25
 - Observed point estimate from stage 1 is used for SSR
 - With conditional error rates and conditional estimated target power

Interim analysis after first stage
- Derived key statistics:
  z1 = 1.46015, z2 = 4.80735
  Repeated CI = (0.78160, 0.99501)
  Median unbiased estimate = NA
- No futility criterion met
- Test for BE not positive (not considering any futility rule)
- Calculated n2 = 6
- Decision: Continue to stage 2 with 6 subjects
```

```r
n2   <- c(3, 2) # six dosed, one dropout
CV2  <- 0.5761171133
GMR2 <- 1.302483215
df2  <- 3
SEM2 <- 0.2319825004
final.tsd.in(weight = 0.80,
             max.comb.test = FALSE,
             n1 = n1, GMR1 = GMR1, CV1 = CV1,
             df1 = df1, SEM1 = SEM1,
             n2 = n2, GMR2 = GMR2, CV2 = CV2,
             df2 = df2, SEM2 = SEM2)
```

```
TSD with 2x2 crossover
Inverse Normal approach
 - Standard combination test with weight for stage 1 = 0.8
 - Significance levels (s1/s2) = 0.03585 0.03585
 - Critical values (s1/s2) = 1.80107 1.80107
 - BE acceptance range = 0.8 ... 1.25

Final analysis after second stage
- Derived key statistics:
  z1 = 1.98949, z2 = 4.22696
  Repeated CI = (0.81071, 1.03975)
  Median unbiased estimate = 0.9179
- Decision: BE achieved
```
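To see where the printed key statistics come from, here is a base-R sketch that recomputes them from the inputs above. It assumes (my reading of the inverse normal method, not code taken from `Power2Stage`) that in each stage the one-sided t-test p-values for H01 (GMR ≤ θ1) and H02 (GMR ≥ θ2) are converted to z-scores, and that the final statistics combine the stages with weights √*w* and √(1 − *w*). `stage_z()` is a hypothetical helper. The repeated CI after the FA needs an inversion of the combination test and is not reproduced here.

```r
# Hypothetical helper: stage-wise z-scores for H01 (GMR <= theta1) and
# H02 (GMR >= theta2) from one-sided t-tests on the log scale
stage_z <- function(GMR, SEM, df, theta1 = 0.80, theta2 = 1.25) {
  t_lo <- (log(GMR) - log(theta1)) / SEM
  t_hi <- (log(theta2) - log(GMR)) / SEM
  p    <- pt(c(t_lo, t_hi), df, lower.tail = FALSE)       # one-sided p-values
  setNames(qnorm(p, lower.tail = FALSE), c("z1", "z2"))   # p-values -> z-scores
}
w       <- 0.80      # weight of stage 1, as in the calls above
alpha_s <- 0.03585   # per-stage significance level from the output
# interim: stage 1 only, plus the repeated CI at 1 - 2 * alpha_s
s1  <- stage_z(GMR = 0.8818736281, SEM = 0.06592665941, df = 65)
ci1 <- exp(log(0.8818736281) + c(-1, 1) * qt(1 - alpha_s, 65) * 0.06592665941)
# final: stage 2 data alone, then the weighted inverse normal combination
s2    <- stage_z(GMR = 1.302483215, SEM = 0.2319825004, df = 3)
z_fin <- sqrt(w) * s1 + sqrt(1 - w) * s2
print(round(s1, 5)); print(round(ci1, 5)); print(round(z_fin, 5))
cat("BE in FA:", all(z_fin > qnorm(1 - alpha_s)), "\n")
```

In my hand calculations this agrees with the printed values to about three decimals; BE requires both z-values to exceed the critical value 1.80107, which fails in the IA (*z*<sub>1</sub> ≈ 1.46) and succeeds in the FA.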

*n*<sub>1</sub>. Due to the nature of the drug, reference-scaling was not an option. It was a formulation change, we had pilot data, and therefore we assumed a GMR of 0.95 (and not 0.90 as usual for HVDs). We opted for the Standard Combination Test with a weight of 0.80 because it was expected to give us the highest power already in the IA. We went all in (fully adaptive: sample size re-estimation based on *CV*<sub>1</sub> and *GMR*<sub>1</sub>). We set a minimum stage 2 sample size of six (the method’s default is four, and it still ‘works’ with three if not all are in the same sequence) because we didn’t want the model to collapse. We also set a maximum total sample size of 140 and a futility criterion on the PE in the IA.
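The per-stage significance level of the SCT can be recomputed for any weight. A base-R sketch, assuming equal (Pocock-type) critical values in both stages: under the null, the stage 1 statistic *Z*<sub>1</sub> and the combined statistic √*w*·*Z*<sub>1</sub> + √(1 − *w*)·*Z*<sub>2</sub> are bivariate normal with correlation √*w*, and the critical value *c* is chosen so that the probability that either exceeds *c* is 0.05. `crit_sct()` is a hypothetical helper of mine, not the `Power2Stage` implementation.

```r
# Hypothetical helper: common critical value c of the standard combination test
# so that P(Z1 > c or sqrt(w)*Z1 + sqrt(1 - w)*Z2 > c) = alpha (one-sided)
crit_sct <- function(w, alpha = 0.05) {
  r <- sqrt(w); s <- sqrt(1 - w)
  # P(Z1 <= c and combined <= c): integrate over z1, conditioning on Z1 = z
  p_accept <- function(c)
    integrate(function(z) dnorm(z) * pnorm((c - r * z) / s),
              lower = -Inf, upper = c, rel.tol = 1e-8)$value
  # Root lies between the unadjusted and the Bonferroni critical value
  uniroot(function(c) 1 - p_accept(c) - alpha,
          interval = c(qnorm(1 - alpha), qnorm(1 - alpha / 2)),
          tol = 1e-10)$root
}
for (w in c(0.50, 0.80)) {
  cv <- crit_sct(w)
  cat(sprintf("w = %.2f: critical value = %.5f, alpha_adj = %.5f\n",
              w, cv, pnorm(cv, lower.tail = FALSE)))
}
```

For *w* = 0.80 it should land close to the 1.80107 (0.03585) of the output, and for *w* = 0.50 at about 1.875 (≈ 0.0304), so the higher stage 1 weight comes with a less strict per-stage level.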

Yes, we performed lots of simulations to show that our setup is reasonable… To give you an idea:

- Compare the Maximum Combination Test (weights 0.50|0.20–0.80) and the Standard Combination Test (weight 0.20–0.80) in terms of stopping for futility and power in the IA and FA. Based on that we opted for the SCT with a weight of 0.80.
- Impact of dropouts in the first stage. We wanted to dose 96 and expected to have 76 eligible. Hence, we assessed zero (*n*<sub>1</sub> = 96) to 26 dropouts (*n*<sub>1</sub> = 70).
- Reproducibility of simulations (20 runs with random seeds).
- Power in the first stage dependent on the number of sites. With every additional site you lose one degree of freedom. In our case not an issue (relatively high *n*<sub>1</sub> and ≤16 sites).
- Probability that the FA is not possible due to more dropouts than anticipated. With our setup it was ~0.2%. However, we had a condition in the protocol that in such a case more subjects have to be recruited. The probability decreased exponentially with the number of subjects dosed in the second stage: with 12 it was just 0.001%.
- Interim and final power for *n*<sub>1</sub> 70–96 and *CV*<sub>1</sub> 0.25–0.75 (based on the pilot we expected 0.40).
- Although the *TIE* is strictly controlled, simulations of the empiric *TIE* for *CV*<sub>1</sub> 0.25–0.75, each with *n*<sub>1</sub> 70–96.
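The degrees-of-freedom bullet can be made concrete. Assuming the model above (period nested within site, no site-by-treatment term), the within-subject error df of a 2×2 design are *n* − 1 − (number of sites): one df goes to treatment and one period contrast to each site. This bookkeeping is mine, not the FDA’s; under it, *df*<sub>1</sub> = 65 with *n*<sub>1</sub> = 76 in the example would correspond to ten sites. `df_resid()` is a hypothetical helper.

```r
# Hypothetical helper: residual df of the multi-site 2x2 model
# (n within-subject contrasts minus treatment minus one period effect per site)
df_resid <- function(n, sites) n - 1 - sites
alpha_s <- 0.03585                 # per-stage level of the SCT with w = 0.80
sites   <- c(1, 4, 10, 16)
tab <- data.frame(sites  = sites,
                  df     = df_resid(76, sites),
                  t_crit = qt(1 - alpha_s, df_resid(76, sites)))
print(tab, digits = 5)             # t critical value grows slowly with sites
```

With *n*<sub>1</sub> = 76 the critical t-value moves only in the second decimal between one and 16 sites, which is why the loss of df was not an issue here.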

» » […] a deficiency letter of a European agency where a study (passing BE with ‘Method B’ already in the first stage) was not accepted. Passed BE with the exact method as well…

»

» But 'Method B' success in Stage 1 means you were already within the BE limits with even wider intervals …

Yep.

» … (i.e. even smaller patient risk)!

Not necessarily. If you accept that ‘Method B’ is the only one (before Maurer’s paper I preferred ‘Method C’), the patient’s risk depends on *n*<sub>1</sub> and *CV*<sub>1</sub>. In some cases (early stopping for success in the IA, or in the FA with a high *n*<sub>2</sub>) it can be as low as α<sub>adj</sub>. In cases with a ~50% chance to proceed to stage 2 it can approach (though not exceed) the nominal α. The maximum empiric *TIE* is generally observed at combinations of small *n*<sub>1</sub> and low to moderate *CV*<sub>1</sub>.

```r
library(Power2Stage)
n1 <- 12   # location of the
CV <- 0.24 # maximum TIE
TIE <- power.tsd(method = "B",
                 alpha = rep(0.0294, 2),
                 CV = CV, n1 = n1,
                 theta0 = 1.25,
                 pmethod = "exact",
                 nsims = 1e6)$pBE # takes a couple of minutes!
cat(paste0("Maximum empiric TIE (1,116 scenarios: n1 12\u201372, ",
           "CV 10\u201380%)", "\nat n1 = ", n1, " and CV = ", 100 * CV,
           "%: ", TIE, "\n"))
```

```
Maximum empiric TIE (1,116 scenarios: n1 12–72, CV 10–80%)
at n1 = 12 and CV = 24%: 0.048925
```

» Cannot imagine why someone would reject this?

See there. Just bullshit. The α<sub>adj</sub> = 0.0294 selected by Potvin *et al.* was arbitrary and not *‘derived’* from Pocock’s Group-Sequential Design for superiority [*sic*] testing (*fixed N* and IA at *N*/2). That’s a widespread misconception. It was no more than a lucky punch. It can be shown that α<sub>adj</sub> = 0.0301 controls the *TIE* as well. Comparison of the study:

$$\small{\begin{array}{llrcc}
\hline
\text{Evaluation} & \text{PK metric} & \alpha_\textrm{adj} & \text{CI} & TIE_\textrm{emp} \\
\hline
\text{Method B} & C_\text{max} & 0.02940 & 91.54-124.84\% & 0.04478 \\
 & AUC_\text{0-t} & 0.02940 & 95.38-118.06\% & 0.03017 \\
\text{modif. Method B} & C_\text{max} & 0.03010 & 91.62-124.72\% & 0.04573 \\
 & AUC_\text{0-t} & 0.03010 & 91.62-117.99\% & 0.03080 \\
\text{Standard Comb. Test} & C_\text{max} & \sim0.03037 & 91.65-124.68\% & 0.04816 \\
 & AUC_\text{0-t} & \sim0.03037 & 94.46-117.96\% & 0.03322 \\
\hline
\end{array}}$$

The confidence intervals with the modified ‘Method B’ are similar to the ones obtained by the Inverse Normal Combination Method / SCT, thus confirming that the original ‘Method B’ is already overly conservative. Even in ‘borderline’ cases like this one, the patient’s risk is not compromised if the study is evaluated by ‘Method B’. So what?
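Where the number 0.0294 does come from can be shown with a base-R sketch of Pocock’s setting: a two-sided superiority test with fixed *N*, one interim look at *N*/2, and overall α = 0.05. The cumulative z-statistics at *N*/2 and *N* are bivariate normal with correlation √½, and one common critical value is used at both looks. `pocock2()` is a hypothetical helper. Nothing in this setting involves sample size re-estimation or two one-sided tests, which is why borrowing the number for a TSD is a lucky punch rather than a derivation.

```r
# Hypothetical helper: Pocock's common two-sided critical value for two
# equally spaced looks (IA at N/2, FA at N) and overall alpha
pocock2 <- function(alpha = 0.05) {
  r <- sqrt(0.5); s <- sqrt(0.5)   # Z_N = r * Z_(N/2) + s * (independent increment)
  # P(|Z_(N/2)| <= c and |Z_N| <= c), i.e. the study never rejects
  p_continue <- function(c)
    integrate(function(z) dnorm(z) *
                (pnorm((c - r * z) / s) - pnorm((-c - r * z) / s)),
              lower = -c, upper = c, rel.tol = 1e-8)$value
  uniroot(function(c) 1 - p_continue(c) - alpha,
          interval = c(qnorm(1 - alpha / 2), qnorm(1 - alpha / 4)),
          tol = 1e-10)$root
}
c_P <- pocock2()
cat(sprintf("Pocock critical value = %.4f, nominal two-sided alpha = %.4f\n",
            c_P, 2 * pnorm(c_P, lower.tail = FALSE)))
```

It should return a critical value of about 2.178, i.e., a nominal two-sided level of about 0.0294 per look.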

Edit (a couple of hours later): Perhaps I’m to blame that the FDA asked you for simulations. Backstory: Originally we wanted to go with a variant of ‘Method C’ because it’s slightly more powerful (esp. when you expect to stop in the IA with BE) and it is preferred by the FDA.<sup>2,3</sup> However, that meant a lot of simulations to find a suitable α<sub>adj</sub> (implementing futility criteria which don’t compromise power is not that easy in simulation-based methods). Then I discovered a goody by FDA authors.<sup>4</sup> Hey, they know Maurer’s paper! It was a game-changer.

However, in the meeting I got the impression that nobody had ever submitted such a protocol to the FDA. They were happy with what I presented, though it ended in a nightmare. The study was in patients, and recruitment was difficult even in a country with 1.38 billion people. The standard treatment regimen has to be followed, and we expected 15% to be excluded due to pre-dose concentrations > 5% of *C*<sub>max</sub>. That would have been our problem (loss of power, increased producer’s risk). Reply: ‘A washout of less than 5 times *t*<sub>½</sub> in *any* of the patients is not acceptable. Use a parallel design.’ Roughly 200 patients per arm. My client is still trying to recover from this shock.

1. European Medicines Agency, CHMP. *Guideline on adjustment for baseline covariates in clinical trials.* London. 26 February 2015. EMA/CHMP/295050/2013.
2. Davit B, Braddy AC, Conner DP, Yu LX. *International Guidelines for Bioequivalence of Systemically Available Orally Administered Generic Drug Products: A Survey of Similarities and Differences.* AAPS J. 2013; 15(4): 974–90. doi:10.1208/s12248-013-9499-x.
3. Tsang YC, Brandt A (moderators). *Session III: Scaling Procedure and Adaptive Design(s) in BE Assessment of Highly Variable Drugs.* EUFEPS/AAPS 2<sup>nd</sup> International Conference of the Global Bioequivalence Harmonization Initiative. Rockville, MD. 14–16 September 2016.
4. Lee J, Feng K, Xu M, Gong X, Sun W, Kim J, Zhang Z, Wang M, Fang L, Zhao L. *Applications of Adaptive Designs in Generic Drug Development.* Clin Pharmacol Ther. 2020; 110(1): 32–5. doi:10.1002/cpt.2050.

*Dif-tor heh smusma*🖖

Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮


### Complete thread:

- Two Stage Desing: ANOVA earlybird 2009-10-16 11:25 [Two-Stage / GS Designs]
- Potvin et al: effects in stage 2 Helmut 2009-10-16 13:32
- Potvin et al: effects in stage 2 d_stat 2021-07-19 16:26
- Forget simulation-based TSDs for 2×2×2 in Europe Helmut 2021-07-19 17:22
- TSD statistical model - with multiple sites d_stat 2021-07-20 13:58
- TSD statistical model - with multiple sites Helmut 2021-07-20 21:52
- TSD statistical model - with multiple sites d_stat 2021-10-11 17:48
- TSD statistical model - with multiple sites Helmut 2021-10-12 11:27
