Mikalai
★

Belarus,
2018-06-06 06:28

(edited by Mikalai on 2018-06-06 06:44)
Posting: # 18854

## Two-stage design and 'forced bioequivalence' [Two-Stage / GS Designs]

Dear all,
We plan to conduct a sequential (two-stage) BE study, and I am concerned about "forced bioequivalence". Specifically, if we obtain non-equivalent results with very low power in the first stage and have to recruit more volunteers, how can we protect ourselves from sliding into "forced bioequivalence"? In other words, how can we distinguish between an underpowered trial and genuinely non-equivalent results in a sequential BE study? And how can we put this (protection against "forced bioequivalence") into the protocol so as not to raise many questions from regulators?
Any suggestions and advice will be appreciated.
Sincerely,
Mikalai
ElMaestro
★★★

Belgium?,
2018-06-06 08:53

@ Mikalai
Posting: # 18855

## Two-stage design and 'forced bioequivalence'

Hi Mikalai,

» We plan to conduct a sequential (two-stage) BE study, and I am concerned about "forced bioequivalence". Specifically, if we obtain non-equivalent results with very low power in the first stage and have to recruit more volunteers, how can we protect ourselves from sliding into "forced bioequivalence"? In other words, how can we distinguish between an underpowered trial and genuinely non-equivalent results in a sequential BE study? And how can we put this (protection against "forced bioequivalence") into the protocol so as not to raise many questions from regulators?

If you use one of the Potvin variants (and you will do that, no discussion) then forced bioequivalence is not your biggest issue.
Bear in mind that your "very low power" is still based on a fixed GMR of, e.g., 0.95, not on the observed GMR. I don't think it is a good idea to fiddle with the Potvin-like decision trees. I mean, if you make a little, minor, innocent modification without running a series of tests of power, sample size and type I error, then all manner of hell can break loose on you. I have seen it several times now.
Forced BE is not a term that has been widely adopted by regulators. If the (true) GMR is within 80.00–125.00%, then in principle you have a product for which you can, one way or another, show BE and which is approvable. Obviously you will never know the true GMR; you can only estimate it through observations, which have a variance, hence the need for a CI.
Your two-stage approach is great if you are certain about the GMR (close to 100%) and uncertain about the CV. If you are not convinced that you have a good GMR, then keep your hands off the two-stage approach. Run like hell. 'On average' it will not work well for you.
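To see why the assumed GMR matters so much, here is a minimal base-R sketch of TOST power for a balanced 2×2×2 crossover using the non-central *t* approximation (the approach Power2Stage reports as "non-central t approx."). The function `power_tost_nct` is hypothetical and not part of any package; the 80.00–125.00% range and the example values (n = 24, CV 25%, Potvin's α of 0.0294) are assumptions for illustration. Note that this approximation is a lower bound of the exact power and can be noticeably pessimistic when n is small and power is low.

```r
# Hypothetical helper (not from PowerTOST or Power2Stage): approximate TOST
# power for a balanced 2x2x2 crossover via the non-central t distribution.
power_tost_nct <- function(alpha, CV, n, theta0,
                           theta1 = 0.80, theta2 = 1.25) {
  s  <- sqrt(log(CV^2 + 1))              # within-subject SD, log scale
  df <- n - 2                            # residual df of a 2x2x2 crossover
  se <- s * sqrt(2 / n)                  # SE of the log T/R difference
  tc <- qt(1 - alpha, df)                # one-sided critical value
  d1 <- (log(theta0) - log(theta1)) / se # non-centrality vs. lower limit
  d2 <- (log(theta0) - log(theta2)) / se # non-centrality vs. upper limit
  max(0, pt(-tc, df, ncp = d2) - pt(tc, df, ncp = d1))
}
# Interim power after 24 subjects with CV 25% at Potvin's adjusted alpha:
# assumed (fixed) GMR 0.95 vs. an observed GMR of 0.83
pow_fixed    <- power_tost_nct(0.0294, CV = 0.25, n = 24, theta0 = 0.95)
pow_observed <- power_tost_nct(0.0294, CV = 0.25, n = 24, theta0 = 0.83)
round(c(fixed = pow_fixed, observed = pow_observed), 2)
```

The same data give a moderate interim power under the fixed GMR and a very small one under the observed GMR, which is exactly why "Type 1" TSDs work with a fixed T/R-ratio.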

I could be wrong, but...

Best regards,
ElMaestro

No, of course you do not need to audit your CRO if it was inspected in 1968 by the agency of Crabongostan.
Yura
★

Belarus,
2018-06-07 08:24

@ ElMaestro
Posting: # 18859

## Two-stage design and 'forced bioequivalence'

Hi Mikalai,
'Forced bioequivalence' is a different story ("not from that opera", as we say).
Suppose you took n = 120 in a 2×2×2 design with acceptance limits of 90.00–111.11% for AUCt (narrow therapeutic index drug), CV = 16.5% and a recommended GMR of 97.5%, whereas the calculation (in R) for a 2×2×4 design gives n = 30.
The sample size is then far higher than necessary, which is not ethical: more people than necessary are exposed to the unknown effects of the test drug.
If I understand correctly, that is what 'forced bioequivalence' means.
Regards
ElMaestro
★★★

Belgium?,
2018-06-07 11:53

@ Yura
Posting: # 18863

## But what is the real problem?

Hi Yura,

you do your study as best you can, making some assumptions (good or bad) about GMR and CV.
At the end of the day you may show BE or not, and if you do, then it may be with a large or a small margin. I guess forced BE just means the margin was large, whatever that means quantitatively.
There is no real issue here. The discussions I have seen about BE consider forced BE as a hindsight phenomenon, like post-hoc power.

If someone starts fiddling so that "forced BE" is deliberately planned before a trial, then I would of course oppose it.

Remember: In principle, either the product is BE or it isn't. There just happens to be some uncertainty on the degree by which we can demonstrate it.

I could be wrong, but...

Best regards,
ElMaestro

Yura
★

Belarus,
2018-06-07 12:59

@ ElMaestro
Posting: # 18864

## But what is the real problem?

Hi ElMaestro,
Why estimate the sample size at all, then? "Take more, throw farther."
Regards
Mikalai
★

Belarus,
2018-06-07 13:47

@ Yura
Posting: # 18865

## But what is the real problem?

Good afternoon, everyone.
I would just like to clarify the situation a bit. We have a drug and do not know its CV. We take an arbitrary sample size and then calculate the CV and the actual GMR in the first stage. In the second stage, if I understand correctly, we should calculate post-hoc power and re-estimate the sample size with the data (GMR and CV) obtained in the first stage, if bioequivalence has not been shown in the first stage. But what should we do if we got a bad GMR (0.83), whatever CV, and low power (around 30%) in the first stage? In this case, as I understand it, we would have to recruit many more subjects according to our protocol. The question is: how can we avoid slipping into "forced bioequivalence"? Or should we go straight ahead and recruit this large number of subjects? What should be put into the protocol?
Best regards,
Mikalai
Helmut
★★★

Vienna, Austria,
2018-06-07 15:33

@ Mikalai
Posting: # 18866

## But what is the real problem?

Hi Mikalai,

» We have a drug and do not know its CV. We take an arbitrary sample …

Well, I always try to make an educated guess of the CV. If you aim too low in the first stage, the sample size penalty in the second stage will be larger.
Example: “Guesstimate” CV 25%, Potvin B, n1 12 or 24.

```r
library(Power2Stage)
power.tsd(CV=0.25, n1=12)
```
```
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 12; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.81438
p(BE) s1 = 0.17895
Studies in stage 2 = 81.38%

Distribution of n(total)
- mean (range) = 32.4 (12 ... 126)
- percentiles
 5% 50% 95%
 12  32  60
```
```r
power.tsd(CV=0.25, n1=24)
```
```
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
No futility criterion
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.84244
p(BE) s1 = 0.63203
Studies in stage 2 = 33.56%

Distribution of n(total)
- mean (range) = 29 (24 ... 86)
- percentiles
 5% 50% 95%
 24  24  48
```

With 12 subjects in the first stage you will on average have a total sample size of 32.4 (median 32), and with 24 only 29 (median 24). In the latter case you already have a 63% chance of showing BE in the first stage, in the former only 18%.
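For readers who want to see what happens inside such a simulation, below is a stripped-down base-R sketch of Potvin's Method B. It is not Power2Stage's code; all function names are hypothetical and several simplifications are assumptions: summary-statistic simulation of a balanced 2×2×2 crossover, the non-central *t* approximation for interim power and sample-size re-estimation (fixed GMR 0.95), a pooled stage-2 analysis with a stage term and df = n − 3, and at least 2 subjects in stage 2. Its estimates should land near Helmut's figures above but will not be identical.

```r
# Sketch (assumption): summary-level simulation of Potvin's Method B,
# 2x2x2 crossover, acceptance range 80.00-125.00%. Not Power2Stage's code.
pow_nct <- function(alpha, s, n, GMR) {        # TOST power, non-central t approx.
  df <- n - 2
  se <- s * sqrt(2 / n)
  tc <- qt(1 - alpha, df)
  max(0, pt(-tc, df, ncp = (log(GMR) - log(1.25)) / se) -
         pt( tc, df, ncp = (log(GMR) - log(0.80)) / se))
}
n_reest <- function(alpha, s, n1, GMR, target = 0.8) {
  n <- n1 + n1 %% 2                            # start at an even n >= n1
  while (pow_nct(alpha, s, n, GMR) < target) n <- n + 2
  n                                            # smallest even n reaching target
}
tost_pass <- function(pe, mse, n, df, alpha) { # 100(1-2*alpha)% CI within limits?
  hw <- qt(1 - alpha, df) * sqrt(2 * mse / n)
  (pe - hw) >= log(0.80) && (pe + hw) <= log(1.25)
}
sim_potvin_b <- function(CV, n1, theta0 = 0.95, GMR = 0.95, alpha = 0.0294,
                         nsims = 1e4, seed = 42) {
  set.seed(seed)
  s2 <- log(CV^2 + 1)                          # true log-scale variance
  pass <- stage2 <- logical(nsims)
  ntot <- rep(n1, nsims)
  for (i in seq_len(nsims)) {
    m1   <- rnorm(1, log(theta0), sqrt(2 * s2 / n1))   # stage-1 log PE
    ss1  <- s2 * rchisq(1, n1 - 2)                       # stage-1 residual SS
    mse1 <- ss1 / (n1 - 2)
    if (tost_pass(m1, mse1, n1, n1 - 2, alpha)) { pass[i] <- TRUE; next }
    if (pow_nct(alpha, sqrt(mse1), n1, GMR) >= 0.8) next # stop: not BE
    n2  <- max(n_reest(alpha, sqrt(mse1), n1, GMR) - n1, 2)
    n   <- n1 + n2
    m2  <- rnorm(1, log(theta0), sqrt(2 * s2 / n2))
    ss2 <- s2 * rchisq(1, n2 - 2)
    ssd <- (m1 - m2)^2 / (2 * (1 / n1 + 1 / n2))       # 1 df between stages
    pe  <- (n1 * m1 + n2 * m2) / n                      # pooled log PE
    mse <- (ss1 + ss2 + ssd) / (n - 3)                  # stage term: df n - 3
    stage2[i] <- TRUE
    ntot[i]   <- n
    pass[i]   <- tost_pass(pe, mse, n, n - 3, alpha)
  }
  c(pBE = mean(pass), pBE_s1 = mean(pass & !stage2),
    pct_stage2 = 100 * mean(stage2), mean_n = mean(ntot))
}
sim_potvin_b(CV = 0.25, n1 = 12)
sim_potvin_b(CV = 0.25, n1 = 24)
```

For a rough check of the Type I Error, the same sketch can be run at the upper acceptance limit, e.g. `sim_potvin_b(CV = 0.25, n1 = 12, theta0 = 1.25, nsims = 1e5)`; for anything submitted, use the validated package rather than this illustration.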

» … and then calculate the CV and real GMR in the first stage.

Yep. But in ‘Type 1’ TSDs you generally ignore the observed GMR and work with a fixed (assumed) T/R-ratio.

» In the second stage, we, if I understand correctly, should calculate post-hoc power and recalculate the sample size with the data (GMR and CV) obtained in the first stage, if bioequivalence has not been achieved in the first stage.

Nope. You calculate interim power after the first stage. If you want to use the GMR of the first stage as well (go fully adaptive) you might shoot yourself in the foot. Practically you need two futility criteria:
1. Stop if the GMR is outside [0.80, 1.25].
2. Stop if the re-estimated sample size is above a pre-specified limit (U).
The methods of Karalis & Macheras might (!) have terrible power, making their application ethically doubtful. Here it works with U = 120 if n1 = 24 but not with n1 = 12 (power only ~73%). There are alternatives where you don’t stop if n1 + n2 > U but perform the second stage in U – n1 subjects. No problem with the Type I Error, but power might be compromised; I suggest simulations.
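The two futility criteria can be sketched as a small interim-decision function in base R. This is a hypothetical illustration, not code from any package: it assumes the study has already failed BE in stage 1 with interim power below target, uses the non-central *t* approximation for the sample-size search, and the example scenario (n1 = 24, CV1 = 25%, U = 120) only loosely follows Mikalai's GMR of 0.83.

```r
# Hypothetical sketch of the interim step for a study that failed BE in
# stage 1 with insufficient interim power; not taken from any package.
pow_nct <- function(alpha, s, n, GMR) {        # TOST power, non-central t approx.
  df <- n - 2
  se <- s * sqrt(2 / n)
  tc <- qt(1 - alpha, df)
  max(0, pt(-tc, df, ncp = (log(GMR) - log(1.25)) / se) -
         pt( tc, df, ncp = (log(GMR) - log(0.80)) / se))
}
interim_decision <- function(pe1, CV1, n1, U, alpha = 0.0294,
                             GMR = 0.95, usePE = FALSE, target = 0.8) {
  # Futility 1: observed GMR outside the acceptance range
  if (pe1 < 0.80 || pe1 > 1.25)
    return(list(action = "stop (futility: PE)", n = n1))
  s   <- sqrt(log(CV1^2 + 1))
  gmr <- if (usePE) pe1 else GMR               # fully adaptive or fixed GMR
  n   <- n1 + n1 %% 2
  # Search the total n; give up as soon as it would exceed U
  while (pow_nct(alpha, s, n, gmr) < target && n <= U) n <- n + 2
  # Futility 2: re-estimated total sample size above the pre-specified limit
  if (n > U) return(list(action = "stop (futility: n > U)", n = n))
  list(action = "continue to stage 2", n = n, n2 = n - n1)
}
# Assumed scenario: PE 0.83 after 24 subjects with CV 25%, U = 120
str(interim_decision(pe1 = 0.83, CV1 = 0.25, n1 = 24, U = 120))
str(interim_decision(pe1 = 0.83, CV1 = 0.25, n1 = 24, U = 120, usePE = TRUE))
```

With the fixed GMR the study continues with a modest second stage; going fully adaptive with the observed 0.83 drives the required n past U, so the second criterion stops the study instead of recruiting hundreds of subjects.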

» But what should we do if we got a bad GMR (0.83), whatever CV, and low power (around 30%) in the first stage?

You are free to include futility criteria for early stopping in the method. You don’t have to worry about the adjusted α because any futility criterion decreases the patient’s risk.

» In this case, as I understand, we should recruit much more subjects according to our protocol.

In general you should not give a total sample size in the protocol – unless it is part of the framework (simulations recommended: keep an eye on power).

If you are courageous, try the Inverse-Normal Combination Method / Maximum Combination Test (Maurer et al. 2018).
Pro: Proven to preserve the Type I Error (makes regulatory statisticians happy).
Con: Might be the first time they ever have seen sumfink like this. Expect questions.
Example: Like above but with two futility criteria: stop if the GMR is outside [0.80, 1.25] or if the total sample size would exceed 120. The GMR observed in the first stage is used (fully adaptive).

```r
library(Power2Stage)
power.tsd.in(CV=0.25, n1=24, usePE=TRUE, fClower=0.8, fCupper=1.25,
             fCNmax=120, fCrit=c("PE", "Nmax"))
```
```
TSD with 2x2 crossover
Inverse Normal approach
 - maximum combination test (weights = 0.5 0.25)
 - alpha (s1/s2) = 0.02635 0.02635
 - critical value (s1/s2) = 1.93741 1.93741
 - with conditional error rates and conditional power
Overall target power = 0.8
Threshold in power monitoring step for futility = 0.8
Power calculation via non-central t approx.
CV1 and PE1 in sample size est. used
Futility criterion Nmax = 120
Futility criterion PE outside 0.8 ... 1.25
Minimum sample size in stage 2 = 4
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.84312
p(BE) s1 = 0.60784
Studies in stage 2 = 30.03%

Distribution of n(total)
- mean (range) = 31.3 (24 ... 120)
- percentiles
 5% 50% 95%
 24  24  68
```
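The core of the inverse-normal approach is the combination of stage-wise one-sided p-values, Z = √w·Φ⁻¹(1 − p₁) + √(1 − w)·Φ⁻¹(1 − p₂); the maximum combination test evaluates two stage-1 weights (0.5 and 0.25 in the output above) and takes the larger statistic, compared against a critical value that accounts for their correlation (1.93741 above). The base-R sketch below only shows this combination step. It assumes the stage-wise p-values for each of the two TOST hypotheses are already available and ignores conditional error rates and sample-size re-estimation; the function names and example p-values are hypothetical.

```r
# Sketch (assumption): combine stage-wise one-sided p-values of ONE TOST
# hypothesis with the inverse-normal method; w is the stage-1 weight.
inv_normal_z <- function(p1, p2, w) {
  sqrt(w) * qnorm(1 - p1) + sqrt(1 - w) * qnorm(1 - p2)
}
# Maximum combination test: the larger of the weighted statistics
max_comb_z <- function(p1, p2, weights = c(0.5, 0.25)) {
  max(vapply(weights, function(w) inv_normal_z(p1, p2, w), numeric(1)))
}
# BE requires both one-sided nulls (ratio <= 0.80, ratio >= 1.25) rejected
be_max_comb <- function(p1_lo, p2_lo, p1_hi, p2_hi, crit = 1.93741) {
  max_comb_z(p1_lo, p2_lo) > crit && max_comb_z(p1_hi, p2_hi) > crit
}
1 - pnorm(1.93741)            # nominal one-sided alpha, about 0.02635
max_comb_z(p1 = 0.10, p2 = 0.01)
be_max_comb(p1_lo = 0.10, p2_lo = 0.01, p1_hi = 0.001, p2_hi = 0.002)
```

The first line reproduces the relation between the printed critical value and the printed stage-wise α of 0.02635.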

» The question is: how can we avoid slipping into "forced bioequivalence"? Or should we go straight ahead and recruit this large number of subjects? What should be put into the protocol?

As ElMaestro wrote above, I don’t see how you could run into “forced BE”.
• No defined total sample size. You perform the second stage in the re-estimated sample size.
• If at the end the CV is lower and/or the GMR “better” than assumed, be happy. Post hoc power is irrelevant. Open a bottle of champagne.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Mikalai
★

Belarus,
2018-06-08 10:24

@ Helmut
Posting: # 18871

## But what is the real problem?

Dear Helmut,
Thank you very much for your explanation. Could you, please, clarify a bit further?

» Nope. You calculate interim power after the first stage. If you want to use the GMR of the first stage as well (go fully adaptive) you might shoot yourself in the foot. Practically you need two futility criteria:
» 1. Stop if the GMR is outside [0.80, 1.25].
» 2. Stop if the re-estimated sample size is above a pre-specified limit (U).
» The methods of Karalis & Macheras might (!) have terrible power, making their application ethically doubtful. Here it works with U = 120 if n1 = 24 but not with n1 = 12 (power only ~73%). There are alternatives where you don’t stop if n1 + n2 > U but perform the second stage in U – n1 subjects. No problem with the Type I Error, but power might be compromised; I suggest simulations.

Are there any rules or recommendations for setting up the pre-specified limit (U) as a futility criterion?
Regards,
Mikalai
Helmut
★★★

Vienna, Austria,
2018-06-08 12:00

@ Mikalai
Posting: # 18874

## U as a futility criterion

Hi Mikalai,

» Are there any rules or recommendations for setting up the pre-specified limit (U) as a futility criterion?

Not really. You have to find a balance between the maximum study costs you are willing to accept and the potential loss in power. Xu et al.* recommend a futility criterion of 42 on ntotal for CV ≤ 30% and of 180 for CV > 30%. Generally, a small stage 1 sample size is not a good idea.

```r
library(Power2Stage)
power.tsd.fC(method="B", alpha=c(0.0249, 0.0357), CV=0.25, n1=24,
             fCrit="CI", fClower=0.9374, max.n=42) # fixed GMR 0.95
```
```
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0249 0.0357
Interim power monitoring step included
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and GMR = 0.95 in sample size est. used
Maximum sample size max.n = 42
Futility criterion 90% CI outside 0.9374 ... 1.06678
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24; GMR = 0.95

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.83087
p(BE) s1 = 0.6057
Studies in stage 2 = 33.2%

Distribution of n(total)
- mean (range) = 27.9 (24 ... 42)
- percentiles
 5% 50% 95%
 24  24  42
```
```r
power.tsd.fC(method="B", alpha=c(0.0249, 0.0357), CV=0.25, n1=24,
             fCrit="CI", fClower=0.9374, max.n=42, usePE=TRUE) # fully adaptive
```
```
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0249 0.0357
Interim power monitoring step included
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and PE1 in sample size est. used
Maximum sample size max.n = 42
Futility criterion 90% CI outside 0.9374 ... 1.06678
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24

1e+05 sims at theta0 = 0.95 (p(BE) = 'power').
p(BE)    = 0.87839
p(BE) s1 = 0.6057
Studies in stage 2 = 33.2%

Distribution of n(total)
- mean (range) = 30 (24 ... 42)
- percentiles
 5% 50% 95%
 24  24  42
```
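The CI-based futility rule printed above ("90% CI outside 0.9374 ... 1.06678") can be sketched in a few lines of base R. The helper `ci_futile` is hypothetical, assumes a balanced 2×2×2 crossover, and interprets "outside" as the whole stage-1 CI lying beyond one of the limits; the upper limit is simply the reciprocal of 0.9374.

```r
# Hypothetical helper: stage-1 futility check on the 90% CI of the GMR.
# Assumption: futility if the whole CI lies below 0.9374 or above 1/0.9374.
ci_futile <- function(pe1, CV1, n1, lower = 0.9374, level = 0.90) {
  upper <- 1 / lower                           # 1.06678, as printed above
  se <- sqrt(log(CV1^2 + 1)) * sqrt(2 / n1)    # SE of log PE, 2x2x2 crossover
  hw <- qt(1 - (1 - level) / 2, n1 - 2) * se   # CI half-width (log scale)
  ci <- exp(log(pe1) + c(-1, 1) * hw)
  ci[2] < lower || ci[1] > upper               # entirely outside the range?
}
ci_futile(pe1 = 0.85, CV1 = 0.25, n1 = 24)   # CI about 0.75-0.96: FALSE (continue)
ci_futile(pe1 = 0.80, CV1 = 0.25, n1 = 24)   # CI about 0.71-0.90: TRUE (stop)
```

Combined with the cap `max.n = 42`, this is what keeps the maximum total sample size in the outputs above at 42.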

Remember that if you deviate from one of the published methods (except by adding a futility criterion which leads to early stopping), you have to assess the Type I Error. The setting above is fine:

```r
power.tsd.fC(method="B", alpha=c(0.0249, 0.0357), CV=0.25, n1=24,
             fCrit="CI", fClower=0.9374, max.n=42, usePE=TRUE,
             theta0=1.25)
```
```
TSD with 2x2 crossover
Method B: alpha (s1/s2) = 0.0249 0.0357
Interim power monitoring step included
Target power in power monitoring and sample size est. = 0.8
Power calculation via non-central t approx.
CV1 and PE1 in sample size est. used
Maximum sample size max.n = 42
Futility criterion 90% CI outside 0.9374 ... 1.06678
BE acceptance range = 0.8 ... 1.25

CV = 0.25; n(stage 1) = 24

1e+06 sims at theta0 = 1.25 (p(BE) = TIE 'alpha').
p(BE)    = 0.045069
```

The maximum inflation of the TIE is often observed at combinations of small n1 and low CV. The minimum n1 for Xu’s method is 18. With CV 10% we get a TIE of 0.035744.
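A simulated TIE is itself an estimate with Monte Carlo error, so "fine" means "not significantly above 0.05". A minimal base-R sketch of a one-sided significance limit, using the normal approximation to the binomial, is shown below; the helper name is hypothetical, and treating the CV 10% result as also coming from 1e6 simulations is an assumption.

```r
# Sketch: one-sided significance limit for a TIE estimated from nsims
# simulated studies (normal approximation to the binomial).
tie_limit <- function(nsims, alpha = 0.05, level = 0.95) {
  alpha + qnorm(level) * sqrt(alpha * (1 - alpha) / nsims)
}
round(tie_limit(1e6), 5)     # 0.05036
# TIEs reported above (assumption: both from 1e6 simulations)
tie <- c(CV25 = 0.045069, CV10 = 0.035744)
tie <= tie_limit(1e6)        # both TRUE: no significant inflation
```

With only 1e5 simulations the limit is wider, which is why TIE assessments are usually run with 1e6 simulations.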

* Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr AF, Potvin D, Schuirmann DJ. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016;15(1):15–27. doi:10.1002/pst.1721.

Dif-tor heh smusma 🖖
Helmut Schütz
