Helmut
Vienna, Austria
2015-05-06 04:17
Posting: # 14753

 TSDs for RSABE/ABEL [Two-Stage / GS Designs]

To whom it may concern!

Since Two-Stage Designs are mentioned in various guidelines (EMA, FDA, …), sponsors want to combine them with options for HVDs/HVDPs (EMA: ABEL, FDA: RSABE). I do understand that this is a tempting idea. What I have heard at recent conferences gives me the impression that sponsors have already performed such studies. I don’t know whether any one of them was accepted. I know of one leading to a deficiency letter.

I will summarize the caveats.
  • Reference-scaling might lead to an inflation of the type I error (TIE) – caused by modifying the Null Hypothesis “in the face of the data”. The method is intrinsically sequential:
    • Estimate the intra-subject variability of the reference.
    • If justified, widen the acceptance range (EMA) or assess the linearized criterion (FDA).
  • Chow and Liu1 state:
    […] the equivalence limits for the scaled ABE involve the unknown intra-subject variability. As a result, the equivalence limits become random variables and are not fixed constants and the variability introduced by scaling will have an impact on the type I error rate. Therefore, the attempt to use the scaled ABE for resolution of high variable drug products will face the similar issues and challenges that the individual and population bioequivalence encountered in the 1990s […]. These issues must be satisfactorily addressed before the scaled ABE can be implemented into regulatory framework.
  • Inflation of the TIE was already reported by Endrényi and Tóthfalusi2 and recently demonstrated by Wonnemann et al.3 for the EMA’s method by means of simulations.
  • So far Two-Stage Sequential methods have been published only for 2×2×2 crossovers and studies in a two-group parallel design. See the review4 for references. With one exception (“Type 2” TSDs showing BE in the first stage with at least the target power) all of them require an adjustment of the nominal α in order to maintain the overall TIE at ≤0.05.
    Though attempts have been made to adjust α based on the first stage’s data,5,6 regulatory acceptance is problematic, since – at least for the EMA – the adjusted level has to be pre-specified in the protocol.
  • Until somebody comes up with a clever algo and publishes it, the combination of TSDs with reference-scaling remains an open issue. It boils down to a Radio Yerevan’s answer: “In principle yes, but you have to demonstrate that the patient’s risk is maintained.”
    If one wants to pre-specify an adjusted α (GL…) one has to (empirically) find one which keeps the TIE ≤0.05 for all possible combinations of stage 1 sample sizes and CVs. Currently I use a matrix with ~1,000 combos (n1 and CV with a step size of 2). For every grid-point one has to simulate 10⁶ BE-studies for the TIE (slow convergence) and 10⁵ for power. This requires 1.1×10⁹ simulated studies.
    Contrary to 2×2×2 and parallel designs – where power can be directly calculated – for scaling we need simulations, since the limits are not fixed, we have a restriction on the GMR, and an upper cap at 50% for the EMA. The convergence is fine, but we still need 10⁵ simulations within every simulated study. This leads to ~10¹⁴ (110,000,000,000,000‼) simulations overall.
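    In round numbers, the bookkeeping above can be tallied in a few lines (a hypothetical Python sketch; the counts are taken from the paragraph, the script itself is only an illustration):

```python
# Tally of the simulation counts described above (illustrative only).
grid_points = 1_000            # ~combinations of n1 and CV (step size 2)
sims_tie    = 1_000_000        # 1e6 simulated studies per grid point (TIE)
sims_power  = 100_000          # 1e5 simulated studies per grid point (power)

studies = grid_points * (sims_tie + sims_power)
print(f"simulated studies: {studies:.1e}")    # 1.1e+09

inner = 100_000                # 1e5 simulations within every simulated study
total = studies * inner
print(f"total simulations: {total:.1e}")      # 1.1e+14
```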
    To give you an idea:

    run0 <- 1e7; run  <- 1:run0
    ptm  <- proc.time() # time the loop's overhead
    for(j in seq_along(run)) { }
    t0   <- as.numeric(1000*(proc.time()[3]-ptm[3])/run0)
    run1 <- 1e4; run  <- 1:run1
    ptm  <- proc.time()
    for(j in seq_along(run)) {
      power.TOST(CV=0.4, theta0=0.9, design="2x2x2", n=134, method="nct") }
    t1   <- as.numeric(1000*(proc.time()[3]-ptm[3])/run1-t0)
    run2 <- 1e3; run  <- 1:run2
    ptm  <- proc.time() # ABEL (EMA)
    for(j in seq_along(run)) {
      power.scABEL(CV=0.4, theta0=0.9, design="2x2x4", n=30, nsims=1e5) }
    t2   <- as.numeric(1000*(proc.time()[3]-ptm[3])/run2-t0)
    run3 <- 1e3; run  <- 1:run3
    ptm  <- proc.time() # RSABE (FDA)
    for(j in seq_along(run)) {
      power.RSABE(CV=0.4, theta0=0.9, design="2x2x4", n=24, nsims=1e5) }
    t3   <- as.numeric(1000*(proc.time()[3]-ptm[3])/run3-t0)
    cat("Runtimes of PowerTOST's functions:\n",
    sprintf("%s %5.2f %s", "2x2x2 ABE        :", t1, "ms\n"),
    sprintf("%s %5.2f %s (%1.0fx slower)%s", "2x2x4 ABEL  (EMA):", t2, "ms", t2/t1, "\n"),
    sprintf("%s %5.2f %s (%1.0fx slower)%s", "2x2x4 RSABE (FDA):", t3, "ms", t3/t1, "\n"))


    Gives on my machine:

    Runtimes of PowerTOST's functions:
     2x2x2 ABE        :  2.22 ms
     2x2x4 ABEL  (EMA): 82.82 ms (37x slower)
     2x2x4 RSABE (FDA): 80.45 ms (36x slower)


    Detlew did a great job. The scaled power-functions are not slower by a factor of 10⁵ – only ~40 times. If you have a lot of time, go ahead and become famous.
    Furthermore, the intra-subject variabilities of test and reference need not be identical. Keep that in mind.


Edit: Easier with the package microbenchmark:

library(microbenchmark)
library(PowerTOST)
microbenchmark(
  power.TOST(CV=0.4,   theta0=0.9, design="2x2x2", n=134, method="nct"),
  power.scABEL(CV=0.4, theta0=0.9, design="2x2x4", n= 30, nsims=1e5),
  power.RSABE(CV=0.4,  theta0=0.9, design="2x2x4", n= 24, nsims=1e5))


On my machine (slow because I have 8 R-sessions running; CPU-load >95%):

Unit: milliseconds
          expr       min         lq      mean     median         uq       max neval
power.TOST()    3.703643   4.336665   5.82738   4.784337   5.475469  67.81569   100
power.scABEL() 84.496413  91.930565 120.51301  97.457502 156.742944 176.92144   100
power.RSABE()  91.391727 103.207520 139.51786 155.408528 167.997411 205.35759   100



    References:
  1. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: Chapman & Hall/CRC; 3rd ed 2009: p. 598.
  2. Endrényi L, Tóthfalusi L. Regulatory conditions for the determination of bioequivalence of highly variable drugs. J Pharm Pharm Sci. 2009;12:138–49. Free resource.
  3. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015;32(1):135–43. doi:10.1007/s11095-014-1450-z.
  4. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015;71(3):271–81. doi:10.1007/s00228-015-1806-2.
  5. Fuglsang A. Controlling type I errors for two-stage bioequivalence study designs. Clin Res Reg Aff. 2011;28(4):100–5. doi:10.3109/10601333.2011.631547.
  6. Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015;34(16):2403–16. doi:10.1002/sim.6487.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
nobody
2015-05-06 13:13
@ Helmut
Posting: # 14754

 TSDs for RSABE/ABEL

Who planned the one with the deficiency letter? :-D

...just asking


Οὐδείς

Kindest regards, nobody
Helmut
Vienna, Austria
2015-05-06 13:55
@ nobody
Posting: # 14755

 TSDs for RSABE/ABEL

Γειά σου Οδυσσέα! (Hello, Odysseus!)

❝ Who planned the one with the deficiency letter? :-D


[image]

Last week during a coffee break at the EGA/EMA workshop I prevented yet another one. The “design” was suggested by a ruthless, greedy, incompetent consultant. :no:

nobody
2015-05-06 14:26
@ Helmut
Posting: # 14757

 TSDs for RSABE/ABEL

...maybe this team

[image]

:o)

...but maybe someone has solved the alpha dilemma...

Kindest regards, nobody
ElMaestro
Denmark
2015-05-06 14:31
@ Helmut
Posting: # 14758

 TSDs for RSABE/ABEL

Hi Helmut,

this was a great post, and really a necessary one. I hope it will make a difference.
It is the eternal opportunistic battle:

a. "No one has proven it doesn't work, so we'll try it."

vs.

b. "So-And-So et al. showed that it works, so we'll try it."

[and absence of evidence is not evidence of absence, perhaps??]
I have heard that there is going to be a symposium in Prague in two weeks where one of the true authorities on the issue of two-stage approaches (I am talking about a certain consultant from Vienna who also has a nasty habit of torturing statistical software) is a panelist; perhaps this guy will use the opportunity to express such views. I certainly hope so, as the message conveyed here really deserves to be more widely known and could spare companies much trouble.

Pass or fail!
ElMaestro
Helmut
Vienna, Austria
2015-05-06 15:49
@ ElMaestro
Posting: # 14763

 TSDs for RSABE/ABEL

Hi ElMaestro,

❝ this was a great post, and really a necessary one.


THX.

❝ a. "No one has proven it doesn't work, so we'll try it."


Courageous (or just stupid?)

The primary concern in bioequivalence assessment is to limit the risk of erroneously accepting bioequivalence, i.e., to maintain the patient’s risk (α) at ≤0.05.1,2 Only statistical procedures which do not exceed the nominal risk of 5% can be accepted.3,4
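What “maintaining the patient’s risk” means can be illustrated by a minimal Monte-Carlo sketch (my own Python illustration, not part of the post; a large-sample normal approximation stands in for the t-distribution): with fixed BE limits and the true GMR sitting exactly on the lower limit, the chance of (wrongly) concluding BE approaches the nominal α = 0.05.

```python
import math
import random
from statistics import NormalDist

random.seed(42)
z95 = NormalDist().inv_cdf(0.95)                 # one-sided 5% quantile
cv, n = 0.30, 40                                 # assumed CV and sample size
se = math.sqrt(2 * math.log(1 + cv**2) / n)      # SE of the log point estimate
lo, hi = math.log(0.80), math.log(1.25)          # fixed (unscaled) BE limits

nsims = 100_000
accepted = 0
for _ in range(nsims):
    pe = random.gauss(lo, se)                    # true GMR exactly on the lower limit
    # 90% CI entirely within the limits -> BE (here: wrongly) concluded
    if pe - z95 * se > lo and pe + z95 * se < hi:
        accepted += 1
tie = accepted / nsims
print(f"empirical TIE: {tie:.4f}")               # close to the nominal 0.05
```

Any procedure that pushes this empirical rate above 0.05 – e.g. by picking the acceptance range after looking at the data – inflates the patient’s risk.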


❝ I have heard that there is going to be a symposium in Prague in two weeks where one of the true authorities on the issue of two-stage approaches (I am talking about a certain consultant from Vienna who also has a nasty habit of torturing statistical software) is a panelist…


I have heard that another eminent consultant from Haderslev (having the habit of coding nasty software) will be a panelist as well.


    Oldies but goodies:
  1. McGilveray IJ, Midha KK, Skelly JP, Dighe S, Doluisio JT, French IW, Karim A, Burford R. Consensus Report from ‘Bio International ’89’: Issues in the Evaluation of Bioavailability Data. J Pharm Sci. 1990;79(10):945–6. doi:10.1002/jps.2600791022.
  2. Melander H. Problems and Possibilities with the Add-On Subject Design. In: Midha KK, Blume HH. (eds). Bio-International. Bioavailability, Bioequivalence and Pharmacokinetics. Stuttgart: medpharm; 1992; p. 85–90.
  3. Commission of the European Communities. CPMP Guideline: Investigation of Bioavailability and Bioequivalence. Brussels 1991;111/54/89_EN. Online.
  4. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH Harmonised Tripartite Guideline: Statistical Principles for Clinical Trials. E9 (Step 4, 1998). Online.

d_labes
Berlin, Germany
2015-05-08 22:47 (edited by d_labes on 2015-05-09 17:09)
@ Helmut
Posting: # 14778

 TSDs for RSABE/ABEL - run-time

Dear Helmut,

this is an eminently important post :thumb up:.

But I believe those whom it really concerns unfortunately will not read it – or will not understand its message :no:.

❝ Contrary to 2×2×2 and parallel designs – where power can be directly calculated – for scaling we need simulations, since the limits are not fixed, we have a restriction on the GMR, and an upper cap at 50% for the EMA. The convergence is fine, but we still need 10⁵ simulations within every simulated study. This leads to ~10¹⁴ (110,000,000,000,000‼) simulations overall.


This is an optimistic (sic!) estimate, since you forgot the sims necessary for the sample-size estimation step: 2…8 iterations with 10⁵ simulations each, since the start values of the sample-size search are naturally not as good as in the case of the classical sample-size estimation.

So let us estimate more pessimistically (but maybe still too optimistically) an additional factor of 4 (one for the power-monitoring step + 3 for the sample-size estimation) within each of the simulations of your preferred grid of CV and n1.
This will then give, using your measured ~80 msec: ~4×10¹⁴ sims × 80 ms =
3.2×10¹³ sec = 533.333.333.333 min = 8.888.888.889 h = 370.370.370 days = 1.014.713 years!
calculated with the newest R version 3.2.0
(hope I got all the conversions using the thousands separator "." correct; prove me wrong)

Oh what a weird calculation. Seems yesterday evening there was one beer too many :party:. And alcohol impairs the capacity for self-criticism.
Let's try once more; hope now I got it right:
1.1×10⁹ sims, each with 4×80 msec for the power calculations =
5,866,667 min = 97,777.78 h = 4,074.074 days = 11.16185 years
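The corrected back-of-the-envelope figures can be checked in a few lines of Python (hypothetical illustration, not part of the original post):

```python
# Check of the corrected run-time estimate (illustrative only).
sims = 1.1e9                   # simulated studies over the n1/CV grid
secs_per_sim = 4 * 0.080       # power monitoring + sample-size steps, ~80 ms each
total_s = sims * secs_per_sim

print(f"{total_s / 60:,.0f} min")              # 5,866,667 min
print(f"{total_s / 3600:,.2f} h")              # 97,777.78 h
print(f"{total_s / 86400:,.3f} days")          # 4,074.074 days
print(f"{total_s / 86400 / 365:.5f} years")    # 11.16185 years
```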

Even if someone came up with a much more clever algo with a boost of 10² (a hundred times faster than mine in PowerTOST – which I of course can't believe; it's impossible to beat me by such an amount :cool:), we would still have to wait more than one month (~41 days) for the result of one shot of alpha adjustment. And you need more than one ...

Just to make your message crystal clear!

Whether all this really has to be done for a certain regulatory body (which one remains my secret :not really:) remains an open question. As said elsewhere: Regulators are a bunch of strange people.

Regards,

Detlew
Helmut
Vienna, Austria
2015-05-10 03:11
@ d_labes
Posting: # 14779

 Really worth the efforts?

Dear Detlew,

❝ this is an eminently important post :thumb up:.


THX. It was mainly a kind of self-defense. If such an idea pops up in the future, I can simply link to this post.

❝ ❝ […] we still need 10⁵ simulations within every simulated study. This leads to ~10¹⁴ (110,000,000,000,000‼) simulations overall.


❝ This is an optimistic (sic!) estimation […]



❝ So let us estimate more pessimistically (but may be too optimistically) an additional factor of 4 (one for the power monitoring step + 3 for the sample size estimation) within each of the simulations […]



At least for “Type 1” designs you need only one step – as you have ingeniously implemented it in Power2Stage: an estimated total sample size ≤ n1 implies power ≥ target. ;-)
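The trick relies on power being non-decreasing in n. A hypothetical Python sketch with a crude normal-approximation power function for plain ABE (the function names and the approximation are mine, purely to illustrate the logic – not Power2Stage’s actual code):

```python
import math
from statistics import NormalDist

ND = NormalDist()

def power_abe(n, cv=0.40, gmr=0.95, alpha=0.05):
    """Crude normal-approximation power of TOST for a 2x2x2 design."""
    se = math.sqrt(2 * math.log(1 + cv**2) / n)   # SE of the log point estimate
    z = ND.inv_cdf(1 - alpha)
    lo, hi = math.log(0.80), math.log(1.25)
    d = math.log(gmr)
    return max(0.0, ND.cdf((hi - d) / se - z) - ND.cdf((lo - d) / se + z))

def n_for_target(target=0.80, **kw):
    """Smallest even total sample size reaching the target power."""
    n = 4
    while power_abe(n, **kw) < target:
        n += 2
    return n

# The shortcut: if the estimated total sample size does not exceed n1,
# power(n1) >= target follows from monotonicity of power in n --
# no separate interim power step is needed.
n1 = 80
n_est = n_for_target()
if n_est <= n1:
    print(f"n_est = {n_est} <= n1 = {n1}: interim power criterion met")
```

One sample-size search thus replaces a separate power-monitoring simulation at the interim.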

❝ Just to make your message crystal clear!


Yep. Whether it takes five or fifty years makes no difference in practice. It simply takes too long.

❝ Whether all this really has to be done for a certain regulatory body (which one remains my secret :not really:) remains an open question. As said elsewhere: Regulators are a bunch of strange people.


I think that it is justified if regulators ask for a demonstration that the TIE is maintained. That’s their job.
Sponsors should learn that Two-Stage Designs are not the jack-of-all-trades device (German: “eierlegende Wollmilchsau”) they apparently believe them to be. TSDs are fine in ABE to deal with an uncertain CV. This is not necessary if we apply scaling – which inherently takes care of higher than expected variability. If a study is powered for ABE, everything should be OK (well, cough, inflation in ABEL/RSABE – another story).
As in ABE, the nasty thing is the ratio …

The Bioequivalence and Bioavailability Forum is hosted by BEBAC • Ing. Helmut Schütz