Bioequivalence and Bioavailability Forum

Mauricio Sampaio
★

Brazil,
2015-04-06 10:37
(3306 d 23:47 ago)

Posting: # 14660
Views: 17,734

Replicated 4×2 and two-stage-design, in the same protocol [Two-Stage / GS Designs]

Is there any problem if a replicated 4×2 design and a two-stage design are combined in the same protocol?

For example, in the first group 96 subjects were used in a replicated (4×2) design with IC 95%. The result was not bioequivalent and the power was less than 80%. Therefore, a second group of 48 subjects was added, again in a replicated (4×2) design with the same IC 95%. In the end, the first and second stages were combined and the result was bioequivalent.

I am considering this strategy because I don't know how variable the drug is! I would like to save money at first by risking a small number of subjects, while keeping the opportunity to increase this number within the same bioequivalence study.
Is it possible? Or am I crazy in the eyes of statistics and statisticians? :confused:

Edit: Category changed. [Helmut]

ElMaestro
★★★

Denmark,
2015-04-06 10:53
(3306 d 23:31 ago)

@ Mauricio Sampaio
Posting: # 14661
Views: 15,812

Replicated 4×2 and two-stage-design, in the same protocol

Hi Mauricio,

❝ Is it possible? Or am I crazy in the eyes of statistics and statisticians? :confused:

Combining 2-stage and replicated studies in one protocol is hitherto undescribed in the scientific literature. So you likely do not know how the type I error is controlled or how your method performs in terms of power/sample sizes. If the agency asks you to give proof that the Type I Error is controlled you might be somewhat doomed?

There appears to me to be no reason why it wouldn't work, though. It just needs to be investigated and reported.

—
Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2015-04-06 16:20
(3306 d 18:04 ago)

@ Mauricio Sampaio
Posting: # 14662
Views: 16,054

Science fiction

Hi Mauricio,

❝ Is there any problem if a replicated 4×2 design and a two-stage design are combined in the same protocol?

A lot of problems. I agree with ElMaestro.

❝ For example, in the first group 96 subjects were used in a replicated (4×2) design with IC 95%. The result was not bioequivalent and the power was less than 80%.

Wow. I guess by “IC” you mean the GMR or T/R-ratio, right?* For FDA’s RSABE that would imply a CV of >361% and for EMA’s ABEL still a CV of >166%. What a nasty drug/formulation! BTW, for HVDs / HVDPs assuming a GMR of 95% is not a good idea. The two Lászlós recommend 90% – even if a “better” one was observed in a previous study.
You cannot simply assess the study for BE (α 0.05 or 90% CI) – that’s an add-on design, which was shown to inflate the TIE.^1,2 In TSDs you have to use an adjusted α (at least if you proceed to the second stage).

❝ Therefore, a second group of 48 subjects was added, again in a replicated (4×2) design with the same IC 95%. In the end, the first and second stages were combined and the result was bioequivalent …

… with a completely unknown type I error. If you used 0.05 already in the first stage you are dead.

❝ I am considering this strategy because I don't know how variable the drug is!

Reference-scaling was developed to deal with the CV. If the CV turns out to be higher than expected you are allowed to scale more – and don’t lose too much power:

[image]

[image]

Problems arise not from the CV but from the GMR! BTW, most TSDs assume a fixed GMR. Fully adaptive ones (i.e., adjusting for the observed GMR in the first stage) require a futility criterion and quite often lack power.^2,3,4

❝ Is it possible?

Not yet – unless you have access to a massively parallel supercomputer. You would have to find a suitable adjusted α and demonstrate beforehand that the overall type I error is maintained. Unlike in conventional designs (crossover, parallel), the mixed strategy (GMR restriction of 0.80–1.25, no scaling at CV <30%; CVs >50% treated as if CV=50% for EMA) means that the power/sample-size estimation itself needs 10⁵ simulations. Combine that with the 10⁶ simulations (slow convergence) needed to estimate the TIE over an entire grid of possible n₁/CV combinations. You’ll end up with 10¹³–10¹⁴ simulations…
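The order of magnitude can be reproduced with a few lines of R. This is only a back-of-the-envelope sketch of one reading of the numbers above: it assumes that every simulated study in the TIE run needs its own power/sample-size estimation, and the grid sizes of 100 and 1,000 n₁/CV combinations are hypothetical values chosen for illustration.

```r
# Back-of-the-envelope simulation budget (a sketch, not a benchmark)
sims_power  <- 1e5           # simulations per power/sample-size estimation (from the post)
sims_tie    <- 1e6           # simulations per TIE estimate (from the post)
grid_points <- c(100, 1000)  # ASSUMPTION: number of n1/CV combinations in the grid

# ASSUMPTION: the two simulation layers are nested, so the counts multiply
total_sims <- sims_power * sims_tie * grid_points

data.frame(grid_points, total_sims)
```

With these assumed grid sizes the product lands in the 10¹³–10¹⁴ range quoted above.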

Recently I faced an example where the sponsor (despite serious warnings from the CRO) insisted on a similar design. The sponsor is always right. :crying:

A regulator asked for justification of the chosen α. I made a quick estimation (I have a very fast workstation): ~60 years running 24/7…
You don’t want to go there.

1. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015;32(1):135–43. doi:10.1007/s11095-014-1450-z
2. Schütz H. Two-stage designs in bioequivalence trials. Eur J Clin Pharmacol. 2015;71(3):271–81. doi:10.1007/s00228-015-1806-2
3. Fuglsang A. Futility rules in bioequivalence trials with sequential designs. AAPS J. 2014;16(1):79–82. doi:10.1208/s12248-013-9540-0
4. Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015;34(16):2403–16. doi:10.1002/sim.6487

* After reading your post again, I think I was in error. By IC you mean the confidence interval (intervalo de confiança). Forget what I wrote about the GMR. So it seems that you applied Bonferroni’s α 0.025 (95% CI). Will this control the TIE? Nobody knows. Reference-scaling itself might lead to an inflated TIE (see Wonnemann’s paper and this thread). Anyhow, you would have to prove that the TIE is controlled.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Dr_Dan
★★

Germany,
2016-03-04 11:33
(2973 d 21:51 ago)

@ Helmut
Posting: # 16048
Views: 14,113

Science fiction

Dear all
nearly one year ago ElMaestro wrote:

❝ Combining 2-stage and replicated studies in one protocol is hitherto undescribed in the scientific literature. So you likely do not know how the type I error is controlled or how your method performs in terms of power/sample sizes. If the agency asks you to give proof that the Type I Error is controlled you might be somewhat doomed?

❝ There appears to me to be no reason why it wouldn't work, though. It just needs to be investigated and reported

and Helmut:

❝ A lot of problems. I agree with ElMaestro.

❝ ❝ Is it possible?

❝ Not yet – unless you have access to a massively parallel supercomputer... I made a quick estimation (I have a very fast workstation): ~60 years running 24/7…

My question for today: is this still the case/state of science that you can not combine 2-stage and replicate designs?
A short update would be very much appreciated. If new literature is available, I would love it if you could give references.
Thanks for your comments in advance.
Kind regards
Dr_Dan

—
Kind regards and have a nice day
Dr_Dan

d_labes
★★★

Berlin, Germany,
2016-03-04 14:21
(2973 d 19:03 ago)

@ Dr_Dan
Posting: # 16049
Views: 14,114

Science fiction furthermore

Dear Dan,

AFAIK the situation has not changed. The 60 years Helmut estimated for the run-time of the necessary simulations have not elapsed yet.

—
Regards,
Detlew

Helmut
★★★

Vienna, Austria,
2016-03-04 14:41
(2973 d 18:44 ago)

@ Dr_Dan
Posting: # 16050
Views: 14,115

Still SF

Hi Dan,

❝ My question for today: is this still the case/state of science that you can not combine 2-stage and replicate designs?

Yes.

❝ A short update would be very much appreciated. If new literature is available, I would love it if you could give references.

Though I’m deeply involved in this kind of stuff, I don’t even know anybody working on it.

If we take the GL literally

“The plan to use a two-stage approach must be pre-specified in the protocol along with the adjusted significance levels to be used for each of the analyses.”

the requirement to pre-specify the adjusted significance levels (highlighted in blue in the original post) is the show-stopper.
It’s even worse than a year ago. Do you remember this post? Since many decisions have to be made in the EMA’s ABEL (CV_wR >30%? CV_wR >50%? GMR within 80.00–125.00?), this method itself may lead to an inflation of the type I error. The latest release of PowerTOST contains two functions which iteratively adjust α in such a way that the TIE is preserved: scABEL.ad() for the adjustment and sampleN.scABEL.ad() for the sample size. With this algorithm you need four iterations on average to get the adjusted α. Hence, multiply the runtimes given at the end of this post by four…
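To make the three decisions concrete, here is a minimal base-R sketch of the ABEL decision scheme. It is not the PowerTOST code; the function names `abel_limits()` and `abel_pass()` are made up for this illustration. It assumes the EMA's regulatory constant k = 0.760, no scaling up to CV_wR 30%, and the cap at CV_wR 50%.

```r
# Sketch of the EMA's ABEL decision scheme (hypothetical helper functions)
abel_limits <- function(CVwR) {
  if (CVwR <= 0.30) {                    # decision 1: no scaling, conventional limits
    return(c(lower = 0.80, upper = 1.25))
  }
  CVcap <- min(CVwR, 0.50)               # decision 2: CVs above 50% are treated as 50%
  swR   <- sqrt(log(CVcap^2 + 1))        # within-subject SD of the reference (log scale)
  c(lower = exp(-0.760 * swR), upper = exp(0.760 * swR))
}

abel_pass <- function(lowerCL, upperCL, PE, CVwR) {
  lim    <- abel_limits(CVwR)
  ci_ok  <- lowerCL >= lim["lower"] && upperCL <= lim["upper"]
  gmr_ok <- PE >= 0.80 && PE <= 1.25     # decision 3: point estimate constraint
  unname(ci_ok && gmr_ok)
}

round(100 * abel_limits(0.60), 2)        # capped limits in percent
```

Each branch is a data-dependent decision, which is why the scheme as a whole does not behave like a single test at a fixed α.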

If (‼) you succeed in convincing regulators that a pre-specified α is not necessary (te-hee) but can be estimated based on stage 1 data, it should be doable (runtime a couple of minutes at the most). Given the EMA’s skepticism concerning TSDs in general (see this post) IMHO, chances are close to nil.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz


VStus
★

Poland,
2016-10-06 23:08
(2757 d 11:17 ago)

@ Helmut
Posting: # 16700
Views: 12,769

Still SF

Hi Helmut,

❝ ❝ My question for today: is this still the case/state of science that you can not combine 2-stage and replicate designs?

❝ Yes.

Just predicting: a recent update of Moore's law predicts a doubling of computer performance every 2.5 years. So, with current semiconductor technology, we may expect that Helmut's new desktop bought in 2028 will be able to do the job in less than a year (~16 years instead of the initial estimate). Otherwise we need distributed computing to parallelize the calculations over a decent number of devices... I've heard that CCTV cameras and home internet routers run Linux and almost always use default passwords, and we can use Android phones too ;)
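For those who want to play with the prediction, here is a small sketch of the arithmetic. The assumptions are mine: the ~60 years refer to a workstation bought in 2015, and performance doubles every 2.5 years without any other bottleneck. The function name `runtime_years()` is made up for this example.

```r
# Run-time of the ~60-year job on a machine bought in a given year
# ASSUMPTIONS: baseline 60 years on a 2015 machine; doubling every 2.5 years
runtime_years <- function(year, base_year = 2015, base_runtime = 60, doubling = 2.5) {
  base_runtime / 2^((year - base_year) / doubling)
}

runtime_years(2028)                            # run-time on a machine bought in 2028

years <- 2015:2040
first_year <- min(years[runtime_years(years) < 1])
first_year                                     # first purchase year with a run-time below one year
```

Under these assumptions a machine bought in 2028 would still need a bit more than a year, and the run-time drops below one year for a machine bought around 2030, so the conclusion does not change much.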

But back to reality: in the case of an HVD, isn't it more practical to perform a pilot during development with, let's say, 50% power (wondering: a replicated pilot to keep the population reasonably small?), better understand the in-vitro/in-vivo relationship, optimize the formulation and then run a replicate scaled trial? In a 2-stage design we also need to wait for the results of the 1st stage...

Regards, VStus

ElMaestro
★★★

Denmark,
2016-10-07 00:13
(2757 d 10:12 ago)

@ VStus
Posting: # 16702
Views: 12,732

Still SF

Hello VStus,

❝ But back to reality: in the case of an HVD, isn't it more practical to perform a pilot during development with, let's say, 50% power (wondering: a replicated pilot to keep the population reasonably small?), better understand the in-vitro/in-vivo relationship, optimize the formulation and then run a replicate scaled trial? In a 2-stage design we also need to wait for the results of the 1st stage...

Yes
Nitpicking: you perform a 2-stage trial or a pilot trial because you do not know the variability, right? Which means the "let's say 50% power" in actuality means guesswork. Optimizing the formulation on the basis of a pilot trial with inherently low power (=high uncertainty on the PE) is in scientific terms as solid as tarot cards or crystal healing.
I read on LinkedIn the other day: "The flat earth society has members all over the globe" :-D

OK, surely I will get in trouble for this post.
:pirate:

—
Pass or fail!
ElMaestro

VStus
★

Poland,
2016-10-07 00:53
(2757 d 09:31 ago)

@ ElMaestro
Posting: # 16703
Views: 12,700

Still SF

Hello Maestro,

❝ Nitpicking: you perform a 2-stage trial or a pilot trial because you do not know the variability, right? Which means the "let's say 50% power" in actuality means guesswork.

Well, when we start generic development, in the best case we have only the data published by the originator in the NDA or SmPC, and there may be one or two trials comparing the phase 3 and commercial formulations or 2×2 mg vs. 1×4 mg... which we may use as inputs for our guesstimate of the intra-subject CV. We never know it in advance. T/R is also just an assumption...
"Intra-subject CV is not carved in stone".

So one can say it is always guesswork until we receive data from our own trial(s) to improve its precision.

Regards, VStus

VStus
★

Poland,
2016-10-11 12:23
(2752 d 22:01 ago)

@ ElMaestro
Posting: # 16719
Views: 12,646

Still SF

Hello, Maestro!

❝ Nitpicking: you perform a 2-stage trial or a pilot trial because you do not know the variability, right? Which means the "let's say 50% power" in actuality means guesswork. Optimizing the formulation on the basis of a pilot trial with inherently low power (=high uncertainty on the PE) is in scientific terms as solid as tarot cards or crystal healing.

Well, I've done some small research into the BE data of one HVD product. I performed an iterative assessment of BE on the dataset from our trial, increasing the number of subjects stepwise from 4 to 42 (with slightly modified code of Analyse222BE).
Study design assumptions:
T/R = 0.95;
Intra-subject CV = 0.36.


    library(tcltk) # provides tk_choose.files()

    # import the data; change parameters if your .csv uses different separators/decimals
    data  <- read.csv(tk_choose.files(filters = matrix(c("Comma-separated values", ".csv"), 1, 2, byrow = TRUE)),
                      header = TRUE, row.names = NULL, sep = ",", dec = ".")
    data2 <- data.frame(subjN = data$SUB, subj = as.factor(data$SUB), drug = as.factor(data$TRT),
                        seq = as.factor(data$SEQ), prd = as.factor(data$PER),
                        Cmax = data$CMAX, AUCt = data$AUCT, AUCinf = data$AUCI)
    # end of data import and conversion

    maxN  <- max(data2$subjN) # last subject number, upper bound of the loop
    minN  <- min(data2$subjN)
    alpha <- 0.05
    N <- ISCV <- P_E <- L_CI <- U_CI <- NULL
    x     <- 0
    for (i in minN:maxN) {
      if (i %% 4 == 0) { # re-assess BE after every 4th subject
        x         <- x + 1
        part      <- subset(data2, subjN <= i) # all subjects up to number i
        anovadata <- lm(log(Cmax) ~ seq + subj:seq + prd + drug, data = part, na.action = na.exclude)
        MSE       <- anova(anovadata)["Residuals", "Mean Sq"]
        PE        <- coef(anovadata)[4] # treatment effect: 4th coefficient (intercept, seq, prd, drug)
        CI        <- confint(anovadata, 4, level = 1 - 2 * alpha) # 90% CI of the treatment effect
        N[x]      <- nrow(part) / 2 # two observations per subject in the 2×2×2 crossover
        ISCV[x]   <- 100 * sqrt(exp(MSE) - 1)
        P_E[x]    <- 100 * exp(PE)
        L_CI[x]   <- 100 * exp(CI[1])
        U_CI[x]   <- 100 * exp(CI[2])
      }
    }
    BE_table <- data.frame(N, ISCV, P_E, L_CI, U_CI, stringsAsFactors = FALSE)

I've added some power calculations to this data frame, based on the T/R assumed by design (0.95), the observed intra-subject CV and the actual number of subjects for each iteration...
So, the graphical output looks like this:
[image]

Legend:
Point Estimates (%) - Round markers with blue line;
Upper and lower 90% CI (%) - Black lines below and above PE;
Observed Intra-subject CV (%) - Triangles;
Power (à la Potvin C power for 1st stage) - Red curve (lower red :)).
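The power calculation itself is not shown in the code above. A minimal sketch of how such a curve could be obtained in base R is given below; it uses the approximation of TOST power by the non-central t-distribution for a balanced 2×2×2 crossover, which is not necessarily what was used for the plot. The function name `power_tost_approx()` is made up for this example, and the design CV of 0.36 is plugged in here instead of the observed CVs.

```r
# Approximate TOST power for a 2×2×2 crossover (non-central t approximation)
# ASSUMPTIONS: balanced sequences, BE limits 0.80–1.25, alpha 0.05
power_tost_approx <- function(n, CV, theta0 = 0.95, theta1 = 0.80, theta2 = 1.25, alpha = 0.05) {
  mse  <- log(CV^2 + 1)        # inverse of CV = sqrt(exp(MSE) - 1)
  se   <- sqrt(2 * mse / n)    # standard error of the treatment difference (log scale)
  df   <- n - 2
  tval <- qt(1 - alpha, df)
  d1   <- (log(theta0) - log(theta1)) / se
  d2   <- (log(theta0) - log(theta2)) / se
  pmax(0, pt(-tval, df, ncp = d2) - pt(tval, df, ncp = d1))
}

n <- seq(4, 40, by = 4)
data.frame(n, power = round(power_tost_approx(n, CV = 0.36), 3))
```

Replacing `CV = 0.36` with the `ISCV` column of `BE_table` (divided by 100) would give the power for the observed CV at each step.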

For this particular study, had it been an exploratory pilot, we would have obtained a more or less reliable estimate of the intra-subject CV starting from 12 subjects and of T/R starting from 15 (let's assume 16) subjects. For a pilot study powered at 50% (32 subjects under the design assumptions) we would have had a good chance not only to get reliable intra-subject CV and T/R data, but also to demonstrate BE.

Therefore, it's guesswork. But having some kind of pilot or previous pivotal study gives better estimates than literature data.

Regards, VStus

Helmut
★★★

Vienna, Austria,
2016-10-07 12:38
(2756 d 21:46 ago)

@ VStus
Posting: # 16704
Views: 12,655

Still SF

Hi VStus,

lucid words about SF and great ideas about hacking embedded Linux machines. :-D

I think that TSDs for reference-scaling are not so important compared to ABE. Let us consider two scenarios.

CV higher than expected.
- ABE
  - In a fixed sample design we lose power.
  - In a TSD with sample size re-estimation we proceed to the second stage and preserve power.
- RSABE or ABEL
  The acceptance range is scaled and power is maintained.
CV lower than expected.
- ABE
  - In a fixed sample design we gain power.
  - In a TSD we have some chances to demonstrate BE already in the first stage. Hence, my personal recommendation for n₁ 75-80% of the fixed sample’s n.
- RSABE or ABEL
  We lose power until we reach CV_wR. Then power increases.

To summarize: In reference-scaling we have to consider lower CVs only. However, the potential impact is over-rated by many. What really hurts (in all designs) is a too optimistic assumption of the GMR.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz


VStus
★

Poland,
2016-10-11 11:10
(2752 d 23:14 ago)

@ Helmut
Posting: # 16716
Views: 12,553

Still SF

Hi Helmut,

❝ lucid words about SF and great ideas about hacking embedded Linux machines. :-D

Well, it's a proven solution (not by me!), just not quite legal in most civilized countries ;)

❝ In a TSD we have some chances to demonstrate BE already in the first stage. Hence, my personal recommendation for n₁ 75-80% of the fixed sample’s n.

After playing with data from other studies, I would suggest that the same recommendation is also good for a single-stage crossover with 2 groups, to keep the study safe or "conservative"... We can then demonstrate BE based on the larger (first) group if requested by regulators.

Regards, VStus

ElMaestro
★★★

Denmark,
2016-10-11 09:44
(2753 d 00:41 ago)

@ Mauricio Sampaio
Posting: # 16714
Views: 12,591

mindblowing

I think I overlooked this:

This was a two stage, four period, cross-over, block randomized, replicate design, single dose bioavailability study in healthy volunteers without charcoal blockade. Forty-eight healthy volunteers (24 males and 24 females, aged 18-55 years) were enrolled in the first stage according to study protocol.
Bioequivalence was reached at the first stage according to protocol; therefore the second stage of the study was not required to be performed. There were four periods in the first stage: both test and reference product were given twice according to a replicated design.

Alpha 2.94%. Approved in Germany, Sweden, Portugal, Iceland, Hungary, Italy, 2015 :-)

—
Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2016-10-11 11:50
(2752 d 22:35 ago)

@ ElMaestro
Posting: # 16717
Views: 12,676

Natural constant as usual; not for reference-scaling

Hi ElMaestro,

very interesting. From the wording of the 2-period study I assume that the original analysis was performed according to “Method C”. I have seen similar requests by the MEB (i.e., post hoc changing to “Method B”). With budesonide the applicant was lucky enough to pass (lower CL 0.80; both of AUC and C_max) but I have seen other cases. BTW, the GL tells us that the CI should be given in percent rounded to two decimals. Would this study still be accepted now?

BSWP:

Methods based entirely on simulations are not acceptable.
The maximum TIE must not exceed 0.05 (exactly).

Some assessors I know:

“I accept studies with Potvin’s methods if the CI is not too close to the acceptance range.”

Now to the fully replicated 4-period study:

❝ Alpha 2.94%. Approved in Germany, Sweden, Portugal, Iceland, Hungary, Italy, 2015 :-) :-) :-)

“Correct statistical analysis was conduced.” Hhm. Pocock’s natural constant. “Method B” applied outside its valid range (2×2×2 crossover, n₁ 12–60, CV 10–100%). C_max of budesonide again a close shave.

Edit: Seems that the study was not intended for reference-scaling (page 10: “C_max […] within the bioequivalence acceptance range of 0.80-1.25.”)
The CV of C_max was ~50%. If we assume that n₁ (2×2×2) is 2n₁ (2×2×4) we are again outside Potvin’s range (92 > 80). However, likely the TIE was controlled. Quick & dirty:

    library(Power2Stage)
    power.2stage(method="B", alpha=rep(0.0294, 2), n1=46*2, GMR=0.95, CV=0.5, targetpower=0.8, theta0=1.25, nsims=1e6)$pBE
    [1] 0.040382
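To put the number into context, a short base-R sketch: the adjusted α of 0.0294 translates into a 94.12% confidence interval, and an empirical type I error from 10⁶ simulations is commonly compared with the limit of a one-sided exact binomial test rather than with 0.05 itself. Using that limit here is my assumption about the convention; the simulated value is simply copied from the output above.

```r
alpha_adj <- 0.0294
ci_level  <- 100 * (1 - 2 * alpha_adj)  # confidence level belonging to the adjusted alpha

nsims     <- 1e6
tie_sim   <- 0.040382                   # empirical TIE copied from the simulation output

# ASSUMPTION: significance limit = upper bound of the one-sided 95% CI
# of a true TIE of exactly 0.05 estimated in nsims simulations
sig_limit <- binom.test(x = 50000, n = nsims, alternative = "less")$conf.int[2]

c(ci_level = ci_level, sig_limit = sig_limit, tie_sim = tie_sim)
tie_sim < sig_limit                     # TRUE: no sign of inflation in this scenario
```

This covers only the single scenario simulated above (n₁, CV and GMR as given), not a whole grid.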

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz


DavidManteigas
★

Portugal,
2016-10-11 13:25
(2752 d 20:59 ago)

@ Helmut
Posting: # 16721
Views: 12,567

Consistency

Hi all,

I am sometimes shocked by the lack of consistency in the assessments of member states, which should all follow the same guidelines and make their reviews according to the current opinions of the scientific groups of the EMA. I think this happens due to a lack of training in regulatory reviews and "regulatory science" in general (in Portugal, almost all of the reviewers I know are "academic") and also due to a lack of resources in some agencies to have qualified reviewers for each 'specialty'.
In some countries, I believe that as long as you got a favourable opinion from an ethics committee and regulatory approval for the trial, they will consider your trial "valid" regardless of the appropriateness of the design, statistical methodology and compliance with guidelines/recommendations for design & analysis.
