BE-proff
★★

Russia,
2017-02-18 05:40

Posting: # 17078
Views: 11,460

## Data for 2nd stage of Potvin’s designs [Two-Stage / GS Designs]

Hi All,

I have some questions on Potvin's designs:

1) is it correct that GMR and CV are required for sample size calculation for 2nd stage in all Potvin's designs?

2) Why Method C is considered "better" for sponsors than Method B?

The 1st question arose because Potvin's article dated 2007 says that only CV is needed for A and B while Method C uses both GMR and CV.

But on the other hand there are also opinions that all methods require GMR and CV.

Unclear....
ElMaestro
★★★

Belgium?,
2017-02-18 09:20

@ BE-proff
Posting: # 17079
Views: 10,863

## Data for 2nd stage of Potvin’s designs

Hi BE-proff,

» 1) is it correct that GMR and CV are required for sample size calculation for 2nd stage in all Potvin's designs?

All sample-size calculation requires that a CV and a GMR be plugged in. You have a choice between using the observed GMR or a fixed GMR like 0.95. There is a lot of confusion about it, but the short version is: two-stage methods in all known forms only behave well if the true GMR is controlled. Therefore methods using GMR = 0.95 (Potvin B and C and more) behave well when that criterion is true. Performance easily becomes abysmally bad if you are not in control of the GMR.
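
To see how both numbers enter, here is a minimal sample-size sketch (Python; a normal approximation of the usual t-based TOST power, so it returns slightly smaller n than PowerTOST/Power2Stage would): the assumed GMR sets the shift of the log-ratio, the CV its standard error. All names are mine, not from any package.

```python
from math import log, sqrt
from statistics import NormalDist

def power_tost(cv, gmr, n, alpha=0.05, lo=0.80, hi=1.25):
    """Approximate TOST power for a 2x2 crossover (normal approximation;
    real calculations use the t-distribution and give slightly larger n)."""
    se = sqrt(2.0 / n) * sqrt(log(cv**2 + 1.0))   # SE of the log point estimate
    z = NormalDist().inv_cdf(1.0 - alpha)         # one-sided critical value
    d_lo = (log(gmr) - log(lo)) / se              # distance to the lower margin
    d_hi = (log(gmr) - log(hi)) / se              # distance to the upper margin
    pw = NormalDist().cdf(-z - d_hi) - NormalDist().cdf(z - d_lo)
    return max(0.0, pw)

def sampsize(cv, gmr, target=0.80, n_max=200):
    """Smallest even total n reaching the target power."""
    for n in range(12, n_max + 1, 2):
        if power_tost(cv, gmr, n) >= target:
            return n
    return None

print(sampsize(cv=0.20, gmr=0.95))  # 18 under this approximation
print(sampsize(cv=0.20, gmr=0.90))  # 36: a worse GMR costs subjects
```

Note how the n roughly doubles when the assumed GMR moves from 0.95 to 0.90 at the same CV; that is the sensitivity the "controlled GMR" warning is about.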

» 2) Why Method C is considered "better" for sponsors than Method B?

It is in the eye of the beholder. B has slightly lower observed alpha inflation than C. Asking why an authority prefers C to B, or why an airline passenger prefers beef over chicken, is not productive.

» The 1st question arose because Potvin's article dated 2007 says that only CV is needed for A and B while Method C uses both GMR and CV.

Both use GMR=0.95 regardless of observed GMR.

Yes, all this is bloody confusing and it is easy to make the wrong decisions in this area. If you just remember the sentence above (two-stage methods only behave well if the true GMR is controlled) and base your decisions on it, you'll be more or less fine.

I could be wrong, but...

Best regards,
ElMaestro

"Pass or fail" (D. Potvin et al., 2008)
BE-proff
★★

Russia,
2017-02-18 12:29

@ ElMaestro
Posting: # 17083
Views: 10,772

## Data for 2nd stage of Potvin’s designs

Hi ElMaestro,

What do you mean by a controlled GMR - being within 0.95-1.05? Correct?

So, if stage 1 of any method shows a GMR of 1.19, which figure should be taken for stage 2: 0.95 or 1.19?
Helmut
★★★

Vienna, Austria,
2017-02-18 15:23

@ BE-proff
Posting: # 17085
Views: 10,918

## GMR = fixed!

Hi BE-proff,

» What do you mean by a controlled GMR […]

Let me answer for our ol’ Capt’n: The GMR in the estimation of interim power (stage 1) and in the sample size estimation for the second stage is fixed (and not the observed one).

» - being within 0.95-1.05? Correct?

Nope. The observed one can be anything. For Potvin’s B and C it is 0.95 (or 1/0.95 if you prefer). Other methods use other fixed GMRs (the A in my figures below). Look them up in the publications (AFAIK, only methods with 0.95 and 0.90 are published).

» So, if stage 1 of any method shows a GMR of 1.19, which figure should be taken for stage 2: 0.95 or 1.19?

For most methods 0.95. If you already expect a “bad” GMR, you could opt for Montague’s or one of Anders’ methods which use a fixed GMR of 0.90 (or 1/0.90). Alternatively you could work with one of the methods with a futility criterion (stopping in stage 1).
Only fully adaptive methods (e.g., by Karalis and Macheras) would use the observed GMR 1.19. See my paper mentioned below why this might not be a good idea…*

* For beginners… Of course, the original methods can be tweaked in such a way that the power is sufficient. But that requires exhaustive simulations.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Helmut
★★★

Vienna, Austria,
2017-02-18 10:51

@ BE-proff
Posting: # 17081
Views: 11,096

## “Type 1” slightly higher power than “Type 2” for the same adj. α

Hi BE-proff,

I agree with ElMaestro.

» 2) Why Method C is considered "better" for sponsors than Method B?

Unfortunately there is an “inflation” of letters denoting methods.
Therefore, I suggested* to use “Type 1” (B, E, …) and “Type 2” (C, D, C/D, F, …) instead.

[Decision scheme: “Type 1”]

[Decision scheme: “Type 2”]

In “Type 2” TSDs the conventional (unadjusted) α 0.05 may be used in the first stage (depending on interim power). Hence, under certain conditions you have a decent chance to stop already in the first stage with no sample-size penalty (due to the mandatory adjusted α in “Type 1” TSDs).
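
If it helps to see the two schemes side by side, here is a rough paraphrase of the published stage-1 decision flowcharts in Python. The names (`be_passes`, `type1_stage1`, `type2_stage1`) are hypothetical, not Power2Stage API; in reality `be_passes` is the 90% or 94.12% CI evaluation on the log scale, and the "continue" branch re-estimates the sample size and tests the pooled data at the adjusted α.

```python
ALPHA_ADJ = 0.0294   # adjusted alpha used by Potvin B and C
ALPHA_FULL = 0.05    # conventional unadjusted alpha
TARGET = 0.80        # target (interim) power

def type1_stage1(be_passes, interim_power):
    """'Type 1' (e.g. Potvin B): the adjusted alpha is used throughout."""
    if be_passes(ALPHA_ADJ):
        return "pass (stage 1)"
    if interim_power >= TARGET:
        return "fail (stage 1)"       # enough power, still failed: stop
    return "continue to stage 2"      # re-estimate n, pool, test at ALPHA_ADJ

def type2_stage1(be_passes, interim_power):
    """'Type 2' (e.g. Potvin C): unadjusted alpha if interim power suffices."""
    if interim_power >= TARGET:
        # power was adequate anyway: decide at 0.05, no alpha penalty
        return "pass (stage 1)" if be_passes(ALPHA_FULL) else "fail (stage 1)"
    if be_passes(ALPHA_ADJ):
        return "pass (stage 1)"
    return "continue to stage 2"

# A study that passes at alpha 0.05 but not at 0.0294, with adequate power:
borderline = lambda alpha: alpha >= 0.03
print(type2_stage1(borderline, 0.90))  # pass (stage 1)
print(type1_stage1(borderline, 0.90))  # fail (stage 1)
```

The only structural difference is the first branch: “Type 2” gets one shot at the unadjusted α when the interim power is already adequate, which is exactly why it can stop in the first stage without the sample-size penalty.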

Potvin et al. recommended Method C over B due to its higher power. Examples (power by the noncentral t-approximation):

n1 CV (%)   B      C
12   10   0.97697 0.98858
24   20   0.88046 0.90882
36   30   0.83704 0.84676
48   40   0.82901 0.82838
60   50   0.82477 0.82405

BE-proff
★★

Russia,
2017-02-18 12:31

@ Helmut
Posting: # 17084
Views: 10,737

## “Type 1” slightly higher power than “Type 2” for the same adj. α

Hi Helmut,

Rather risky - such methods must be present in our guidelines, otherwise our designs will be rejected...
Helmut
★★★

Vienna, Austria,
2017-02-18 15:27

@ BE-proff
Posting: # 17086
Views: 10,671

## Terminology

Hi BE-proff,

» Rather risky - such methods must be present in our guidelines, otherwise our designs will be rejected...

“Type 1” or “Type 2” was my proposal to introduce an unambiguous terminology. Get the paper at sci-hub.
On the contrary, the Russian guideline (copy-pasted from the EMA's) is ambiguous (“For example, using 94.12% confidence intervals […] would be acceptable, but there are many acceptable alternatives and the choice of how much alpha to spend at the interim analysis is at the company's discretion”).

Yura
★

Belarus,
2017-02-20 10:28

@ Helmut
Posting: # 17088
Views: 10,590

## Terminology

Hi All,
If the T/R ratio of AUC and Cmax falls outside 0.95-1.05, is that a violation of the conditions of the calculation algorithm of the adaptive design?
ElMaestro
★★★

Belgium?,
2017-02-20 10:46

@ Yura
Posting: # 17089
Views: 10,645

## Which GMR to plug in

Hi Yura,

there are not that many people working with these designs. But those who do have all tried to plug in the observed GMR from stage 1 for sample size calculation, rather than 0.95 as in Potvin B & C.

The result is strikingly bad news: the chance of a greater departure from 0.95 goes up as the sample size in stage 1 goes down, so you easily end up in scenarios where you need 800 subjects in stage 2 if you apply the observed GMR, and this may happen even if the true GMR is 0.95 or better. Of course you can put a cap on the maximum sample size, but you are punished on power.
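
The driver of that explosion is simply the sampling error of the stage-1 point estimate. A quick Monte Carlo sketch (Python; lognormal model, true GMR 0.95 and CV 30% assumed, function names mine) shows how much wider the observed GMR scatters with a small first stage:

```python
import random
from math import exp, log, sqrt

def observed_gmrs(n1, true_gmr=0.95, cv=0.30, nsims=20_000, seed=1):
    """Simulate stage-1 GMR point estimates of a 2x2 crossover."""
    rng = random.Random(seed)
    se = sqrt(2.0 / n1) * sqrt(log(cv**2 + 1.0))  # SE of the log-ratio
    return [exp(rng.gauss(log(true_gmr), se)) for _ in range(nsims)]

for n1 in (12, 48):
    sims = sorted(observed_gmrs(n1))
    lo = sims[round(0.05 * len(sims))]
    hi = sims[round(0.95 * len(sims))]
    print(f"n1 = {n1:2d}: 90% of observed GMRs within {lo:.2f}-{hi:.2f}")
```

Feed the upper tail of the n1 = 12 distribution into any sample-size formula and the occasional three-digit second stage follows immediately, even though the true GMR was 0.95 all along.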

Heartbreaking, really!!

As stated before: At a time when you do not know the true GMR very well (such as after the first stage) it is not a particularly good idea to base decisions on it (such as final sample size).

That is why Potvin's methods are great for formulations with a known and controlled GMR, but not great for new formulations where you don't know how they match. Two-stage designs are useful for unknown CVs and not much else, at least in their present form.

Pilot trials suffer the exact same issue. "They are better than nothing" is a sentence I have heard a few times, but it is often not the case. Depending on how you use the available info you may well decide wrongly and be punished.

BE-proff
★★

Russia,
2017-02-21 10:35

@ ElMaestro
Posting: # 17094
Views: 10,547

## Which GMR to plug in

Hi ElMaestro,

I suppose you are talking about futility studies as an alternative to adaptive designs.

Why are futility designs more popular if risks are similar?
ElMaestro
★★★

Belgium?,
2017-02-21 10:47
(1261 d 19:55 ago)

@ BE-proff
Posting: # 17095
Views: 10,531

## Which GMR to plug in

Hi BE-proff,

» I suppose you are talking about futility studies as an alternative to adaptive designs.

Errr.... what????

» Why are futility designs more popular if risks are similar?

Come again, what does this mean?
What is a futility design, futility study, how are they alternatives?

I meant to say that plugging in the observed GMR has a bunch of drawbacks, which often are plain showstoppers. The sample size explosion can be fixed by futility caps on sample size, and that of course keeps the sample size (and cost) down, but then power may be so low that the trial makes no sense and is nothing more than unethical exposure.

2-stage trials work well when you are absolutely sure about the match and only unsure about the variability.

Helmut
★★★

Vienna, Austria,
2017-02-22 11:03

@ Yura
Posting: # 17096
Views: 10,434

## Validated frameworks; observed GMR not relevant

Hi Yura,

» If the T/R ratio of AUC and Cmax falls outside 0.95-1.05, is that a violation of the conditions of the calculation algorithm of the adaptive design?

Adjusted alphas of the published frameworks are only valid for certain ranges of n1/CV-combinations, fixed GMRs, and target powers assessed (see this presentation, slide 20). F.i. Potvin’s αadj 0.0294 in ‘Method B’ (“Type 1”) is valid for n1 12–60, CV 10–100%, fixed GMR 0.95, and target power 80%. The maximum Type I Error is generally seen at small stage 1 sample sizes and low CVs. Hence, even if the n1 and/or CV were outside the validated range on the upper end (say for ‘Method B’ >60 and/or >100%) you can be pretty sure that the patient’s risk is still controlled. However, in such a case picky assessors might ask for simulations.
I would avoid performing the first stage in 12 subjects. Due to dropouts one may end up outside the validated range. Example for ‘Method B’:

library(Power2Stage)
power.2stage(CV=0.2, n1=12, alpha=c(0.0294, 0.0294), theta0=1.25,
             targetpower=0.8, pmethod="shifted", nsims=1e6)$pBE
# [1] 0.046352
# n1 within validated range: TIE <0.05.
power.2stage(CV=0.2, n1=10, alpha=c(0.0294, 0.0294), theta0=1.25,
             targetpower=0.8, pmethod="shifted", nsims=1e6)$pBE
# [1] 0.048389
# n1 outside validated range: higher TIE but still <0.05.

The GMR observed in the first stage is not relevant.

Silva
☆

Portugal,
2017-03-09 00:26

@ Helmut
Posting: # 17143
Views: 10,180

## Validated frameworks; observed GMR not relevant

Hi Helmut

Trying to learn technical issues of TSD.

In the example you gave:

library(Power2Stage)
power.2stage(CV=0.2, n1=12, alpha=c(0.0294, 0.0294), theta0=1.25,
targetpower=0.8, pmethod="shifted", nsims=1e6)$pBE
# [1] 0.046352
# n1 within validated range: TIE <0.05.

Why the use of theta0 as 1.25 and not as the GMR value? I understand the use of a GMR of 0.95 for the Potvin B method (as it was validated with this assumption), but I don't understand the meaning of theta0.

According to the Power2Stage manual, theta0 corresponds to the true ratio of T/R for simulating (defaults to the GMR argument if missing).

What is the meaning of "true ratio of T/R for simulating" and what is the difference from GMR?
d_labes
★★★

Berlin, Germany,
2017-03-09 08:21

@ Silva
Posting: # 17144
Views: 10,090

## GMR, theta 0 and that all

Dear Silva,

» What is the meaning of "true ratio of T/R for simulating" and what is the difference from GMR?

To obtain a value of the type I error or the power via simulations one has to create data for many studies, say 1 million, run the TSD framework (using GMR=0.95 f.i., the A in Helmut's schemes above) on those data, and count all studies in which BE was decided.

To create the study data you use the "True ratio of T/R" and the "True CV" and the statistical distributions relating these parameters to the observed outcomes of a study.

A true ratio = 1.25 will create studies which are under the null hypothesis "bioinequivalence"; counting the studies which nevertheless decide BE gives you the type I error (deciding BE although the null hypothesis is true).
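
That recipe can be sketched in a few lines (Python; z-approximation with the CV treated as known, a fixed total n and no second stage, so it is the single-stage analogue of what Power2Stage does, not the real machinery; names are mine):

```python
import random
from math import log, sqrt
from statistics import NormalDist

def sim_tie(true_ratio=1.25, cv=0.20, n=24, alpha=0.05,
            nsims=100_000, seed=7):
    """Fraction of simulated studies that (wrongly) conclude BE."""
    rng = random.Random(seed)
    se = sqrt(2.0 / n) * sqrt(log(cv**2 + 1.0))  # SE of the log-ratio
    z = NormalDist().inv_cdf(1.0 - alpha)        # 90% CI half-width factor
    passes = 0
    for _ in range(nsims):
        d = rng.gauss(log(true_ratio), se)       # one simulated study
        # BE is concluded if the whole 90% CI lies within 0.80-1.25
        if d - z * se > log(0.80) and d + z * se < log(1.25):
            passes += 1
    return passes / nsims

print(sim_tie())  # close to the nominal 0.05
```

With theta0 = 1.25 the empirical pass rate is the type I error and settles near 0.05; with theta0 = 0.95 the very same loop would instead estimate power.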

Hope this was understandable.

Regards,

Detlew
Silva
☆

Portugal,
2017-03-09 11:38

@ d_labes
Posting: # 17145
Views: 10,137

## GMR, theta 0 and that all

Dear d_labes
Many thanks for your explanations. So just to clarify my mind…
Using theta0 in Power2Stage as 0.8 or 1.25, I'm informing the system that, after studies have been simulated based on the expected GMR, n1, CV and target power, the test product is truly non-bioequivalent, because the true T/R is 0.8 or 1.25 and therefore the respective 90% CI will always be outside the [0.80, 1.25] bioequivalence range.
The algorithm will then calculate the number of simulated studies that wrongly rejected the Null Hypothesis and divide this number by the total number of simulated studies. The ratio represents TIE.

Considering a study design under Potvin's method B framework (Type 1 TSD), i.e. an expected GMR of 0.95, an n1 between 12 and 60 subjects, a CV between 10 and 100% and a target power of 0.8, no simulations are required if, at the end of the study, the GMR was 0.95 and the CV between 10 and 100%. Am I thinking appropriately?
But if expected GMR is for example 0.91, n1 between 12 and 60 subjects, CV between 10 and 100% and target power is 0.8, there is a violation of method B assumptions, right?
And therefore simulations are needed, based on true data at the end of the trial, in order to check whether the TIE is below the nominal alpha of 0.05. So, assuming a final GMR of 0.91, a CV of 34%, n1 = 16, a target power of 0.8, and no futility rule, the Power2Stage simulation conditions would be:
power.2stage(method="B", alpha0=0.05, alpha=c(0.0294, 0.0294), n1=16,
             GMR=0.91, CV=0.34, targetpower=0.8, pmethod="nct",
             usePE=FALSE, Nmax=Inf, min.n2=0, theta0=0.8,
             theta1=0.8, theta2=1.25, npct=c(0.05, 0.5, 0.95),
             setseed=TRUE, details=TRUE)
With this simulation scenario:
1e+05 sims. Stage 1 - Time consumed (secs):
user  system elapsed
0.4     0.0     0.4
Keep calm. Sample sizes for stage 2 (98482 studies)
will be estimated. May need some time.
Time consumed (secs):
user  system elapsed
1.3     0.0     1.3
Total time consumed (secs):
user  system elapsed
2       0       2

Method B: alpha (s1/s2) = 0.0294 0.0294
Target power in power monitoring and sample size est. = 0.8
BE margins = 0.8 ... 1.25
CV = 0.34; n(stage 1)= 16; GMR = 0.91
GMR = 0.91 and mse of stage 1 in sample size est. used
Futility criterion Nmax = Inf

1e+05 sims at theta0 = 0.8 (p(BE)='alpha').
p(BE)    = 0.04385
p(BE) s1 = 0.01512
Studies in stage 2 = 98.48%

Distribution of n(total)
- mean (range) = 100.5 (16 ... 332)
- percentiles
5% 50% 95%
46  96 170

Based on these results:
• The type I error over the 2 stages was 0.04385 (and therefore <0.05) for the 1e+05 simulated studies
• The type I error for the first stage was 0.01512
• 98.48% of the simulated studies went into stage 2. The other 1.52% of the studies ended in stage 1 (either as BE or non-BE)
Am I interpreting these results correctly?
Best regards and thanks for all the patience!
ElMaestro
★★★

Belgium?,
2017-03-09 11:56

@ Silva
Posting: # 17146
Views: 10,089

## GMR, theta 0 and that all

Hi Silva,

I am not d_labes or Helmut but I have an opinion, too, and you are asking some bloody good questions there.

» The algorithm will then calculate the number of simulated studies that wrongly rejected the Null Hypothesis and divide this number by the total number of simulated studies. The ratio represents TIE.

Actually I'd prefer to say it is the maximum type I error. The type I error is the chance of concluding BE for a product that isn't BE, which is when the true ratio is outside the acceptance range. So when we work with 80.00%-125.00%, a product is truly not BE when the true ratio is e.g. 72%, 77%, or 79%. But these three levels will be associated with different levels of power. Regardless of how a product is inequivalent, we aim for methods that give a maximum type I error of 5%. By the nature of the game, the type I error becomes smaller as we move further away from one of the limits.
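
Under a simple z-approximation (CV known, single stage; a sketch, not the exact t-based calculation) that drop-off can even be written in closed form: passing requires the whole 90% CI to sit inside the limits, and the probability of that shrinks fast once the true ratio moves past 1.25. Assumed CV 20% and n 24:

```python
from math import log, sqrt
from statistics import NormalDist

def p_be(true_ratio, cv=0.20, n=24, alpha=0.05):
    """P(conclude BE); z-approximation with known CV, single stage."""
    se = sqrt(2.0 / n) * sqrt(log(cv**2 + 1.0))
    z = NormalDist().inv_cdf(1.0 - alpha)
    nd = NormalDist()
    # need: upper 90% CI bound < ln(1.25) AND lower bound > ln(0.80)
    p_upper = nd.cdf((log(1.25) - log(true_ratio)) / se - z)
    p_lower = nd.cdf((log(true_ratio) - log(0.80)) / se - z)
    return max(0.0, p_upper + p_lower - 1.0)   # exact for this normal model

for r in (1.25, 1.30, 1.35):
    print(f"true ratio {r:.2f}: type I error ~ {p_be(r):.4f}")
```

At a true ratio of exactly 1.25 the chance of passing is the full 5%, and it drops by an order of magnitude only a few percentage points further out, which is why the maximum is always found right at the limit.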

» Considering a study design under Potvin's method B framework (Type 1 TSD), i.e. an expected GMR of 0.95, an n1 between 12 and 60 subjects, a CV between 10 and 100% and a target power of 0.8, no simulations are required if, at the end of the study, the GMR was 0.95 and the CV between 10 and 100%. Am I thinking appropriately?

For any approved method I know of it does not have anything to do with what the GMR or CV was at the end of the study as long as you showed BE.

» But if expected GMR is for example 0.91, n1 between 12 and 60 subjects, CV between 10 and 100% and target power is 0.8, there is a violation of method B assumptions, right?

No. See above.

» And therefore simulations are needed based on true data (...)

No, this isn't exactly how it works. The "true data" you refer to are your observations - they give an estimate but they do not give you the true ratio.

d_labes
★★★

Berlin, Germany,
2017-03-09 12:55

@ Silva
Posting: # 17147
Views: 10,051

## GMR, theta0 and that all

Dear Silva,

additionally to what our great Maestro said:

You have to know beforehand whether the settings you use in a TSD, i.e. adjusted alpha, fixed GMR, target power, n1, inclusion of futility rules or others, are susceptible to an alpha inflation over a range of reasonable true CVs.
Potvin et al. have shown this by simulations for adjusted alpha = 0.0294, target power 0.8 and n1 12-60 over a range of true CVs of 10-100%, at least for the Type 1 (aka Method B) decision scheme. For the Type 2 (aka Method C) decision scheme some guys believe they have seen an alpha inflation in the numbers given.

Anders the Great has derived adjusted alpha values for other settings of the fixed GMR and target power, also preserving the maximum TIE <=0.05.
1. Fuglsang A. Controlling type I errors for two-stage bioequivalence study designs. Clin Res Regul Aff. 2011;28(4):100–5. doi 10.3109/10601333.2011.631547
2. Fuglsang A. Sequential Bioequivalence Trial Designs with Increased Power and Controlled Type I Error Rates. AAPS J. 2013;15(3):659–61. doi 10.1208/s12248-013-9475-5
If you change any of these settings you have to show again that no alpha inflation is to be expected - by simulations (although there is some rumor that some leading regulatory agencies don't like simulations).
And that is what the package Power2Stage was invented for.

The observed GMR(s) and CV in your actual study don't play a role in that game. They are not known beforehand.
One exception: if you observe a CV above the 'validated' range it may be wise to do simulations with that CV assumed as the TRUE one.

Regards,

Detlew
Silva
☆

Portugal,
2017-03-09 17:01

@ d_labes
Posting: # 17148
Views: 10,048

## GMR, theta0 and that all

Dear ElMaestro and d_labes

Many thanks for your reply and elucidations. I'll read carefully Fuglsang's paper (2011). I've already studied the other two papers by the same author (2013 and 2014), as well as Helmut's nice review paper.

Best rgds
BE-proff
★★

Russia,
2017-02-21 10:31

@ Helmut
Posting: # 17093
Views: 10,500

## “Type 1” slightly higher power than “Type 2” for the same adj. α

Hi Helmut,

Thank you for clarification!