Helmut
Vienna, Austria
2012-12-01 02:50
Posting: # 9650

 Another Two-Stage ‘idea’ (lengthy) [Two-Stage / GS Designs]

Hi simulators!

I stumbled upon another goodie. A presentation at last year’s “AAPS Workshop on Facilitating Oral Product Development and Reducing Regulatory Burden through Novel Approaches to Assess Bioavailability / Bioequivalence”:

Alfredo García-Arieta
Current Issues in BE Evaluation in the European Union
23 October, Washington

The views expressed in this presentation are the personal views of the author and may not be understood or quoted as being made on behalf of or reflecting the position of the Spanish Agency for Medicines and Health Care Products or the European Medicines Agency or one of its committees or working parties.


Two-stage design

  • The first stage is an interim analysis and the second stage is the analysis of the full data set.
      The 2nd data set cannot be analyzed separately.
  • To preserve the overall type I error the significance level needs to be adjusted to obtain a coverage probability higher than 90%.
      It is not acceptable to perform a 90% CI at the interim analysis and a 95% CI in the final analysis with the full data set.

  • How alpha is spent must be pre-defined in the protocol.
  • The same or a different amount of alpha can be spent in each analysis.
      If the same alpha is spent in both stages the Bonferroni rule (95% confidence interval in both analyses) is too conservative and a 94.12% confidence interval can be used.
      It is also possible to distribute the alpha differently.
          As an extreme case it is acceptable to plan no alpha expenditure in the interim analysis when it is designed to obtain information on formulation differences and intra-subject variability and no 90% CI is estimated at the interim stage.

  • Even if the final sample size is going to be decided based on the intra-subject variability estimated in the interim analysis, a proposal for a final sample size must be included in the protocol.
  • This proposed final sample size should be recruited if the estimate obtained from the interim analysis is lower than the one predefined in the protocol, in order to keep the consumer risk.
  • A term for the stage in the ANOVA model.


What puzzles me here is the “proposed final sample size”. I don’t understand why performing the second stage in a smaller sample size should violate the consumer’s risk. The contrary was already demonstrated by Potvin et al. with Method B. Furthermore, how would a sponsor derive the final sample size? This idea smells of Pocock’s original work, which for a single interim analysis would require the first stage to be ½ of the final sample size. In my understanding (corrections welcome) the framework might look like this:

[image]
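
In case the scheme does not render above, here is a minimal R sketch of one pass through the framework as I read it. The χ²/normal shortcuts for the simulated MSE and PE, the crude pooling of the two stages, and the absence of an interim power check are my own assumptions, not part of the slides:

## One pass through the "Pseudo Pocock" framework sketched above.
## Assumptions: 2x2x2 cross-over, GMR 0.95, target power 80%, alpha 0.0294
## in both analyses, n1 = nprop/2, and the proposed total sample size acting
## as a floor for the re-estimated one. Requires PowerTOST.
library(PowerTOST)

one.study <- function(CV, GMR = 0.95, n.prop = 24, alpha = 0.0294,
                      target = 0.80, theta1 = 0.80, theta2 = 1.25) {
  n1 <- n.prop / 2                 # Pocock-style: interim at half the proposed size
  s2 <- CV2mse(CV)                 # within-subject variance on the log scale
  ## stage 1: simulate the interim MSE and PE
  mse1 <- s2 * rchisq(1, df = n1 - 2) / (n1 - 2)
  pe1  <- rnorm(1, mean = log(GMR), sd = sqrt(2 * s2 / n1))
  ci1  <- pe1 + c(-1, 1) * qt(1 - alpha, n1 - 2) * sqrt(2 * mse1 / n1)
  if (ci1[1] >= log(theta1) && ci1[2] <= log(theta2))
    return(list(BE = TRUE, n = n1)) # BE shown already at the interim
  ## re-estimate the total sample size from the interim CV (GMR fixed at 0.95)
  n.est <- sampleN.TOST(alpha = alpha, CV = mse2CV(mse1), theta0 = GMR,
                        targetpower = target, design = "2x2",
                        print = FALSE)[["Sample size"]]
  n.tot <- max(n.est, n.prop)      # the proposed size acts as a floor
  n2    <- n.tot - n1
  ## stage 2, then a crude pooled analysis: sample-size weighted PE,
  ## df-weighted MSE, one df lost to the stage term (a shortcut - the real
  ## thing would refit the ANOVA on the pooled data)
  mse2 <- s2 * rchisq(1, df = n2 - 2) / (n2 - 2)
  pe2  <- rnorm(1, mean = log(GMR), sd = sqrt(2 * s2 / n2))
  pe   <- (n1 * pe1 + n2 * pe2) / n.tot
  mse  <- ((n1 - 2) * mse1 + (n2 - 2) * mse2) / (n.tot - 4)
  ci   <- pe + c(-1, 1) * qt(1 - alpha, n.tot - 3) * sqrt(2 * mse / n.tot)
  list(BE = ci[1] >= log(theta1) && ci[2] <= log(theta2), n = n.tot)
}

set.seed(123456)
one.study(CV = 0.20)               # one simulated study at CV 20%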

I have performed some simulations for a CV of 20% (see the end of the post).

I wonder why so many new “methods” have popped up in recent years (remember this one?) without any [sic] published evidence that they maintain the patient’s risk.

Example: αadj 0.0294, CV 20%, GMR 0.95, target power 80%, n1 12 (nprop 24), 10^6 simulations, exact sample size estimation (Owen’s Q)
                 αemp        1–βemp   ntot   5%  50%  95%  % in stage 2
Method B     :  0.046293    0.841332  20.6  12   18   40    56.3795
Pseudo Pocock:  0.050903*   0.896622  21.9  12   24   40    58.6798

* significantly >0.05 (limit 0.05036).

Such a method would require simulations for every single study in order to come up with a proposed final sample size and a suitable αadj. I don’t expect 0.0294 to be applicable. Imagine a situation where the proposed sample size was 48 (n1 24) and the expected CV 20% – but actually it turned out to be 25%. Instead of running the second stage in ten subjects (nest – n1 = 34 – 24) we would have to dose another 24 subjects. Of course we could set the proposed size just a little bit larger than n1 (we would go with nest most of the time) – but this is not even “intended Pocock” any more and we end up with Method B without the power check. Patient’s risk? No idea, again. Show me simulations, please.
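
Just to show where the 34 comes from – a sample size re-estimation at the adjusted α with the observed CV of 25% (an illustrative call, assuming PowerTOST is at hand; it should land at the 34 quoted above):

library(PowerTOST)
# total sample size for CV 25%, GMR 0.95, target power 80% at alpha 0.0294
sampleN.TOST(alpha = 0.0294, CV = 0.25, theta0 = 0.95, targetpower = 0.80,
             design = "2x2")
# under the proposal the second stage needs nprop - n1 = 48 - 24 = 24 subjects,
# although nest - n1 = 34 - 24 = 10 would have sufficed
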
Why do we have to waste our time assessing all these “methods”? I would prefer that their inventors do their job. If they have done it already – which I doubt – they should publish it.



[image]
Patient’s risk significantly inflated with n1 12 – although within Potvin’s maximum acceptable inflation (0.052); less conservative than Method B (especially at higher sample sizes in the first stage) where αemp asymptotically approaches αadj of 0.0294.

[image]
Higher power than Method B because studies proceeding to the second stage with nest < nprop have to be performed in nprop – n1 subjects (instead of in only nest – n1).


Helmut Schütz

ElMaestro
Denmark
2012-12-01 23:20
@ Helmut
Posting: # 9652

 Not a PSRtPH that can be defended

Hi Helmut,

❝ This proposed final sample size should be recruited if the estimate obtained from the interim analysis is lower than the one predefined in the protocol, in order to keep the consumer risk.


❝ What puzzles me here is the “proposed final sample size”. I don’t understand why performing the second stage in a smaller sample size should violate the consumer’s risk. The contrary was already demonstrated by Potvin et al. with Method B.


You’re completely right. In addition, if one follows the Potvin method with a good result and gets RMS support, then I think it will be very, very difficult for another EU regulator to defend a PSRtPH at the CMDh or CHMP on the basis of something like the above idea when nothing about it has been published. I am saying this not on the basis of regulations (there aren’t any, due to the EU’s preservation of national sovereignty) but on the basis of practical experience.
Potvin showed that under her framework the type I error is preserved at 5% when the final sample size floats freely. If we tamper with the final sample size as proposed above, the average sample size will inevitably go up and, consequently, so may power. This means that we may be exposing more volunteers to IMPs than we would have under Potvin, even with preserved type I error rates. This can certainly be argued to be unethical. And given the literal quote above, it is not easy for a regulator to argue that it is the extra little power that is the true advantage.
I'll be happy to take the challenge :-D:-D:-D

Now back to the compiler. I need to recharge my dilithium crystals.

Pass or fail!
ElMaestro
Helmut
Vienna, Austria
2012-12-02 12:43
@ ElMaestro
Posting: # 9653

 Full adaptive without α-spending?

Hi ElMaestro!

❝ […] a PSRtPH at CMDh or CHMP […]



Excuse my ignorance but I’m not familiar with EMA’s terminology: What’s a “PSRtPH”?

❝ […] the extra little power that is the true advantage.


The gain in power is massive. Look at my example: A fixed sample design in 24 subjects would have a power of 83.6% (α 0.0294), Potvin B 88.0% (αemp 0.0317), but the “Pseudo Pocock” 99.0% (αemp 0.0489). That’s not ethical. Since when are regulators interested in keeping the producer’s risk low?
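
For the record, the fixed-design number can be reproduced with PowerTOST (the call below should give roughly the 83.6% quoted above):

library(PowerTOST)
# power of a fixed 2x2x2 cross-over, 24 subjects, CV 20%, GMR 0.95,
# evaluated at the adjusted alpha 0.0294 (94.12% CI)
power.TOST(alpha = 0.0294, CV = 0.20, theta0 = 0.95, n = 24, design = "2x2")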

Potvin et al. stated:

This study did not seek to find the best possible two-stage design, but rather to find good ones that could be used by sponsors without further validation.

As a side effect, for many combinations of CV and n1 the framework is very conservative (αemp < αadj).
[image]

I always thought that regulators’ main interest is to protect public health; they should be happy with a method which has been demonstrated to be conservative.

Now for something completely different: How do you interpret this part?
  • As an extreme case it is acceptable to plan no alpha expenditure in the interim analysis when it is designed to obtain information on formulation differences and intra-subject variability and no 90% CI is estimated at the interim stage.
What does that mean? Sample size re-estimation based on PE and CV? Since we don’t assess BE in the first stage (that would be similar to Potvin’s “internal pilot” Method A – which they dropped due to α-inflation) we have to introduce a futility rule. Sample size estimation if the PE is not within the acceptance range simply doesn’t work. Is the following a scheme we could test?

[image]
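
In code the interim step of this scheme might look like the sketch below. The futility bounds of 0.80…1.25, the 80% target power, and spending the full α = 0.05 only in the final analysis are placeholders of mine, not fixed parts of the scheme:

## interim step: no BE test and no alpha spent at stage 1, only a futility
## rule on the PE and a sample size re-estimation from the observed PE and CV
library(PowerTOST)

interim.decision <- function(pe1, CV1, n1, target = 0.80,
                             alpha.final = 0.05, futil = c(0.80, 1.25)) {
  ## futility rule on the point estimate only (no CI at the interim)
  if (pe1 < futil[1] || pe1 > futil[2])
    return(list(decision = "stop for futility", n2 = NA))
  ## total sample size from the observed PE and CV, full alpha in the final analysis
  n.tot <- sampleN.TOST(alpha = alpha.final, CV = CV1, theta0 = pe1,
                        targetpower = target, design = "2x2",
                        print = FALSE)[["Sample size"]]
  list(decision = "continue to stage 2", n2 = max(n.tot - n1, 0))
}

## example: first stage with 12 subjects, observed PE 0.92 and CV 22%
interim.decision(pe1 = 0.92, CV1 = 0.22, n1 = 12)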

BTW, I have performed studies of a similar design 10–15 years ago (for the BfArM, AFSSAPS), but never simulated anything… Dunno how to set up the sims properly. By chance one might get a PE far away from unity and a high CV requiring thousands of subjects. R kicked my ass with a nice “Error: cannot allocate vector of size 1983.8 Mb”.
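
One way around the allocation error would be to simulate in chunks and keep only running counters instead of one monster vector – a bare skeleton (the per-study simulation itself is left as a placeholder):

nsims  <- 1e6            # total number of simulated studies
chunk  <- 1e4            # simulate this many at a time
passed <- 0              # running count of studies concluding BE
for (i in seq_len(nsims / chunk)) {
  ## ... simulate 'chunk' studies here, returning a logical vector BE ...
  BE <- logical(chunk)   # placeholder: all FALSE
  passed <- passed + sum(BE)
}
passed / nsims           # empirical fraction of studies passing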

Helmut Schütz

d_labes
Berlin, Germany
2012-12-05 09:38
@ Helmut
Posting: # 9671

 Full adaptive with futility rule

Dear Helmut!

❝ ... Sample size re-estimation based on PE and CV? Since we don’t assess BE in the first stage (that would be similar to Potvin’s “internal pilot” Method A – which they dropped due to α-inflation) we have to introduce a futility rule. Sample size estimation if the PE is not within the acceptance range simply doesn’t work.


That and your scheme remind me of an idea I had some time ago but was not able to elaborate on due to a lack of spare time.

Dropping all the Spanish bells and whistles whose foundations nobody is aware of, but retaining the sample size adaptation based on PE and CV and the PE futility rule, leads to a modification of Potvin B (with implicit power) according to:

[image]

Other settings of alpha1, alpha2 are imaginable.
For instance alpha1=0 (resulting in CIs at stage 1 -Inf ... +Inf, i.e. BE test always "BE not proven") and alpha2=0.05 :cool:.

I think this scheme is worth exploring.

If the overall alpha ≤ 0.05 is maintained for this decision scheme, it would overcome the drawback of Potvin’s decision schemes that even if the stage 1 point estimate is “jenseits von gut und böse” (beyond the pale) we are forced to go to stage 2, but with a sample size based on a ‘true’ ratio of 0.95 – with which we definitely won’t show BE.
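
To put numbers on this point (the PE of 0.87 and the CV of 25% are only picked for illustration, assuming PowerTOST is at hand):

library(PowerTOST)
## Potvin's schemes: the stage-2 sample size always assumes a 'true' ratio
## of 0.95, no matter how far off the stage-1 point estimate is
sampleN.TOST(alpha = 0.0294, CV = 0.25, theta0 = 0.95, targetpower = 0.80,
             design = "2x2")
## fully adaptive: plug in the observed point estimate instead (here 0.87) -
## the required sample size increases dramatically, which is the situation
## a futility rule on the PE is meant to deal with
sampleN.TOST(alpha = 0.0294, CV = 0.25, theta0 = 0.87, targetpower = 0.80,
             design = "2x2")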

Here are some preliminary results of simulations (10^5 sims only) with the Pocock alphas (alpha1 = alpha2 = 0.0294):
             empirical
CV    n1   alpha   power
------------------------
15 %   8   0.045   0.918
      12   0.042   0.964
      16   0.039   0.982
20 %   8   0.044   0.811
      12   0.045   0.897
      16   0.043   0.939
      20   0.043   0.959
      24   0.041   0.974
25 %   8   0.039   0.703
      12   0.043   0.812
      16   0.045   0.873
      20   0.045   0.913
      24   0.042   0.937
      28   0.042   0.953
      32   0.041   0.964


No sign of an alpha-inflation. Power ok, except for the very small first stage with n1=8. Seems promising.

BTW: The futility rule "PE outside 0.85...1/0.85~1.18" was used by Charles Bon, 2007 AAPS Annual Meeting. Unfortunately his presentation "Interim and Sequential Analyses" is no longer found on the Indernett.

Regards,

Detlew
Helmut
Vienna, Austria
2012-12-05 20:56
@ d_labes
Posting: # 9678

 Full adaptive with futility rule

Dear Detlew!

❝ Dropping all the Spanish bells and whistles whose foundations nobody is aware of,


I’m afraid that’s a prerequisite.

❝ but retaining the sample size adaptation based on PE and CV and the PE futility rule, leads to a modification of Potvin B (with implicit power)…


Interesting scheme!

❝ Other settings of alpha1, alpha2 are imaginable.

❝ For instance alpha1=0 (resulting in CIs at stage 1 -Inf ... +Inf, i.e. BE test always "BE not proven")


Ha-ha! Bioinequivalence not rejected. ;-)

❝ Here are some preliminary results of simulations (10^5 sims only)


Looks promising indeed. Too nasty that these types of simulations converge rather slowly; 10^5 give just a hint. See the plot in this post.
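
For the record, the resolution we can expect from the number of sims (plain binomial standard error; the link to the 0.05036 limit further up is my own back-calculation):

p     <- 0.05
nsims <- c(1e5, 1e6)
se    <- sqrt(p * (1 - p) / nsims)
se                        # ~0.00069 (1e5) vs ~0.00022 (1e6)
## one-sided 5% significance limit for 'empirical alpha > 0.05':
0.05 + qnorm(0.95) * se   # ~0.0511 vs ~0.0504 (cf. the limit of 0.05036 above)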

❝ BTW: The futility rule "PE outside 0.85...1/0.85~1.18" was used by Charles Bon, 2007 AAPS Annual Meeting. Unfortunately his presentation "Interim and Sequential Analyses" is no longer found…


Time to write an e-mail.

❝ …on the Indernett.


:-D

Helmut Schütz

Helmut
Vienna, Austria
2012-12-03 03:07
@ ElMaestro
Posting: # 9655

 Piece of paper…

Hi ElMaestro,

I made an investment:

García-Arieta A, Gordon J. Bioequivalence Requirements in the European Union: Critical Discussion. AAPS J. 2012;14(4):738–48. doi:10.1208/s12248-012-9382-1


On p744 we find:

For the first time, this guideline acknowledges the possibility of a two-stage design to show BE. In this instance, the following should be noted:

  1. The first stage is an interim analysis and the second stage is the analysis of the full data set. The second data set cannot be analysed separately.
  2. In order to preserve the overall type I error, the significance level needs to be adjusted to obtain a coverage probability higher than 90%. Therefore, it is not acceptable to perform a 90% CI at the interim analysis and a 95% confidence interval in the final analysis with the full data set.
  3. The plan to spend alpha must be pre-defined in the protocol. The same or a different amount of alpha can be spent in each analysis. If the same alpha is spent in both stages, the Bonferroni rule (95% confidence interval in both analyses) is too conservative and 94.12% confidence interval can be used. It is also possible to distribute the alpha differently, and as an extreme case, it is acceptable to plan no alpha expenditure in the interim analysis when it is designed to obtain information on formulation differences and intra-subject variability and 90% CI are not estimated at the interim stage.
  4. A term for the stage should be included in the ANOVA model. However, the guideline does not clarify what the consequence should be if it is statistically significant. In principle, the data sets of both stages could not be combined.

Although the guideline is not explicit, even if the final sample size is going to be decided based on the intra-subject variability estimated in the interim analysis, a proposal for a final sample size must be included in the protocol so that a significant number of subjects (e.g., 12) is added to the interim sample size to avoid looking twice at almost identical samples. This proposed final sample size should be recruited even if the estimation obtained from the interim analysis is lower than the one pre-defined in the protocol in order to maintain the consumer risk.


Helmut Schütz

ElMaestro
Denmark
2012-12-03 08:19
@ Helmut
Posting: # 9656

 Piece of paper…

Hi Helmut,

good info, thanks a lot.

PSRtPH = Potential Serious Risk to Public Health = the regulated term for the dark matter from which referrals are made.

❝ A term for the stage should be included in the ANOVA model. However, the guideline does not clarify what the consequence should be if it is statistically significant. In principle, the data sets of both stages could not be combined.


We cannot combine data from stage 1 and stage 2 in our analyses, or what is meant there?

Pass or fail!
ElMaestro
Helmut
Vienna, Austria
2012-12-03 14:02
@ ElMaestro
Posting: # 9660

 Piece of paper…

Hi ElMaestro!

❝ ❝ A term for the stage should be included in the ANOVA model. However, the guideline does not clarify what the consequence should be if it is statistically significant. In principle, the data sets of both stages could not be combined.


❝ We cannot combine data from stage 1 and stage 2 in our analyses, or what is meant there?


Similar to the never-ending story of including a group term if a conventional study was performed in more than one group. :-D

Potvin et al. used a stage-term in the second stage, but

[…] pooling data from stages 1 and 2 were always allowed, even if there was a statistically significant difference between the results from the two stages.

because

[The method] does not require poolability criteria (or at least should know whether results from both stages are poolable before sample analysis, i.e. base poolability on study conduct such as subject demographics, temporal considerations, use of same protocol, use of same site, etc., rather than a statistical test of poolability).


A term for stage makes sense. Potvin’s Example 2 (94.12% CI):

87.94–117.05% (with stage term, 17 df)
88.16–116.72% (without stage term, 18 df)


The stage term (fewer df) widens the CI (conservative). Since the stage term is a between-subject effect, its power is low. Nevertheless, even if there is no ‘true’ stage effect we expect to get a false positive at the level of the test.
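
The df effect alone can be seen from the t-quantiles behind the 94.12% CI (the residual MSE changes as well, of course):

## half-width of the 94.12% CI is qt(1 - 0.0294, df) * SE of the difference
qt(1 - 0.0294, df = 17)   # with the stage term (Potvin's Example 2)
qt(1 - 0.0294, df = 18)   # without the stage term
## the lower df gives a slightly larger quantile, i.e. the wider
## (more conservative) interval quoted above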

Helmut Schütz