Bioequivalence and Bioavailability Forum • Misusing PowerTOST for superiority?

Misusing PowerTOST for superiority? [Power / Sample Size]

posted by Helmut – Vienna, Austria, 2011-11-27 18:46 – Posting: # 7738

Hi Jamesmartinn!

❝ […] They replied asking me to take the weekend to try once more - the interview went very well and they seemed to like me. I have to submit something tomorrow.

❝ I'm really starting to think that these questions were vague on purpose? I'm wondering if they want me to respond back by saying additional information is required (for example, alpha = 0.05, or 0.01? for these questions?)

It is difficult to make this kind of remote diagnosis from across the pond. Either they asked trick questions on purpose (OK), or they believe that these questions make sense (bad). In the latter case, be prepared for a hard everyday working life.

❝ In regards to the CV study, I'm going to try to locate the Chow and Liu source, it seems my best bet.

Since time is running out: Check your inbox. :-D

❝ For the last question, as you stated, I have no idea what the goal of the study was. What I posted in green text was literally what I got. Given that limited information (2 treatments, 1 Placebo), what would be the most likely purpose (superiority, inferiority, equivalence) ? I'm guessing which ever it is, it's in reference to the placebo group.

We can only guess. The most likely scenarios IMHO are:

Superiority case: A vs. placebo and B vs. placebo. Target of the study is to select the ‘better’ treatment.
Equivalence case: A vs. B, internal validation (superiority of both to placebo)

❝ I'm going to do a few scenarios for that question. However, the whole '20% difference between treatments' as an effect size really throws me off.

In power calculations we need Δ – the clinically relevant difference. In bioequivalence it was set in the early 1980s to 20% (by convention!). It might be narrower or wider as well. For NTIDs (narrow therapeutic index drugs) it’s 10%, and in some regulations for HVDs/HVDPs (highly variable drugs / drug products) 25% or even 30%. Since AUC and C_max follow a lognormal distribution, we apply a multiplicative model (or an additive one on logs); therefore, for Δ = 0.2 the conventional acceptance range in log-scale is [ln(1-Δ), ln(1/(1-Δ))] = [-0.2231, +0.2231]. These limits are symmetrical around zero, and back-transformed we get 80–125%. The data of the third question do not look as if they come from a BE study. I would guess that’s a ‘classical’ clinical trial, parallel design, untransformed data.
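The limits quoted above follow directly from Δ. A minimal base-R sketch, assuming the same symmetric-in-logs rule is applied to the other Δ values mentioned; the helper name `be_limits` is made up for illustration and is not part of any package:

```r
# Acceptance limits for a given Delta under a multiplicative model.
# be_limits() is a hypothetical helper used only in this sketch.
be_limits <- function(Delta) {
  lower <- 1 - Delta          # e.g. 0.80 for Delta = 0.20
  upper <- 1 / (1 - Delta)    # e.g. 1.25 for Delta = 0.20
  data.frame(Delta     = Delta,
             log.lower = log(lower),   # limits on the log scale
             log.upper = log(upper),
             lower.pct = 100 * lower,  # back-transformed, in percent
             upper.pct = 100 * upper)
}

lims <- be_limits(c(0.10, 0.20, 0.25, 0.30))
print(round(lims, 4))
```

The log-scale columns are equal in magnitude and opposite in sign for every Δ, which is the symmetry around zero referred to above.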

❝ Could you demonstrate how this is taken into account in calculating 1 or 2 different scenarios? Let's pretend I'm comparing (i) treatment a vs placebo , (ii) treatment b vs placebo for either inferiority/superiority

Let’s see:

Treatment      n   Mean    SD    s²      CV
--------------------------------------------
A             31   2.52   1.36  1.85   54.0%
B             30   2.13   1.30  1.69   61.0%
C (placebo)   16   0.69   1.01  1.02  146.4%

Pooled standard deviations are calculated according to: s₀ = √{[(n₁–1)s₁² + (n₂–1)s₂²]/(n₁+n₂–2)}; we get:

Comparison    s₀    T–R   CV = s₀/(T–R)
----------------------------------------
A vs. C      1.25   1.83      68.5%
B vs. C      1.21   1.44      84.0%
A vs. B      1.33   0.39     341.2%
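The pooled values can be recomputed from the summary statistics alone. A small base-R sketch; the vector names and the helper `pooled_sd` are chosen here for illustration, and the CV column is taken as s₀ divided by the difference in means, which is what the tabulated percentages correspond to:

```r
# Summary statistics from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Pooled SD of two groups (hypothetical helper for this sketch)
pooled_sd <- function(g1, g2) {
  sqrt(((n[[g1]] - 1) * s[[g1]]^2 + (n[[g2]] - 1) * s[[g2]]^2) /
       (n[[g1]] + n[[g2]] - 2))
}

pairs <- list(c("A", "C"), c("B", "C"), c("A", "B"))
res <- do.call(rbind, lapply(pairs, function(p) {
  s0 <- pooled_sd(p[1], p[2])
  d  <- m[[p[1]]] - m[[p[2]]]          # difference in means, T - R
  data.frame(comparison = paste(p[1], "vs.", p[2]),
             s0 = s0, diff = d, CV.pct = 100 * s0 / d)
}))
print(res, digits = 4)
```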

In clinical trials a 95% CI (as opposed to the 90% CI in BE) is commonly calculated: CI = (T–R) ± t_α,n₁+n₂–2 · s₀ · √[(n₁+n₂)/(n₁·n₂)]; we get:

Comparison   T–R   t_α,n₁+n₂–2     95% CI
------------------------------------------
A vs. C      1.83    2.0141     1.05 – 2.61
B vs. C      1.44    2.0154     0.69 – 2.19
A vs. B      0.39    2.0010    -0.29 – 1.07
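For readers who want to retrace the intervals, here is a base-R sketch of the pooled-variance formula applied to the summary statistics. The helper `pooled_ci` is an illustrative name, and the two-sided 95% level is taken as stated in the text:

```r
# Summary statistics from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Two-sided CI of the difference in means, equal variances assumed
# (pooled_ci() is a hypothetical helper for this sketch)
pooled_ci <- function(g1, g2, level = 0.95) {
  df   <- n[[g1]] + n[[g2]] - 2
  s0   <- sqrt(((n[[g1]] - 1) * s[[g1]]^2 + (n[[g2]] - 1) * s[[g2]]^2) / df)
  se   <- s0 * sqrt((n[[g1]] + n[[g2]]) / (n[[g1]] * n[[g2]]))
  tval <- qt(1 - (1 - level) / 2, df)   # two-sided critical value
  d    <- m[[g1]] - m[[g2]]
  c(diff = d, t = tval, lower = d - tval * se, upper = d + tval * se)
}

ci_pooled <- rbind("A vs. C" = pooled_ci("A", "C"),
                   "B vs. C" = pooled_ci("B", "C"),
                   "A vs. B" = pooled_ci("A", "B"))
print(round(ci_pooled, 4))
```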

Looking at a clinically relevant difference of +20% relative to placebo (0.83 = 0.69 × 1.2), treatment A ‘works’ (lower CL > 0.83) and B does not (lower CL < 0.83). Now for the tricky part: we must not assume equal variances, and according to the FDA’s guidance we should not even test for them. The t-test is fairly robust against deviations from normality but rather sensitive to unequal variances when the group sizes are unbalanced, which is what we have here. Therefore we should apply Satterthwaite’s approximation of the degrees of freedom and use Welch’s t-test instead (note: that’s the default in R). We get:

Comparison   T–R     df     t_α,df      95% CI
-----------------------------------------------
A vs. C      1.83   39.09   2.0225    1.05 – 2.61
B vs. C      1.44   37.91   2.0246    0.68 – 2.20
A vs. B      0.39   58.99   2.0010   -0.29 – 1.07

No big deal with this dataset…
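The Satterthwaite degrees of freedom can be checked from the summary statistics. The base-R sketch below uses the unpooled standard error √(s₁²/n₁ + s₂²/n₂) on which Welch’s test is based, so its limits can differ somewhat from ones obtained by combining the pooled s₀ with the Satterthwaite df. The helper `welch_ci` is a made-up name, and the threshold 1.2 × placebo mean is simply the value the lower CL is compared against above:

```r
# Summary statistics from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Welch CI from summary statistics (hypothetical helper for this sketch)
welch_ci <- function(g1, g2, level = 0.95) {
  v1   <- s[[g1]]^2 / n[[g1]]
  v2   <- s[[g2]]^2 / n[[g2]]
  # Satterthwaite approximation of the degrees of freedom
  df   <- (v1 + v2)^2 / (v1^2 / (n[[g1]] - 1) + v2^2 / (n[[g2]] - 1))
  se   <- sqrt(v1 + v2)                 # unpooled standard error
  tval <- qt(1 - (1 - level) / 2, df)
  d    <- m[[g1]] - m[[g2]]
  c(diff = d, df = df, t = tval, lower = d - tval * se, upper = d + tval * se)
}

ci_welch <- rbind("A vs. C" = welch_ci("A", "C"),
                  "B vs. C" = welch_ci("B", "C"),
                  "A vs. B" = welch_ci("A", "B"))
print(round(ci_welch, 4))

# Threshold used in the text: placebo mean plus 20%
threshold <- 1.2 * m[["C"]]
print(ci_welch[c("A vs. C", "B vs. C"), "lower"] > threshold)
```

With raw data instead of summary statistics, `t.test(x, y)` would report the same kind of interval, since `var.equal = FALSE` is its default.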

Time to fire up R:

library(PowerTOST)
# equivalence A vs. B on the ratio of means; theta2 is left at its default
power.RatioF(alpha = 0.025, theta1 = 0.8, theta0 = 2.52/2.13,
             CV = 3.412, n = 31+30, design = "parallel")
[1] 4.896124e-05

Not surprisingly power of the equivalence test A–B is terribly low (small difference, high CV). No idea how to tweak PowerTOST for the superiority cases. Maybe later on (can’t promise). I have a deadline approaching at 24:00 CET. :-(

Tried A > C:

power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = Inf,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")

… and got:
Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr, : mean contains NA

According to help(pmvt):

Note that both -Inf and +Inf may be specified in the lower and upper integral limits in order to compute one-sided probabilities.

PowerTOST goes berserk in .power.RatioF attempting to calculate the correlation matrix – which ends up in Inf/Inf and NaN.
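The arithmetic behind that failure can be reproduced in isolation. A tiny base-R illustration, which only mimics the Inf/Inf step and does not call any PowerTOST or mvtnorm internals:

```r
upper <- Inf
ratio <- upper / upper   # Inf / Inf is undefined
print(ratio)             # NaN
print(is.na(ratio))      # TRUE: NaN also counts as NA, so an NA check trips

# A large but finite stand-in behaves normally
big <- 1e6
print(big / big)         # 1
```

This is why replacing Inf by a sufficiently large finite limit is a plausible workaround.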

Detlew? :confused:

Too stupid to set up a workaround.

Using the sledgehammer approach (starting with the upper limit set to theta0):

power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = 2.52/0.69,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")
[1] 0.025

Good. That’s α. Now let’s increase the upper limit theta2 (since Inf doesn’t work):

theta2    power
-------------------
  4      0.08441706
  5      0.4485689
  6      0.7621197
  8      0.9619132
 10      0.9920993
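The sweep can be scripted. A sketch, assuming PowerTOST is installed and that it is the upper limit theta2 which is varied while theta0 stays at 2.52/0.69 (with theta0 above the upper limit, power could not rise towards 1). The wrapper name `sweep_theta2` is made up here:

```r
# Power of the 'sledgehammer' approach for increasing upper limits.
# sweep_theta2() is a hypothetical wrapper around PowerTOST::power.RatioF().
sweep_theta2 <- function(theta2 = c(4, 5, 6, 8, 10)) {
  if (!requireNamespace("PowerTOST", quietly = TRUE)) {
    message("PowerTOST is not installed; nothing computed.")
    return(NULL)
  }
  power <- sapply(theta2, function(u) {
    PowerTOST::power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = u,
                            theta0 = 2.52 / 0.69, CV = 0.685,
                            n = 31 + 16, design = "parallel")
  })
  data.frame(theta2 = theta2, power = power)
}

res_sweep <- sweep_theta2()
print(res_sweep)
```

As the upper limit moves away from theta0, the result should level off at the power of the one-sided test against theta1 alone.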

Maybe you can test your diplomatic skills (something I completely lack) and answer with something like:

Treatment A showed superiority to placebo (defined as a 20% increase): +1.83 (95% CI +1.05 ~ +2.61) since the lower CL>0.83. Treatment B showed +1.44 (+0.68 ~ +2.20). However, equivalence between A and B could not be excluded (+0.39, -0.29 ~ +1.07) since the CI includes zero.

Now for the meaningless part:

Power to show equivalence between A and B was only 4.9·10^-5; XYZ (include the numbers from sampleN.RatioF here) subjects would be needed in a future study. Assuming a clinically relevant difference of 20%, power to show superiority of both treatments to placebo was very high. :-D

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

