Misusing PowerTOST for superiority? [Power / Sample Size]

posted by Helmut – Vienna, Austria, 2011-11-27 18:46 – Posting: # 7738

Hi Jamesmartinn!

❝ […] They replied asking me to take the weekend to try once more - the interview went very well and they seemed to like me. I have to submit something tomorrow.


❝ I'm really starting to think that these questions were vague on purpose? I'm wondering if they want me to respond back by saying additional information is required (for example, alpha = 0.05, or 0.01? for these questions?)


It is difficult to perform this kind of telediagnosis across the pond. Either they purposely asked catch questions (OK) or they think that these questions make sense (bad). In the latter case be prepared for a hard everyday working life.

❝ In regards to the CV study, I'm going to try to locate the Chow and Liu source, it seems my best bet.


Since time is running out: Check your inbox. :-D

❝ For the last question, as you stated, I have no idea what the goal of the study was. What I posted in green text was literally what I got. Given that limited information (2 treatments, 1 Placebo), what would be the most likely purpose (superiority, inferiority, equivalence) ? I'm guessing which ever it is, it's in reference to the placebo group.


We can only guess. The most likely scenarios IMHO are:

❝ I'm going to do a few scenarios for that question. However, the whole '20% difference between treatments' as an effect size really throws me off.


In power calculations we need Δ – the clinically relevant difference. In bioequivalence it was set in the early 1980s to 20% (by convention!). It might be narrower or wider as well. For NTIDs (narrow therapeutic index drugs) it’s 10% and in some regulations for HVDs/HVDPs (highly variable drugs / drug products) 25% or even 30%. Since AUC and Cmax follow a lognormal distribution we apply a multiplicative model (or an additive one on logs); therefore the conventional acceptance range in log-scale is [ln(1-Δ), ln(1/(1-Δ))] = [-0.2231, +0.2231]. These limits are symmetrical around zero and back-transformed we get 80–125%. The data in the third question don’t look like they come from a BE study. I would guess it’s a ‘classical’ clinical trial: parallel design, untransformed data.
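A minimal sketch of the acceptance range described above, in base R. The variable names (Delta, log_limits, ratio_limits) are only illustrative; nothing here is specific to PowerTOST.

```r
# Conventional BE acceptance range for a given Delta (here 20%)
Delta <- 0.20

# Limits on the log scale: ln(1 - Delta) and ln(1 / (1 - Delta))
log_limits <- c(lower = log(1 - Delta), upper = log(1 / (1 - Delta)))
print(round(log_limits, 4))        # symmetric around zero

# Back-transformed limits on the ratio scale, in percent
ratio_limits <- exp(log_limits)
print(round(100 * ratio_limits, 1))
```

Setting Delta to 0.10 or 0.30 gives the narrower or wider ranges mentioned for NTIDs and HVDs in the same way.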

❝ Could you demonstrate how this is taken into account in calculating 1 or 2 different scenarios? Let's pretend I'm comparing (i) treatment a vs placebo , (ii) treatment b vs placebo for either inferiority/superiority


Let’s see:
Treatment    n    Mean   SD    s²     CV
——————————————————————————————————————————
A            31   2.52  1.36  1.85   54.0%
B            30   2.13  1.30  1.69   61.0%
C (placebo)  16   0.69  1.01  1.02  146.4%


Pooled standard deviations are calculated according to: s0 = √{[(n1–1)·s1² + (n2–1)·s2²]/(n1+n2–2)}. With TR as the difference of the two means and CV = s0/TR we get:
Comparison   s0   TR     CV
——————————————————————————————
A vs. C     1.25  1.83   68.5%
B vs. C     1.21  1.44   84.0%
A vs. B     1.33  0.39  341.2%
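
The pooled values can be checked from the summary statistics alone. This is a sketch in base R; the helper pooled() is a hypothetical name of mine, and CV is taken as s0 divided by the difference of means, which is what the numbers in the table correspond to.

```r
# Summary statistics copied from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Pooled SD, difference of means and CV (= s0 / difference) for one comparison
pooled <- function(x, y) {
  s0 <- sqrt(((n[[x]] - 1) * s[[x]]^2 + (n[[y]] - 1) * s[[y]]^2) /
             (n[[x]] + n[[y]] - 2))
  d  <- m[[x]] - m[[y]]
  c(s0 = s0, diff = d, CV = 100 * s0 / d)
}

res <- rbind("A vs. C" = pooled("A", "C"),
             "B vs. C" = pooled("B", "C"),
             "A vs. B" = pooled("A", "B"))
print(round(res, 2))
```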


In clinical trials a 95% CI (as opposed to the 90% in BE) is commonly calculated: CI = TR ± t(α, n1+n2–2) · s0 · √[(n1+n2)/(n1·n2)]; we get:
Comparison   Δ   tα,n1+n2–2    95% CI
——————————————————————————————————————
A vs. C     1.83  2.0141   1.05 – 2.61
B vs. C     1.44  2.0154   0.69 – 2.19
A vs. B     0.39  2.0010  -0.29 – 1.07
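
A base-R sketch of the pooled-variance interval, again from the summary statistics only. ci_pooled() is a hypothetical helper; the two-sided 95% level means the 0.975 quantile of the t-distribution is used.

```r
# Summary statistics copied from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Two-sided CI of the difference of means assuming equal variances
ci_pooled <- function(x, y, level = 0.95) {
  n1 <- n[[x]]; n2 <- n[[y]]
  df <- n1 + n2 - 2
  s0 <- sqrt(((n1 - 1) * s[[x]]^2 + (n2 - 1) * s[[y]]^2) / df)
  se <- s0 * sqrt((n1 + n2) / (n1 * n2))   # standard error of the difference
  tq <- qt(1 - (1 - level) / 2, df)        # t quantile for the given level
  d  <- m[[x]] - m[[y]]
  c(diff = d, t = tq, lower = d - tq * se, upper = d + tq * se)
}

ci <- rbind("A vs. C" = ci_pooled("A", "C"),
            "B vs. C" = ci_pooled("B", "C"),
            "A vs. B" = ci_pooled("A", "B"))
print(round(ci, 4))
```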


Looking at a clinically relevant difference of +20% to placebo (0.83 = 0.69 × 1.2), treatment A ‘works’ (lower CL > 0.83) and B does not (lower CL < 0.83). Now for the tricky part: We must not assume equal variances – and according to FDA’s guidance should not even test for it. The t-test is fairly robust against deviations from normality but rather sensitive to unequal variances in unbalanced groups – which we have here. Therefore we should apply Satterthwaite’s approximation of the degrees of freedom and use Welch’s t-test instead (note: that’s the default in R’s t.test). We get:
Comparison   Δ     df    tα,df       95% CI
—————————————————————————————————————————————
A vs. C     1.83  39.09  2.0225   1.05 – 2.61
B vs. C     1.44  37.91  2.0246   0.68 – 2.20
A vs. B     0.39  58.99  2.0010  -0.29 – 1.07

No big deal with this dataset…
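
Since only summary statistics are available, t.test() cannot be called directly; a sketch of the Welch–Satterthwaite calculation by hand follows (welch() is a hypothetical helper). Note that Welch's test is usually defined with the unpooled standard error √(s1²/n1 + s2²/n2). The limits in the table above agree with the pooled standard error, so the sketch may give slightly narrower intervals for the comparisons with placebo. The degrees of freedom and t-values should match.

```r
# Summary statistics copied from the first table
n <- c(A = 31,   B = 30,   C = 16)
m <- c(A = 2.52, B = 2.13, C = 0.69)
s <- c(A = 1.36, B = 1.30, C = 1.01)

# Welch CI of the difference of means with Satterthwaite's df
welch <- function(x, y, level = 0.95) {
  v1 <- s[[x]]^2 / n[[x]]                  # variance of mean 1
  v2 <- s[[y]]^2 / n[[y]]                  # variance of mean 2
  df <- (v1 + v2)^2 /
        (v1^2 / (n[[x]] - 1) + v2^2 / (n[[y]] - 1))
  se <- sqrt(v1 + v2)                      # unpooled standard error
  tq <- qt(1 - (1 - level) / 2, df)
  d  <- m[[x]] - m[[y]]
  c(diff = d, df = df, t = tq, lower = d - tq * se, upper = d + tq * se)
}

w <- rbind("A vs. C" = welch("A", "C"),
           "B vs. C" = welch("B", "C"),
           "A vs. B" = welch("A", "B"))
print(round(w, 4))
```

With either standard error the lower limit for A vs. C stays above 0.83 and the one for B vs. C below it, so the conclusion does not change.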

Time to fire up R:
library(PowerTOST)
# Equivalence A vs. B as a ratio of means (Fieller); CV = 341.2% from the table above
power.RatioF(alpha = 0.025, theta1 = 0.8, theta0 = 2.52/2.13,
             CV = 3.412, n = 31+30, design = "parallel")
[1] 4.896124e-05

Not surprisingly power of the equivalence test A–B is terribly low (small difference, high CV). No idea how to tweak PowerTOST for the superiority cases. Maybe later on (can’t promise). I have a deadline approaching at 24:00 CET. :-(


Tried A > C:
power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = Inf,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")

… and got:
Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr,  :  mean contains NA

According to help(pmvt):

Note that both -Inf and +Inf may be specified in the lower and upper integral limits in order to compute one-sided probabilities.


PowerTOST goes berserk in .power.RatioF attempting to calculate the correlation matrix – which ends up in Inf/Inf and NaN.

Detlew? :confused: I’m too stupid to set up a workaround.

Using the sledgehammer approach (starting with the upper limit set to theta0):
power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = 2.52/0.69,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")
[1] 0.025

Good. That’s α. Now let’s increase theta2 (since Inf doesn’t work):
theta2    power
  4      0.08441706
  5      0.4485689
  6      0.7621197
  8      0.9619132
 10      0.9920993
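
A sketch of how such a grid could be run in one go, varying the upper limit theta2 (the argument that could not be set to Inf) and keeping everything else as in the call above. It assumes PowerTOST is installed; the guard with requireNamespace() is my addition so that the snippet does not fail without the package, and results may differ between package versions.

```r
# Finite stand-ins for an 'infinite' upper limit
theta2 <- c(4, 5, 6, 8, 10)
power  <- rep(NA_real_, length(theta2))

if (requireNamespace("PowerTOST", quietly = TRUE)) {
  # Same settings as the call above; only theta2 changes
  power <- sapply(theta2, function(u)
    PowerTOST::power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = u,
                            theta0 = 2.52 / 0.69, CV = 0.685,
                            n = 31 + 16, design = "parallel"))
}

print(data.frame(theta2, power))
```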


Maybe you can test your diplomatic skills (something I completely lack) and answer with something like:

Treatment A showed superiority to placebo (defined as a 20% increase): +1.83 (95% CI +1.05 ~ +2.61) since the lower CL>0.83. Treatment B showed +1.44 (+0.68 ~ +2.20). However, equivalence between A and B could not be excluded (+0.39, -0.29 ~ +1.07) since the CI includes zero.

Now for the meaningless part:

Power to show equivalence between A and B was only 4.9·10⁻⁵; XYZ (include the numbers from sampleN.RatioF here) subjects would be needed in a future study. Assuming a clinically relevant difference of 20%, power to show superiority of both treatments to placebo was very high. :-D


Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz