Misusing PowerTOST for superiority? [Power / Sample Size]
❝ […] They replied asking me to take the weekend to try once more - the interview went very well and they seemed to like me. I have to submit something tomorrow.
❝
❝ I'm really starting to think that these questions were vague on purpose? I'm wondering if they want me to respond back by saying additional information is required (for example, alpha = 0.05, or 0.01? for these questions?)
It is difficult to perform this kind of telediagnosis across the pond. Either they asked catch questions on purpose (OK) or they think that these questions make sense (bad). In the latter case, be prepared for a hard everyday working life.
❝ In regards to the CV study, I'm going to try to locate the Chow and Liu source, it seems my best bet.
Since time is running out: Check your inbox.

❝ For the last question, as you stated, I have no idea what the goal of the study was. What I posted in green text was literally what I got. Given that limited information (2 treatments, 1 Placebo), what would be the most likely purpose (superiority, inferiority, equivalence) ? I'm guessing which ever it is, it's in reference to the placebo group.
We can only guess. The most likely scenarios IMHO are:
- Superiority case: A vs. placebo and B vs. placebo. Target of the study is to select the ‘better’ treatment.
- Equivalence case: A vs. B, internal validation (superiority of both to placebo)
❝ I'm going to do a few scenarios for that question. However, the whole '20% difference between treatments' as an effect size really throws me off.
In power calculations we need Δ, the clinically relevant difference. In bioequivalence it was set to 20% in the early 1980s (by convention!). It might be narrower or wider as well: for NTIDs (narrow therapeutic index drugs) it’s 10%, and in some regulations for HVDs/HVDPs (highly variable drugs / drug products) 25% or even 30%. Since AUC and Cmax follow a lognormal distribution, we apply a multiplicative model (or an additive one on logs); therefore, with Δ = 0.2 the conventional acceptance range in log-scale is [ln(1–Δ), ln(1/(1–Δ))] = [–0.2231, +0.2231]. These limits are symmetrical around zero; back-transformed we get 80–125%. The data of the third question don’t look like they come from a BE study. I would guess that’s a ‘classical’ clinical trial: parallel design, untransformed data.
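As a quick check of the limits quoted above, here is a minimal sketch in base R. The helper name `be_limits` is made up for illustration and is not part of any package; it only evaluates [1–Δ, 1/(1–Δ)] and the corresponding log-scale limits for a few values of Δ.

```r
# Acceptance limits of a multiplicative model for a relative difference delta.
# `be_limits` is a hypothetical helper, not a function of PowerTOST.
be_limits <- function(delta) {
  lower <- 1 - delta   # lower limit on the ratio scale
  upper <- 1 / lower   # upper limit is the reciprocal of the lower one
  data.frame(delta     = delta,
             lower     = lower,
             upper     = upper,
             log.lower = log(lower),   # limits on the log scale,
             log.upper = log(upper))   # symmetrical around zero
}

# 20% (conventional), 10% (NTIDs), 25% and 30% (HVDs/HVDPs in some regulations)
lims <- be_limits(c(0.20, 0.10, 0.25, 0.30))
print(round(lims, 4))
```

The symmetry around zero holds for any Δ because ln(1/(1–Δ)) = –ln(1–Δ).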
❝ Could you demonstrate how this is taken into account in calculating 1 or 2 different scenarios? Let's pretend I'm comparing (i) treatment a vs placebo , (ii) treatment b vs placebo for either inferiority/superiority
Let’s see:
Treatment      n   Mean   SD     s²      CV
---------------------------------------------
A             31   2.52   1.36   1.85    54.0%
B             30   2.13   1.30   1.69    61.0%
C (placebo)   16   0.69   1.01   1.02   146.4%

Pooled standard deviations are calculated according to: s₀ = √{[(n₁–1)·s₁² + (n₂–1)·s₂²] / (n₁+n₂–2)}; we get:
Comparison   s₀     T–R    CV
--------------------------------
A vs. C      1.25   1.83    68.5%
B vs. C      1.21   1.44    84.0%
A vs. B      1.33   0.39   341.2%

In clinical trials commonly a 95% CI (as opposed to the 90% in BE) is calculated: CI = T–R ± tα,n₁+n₂–2 · s₀ · √[(n₁+n₂)/(n₁·n₂)]; we get:
Comparison   Δ      tα,n₁+n₂–2   95% CI
------------------------------------------
A vs. C      1.83   2.0141        1.05 – 2.61
B vs. C      1.44   2.0154        0.69 – 2.19
A vs. B      0.39   2.0010       -0.29 – 1.07

Looking at a clinically relevant difference of +20% to placebo (0.83 = 0.69 × 1.2), treatment A ‘works’ (lower CL > 0.83) and B does not (lower CL < 0.83). Now for the tricky part: We must not assume equal variances, and according to FDA’s guidance should not even test for it. The t-test is fairly robust against deviations from normality but rather sensitive to imbalance, which we have here. Therefore we should apply Satterthwaite’s approximation of the degrees of freedom and use Welch’s t-test instead (note: that’s the default in R). We get:
Comparison   Δ      df      tα,df    95% CI
----------------------------------------------
A vs. C      1.83   39.09   2.0225    1.05 – 2.61
B vs. C      1.44   37.91   2.0246    0.68 – 2.20
A vs. B      0.39   58.99   2.0010   -0.29 – 1.07

No big deal with this dataset…
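The intervals above can be recomputed from the summary statistics alone. The following is a minimal sketch in base R, not the code used for the tables; `ci_diff` is a hypothetical helper. With `pooled = TRUE` it uses s₀ and n₁+n₂–2 degrees of freedom; with `pooled = FALSE` it uses the unpooled standard error √(s₁²/n₁ + s₂²/n₂) together with Satterthwaite’s degrees of freedom, which is what `t.test()` does by default. If the tabulated Welch limits were obtained with the pooled s₀, the unpooled version will come out slightly narrower for the comparisons with placebo.

```r
# Summary statistics as given in the first table
trt <- data.frame(n    = c(31, 30, 16),
                  mean = c(2.52, 2.13, 0.69),
                  sd   = c(1.36, 1.30, 1.01),
                  row.names = c("A", "B", "C"))

# Hypothetical helper: CI of the difference of two means (test - ref)
ci_diff <- function(test, ref, pooled = TRUE, level = 0.95) {
  n1 <- trt[test, "n"];    n2 <- trt[ref, "n"]
  v1 <- trt[test, "sd"]^2; v2 <- trt[ref, "sd"]^2
  d  <- trt[test, "mean"] - trt[ref, "mean"]
  if (pooled) {
    # classical t-test: pooled SD, n1 + n2 - 2 degrees of freedom
    s0 <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    se <- s0 * sqrt((n1 + n2) / (n1 * n2))
    df <- n1 + n2 - 2
  } else {
    # Welch's t-test: unpooled SE, Satterthwaite's degrees of freedom
    se <- sqrt(v1 / n1 + v2 / n2)
    df <- (v1 / n1 + v2 / n2)^2 /
          ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  }
  tq <- qt(1 - (1 - level) / 2, df)   # two-sided critical value
  c(diff = d, df = df, t = tq, lower = d - tq * se, upper = d + tq * se)
}

comps  <- list(c("A", "C"), c("B", "C"), c("A", "B"))
pooled <- t(sapply(comps, function(x) ci_diff(x[1], x[2], pooled = TRUE)))
welch  <- t(sapply(comps, function(x) ci_diff(x[1], x[2], pooled = FALSE)))
rownames(pooled) <- rownames(welch) <- c("A vs. C", "B vs. C", "A vs. B")
print(round(pooled, 4))
print(round(welch, 4))
```

With the raw data at hand one would simply call `t.test(x, y)` instead; the helper is only needed because nothing but n, mean, and SD is available here.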
Time to fire up R:
library(PowerTOST)
# power of the equivalence test for the ratio of means (untransformed data), A vs. B
power.RatioF(alpha = 0.025, theta1 = 0.8, theta0 = 2.52/2.13,
             CV = 3.412, n = 31+30, design = "parallel")
[1] 4.896124e-05

Not surprisingly, the power of the equivalence test A–B is terribly low (small difference, high CV). No idea how to tweak PowerTOST for the superiority cases. Maybe later on (can’t promise). I have a deadline approaching at 24:00 CET.
Tried A > C:

power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = Inf,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")

… and got:

Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr, : mean contains NA

According to help(pmvt):

“Note that both -Inf and +Inf may be specified in the lower and upper integral limits in order to compute one-sided probabilities.”

PowerTOST goes berserk in .power.RatioF attempting to calculate the correlation matrix, which ends up in Inf/Inf and NaN. Detlew?
Too stupid to set up a workaround. Using the sledgehammer approach (starting with the upper limit set to theta0):
power.RatioF(alpha = 0.025, theta1 = 1.2, theta2 = 2.52/0.69,
             theta0 = 2.52/0.69, CV = 0.685, n = 31+16,
             design = "parallel")
[1] 0.025

Good. That’s α. Now let’s increase theta2, the upper limit (since Inf doesn’t work):

theta2   power
 4       0.08441706
 5       0.4485689
 6       0.7621197
 8       0.9619132
10       0.9920993

Maybe you can test your diplomatic skills (something I completely lack) and answer with something like:
Treatment A showed superiority to placebo (defined as a 20% increase): +1.83 (95% CI +1.05 ~ +2.61) since the lower CL>0.83. Treatment B showed +1.44 (+0.68 ~ +2.20). However, equivalence between A and B could not be excluded (+0.39, -0.29 ~ +1.07) since the CI includes zero.
Now for the meaningless part:
Power to show equivalence between A and B was only 4.9·10⁻⁵; XYZ (include the numbers from sampleN.RatioF here) subjects would be needed in a future study. Assuming a clinically relevant difference of 20%, power to show superiority of both treatments to placebo was very high.
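As a rough cross-check of the superiority power that avoids the `Inf` problem altogether, here is a minimal sketch in base R. Note that this is a different technique from `power.RatioF()`: a one-sided two-sample t-test on the difference of means with pooled variance, evaluated via the noncentral t-distribution. The helper `power_sup` is hypothetical. Two assumptions are made here and are not from the original analysis: the 20% margin is taken as 0.2 × the placebo mean on the difference scale (the counterpart of theta1 = 1.2 on the ratio scale), and the observed differences and pooled SDs are treated as the true values.

```r
# Power of a one-sided two-sample t-test for superiority by a margin.
# `power_sup` is a hypothetical helper; it is not part of PowerTOST.
power_sup <- function(delta, margin, sd, n1, n2, alpha = 0.025) {
  df    <- n1 + n2 - 2                  # pooled-variance degrees of freedom
  se    <- sd * sqrt(1 / n1 + 1 / n2)   # standard error of the difference
  ncp   <- (delta - margin) / se        # non-centrality parameter
  tcrit <- qt(1 - alpha, df)            # one-sided critical value
  pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Assumption: margin = 20% of the placebo mean, on the difference scale
margin <- 0.2 * 0.69

pw <- c("A vs. C" = power_sup(1.83, margin, sd = 1.25, n1 = 31, n2 = 16),
        "B vs. C" = power_sup(1.44, margin, sd = 1.21, n1 = 30, n2 = 16))
print(round(pw, 4))

# Sanity check: a true difference exactly at the margin gives power = alpha
print(power_sup(margin, margin, sd = 1.25, n1 = 31, n2 = 16))
```

The sanity check mirrors the sledgehammer result above, where setting the limit equal to theta0 returned α.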
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
Complete thread:
- Sample Size and CV for Replicate Design Jamesmartinn 2011-11-26 20:22
- Sample Size and CV for Replicate Design ElMaestro 2011-11-26 22:38
- Sample Size and CV for Replicate Design Jamesmartinn 2011-11-26 23:52
- Sample Size and CV for Replicate Design ElMaestro 2011-11-27 00:43
- Sample Size and CV for Replicate Design Jamesmartinn 2011-11-27 00:50
- Sample Size and CV for Replicate Design ElMaestro 2011-11-27 01:24
- Sample Size and CV for Replicate Design Jamesmartinn 2011-11-27 01:50
- Strange questions… Helmut 2011-11-27 04:47
- Strange questions… Jamesmartinn 2011-11-27 14:43
- Misusing PowerTOST for superiority? Helmut 2011-11-27 17:46
- Strange questions… ElMaestro 2011-11-27 15:14
- Strange questions… Helmut 2011-11-27 18:07
