Bioequivalence and Bioavailability Forum • “Ideal” sample size of a pilot study?

Helmut

Vienna, Austria,
2013-10-27 15:15

Posting: # 11780

“Ideal” sample size of a pilot study? [Power / Sample Size]

Dear all,

in my workshops I try to argue against small pilot studies since – if the uncertainty of the estimated CV is taken into account – the pivotal studies will be larger (see also this thread). A question raised regularly is:

“We have to pay for both studies. So which is the ‘ideal’ sample size of the pilot?”

Unfortunately, it is not that easy. If we define the “ideal” pilot study as the one resulting in the minimum total (pilot+pivotal) sample size, we need an educated guess about the CV. Only then can we play with this code:

############################################
# ‘Ideal’ sample sizes of pilot studies    #
# based on an educated guess about the CV. #
############################################
library(PowerTOST)
n.min   <- 12   # minimum pilot sample size
n.max   <- 48   # maximum pilot sample size
CV.min  <- 0.1  # minimum expected CV
CV.max  <- 0.3  # maximum expected CV
target  <- 0.8  # target power
GMR     <- 0.95 # expected GMR
alpha   <- 0.05 # alpha of the one-sided upper CL of the CV (default 0.05)
n.pilot <- seq(n.min, n.max, 2)
df      <- n.pilot - 2           # residual degrees of freedom of a 2×2×2 pilot
CV      <- seq(CV.max, CV.min, -0.05)
col     <- colorRampPalette(c("red", "blue"))(length(CV))
for (j in seq_along(CV)) {
  n.pivot <- NULL; n.total <- NULL
  for (k in seq_along(n.pilot)) {
    # upper CL of the CV which might be observed in a pilot with df[k]
    CV.up <- CVCL(CV=CV[j], df=df[k], side="upper", alpha=alpha)[[2]]
    # pivotal sample size based on this (pessimistic) CV
    n <- sampleN.TOST(CV=CV.up, targetpower=target, theta0=GMR,
                      print=FALSE, details=FALSE)[["Sample size"]]
    if (n < 12) n <- 12 # minimum pivotal sample size acc. to GLs
    n.pivot <- c(n.pivot, n)
    n.total <- c(n.total, n.pilot[k] + n.pivot[k])
  }
  if (j == 1) { # largest CV first, so its totals define the y-axis range
    plot(n.pilot, n.total, type="b", pch=16, cex=1.2, xlim=c(n.min, n.max),
         ylim=c(n.min, max(n.total)), xlab="pilot sample size", col=col[j],
         ylab="total (pilot+pivotal) sample size", lwd=2, axes=FALSE,
         frame.plot=TRUE, cex.main=1,
         main="Guessing ‘ideal’ pilot sample sizes (ABE, design 2×2×2)")
    points(n.pilot, n.pivot, type="b", pch=1, cex=1, lwd=1, col=col[j])
    axis(1, at=seq(n.min, n.max, by=6))
    axis(1, at=seq(n.min+2, n.max-2, by=2), labels=FALSE, tcl=-0.25)
    axis(2, at=seq(n.min, max(n.total), by=12), las=1)
    axis(2, at=seq(n.min+2, max(n.total)-2, by=2), labels=FALSE, tcl=-0.25)
  } else {
    points(n.pilot, n.total, type="b", pch=16, cex=1.2, lwd=2, col=col[j])
    points(n.pilot, n.pivot, type="b", pch=1, cex=1, lwd=1, col=col[j])
  }
  cat(sprintf("%.0f%%:", CV[j]*100), n.total, "\n")
}
legend("bottomright", title=expression(CV[intra]), bg="white", lwd=2,
       legend=sprintf("%.0f%%", CV*100), pch=16, pt.cex=1.2, col=col,
       seg.len=1.5)

Full circles = total sample sizes, empty circles = pivotal sample sizes.

┌──────────────────────────────────────────────────────────────────┐
│                          pilot sample size                       │
├──────────────────────────────────────────────────────────────────┤
│     12  14 16 18 20 22 24 26 28 30 32 34 36 38 40  42  44  46  48│
├───┬──────────────────────────────────────────────────────────────┤
│CV%│             total (pilot+pivotal) sample size                │
├───┼──────────────────────────────────────────────────────────────┤
│ 30│108 102 98 94 94 92 92 92 92 94 94 96 96 98 98 100 102 104 104│
│ 25│ 80  76 74 72 72 72 72 74 74 76 76 78 80 80 82  84  86  86  88│
│ 20│ 58  56 54 54 54 56 56 58 58 60 62 64 64 66 68  70  72  74  76│
│ 15│ 38  38 38 40 40 42 44 46 46 48 50 52 54 56 58  58  60  62  64│
│ 10│ 26  26 28 30 32 34 36 38 40 42 44 46 48 50 52  54  56  58  60│
└───┴──────────────────────────────────────────────────────────────┘

Clearly, for every CV there is a minimum; in other words, beyond this value the ‘better’ upper CL of the estimated CV does not pay off any more (the pivotal sample sizes decrease less than the pilot sample sizes increase). For an expected CV of 20% the ideal pilot sample size is 16–20 subjects (pivotal 34–38, based on upper CLs of the CV of 27.9–29.5%).
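The CL figures in the last sentence can be checked without PowerTOST. A minimal base-R sketch, assuming that `CVCL(..., side="upper")` uses the usual χ² interval for the log-scale residual variance; `cv_upper()` is a made-up helper for illustration, not a PowerTOST function.

```r
# Hypothetical helper (not part of PowerTOST): one-sided upper CL of a CV
# estimated with 'df' degrees of freedom, via the chi-squared interval of
# the log-scale variance s^2 = log(CV^2 + 1).
cv_upper <- function(CV, df, alpha = 0.05) {
  s2    <- log(CV^2 + 1)                # variance on the log scale
  s2.up <- df * s2 / qchisq(alpha, df)  # upper CL of the variance
  sqrt(exp(s2.up) - 1)                  # back-transform to a CV
}

# Expected CV 20% in 2x2x2 pilots of 16 and 20 subjects (df = n - 2)
round(100 * cv_upper(0.20, df = c(14, 18)), 1)  # 29.5 27.9
```

Since `qchisq(alpha, df)` grows with alpha, a larger α gives a smaller (less pessimistic) upper CL, which is where the smaller totals for α 0.1 come from.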

Now for the tricky part: in order to play with the minimum we need an educated guess of the CV. Let’s say we assume the CV to be 15–25%. 18 subjects are fine for 20–25%, although 16 would be enough for 15%. If the CV is only 10%, 12 would be enough; for 30%, 22 would be better.
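Reading the minima off the table can be scripted. A small sketch using only the totals copied from the first table (α 0.05); it lists, for each CV, the pilot sizes attaining the minimum, and the extra subjects a compromise of 18 would cost. Object names are illustrative.

```r
# Total (pilot + pivotal) sample sizes copied from the first table
n.pilot <- seq(12, 48, 2)
n.total <- rbind(
  "30" = c(108, 102, 98, 94, 94, 92, 92, 92, 92, 94, 94, 96, 96, 98, 98, 100, 102, 104, 104),
  "25" = c( 80,  76, 74, 72, 72, 72, 72, 74, 74, 76, 76, 78, 80, 80, 82,  84,  86,  86,  88),
  "20" = c( 58,  56, 54, 54, 54, 56, 56, 58, 58, 60, 62, 64, 64, 66, 68,  70,  72,  74,  76),
  "15" = c( 38,  38, 38, 40, 40, 42, 44, 46, 46, 48, 50, 52, 54, 56, 58,  58,  60,  62,  64),
  "10" = c( 26,  26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,  54,  56,  58,  60))

# For each CV: all pilot sizes attaining the minimum total
ideal <- apply(n.total, 1, function(x) n.pilot[x == min(x)])
print(ideal)

# Extra subjects (vs. the minimum) of a single compromise n.pilot = 18
n.total[, n.pilot == 18] - apply(n.total, 1, min)
```

It reproduces the reasoning above: 18 costs nothing for 20–25%, two extra subjects for 15% and 30%, and four for 10%.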

If you are more adventurous, you can increase the α-level of the upper CL of the CV (i.e., accept a less pessimistic CV). For 0.1 instead of the default 0.05 we get:

┌─────────────────────────────────────────────────────────────┐
│                        pilot sample size                    │
├─────────────────────────────────────────────────────────────┤
│    12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46  48│
├───┬─────────────────────────────────────────────────────────┤
│CV%│           total (pilot+pivotal) sample size             │
├───┼─────────────────────────────────────────────────────────┤
│ 30│90 86 84 84 84 84 84 86 86 88 88 90 90 92 94 96 96 98 100│
│ 25│68 66 66 66 66 66 68 68 70 70 72 74 76 76 78 80 82 84  86│
│ 20│50 48 48 50 50 52 52 54 56 58 58 60 62 64 66 68 70 72  74│
│ 15│34 34 36 36 38 40 42 44 44 46 48 50 52 54 56 58 60 62  64│
│ 10│24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58  60│
└───┴─────────────────────────────────────────────────────────┘

Would a two-stage design help? Maybe. But even then an educated guess of the CV would help in designing the study so that we already have a reasonable chance of passing in the first stage.

If you have any hints that the formulation might be highly variable, I would opt for a fully replicated design (TRT|RTR or TRTR|RTRT). For some thoughts see this post about applying the lower CL of the CV.

—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro

Denmark,
2013-10-27 17:47

@ Helmut
Posting: # 11783

“Ideal” sample size of a pilot study?

Hi Helmut,

❝ “We have to pay for both studies. So which is the ‘ideal’ sample size of the pilot?”

this thread is extremely interesting!

❝ n <- as.numeric(sampleN.TOST(CV=as.numeric(CVCL(CV=CV[j], df=df[k], side="upper",

❝ alpha=alpha)[2]), targetpower=target, theta0=GMR, print=F, details=F)[7])

I might indeed read your code wrongly, but if I get it right the line converts the expected CV to a kind of "almost worst-case"-CV within reason as defined by alpha. I can see why, but it makes the resulting curve somewhat pessimistic. "On average" (pardon!) the total sample sizes will tend to be lower. On the other hand we also argued a few times here that observed GMR from a pilot must be taken into account. It gets complicated...

How about something along these lines:

1. Define a true CV, a true GMR, n.pilot and a futility criterion ("We will not conduct the pivotal if obsGMR is more than 10% different etc. or if the total sample size is so-and-so.").
2. Simulate a pilot trial with the true CV and GMR.
3. Extract obsCV and obsGMR, calculate the pivotal sample size n.pivot.
4. If the futility criterion is met, bummer, go back to 2.
5. Simulate a pivotal with the true CV and true GMR.
6. Find out if it is bioequivalent, record the stats.
7. Repeat from step 2 many times.
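A rough sketch of steps 1–7 in base R. Everything beyond the steps themselves is my assumption: balanced 2×2×2 studies with normally distributed log-data and no period effects (so each study can be summarised by its point estimate and residual variance), the shifted central-t approximation of TOST power in place of the exact method used by PowerTOST, a futility rule "observed GMR outside 90.00–111.11% or pivotal n above 150", and futility stops recorded as failures rather than re-drawn. All function names are hypothetical.

```r
# Approximate TOST power of a 2x2x2 crossover (shifted central t)
power_approx <- function(CV, n, GMR, alpha = 0.05) {
  se   <- sqrt(2 * log(CV^2 + 1) / n)   # SE of the log-scale point estimate
  df   <- n - 2
  tcrt <- qt(1 - alpha, df)
  p    <- pt((log(1.25) - log(GMR)) / se - tcrt, df) -
          pt((log(0.80) - log(GMR)) / se + tcrt, df)
  max(0, p)
}

# Smallest even n >= 12 reaching the target power (gives up at n.max)
n_approx <- function(CV, GMR, target = 0.8, n.min = 12, n.max = 1000) {
  n <- n.min
  while (n < n.max && power_approx(CV, n, GMR) < target) n <- n + 2
  n
}

sim_pilot_pivotal <- function(CV, GMR, n.pilot, nsims = 1e4,
                              fut.GMR = c(0.90, 1 / 0.90),
                              n.pivot.max = 150, seed = 123) {
  set.seed(seed)
  s2  <- log(CV^2 + 1)                  # step 1: true log-scale variance
  df1 <- n.pilot - 2
  stopped <- BE <- logical(nsims)
  n.total <- numeric(nsims)
  for (i in seq_len(nsims)) {
    # step 2: pilot, summarised by point estimate and residual variance
    pe1  <- rnorm(1, log(GMR), sqrt(2 * s2 / n.pilot))
    s2.1 <- s2 * rchisq(1, df1) / df1
    # step 3: observed GMR and CV, pivotal sample size
    GMR.obs <- exp(pe1)
    CV.obs  <- sqrt(exp(s2.1) - 1)
    # step 4: futility (recorded as a stop, then the next pilot is drawn)
    if (GMR.obs < fut.GMR[1] || GMR.obs > fut.GMR[2]) {
      stopped[i] <- TRUE; n.total[i] <- n.pilot; next
    }
    n2 <- n_approx(CV.obs, GMR.obs, n.max = n.pivot.max + 2)
    if (n2 > n.pivot.max) {
      stopped[i] <- TRUE; n.total[i] <- n.pilot; next
    }
    # step 5: pivotal with the true CV and GMR
    df2  <- n2 - 2
    pe2  <- rnorm(1, log(GMR), sqrt(2 * s2 / n2))
    s2.2 <- s2 * rchisq(1, df2) / df2
    hw   <- qt(0.95, df2) * sqrt(2 * s2.2 / n2)  # half-width of the 90% CI
    # step 6: BE if the 90% CI lies within 80.00-125.00%
    BE[i] <- pe2 - hw >= log(0.80) && pe2 + hw <= log(1.25)
    n.total[i] <- n.pilot + n2
  }
  # step 7: summarise the replicates
  c(p.stopped = mean(stopped), p.BE = mean(BE), mean.n.total = mean(n.total))
}

sim_pilot_pivotal(CV = 0.20, GMR = 0.95, n.pilot = 18)
```

`p.BE` is the overall probability of ending with a passed pivotal (stops count as failures). Repeating the call over a range of `n.pilot` for a given true CV and GMR shows the trade-off using the observed CV rather than its upper CL.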

Playing around with alpha might be relevant too.

—
Pass or fail!
ElMaestro

Helmut

Vienna, Austria,
2013-10-27 19:27

@ ElMaestro
Posting: # 11784

“Ideal” sample size for TSDs’ n1?

Hi ElMaestro,

❝ this thread is extremely interesting!

Nice to read!

❝ […] the line converts the expected CV to a kind of "almost worst-case"-CV within reason as defined by alpha. I can see why, but it makes the resulting curve somewhat pessimistic. "On average" (pardon!) the total sample sizes will tend to be lower.

Correct. That’s why I added the second (somewhat more optimistic) table. Of course everybody is free to choose another α. Try it: Any value >0.1 likely will tell you that the ideal pilot should have just 12 subjects. What about the reliability of the GMR? Which leads to…

❝ […] we also argued a few times here that observed GMR from a pilot must be taken into account. It gets complicated...

Indeed. Actually, the expected GMR lies somewhere within the CI – which might be terribly wide if estimated from a small pilot. I’m not aware of anybody having played around with the CI (maybe a larger α would make sense).
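To put a number on "terribly wide": a sketch of the 90% CI of the GMR from a balanced 2×2×2 pilot, assuming an observed GMR of 95% and an observed CV of 20% (values chosen only for illustration). `gmr_ci()` is a hypothetical helper.

```r
# Hypothetical helper: (1 - 2*alpha) CI of the GMR from a balanced 2x2x2
# study with total sample size n, observed GMR and observed CV
gmr_ci <- function(GMR, CV, n, alpha = 0.05) {
  se <- sqrt(2 * log(CV^2 + 1) / n)  # SE of the log-scale difference
  hw <- qt(1 - alpha, n - 2) * se    # half-width of the CI on the log scale
  exp(log(GMR) + c(lower = -hw, upper = hw))
}

# Observed GMR 95% and CV 20% in pilots of 12 and 24 subjects
round(100 * gmr_ci(0.95, 0.20, n = 12), 2)
round(100 * gmr_ci(0.95, 0.20, n = 24), 2)
```

With 12 subjects the interval spans roughly 82% to 110%, with 24 subjects roughly 86% to 105%.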

BTW, here are the tables for Potvin’s Methods B/C (average total sample sizes from 10⁶ simulations):

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      Method B: stage 1 sample size                               │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│    12   14   16   18   20   22   24   26   28   30   32   34   36   38   40   42   44   46   48  │
├───┬──────────────────────────────────────────────────────────────────────────────────────────────┤
│CV%│                                    average total sample size                                 │
├───┼──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 30│46.4 45.7 44.7 43.5 42.1 40.8 39.8 39.2 38.9 38.9 39.3 39.9 40.7 41.7 42.9 44.2 45.7 47.2 48.9│
│ 25│32.4 31.3 30.1 29.2 28.6 28.6 29.0 29.7 30.7 31.9 33.3 34.9 36.6 38.3 40.2 42.1 44.1 46.0 48.0│
│ 20│20.6 20.0 20.0 20.6 21.7 23.0 24.6 26.3 28.2 30.1 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 15│13.5 14.7 16.3 18.1 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 10│12.0 14.0 16.0 18.0 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
└───┴──────────────────────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      Method C: stage 1 sample size                               │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│    12   14   16   18   20   22   24   26   28   30   32   34   36   38   40   42   44   46   48  │
├───┬──────────────────────────────────────────────────────────────────────────────────────────────┤
│CV%│                                    average total sample size                                 │
├───┼──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 30│46.6 45.8 44.9 43.6 42.2 40.9 39.9 39.2 38.9 38.9 39.2 39.7 40.4 41.4 42.5 43.8 45.3 46.9 48.6│
│ 25│32.5 31.4 30.2 29.2 28.6 28.5 28.9 29.5 30.5 31.7 33.1 34.7 36.4 38.2 40.1 42.0 44.0 46.0 48.0│
│ 20│20.6 20.0 20.0 20.5 21.5 22.9 24.5 26.2 28.1 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 15│13.4 14.6 16.2 18.1 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 10│12.0 14.0 16.0 18.0 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
└───┴──────────────────────────────────────────────────────────────────────────────────────────────┘

We see a similar pattern of “ideal” stage 1 sample sizes, which again relies on a best guess of the CV. But there’s a major difference to the pilot/pivotal combo: the sponsor’s primary interest might not be to pass with the minimum total sample size but to pass already in the first stage. So for CV 20% the best choice of n₁ should likely not be based on the smallest average total sample size of 20 (with n₁ 14 the chance to proceed to the second stage is 44%); we would rather opt for n₁ 24 (only 8% in stage 2), which is also the sample size of a fixed sample design with α 0.0294.

❝ How bout something along these lines: [1–7]

Have mercy and show more respect for my peanut-sized brain! Have to think it over. As you know, futility rules might kick power in the arse.

ElMaestro

Denmark,
2013-10-27 20:09

@ Helmut
Posting: # 11785

“Ideal” sample size for TSDs’ n1?

Hi Hötzi,

❝ Have mercy and show more respect for my peanut-sized brain! Have to think it over. As you know, futility rules might kick power in the arse.

Yes, I heard about that...

Anyways, if I do the coding, will you run the sims, and draft an ms with us both as co-authors?

Helmut

Vienna, Austria,
2013-10-27 20:19

@ ElMaestro
Posting: # 11786

“Ideal” sample size for TSDs’ n1?

Hi ElMaestro,

❝ Yes, I heard about that...

❝ Anyways, if I do the coding, will you run the sims,…

Positive maybe. My main machine is nine years old and not the speediest around (Detlew’s is four times faster).

❝ …and draft an ms with us both as co-authors?

Yeah, why not? I luv sim’s.

Helmut

Vienna, Austria,
2013-10-28 14:29

@ ElMaestro
Posting: # 11793

Alpha?

Hi ElMaestro,

❝ Playing around with alpha might be relevant too.

Hhm, why do you think so? IMHO, whatever we do here might only affect power. The pivotal study will stand on its own and doesn’t need any α-adjustment.

ElMaestro

Denmark,
2013-10-28 15:23

@ Helmut
Posting: # 11794

Alpha?

Hi Hötzi,

❝ Hhm, why do you think so? IMHO, whatever we do here might only affect power. The pivotal study will stand on its own and doesn’t need any α-adjustment.

We're on the same page. Pilots and pivotals are traditionally stand-alone events. It gets a little tricky if applicants specify that a successful pilot must be accepted as a pivotal, which happens and is OK with some authorities. Success at the pilot stage is in itself rare though.
In such a situation alpha speculation may be relevant due to type I errors. I am fairly sure a reviewer would raise the point if a paper is submitted. Especially if it ends up in hands of people who have reviewed Potvin's paper and the sequelae. :-D

Helmut

Vienna, Austria,
2013-10-28 16:01

@ ElMaestro
Posting: # 11796

Alpha?

Hi ElMaestro,

❝ It gets a little tricky if applicants specify that a successful pilot must be accepted as a pivotal, which happens and is ok with some authorities.

Haven’t thought about that! Definitely OK for the FDA (even mentioned in the guidance). It was OK in some European countries (esp. Scandinavia, Germany, Austria). Assessors of a certain southwestern peninsula essentially said:

“Thanks for showing us the GMR and CV. Nice to see such a narrow CI, although that wasn’t the purpose of the study. Now please go ahead and perform the pivotal study based on an appropriate sample size estimation.”

No idea about the current practice.*

❝ Success at the pilot stage is in itself rare though.

Not that rare. For 0.943 ≤ T/R ≤ 1.06 and CV 15% a study with n=12 likely will pass (power 80.2%).
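The 80.2% can be checked without PowerTOST by integrating the exact TOST power over the χ² distribution of the residual variance. A sketch for a balanced 2×2×2 design; `power_exact()` is a hypothetical helper and, as far as I know, should agree with PowerTOST's exact method only up to numerical integration error.

```r
# Hypothetical helper: exact TOST power of a balanced 2x2x2 crossover.
# Conditional on the residual SD s, the point estimate (PE) is normal and
# must fall in [log(theta1) + t*k*s, log(theta2) - t*k*s]; this probability
# is then integrated over the chi-squared distribution of df * s^2 / sigma^2.
power_exact <- function(CV, n, GMR, alpha = 0.05,
                        theta1 = 0.80, theta2 = 1.25) {
  sigma <- sqrt(log(CV^2 + 1))            # true log-scale SD
  df    <- n - 2
  k     <- sqrt(2 / n)                    # SE of the PE = k * SD
  tcrit <- qt(1 - alpha, df)
  delta <- log(GMR)
  integrand <- function(x) {              # x ~ chi-squared(df)
    s  <- sigma * sqrt(x / df)            # residual SD given x
    lo <- log(theta1) + tcrit * k * s     # PE must exceed this ...
    hi <- log(theta2) - tcrit * k * s     # ... and stay below this
    p  <- pnorm((hi - delta) / (sigma * k)) -
          pnorm((lo - delta) / (sigma * k))
    pmax(p, 0) * dchisq(x, df)            # zero if the CI is too wide
  }
  integrate(integrand, 0, Inf)$value
}

# CV 15%, n = 12, T/R at both ends of the range given above
round(100 * sapply(c(0.943, 1.06), function(g) power_exact(0.15, 12, g)), 1)
```

Because the acceptance range is symmetric on the log scale, power at a ratio and at its reciprocal is identical, and 1/0.943 ≈ 1.06.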

❝ In such a situation alpha speculation may be relevant due to type I errors.

Still I don’t get it. If a pilot “passes” BE I think it should be possible to state in the protocol the intention to submit the study as pivotal evidence. Would make sense if the CV is already expected to be low and/or the GMR close to 1. No adjustment necessary, IMHO. Whether peninsulists would accept that is another story.

❝ I am fairly sure a reviewer would raise the point if a paper is submitted. Especially if it ends up in hands of people who have reviewed Potvin's paper and the sequelae. :-D

If you have Method C in mind, yes.

It might well be that the sample size estimation based on the pilot’s CV suggests running the pivotal study with fewer subjects than the pilot.

Helmut

Vienna, Austria,
2013-10-31 00:49

@ ElMaestro
Posting: # 11827

12 = bad

Hi ElMaestro,

I found a goody from the 2008 AAPS Annual Meeting:

Gagné J-F, Shink É, Trabelsi F, Tanguay M.
Evaluation of the reliability associated with pilot bioequivalence studies.
Poster W5317, online abstract

Quote:

Simulations showed that given a 85% theoretical PE (bad formulation) and a sample-size of 12 subjects, the probability of detecting a bad formulation (observed PE outside 95%-105%) is ≥85% for all ISCVs tested.
However, given a theoretical ratio of 100% (good formulation), a sample-size of 12 subjects, and an ISCV of 20%, the probability of detecting a good formulation (observed PE within 95%-105%) is 48% only. This probability goes down to ≤30% for highly variable drugs.
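For the "good formulation" case there is a closed form, assuming normally distributed log-data, a balanced 2×2×2 design without period effects, and the 95%–105% range applied to the ratio. It gives about 46% at CV 20% and n = 12; the poster's 48% comes from simulations whose set-up isn't given in the quote, so a small difference is not surprising. The same formula gives about 92% for detecting the 85% formulation at CV 20%, in line with "≥85%". `p_pe_within()` is a hypothetical helper.

```r
# Probability that the observed point estimate (PE) of a balanced 2x2x2
# study falls within [lower, upper], assuming PE ~ N(log(ratio), 2*s2/n)
# with s2 = log(CV^2 + 1)
p_pe_within <- function(ratio, CV, n, lower = 0.95, upper = 1.05) {
  sd.pe <- sqrt(2 * log(CV^2 + 1) / n)
  pnorm((log(upper) - log(ratio)) / sd.pe) -
    pnorm((log(lower) - log(ratio)) / sd.pe)
}

# 'Good' formulation: true ratio 100%, n = 12, ISCV 20%
p_pe_within(1.00, 0.20, 12)
# 'Bad' formulation: true ratio 85%; probability that the PE falls outside
1 - p_pe_within(0.85, 0.20, 12)
```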
