Helmut
Vienna, Austria, 2013-10-27 15:15
Posting: # 11780

 “Ideal” sample size of a pilot study? [Power / Sample Size]

Dear all,

in my workshops I try to argue against small pilot studies since – if the uncertainty of the estimated CV is taken into account – the pivotal studies will be larger (see also this thread). Regularly a question is raised:

“We have to pay for both studies. So which is the ‘ideal’ sample size of the pilot?”


Unfortunately it is not that easy. If we define the “ideal” pilot study as the one which results in the minimum total (pilot+pivotal) sample size, we need to have an educated guess about the CV. Only then can we play with this code:

############################################
# ‘Ideal’ sample sizes of pilot studies    #
# based on an educated guess about the CV. #
############################################
library(PowerTOST)
n.min   <- 12   # minimum pilot sample size
n.max   <- 48   # maximum pilot sample size
CV.min  <- 0.1  # minimum expected CV
CV.max  <- 0.3  # maximum expected CV
target  <- 0.8  # target power
GMR     <- 0.95 # expected GMR
alpha   <- 0.05 # one-sided alpha of the upper CL of the CV (not of TOST)
n.pilot <- seq(n.min, n.max, 2)
df      <- n.pilot-2 # degrees of freedom of the pilot (2×2×2 design)
CV      <- seq(CV.max, CV.min, -0.05) # largest CV first: it sets the y-axis
col     <- colorRampPalette(c("red", "blue"))(length(CV))
for(j in seq_along(CV)) {
  n.pivot <- numeric(length(n.pilot))
  for(k in seq_along(n.pilot)) {
    # upper confidence limit of the CV expected from a pilot with df[k]
    CV.up <- as.numeric(CVCL(CV=CV[j], df=df[k], side="upper", alpha=alpha)[2])
    # pivotal sample size planned with this (pessimistic) CV
    n <- as.numeric(sampleN.TOST(CV=CV.up, targetpower=target, theta0=GMR,
           print=FALSE, details=FALSE)[7])
    n.pivot[k] <- max(n, 12) # minimum pivotal sample size acc. to GLs
  }
  n.total <- n.pilot+n.pivot
  if(j == 1) { # first curve: set up the plot and its axes
    plot(n.pilot, n.total, type="b", pch=16, cex=1.2, xlim=c(n.min, n.max),
      ylim=c(n.min, max(n.total)), xlab="pilot sample size", col=col[j],
      ylab="total (pilot+pivotal) sample size", lwd=2, axes=FALSE, frame.plot=TRUE,
      main="Guessing ‘ideal’ pilot sample sizes (ABE, design 2×2×2)", cex.main=1)
    points(n.pilot, n.pivot, type="b", pch=1, cex=1, lwd=1, col=col[j])
    axis(1, at=seq(n.min, n.max, by=6))
    axis(1, at=seq(n.min+2, n.max-2, by=2), labels=FALSE, tcl=-0.25)
    axis(2, at=seq(n.min, max(n.total), by=12), las=1)
    axis(2, at=seq(n.min+2, max(n.total)-2, by=2), labels=FALSE, tcl=-0.25)
  } else {     # further curves are added to the existing plot
    points(n.pilot, n.total, type="b", pch=16, cex=1.2, lwd=2, col=col[j])
    points(n.pilot, n.pivot, type="b", pch=1, cex=1, lwd=1, col=col[j])
  }
  cat(sprintf("%.0f%%:", CV[j]*100), n.total, "\n")
}
legend("bottomright", title=expression(CV[intra]), bg="white", lwd=2,
  legend=sprintf("%.0f%%", CV*100), pch=16, pt.cex=1.2, col=col, seg.len=1.5)

[Figure: total (pilot+pivotal) and pivotal sample size vs. pilot sample size, one curve per CV]

Full circles = total sample sizes, empty circles = pivotal sample sizes.

┌──────────────────────────────────────────────────────────────────┐
│                          pilot sample size                       │
├──────────────────────────────────────────────────────────────────┤
│     12  14 16 18 20 22 24 26 28 30 32 34 36 38 40  42  44  46  48│
├───┬──────────────────────────────────────────────────────────────┤
│CV%│             total (pilot+pivotal) sample size                │
├───┼──────────────────────────────────────────────────────────────┤
│ 30│108 102 98 94 94 92 92 92 92 94 94 96 96 98 98 100 102 104 104│
│ 25│ 80  76 74 72 72 72 72 74 74 76 76 78 80 80 82  84  86  86  88│
│ 20│ 58  56 54 54 54 56 56 58 58 60 62 64 64 66 68  70  72  74  76│
│ 15│ 38  38 38 40 40 42 44 46 46 48 50 52 54 56 58  58  60  62  64│
│ 10│ 26  26 28 30 32 34 36 38 40 42 44 46 48 50 52  54  56  58  60│
└───┴──────────────────────────────────────────────────────────────┘

Clearly for every CV there is a minimum; in other words, beyond this value the ‘better’ upper CL of the estimated CV does not pay off any more (the pivotal sample sizes decrease less than the sample sizes of the pilot increase). For an expected CV of 20% the ideal pilot sample size is 16–20 subjects (pivotal 34–38, based on the upper CL of the CV of 27.9–29.5%).
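The upper confidence limits quoted above can be checked without PowerTOST. A minimal sketch, assuming the usual χ²-based limit for the variance on the log scale (which, as far as I understand, is what CVCL() is based on); the helper name cv.upper is made up for illustration:

```r
# Upper one-sided confidence limit of a CV estimated with df degrees of freedom.
# cv.upper() is a hypothetical helper, not part of PowerTOST.
cv.upper <- function(CV, df, alpha = 0.05) {
  s2    <- log(CV^2 + 1)               # variance on the log scale
  s2.up <- df * s2 / qchisq(alpha, df) # chi-squared based upper limit
  sqrt(exp(s2.up) - 1)                 # back-transformed to a CV
}
n.pilot <- c(16, 18, 20)               # pilot sample sizes (2x2x2 design)
CV.up   <- cv.upper(CV = 0.20, df = n.pilot - 2)
print(round(100 * CV.up, 1))           # upper CLs in percent
```

For 16 and 20 subjects this should land close to the 29.5% and 27.9% given above; feeding these limits into sampleN.TOST() is what drives the pivotal sample sizes in the table.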

Now for the tricky part: In order to play with the minimum we need an educated guess of the CV. Let’s say we assume the CV to be 15–25%. 18 subjects are fine for 20–25% – although 16 would be enough for 15%. If the CV is only 10%, 12 would be enough. For 30%, 22 would be better.
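To read the minima off the table without squinting, one could do something like the following. It only uses the totals from the table above (α 0.05); the object names are arbitrary:

```r
# Totals (pilot + pivotal) copied from the table above (alpha 0.05).
n.pilot <- seq(12, 48, 2)
total <- rbind(
  "30" = c(108, 102, 98, 94, 94, 92, 92, 92, 92, 94, 94, 96, 96, 98, 98, 100, 102, 104, 104),
  "25" = c( 80,  76, 74, 72, 72, 72, 72, 74, 74, 76, 76, 78, 80, 80, 82,  84,  86,  86,  88),
  "20" = c( 58,  56, 54, 54, 54, 56, 56, 58, 58, 60, 62, 64, 64, 66, 68,  70,  72,  74,  76),
  "15" = c( 38,  38, 38, 40, 40, 42, 44, 46, 46, 48, 50, 52, 54, 56, 58,  58,  60,  62,  64),
  "10" = c( 26,  26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,  54,  56,  58,  60)
)
# For each CV: all pilot sample sizes which give the smallest total
best <- sapply(rownames(total), function(cv) {
  n.pilot[total[cv, ] == min(total[cv, ])]
}, simplify = FALSE)
# Extra subjects (vs. the row minimum) if we pick a given pilot size
excess <- total - apply(total, 1, min)
colnames(excess) <- n.pilot
print(best)
print(excess[c("25", "20", "15"), as.character(c(14, 16, 18, 20))])
```

The excess matrix shows what a wrong guess costs: within an assumed CV of 15–25% a pilot of 16 to 20 subjects is never more than two subjects away from the respective minimum.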

If you are more adventurous you can increase the α-level. For 0.1 instead of the default 0.05 we get:
┌─────────────────────────────────────────────────────────────┐
│                        pilot sample size                    │
├─────────────────────────────────────────────────────────────┤
│    12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46  48│
├───┬─────────────────────────────────────────────────────────┤
│CV%│           total (pilot+pivotal) sample size             │
├───┼─────────────────────────────────────────────────────────┤
│ 30│90 86 84 84 84 84 84 86 86 88 88 90 90 92 94 96 96 98 100│
│ 25│68 66 66 66 66 66 68 68 70 70 72 74 76 76 78 80 82 84  86│
│ 20│50 48 48 50 50 52 52 54 56 58 58 60 62 64 66 68 70 72  74│
│ 15│34 34 36 36 38 40 42 44 44 46 48 50 52 54 56 58 60 62  64│
│ 10│24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58  60│
└───┴─────────────────────────────────────────────────────────┘


Would a two-stage design help? Maybe. But even then an educated guess of the CV would be helpful designing the study in such a way that we already have a reasonable chance of passing in the first stage.

If you have any hints that the formulation might be highly variable I would opt for a fully replicated design (TRT|RTR or TRTR|RTRT). For some thoughts see this post about applying the lower CL of the CV.

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
Denmark, 2013-10-27 17:47
@ Helmut
Posting: # 11783

 “Ideal” sample size of a pilot study?

Hi Helmut,

❝ “We have to pay for both studies. So which is the ‘ideal’ sample size of the pilot?”


this thread is extremely interesting!

❝ n <- as.numeric(sampleN.TOST(CV=as.numeric(CVCL(CV=CV[j], df=df[k], side="upper",
❝        alpha=alpha)[2]), targetpower=target, theta0=GMR, print=F, details=F)[7])


I might indeed read your code wrongly, but if I get it right the line converts the expected CV to a kind of "almost worst-case"-CV within reason as defined by alpha. I can see why, but it makes the resulting curve somewhat pessimistic. "On average" (pardon!) the total sample sizes will tend to be lower. On the other hand we also argued a few times here that observed GMR from a pilot must be taken into account. It gets complicated...

How about something along these lines:
  1. Define a true CV, a true GMR, n.pilot, and a futility criterion ("We will not conduct the pivotal if obsGMR is more than 10% different etc. or if total sample size is so-and-so.").
  2. Simulate a pilot trial with the true CV and GMR.
  3. Extract obsCV and obsGMR, calculate the pivotal sample size (n.pivot).
  4. If the futility criterion is met, bummer, go back to 2.
  5. Simulate a pivotal with true CV and true GMR.
  6. Find out if it is bioequivalent, record the stats.
  7. Repeat from pt. 2 many times.
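
The seven steps above could be sketched roughly like this in base R. Everything here is an assumption for illustration: a balanced 2×2×2 design, a noncentral-t approximation of power instead of PowerTOST's exact method, a futility rule of "obsGMR more than 10% off or pivotal above 500 subjects", and the helper names power.nct, n.pivotal, sim.study, and one.run.

```r
set.seed(123456)
# Approximate TOST power (noncentral t), 2x2x2 crossover, limits 0.80-1.25
power.nct <- function(CV, n, theta0, alpha = 0.05) {
  se  <- sqrt(log(CV^2 + 1)) * sqrt(2 / n)
  df  <- n - 2
  tc  <- qt(1 - alpha, df)
  pwr <- pt(-tc, df, ncp = (log(theta0) - log(1.25)) / se) -
         pt( tc, df, ncp = (log(theta0) - log(0.80)) / se)
  max(pwr, 0)
}
# Smallest even n (>= 12) reaching the target power; NA if above n.cap
n.pivotal <- function(CV, theta0, target = 0.8, n.cap = 500) {
  for (n in seq(12, n.cap, 2)) {
    if (power.nct(CV, n, theta0) >= target) return(n)
  }
  NA
}
# One simulated study: observed GMR, observed CV, and the BE decision
sim.study <- function(n, CV, GMR, alpha = 0.05) {
  s2  <- log(CV^2 + 1)
  df  <- n - 2
  s2o <- s2 * rchisq(1, df) / df               # observed residual variance
  pe  <- rnorm(1, log(GMR), sqrt(2 * s2 / n))  # observed log(GMR)
  hw  <- qt(1 - alpha, df) * sqrt(2 * s2o / n) # half-width of the 90% CI
  list(GMR = exp(pe), CV = sqrt(exp(s2o) - 1),
       BE = (pe - hw >= log(0.80)) && (pe + hw <= log(1.25)))
}
true.CV <- 0.20; true.GMR <- 0.95; n.pilot <- 18; nsims <- 1000
one.run <- function() {
  pilot <- sim.study(n.pilot, true.CV, true.GMR)             # step 2
  n2 <- if (abs(log(pilot$GMR)) <= log(1.10)) {
    n.pivotal(pilot$CV, pilot$GMR)                           # step 3
  } else NA
  if (is.na(n2)) return(c(futile = 1, n.total = NA, BE = NA)) # step 4
  pivotal <- sim.study(n2, true.CV, true.GMR)                # step 5
  c(futile = 0, n.total = n.pilot + n2, BE = as.numeric(pivotal$BE)) # step 6
}
res <- as.data.frame(t(replicate(nsims, one.run())))         # step 7
cat("stopped for futility:", mean(res$futile), "\n")
cat("average total sample size:", mean(res$n.total, na.rm = TRUE), "\n")
cat("pivotal studies passing BE:", mean(res$BE, na.rm = TRUE), "\n")
```

Looping this over n.pilot and the true CV would give the simulated counterpart of the tables above, with the observed GMR taken into account.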

Playing around with alpha might be relevant too.

Pass or fail!
ElMaestro
Helmut
Vienna, Austria, 2013-10-27 19:27
@ ElMaestro
Posting: # 11784

 “Ideal” sample size for TSDs’ n1?

Hi ElMaestro,

❝ this thread is extremely interesting!


Nice to read!

❝ […] the line converts the expected CV to a kind of "almost worst-case"-CV within reason as defined by alpha. I can see why, but it makes the resulting curve somewhat pessimistic. "On average" (pardon!) the total sample sizes will tend to be lower.


Correct. That’s why I added the second (somewhat more optimistic) table. Of course everybody is free to choose another α. Try it: Any value >0.1 likely will tell you that the ideal pilot should have just 12 subjects. What about the reliability of the GMR? Which leads to…

❝ […] we also argued a few times here that observed GMR from a pilot must be taken into account. It gets complicated...


Indeed. Actually the expected GMR lies somewhere within the CI – which might be terribly wide if estimated from a small pilot. I’m not aware whether anybody played around with the CI (maybe a larger α would make sense).

BTW, here are the tables for Potvin’s Methods B/C (average total sample sizes from 10⁶ simulations):
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      Method B: stage 1 sample size                               │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│      12   14   16   18   20   22   24   26   28   30   32   34   36   38   40   42   44   46   48│
├───┬──────────────────────────────────────────────────────────────────────────────────────────────┤
│CV%│                                    average total sample size                                 │
├───┼──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 30│46.4 45.7 44.7 43.5 42.1 40.8 39.8 39.2 38.9 38.9 39.3 39.9 40.7 41.7 42.9 44.2 45.7 47.2 48.9│
│ 25│32.4 31.3 30.1 29.2 28.6 28.6 29.0 29.7 30.7 31.9 33.3 34.9 36.6 38.3 40.2 42.1 44.1 46.0 48.0│
│ 20│20.6 20.0 20.0 20.6 21.7 23.0 24.6 26.3 28.2 30.1 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 15│13.5 14.7 16.3 18.1 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 10│12.0 14.0 16.0 18.0 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
└───┴──────────────────────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      Method C: stage 1 sample size                               │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│      12   14   16   18   20   22   24   26   28   30   32   34   36   38   40   42   44   46   48│
├───┬──────────────────────────────────────────────────────────────────────────────────────────────┤
│CV%│                                    average total sample size                                 │
├───┼──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 30│46.6 45.8 44.9 43.6 42.2 40.9 39.9 39.2 38.9 38.9 39.2 39.7 40.4 41.4 42.5 43.8 45.3 46.9 48.6│
│ 25│32.5 31.4 30.2 29.2 28.6 28.5 28.9 29.5 30.5 31.7 33.1 34.7 36.4 38.2 40.1 42.0 44.0 46.0 48.0│
│ 20│20.6 20.0 20.0 20.5 21.5 22.9 24.5 26.2 28.1 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 15│13.4 14.6 16.2 18.1 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
│ 10│12.0 14.0 16.0 18.0 20.0 22.0 24.0 26.0 28.0 30.0 32.0 34.0 36.0 38.0 40.0 42.0 44.0 46.0 48.0│
└───┴──────────────────────────────────────────────────────────────────────────────────────────────┘

We see a similar pattern of the “ideal” stage 1 sample sizes, which again relies on a best guess of the CV. But there’s a major difference to the pilot/pivotal combo: the sponsor’s primary interest might not be to pass with the minimum total sample size, but to pass already in the first stage. So for CV 20% the best choice of n1 likely should not be based on the smallest average total sample size of 20 (with n1 = 14 the chance of proceeding to the second stage is 44%); we would rather opt for n1 = 24 (only 8% proceed to stage 2), which is also the sample size of a fixed sample design with α 0.0294.
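
The fixed-sample comparison can be checked with PowerTOST itself. A small sketch, assuming GMR 0.95 and 80% target power as in the code at the top of the thread; the column name "Sample size" is how I remember the data frame returned by sampleN.TOST() and is equivalent to the [7] used there:

```r
library(PowerTOST)
# Fixed-sample 2x2x2 design planned at the adjusted alpha of 0.0294
fixed <- sampleN.TOST(alpha = 0.0294, CV = 0.20, theta0 = 0.95,
                      targetpower = 0.80, print = FALSE)
n.fixed <- fixed[["Sample size"]]
# Same design at the conventional alpha of 0.05, for comparison
conv   <- sampleN.TOST(alpha = 0.05, CV = 0.20, theta0 = 0.95,
                       targetpower = 0.80, print = FALSE)
n.conv <- conv[["Sample size"]]
cat("n at alpha 0.0294:", n.fixed, "\n")
cat("n at alpha 0.05:  ", n.conv, "\n")
```

The difference between the two numbers is the price of the adjusted α if everything is decided in stage 1.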

❝ How about something along these lines: [1–7]



Have mercy and show more respect for my peanut-sized brain! Have to think it over. As you know, futility rules might kick power in the arse.

ElMaestro
Denmark, 2013-10-27 20:09
@ Helmut
Posting: # 11785

 “Ideal” sample size for TSDs’ n1?

Hi Hötzi,

❝ Have mercy and show more respect for my peanut-sized brain! Have to think it over. As you know, futility rules might kick power in the arse.


Yes, I heard about that...
Anyways, if I do the coding, will you run the sims, and draft an ms with us both as co-authors?

Helmut
Vienna, Austria, 2013-10-27 20:19
@ ElMaestro
Posting: # 11786

 “Ideal” sample size for TSDs’ n1?

Hi ElMaestro,

❝ Yes, I heard about that...


:-D

❝ Anyways, if I do the coding, will you run the sims,…


Positive maybe. My main machine is nine years old and not the speediest around (Detlew’s is 4 times faster).

❝ …and draft an ms with us both as co-authors?


Yeah, why not? I luv sim’s.

Helmut
Vienna, Austria, 2013-10-28 14:29
@ ElMaestro
Posting: # 11793

 Alpha?

Hi ElMaestro,

❝ Playing around with alpha might be relevant too.


Hhm, why do you think so? IMHO, whatever we do here might only affect power. The pivotal study will stand on its own and doesn’t need any α-adjustment.

ElMaestro
Denmark, 2013-10-28 15:23
@ Helmut
Posting: # 11794

 Alpha?

Hi Hötzi,

❝ Hhm, why do you think so? IMHO, whatever we do here might only affect power. The pivotal study will stand on its own and doesn’t need any α-adjustment.


We're on the same page. Pilots and pivotals are traditionally stand-alone events. It gets a little tricky if applicants specify that a successful pilot must be accepted as a pivotal, which happens and is ok with some authorities. Success at the pilot stage is in itself rare though.
In such a situation alpha speculation may be relevant due to type I errors. I am fairly sure a reviewer would raise the point if a paper is submitted. Especially if it ends up in hands of people who have reviewed Potvin's paper and the sequelae. :-D

Helmut
Vienna, Austria, 2013-10-28 16:01
@ ElMaestro
Posting: # 11796

 Alpha?

Hi ElMaestro,

❝ It gets a little tricky if applicants specify that a successful pilot must be accepted as a pivotal, which happens and is ok with some authorities.


Haven’t thought about that! Definitely OK for the FDA (even mentioned in the guidance). Was OK in some European countries (esp. Scandinavia, Germany, Austria). Assessors of a certain southwestern peninsula essentially said:

“Thanks for showing us the GMR and CV. Nice to see such a narrow CI, although that wasn’t the purpose of the study. Now please go ahead and perform the pivotal study based on an appropriate sample size estimation.”

No idea about the current practice.*

❝ Success at the pilot stage is in itself rare though.


Not that rare. For 0.943 ≤ T/R ≤ 1.06 and CV 15% a study with n=12 likely will pass (power 80.2%).
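
For anyone who wants to check such numbers without PowerTOST, here is a rough sketch based on the noncentral-t approximation of TOST power. It assumes a 2×2×2 design and the conventional limits of 0.80–1.25; pilot.power is a made-up name, and power.TOST() would give the exact value.

```r
# Approximate TOST power via the noncentral t (2x2x2 design, limits 0.80-1.25).
# An approximation for illustration only; pilot.power() is a hypothetical helper.
pilot.power <- function(CV, n, theta0, alpha = 0.05) {
  se <- sqrt(log(CV^2 + 1)) * sqrt(2 / n)  # SE of the log difference
  df <- n - 2
  tc <- qt(1 - alpha, df)
  pmax(pt(-tc, df, ncp = (log(theta0) - log(1.25)) / se) -
       pt( tc, df, ncp = (log(theta0) - log(0.80)) / se), 0)
}
theta0 <- c(0.943, 1, 1.06)                # true T/R-ratios to look at
pwr    <- pilot.power(CV = 0.15, n = 12, theta0 = theta0)
print(round(100 * pwr, 1))                 # power in percent
```

At both ends of the T/R range the result should be close to the 80% mentioned above, and higher in between.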

❝ In such a situation alpha speculation may be relevant due to type I errors.


Still I don’t get it. If a pilot “passes” BE I think it should be possible to state in the protocol the intention to submit the study as pivotal evidence. Would make sense if the CV is already expected to be low and/or the GMR close to 1. No adjustment necessary, IMHO. Whether peninsulists would accept that is another story.

❝ I am fairly sure a reviewer would raise the point if a paper is submitted. Especially if it ends up in hands of people who have reviewed Potvin's paper and the sequelae. :-D


If you have Method C in mind, yes.


  * Might well be that the sample size estimation based on the pilot’s CV suggests running the pivotal study in fewer subjects than the pilot. :angry:

Helmut
Vienna, Austria, 2013-10-31 00:49
@ ElMaestro
Posting: # 11827

 12 = bad

Hi ElMaestro,

I found a goody from the 2008 AAPS Annual Meeting:

Gagné J-F, Shink É, Trabelsi F, and M Tanguay
Evaluation of the reliability associated with pilot bioequivalence studies
Poster W5317, online abstract

Quote:

Simulations showed that given a 85% theoretical PE (bad formulation) and a sample-size of 12 subjects, the probability of detecting a bad formulation (observed PE outside 95%-105%) is ≥85% for all ISCVs tested.
However, given a theoretical ratio of 100% (good formulation), a sample-size of 12 subjects, and an ISCV of 20%, the probability of detecting a good formulation (observed PE within 95%-105%) is 48% only. This probability goes down to ≤30% for highly variable drugs.
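
The order of magnitude of these probabilities can be reproduced analytically. A minimal sketch, assuming log-normal data and a balanced 2×2×2 crossover (the poster's design and the ISCVs tested are not given here, so this is not their simulation); p.within is a made-up helper name:

```r
# Probability that the observed point estimate (PE) of a pilot falls within
# [lo, hi], given the true ratio and the true intra-subject CV.
# p.within() is a hypothetical helper; 2x2x2 design with n subjects assumed.
p.within <- function(true.ratio, CV, n, lo = 0.95, hi = 1.05) {
  se <- sqrt(log(CV^2 + 1)) * sqrt(2 / n)  # SE of log(PE)
  pnorm(log(hi), log(true.ratio), se) - pnorm(log(lo), log(true.ratio), se)
}
# Good formulation (true ratio 100%): PE observed within 95-105%
good <- p.within(true.ratio = 1.00, CV = c(0.20, 0.30), n = 12)
# Bad formulation (true ratio 85%): PE observed outside 95-105%
bad  <- 1 - p.within(true.ratio = 0.85, CV = c(0.20, 0.30), n = 12)
print(round(100 * good, 1))
print(round(100 * bad, 1))
```

With 12 subjects and a CV of 20% the chance of seeing a good formulation's PE within 95–105% comes out a little below 50%, in the same region as the quoted 48%, and it drops further with higher CVs.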


The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz