ElMaestro
★★★

Denmark,
2017-08-19 15:04

Posting: # 17709

## Initial sample size guess for the Potvin methods [Two-Stage / GS Designs]

Hi all,

just wanted to share a little algo for the initial sample size guess for the Potvin methods.
This is an area where speed gains are possible. A good initial sample size guess can eliminate quite some computation time. As I never got Zhang's method, I played around a little with an entirely empirical idea.

My idea is that at least the upper part of a curve of power as function of sample size can be modeled as a kind of sigmoid function. I googled around to find a few expressions for such curves. The logistic function is one such. We can write the approximation something like:
Power = 1/(1 + exp(-k*(N - x0)))
where N is the sample size.

If we have two estimates of power, p1 and p2, at sample sizes nps1 and nps2 (I use the number of subjects per sequence, abbreviated nps), then we can determine the constants:

a1 = log(1/p1 - 1)
a2 = log(1/p2 - 1)
x0 = (a2*nps1 - a1*nps2)/(a2 - a1)
k  = a1/(x0 - nps1)

We can then solve for the desired power Pwr.T:
nps = x0 + log(1/Pwr.T - 1)/(-k)
- and this should of course be converted to an integer.
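As a sketch, the whole two-point inversion can be wrapped up like this (the helper name and the made-up logistic power curve are mine, just for illustration; in practice p1 and p2 would come from Power.calc()):

```r
# Fit the logistic through two (sample size, power) points and invert it
# for the target power Pwr.T. p1, p2 come from a power calculation in practice.
logistic.guess <- function(nps1, p1, nps2, p2, Pwr.T) {
  a1 <- log(1/p1 - 1)
  a2 <- log(1/p2 - 1)
  x0 <- (a2*nps1 - a1*nps2)/(a2 - a1)
  k  <- a1/(x0 - nps1)
  floor(x0 + log(1/Pwr.T - 1)/(-k))
}

# Sanity check: if power really follows a logistic with k = 0.3, x0 = 10,
# the two-point fit recovers the exact crossing point for 80% power.
p <- function(n) 1/(1 + exp(-0.3*(n - 10)))
logistic.guess(6, p(6), 30, p(30), 0.8)  # floor(10 + 1.3863/0.3) = 14
```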

In my application this turns out to work really, really well.
The "only" issue is if the initial nps1 and nps2 are chosen poorly; p1 needs to be "not too close" to zero and p2 needs to be "not too close" to 1. But that's it.
My current implementation is this:

GetStartNps.X3 = function(GMR, CV, Pwr.T)
{
  nps1 = 6
  p1 = Power.calc(nps1, GMR, CV, Is.St2=1)  ## equation in Potvin et al.
  for (i in 1:3)
    if (p1 < 0.1) { nps1 = 4*nps1; p1 = Power.calc(nps1, GMR, CV, Is.St2=1) }
  nps2 = 20*nps1
  p2 = Power.calc(nps2, GMR, CV, Is.St2=1)
  f = 2.4
  while (p2 > 0.99) { nps2 = floor(nps2/f); p2 = Power.calc(nps2, GMR, CV, Is.St2=1) }
  a2 = log(1/p2 - 1)
  a1 = log(1/p1 - 1)
  x0 = (a2*nps1 - a1*nps2)/(a2 - a1)
  k  = a1/(x0 - nps1)
  Nps = floor(x0 + log(1/Pwr.T - 1)/(-k))
  if (Nps < 6) Nps = 6
  return(Nps)
}

Works well for me for assumed GMRs close to 1 (such as 0.95), and I am sure if anyone cares to fiddle with it, it can be improved much, much further to work well in "all" scenarios. Play around with the constants (the starting nps1 = 6, the factors 4 and 20, and f = 2.4; shown in red in the original post) to get the optimization that works for you.

Pass or fail!
ElMaestro
Helmut
★★★

Vienna, Austria,
2017-08-19 16:06

@ ElMaestro
Posting: # 17710

## Initial sample size guess for the Potvin methods

Hi ElMaestro,

I tried GetStartNps.X3(GMR=0.95, CV=0.2, Pwr.T=0.8) and was honored with
Error in Power.calc(nps1, GMR, CV, Is.St2 = 1) :
  could not find function "Power.calc"
Can you provide the function Power.calc(), please?

My standard code gives:

GMR              : 0.95
target power     : 0.8
‘best guess’ CV  : 0.2
Fixed sample size: 20
  power          : 0.835
‘Type 1’ TSD, n1 : 16 (80.0% of N)
  ASN            : 20.0
  power (interim): 0.619
  power (final)  : 0.851

How does it compare to yours?

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2017-08-19 16:14

(edited by ElMaestro on 2017-08-19 17:14)
@ Helmut
Posting: # 17711

## Initial sample size guess for the Potvin methods

Hi Hötzi,

» Error in Power.calc(nps1, GMR, CV, Is.St2 = 1) :
»   could not find function "Power.calc"
» Did I specify the arguments wrongly?

That's why I included the remark:
##equation in Potvin et al.
I thought you'd be plugging in your own, but be that as it may.

An implementation of the function is here:

alpha1 = 0.0294
alpha2 = 0.05
vecQT1 = c(1:5000)
vecQT2 = c(1:5000)
for (i in 1:5000)
{
  vecQT1[i] = qt(p=1-alpha1, df=i)
  vecQT2[i] = qt(p=1-alpha2, df=i)
}

Power.calc = function(Nps, GMR, CV, Is.St2=0)
{
  s = sqrt(log(1+CV*CV))
  nom1 = log(1.25/GMR)
  nom2 = -log(1.25*GMR)
  den = s*sqrt(2/(2*Nps))
  df = Nps*2-2
  if (Is.St2==T) df = df-1
  ##cat("1-a=", 1-alpha, "df=", df, "\n")
  if (Is.St2==0) q = vecQT1[df]
    else         q = vecQT2[df]
  pw = pt((nom1/den) - q, df) - pt((nom2/den) + q, df)
  if (pw < 0) pw = 0.01
  return(pw)
}

Edit: Forgot to write: For classical B the alphas are (0.0294, 0.0294) and for C they are (0.05, 0.0294), though that isn't related to the topic. Second, Hötzi, you gave a little table which I am not quite sure how to read. Note this isn't about the sample size calculation itself; this is about the initial guess used for the sample size calculation by up/down iteration.

Helmut
★★★

Vienna, Austria,
2017-08-19 17:12

@ ElMaestro
Posting: # 17712

## Initial sample size guess for the Potvin methods

Hi ElMaestro,

» That's why I included the remark:
» ##equation in Potvin et al.
» I thought you'd be plugging in your own, but be that as it may.

I see.

So with your code I end up with 10/seq or 20 in stage 1. That’s actually the sample size of the fixed design. Try this (ASN is the expected average total sample size, pwr is the probability to show BE in stage 1 or 2, stop the chance to end in stage 1, and pct.stg2 the percentage of studies proceeding to stage 2):

library(PowerTOST)
library(Power2Stage)
CV     <- 0.2
GMR    <- 0.95
target <- 0.8
n.fix  <- sampleN.TOST(CV=CV, theta0=GMR, targetpower=target,
                       details=FALSE, print=FALSE)[["Sample size"]]
n1     <- seq(12, n.fix, 2)
res    <- data.frame(CV=CV, GMR=GMR, target=target, n1=n1, ASN=NA,
                     pwr.1=NA, stop=NA, pct.stg2=NA, pwr.2=NA)
for (j in seq_along(n1)) {
  x <- power.2stage(CV=CV, theta0=GMR, targetpower=target,
                    n1=n1[j], details=FALSE)
  res[j, "ASN"]      <- x[["nmean"]]
  res[j, "pwr.1"]    <- x[["pBE_s1"]]
  res[j, "stop"]     <- 100-x[["pct_s2"]]
  res[j, "pct.stg2"] <- x[["pct_s2"]]
  res[j, "pwr.2"]    <- x[["pBE"]]
}
print(signif(res, 3), row.names=FALSE)
  CV  GMR target n1  ASN pwr.1 stop pct.stg2 pwr.2
 0.2 0.95    0.8 12 20.6 0.413 43.7     56.3 0.842
 0.2 0.95    0.8 14 20.0 0.527 55.8     44.2 0.848
 0.2 0.95    0.8 16 20.0 0.619 66.1     33.9 0.851
 0.2 0.95    0.8 18 20.6 0.695 74.6     25.4 0.856
 0.2 0.95    0.8 20 21.7 0.752 81.5     18.5 0.862

I still prefer 16 subjects in stage 1 (already ~62% power and a ~66% chance to stop). The final power is similar to your n1 20 but ASN is lower.

ElMaestro
★★★

Denmark,
2017-08-19 17:17

@ Helmut
Posting: # 17713

## Initial sample size guess for the Potvin methods

Hi Hötzi,

are you sure we are on the same page?
I am talking about the initial guess for the sample size iteration, not the sample size itself.

Helmut
★★★

Vienna, Austria,
2017-08-19 17:33

@ ElMaestro
Posting: # 17714

## Confuse-a-Cat

Hi ElMaestro,

» are you sure we are on the same page?
No. Completely misunderstood you.

» I am talking about the initial guess for the sample size iteration, not the sample size itself.

Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes sampleN.TOST() has to iterate upwards (since Zhang’s method is based on a large sample approximation and power with anything of the t-family will be lower). In very rare cases sampleN.TOST() has to iterate downwards. Can your code do that as well?
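For readers without the paper: a generic large-sample starting value in this spirit (a plain normal approximation; not necessarily Zhang's exact expression, the function name is mine) could be sketched as:

```r
# Large-sample (normal approximation) starting value for the total sample
# size of a 2x2 crossover, valid for GMR != 1; the exact t-based power is
# then checked by up/down iteration. A generic approximation, not a claim
# about Zhang's precise formula.
n0.guess <- function(CV, GMR, alpha = 0.05, targetpower = 0.8) {
  s2    <- log(1 + CV^2)              # log-scale variance from the CV
  delta <- log(1.25) - abs(log(GMR))  # distance to the nearer BE limit
  n     <- 2 * s2 * (qnorm(1 - alpha) + qnorm(targetpower))^2 / delta^2
  ceiling(n/2) * 2                    # round up to an even total
}
n0.guess(CV = 0.2, GMR = 0.95)  # 18
```

With CV = 0.2 and GMR = 0.95 this yields 18, a little below the exact t-based answer of 20, so the iteration proceeds upwards, matching the behaviour described above.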

ElMaestro
★★★

Denmark,
2017-08-19 17:56

@ Helmut
Posting: # 17715

## Confuse-a-Cat

Hi again,

» Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes sampleN.TOST() has to iterate upwards (since Zhang’s method is based on a large sample approximation and power with anything of the t-family will be lower). In very rare cases sampleN.TOST() has to iterate downwards. Can your code do that as well?

Yes. Basically: get the initial guess, determine whether it is low or high for the desired power, and iterate up or down accordingly.

The reason I am doing this is that I got a mental "does not compute" error when I read Zhang's paper. The goal is to get to the right sample size in as few cycles as possible.

Helmut
★★★

Vienna, Austria,
2017-08-20 14:40

@ ElMaestro
Posting: # 17719

## loop ↔ vectorized ↔ direct

Hi ElMaestro,

» vecQT1=c(1:5000)
» vecQT2=c(1:5000)
» for (i in 1:5000)
»    {
»     vecQT1[i]=qt(p=1-alpha1, df=i)
»     vecQT2[i]=qt(p=1-alpha2, df=i)
»    }

Further down you make calls to vecQT1[df] and vecQT2[df]. That’s awfully slooow. Why not simply call qt(p=foo, df=bar) directly?

library(microbenchmark)
loop <- function(alpha1, alpha2, df) {
  vecQT2 <- vecQT1 <- numeric()
  for (j in 1:5000) {
    vecQT1[j] = qt(p=1-alpha1, df=j)
    vecQT2[j] = qt(p=1-alpha2, df=j)
  }
  return(QT=c(vecQT1[df], vecQT2[df]))
}
vectorized <- function(alpha1, alpha2, df) {
  vecQT1 <- qt(p=1-alpha1, df=1:5000)
  vecQT2 <- qt(p=1-alpha2, df=1:5000)
  return(QT=c(vecQT1[df], vecQT2[df]))
}
direct <- function(alpha1, alpha2, df) {
  return(QT=c(qt(p=1-alpha1, df=df),
              qt(p=1-alpha2, df=df)))
}
res <- microbenchmark(loop(0.0294, 0.05, 10),
                      vectorized(0.0294, 0.05, 10),
                      direct(0.0294, 0.05, 10), times=200L,
                      control=list("random", warmup=10))
print(res)

ElMaestro
★★★

Denmark,
2017-08-20 15:22

@ Helmut
Posting: # 17721

## loop ↔ vectorized ↔ direct

Hi Hötzi,

» Further down you make calls to vecQT1[df] and vecQT2[df]. That’s awfully slooow. Why not simply call qt(p=foo, df=bar) directly?

That kind of benchmarking is misleading. Here you are performing the vectorised array generation numerous times, which will never happen in practice; you only need to initialise the arrays once within every run of 1,000,000 iterations, possibly even keep the arrays/vectors across runs.

I agree that the whole "once" operation is faster when you use
vecQT1 <- qt(p=1-alpha1, df=1:5000)
(I had no idea you could do that, actually), but if you benchmark the entire run you will see the difference may be a few microseconds out of numerous seconds or minutes. Saving 0.0001% of the time? That's not optimization, it's beautification, or achieving the same with fewer lines of code, which isn't what I am after.

I am not very experienced with vectorised functions at all - I understand code better if loops are visible. That's just me, of course.

Helmut
★★★

Vienna, Austria,
2017-08-20 16:23

@ ElMaestro
Posting: # 17725

## loop ↔ vectorized ↔ direct

Hi ElMaestro,

I don’t understand why you use vectors at all. What ’bout:

df <- Nps*2-2
if (!Is.St2) {
  q  <- qt(p=1-alpha1, df=df)
} else {
  df <- df-1
  q  <- qt(p=1-alpha2, df=df)
}

ElMaestro
★★★

Denmark,
2017-08-20 17:22

@ Helmut
Posting: # 17726

## loop ↔ vectorized ↔ direct

Hi Hötzi,

» I don’t understand why you use vectors at all. What ’bout:
»

» df <- Nps*2-2
» if (!Is.St2) {
»   q  <- qt(p=1-alpha1, df=df)
» } else {
»   df <- df-1
»   q  <- qt(p=1-alpha2, df=df)
» }

Simple, but I think I already told you: qt is a computationally very expensive function. If we need "many" qt values then, in terms of computation, it makes much more sense to cache the well-defined ones.

Within the 1,000,000 iterations, values from qt are needed many times, but each of them has a defined range of DFs (up to the max number of subjects, minus 3 [or 2]) and alphas (two levels). Thus we build a cache of qt values before entering the iteration loop; then we can just take a value from the cache instead of re-inventing the wheel for every call. Populate the arrays once, and rely on them during the iterations or in whatever other functions need them. The speedup can be massive, both in R and in C.
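The caching idea in a minimal, self-contained sketch (variable names are mine):

```r
# One-time cache of upper t-quantiles for all degrees of freedom the
# simulation can encounter; inside the hot loop a vector lookup then
# replaces each qt() call.
alpha1   <- 0.0294
max.df   <- 1000
qt.cache <- qt(p = 1 - alpha1, df = 1:max.df)  # built once, before the loop

# Inside the hot loop: identical result, no call into qt()
df <- 22
q.lookup <- qt.cache[df]
q.direct <- qt(p = 1 - alpha1, df = df)
all.equal(q.lookup, q.direct)  # TRUE
```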

ElMaestro
★★★

Denmark,
2017-08-19 20:04

@ ElMaestro
Posting: # 17716

## The n ext crackpot iteration

Further:

When we initiate a Potvin run, the assumed GMR (0.95 or whatever) is known at the time the iterations are started and does not change as we go. Same for the target power.

Note this:
CV = c(100:1000)/1000
N = rep(0, length(CV))
for (i in 1:length(N))
  N[i] = GetNps(0.95, CV[i], 0.8) ## Yes, Hötzi, plug in your own function here
plot(CV, N)

Aha, within some reason the necessary number of subjects per sequence (given the GMR and the target power) varies in a simple manner with CV; perhaps this is well modeled with a polynomial?

M = lm(N ~ poly(CV, 4, raw=TRUE))
points(CV, fitted(M), col="red")

Oh how wonderful!! The polynomial provides a fantastic approximation within our entire interval of interest.
Let us create an approximating function:

GuessNps = function(CV, GMR, T.Pwr)
{
  if ((GMR==0.95) && (T.Pwr==0.8))
    Rslt = 6 - 43.3322*CV + 363.4835*CV*CV - 238.5417*CV*CV*CV + 62.7520*CV*CV*CV*CV
    ## get your coefficients from summary(M) or coef(M)
  ## note: We might not want to write blah^3 etc if we optimize for speed, not sure.
  ## fill in the other GMR, T.Pwr scenarios here
  return(round(Rslt, 0)) ## perhaps ceiling would be better?
}

And this one is really, really fast! The initial guess may be as accurate as Zhang's (I haven't checked, because I don't speak Zhango), but it does not rely on pnorm, qnorm, pt or qt, and there is no division, sqrt, exp or ln. Therefore it is blazing fast.

All it takes is a split second of numerical gymnastics before a Potvin run (and you do not need c(100:1000)/1000; we can even do c(10:100)/100 and get the same quality of estimate, cheaper).

Actually, we might abolish sample size iteration altogether, because:
for (i in 1:length(N))
  cat("i=", i, "Diff=", GetNps(0.95, CV[i], 0.8) - GuessNps(CV[i], 0.95, 0.8), "\n")

You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and -1's then just increase the polynomial degree above.

The implementation of Potvin et al. is hereby sped up by a factor of 10 gazillion.

Helmut
★★★

Vienna, Austria,
2017-08-20 02:20

@ ElMaestro
Posting: # 17717

## The n ext crackpot iteration

Capt’n,

some remarks. More maybe tomorrow.

» The polynomial provides a fantastic approximation within our entire interval of interest.

Played around a little. Based on the AIC the 4th degree is the winner indeed.

» ##note: We might not want to write blah^3 etc if we optimize for speed, not sure.

Old wisdom. Here with my coefficients (total sample sizes [not N/seq], exact method for GMR 0.95, 80% power, CV 0.1–1.0).

library(microbenchmark)
CV <- seq(0.1, 1, 0.01) # not defined in the original post; the vector used below
a  <- c(5.897943, -40.988390, 603.109578, -338.281351, 70.43138)
old.school <- function(a, CV) {
  x <- a[1] + a[2]*CV + a[3]*CV*CV + a[4]*CV*CV*CV + a[5]*CV*CV*CV*CV
  x + (2 - x %% 2)
}
lazy <- function(a, CV) {
  x <- a[1] + a[2]*CV + a[3]*CV^2 + a[4]*CV^3 + a[5]*CV^4
  x + (2 - x %% 2)
}
res <- microbenchmark(old.school(a, CV), lazy(a, CV), times=500L,
                      control=list("random", warmup=10))
boxplot(res, boxwex=0.25, las=1)
options(microbenchmark.unit="us")
print(res)
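For completeness, a Horner-scheme variant (my own addition, not part of the benchmark above; the function names are illustrative) evaluates the same quartic with four multiplications and no explicit powers:

```r
# Coefficients as given in the post (total sample sizes, GMR 0.95, 80% power)
a <- c(5.897943, -40.988390, 603.109578, -338.281351, 70.43138)

# Horner's scheme: nested multiplications, no ^ operator
horner <- function(a, CV) {
  x <- a[1] + CV*(a[2] + CV*(a[3] + CV*(a[4] + CV*a[5])))
  x + (2 - x %% 2)   # round up to the next even number, as above
}

# agrees with the naive term-by-term evaluation
naive <- function(a, CV) {
  x <- a[1] + a[2]*CV + a[3]*CV^2 + a[4]*CV^3 + a[5]*CV^4
  x + (2 - x %% 2)
}
all.equal(horner(a, 0.3), naive(a, 0.3))  # TRUE
```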

» ## perhaps ceil would be better?

I would round up to the next even (as above).

» You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and -1's then just increase the polynomial degree above.

That doesn’t help. With CV <- seq(0.1, 1, 0.01) I got a match in 46/91 and +2 in 45/91. OK, conservative.

ElMaestro
★★★

Denmark,
2017-08-20 14:35

(edited by ElMaestro on 2017-08-20 14:48)
@ ElMaestro
Posting: # 17718

## The ultimate crackpot iteration!

Hi all,

what am I thinking???

For any given GMR (assumed) and any given target power we can calculate the sample size necessary to achieve that power on the basis of an observed CV.

We can look at that in reverse: for any given sample size there exists a level of CV which keeps the power at the desired level. We can call that a critical CV. If the CV is above the critical CV for a given sample size, then the power is lower than the target.
So let us get the critical CVs.

Here done with D&C (plain bisection); I am sure this can be improved a lil bit. This time I am also giving a Power.calc function. And remember, I am expressing it as Nps, not the total number of subjects.

alpha1 = 0.05
alpha2 = 0.0294
vecQT1 = c(1:2000)
vecQT2 = c(1:2000)
for (i in 1:2000)
{
  vecQT1[i] = qt(p=1-alpha1, df=i)
  vecQT2[i] = qt(p=1-alpha2, df=i)
}

Power.calc = function(Nps, GMR, CV, Is.St2=0)
{
  s = sqrt(log(1+CV*CV))
  nom1 = log(1.25/GMR)
  nom2 = -log(1.25*GMR)
  den = s*sqrt(2/(2*Nps))
  df = Nps*2-2
  if (Is.St2==T) df = df-1
  ##cat("1-a=", 1-alpha, "df=", df, "\n")
  if (Is.St2==0) q = vecQT1[df]
    else         q = vecQT2[df]
  pw = pt((nom1/den) - q, df) - pt((nom2/den) + q, df)
  if (pw < 0) pw = 0.01
  return(pw)
}

FindCritCV = function(Pwr.T, GMR, Nps, Is.St2=1, toler=0.000001)
{
  CV1 = 0.0123
  CV2 = 8.7654
  while (abs(Power.calc(Nps, GMR, CV1, Is.St2)) - abs(Power.calc(Nps, GMR, CV2, Is.St2)) > toler)
  {
    CV3 = 0.5*(CV1+CV2)
    tmp = Pwr.T - Power.calc(Nps, GMR, CV3, Is.St2)
    if (tmp > 0) CV2 = CV3
       else      CV1 = CV3
  }
  return(0.5*(CV2+CV1))
}

vecCritCVs = rep(0, 1000)
for (Nps in 6:1000)
  vecCritCVs[Nps] = FindCritCV(0.8, 0.95, Nps, 1, 1e-12)

## let's have a look
print(vecCritCVs)

## so how do we use this in practice?
GetNps = function(CV)
{
  Nps = 1
  while (vecCritCVs[Nps] <= CV) Nps = Nps+1
  return(Nps)
}

GetNps(0.165) ## should be 9, right??
Power.calc(9, 0.95, 0.1764, Is.St2=1) ## should be a tiny bit lower than 0.8 since the critical
                                      ## CV level is 0.1763761

I hereby officially declare all sample size iterations in Potvin runs unnecessary, apart from the three-liner above. Play around with tolerances and lengths and include your own error trapping.
A good Sunday to all of you. Goodbye, Zhang.

Helmut
★★★

Vienna, Austria,
2017-08-20 15:11

@ ElMaestro
Posting: # 17720

## The ultimate crackpot iteration!

Hi ElMaestro,

» GetNps(0.165) ##should be 9, right??

Rather 8.

library(PowerTOST)
sampleN.TOST(alpha=0.0294, CV=0.165, theta0=0.95,
             targetpower=0.8, method="shifted")

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design:  2x2 crossover
log-transformed data (multiplicative model)

alpha = 0.0294, target power = 0.8
BE margins = 0.8 ... 1.25
True ratio = 0.95,  CV = 0.165

Sample size (total)
 n     power
16   0.801537

Approximate power calculation with
shifted central t-distribution.

ElMaestro
★★★

Denmark,
2017-08-20 15:28

@ Helmut
Posting: # 17722

## The ultimate crackpot iteration!

Hi Hötzi,

» Rather 8.

When aiming for stage 2 (see the explicit flag in the code above) we need to remember to subtract that one DF.

Therefore, I think the answer to the question I asked is 9.

The answer you gave relates to power at stage 1.

Helmut
★★★

Vienna, Austria,
2017-08-20 16:06

@ ElMaestro
Posting: # 17723

## The ultimate crackpot iteration!

Hi ElMaestro,

» Therefore, I think the answer to the question I asked is 9
»
» The answer you give is related to a question about power at stage 1.

I see. Have to tweak that. However, PowerTOST is ~500 times faster than your magick.

ElMaestro
★★★

Denmark,
2017-08-20 16:15

@ Helmut
Posting: # 17724

## The ultimate crackpot iteration!

Hi Hötzi,

» I see. Have to tweak that. However, PowerTOST is ~500 times faster than your magick.

You mean that if you do the array initialisation as described here once before each run, and then use GetNps (or equivalent), it will be 500× slower than using Zhang + iteration??
That would surprise me a lot. Please elaborate.

Helmut
★★★

Vienna, Austria,
2017-08-20 18:58

@ ElMaestro
Posting: # 17727

## The ultimate crackpot iteration!

Hi ElMaestro,

» You mean if you do the array ini as described here once before each run, and then use GetNps (or equivalent) then that will be 500x slower than using Zhang + iteration??
» That would surprise me a lot. Please elaborate.

Had to dive deep into the guts of PowerTOST’s internal functions. Hope I got it right. Detlew?

library(PowerTOST)
library(microbenchmark)
sampleN.Stg2 <- function(alpha, CV, theta1=0.8, theta2=1.25,
                         theta0=0.95, targetpower=0.8,
                         method="shifted", imax=100) {
  d.no  <- PowerTOST:::.design.no("2x2")
  ades  <- PowerTOST:::.design.props(d.no)
  dfe   <- PowerTOST:::.design.df(ades, robust=FALSE)
  steps <- ades$steps/2
  bk    <- ades$bk
  df    <- n <- 0
  while (df < 1) {
    n  <- n + 1
    df <- eval(dfe) - 1
  }
  nmin <- as.integer(steps*trunc(n/steps))
  nmin <- nmin + steps*(nmin < n)
  ltheta1 <- log(theta1)
  ltheta2 <- log(theta2)
  diffm   <- log(theta0)
  se      <- CV2se(CV)
  n       <- PowerTOST:::.sampleN0_3(alpha, targetpower, ltheta1,
                                     ltheta2, diffm, se, steps, bk)
  if (n < nmin) n <- nmin
  df  <- eval(dfe) - 1
  pow <- PowerTOST:::.calc.power(alpha, ltheta1, ltheta2, diffm,
                                 sem=se*sqrt(bk/n), df, method)
  iter <- 0
  down <- FALSE; up <- FALSE
  while (pow > targetpower) {
    if (n <= nmin) break
    down <- TRUE
    n    <- n-steps
    iter <- iter + 1
    df   <- eval(dfe) - 1
    pow  <- PowerTOST:::.calc.power(alpha, ltheta1, ltheta2, diffm,
                                    sem=se*sqrt(bk/n), df, method)
    if (iter > imax) break
  }
  while (pow < targetpower) {
    up   <- TRUE; down <- FALSE
    n    <- n+steps
    iter <- iter + 1
    df   <- eval(dfe) - 1
    pow  <- PowerTOST:::.calc.power(alpha, ltheta1, ltheta2, diffm,
                                    sem=se*sqrt(bk/n), df, method)
    if (iter > imax) break
  }
  return(n)
}

EM <- function(CV, alpha1=0.05, alpha2=0.0294, theta1=0.8,
               theta2=1.25, GMR=0.95, targetpower=0.8) {
  vecQT1 <- qt(p=1-alpha1, df=1:2000)
  vecQT2 <- qt(p=1-alpha2, df=1:2000)
  Power.calc <- function(Nps, GMR, CV, Is.St2=FALSE) {
    s    <- sqrt(log(CV*CV+1))
    nom1 <- log(theta2/GMR)
    nom2 <- log(theta1/GMR)
    Nps2 <- 2*Nps
    den  <- s*sqrt(2/Nps2)
    df   <- Nps2-2
    if (!Is.St2) {
      q <- vecQT1[df]
    } else {
      df <- df-1
      q  <- vecQT2[df]
    }
    pw <- pt((nom1/den)-q, df) - pt((nom2/den)+q, df)
    if (pw < 0) pw <- 0.01
    return(pw)
  }
  FindCritCV <- function(Pwr.T, GMR, Nps, Is.St2=TRUE, toler=0.000001) {
    CV1 <- 0.0123
    CV2 <- 8.7654
    while (abs(Power.calc(Nps, GMR, CV1, Is.St2)) -
           abs(Power.calc(Nps, GMR, CV2, Is.St2)) > toler) {
      CV3 <- 0.5*(CV1+CV2)
      tmp <- Pwr.T-Power.calc(Nps, GMR, CV3, Is.St2)
      ifelse (tmp > 0, CV2 <- CV3, CV1 <- CV3)
    }
    return(0.5*(CV2+CV1))
  }
  vecCritCVs <- numeric()
  for (Nps in 2:1000)
    vecCritCVs[Nps] <- FindCritCV(targetpower, GMR, Nps, Is.St2=TRUE, 1e-12)
  GetNps <- function(CV) {
    Nps <- 2
    while (vecCritCVs[Nps] <= CV) Nps <- Nps+1
    return(Nps)
  }
  N <- GetNps(CV)
  return(N)
}

Let’s see:

CV <- seq(0.1, 0.8, 0.05)
N2 <- data.frame(GMR=0.95, target=0.8, CV, EM=NA, PT=NA)
for (j in seq_along(CV)) {
  N2[j, "EM"] <- EM(CV=CV[j])
  N2[j, "PT"] <- ceiling(sampleN.Stg2(alpha=0.0294, CV=CV[j])/2)
}
print(N2, row.names=FALSE)
  GMR target   CV  EM  PT
 0.95    0.8 0.10   4   4
 0.95    0.8 0.15   7   7
 0.95    0.8 0.20  12  12
 0.95    0.8 0.25  17  17
 0.95    0.8 0.30  24  24
 0.95    0.8 0.35  31  31
 0.95    0.8 0.40  40  40
 0.95    0.8 0.45  49  49
 0.95    0.8 0.50  59  59
 0.95    0.8 0.55  69  69
 0.95    0.8 0.60  80  80
 0.95    0.8 0.65  92  92
 0.95    0.8 0.70 104 104
 0.95    0.8 0.75 116 116
 0.95    0.8 0.80 128 128

So far, so good. Speed?

# wrappers for nice output
EM.f <- function() EM(CV=0.165)
PT.f <- function() ceiling(sampleN.Stg2(alpha=0.0294, CV=0.165)/2)

Check first:

EM.f()
[1] 9
PT.f()
[1] 9

Looking good. Now:

res <- microbenchmark(EM.f(), PT.f(), times=50L,
                      control=list("random", warmup=10))
options(microbenchmark.unit="relative")
print(res)
Unit: relative
   expr     min       lq     mean   median       uq      max neval cld
 EM.f() 1936.08 1765.027 1430.441 1304.101 1245.526 1141.647    50   b
 PT.f()    1.00    1.000    1.000    1.000    1.000    1.000    50  a

ElMaestro
★★★

Denmark,
2017-08-20 19:32

@ Helmut
Posting: # 17728

## The ultimate crackpot iteration!

Hi Hötzi,

we are not on the same page. Again.
I am not sure I get it all, but I think with this code you are calling the same shit over and over again, and that is completely unnecessary.
In the beginning of everything, before you initiate the simulation (literally before you make use of the nsims object), you read out the assumed GMR and target power, and on the basis of that you can define an array of critical CVs which you initialise with the correct values. Once and for all. Not for every call where you need a sample size.

Note that if you initialise the array of critical values with syntax such as
vecCritCVs <<- rep(0, 400)

or however many values you need, then vecCritCVs will be public (I think), due to "<<-".
You can then access the array of critical CVs from within the sampleN.Stg2 function.

We might end up with something like:

sampleN.Stg2.EM <- function(CV)
{
  # all other variable info is already accumulated in the critical values, so:
  Nps = 1
  while (vecCritCVs[Nps] <= CV) Nps = Nps+1
  return(Nps*2)
}

Can you see what I mean now?

ElMaestro
★★★

Denmark,
2017-08-21 18:13

(edited by ElMaestro on 2017-08-21 18:37)
@ ElMaestro
Posting: # 17729

## Suggested code

Hi Hötzi,

I imagine I was not able to explain what I meant??
I looked at the code available here.

Do this:

1. In the R code for the Potvin function, include this code:

Power.calc = function(Nps, GMR, CV, Is.St2=0)
{
  s = sqrt(log(1+CV*CV))
  nom1 = log(1.25/GMR)
  nom2 = -log(1.25*GMR)
  den = s*sqrt(2/(2*Nps))
  df = Nps*2-2
  if (Is.St2==T) df = df-1
  ##cat("1-a=", 1-alpha, "df=", df, "\n")
  if (Is.St2==0) q = vecQT1[df]
    else         q = vecQT2[df]
  pw = pt((nom1/den) - q, df) - pt((nom2/den) + q, df)
  if (pw < 0) pw = 0.01
  return(pw)
}

FindCritCV = function(Pwr.T, GMR, Nps, Is.St2=1, toler=0.000001)
{
  CV1 = 0.0123
  CV2 = 8.7654
  while (abs(
         Power.calc(Nps, GMR, CV1, Is.St2) -
         Power.calc(Nps, GMR, CV2, Is.St2)) > toler)
  {
    CV3 = 0.5*(CV1+CV2)
    tmp = Pwr.T - Power.calc(Nps, GMR, CV3, Is.St2)
    if (tmp > 0) CV2 = CV3
       else      CV1 = CV3
  }
  return(0.5*(CV2+CV1))
}

2. immediately before the line that says "if (setseed) set.seed(1234567)", include this code:
MAX = 1000 ## Hötzi, specify the number of values you need,
           ## I don't know it from the top of my head.
vecQT1 <<- c(1:MAX)
vecQT2 <<- c(1:MAX)
for (i in 1:MAX)
{
  vecQT1[i] <<- qt(p=1-alpha[1], df=i)
  vecQT2[i] <<- qt(p=1-alpha[2], df=i)
}
vecCritCVs <<- rep(0, MAX)
for (Nps in 6:length(vecCritCVs))
  vecCritCVs[Nps] <<- FindCritCV(0.8, 0.95, Nps, 1, 1e-10)
  ## you can tweak this line with user settings for
  ## assumed GMR and target power as you please,
  ## using the values from the function call

3. Replace the code for SampleN.Stg2 with the code from my previous post:

» sampleN.Stg2.EM <- function(CV)
» {
»   # all other variable info is already accumulated in the critical values, so:
»   Nps=1
»   while (vecCritCVs[Nps]<=CV) Nps=Nps+1
»   return(Nps*2)
» }
(and make sure you call this function, possibly rename it as you deem appropriate)

4. Tweak and benchmark it. The relevant benchmark is the runtime for the entire Potvin call with 1,000,000 iterations at e.g. CV=0.3 and N1=36 total, or something that takes tangible computation time. You will also need to modify the Power.calc function to accommodate other acceptance ranges.

5. Eliminate all other qt calls by replacing them with lookups into the vecQT1 or vecQT2 vectors.

Note: I did not test it, I only looked at the code online, so not everything here can be assumed to be working out of the box.

Helmut
★★★

Vienna, Austria,
2017-08-22 13:33

@ ElMaestro
Posting: # 17730

## Nitpicker!

Hi ElMaestro,

» I imagine I was not able to explain what I meant??

No, you were. I’m currently busy with other stuff.

» I looked at the code available here.

Better to have a look at the current version of sampsiz2.R. There you find in lines 42–46:
  # degrees of freedom as expression
  # n-2 for 2x2 crossover and 2-group parallel design
  dfe <- parse(text="n-2", srcfile=NULL)
  # or should that read n-3? see Kieser/Rauch
  #dfe <- parse(text="n-3", srcfile=NULL)

You have a point. IMHO, this question should be answered: yes! n1+n2−3 degrees of freedom, mentioned already by Mdm. Potvin.
Forget my last code. Should be: n2 <- n + n %% 2 - n1
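In words: round the re-estimated total up to the next even number, then subtract the stage 1 subjects. A minimal sketch with hypothetical numbers (the wrapper name is mine):

```r
# n: re-estimated total sample size; n1: subjects already in stage 1.
# n %% 2 adds 1 when n is odd, i.e. rounds the total up to an even number.
n2.stg2 <- function(n, n1) n + n %% 2 - n1

n2.stg2(17, 12)  # total rounds up to 18 -> n2 = 6
n2.stg2(20, 12)  # total already even    -> n2 = 8
```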

n1 = 12; stage 2 sample sizes (n2), method = exact
   CV sampleN.TOST pwr.TOST sampleN2.TOST pwr.sampleN2.TOST
 0.10           NA  0.97308            NA           0.97308
 0.15            2  0.81765             2           0.82711
 0.20           12  0.83603            12           0.83875
 0.25           22  0.81272            22           0.81411
 0.30           36  0.81708            36           0.81775
 0.35           50  0.80576            50           0.80616
 0.40           68  0.81051            68           0.81075
 0.45           86  0.80654            86           0.80670
 0.50          106  0.80584           106           0.80595
 0.55          126  0.80158           126           0.80165
 0.60          148  0.80090           148           0.80096
 0.65          172  0.80292           172           0.80297
 0.70          196  0.80299           196           0.80303
 0.75          220  0.80198           220           0.80201
 0.80          244  0.80040           244           0.80042
Good news: equal sample sizes (so we can use sampleN.TOST() till we have code specific for the 2nd stage; for studies proceeding to the 2nd stage, power is wrong in the 3rd decimal or less). Now one of the riddles of Potvin’s paper is resolved. I could never figure out the reported power of the examples.
1. n1 12, s²1 0.020977 ⇒ N 14, reported power 83.1%.
CV <- mse2CV(0.020977)
print(round(sampleN.TOST(alpha=0.0294, CV=CV, method="shifted",
                         details=FALSE, print=FALSE)[7:8], 3),
      row.names=FALSE)
 Sample size Achieved power
          14          0.837
print(round(sampleN2.TOST(alpha=0.0294, CV=CV, n1=12,
                          method="shifted")[8:9], 3),
      row.names=FALSE)
 Sample size Achieved power
           2          0.831

2. n1 12, s²1 0.032634 ⇒ N 20, reported power 82.4%.
CV <- mse2CV(0.032634)
print(round(sampleN.TOST(alpha=0.0294, CV=CV, method="shifted",
                         details=FALSE, print=FALSE)[7:8], 3),
      row.names=FALSE)
 Sample size Achieved power
          20          0.826
print(round(sampleN2.TOST(alpha=0.0294, CV=CV, n1=12,
                          method="shifted")[8:9], 3),
      row.names=FALSE)
 Sample size Achieved power
           8          0.824

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2017-08-22 17:27
(1253 d 15:47 ago)

@ Helmut
Posting: # 17733
Views: 24,682

## Nitpicker!

Hi Hötzi,

» You have a point. IMHO, this question should be answered: yes!

And on the basis of this story we will define Hötzi's universal law of BE:
ElMaestro is always right¹.

By the way, since you are a speed devotee:
 FindCritCV.Sec=function(Pwr.T, GMR, Nps, Is.St2=1, d=.00001, iniCV)
 ## the secant method has really nasty properties if the ini value is a little off.
 {
   CV1=iniCV
   CV2=CV1*1.05
   y2=1
   while (abs(y2)>d)
   {
    y1=Power.calc(Nps, GMR, CV1, Is.St2)-Pwr.T
    y2=Power.calc(Nps, GMR, CV2, Is.St2)-Pwr.T
    CV3= CV2-y2*(CV2-CV1)/(y2-y1)
    CV1=CV2
    CV2=CV3
   }
   return(0.5*(CV1+CV2)) ## or play around
 }
 (...)
 for (Nps in 6:length(vecCritCVs))
 {
   if (Nps==6) iniCV=min(0.709*GMR-0.552, 0.709/GMR-0.552)
               else iniCV=vecCritCVs[Nps-1]
   vecCritCVs[Nps]<<-FindCritCV.Sec(0.8, GMR, Nps, 1, 1e-10, iniCV)
 }
 (...)

Footnote 1:
Except when he isn't. Which also happens. With regrettable frequency...

Pass or fail!
ElMaestro
Helmut
★★★

Vienna, Austria,
2017-08-22 17:49
(1253 d 15:26 ago)

@ ElMaestro
Posting: # 17734
Views: 24,596

## Nitpicker!

» And on the basis of this story we will define Hötzi's universal law of BE:
» ElMaestro is always right¹.

Footnote accepted as an excuse.

» By the way, since you are a speed devotee:
» […]

I am. But I don’t get the point of initializing a large vector of values and later picking out the suitable one. I used the bisection algo in my youth as well. I leave this stuff to C-nerds and work on adapting our R-code instead. A median execution time of 1 ms is fast enough for me.

I have stopped reading Stephen King novels.
Richard A. O'Keefe

ElMaestro
★★★

Denmark,
2017-08-22 17:59
(1253 d 15:16 ago)

@ Helmut
Posting: # 17735
Views: 24,580

## Nitpicker!

Hi again,

» (...) I don’t get the point of initializing a large vector of values and later picking out the suitable one.

The secant method is terrible if the initial guess is wrong. Terrible as in: it diverges to ±infinity and throws errors, so it isn't just a matter of a few extra iterations.
What the code above does: the n'th critical CV will probably not be far from the (n−1)'th, so it makes a good initial guess. Hence we only have to guess one CV reasonably (for the 6th), which gives us the 6th critical CV; from that point onwards we use the previously generated critical CV as the guess for the next one (the 6th value seeds the 7th, and so on). That's why these lines run so fast.
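Stripped of the power machinery, the secant update plus the warm-start trick amounts to the following (an illustrative Python sketch with a toy monotone function standing in for the power-vs-CV curve; the names and the toy function are mine, not from the thread's R code):

```python
def secant(f, x0, tol=1e-10, max_iter=50):
    """Secant iteration in the spirit of FindCritCV.Sec:
    the second starting point is taken 5 % above the first."""
    x1, x2 = x0, x0 * 1.05
    for _ in range(max_iter):
        y1, y2 = f(x1), f(x2)
        x1, x2 = x2, x2 - y2 * (x2 - x1) / (y2 - y1)
        if abs(f(x2)) < tol:
            break
    return x2

# toy "curve": the root of x**2 - c drifts slowly with c, so the
# previous root is a good warm-start guess for the next one
roots, guess = [], 1.0
for c in range(2, 7):
    guess = secant(lambda x, c=c: x * x - c, guess)
    roots.append(guess)
print([round(r, 6) for r in roots])  # sqrt(2) ... sqrt(6)
```

With a bad first guess the same iteration can shoot off to ±infinity, which is exactly why the warm start matters.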

You can also play around with regula falsi or whatever the heck Wikipedia calls it:

 FindCritCV.rf=function(Pwr.T, GMR, Nps, Is.St2=1, toler=0.000001)
 {
   CV1=0.0123
   CV2=8.7654
   Iter=0
   while (abs(CV1-CV2)>toler)
   {
    p1=Power.calc(Nps, GMR, CV1, Is.St2)-Pwr.T
    p2=Power.calc(Nps, GMR, CV2, Is.St2)-Pwr.T
    Iter=Iter+1
    CV3=CV2-p2*((CV2-CV1)/(p2-p1))
    tmp=Power.calc(Nps, GMR, CV3, Is.St2)-Pwr.T
    if (tmp<0) CV2=CV3
       else    CV1=CV3
    cat("CV1=", CV1, "CV2=", CV2, "dCV=", abs(CV2-CV1), "\n")
   }
   ##cat("Iterations RF:", Iter, "\n")
   return(0.5*(CV2+CV1))
 }

It is a little faster than bisection and does not require a good guess apart from the above. It is stable for power here since power does not misbehave within any CV interval of interest; there are no minima, maxima, or other features that could badly screw it up.
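The bracketing logic of the regula-falsi variant can be sketched without any power machinery (illustrative Python; a decreasing toy function stands in for power vs CV, and the start values mimic the 0.0123/8.7654 bracket above):

```python
def regula_falsi(f, lo, hi, tol=1e-9, max_iter=500):
    """False position: like bisection, but the trial point is where the
    chord through (lo, f(lo)) and (hi, f(hi)) crosses zero.
    Requires f(lo) and f(hi) to have opposite signs."""
    assert f(lo) * f(hi) < 0
    mid = lo
    for _ in range(max_iter):
        mid = (lo * f(hi) - hi * f(lo)) / (f(hi) - f(lo))
        if abs(f(mid)) < tol:
            break
        # keep the sub-interval that still brackets the root
        if f(mid) * f(lo) > 0:
            lo = mid
        else:
            hi = mid
    return mid

# decreasing function (power falls as CV rises): 0.8 - x**2,
# root at sqrt(0.8) ~ 0.8944
root = regula_falsi(lambda x: 0.8 - x * x, 0.0123, 8.7654)
print(round(root, 4))  # 0.8944
```

One caveat of the textbook method: for curved functions one endpoint can stagnate, so terminating on the bracket width (as the R code does with abs(CV1-CV2)) may never trigger; the sketch therefore terminates on the residual |f(mid)| instead.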

Helmut
★★★

Vienna, Austria,
2017-08-22 19:15
(1253 d 14:00 ago)

@ ElMaestro
Posting: # 17738
Views: 24,646

## Nitpicker!

Hi ElMaestro,

» […] That's how these lines seem to work very fast.
» It is a little faster …

My Speed King!

Fast cars, fast women, fast algorithms …
what more could a man want?
Joe Mattis

Would you mind benchmarking your wonder-code (not with proc.time(), since it includes the overhead)? If it's, say, >10 times faster than mine (<1 ms), I'll be interested.
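The point about proc.time() is general: a single wall-clock measurement includes the timer overhead, so a benchmark harness should repeat the call many times. The same idea in Python's standard library (illustrative only; the tool actually used in this thread is R's microbenchmark):

```python
import timeit

def lookup():
    """Stand-in for the function being benchmarked."""
    return 42

# timeit repeats the call so the per-call timer overhead is amortised,
# instead of one noisy wall-clock sample around a single call
n = 100_000
per_call = timeit.timeit(lookup, number=n) / n
print(per_call < 1.0)  # seconds per call; trivially true for this stub
```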

ElMaestro
★★★

Denmark,
2017-08-22 22:29
(1253 d 10:46 ago)

(edited by ElMaestro on 2017-08-23 00:31)
@ Helmut
Posting: # 17740
Views: 24,634

## Benchmark code

Hi Hötzi,

» Would you mind benchmarking your wonder-code (not with proc.time(), since it includes the overhead)? If it's, say, >10 times faster than mine (<1 ms), I'll be interested.

I do not have your code and I don't know how to locate it on GitHub or CRAN, so I can't readily benchmark yours.

Below is a good starting point.
The basis is as follows: in a more or less typical Potvin run, say n1 = 24 and CV = 0.5, we will need to enter stage 2 almost 100% of the time. So we will need to convert a CV to a sample size roughly 1,000,000 times (look up the percentage in the paper). This has to happen whether or not the code is vectorised.

b. The present way of doing that is to convert the CV (given the assumed GMR, target power, and the alpha levels) to a sample size 1,000,000 times: first via a Zhang step, and then finally via a "normal" up/down step.
c. The alternative is to invest a little time in creating smart arrays before we convert any CVs (given the assumed GMR, target power, and the alpha levels), and then afterwards do lookups into the sample size array.
Remember my rationale for this: in the digital world, if we want to dig 1,000,000 holes we buy one shovel and dig all the holes with it. We do not buy a million shovels and use a new one for every hole we wish to dig.
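The "one shovel" is a precomputed table of critical CVs plus a cheap lookup. A Python sketch of that lookup step, using binary search over the sorted thresholds instead of a linear scan (the threshold values and names here are invented for illustration, not real critical CVs):

```python
import bisect

# hypothetical critical CVs, increasing with the sample size:
# crit_cv[i] is the largest CV for which min_nps + i subjects per
# sequence still reach the target power
crit_cv = [0.10, 0.18, 0.25, 0.31, 0.36, 0.40]
min_nps = 6   # smallest sample size per sequence considered

def nps_for_cv(cv):
    """Scan-and-count lookup in the spirit of GetNps(), but via
    bisect_right: count thresholds <= cv in O(log n)."""
    return min_nps + bisect.bisect_right(crit_cv, cv)

print(nps_for_cv(0.05))  # 6  (below every threshold)
print(nps_for_cv(0.26))  # 9  (three thresholds are <= 0.26)
```

Building the table costs something once; after that every CV-to-sample-size conversion is a lookup, which is the whole point of the shovel analogy.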

I have written a function Bench.H1 for the alternative way.
Please load your code into Bench.H2 and do a head-on benchmark of Bench.H1 against Bench.H2. I could perhaps do it too if I only knew where to locate your code.
Footnote: The code is simplified; it looks up CV=0.5 every time. It could be made more natural by rchisq'ing the CV. Feel free.

If your code is completely vectorised then I imagine you would create a vector of one million values of 0.5 and let the hounds loose on that vector. Same difference for me.

##The benchmark
Power.calc=function(Nps, GMR, CV, Is.St2=0)
{
  s=sqrt(log(1+CV*CV))
  nom1=log(1.25/GMR)
  nom2=-log(1.25*GMR)
  den=s*sqrt(2/(2*Nps))
  df=Nps*2-2
  if (Is.St2==T) df=df-1
  ##cat("1-a=", 1-alpha, "df=", df, "\n")
  if (Is.St2==0) q=vecQT1[df]
    else         q=vecQT2[df]
  pw=pt( (nom1/den) -q, df) - pt( (nom2/den) +q, df)
  ##cat("Nom1=", nom1, " Nom2=", nom2, " Den=", den, " q=", q, "\n")
  ##if (is.na(pw)) pw=0.01
  ##if (pw<=0) pw=0.01
  return(pw)
}

FindCritCV.Sec=function(Pwr.T, GMR, Nps, Is.St2=1, d=.00001, iniCV)
{
  CV1=iniCV
  CV2=CV1*1.05
  y2=1
  while (abs(y2)>d)
  {
   y1=Power.calc(Nps, GMR, CV1, Is.St2)-Pwr.T
   y2=Power.calc(Nps, GMR, CV2, Is.St2)-Pwr.T
   CV3= CV2-y2*(CV2-CV1)/(y2-y1)
   CV1=CV2
   CV2=CV3
  }
  return(0.5*(CV1+CV2))
}

GetNps=function(CV)
{
  Nps=1
  while (vecCritCVs[Nps]<=CV) Nps=Nps+1
  return(Nps)
}

Bench.H1=function()
{
 GMR=0.95
 alpha1=0.0294
 alpha2=0.0294
 vecQT1<<-c(1:2000)
 vecQT2<<-c(1:2000)
 for (i in 1:2000)
 {
  vecQT1[i]<<-qt(p=1-alpha1, df=i)
  vecQT2[i]<<-qt(p=1-alpha2, df=i)
 }
 vecCritCVs<<-rep(0,500)
 for (Nps in 6:length(vecCritCVs))
 {
  if (Nps==6) iniCV=min(0.709*GMR-0.552, 0.709/GMR-0.552)
              else iniCV=vecCritCVs[Nps-1]
  vecCritCVs[Nps]<<-FindCritCV.Sec(0.8, GMR, Nps, 1, 1e-9, iniCV)
 }
 for (i in 1:1000000)
 {
  SampleSize=GetNps(0.5)
 }
 return("foo")
}
Bench.H1()

Bench.H2=function()
{
 for (i in 1:1000000)
 {
  SampleSize= ##Hötzi insert your call to the correct function here
 }
 return("bar")
}
##now go and benchmark Bench.H1 against Bench.H2

Late edit: I believe we can even do something like:
Nps= 2* length(vecCritCVs[vecCritCVs<=CV])
- which is faster still and does not explicitly involve a loop.

Helmut
★★★

Vienna, Austria,
2017-08-23 01:48
(1253 d 07:27 ago)

@ ElMaestro
Posting: # 17741
Views: 24,613

## Benchmark code

Hi ElMaestro,

I changed this part (the number of repetitions should be left to microbenchmark).
 vecCritCVs<<-rep(0,500)
 for (Nps in 6:length(vecCritCVs))
 {
  if (Nps==6) iniCV=min(0.709*GMR-0.552, 0.709/GMR-0.552)
              else iniCV=vecCritCVs[Nps-1]
  vecCritCVs[Nps]<<-FindCritCV.Sec(0.8, GMR, Nps, 1, 1e-9, iniCV)
 }
 n2=GetNps(0.5)
 return(n2)
}
Bench.H1()

I ran just 10,000 repetitions because yours was so slow. Improved over previous versions, but still 260 times slower than mine. BTW, Zhang’s approximation was spot on: no iterations performed at all (to be expected with a high sample size).

Unit: microseconds
            expr   min    lq  mean  median    uq    max neval
      Bench.H1() 27847 28381 29364 28943.8 29707 139057 10000
 sampleN2.TOST()    65    75   111   111.1   143   1614 10000

» Late edit: I believe we can even do something like:
» Nps= 2* length(vecCritCVs[vecCritCVs<=CV])
» - which is faster still and does not explicitly involve a loop.

Given the results from above I didn’t try that.
