ElMaestro ★★★ Belgium?, 20170819 15:04 Posting: # 17709 Views: 24,316 

Hi all, just wanted to share a little algo for the initial sample size guess for the Potvin methods. This is an area where speed gains are possible. A good initial sample size guess can eliminate quite some computation time. As I never got Zhang's method I played a little around with an entirely empirical idea. My idea is that at least the upper part of a curve of power as function of sample size can be modeled as a kind of sigmoid function. I googled around to find a few expressions for such curves. The logistic function is one such. We can write the approximation something like: Power=1/(1+exp(k(Nx0))) where N is the sample size. If we have two estimates of power p1, p2 at sample sizes nps1 and nps2 (I use number of subjects per sequence, abbreviated as nps), then we can determine the constants:
a2=log(1/p2  1) We can then solve for the desired power Pwr.T: nps= x0+ (log(1/Pwr.T 1) / (k))  and this should of course converted to integer. It turns out in my application to work really, really well. The "only" issue is if the initial nps1 and nps2 are chosen poorly; p1 needs to be "not too close" to zero and p2 needs to be "not too close" to 1. But that's it. My current implementation is this: GetStartNps.X3=function(GMR, CV, Pwr.T) Works well for me for assumed GMR's close to 1 (such as 0.95) and I am sure if anyone cares to fiddle with it it can be improved much, much further to work well in "all" scenarios. Play around with the constants in red to get the optimization that works for you. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170819 16:06 @ ElMaestro Posting: # 17710 Views: 23,544 

Hi ElMaestro, I tried GetStartNps.X3(GMR=0.95, CV=0.2, Pwr.T=0.8) and was honored withError in Power.calc(nps1, GMR, CV, Is.St2 = 1) : Can you provide the function Power.calc() , please?My standard code gives:
How does it compare to yours? — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170819 16:14 (edited by ElMaestro on 20170819 17:14) @ Helmut Posting: # 17711 Views: 23,543 

Hi Hötzi, » Error in Power.calc(nps1, GMR, CV, Is.St2 = 1) :
» could not find function "Power.calc" » Did I specify the arguments wrongly? That's why I included the remark: ##equation in Potvin et al. I thought you'd be plugging in your own, but be that as it may. An implementation of the function is here: alpha1=0.0294 Edit: Forgot to write: For classical B the alphas are (0.0294, 0.0294) and for C they are (0.05, 0.0294) though that isn't related to the topic. Second, Hötzi, you gave a little table which I am not very sure how to read. Note this isn't about the sample size calculation itself, this is about the initial guess used for the sample size calculation by up/down iteration. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170819 17:12 @ ElMaestro Posting: # 17712 Views: 23,530 

Hi ElMaestro, » That's why I included the remark: » ##equation in Potvin et al. » I thought you'd be plugging in your own, but be that as it may. I see. So with your code I end up with 10/seq or 20 in stage 1. That’s actually the sample size of the fixed design. Try this ( ASN is the expected average total sample size, pwr is the probability to show BE in stage 1 or 2, stop the chance to end if stage 1, and pct.stg2 the percentage of studies proceeding to stage 2):
— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170819 17:17 @ Helmut Posting: # 17713 Views: 23,495 

Hi Hötzi, are you sure we are on the same page? I am talking about the initial guess for the sample size iteration, not the sample size itself. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170819 17:33 @ ElMaestro Posting: # 17714 Views: 23,525 

Hi ElMaestro, » are you sure we are on the same page? No. Completely misunderstood you. » I am talking about the initial guess for the sample size iteration, not the sample size itself. Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes sampleN.TOST() has to iterate upwards (since Zhang’s method is based on a large sample approximation and power with anything of the tfamily will be lower). In very rare cases sampleN.TOST() has to iterate downwards. Can your code do that as well?— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170819 17:56 @ Helmut Posting: # 17715 Views: 23,573 

Hi again, » Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes sampleN.TOST() has to iterate upwards (since Zhang’s method is based on a large sample approximation and power with anything of the tfamily will be lower). In very rare cases sampleN.TOST() has to iterate downwards. Can your code do that as well?Yes. Basically, get the initial initial guess, determine if it is low or high for the desired power, iterate up or down depending on that. The reason I am doing this is I got a mental "Does not compute error" when I read Zhang's paper. The goal is to get to the right sample size in as few cycles as possible. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 14:40 @ ElMaestro Posting: # 17719 Views: 23,386 

Hi ElMaestro, » vecQT1=c(1:5000) » vecQT2=c(1:5000) » for (i in 1:5000) » { » vecQT1[i]=qt(p=1alpha1, df=i) » vecQT2[i]=qt(p=1alpha2, df=i) » } Later down you makes calls to vecQT1[df] and vecQT2[df] . That’s awfully sl^{o}o_{o}w. Why not simply call qt(p=foo, df=bar) directly?
— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 15:22 @ Helmut Posting: # 17721 Views: 23,329 

Hi Hötzi, » Later down you makes calls to vecQT1[df] and vecQT2[df] . That’s awfully sl^{o}o_{o}w. Why not simply call qt(p=foo, df=bar) directly?That kind of benchmarking is misleading. Here you are doing the vectorise operation of array generation numerous times which will never happens in practice; you only need to initialise the arrays once within every run of 1000000 iterations, possible even keep the arrays/vectors across runs. I agree that whole "once" operation is faster when you use vecQT1 < qt(p=1alpha1, df=1:5000) (I had no idea you could do that, actually) , but if you benchmark the entire you you will see the difference may be a few microsecs out of numerous seconds or minutes. Saving 0.0001% time? That's not optimization, it beautification or achieving the same with fewer lines of code which isn't what I am after. I am not very experienced with vectorised functions at all  I understand code better if loops are visible. That's just me, of course. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 16:23 @ ElMaestro Posting: # 17725 Views: 23,332 

Hi ElMaestro, I don’t understand why you use vectors at all. What ’bout:
— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 17:22 @ Helmut Posting: # 17726 Views: 23,343 

Hi Hötzi, » I don’t understand why you use vectors at all. What ’bout: »
Simple, but I think I aready told you: qt is a computationally very expensive function. If we need "many" qt values then it makes much more sense in terms of computation to cache the welldefined ones. Within the 1000000 iterations values from qt are needed many times, but each of them have defined ranges of DF's (up to the max number of subjects 3 [or 2]) and alphas (two levels). Thus we make a cache of qt values prior to entering the iteration loop then we can just take a sample from the cache in stead of reinventing the wheel for every call. Do the population of the arrays once, and rely on them during the iterations or whatever other functions that need them. The speedup can be massive both in R and in C. — I could be wrong, but... Best regards, ElMaestro 
ElMaestro ★★★ Belgium?, 20170819 20:04 @ ElMaestro Posting: # 17716 Views: 23,468 

Further: When we initiate a Potvin run the assumed GMR = 0.95 or whatever is known at the time the iterations are started, they do not change as we go. Same for Target power. Note this: CV=c(100:1000)/1000 Aha, within some reason the necessary number of subjects per sequence (given the GMR and the target power) varies in a simply manner with CV, perhaps this is well modeled with a polynomial? M=lm(N~poly(CV, 4, raw=TRUE)) Oh how wonderful!! The polynomial provides a fantastic approximation within our entire interval of interest. Let us create an approximating the function: GuessNps=function(CV, GMR, T.Pwr) And this one is really really fast! The initial guess may be as accurate as Zhang (haven't checked because I don't speak Zhango), but it does not rely on pnorm, qnorm, pt or qt and there is no division, sqrt, exp or ln. Therefore this is blazing fast. All it takes is a split second of numerical gymnastics before a Potvin run (and you do not need c(100:1000)/1000  we can even do c(10:100)/100 and get the same quality of estimate, cheaper. Actually we might even completely abolish sample size iteration altogether because: for (i in 1:length(N)) You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and 1's then just increase the polynomial degree above. The implementation of Potvin et al. is hereby sped up by a factor 10 gazillion — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 02:20 @ ElMaestro Posting: # 17717 Views: 23,455 

Capt’n, some remarks. More maybe tomorrow. » The polynomial provides a fantastic approximation within our entire interval of interest. Played around a little. Based on the AIC the 4^{th} degree is the winner indeed. » ##note: We might not want to write blah^3 etc if we optimize for speed, not sure. Old wisdom. Here with my coefficients (total sample sizes [not N/seq], exact method for GRM 0.95, 80% power, CV 0.1–1.0).
» ## perhaps ceil would be better? I would round up to the next even (as above). » You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and 1's then just increase the polynomial degree above. That doesn’t help. With CV < seq(0.1, 1, 0.01) I got a match in 46/91 and +2 in 45/91. OK, conservative.— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 14:35 (edited by ElMaestro on 20170820 14:48) @ ElMaestro Posting: # 17718 Views: 23,399 

Hi all, what am I thinking??? For any given GMR(assumed) and any given target power we can calculate the sample size necessary to achieve the power on basis of an observed CV. We can look at that in reverse: For any given sample size there exists a level of CV which keeps the power at the desired level. We can call that a criticial CV. If the CV is above that critical CV for any given sample size then the power is lower than the target. So let us get the critical CV's. Here done with D&C, I am sure this can be improved a lil bit. This time I am also giving a power.calc function And remember I am expressing it as Nps, not total number of subjects.
alpha1=0.05 I hereby officially declare all sample size iterations in Potvin runs unnecessary, apart from the threeliner above Play around with tolerances and lenghts and include your own error trapping. A good Sunday to all of you. Goodbye Zhang. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 15:11 @ ElMaestro Posting: # 17720 Views: 23,389 

Hi ElMaestro, » GetNps(0.165) ##should be 9, right?? Rather 8.
— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 15:28 @ Helmut Posting: # 17722 Views: 23,331 

Hi Hötzi, » Rather 8. When aiming for stage 2 (see the explicit flag in code above) we need to remember to subtract that one DF. Therefore, I think the answer to the question I asked is 9 The answer you give is related to a question about power at stage 1. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 16:06 @ ElMaestro Posting: # 17723 Views: 23,365 

Hi ElMaestro, » Therefore, I think the answer to the question I asked is 9 » » The answer you give is related to a question about power at stage 1. I see. Have to tweak that. However, PowerTOST is ~500 times faster than your magick. — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 16:15 @ Helmut Posting: # 17724 Views: 23,324 

Hi Hötzi, » I see. Have to tweak that. However, PowerTOST is ~500 times faster than your magick. You mean if you do the array ini as described here once before each run, and then use GetNps (or equivalent) then that will be 500x slower than using Zhang + iteration?? That would surprise me a lot. Please elaborate. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170820 18:58 @ ElMaestro Posting: # 17727 Views: 23,365 

Hi ElMaestro, » You mean if you do the array ini as described here once before each run, and then use GetNps (or equivalent) then that will be 500x slower than using Zhang + iteration?? » That would surprise me a lot. Please elaborate. Had to dive deep into the guts of PowerTOST ’s internal functions. Hope I got it right. Detlew?library(PowerTOST) Let’s see: CV < seq(0.1, 0.8, 0.05) So far, so good. Speed? # wrappers for nice output Check first: EM.f() Looking good. Now: res < microbenchmark(EM.f(), PT.f(), times=50L, — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170820 19:32 @ Helmut Posting: # 17728 Views: 23,321 

Hi Hötzi, we are not on the same page. Again I am not sure I get it all, but I think with this code you are calling the same shit over and over again and that is completely unnecessary. In the beginning of everything, before you initiate the simulation, literally before you make use of the nsims object, you read out the assumed GMR and target power, and on basis of that you can define an array of critical CV's which you initialise with the correct values. Once and for all. Not for every call where you need a sample size.Note that if you initialise the array of critical values with syntax such as vecCritCVs<< rep(0, 400) or however many values you need then vecCV will be public (I think), due to "<<" .You can then access the array of critical CV's from within the SampleN.Stg2 function.We might end up with something like: sampleN.Stg2.EM < function(CV) Can you see what I mean now? — I could be wrong, but... Best regards, ElMaestro 
ElMaestro ★★★ Belgium?, 20170821 18:13 (edited by ElMaestro on 20170821 18:37) @ ElMaestro Posting: # 17729 Views: 23,180 

Hi Hötzi, I imagine I was not able to explain what I meant?? I looked at the code available here. Do this: 1. In the R code for the Potvin function, include this code:
Power.calc=function(Nps, GMR, CV, Is.St2=0) 2. immediately before the line that says "if (setseed) set.seed(1234567)", include this code:
MAX=1000 ##Hötzi, specify the number of values you need, 3. Replace the code for SampleN.Stg2 with the code from my previous post: » sampleN.Stg2.EM < function(CV) » { (and make sure you call this function, possibly rename it as you deem appropriate) 4. Tweak and benchmark it. The relevant benchmark is the runtime for the entire Potvin call with 1.000.000 iterations at e.g. CV=0.3 and N1=36 total or something which takes tangible computation time. You will also need to modify the Power.calc function to accomodate other acceptance ranges. 5. Eliminate all other qt calls by replacing them with lookups into the vecQT1 or vecQT2 vectors. Note: I did not test it, I only looked at the code online, so not everything here can be assumed to be working out of the box. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170822 13:33 @ ElMaestro Posting: # 17730 Views: 22,952 

Hi ElMaestro, » I imagine I was not able to explain what I meant?? No, you were. I’m currently busy with other stuff. » I looked at the code available here. Better to have a look at the current version of sampsiz2.R. There you find in lines 42–46: # degrees of freedom as expression You have a point. IMHO, this question should be answered: yes! n_{1}+n_{2}−3 degrees of freedom mentioned already by Mdm. Povin. Forget my last code. Should be: n2 < n + n %% 2  n1 n1 = 12; stage 2 sample sizes (n2), method = exact Good news: Equal sample sizes (so we can use sampleN.TOST() till we have code specific for the 2^{nd} stage; for studies proceeding to the 2^{nd} stage power wrong in the 3^{rd} decimal or less). Now one of the riddles of Potvin’s paper is resolved. Could never figure out the reported power of the examples.
— Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170822 17:27 @ Helmut Posting: # 17733 Views: 22,966 

Hi Hötzi, » You have a point. IMHO, this question should be answered: yes! And on basis of this story we will define Hötzi's universal law of BE: ElMaestro is always right^{1}. By the way, since you are a speed devotee:
FindCritCV.Sec=function(Pwr.T, GMR, Nps, Is.St2=1, d=.00001, iniCV) Footnote 1: Except when he isn't. Which also happens. With regrettable frequency... — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170822 17:49 @ ElMaestro Posting: # 17734 Views: 22,896 

» And on basis of this story we will define Hötzi's universal law of BE: » ElMaestro is always right^{1}. Footnote accepted as an excuse. » By the way, since you are a speed devotee: » […] I am. But I don’t get the point of initializing a large vector of values and later picking out the suitable one. I used the bisection algo in my youth as well. I leave this stuff to Cnerds and work on adapting our Rcode instead. A median execution time of 1 ms is fast enough for me. I have stopped reading Stephen King novels. Now I just read C code instead. Richard A. O'Keefe — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170822 17:59 @ Helmut Posting: # 17735 Views: 22,873 

Hi again, » (...) I don’t get the point of initializing a large vector of values and later picking out the suitable one. The secant method is terrible if the initial guess is wrong. Terrible as in, converges to plus/minus infinity and will give errors, so it isn't a matter of many iterations. What the code above does: The n'th CV is probably going to be not so far from the (n1)'th CV, and therefore it is a good guess. Therefore, if we only guess one CV reasonably (the 6th) then we get the corresponding 6'th critical CV, and from that point and onwards we can use the previously generated critical CV as the guess, i.e. when we want the 7'th value we use the 6th value as a guess, and so on. That's how these lines seem to work very fast. You can also play around with regula falsi or whatever the heck Wikipedia calls it:
FindCritCV.rf=function(Pwr.T, GMR, Nps, Is.St2=1, toler=0.000001) It is a little faster than bisection and does not require a good guess apart from the above. It is stable for power here since the power does not misbehave within any CVinterval of interest; there are no minima or maxima or things that can screw it badly up — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170822 19:15 @ ElMaestro Posting: # 17738 Views: 22,920 

Hi ElMaestro, » […] That's how these lines seem to work very fast. » It is a little faster … My Speed King! Fast cars, fast women, fast algorithms … what more could a man want? Joe Mattis Would you mind benchmarking your wondercode (not with proc.time() since it keeps the overhead). If it’s, say >10times faster than mine (<1 ms), I’ll be interested. — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20170822 22:29 (edited by ElMaestro on 20170823 00:31) @ Helmut Posting: # 17740 Views: 22,924 

Hi Hötzi, » Would you mind benchmarking your wondercode (not with proc.time() since it keeps the overhead). If it’s, say >10times faster than mine (<1 ms), I’ll be interested. I do not have your code, I don't know how to locate it in Github or CRAN, so I can't readily benchmark yours. Below is a good starting point. The basis is as follows. In a more or less typical Potvin run, say n1=24 and CV=0.5 we will need to enter stage 2 almost 100% of the time. So we will need to convert a CV to a sample size roughly 1.000.000 times (look up the percentage in the paper). It will need to happen whether or not the code is vectorised. b. The present way of doing that is to convert the CV, given the assumed GMR, target power, and the alpha levels, to a sample size 1.000.000 times first via Zhang step, and then finally via a "normal" up/down step. c. The alternative is to invest a little time in creating smart arrays before we convert any CV's (given the assumed GMR, target power, and the alpha levels), and then afterwards do lookups into the sameple size array. Remember that my background for this is: In the digital world, if we want to dig 1.000.000 holes we will buy a shovel one time and dig all the holes with it. We will not buy a million shovels and use a new one for every hole we wish to dig. I have written a function Bench.H1 for the alternative way. Please load your code into Bench.H2 and do a headon benchmark with Bench.H1 against bench.H2. I could perhaps do it too if I only knew where to locate your code. Footnote: The code is simplified, it looks up CV=0.5 every time. It could be made more natural by rchisq'ing the CV. Feel free. If your code is completely vectorised then I imagine you would create a vector of one million values of 0.5 and let the hounds loose on that vector. Same difference for me. ##The benchmark Power.calc=function(Nps, GMR, CV, Is.St2=0) Late edit: I believe we can even do something like: Nps= 2* length(vecCritCVs[vecCritCVs<=CV])  which is faster still and does not explicitly involve a loop. — I could be wrong, but... Best regards, ElMaestro 
Helmut ★★★ Vienna, Austria, 20170823 01:48 @ ElMaestro Posting: # 17741 Views: 22,908 

Hi ElMaestro, I changed this part (the number of repetitions should be left to microbench ). vecCritCVs<<rep(0,500) I ran just 10,000 cause yours was so slow. Improved over previous versions but still 260times slower than mine. BTW, Zhang’s approximation was spot on. No iterations performed at all (to be expected with a high sample size). Unit: microseconds » Late edit: I believe we can even do something like: » Nps= 2* length(vecCritCVs[vecCritCVs<=CV]) »  which is faster still and does not explicitly involve a loop. Given the results from above I didn’t try that. — Cheers, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 