ElMaestro ★★★ Denmark, 2017-08-19 17:04 (2772 d 04:05 ago) Posting: # 17709 Views: 30,902 |
|
Hi all, just wanted to share a little algo for the initial sample size guess for the Potvin methods. This is an area where speed gains are possible. A good initial sample size guess can eliminate quite some computation time. As I never got Zhang's method I played a little around with an entirely empirical idea. My idea is that at least the upper part of a curve of power as function of sample size can be modeled as a kind of sigmoid function. I googled around to find a few expressions for such curves. The logistic function is one such. We can write the approximation something like: Power=1/(1+exp(-k(N-x0))) where N is the sample size. If we have two estimates of power p1, p2 at sample sizes nps1 and nps2 (I use number of subjects per sequence, abbreviated as nps), then we can determine the constants:
a2=log(1/p2 - 1) We can then solve for the desired power Pwr.T: nps= x0+ (log(1/Pwr.T -1) / (-k)) - and this should of course converted to integer. It turns out in my application to work really, really well. The "only" issue is if the initial nps1 and nps2 are chosen poorly; p1 needs to be "not too close" to zero and p2 needs to be "not too close" to 1. But that's it. My current implementation is this: GetStartNps.X3=function(GMR, CV, Pwr.T) Works well for me for assumed GMR's close to 1 (such as 0.95) and I am sure if anyone cares to fiddle with it it can be improved much, much further to work well in "all" scenarios. Play around with the constants in red to get the optimization that works for you. ![]() — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-19 18:06 (2772 d 03:03 ago) @ ElMaestro Posting: # 17710 Views: 29,495 |
|
Hi ElMaestro, I tried GetStartNps.X3(GMR=0.95, CV=0.2, Pwr.T=0.8) and was honored withError in Power.calc(nps1, GMR, CV, Is.St2 = 1) : Can you provide the function Power.calc() , please?My standard code gives:
How does it compare to yours? — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-19 18:14 (2772 d 02:55 ago) @ Helmut Posting: # 17711 Views: 29,396 |
|
Hi Hötzi, ❝ ❝ Did I specify the arguments wrongly? That's why I included the remark: ##equation in Potvin et al. I thought you'd be plugging in your own, but be that as it may. An implementation of the function is here: alpha1=0.0294 Edit: Forgot to write: For classical B the alphas are (0.0294, 0.0294) and for C they are (0.05, 0.0294) though that isn't related to the topic. Second, Hötzi, you gave a little table which I am not very sure how to read. Note this isn't about the sample size calculation itself, this is about the initial guess used for the sample size calculation by up/down iteration. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-19 19:12 (2772 d 01:57 ago) @ ElMaestro Posting: # 17712 Views: 29,571 |
|
Hi ElMaestro, ❝ That's why I included the remark: ❝ ❝ I thought you'd be plugging in your own, but be that as it may. I see. So with your code I end up with 10/seq or 20 in stage 1. That’s actually the sample size of the fixed design. Try this ( ASN is the expected average total sample size, pwr is the probability to show BE in stage 1 or 2, stop the chance to end if stage 1, and pct.stg2 the percentage of studies proceeding to stage 2):
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-19 19:17 (2772 d 01:53 ago) @ Helmut Posting: # 17713 Views: 29,434 |
|
Hi Hötzi, are you sure we are on the same page? I am talking about the initial guess for the sample size iteration, not the sample size itself. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-19 19:33 (2772 d 01:36 ago) @ ElMaestro Posting: # 17714 Views: 29,483 |
|
Hi ElMaestro, ❝ are you sure we are on the same page? ![]() ![]() ❝ I am talking about the initial guess for the sample size iteration, not the sample size itself. Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes sampleN.TOST() has to iterate upwards (since Zhang’s method is based on a large sample approximation and power with anything of the t-family will be lower). In very rare cases sampleN.TOST() has to iterate downwards. Can your code do that as well?— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-19 19:56 (2772 d 01:13 ago) @ Helmut Posting: # 17715 Views: 29,534 |
|
Hi again, ❝ Now I get it – though your code is beyond me. Zhang’s method in many cases hits the right point in the first attempt. Sometimes Yes. Basically, get the initial initial guess, determine if it is low or high for the desired power, iterate up or down depending on that. The reason I am doing this is I got a mental "Does not compute error" when I read Zhang's paper. The goal is to get to the right sample size in as few cycles as possible. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 16:40 (2771 d 04:29 ago) @ ElMaestro Posting: # 17719 Views: 29,227 |
|
Hi ElMaestro, ❝ ❝ ❝ ❝ ❝ ❝ ❝ Later down you makes calls to vecQT1[df] and vecQT2[df] . That’s awfully slooow. Why not simply call qt(p=foo, df=bar) directly?
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 17:22 (2771 d 03:48 ago) @ Helmut Posting: # 17721 Views: 29,390 |
|
Hi Hötzi, ❝ Later down you makes calls to That kind of benchmarking is misleading. Here you are doing the vectorise operation of array generation numerous times which will never happens in practice; you only need to initialise the arrays once within every run of 1000000 iterations, possible even keep the arrays/vectors across runs. I agree that whole "once" operation is faster when you use vecQT1 <- qt(p=1-alpha1, df=1:5000) (I had no idea you could do that, actually) , but if you benchmark the entire you you will see the difference may be a few microsecs out of numerous seconds or minutes. Saving 0.0001% time? That's not optimization, it beautification or achieving the same with fewer lines of code which isn't what I am after. I am not very experienced with vectorised functions at all - I understand code better if loops are visible. ![]() — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 18:23 (2771 d 02:46 ago) @ ElMaestro Posting: # 17725 Views: 29,182 |
|
Hi ElMaestro, I don’t understand why you use vectors at all. What ’bout:
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 19:22 (2771 d 01:47 ago) @ Helmut Posting: # 17726 Views: 29,318 |
|
Hi Hötzi, ❝ I don’t understand why you use vectors at all. What ’bout: ❝
❝ if (!Is.St2) { ❝ } else { ❝ q <- qt(p=1-alpha2, df=df) ❝ } Simple, but I think I aready told you: qt is a computationally very expensive function. If we need "many" qt values then it makes much more sense in terms of computation to cache the well-defined ones. Within the 1000000 iterations values from qt are needed many times, but each of them have defined ranges of DF's (up to the max number of subjects -3 [or 2 ![]() — Pass or fail! ElMaestro |
ElMaestro ★★★ Denmark, 2017-08-19 22:04 (2771 d 23:06 ago) @ ElMaestro Posting: # 17716 Views: 29,602 |
|
Further: When we initiate a Potvin run the assumed GMR = 0.95 or whatever is known at the time the iterations are started, they do not change as we go. Same for Target power. Note this: CV=c(100:1000)/1000 Aha, within some reason the necessary number of subjects per sequence (given the GMR and the target power) varies in a simply manner with CV, perhaps this is well modeled with a polynomial? M=lm(N~poly(CV, 4, raw=TRUE)) Oh how wonderful!! The polynomial provides a fantastic approximation within our entire interval of interest. Let us create an approximating the function: GuessNps=function(CV, GMR, T.Pwr) And this one is really really fast! The initial guess may be as accurate as Zhang (haven't checked because I don't speak Zhango), but it does not rely on pnorm, qnorm, pt or qt and there is no division, sqrt, exp or ln. Therefore this is blazing fast. All it takes is a split second of numerical gymnastics before a Potvin run (and you do not need c(100:1000)/1000 - we can even do c(10:100)/100 and get the same quality of estimate, cheaper. Actually we might even completely abolish sample size iteration altogether because: for (i in 1:length(N)) You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and -1's then just increase the polynomial degree above. The implementation of Potvin et al. is hereby sped up by a factor 10 gazillion ![]() ![]() ![]() ![]() ![]() — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 04:20 (2771 d 16:49 ago) @ ElMaestro Posting: # 17717 Views: 29,338 |
|
Capt’n, some remarks. More maybe tomorrow. ❝ The polynomial provides a fantastic approximation within our entire interval of interest. Played around a little. Based on the AIC the 4th degree is the winner indeed. ❝ Old wisdom. Here with my coefficients (total sample sizes [not N/seq], exact method for GRM 0.95, 80% power, CV 0.1–1.0).
❝ I would round up to the next even (as above). ❝ You see, the sample size estimates are sort of almost perfect already. If you want to remove the very few 1's and -1's then just increase the polynomial degree above. That doesn’t help. With CV <- seq(0.1, 1, 0.01) I got a match in 46/91 and +2 in 45/91. OK, conservative.— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 16:35 (2771 d 04:34 ago) @ ElMaestro Posting: # 17718 Views: 29,302 |
|
Hi all, what am I thinking??? For any given GMR(assumed) and any given target power we can calculate the sample size necessary to achieve the power on basis of an observed CV. We can look at that in reverse: For any given sample size there exists a level of CV which keeps the power at the desired level. We can call that a criticial CV. If the CV is above that critical CV for any given sample size then the power is lower than the target. So let us get the critical CV's. Here done with D&C, I am sure this can be improved a lil bit. This time I am also giving a power.calc function ![]() ![]() ![]() ![]()
alpha1=0.05 I hereby officially declare all sample size iterations in Potvin runs unnecessary, apart from the three-liner above ![]() ![]() ![]() A good Sunday to all of you. Goodbye Zhang. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 17:11 (2771 d 03:58 ago) @ ElMaestro Posting: # 17720 Views: 29,410 |
|
Hi ElMaestro, ❝ GetNps(0.165) ##should be 9, right?? Rather 8.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 17:28 (2771 d 03:42 ago) @ Helmut Posting: # 17722 Views: 29,130 |
|
Hi Hötzi, ❝ Rather 8. When aiming for stage 2 (see the explicit flag in code above) we need to remember to subtract that one DF. Therefore, I think the answer to the question I asked is 9 ![]() The answer you give is related to a question about power at stage 1. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 18:06 (2771 d 03:03 ago) @ ElMaestro Posting: # 17723 Views: 29,273 |
|
Hi ElMaestro, ❝ Therefore, I think the answer to the question I asked is 9 ❝ ❝ The answer you give is related to a question about power at stage 1. I see. Have to tweak that. However, PowerTOST is ~500 times faster than your magick. ![]() — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 18:15 (2771 d 02:55 ago) @ Helmut Posting: # 17724 Views: 29,160 |
|
Hi Hötzi, ❝ I see. Have to tweak that. However, You mean if you do the array ini as described here once before each run, and then use GetNps (or equivalent) then that will be 500x slower than using Zhang + iteration?? That would surprise me a lot. Please elaborate. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-20 20:58 (2771 d 00:11 ago) @ ElMaestro Posting: # 17727 Views: 29,279 |
|
Hi ElMaestro, ❝ You mean if you do the array ini as described here once before each run, and then use GetNps (or equivalent) then that will be 500x slower than using Zhang + iteration?? ❝ That would surprise me a lot. Please elaborate. Had to dive deep into the guts of PowerTOST ’s internal functions. Hope I got it right. Detlew?library(PowerTOST) Let’s see: CV <- seq(0.1, 0.8, 0.05) So far, so good. Speed? # wrappers for nice output Check first: EM.f() Looking good. Now: res <- microbenchmark(EM.f(), PT.f(), times=50L, — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-20 21:32 (2770 d 23:37 ago) @ Helmut Posting: # 17728 Views: 29,318 |
|
Hi Hötzi, we are not on the same page. Again ![]() ![]() I am not sure I get it all, but I think with this code you are calling the same shit over and over again and that is completely unnecessary. In the beginning of everything, before you initiate the simulation, literally before you make use of the nsims object, you read out the assumed GMR and target power, and on basis of that you can define an array of critical CV's which you initialise with the correct values. Once and for all. Not for every call where you need a sample size.Note that if you initialise the array of critical values with syntax such as vecCritCVs<<- rep(0, 400) or however many values you need then vecCV will be public (I think), due to "<<-" .You can then access the array of critical CV's from within the SampleN.Stg2 function.We might end up with something like: sampleN.Stg2.EM <- function(CV) Can you see what I mean now? ![]() ![]() — Pass or fail! ElMaestro |
ElMaestro ★★★ Denmark, 2017-08-21 20:13 (2770 d 00:56 ago) @ ElMaestro Posting: # 17729 Views: 29,129 |
|
Hi Hötzi, I imagine I was not able to explain what I meant?? I looked at the code available here. Do this: 1. In the R code for the Potvin function, include this code:
Power.calc=function(Nps, GMR, CV, Is.St2=0) 2. immediately before the line that says "if (setseed) set.seed(1234567)", include this code:
MAX=1000 ##Hötzi, specify the number of values you need, 3. Replace the code for SampleN.Stg2 with the code from my previous post: ❝ ❝ { ❝ Nps=1 ❝ while (vecCritCVs[Nps]<=CV) Nps=Nps+1 ❝ return(Nps*2) ❝ } 4. Tweak and benchmark it. The relevant benchmark is the runtime for the entire Potvin call with 1.000.000 iterations at e.g. CV=0.3 and N1=36 total or something which takes tangible computation time. You will also need to modify the Power.calc function to accomodate other acceptance ranges. 5. Eliminate all other qt calls by replacing them with lookups into the vecQT1 or vecQT2 vectors. Note: I did not test it, I only looked at the code online, so not everything here can be assumed to be working out of the box. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-22 15:33 (2769 d 05:36 ago) @ ElMaestro Posting: # 17730 Views: 28,906 |
|
Hi ElMaestro, ❝ I imagine I was not able to explain what I meant?? No, you were. I’m currently busy with other stuff. ❝ I looked at the code available here. Better to have a look at the current version of sampsiz2.R. There you find in lines 42–46: # degrees of freedom as expression You have a point. IMHO, this question should be answered: yes! n1+n2−3 degrees of freedom mentioned already by Mdm. Povin. Forget my last code. Should be: n2 <- n + n %% 2 - n1 n1 = 12; stage 2 sample sizes (n2), method = exact Good news: Equal sample sizes (so we can use sampleN.TOST() till we have code specific for the 2nd stage; for studies proceeding to the 2nd stage power wrong in the 3rd decimal or less). Now one of the riddles of Potvin’s paper is resolved. Could never figure out the reported power of the examples.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-22 19:27 (2769 d 01:42 ago) @ Helmut Posting: # 17733 Views: 28,926 |
|
Hi Hötzi, ❝ You have a point. IMHO, this question should be answered: yes! And on basis of this story we will define Hötzi's universal law of BE: ElMaestro is always right1. By the way, since you are a speed devotee:
FindCritCV.Sec=function(Pwr.T, GMR, Nps, Is.St2=1, d=.00001, iniCV) Footnote 1: Except when he isn't. Which also happens. With regrettable frequency... ![]() — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-22 19:49 (2769 d 01:21 ago) @ ElMaestro Posting: # 17734 Views: 29,013 |
|
❝ And on basis of this story we will define Hötzi's universal law of BE: ❝ ElMaestro is always right1. Footnote accepted as an excuse. ❝ By the way, since you are a speed devotee: ❝ I am. But I don’t get the point of initializing a large vector of values and later picking out the suitable one. I used the bisection algo in my youth as well. I leave this stuff to C-nerds and work on adapting our R-code instead. A median execution time of 1 ms is fast enough for me. I have stopped reading Stephen King novels. Now I just read C code instead. Richard A. O'Keefe — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-22 19:59 (2769 d 01:11 ago) @ Helmut Posting: # 17735 Views: 28,972 |
|
Hi again, ❝ (...) I don’t get the point of initializing a large vector of values and later picking out the suitable one. The secant method is terrible if the initial guess is wrong. Terrible as in, converges to plus/minus infinity and will give errors, so it isn't a matter of many iterations. What the code above does: The n'th CV is probably going to be not so far from the (n-1)'th CV, and therefore it is a good guess. Therefore, if we only guess one CV reasonably (the 6th) then we get the corresponding 6'th critical CV, and from that point and onwards we can use the previously generated critical CV as the guess, i.e. when we want the 7'th value we use the 6th value as a guess, and so on. That's how these lines seem to work very fast. You can also play around with regula falsi or whatever the heck Wikipedia calls it:
FindCritCV.rf=function(Pwr.T, GMR, Nps, Is.St2=1, toler=0.000001) It is a little faster than bisection and does not require a good guess apart from the above. It is stable for power here since the power does not misbehave within any CV-interval of interest; there are no minima or maxima or things that can screw it badly up ![]() — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-22 21:15 (2768 d 23:54 ago) @ ElMaestro Posting: # 17738 Views: 29,021 |
|
Hi ElMaestro, ❝ […] That's how these lines seem to work very fast. ❝ It is a little faster … My ![]() Fast cars, fast women, fast algorithms … what more could a man want? Joe Mattis Would you mind benchmarking your wonder-code (not with proc.time() since it keeps the overhead). If it’s, say >10times faster than mine (<1 ms), I’ll be interested. ![]() — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2017-08-23 00:29 (2768 d 20:40 ago) @ Helmut Posting: # 17740 Views: 28,776 |
|
Hi Hötzi, ❝ Would you mind benchmarking your wonder-code (not with I do not have your code, I don't know how to locate it in Github or CRAN, so I can't readily benchmark yours. Below is a good starting point. The basis is as follows. In a more or less typical Potvin run, say n1=24 and CV=0.5 we will need to enter stage 2 almost 100% of the time. So we will need to convert a CV to a sample size roughly 1.000.000 times (look up the percentage in the paper). It will need to happen whether or not the code is vectorised. b. The present way of doing that is to convert the CV, given the assumed GMR, target power, and the alpha levels, to a sample size 1.000.000 times first via Zhang step, and then finally via a "normal" up/down step. c. The alternative is to invest a little time in creating smart arrays before we convert any CV's (given the assumed GMR, target power, and the alpha levels), and then afterwards do lookups into the sameple size array. Remember that my background for this is: In the digital world, if we want to dig 1.000.000 holes we will buy a shovel one time and dig all the holes with it. We will not buy a million shovels and use a new one for every hole we wish to dig. I have written a function Bench.H1 for the alternative way. Please load your code into Bench.H2 and do a head-on benchmark with Bench.H1 against bench.H2. I could perhaps do it too if I only knew where to locate your code. Footnote: The code is simplified, it looks up CV=0.5 every time. It could be made more natural by rchisq'ing the CV. Feel free. If your code is completely vectorised then I imagine you would create a vector of one million values of 0.5 and let the hounds loose on that vector. Same difference for me. ![]() ##The benchmark Power.calc=function(Nps, GMR, CV, Is.St2=0) Late edit: I believe we can even do something like: Nps= 2* length(vecCritCVs[vecCritCVs<=CV]) - which is faster still and does not explicitly involve a loop. — Pass or fail! ElMaestro |
Helmut ★★★ ![]() ![]() Vienna, Austria, 2017-08-23 03:48 (2768 d 17:21 ago) @ ElMaestro Posting: # 17741 Views: 29,226 |
|
Hi ElMaestro, I changed this part (the number of repetitions should be left to microbench ). vecCritCVs<<-rep(0,500) I ran just 10,000 cause yours was so slow. Improved over previous versions but still 260times slower than mine. BTW, Zhang’s approximation was spot on. No iterations performed at all (to be expected with a high sample size). Unit: microseconds ❝ Late edit: I believe we can even do something like: ❝ ❝ - which is faster still and does not explicitly involve a loop. Given the results from above I didn’t try that. — Dif-tor heh smusma 🖖🏼 Довге життя Україна! ![]() Helmut Schütz ![]() The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |