Helmut ★★★ Vienna, Austria, 2014-05-31 20:20 Posting: # 13024
Dear all, have you ever wondered how to find a suitable adjusted α if the desired combination of target power, expected T/R-ratio (and even the acceptance range) is not given in any of the many publications? In such a case we have to validate the framework within the desired range of n1/CV-combinations in order to demonstrate that the overall type I error is preserved. Here is my recipe – inspired by Fuglsang’s¹ ideas:
The code (using library(Power2Stage)) scans a grid of stage 1 sample sizes n1 = 12, 24, 36, 48, 60; estimated adjusted α: 0.0274.

Get a decent cup of coffee – it takes a while (on my machine 11 min for Step 1 and 10 min for Step 2). Depending on the chosen grid expect up to a couple of daysᵃ for Step 3; the simple grid of 75·10⁶ sim’s took six hours to complete. For “Method B”, T/R-ratio 0.9, 90% power I got 0.0274 – slightly larger than the 0.0269 Fuglsang reported² for “Method C”. Not surprising, since B is always more conservative than C; in other words, a slightly larger α is expected to lead to a similar inflation. BTW, for “Method C” I got 0.0268 (10 & 17 min). IMHO, a nice agreementᵇ (different software: C vs. R, different seeds of the pseudo-random generator, different power methods: shifted t vs. noncentral t).

Don’t forget the third step – regulators want to see only that (EMA: “appropriate steps must be taken to preserve the overall type I error of the experiment” and “the choice of how much alpha to spend at the interim analysis is at the company’s discretion”). If you want to introduce a futility criterion (e.g., an upper total sample size) or even fiddle with usePE=TRUE, simulating power is crucial in order to avoid a nasty surprise.
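Step 2 of the recipe – Fuglsang’s trick of regressing the simulated type I error on the candidate adjusted α and solving for a TIE of exactly 0.05 – can be sketched as follows. The (α, TIE) pairs below are invented for illustration (in practice each TIE would come from ~10⁶ simulated studies, e.g. via Power2Stage), and Python merely stands in for the thread’s R code:

```python
# Sketch of Step 2: fit a straight line through simulated type I errors
# (TIE) at a few candidate adjusted alphas and solve for the alpha whose
# predicted TIE equals the target 0.05.

def solve_adjusted_alpha(alphas, ties, target=0.05):
    """Least-squares fit TIE = a + b*alpha, solved for TIE == target."""
    n = len(alphas)
    mx = sum(alphas) / n
    my = sum(ties) / n
    b = sum((x - mx) * (y - my) for x, y in zip(alphas, ties)) / \
        sum((x - mx) ** 2 for x in alphas)
    a = my - b * mx
    return (target - a) / b

# Hypothetical grid (made up for illustration): the empirical TIE rises
# roughly linearly with the nominal alpha over a narrow range.
alphas = [0.0260, 0.0270, 0.0280, 0.0290]
ties   = [0.0476, 0.0489, 0.0502, 0.0515]

alpha_adj = solve_adjusted_alpha(alphas, ties)
print(round(alpha_adj, 4))   # → 0.0278
```

Step 3 then repeats the full n1/CV grid simulation at this single adjusted α to confirm that the maximum TIE stays at or below 0.05.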
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
d_labes ★★★ Berlin, Germany, 2014-06-02 10:28 @ Helmut Posting: # 13025
Dear Helmut! Thanks for sharing this code! Just one minor comment, or rather a question: What is your experience with regulatory acceptance of Potvin’s ‘acceptable’ alpha-inflation of 0.052? If I remember correctly there was some rumour that even a smaller value of the empirical alpha has to be seen as inflation. That would mean that we had to down-weight the adjusted α to some extent. — Regards, Detlew
Helmut ★★★ Vienna, Austria, 2014-06-02 16:44 @ d_labes Posting: # 13026
Dear Detlew! ❝ Thanks for sharing this code! My pleasure! I think that we still need some improvements:
Overall α (step 3) for this combo decreases from 0.051255 to 0.049795… If we follow this track I expect lower alphas than in any of the published papers. Does that make sense?

❝ How is your experience in regulatory acceptance of Potvin's 'acceptable' alpha-inflation of 0.052?

Mixed. Some European (!) regulators don’t like (‼) TSDs at all, but accept them if they follow “Method B” (quote: “according to the guideline…”). In one case Austria’s AGES asked for a posteriori confirmation of “lacking inflation” of “Method C” based on the actual sample size and CV in the study (it was 0.0494 with a 95% CI of 0.0490–0.0498). Dunno what might have happened if 0.05 < α ≤ 0.052. According to Chinese whispers ≤0.051 is considered acceptable. Why? Dunno. The maximum inflation in Potvin’s paper for “Method B” is 0.0485 and for “Method C” 0.051. Maybe someone read the wrong column.

❝ If I remember correctly there was some rumour that even a smaller value of the empirical alpha has to be seen as inflation.

If I recall correctly, that’s the personal opinion of the Austrian member of the EMA’s Biostatistics Working Party. Recently a member of the PKWP told me how he made peace with TSDs – after years of lurking doubt: “The inflation would be relevant only if the CI in the study covered exactly 80–125%. Since in real life the CI is narrower, the actual patient’s risk – even if there were a small inflation due to the method – is likely ≪5%. So I don’t bother any more.” A pragmatic approach.

❝ That would mean that we had to down weight the adj. alpha to some extent.

By throwing away all published papers and increasing the downloads of Power2Stage? Of course, PQRI’s Sequential Design Working Group’s “negligible inflation” (≤0.052) is arbitrary – as are many other rules we have to observe in BE. BTW, only Montague’s “Method D” scratches 0.052. Results of the publications:
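The a-posteriori interval the AGES asked for can be checked with elementary statistics. Assuming the empirical α of 0.0494 came from 10⁶ simulated studies (the usual number in these papers; the post does not state it) and a normal-approximation binomial CI, the reported 0.0490–0.0498 is reproduced. A minimal sketch in Python (the thread’s tooling is R):

```python
import math

def tie_ci(p_hat, n_sims, z=1.959964):
    """95% normal-approximation (Wald) CI for an empirical type I error."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_sims)
    return p_hat - z * se, p_hat + z * se

# Assumed inputs: empirical alpha 0.0494 from 1e6 simulations.
lo, hi = tie_ci(0.0494, 1_000_000)
print(f"{lo:.4f}-{hi:.4f}")   # → 0.0490-0.0498
# The upper limit stays below 0.05: no significant inflation.
```

With 10⁶ simulations the standard error of an empirical α near 0.05 is about 0.0002, which is why these papers can resolve differences like 0.0494 vs. 0.0502 at all.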
Helmut ★★★ Vienna, Austria, 2014-06-03 00:07 @ d_labes Posting: # 13028
Dear Detlew! Below the results of my simulations (I used α 0.0304 for “Method B” and 0.0282 for “Method C”). Power2Stage is an amazing piece – 73 minutes for Methods B and C! It took the PQRI 1½ years to come up with their simulations in Compaq Visual Fortran. Method B: Empiric type I error; Pot = Potvin’s 0.0294, HB = homebrew’s 0.0304.
Method C: Empiric type I error; Pot = Potvin’s 0.0294, HB = homebrew’s 0.0282.
With the new alphas no (!) significant inflation for either method. The largest observed type I error was 0.050111 in “Method B” (at n1 36 / CV 0.4) and 0.049984 in “Method C” (at 12/0.2). I’m getting the impression that if the PQRI had taken a closer look right from the start (instead of coming up with a ‘one size fits all’ α and playing with a “negligible inflation”), maybe we could have avoided all those fruitless discussions of the last years.

BTW, in Montague’s paper I read “The simulations were performed using R […].” Nice to know. Then “A different randomly selected seed was used for each scenario.” Why? Shall we switch to setseed=FALSE in Power2Stage?

Anders’ algo suggests 0.027 (instead of 0.028) for “Method D”. Sim’s running.
d_labes ★★★ Berlin, Germany, 2014-06-03 10:47 @ Helmut Posting: # 13030
Dear Helmut! Again! That’s worth a paper.

❝ I’m getting the impression that if PQRI would have had a closer look right from the start (instead of coming up with a ‘one size fits all’ α and playing with a “negligible inflation”), maybe we could have avoided all those effectless discussions we had the last years.

I think their approach was driven by the ability to perform only a limited number of study simulations per day, especially as the number of subjects rises. Thus they took Pocock’s ‘universal’ constant and simply looked at what happened – instead of taking the approach for what it is: these nominal alphas lack any theoretical justification and are used on a purely empirical basis for the BE decision in a crossover with interim sample-size adaptation, and thus should be adapted to this problem. As they did later in “Method D” and Montague’s paper. And as Anders did in his epoch-making paper: A. Fuglsang, “Controlling type I errors for two-stage bioequivalence study designs”, Clinical Research and Regulatory Affairs, 2011; 28(4): 100–105.

❝ BTW, in Montague’s paper I read “The simulations were performed using R […].” Nice to know.

R rulez!

❝ Then “A different randomly selected seed was used for each scenario.” Why? Shall we switch to setseed=FALSE in Power2Stage?

As I read this sentence, they used a different seed, randomly selected but fixed for each scenario (CV, n1), to be protected against artefacts resulting from the starting point of the pseudo-random number generator. IMHO, using a different but fixed seed or a single fixed seed doesn’t make a big difference if we simulate with 1E6 sims. Setting setseed=FALSE in Power2Stage would have the drawback of differing results even for one scenario (CV, n1) if run again.

BTW: What is Anders’ algo? His C programs?
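Detlew’s point about seeds can be illustrated in miniature, with Python standing in for R’s RNG: a fixed seed makes a simulated quantity reproducible run-to-run, while an unseeded run – the analogue of setseed=FALSE – gives a slightly different result every time. The mini_sim function is a toy stand-in, not Power2Stage:

```python
import random

def mini_sim(seed=None):
    """Toy 'simulation': mean of 10,000 pseudo-random uniform draws."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(10_000)) / 10_000

# A fixed seed makes a simulated result reproducible run-to-run ...
assert mini_sim(seed=1234) == mini_sim(seed=1234)
# ... whereas unseeded runs differ (the setseed=FALSE situation):
assert mini_sim() != mini_sim()
```

With 10⁶ simulations per scenario the choice between one global fixed seed and a different fixed seed per scenario is indeed in the Monte-Carlo noise; only an unfixed seed costs reproducibility.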
Helmut ★★★ Vienna, Austria, 2014-06-03 15:49 @ d_labes Posting: # 13032
Dear Detlew!

❝ That's worth a paper.

Who’s going to write one?

❝ I think their approach was driven by […]

Agree. Maybe their choice was (partly?) strategic. Since Pocock’s 0.0294 has a tradition of decades of accepted use in phase III, they opted for a smooth introduction: “We used it for such a long time and it seems to work here as well…” In Section 5 they wrote: “It is our understanding that the FDA has accepted studies with designs like those considered here.” If I recall correctly, they were referring to an NDA (bonus question: guess the α!). Donald Schuirmann wrote the biostatistical assessment based on simulations he performed and accepted the study. I have to dig out the reference.

❝ As they have done later in method D and Montague’s paper.

Yep – but not consistently enough. Their maximum inflation of 0.0518 is the largest of all papers published so far. Again I assume they tried to keep it simple: “We have shown in our first paper that for a T/R of 0.95 D’s 0.0280 is more conservative than C’s 0.0294. Let’s give it a try with a T/R of 0.9. Hey, <0.052 – let’s publish.”

Method D: Empiric type I error; Mont = Montague’s 0.0280, HB = homebrew’s 0.0270 (runtime one hour).
❝ IMHO using a different but fixed seed or a single fixed seed doesn't make a big difference if we simulate with 1E6 sims.
❝ Setting setseed=FALSE in Power2Stage …

Agree.

❝ BTW: What is Anders algo?

The linear regression of type I errors vs. αadj in order to find the αadj leading to a TIE of 0.05 (implemented in the second step of my code). I borrowed the idea from his 2011 paper you quoted. I’m afraid regulators will likely not accept his method of data-driven adjustment of α at the interim (EMA: “The plan to use a two-stage approach must be pre-specified in the protocol along with the adjusted significance levels to be used for each of the analyses”).

❝ His C programs?

I have just a few of them. Given your last improvements Power2Stage is almost as fast as his compiled stuff.

PS: Seems to be a popular topic – ~100 visits per day so far…
Helmut ★★★ Vienna, Austria, 2014-10-13 16:53 @ d_labes Posting: # 13692
Dear Detlew, have you ever wondered where the magick 0.0294 comes from? In R we can do better, of course: require(mvtnorm) …

Nitpicking as usual. Compare at the location of maximum inflation …
Method C: alpha0 = 0.05, alpha (s1/s2) = 0.02938572 0.02938572
… with the usual stuff:
Method C: alpha0 = 0.05, alpha (s1/s2) = 0.0294 0.0294
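For readers without mvtnorm: Pocock’s constant can be checked from first principles. The nominal level at each look is 2(1 − Φ(2.178)) ≈ 0.0294, and the overall two-sided α follows from the bivariate normal distribution of the two z-statistics, whose correlation is √(n1/N) = √0.5 for equal stage sizes. A hedged Monte-Carlo sketch in stdlib Python (a rough check, not the exact integration mvtnorm performs):

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Nominal significance level at each look for Pocock's critical value:
c = 2.178
alpha_nominal = 2.0 * (1.0 - phi(c))   # ≈ 0.0294, the 'magick' number

# Monte-Carlo check of the overall two-sided alpha: interim and final
# z-statistics are bivariate normal with correlation sqrt(1/2) when the
# second look uses twice the stage-1 sample size.
rng = random.Random(1234)
rho = math.sqrt(0.5)
n_sim = 500_000
hits = 0
for _ in range(n_sim):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    if abs(z1) > c or abs(z2) > c:
        hits += 1
alpha_overall = hits / n_sim   # close to 0.05, up to Monte-Carlo noise
```

The exact computation (as in the post, via mvtnorm) gives 0.02938572 per look rather than the rounded 0.0294 that everybody carries around.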
ElMaestro ★★★ Denmark, 2014-10-13 17:30 @ Helmut Posting: # 13693
Hi Hötzi,

❝ […]

It’s a scandal!! Please contact the editor-in-chief and demand an immediate erratum. — Pass or fail! ElMaestro
d_labes ★★★ Berlin, Germany, 2014-10-14 10:56 @ Helmut Posting: # 13695
Dear Helmut,

❝ […]

Another magick number?
Helmut ★★★ Vienna, Austria, 2014-10-14 15:36 @ d_labes Posting: # 13701
Dear Detlew,

❝ […]

Require or not require, that is the question. Oh, I didn’t know that. I assumed that require() attaches a package only if it is not already attached. Should have RTFM, which states: “require is designed for use inside other functions …”

❝ Another magick number?

Exactly. In Pocock’s paper both z and α′ are rounded (2.178, 0.0294). I had to introduce my magick number in order to end up with 2.178.