Helmut ★★★ Vienna, Austria, 2018-04-21 19:17 Posting: # 18714
Dear all,

in version 0.5-1 of package Power2Stage exact methods1,2,3 are implemented – after months of struggling (many THX to Ben). The methods are extremely flexible: arbitrary BE-limits and target power; futility criteria on the PE, its CI, and the maximum total sample size; adapting for the PE of stage 1.

I’ve heard in the past that regulatory statisticians in the EU prefer methods which strictly control the Type I Error (however, at the 3rd GBHI conference in Amsterdam last week it was clear that methods based on simulations are perfectly fine for the FDA) and that the inverse normal method with repeated confidence intervals would be the method of choice. Well roared, Lion! I wasn’t aware of any software which could do this job. That’s like saying “Fly to Mars, but you are not allowed to use a rocket!” What else? Levitation? Witchcraft? Obtaining two p-values (like in TOST) is fairly easy, but converting them into a confidence interval (as required in all guidelines) is not trivial. Although we showed this approach4 a while ago, nothing was published in a peer-reviewed journal until very recently.

Now that we have a method which was demonstrated to control the TIE, I was curious how it performs in simulations (just to put it into perspective). R-code at the end of the post (with small step sizes of CV and n1, expect runtimes of some hours; in large simulations I don’t recommend pmethod="exact" – it is about 10 times slower than pmethod="nct"). See the documentation of the function power.tsd.in() on how to set futility criteria and make the method fully adaptive. As usual, in the latter case say goodbye to power…
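To make the idea tangible, a minimal sketch of what is combined (an illustration only, not the package’s internal code): the one-sided p-values of the TOSTs, obtained separately per stage, are combined via the inverse normal method; the maximum combination test evaluates the statistic for two pre-specified weight pairs and uses a correlation-adjusted critical value.

library(Power2Stage)
# inverse normal combination of stage-wise one-sided p-values (standard
# combination test); BE is concluded if the statistic of each of the two
# TOST hypotheses exceeds qnorm(1 - alpha)
z.comb <- function(p1, p2, w) {
  sqrt(w) * qnorm(1 - p1) + sqrt(1 - w) * qnorm(1 - p2)
}
# operating characteristics of the maximum combination test, defaults:
power.tsd.in(CV = 0.20, n1 = 12)                # power
power.tsd.in(CV = 0.20, n1 = 12, theta0 = 1.25) # empiric TIE at the BE limit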
I explored five scenarios, all with the maximum combination test.

[Plots of the first four scenarios]

When exploring the details it is also clear that the exact method keeps the desired power better than the simulation-based methods in extreme cases.

[Power of scenario #5 (a) and modifications]
R-code
In memory of Willi Maurer, Dr. sc. math. ETH — Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
Helmut ★★★ Vienna, Austria, 2018-04-21 22:33 @ Helmut Posting: # 18715
Dear all, answering my own post in order to keep it short. In the following an example. We have a guesstimate of the CV (0.20), assume a GMR of 0.95, and aim at power 0.80. No futility criteria. Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).
In the method the weights have to be pre-specified, stated in the SAP, and used throughout all subsequent steps (irrespective of the re-estimated n2). In the fixed-sample design we would need 20 subjects. How to set the weights? An intuitive way is to use the median x̃ (here 20) of the total sample size based on simulations. With n1 = 20 this would give us weights of [1, 0]. Great. But weights have to be >0 and <1. Hence, I tweaked them a little to [0.999999, 0.000001]. What can we expect if we run the study with n1 20?
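In code this could look like the following (a sketch; fCrit = "no" because no futility criteria were foreseen, everything else at the defaults; output omitted):

library(PowerTOST)   # sampleN.TOST()
library(Power2Stage) # power.tsd.in()
# fixed-sample design for the guesstimate (CV 0.20, GMR 0.95, power 0.80)
sampleN.TOST(CV = 0.20, theta0 = 0.95, print = FALSE)[["Sample size"]] # 20
# expected performance with n1 = 20, the tweaked weights, and no futility
power.tsd.in(CV = 0.20, n1 = 20, weight = c(0.999999, 0.000001), fCrit = "no")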
Fine. If everything turns out as expected we would have to be unlucky to need a second stage. Power in the first stage is already 0.73 and the stage 2 sample sizes are not shocking. As common in TSDs, the overall power is generally higher than in a fixed-sample design. We perform the first stage and get GMR 0.91 and CV 0.25. Oops! Both are worse than assumed. Especially the GMR is painful.
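The interim analysis with these stage 1 results might look like this (a sketch; the weights and the dropped futility criterion as pre-specified above):

interim.tsd.in(GMR1 = 0.91, CV1 = 0.25, n1 = 20,
               weight = c(0.999999, 0.000001), fCrit = "no")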
We fail to show BE (lower CL 77.31%) and should initiate the second stage with 24 subjects. How would a ‘Type 1’ TSD perform?
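For the comparison, a sketch of the stage 2 sample size in the pooled (‘Type 1’) framework; sampleN2.TOST() uses Pocock’s adjusted alpha 0.0294 by default:

sampleN2.TOST(CV = 0.25, n1 = 20)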
Pretty similar, though a lower n2 is suggested. OK, we perform the second stage and get GMR 0.93 and CV 0.21. Both are slightly better than what we got in the first stage but again worse than assumed.
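The final analysis, with both stages evaluated separately as the method requires, could be sketched as (assuming final.tsd.in() accepts the stagewise summaries):

final.tsd.in(GMR1 = 0.91, CV1 = 0.25, n1 = 20,
             GMR2 = 0.93, CV2 = 0.21, n2 = 24,
             weight = c(0.999999, 0.000001))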
We survived. In a ‘Type 1’ TSD we would get:
Pretty similar again. If we state it in the protocol, we could also aim for higher power in the second stage if the GMR in the first doesn’t look nice. If we switch to 0.90 we would run the second stage with 36 subjects.
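As a sketch (the higher target power pre-specified for the interim only):

interim.tsd.in(GMR1 = 0.91, CV1 = 0.25, n1 = 20,
               weight = c(0.999999, 0.000001), fCrit = "no",
               targetpower = 0.90)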
Helps. Another option would be to adjust for GMR1 by using the argument usePE=TRUE in interim.tsd.in(). For power 0.80 that would mean 40 subjects in the second stage, and for 0.90 already 62…

— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2018-04-21 22:49 @ Helmut Posting: # 18716
Hi Hötzi, thank you for this post. What is “the inverse normal method with repeated confidence intervals”?

— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2018-04-21 23:41 @ ElMaestro Posting: # 18717
Hi ElMaestro,

flow chart (futility of the CI, unrestricted total sample size):

[flow chart]

Details:

[details]

— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
mittyri ★★ Russia, 2018-04-28 17:54 @ Helmut Posting: # 18737
Hi Helmut,

sorry for the naive questions raised by my hazelnut brain.

1. I’m trying to compare the old function
power.tsd(method = c("B", "C", "B0"), alpha0 = 0.05, alpha = c(0.0294, 0.0294),
with the new one
power.tsd.in(alpha, weight, max.comb.test = TRUE, n1, CV, targetpower = 0.8,
So the old function was nice since the user can choose the method or specify three alphas. In the new one I see the comment regarding alpha: “If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively. If missing, defaults to 0.05.” What about alpha0 for method C? Is it deprecated?

2. Why did you decide to include the CI futility rule by default?

3. Regarding your flowchart: isn’t it possible that we get some value lower than 4? power.tsd.in(CV=0.13, n1=12) for example, and after the first stage CV=15%, CI=[0.7991897, 1.0361745]: sampleN2.TOST(CV=0.15, n1=12)

4. Is it possible to update the docs attached to the library?

5. I was confused by "2stage" being aliased with "tsd" and was looking for differences for some time. Are there any reasons to duplicate these functions?

PS regarding the 3rd point: I tried
interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15, n1=12, fCrit="No")
[…]
- Calculated n2 = 4
- Decision: Continue to stage 2 with 4 subjects
Oh, there’s a default argument min.n2 = 4. OK, let’s try to change that:
interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15, n1=12, fCrit="No", min.n2 = 2)
Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15, :
Why couldn’t I select a smaller one?

— Kind regards, Mittyri
Helmut ★★★ Vienna, Austria, 2018-04-28 19:29 @ mittyri Posting: # 18738
Hi Mittyri,

I’m in a hurry, so I’m answering only part of your questions (leaving the others to Detlew or Ben).

❝ 2. Why did you decide to include the CI futility rule by default?

This applies only to the x.tsd.in functions (to be in accordance with the paper of Maurer et al.).

❝ 3. Regarding your flowchart: isn’t it possible that we get some value lower than 4? … for example, and after the first stage CV=15%, CI=[0.7991897, 1.0361745]:
❝ 2x2 0.0294 0.15 0.95 0.8 1.25 12 2

sampleN2.TOST() is intended for the other methods, where the stages are pooled at the end. In the inverse normal method the stages are evaluated separately (PE and MSE from the ANOVAs of each stage). If you have fewer than 4 subjects in the second stage you will run out of steam (too few degrees of freedom). Well, 3 would work, but…

❝ 5. I was confused by "2stage" being aliased with "tsd" and was looking for differences for some time. Are there any reasons to duplicate these functions?

Since this is a 0.x-release, according to CRAN’s policy we can rename functions or even remove them without further notice. We decided to unify the function names. In order not to break existing code we introduced the aliases. In the next release the functions x.2stage.x() will be removed and only their counterparts x.tsd.x() kept.

❝ PS regarding the 3rd point: I tried […]
❝ - Calculated n2 = 4
❝ - Decision: Continue to stage 2 with 4 subjects
❝ Oh, there’s a default argument min.n2 = 4. OK, let’s try to change that:
❝ Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15, :
❝ Why couldn’t I select a smaller one?

See above. It doesn’t make sense with zero degrees of freedom (n2=2).

— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2018-04-29 23:11 @ mittyri Posting: # 18741
Dear Michael,

just my two cents.

❝ 1. I’m trying to compare the old function
❝ ...
❝ So the old function was nice since the user can choose the method or specify three alphas. In the new one I see the comment regarding alpha: “If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively. If missing, defaults to 0.05.”
❝ What about alpha0 for method C? Is it deprecated?

Sorry for the confusion, but you definitely have to study the references (start with 1)) to get a clue what’s going on with these new functions, which implement a new method for evaluating TSDs. New in the sense that it was not implemented in Power2Stage and has not been applied in the evaluation of TSDs up to now. It is by no means a method adding to or amending the Potvin methods. It is a new method with a different philosophy behind it. And this method – combining the p-values of the TOSTs applied to the data of the two stages separately – is said to control the TIE rate at ≤0.05, regardless of what design changes are made during the interim, e.g. re-estimation of the sample size. And this is not shown by simulations but proven in theory. A feature which is demanded by EMA statisticians. Do you remember the statement “Potvin’s methods are not valid / acceptable in Europe”? Except Russia, which is at least to some extent also in Europe, IIRC…

❝ 2. Why did you decide to include the CI futility rule by default?

See Helmut’s answer. Maurer et al. included a CI futility rule in their paper. And it’s our habit to set defaults according to the (first) paper(s) describing a TSD evaluation method. OK, that may be sub-optimal when comparing methods, since you always have to remember the defaults and the differences between them for different functions. But the re-calculation or verification of published results comes first. And my laziness calls for defaults resembling the details of the paper(s) after which a function in Power2Stage was implemented.

❝ 3. Regarding your flowchart: isn’t it possible that we get some value lower than 4?

See Helmut’s answer. Since min.n2 < 4 doesn’t make sense, it is restricted to ≥4. As described in the Maurer et al. paper.

❝ 4. Is it possible to update the docs attached to the library?

It’s not quite clear to me what we should update. Could you please elaborate?

❝ 5. I was confused by "2stage" being aliased with "tsd" and was looking for differences for some time. Are there any reasons to duplicate these functions?

The real reason behind this change is laziness of mine (sic!). It saves me 3(!) keystrokes. Believe it or not…

Don’t hesitate to ask more “naive” questions. We all here, not least me, are naive with respect to this new method of evaluating TSDs. If you feel more comfortable, ask me, Helmut, or Ben privately, i.e. write to the maintainer of Power2Stage.

1) Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614. Drop me a Mehl if you need that sheet of paper.

— Regards, Detlew
mittyri ★★ Russia, 2018-04-30 15:41 @ d_labes Posting: # 18744
Dear Detlew, dear Helmut,

I’m very sorry for that post; it looks like I’m out of touch with the current state of TSDs… OK, I need to review the paper, since it’s certainly a new standard, like Potvin’s paper before it.

❝ ❝ 4. Is it possible to update the docs attached to the library?
❝ It’s not quite clear to me what we should update. Could you please elaborate?

I looked into Power2Stage/doc and found that it was last updated in Jan 2016.

— Kind regards, Mittyri
d_labes ★★★ Berlin, Germany, 2018-04-25 16:19 @ Helmut Posting: # 18729
Dear Helmut,

great post. Only one remark about the weights you chose for the maximum combination test in your R code.

❝ n[j] <- sampleN.TOST(alpha=alpha, CV=CV[j], theta0=GMR, theta1=theta1,
❝                      theta2=theta2, targetpower=targetpower,
❝                      print=FALSE, details=FALSE)[["Sample size"]]
❝ if (n[j] < 12) n[j] <- 12
❝ for (k in seq_along(n1)) {
❝   # median of expected total sample size as a 'best guess'
❝   n.tot <- power.tsd.in(alpha=alpha, CV=CV[j], n1=n1[k], GMR=GMR,
❝                         usePE=usePE, theta1=theta1, theta2=theta2,
❝                         targetpower=targetpower, fCrit=fCrit,
❝                         fClower=fClower, fCNmax=fCNmax, pmethod=pmethod,
❝                         npct=0.5)$nperc[["50%"]]
❝   w <- c(n1[k], n.tot - n1[k]) / n.tot
❝   # force extreme weights if expected to stop in stage 1 with n1
❝   if (w[1] == 1) w <- w + c(-1, +1) * 1e-6
❝ ...

Defining the weights that way is IMHO not what you intended. Or I don’t understand what you intended. It would be correct if you thought in terms of the standard combination test and further thought that you have to specify two weights for it. But since the two weights are connected (w, 1−w), the second is calculated automatically within the function power.tsd.in(). You only need to define w[1] in the input argument.

The idea behind the maximum combination test is: if our first pair of weights w, 1−w (chosen anyhow) is not “optimal”, choose a second pair of weights w*, 1−w* which is better adapted to the real situation. If you were too optimistic in your planning of n2, i.e. chose n2 too low compared to what really happens in the sample size adaptation, it would be wise to define w* lower than w. You do that, but your choice (w in w[1]=0.999999, w* in w[2]=1e-6) is, I think, too extreme and not what you intended. The second pair of weights w*=1e-6, 1−w*=0.999999 is for a situation where the p-values from the second stage almost exclusively determine the overall outcome of the maximum combination test. The p-values from the first stage are down-weighted with w*=1e-6.

Hope this sermon is not too confusing.

BTW: Choosing ‘optimal’ weights is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don’t have them until the study has been done. On the other hand, we have to predefine them to gain strict TIE control. The cat bites its own tail here.

— Regards, Detlew
Helmut ★★★ Vienna, Austria, 2018-04-26 11:51 @ d_labes Posting: # 18733
Dear Detlew,

❝ Defining the weights that way is IMHO not what you intended.

OK, I see!

❝ BTW: Choosing ‘optimal’ weights is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don’t have them until the study has been done. On the other hand, we have to predefine them to gain strict TIE control. The cat bites its own tail here.

Using the median of n.tot to define the weights from the sims was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?

Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n1 ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95th percentile of n.tot is 12.

— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2018-04-26 22:02 @ Helmut Posting: # 18734
Dear Helmut,

❝ … Using the median of n.tot to define the weights from the sims was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n1/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?

As I already said, dunno, really.

❝ Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n1 ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95th percentile of n.tot is 12.

If you were pessimistic, then in the spirit of the MCT it would be wise to choose a lower value for the second pair of weights. Or do I err here (“real” n2 lower than the pessimistic one)? If I’m right, possible values could be: w=0.999, w*=0.5 (or something in that region). Or do we stay with the standard combination test for that extreme case?

But to state it again: for me it is a mystery how to choose the weights. But I think it doesn’t make much difference if we are not totally wrong with our chosen weights. As far as I have seen for a small number of examples: the power is influenced only to a “minor” extent. The TIE is controlled, whatever weights we choose.

— Regards, Detlew
d_labes ★★★ Berlin, Germany, 2018-05-09 15:53 @ Helmut Posting: # 18757
Dear Helmut,

I have tried to demystify some aspects of choosing w and w* for the maximum combination test by looking into some examples.

Take nfix as sample size in stage 1 (Helmut’s proposal)
Guess: CV=0.2, theta0=0.95 -> nfix = 20. Choose n1 = nfix = 20, i.e. w = 0.99, since w has to be <1.
Guess was too pessimistic: e.g. true CV=0.15 -> nfix = 12, or theta0=0.975 -> nfix = 16. For both, the stage 1 sample size exceeds the necessary total sample size of a fixed design. Thus a more realistic w* can’t be defined, or should be set to the same value as w. This results in the standard combination test.
Guess was too optimistic: e.g. true CV=0.25 -> nfix = 28, or theta0=0.925 -> nfix = 26. Both lead to a ‘more realistic’ w* = 0.71 or 0.77. Let’s choose w* = 0.7 for simplicity.

Power & sample size of the scenarios:
[table]
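One cell of the table can be reproduced along these lines (a sketch assuming the defaults of power.tsd.in() otherwise), here the ‘too optimistic’ guess with true CV 0.25:

power.tsd.in(CV = 0.25, n1 = 20, weight = c(0.99, 0.7))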
Take nfix/2 as sample size in stage 1 (Maurer et al.)
Guess: CV=0.2, theta0=0.95 -> nfix = 20. Choose n1 = nfix/2 = 10, i.e. w = 0.5.
Guess was too pessimistic: e.g. true CV=0.15 -> nfix = 12, or theta0=0.975 -> nfix = 16. This would lead to a ‘more realistic’ w* = 0.83 or 0.625, respectively. Let’s take w* = 0.7 for simplicity.
Guess was too optimistic: e.g. true CV=0.25 -> nfix = 28, or theta0=0.925 -> nfix = 26. Both lead to a ‘more realistic’ w* = 0.36 or 0.38. Let’s take w* = 0.4 for simplicity.

Power & sample size of the scenarios:
[table]
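Again a sketch for the ‘too optimistic’ case, now with n1 = nfix/2 (if the defaults match, this should roughly reproduce the corresponding row of the table):

power.tsd.in(CV = 0.25, n1 = 10, weight = c(0.5, 0.4))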
Confusion:
• Too pessimistic specifications result in higher power and lower expected sample size (!) (at least for CVs around 0.2)
• Too optimistic specifications may result in lower power and higher expected sample size (!)
— Regards, Detlew
Ben ★ 2018-06-10 22:12 @ d_labes Posting: # 18880
Dear All,

sorry for my rather late reply. A lot of very good comments have been made around this new function power.tsd.in(). I hope there will be applications and more investigations of this in the future. As Detlew already mentioned: the Type 1 error is controlled regardless of the scenario (we know it by theory, no simulations needed). This makes it very valuable in my opinion. I’ll try to comment on some points made.

❝ What about alpha0 for method C? Is it deprecated?

I hope you (mittyri) had a look into the references and found some hints on it. alpha0 just does not exist in this method. For the inverse normal method we always need (only) two adjusted alpha values, one for stage 1 and another for stage 2. The fact that the function also allows you to specify only one value is for your convenience; it will then calculate the adjusted ones internally.

❝ isn’t it possible that we get some value lower than 4?

It actually is never possible to get a smaller sample size. All sample size functions used in this R package give at least 4, and thus this criterion implicitly applies to all functions within Power2Stage.

Comment on the weights: Detlew already pointed out some important remarks. I can only highlight again that the standard combination test for the inverse normal method already uses 2 weights, but only one (w) needs to be specified because the second is just 1−w. For the maximum combination test we have 2 pairs of weights (so 4 in total), but again only the first one of each pair is relevant. These two first weights need to be specified in the argument weight.

❝ Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).

Sounds interesting. At first this sounds nice, but I am a bit puzzled about it. "Safety net" sounds like we have a rather good understanding about the CV … but in case we observe some unforeseen value we have the possibility to add some extra subjects. However, in such a case we could just go with a fixed design and adapt the power. In a TSD setting we typically have no good understanding about the CV... Am I missing something here? Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be. For n1 I would then rather use the lower end of this range. Comments?

More comments on the weights: usual practice when dealing with adaptive designs was to define not just n1 but also n2 (the theory and examples were introduced for superiority designs). One way of doing that is by calculating a fixed study design sample size and then asking ourselves after what fraction of it we want to take a look into the data+. This is done by Maurer et al., and they choose 50% for the interim look, so n1 equals n2. If we assume that all subjects are evaluable, this would give us a weight w of 0.5 for the first pair of weights. For superiority trials it is common practice not to go below n2 (in case the second stage is performed). Thus: if we want to apply the maximum combination test, a second weight w* greater than w does not make sense. For the BE setting it seems this is all different. Here, n2 is flexible and can also be lower than the initially planned one. In fact, the initially planned stage 2 sample size is not really formally defined (although it theoretically exists – at least in case you calculate n1 according to some fixed design sample size).
This makes the decision regarding how to define the two pairs of weights even harder. There is no unique way of defining the weights. One could, for example, perform some optimization procedure (with a side condition that fixes either power or sample size). Unfortunately, I currently don’t have an ideal solution to this either.

+ Note that the intention for the interim look may be very different for superiority trials than for BE trials.

❝ Take nfix/2 as sample size in stage 1 (Maurer et al.)
❝ ...
❝ Too optimistic 0.25 0.95 0.5 0.4 0.822 37.1 34 78
❝ ...
❝ • Too pessimistic specifications result in higher power and lower expected sample size (!) (at least for CVs around 0.2)
❝ • Too optimistic specifications may result in lower power and higher expected sample size (!)

Brings us back to: we should plan with a realistic/slightly optimistic scenario.

Best regards, Ben.
Helmut ★★★ Vienna, Austria, 2018-06-11 15:57 @ Ben Posting: # 18883
Hi Ben,

❝ I hope there will be applications and more investigations of this in the future.

So do I – once we’ve solved the mystery of finding a “suitable” n1 and specifying “appropriate” weights.

❝ ❝ Some regulatory statisticians told me to prefer a first stage as estimated for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).

❝ Sounds interesting. At first this sounds nice, but I am a bit puzzled about it. "Safety net" sounds like we have a rather good understanding about the CV …

Not necessarily good but a “guesstimate”.

❝ … but in case we observe some unforeseen value we have the possibility to add some extra subjects. However, in such a case we could just go with a fixed design and adapt the power.

I’m not sure what you mean here. In a fixed sample design I would rather work with the upper CL of the CV or – if not available – assume a reasonably higher CV than my original guess, rather than fiddling around with power.

❝ In a TSD setting we typically have no good understanding about the CV... Am I missing something here?

(1) Yep and (2) no.

❝ Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be.

I was just quoting a regulatory statistician (don’t want to out him). Others didn’t contradict him. So likely he wasn’t alone with his point of view.

❝ For n1 I would then rather use the lower end of this range. Comments?

Very interesting. I expected that the sample size penalty (n2) would be higher if we used a low n1. Of course it all depends on which CV we observe in stage 1.
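A quick grid to compare the candidates (a sketch with the default weights; n1 = 12 for the lower end of the CV range, 16 for ~75% of nfix, 20 for nfix):

library(Power2Stage)
for (n1 in c(12, 16, 20)) {
  print(power.tsd.in(CV = 0.20, n1 = n1))
}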
[comparison of the three choices of n1]

If we base n1 on the lower end and the CV is close to the guesstimate, that’s the winner. On the other hand, there is a ~56% chance of proceeding to the second stage, which is not desirable – and contradicts the concept of a “safety net”. A compromise would be 75% of the fixed sample design. The pessimistic approach would be crazy.

❝ More comments on the weights:

Have to chew on that…

❝ Brings us back to: we should plan with a realistic/slightly optimistic scenario.

Seems so.

— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
Ben ★ 2018-06-12 21:14 @ Helmut Posting: # 18892
Hi Helmut,

❝ Not necessarily good but a “guesstimate”.

❝ ❝ … but in case we observe some unforeseen value we have the possibility to add some extra subjects. However, in such a case we could just go with a fixed design and adapt the power.

❝ I’m not sure what you mean here. In a fixed sample design I would rather work with the upper CL of the CV or – if not available – assume a reasonably higher CV than my original guess, rather than fiddling around with power.

❝ ❝ In a TSD setting we typically have no good understanding about the CV... Am I missing something here?

❝ (1) Yep and (2) no.

❝ ❝ Based on what assumptions would we select n1 (= fixed design sample size)? We typically have some range of possible values and we don't know where we will be.

❝ I was just quoting a regulatory statistician (don’t want to out him). Others didn’t contradict him. So likely he wasn’t alone with his point of view.

❝ Very interesting. I expected that the sample size penalty (n2) would be higher if we used a low n1.

❝ If we base n1 on the lower end and the CV is close to the guesstimate, that’s the winner. On the other hand, there is a ~56% chance of proceeding to the second stage, which is not desirable – and contradicts the concept of a “safety net”. A compromise would be 75% of the fixed sample design.

❝ The pessimistic approach would be crazy.

Best regards, Ben.
mittyri ★★ Russia, 2018-06-12 01:27 @ Ben Posting: # 18884
Dear Ben,

thank you for the explanations! As I mentioned above, I was not aware that the Maurer method is not just another set of alphas. After reading the manuscript I understood the concept. I tried to simulate some data to compare the ‘safety net’ approach with the inverse normal method vs a fixed design (Helmut is right, it has been very popular in Russia lately, as my colleagues say; of course the Sponsors are using Potvin C). But my loop was interrupted:

interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38)
Error in tval[, 1] : incorrect number of dimensions
In addition: Warning messages:
1: In qnorm(p2) : NaNs produced
2: In min(df) : no non-missing arguments to min; returning Inf

What is going on here? I thought the problem was GMR1 < 0.9 and tried to add a condition omitting the replicates with GMR1 < 0.9, but even then some replicates got the same error.

— Kind regards, Mittyri
Ben ★ 2018-06-12 21:32 @ mittyri Posting: # 18893
Dear mittyri,

❝ I tried to simulate some data to compare the ‘safety net’ approach with the inverse normal method vs a fixed design … But my loop was interrupted:
❝ Error in tval[, 1] : incorrect number of dimensions
❝ In addition: Warning messages:
❝ 1: In qnorm(p2) : NaNs produced
❝ 2: In min(df) : no non-missing arguments to min; returning Inf
❝ What is going on here?

interim.tsd.in proceeded and still wanted to calculate n2. This is however not possible, because the estimated conditional target power is only defined if the power of stage 1 is less than the overall power (argument targetpower). If you still try to calculate it, you will end up with a negative estimated conditional target power, which will then be put into the sample size routine as the input target power – which of course will fail. I have corrected this bug on GitHub and it will be part of the next release.

General remark here: in your example we see that BE was missed only marginally. The repeated CI is (0.79215, 0.99993). Even though the power of stage 1 is large enough that we formally conclude futility, one could question whether it is really a good idea to stop the trial due to futility. On the other hand: if we want to have this futility criterion then we need a cut-off threshold, and at some point this cut-off will be met...

Best regards, Ben.
d_labes ★★★ Berlin, Germany, 2018-06-13 18:59 @ Ben Posting: # 18900
Dear Ben,

❝ General remark here: in your example we see that BE was missed only marginally. The repeated CI is (0.79215, 0.99993). Even though the power of stage 1 is large enough that we formally conclude futility, one could question whether it is really a good idea to stop the trial due to futility ...

If I see it correctly that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom)

How to obtain a sample size number for the second stage if we want to do so?

— Regards, Detlew
Helmut ★★★ Vienna, Austria, 2018-06-13 21:23 @ d_labes Posting: # 18901
Dear Detlew, ❝ If I see it correctly… You do. ❝ … that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom) Sounds to me like the statement of the FDA’s guidances “Contains Nonbinding Recommendations” or a similar one in the protocols of European Scientific Advices. ❝ How to obtain a sample size number for the second stage if we want to do so? Introducing fuzzy logic to the AlGore Rhythm? Stop if it looks really terrible or continue if the weather is not that bad.*
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2018-06-14 12:18 @ Helmut Posting: # 18905
Dear Helmut,

❝ ❝ How to obtain a sample size number for the second stage if we want to do so?

❝ Introducing fuzzy logic to the AlGore Rhythm? Stop if it looks really terrible or continue if the weather is not that bad.

Leave the decision whether to use the futility rule or not to NLYW?

"... there is a fundamental gender-based distinction in the functional system of this thinking apparatus: unlike male, female logic is based on fuzzy logic -- wherein each statement has got several values in such a way that if women say "No", this response doesn't mean absolute 'no-thing-ness', but implies some insensible and imperceptible features of the quite opposite response -- "Yes". The same is also true for a shift in the opposite direction of evaluation in female logic -- from "Yes" to "No". That is why it sometimes turns out to be a very difficult task to translate women's fuzzy logic to men's two-valued logic, that includes only "Yes" or "No", without a third value." Elmar Hussein

— Regards, Detlew
Ben ★ 2018-06-13 22:26 @ d_labes Posting: # 18902
Dear Detlew,

❝ If I see it correctly that's called in Maurer et al. "futility rule can be applied in a nonbinding manner, ie, it can be used as guidance but must not necessarily be followed." (page 19, bottom)
❝ How to obtain a sample size number for the second stage if we want to do so?

The reference at page 19 actually refers to the CI futility criterion, but nevertheless the same argument should apply to the ‘power of stage 1’ criterion. Well, as I said: formula (15), i.e. the formula for the estimated conditional target power, is only defined if the power of stage 1, P(R1), is less than the overall power 1 − beta. That’s a key feature of the equation. Therefore, in my opinion the only possibility is: if you want to be able to handle it in a nonbinding manner, then you have to go with conditional error rates only (i.e. you cannot use the estimated conditional target power as the target power for the calculation of n2). So, we would need to select ssr.conditional = "error".

Best regards, Ben.

PS: Please don’t shoot the messenger
d_labes ★★★ Berlin, Germany, 2018-06-14 12:47 @ Ben Posting: # 18906
Dear Ben,

❝ The reference at page 19 actually refers to the CI futility criterion,

I know.

❝ ... in my opinion the only possibility is: if you want to be able to handle it in a nonbinding manner, then you have to go with conditional error rates only (i.e. you cannot use the estimated conditional target power as the target power for the calculation of n2). So, we would need to select ssr.conditional = "error".

My first thought was: set fCpower = 1, which results in not using the power futility criterion at all. This gives n2=16 for mittyri’s example: interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, fCpower=1).
Your suggestion interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "error") also gives n2=16. Astonishing, or correct?
Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no"), gives n2=4. Oops? Wow!

Helmut’s caveat of how to decide in case of “nonbinding futility” needs to be considered – scientifically, not via NLYW.

IIRC, the term “nonbinding” in the context of sequential designs is used for flexibility in stopping or continuing due to external reasons. Do we have such reasons here? Binding, nonbinding – does it have an impact on the alpha control? I think not, but I’m not totally sure.

— Regards, Detlew
Ben ★ 2018-06-15 19:58 @ d_labes Posting: # 18908
Dear Detlew,

❝ My first thought was: set fCpower = 1, which results in not using the power futility criterion at all.
❝ ...
❝ Your suggestion interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "error") also gives n2=16. Astonishing, or correct?

This is correct. Please note that if fCpower = 1, then (as intended) the futility criterion regarding power of stage 1 never applies. If you then encounter a scenario where the power of stage 1 is greater than targetpower (this need not happen, but it can), then the conditional estimated target power will be negative. Thus, we would have a problem with this being the target power for the sample size calculation. To prevent this from happening, the function automatically sets the target power for recalculation to targetpower (which is equivalent to ssr.conditional = "error"). See ‘Details’ in the man page.

❝ Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no"), gives n2=4. Oops? Wow!

I have to think about that.

❝ IIRC, the term “nonbinding” in the context of sequential designs is used for flexibility in stopping or continuing due to external reasons. Do we have such reasons here? Binding, nonbinding – does it have an impact on the alpha control? I think not, but I’m not totally sure.

Non-binding: the Type 1 error is protected even if the futility criterion is ignored.
Binding: the Type 1 error is protected only if the futility criterion is adhered to. (‘Binding’ is not common practice; authorities don’t want this.)

Best regards, Ben.
d_labes ★★★ Berlin, Germany, 2018-06-16 21:42 @ Ben Posting: # 18909
Dear Ben,

❝ ❝ Binding, nonbinding – does it have an impact on the alpha control? I think not, but I’m not totally sure.

❝ Non-binding: the Type 1 error is protected even if the futility criterion is ignored.

That was also my thought, because I didn’t find any relationship to a futility rule in the proof of alpha control in the paper of Maurer et al. Or do I err here?

❝ Binding: the Type 1 error is protected only if the futility criterion is adhered to. (‘Binding’ is not common practice; authorities don’t want this.)

Are you sure for the binding case? I thought: if the TIE is controlled without adhering to any futility rule, then it is all the more controlled when applying a futility criterion. The probability of deciding BE is lowered by doing so, and therefore also the TIE. Of course the power may be compromised.

Example (some sort of ‘forced BE’, whatever that is): power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36) gives pBE (‘empiric power’) = 0.68452 (!). Increasing n1 doesn’t help. Try it. The empiric TIE (theta0=1.25) is: pBE = 0.034186.
Without the futility criterion w.r.t. the CI, power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36, fCrit="no") gives pBE (‘empiric power’) = 0.80815.
Power is raised much more if you also drop the power futility rule: power.tsd.in(CV=0.25, theta0=0.9, GMR=0.9, n1=36, fCrit="no", fCpower=1) gives pBE = 0.90658. The empiric TIE (theta0=1.25) is: pBE = 0.050012. Nitpickers! Don’t cry “alpha inflation”! The +0.000012 above 0.05 is simulation error. Try setseed=FALSE and you will get something like p(BE) = 0.049858, or in the next run p(BE) = 0.04982.

I think that your statement for the binding case is only valid if you make a further adaptation of the local alphas / critical values taking the futility rule into consideration. But I don’t know how this could be done. The implementation in Power2Stage anyhow doesn’t make such an adaptation, if I see it correctly.

Do you have any experience supporting your statement that ‘binding’ is not common practice and authorities don’t want it? If yes, what reason(s) do authorities give to abandon binding futility rule(s) or not to ‘like’ them?

— Regards, Detlew
Ben ★ 2019-03-30 10:52 @ d_labes Posting: # 20105
Dear Detlew,

sorry, I totally forgot about this post.

❝ Avoiding the conditional sample size re-estimation, i.e. using the conventional sample size re-estimation via interim.tsd.in(GMR1=0.89, CV1=0.2575165, n1=38, ssr.conditional = "no"), gives n2=4. Oops? Wow!

OK, again: the recommendation here is to stop due to futility, because the power of stage 1 is greater than the target power 80%. The result of n2 = 4 is correct in this situation. The reason is that (i) we calculate n2 based on GMR, which is 95%, and (ii) we are not using conditional error rates, i.e. we ignore the magnitude of the p-values from stage 1.

❝ ❝ Non-binding: the Type 1 error is protected even if the futility criterion is ignored.
❝ That was also my thought, because I didn’t find any relationship to a futility rule in the proof of alpha control in the paper of Maurer et al. Or do I err here?

You are correct.

❝ ❝ Binding: the Type 1 error is protected only if the futility criterion is adhered to. (‘Binding’ is not common practice; authorities don’t want this.)
❝ Are you sure for the binding case?

I believe so, yes.

❝ Of course the power may be compromised.

Agreed!

❝ I think that your statement for the binding case is only valid if you make a further adaptation of the local alphas / critical values taking the futility rule into consideration.

No, I don’t think that a further adaptation needs to be made. It should be mentioned in e.g. the book of Wassmer and Brannath. I will check this when I have more time.

❝ Do you have any experience supporting your statement that ‘binding’ is not common practice and authorities don’t want it? If yes, what reason(s) do authorities give to abandon binding futility rule(s) or not to ‘like’ them?

It should also be mentioned in the book, but I haven’t checked. I learned it in a workshop. I think the problem is that people may not believe that you will always adhere to the stopping rule.

Best regards, Ben.