Helmut (Vienna, Austria), 2018-04-21 17:17, Posting #18714

Dear all,

in version 0.5-1 of package Power2Stage exact methods^{1,2,3} are implemented, after months of struggling (many THX to Ben). The methods are extremely flexible (arbitrary BE limits and target power, futility criteria on the PE, its CI, and the maximum total sample size, adapting for the PE of stage 1).

I’ve heard in the past that regulatory statisticians in the EU prefer methods which strictly control the Type I Error (however, at the 3^{rd} GBHI conference in Amsterdam last week it was clear that methods based on simulations are perfectly fine for the FDA) and that the inverse normal method with repeated confidence intervals would be the method of choice. Well roared, lion; I wasn’t aware of software which can do this job. That’s like saying “Fly to Mars but you are not allowed to use a rocket!” What else? Levitation? Witchcraft? Obtaining two p-values (like in TOST) is fairly easy, but converting them into a confidence interval (as required in all guidelines) is not trivial. Although we showed this approach^{4} a while ago, nothing was published in a peer-reviewed journal until very recently.

Although we now have a method which was demonstrated to control the TIE, I was curious how it performs in simulations (just to put it into perspective). R code at the end of the post (with small step sizes of CV and n_{1} expect runtimes of some hours; in large simulations I don’t recommend pmethod="exact", which is about 10 times slower than pmethod="nct"). See the documentation of function power.tsd.in() about how to set futility criteria and make it fully adaptive. As usual in the latter case say goodbye to power…

I explored the following scenarios (maximum combination test): [scenario table not preserved]
Plots of the first four scenarios: [plots not preserved]

When exploring the details it is also clear that the exact method keeps the desired power better than the simulation methods in extreme cases. Power of scenario #5 (a) and modifications: [table not preserved]

R code: [not preserved]
In memory of Willi Maurer, Dr. sc. math. ETH

Cheers,
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes
Helmut (Vienna, Austria), 2018-04-21 20:33, in reply to Helmut, Posting #18715

Dear all,

answering my own post in order to keep it short. An example follows. We have a guesstimate of the CV (0.20), assume a GMR of 0.95, and aim at power 0.80. No futility criteria. Some regulatory statisticians told me that they prefer a first stage sized as for a fixed-sample design (i.e., the second stage is solely a ‘safety net’).
In the method the weights have to be pre-specified, stated in the SAP, and used throughout subsequent steps (irrespective of the re-estimated n_{2}). In the fixed-sample design we would need 20 subjects. How to set the weights? An intuitive way is to use the median (20) of the total sample size based on simulations. This would give us weights of [1, 0]. Great. But weights have to be >0 and <1. Hence, I tweaked them a little to [0.999999, 0.000001]. What can we expect if we run the study with n_{1} = 20?
Fine. If everything turns out as expected, we would have to be unlucky to need a second stage. Power in the first stage is already 0.73 and stage 2 sample sizes are not shocking. As is common in TSDs, the overall power is generally higher than in a fixed-sample design. We perform the first stage and get GMR 0.91 and CV 0.25. Oops! Both are worse than assumed. Especially the GMR is painful.
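For orientation, here is a minimal base-R sketch of the two one-sided tests applied to these stage 1 results. It assumes a balanced 2×2×2 crossover analysed on the log scale and the conventional one-sided α of 0.05. The helper tost_stage() is hypothetical and not part of Power2Stage. The repeated confidence interval of the inverse normal method uses an adjusted level, so its limits are wider than the conventional 90% CI computed here.

```r
# Hypothetical helper (not part of Power2Stage): TOST p-values and the
# conventional 90% CI from the summary statistics of one stage.
# Assumes a balanced 2x2x2 crossover analysed on the log scale.
tost_stage <- function(pe, cv, n, theta1 = 0.80, theta2 = 1.25, alpha = 0.05) {
  mse <- log(cv^2 + 1)          # within-subject variance derived from the CV
  se  <- sqrt(2 * mse / n)      # standard error of the log-difference
  df  <- n - 2                  # residual degrees of freedom
  t1  <- (log(pe) - log(theta1)) / se
  t2  <- (log(pe) - log(theta2)) / se
  p1  <- pt(t1, df, lower.tail = FALSE)  # H01: GMR <= theta1
  p2  <- pt(t2, df, lower.tail = TRUE)   # H02: GMR >= theta2
  ci  <- exp(log(pe) + c(-1, 1) * qt(1 - alpha, df) * se)
  list(p1 = p1, p2 = p2, ci = ci, df = df)
}

# Stage 1 results quoted in the post: GMR 0.91, CV 0.25, n1 = 20
stage1 <- tost_stage(pe = 0.91, cv = 0.25, n = 20)
print(stage1)
```

With these inputs the first null hypothesis is not rejected at the 0.05 level, which is in line with the failed first stage described in the post.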
We fail to show BE (lower CL 77.31%) and should initiate the second stage with 24 subjects. How would a ‘Type 1’ TSD perform?
Pretty similar though a lower n_{2} is suggested. OK, we perform the second stage and get GMR 0.93 and CV 0.21. Both are slightly better than what we got in the first stage but again worse than assumed.
We survived. In a ‘Type 1’ TSD we would get:
Pretty similar again. If we state it in the protocol, we could also aim for higher power in the second stage if the GMR in the first doesn’t look nice. If we switch to 0.90 we would run the second stage with 36 subjects.
Helps. Another option would be to adjust for GMR1 by using the argument usePE=TRUE in interim.tsd.in(). For power 0.80 that would mean 40 subjects in the second stage and for 0.90 already 62…

Cheers,
Helmut Schütz
ElMaestro (Denmark), 2018-04-21 20:49, in reply to Helmut, Posting #18716

Hi Hötzi,

thank you for this post. What is "the inverse normal method with repeated confidence intervals"?

“A ten-year, double-blind study from the Mayo Clinic concluded that even in late stages of dementia, the last to go is the lobe of the brain in charge of cafeteria layout.” (Serge Storms/Tim Dorsey)

Best regards,
ElMaestro
Bootstrapping is a relatively new hobby of mine. I am only 30 years late to the party.
Helmut (Vienna, Austria), 2018-04-21 21:41, in reply to ElMaestro, Posting #18717

Hi ElMaestro,

flow chart (futility of the CI, unrestricted total sample size): [image not preserved]

Details: [not preserved]

Cheers,
Helmut Schütz
mittyri (Russia), 2018-04-28 15:54 (edited by mittyri on 2018-04-28 16:08), in reply to Helmut, Posting #18737

Hi Helmut,

sorry for the naive questions raised by my hazelnut brain.

1. I'm trying to compare the old function
power.tsd(method = c("B", "C", "B0"), alpha0 = 0.05, alpha = c(0.0294, 0.0294), …
with the new one
power.tsd.in(alpha, weight, max.comb.test = TRUE, n1, CV, targetpower = 0.8, …
The old function was nice since the user can choose the method or specify 3 alphas. For the new one I see this comment regarding alpha:
"If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively. If missing, defaults to 0.05."
What about alpha0 for method C? Is it deprecated?

2. Why did you decide to include the CI futility rule by default?

3. Regarding your flowchart: isn't it possible that we get some value lower than 4? For example, power.tsd.in(CV=0.13, n1=12) and after the first stage CV=15%, CI=[0.7991897, 1.0361745]:
sampleN2.TOST(CV=0.15, n1=12)

4. Is it possible to update the docs attached to the library?

5. I was confused by "2stage" being 'aliased' with "tsd" and spent some time looking for differences. Are there any reasons to duplicate these functions?

PS regarding the 3rd point: I tried
interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15, n1=12, fCrit="No")
Oh, there's a default argument min.n2 = 4. OK, let's try to change that:
> interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15, n1=12, fCrit="No", min.n2 = 2)
Why couldn't I select a smaller one?

Kind regards,
Mittyri
Helmut (Vienna, Austria), 2018-04-28 17:29, in reply to mittyri, Posting #18738

Hi Mittyri,

I’m in a hurry, so I’m answering only part of your questions (leaving the others to Detlew or Ben).

» 2. Why did you decide to include CI futility rule by default?

This applies only to the x.tsd.in functions (to be in accordance with the paper of Maurer et al.).

» 3. Regarding your flowchart:
» isn't it possible that we get some value lower than 4?
» for example and after first stage CV=15%, CI=[0.7991897 1.0361745]:
» sampleN2.TOST(CV=0.15, n1=12)
» Design alpha CV theta0 theta1 theta2 n1 Sample size

sampleN2.TOST() is intended for the other methods, where the stages are pooled at the end. In the inverse normal method the stages are evaluated separately (PE and MSE from the ANOVAs of each stage). If you have fewer than 4 subjects in the second stage you will run out of steam (too few degrees of freedom). Well, 3 would work, but…

» 5. I was confused with "2stage" 'aliased' with "tsd" and was looking for differences some time
» Are there any reasons to double that functions?

Since this is a 0.x release, according to CRAN’s policy we can rename functions or even remove them without further notice. We decided to unify the function names. In order not to break existing code we introduced the aliases. In the next release the functions x.2stage.x() will be removed and only their counterparts x.tsd.x() kept.

» PS:
» regarding 3rd point:
» I tried
» interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No")
» […]
» oh, there's a default argument min.n2 = 4
» OK, let's try to change that:
» interim.tsd.in(GMR1=sqrt(0.7991897 * 1.0361745), CV1=0.15,n1=12, fCrit="No", min.n2 = 2)
» Error in interim.tsd.in(GMR1 = sqrt(0.7991897 * 1.0361745), CV1 = 0.15, :
» Why couldn't I select a smaller one?

See above. It doesn’t make sense with zero degrees of freedom (n_{2}=2).

Cheers,
Helmut Schütz
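The degrees-of-freedom argument in the reply above can be made concrete with a small base-R sketch. It assumes that each stage is a 2×2×2 crossover evaluated on its own, so that the residual degrees of freedom are n - 2. The helper stage_df() is hypothetical and only serves as an illustration.

```r
# Residual degrees of freedom of a stage analysed on its own,
# assuming a 2x2x2 crossover (df = n - 2). Hypothetical helper.
stage_df <- function(n) n - 2

n2 <- 2:6
df <- stage_df(n2)

# One-sided 5% t-quantile; it is only defined for df >= 1,
# so n2 = 2 (df = 0) gets NA.
tq <- ifelse(df >= 1, qt(0.95, pmax(df, 1)), NA_real_)

print(data.frame(n2 = n2, df = df, t_quantile = round(tq, 3)))
```

With n_{2} = 2 no quantile exists at all, and with n_{2} = 3 (one degree of freedom) the quantile is so large that any interval built from stage 2 alone becomes very wide, which fits the remark that 3 "would work, but…".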
d_labes (Berlin, Germany), 2018-04-29 21:11, in reply to mittyri, Posting #18741

Dear Michael,

just my two cents.

» 1. I'm trying to compare the old function
» ...
» So the old function was nice since the user can choose the method or specify 3 alphas.
» In the new one I see the comment regarding alpha
» If one element is given, the overall one-sided significance level. If two elements are given, the adjusted one-sided alpha levels for stage 1 and stage 2, respectively.
» If missing, defaults to 0.05.
» What about alpha0 for method C? Is it deprecated?

Sorry for the confusion, but you definitely have to study the references (start with ^{1)}) to get a clue what's going on with the new function(s) implementing a new method for evaluating TSDs. New in the sense that it was not implemented in Power2Stage and has not been applied in the evaluation of TSDs up to now. It is by no means a method adding to or amending the Potvin methods. It is a new method with a different philosophy behind it.

This method, which combines the p-values obtained by applying the TOST to the data of the two stages separately, is said to control the TIE rate at <=0.05, regardless of what design changes are made at the interim, e.g. re-estimation of the sample size at the interim analysis. And this is shown not by simulations but in theory, by proof. A feature which is demanded by EMA statisticians. Do you remember the statement "Potvin's methods are not valid / acceptable in Europe"? Except Russia, which is at least to some extent also in Europe, IIRC...

» 2. Why did you decide to include CI futility rule by default?

See Helmut's answer. Maurer et al. included a CI futility rule in their paper, and it is our practice to set defaults according to the (first) paper(s) describing TSD evaluation methods. OK, that may be suboptimal when comparing methods, since you always have to remember the defaults and how they differ between functions. But... the recalculation or verification of results comes first. And my laziness calls for defaults resembling the details in the paper(s) after which a function in Power2Stage was implemented.

» 3. Regarding your flowchart:
» isn't it possible that we get some value lower than 4?
» ...

See Helmut's answer. Since min.n2<4 doesn't make sense, it is restricted to >=4, as described in the Maurer et al. paper.

» 4. Is it possible to update the docs attached to the library?

Not quite clear to me what we should update. Could you please elaborate?

» 5. I was confused with "2stage" 'aliased' with "tsd" and was looking for differences some time
» Are there any reasons to double that functions?

The real reason behind this change is laziness of mine (sic!). It saves me 3(!) keystrokes. Believe it or not...

Don't hesitate to ask more "naive" questions. All of us here, not least me, are naive with respect to this new method of evaluating TSDs. If you feel more comfortable, ask me or Helmut or Ben privately, i.e. write to the maintainer of Power2Stage.

^{1)} Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018;37(10):1–21. doi:10.1002/sim.7614.

Drop me a Mehl if you need that sheet of paper.

Regards,
Detlew
mittyri (Russia), 2018-04-30 13:41, in reply to d_labes, Posting #18744

Dear Detlew, Dear Helmut,

I'm very sorry for that post; it looks like I'm out of touch with the current state of TSDs... OK, I need to review the paper, since it is certainly a new standard like Potvin's paper before.

» » 4. Is it possible to update the docs attached to the library?
»
» Not quite clear for me what we should update. Could you please elaborate?

I looked into Power2Stage/doc and found that it was last updated in Jan 2016.

Kind regards,
Mittyri
d_labes (Berlin, Germany), 2018-04-25 14:19, in reply to Helmut, Posting #18729

Dear Helmut,

great post. Only one remark about the weights you chose for the maximum combination test in your R code.

» ...
» for (j in seq_along(CV)) {

Defining the weights that way is IMHO not what you intended. Or I don't understand what you intended.

It is correct if you think in terms of the standard combination test and think further that you have to specify two weights for it. But since the two weights are connected as w, 1-w, the second is calculated automatically within the function power.tsd.in(). You only need to define w[1] in the input argument.

The idea behind the maximum combination test is: if our first pair of weights w, 1-w (chosen anyhow) is not "optimal", choose a second pair of weights w*, 1-w* which is better adapted to the real situation. If you were too optimistic in your planning of n2, i.e. have chosen n2 too low compared to what really happens in the sample size adaptation, it would be wise to define w* lower than w.

You do that, but your choice (w in w[1]=0.999999, w* in w[2]=1e-6) is too extreme, I think, and, I further think, not what you intended. The second pair of weights w*=1e-6, 1-w*=0.999999 is for a situation where the p-values from the second stage almost exclusively determine the overall outcome of the maximum combination test. The p-values from the first stage data are down-weighted with w*=1e-6.

Hope this sermon is not too confusing.

BTW: Choosing the weights "optimally" is a mystery to me. To do that, we would have to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand, we have to predefine them to gain strict TIE control. Here the cat bites its own tail (a vicious circle).

Regards,
Detlew
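To illustrate the role of the two weight pairs discussed above, here is a minimal base-R sketch of the inverse normal combination and of the maximum of two such combinations. The function names inv_normal() and max_comb() and the p-values are made up for illustration and are not the Power2Stage implementation. In the BE setting the combination is applied to each of the two one-sided null hypotheses, and the critical value of the maximum test, which has to account for the correlation between the two statistics, is not computed here.

```r
# Inverse normal combination of two stage-wise one-sided p-values.
# w is the weight of stage 1; stage 2 gets 1 - w.
inv_normal <- function(p1, p2, w) {
  sqrt(w) * qnorm(1 - p1) + sqrt(1 - w) * qnorm(1 - p2)
}

# Maximum combination test statistic: the larger of the two combined
# statistics obtained with the pre-specified weights w and w_star.
max_comb <- function(p1, p2, w, w_star) {
  max(inv_normal(p1, p2, w), inv_normal(p1, p2, w_star))
}

p1 <- 0.06   # illustrative stage 1 p-value (made up)
p2 <- 0.01   # illustrative stage 2 p-value (made up)

z_first  <- inv_normal(p1, p2, w = 0.999999)  # essentially stage 1 alone
z_second <- inv_normal(p1, p2, w = 1e-6)      # essentially stage 2 alone
z_max    <- max_comb(p1, p2, w = 0.999999, w_star = 1e-6)
print(c(z_first = z_first, z_second = z_second, z_max = z_max))
```

With the extreme weights 0.999999 and 1e-6 the two combined statistics are practically the stage-wise statistics themselves, so the maximum test degenerates to "whichever stage looks better", which is the situation criticised in the post.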
Helmut (Vienna, Austria), 2018-04-26 09:51, in reply to d_labes, Posting #18733

Dear Detlew,

» Defining the weights that way is IMHO not what you intended.

OK, I see!

» BTW: Choosing the weights "optimal" is for me a mystery. To do that, we had to know the outcomes of the two stages, but we don't have them until the study has been done. On the other hand we have to predefine them to gain strict TIE control. Here the cat bites its own tail (a vicious circle).

Using the median of n.tot from the sim’s to define the weights was a (maybe too naïve) attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed-sample design. For some combinations of n_{1}/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?

Example: CV 0.1, GMR 0.95, target power 0.80. The fixed-sample design’s n is 8 (n_{1} ⇒ 12 acc. to GLs). The mean (n.mean) and the median of n.tot are 12 with the default weights (0.5, 0.25). Even the 95th percentile of n.tot is 12.

Cheers,
Helmut Schütz
d_labes (Berlin, Germany), 2018-04-26 20:02, in reply to Helmut, Posting #18734

Dear Helmut,

» ...
» Using the median of n.tot to define the weights from the sim’s was a – maybe too naïve – attempt. Other suggestions? Some regulatory statisticians prefer the first stage in a TSD to be like in a fixed sample design. For some combinations of n_{1}/CV in my grid this will be ≤ the median of n.tot. In other words, I’m not too optimistic but rather too pessimistic. Now what?

As I already said, I really dunno.

» Example: CV 0.1, GMR 0.95, target power 0.80. Fixed sample design’s n 8 (n_{1} ⇒ 12 acc. to GLs). n.mean and median of n.tot 12 with the default weights (0.5, 0.25). Even the 95% percentile of n.tot is 12.

If you were pessimistic, then in the spirit of the MCT it would be wise to choose the second pair of weights with a decreased value. Or do I err here ("real" n2 lower than the pessimistic one)? If I'm right, possible values could be: w=0.999, w*=0.5 (or something like that). Or do we stay with the standard combination test for that extreme case?

But to state it again: for me it is a mystery how to choose the weights. But I think it doesn't make much difference as long as we are not totally wrong with our chosen weights. As far as I have seen so far for a small number of examples: the power is influenced only to a "minor" extent. The TIE is controlled, whatever weights we choose.

Regards,
Detlew
d_labes (Berlin, Germany), 2018-05-09 13:53 (edited by d_labes on 2018-05-09 14:25), in reply to Helmut, Posting #18757

Dear Helmut,

I have tried to demystify some aspects of choosing w and w* for the maximum combination test by looking into some examples:

Take n_{fix} as sample size in stage 1 (Helmut’s proposal)
  Guess: CV=0.2, theta0=0.95 -> n_{fix} = 20
  Choose n1 = n_{fix} = 20, i.e. w = 0.99, since w has to be <1.
  Guess was too pessimistic: e.g. true CV=0.15 -> n_{fix} = 12 or theta0=0.975 -> n_{fix} = 16
    In both cases the sample size of stage 1 exceeds the necessary total sample size of a fixed design. Thus a more realistic w* can’t be defined, or it should be set to the same value as w. This results in the standard combination test.
  Guess was too optimistic: e.g. true CV=0.25 -> n_{fix} = 28 or theta0=0.925 -> n_{fix} = 26
    These lead to a ‘more realistic’ w* = 0.71 or 0.77, respectively. Let's choose w* = 0.7 for simplicity.
  Power & sample size of the scenarios: [table not preserved]

Take n_{fix}/2 as sample size in stage 1 (Maurer et al.)
  Guess: CV=0.2, theta0=0.95 -> n_{fix} = 20
  Choose n1 = n_{fix}/2 = 10, i.e. w = 0.5.
  Guess was too pessimistic: e.g. true CV=0.15 -> n_{fix} = 12 or theta0=0.975 -> n_{fix} = 16
    This would lead to a ‘more realistic’ w* = 0.83 or 0.625, respectively. Let's take w* = 0.7 for simplicity.
  Guess was too optimistic: e.g. true CV=0.25 -> n_{fix} = 28 or theta0=0.925 -> n_{fix} = 26
    These lead to a ‘more realistic’ w* = 0.36 or 0.38, respectively. Let's take w* = 0.4 for simplicity.
  Power & sample size of the scenarios: [table not preserved]

Confusion: [content not preserved]
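The ‘more realistic’ weights quoted above can be reproduced with a few lines of base R, under the assumption (inferred from the numbers in the post, not stated there) that a weight is the stage 1 sample size divided by the total sample size of the fixed design, capped below 1. The helper stage1_weight() and the cap of 0.99 are hypothetical.

```r
# Stage 1 weight taken as n1 / (total sample size of a fixed design),
# capped below 1 because the weights must lie strictly between 0 and 1.
# This rule is an assumption inferred from the numbers in the post.
stage1_weight <- function(n1, n_fix, w_max = 0.99) {
  pmin(n1 / n_fix, w_max)
}

# n_fix of the "true" scenarios as quoted in the post
n_fix_true <- c(CV_0.15 = 12, theta0_0.975 = 16, CV_0.25 = 28, theta0_0.925 = 26)

w_star_n1_20 <- stage1_weight(n1 = 20, n_fix = n_fix_true)
w_star_n1_10 <- stage1_weight(n1 = 10, n_fix = n_fix_true)

print(round(rbind(n1_20 = w_star_n1_20, n1_10 = w_star_n1_10), 3))
```

For n1 = 20 the two pessimistic scenarios hit the cap (w* equals w, i.e. the standard combination test), while the optimistic ones give about 0.71 and 0.77. For n1 = 10 the values are about 0.83, 0.625, 0.36 and 0.38, matching the post.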
Regards,
Detlew