Helmut Hero Vienna, Austria, 2015-11-27 19:05 Posting: # 15680 Views: 7,352 

Dear all, I received a question and suggested that the sender register at the forum, which he didn’t do. However, I think that the question is interesting and I want to get your opinions. The study is planned for a US-FDA submission. “BE study will be initiated with dosing for 50% subjects of protocol and samples will be analysed; if results with 50% subjects show bioequivalence, data will be submitted to regulatory. If results are not bioequivalent, study will be continued with dosing for remaining 50% subjects and samples will be analysed; the results with all subjects (100%) will be evaluated for BE and results show bioequivalence, data will be submitted to regulatory.” OK, smells of a “classical” group-sequential design (GSD) with one interim at N/2. The best-guess CV is around 40% and the expected GMR 0.95¹:
1. GSD: No inflation of the TIE if we use Pocock’s approach with Lan/DeMets α-spending. Power is pretty high and drops below 80% only for CV > 46%.
2. ‘Type 1’ TSD: No inflation of the TIE. Power in the first stage is similar to the GSD’s (since the alphas are similar). Overall power is more consistent and doesn’t drop below the target 80%.
Now my questions (especially @Ben). If the CV is lower than the ‘best guess’, in the GSD we have to go full throttle with another 50 subjects. Compare the column “2nd %”, which gives the chance to proceed to the 2nd part. Not only is the chance higher in the GSD, we are also punished with another 50 subjects. Have a look at the TSD’s column “E[N]”, giving the expected average total sample size. Much lower. Sure, sometimes we need just a few more subjects and not another 50. Only for high CVs do the TSDs approach the GSDs. Nice side effect: if we start the TSD with 75% of the fixed sample design’s n, on average the total sample size will be even (slightly) lower (64 < 66). Given all that: why should one use a GSD instead of a TSD?
— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
ElMaestro Hero Denmark, 2015-11-27 19:54 @ Helmut Posting: # 15681 Views: 6,338 

Hi Hötzi, » Given all that: why should one use a GSD instead of a TSD? It is a great question, and I will not offer a definitive answer, but I will volunteer an opinion. You can look at it this way: a GSD is a kind of TSD where you make assumptions about both the GMR and the CV (you use the anticipated ones, not the observed ones) when you transit from stage 1 to stage 2. That anticipated CV + GMR pair is exactly the (or a) combo that naïvely doubles the sample size. Simple, but extremely rigid. Thereby the GSDs should be considered a relic from bygone ages, when computers were not fast enough to allow simulations to achieve what Potvin et al. have done. — I could be wrong, but… Best regards, ElMaestro — since June 2017 having an affair with the bootstrap. 
d_labes Hero Berlin, Germany, 2015-11-30 11:15 @ Helmut Posting: # 15684 Views: 6,133 

Dear Helmut, » Now my questions (especially @Ben). If the CV is lower than the ‘best guess’, in the GSD we have to go full throttle with another 50 subjects. Compare the column “2nd %”, which gives the chance to proceed to the 2nd part. Not only is the chance higher in the GSD, we are also punished with another 50 subjects. Have a look at the TSD’s column “E[N]”, giving the expected average total sample size. Much lower. Much lower than what? Your presentation of the GSD results is a little bit unfair. It seems that the expected N is 100. But that’s not true:
E[N] = 71.3 for CV = 40% and n1 = n2 = 50. IMHO not that much higher compared to 64 for the adaptive TSD. The fact itself remains: E[N]_{GSD} > E[N]_{TSD}, at least for this example. — Regards, Detlew 
Helmut Hero Vienna, Austria, 2015-12-01 16:35 @ d_labes Posting: # 15685 Views: 5,997 

Dear Detlew, » » […] Much lower. » Much lower than what? The TSD’s E[N] compared to the GSD’s N. » Your presentation of the GSD results is a little bit unfair. It seems that the expected N is 100. I see. » But that’s not true: […] You are absolutely right. As (almost) always. THX! The line
res[j, 7] <- sprintf("%.0f", max(cum))
should be replaced by
res[j, 7] <- sprintf("%.1f", (1 - tmp1$pct_s2/100)*n[1] + (tmp1$pct_s2/100)*max(cum))
E[N]
───────────────────────
expected N    GSD    TSD
───────────────────────
 50          71.3   63.7
100          84.6   99.2
───────────────────────
Interesting.
d_labes Hero Berlin, Germany, 2015-12-03 09:16 @ Helmut Posting: # 15691 Views: 5,878 

Dear Helmut, » The line
» res[j, 7] <- sprintf("%.0f", max(cum))
» should be replaced by
» res[j, 7] <- sprintf("%.1f", (1 - tmp1$pct_s2/100)*n[1] +
» (tmp1$pct_s2/100)*max(cum))
» E[N]
» ───────────
» Interesting.
What’s your expected N? BTW: ?cumsum. — Regards, Detlew 
Helmut Hero Vienna, Austria, 2015-12-03 13:10 @ d_labes Posting: # 15693 Views: 5,917 

Dear Detlew, » » E[N]
» » ───────────────────────
» » expected N    GSD    TSD
» » ───────────────────────
» »  50          71.3   63.7
» » 100          84.6   99.2
» » ───────────────────────
» » Interesting.
» What’s your expected N?
My interpretation of the question in the first post was 50. If I understood your post correctly, this was a misinterpretation and it should be 100. » BTW: ?cumsum. I didn’t know that! Hence, my loop. Replace
— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 2015-12-03 13:56 @ Helmut Posting: # 15694 Views: 5,791 

Dear Helmut, it seems I have spoken Swahili. What I meant was: you perform stage 1 with n1. Only if necessary do you perform stage 2 with n2. Thus N(total) isn’t always n1 + n2 (= 100 in your example). The ‘expected’ total sample size, aka the ‘mean’ sample size, aka the ASN, can then be calculated via my formula given above. Whether it is reasonable to calculate a mean for a variable with only 2 values is left to you. That’s the reason why power.2stage.GS() doesn’t return components concerning the sample-size ‘distribution’, unlike the other power.2stage.whatever() functions. Hope my English is now better Swahili. — Regards, Detlew 
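[Editor’s illustration] Detlew’s formula in words translates to one line: with probability 1 − p₂ the trial stops after stage 1 (n1 subjects), otherwise it uses n1 + n2. A minimal Python sketch; the 42.6% stage-2 probability is a hypothetical value, back-calculated here only so the example reproduces the E[N] = 71.3 quoted in this thread:

```python
def expected_n(n1: int, n2: int, p_stage2: float) -> float:
    """ASN ('expected' total sample size) of a two-part design:
    stop after stage 1 (n1 subjects) with probability 1 - p_stage2,
    otherwise continue and end up with n1 + n2 subjects."""
    return (1 - p_stage2) * n1 + p_stage2 * (n1 + n2)

# n1 = n2 = 50; an assumed ~42.6% chance of proceeding to stage 2
# (hypothetical, chosen to match the thread's 71.3 for CV = 40%):
print(round(expected_n(50, 50, 0.426), 1))  # 71.3
```

Whether averaging a variable that takes only the two values n1 and n1 + n2 is meaningful is, as Detlew says, left to the reader.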
Ben Regular 2015-12-02 19:27 @ Helmut Posting: # 15687 Views: 6,018 

Dear Helmut / All, You raised an interesting question, and yes, the TSD from Potvin et al. appears to have astonishing design features. The classical GSD or the adaptive two-stage design according to the inverse-normal method rely on a formal statistical framework: mathematical theorems including proofs are available on why they work, what properties they have and how they should be applied. This is nice. For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic, with further elaboration, can be found for example in the article by Kieser and Rauch (2015). » In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design. Reference? Some software packages give an inflation factor that helps determine the study size… Anyhow, I think such a rule of thumb is too strict and inflexible. Consider for example two alternative scenarios:
● Pre-planned n1 = 52 and final N = 78 (i.e., n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high.
● Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.
Therefore, I think the GSD has some charm and can be useful in situations with uncertainty. Moreover, the advantage is that we do not have to rely on only simulation results from certain parameter settings.
Best regards, Ben
Ref: Kieser M, Rauch G. Two-stage designs for crossover bioequivalence trials. Stat Med. 2015. doi:10.1002/sim.6487 (Epub ahead of print, 24 March 2015). 
Helmut Hero Vienna, Austria, 2015-12-03 03:11 @ Ben Posting: # 15689 Views: 5,977 

Dear Ben et alii, » […] the TSD from Potvin et al. appears to have astonishing design features. The classical GSD or the adaptive two-stage design according to the inverse-normal method rely on a formal statistical framework: mathematical theorems including proofs are available on why they work, what properties they have and how they should be applied. This is nice. For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic, with further elaboration, can be found for example in the article by Kieser and Rauch (2015). I agree that the frameworks of Potvin etc. are purely empirical. To show whether a given α maintains the TIE for a desired range of n1/CV and target power takes 30 minutes in Power2Stage. I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof. IMHO, it smells more of a claim. At least Gernot Wassmer told me that it is not that easy. What I don’t understand in GSDs (lacking experience): how do you arrive at N? Is Detlew right when he said that this is the expected sample size? » » In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design. » Reference? Some software packages give an inflation factor that helps determine the study size… Anyhow, I think such a rule of thumb is too strict and inflexible. See the discussion in my review, Table 3 in the Supplementary Material, and the R code at the end.
» Consider for example two alternative scenarios: » ● Preplanned n1 = 52 and final N = 78 (i.e. n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high. Hhm. See the code at the end. I tried to implement your suggestions.
Type I Error? » ● Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.
Well… Your example would translate to a fixed sample design with GMR 0.95, CV ~44%, and target power 0.8. So the only purpose of the interim is hoping for a lucky punch (i.e., ASN 64)? If the CV is just a little bit higher (50%), power is unacceptable. » Therefore, I think the GSD has some charm and can be useful in situations with uncertainty. If (if!) you have some clue about the variability. » Moreover, the advantage is that we do not have to rely on only simulation results from certain parameter settings. 30 minutes. I will again chew on the email conversation we had last April. R codes: 1. Find n1 for TSDs based on a ‘best guess’ CV.
2. Comparison of GSD and TSD
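[Editor’s illustration] Helmut’s two R scripts are not reproduced in this archived copy. As an independent sketch of what the “30 minutes in Power2Stage” claim above refers to, here is a from-scratch Python simulation of the empirical type I error of a Potvin-‘B’-like TSD, evaluated at the BE limit (true GMR = 1.25). Everything here is a simplified stand-in, not Power2Stage’s or Potvin’s exact algorithm: the power check uses a crude normal approximation instead of the exact t-based power, the final analysis naively pools the two stages without a stage term (only the final degrees of freedom, N − 3, mimic the stage term), and all function names are the editor’s own:

```python
import math
import numpy as np
from scipy import stats

LN_LO, LN_HI = math.log(0.8), math.log(1.25)     # log-scale BE limits

def norm_cdf(x):
    # fast scalar standard-normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

_tcrit = {}
def tcrit(df, alpha):
    # cache t quantiles; df varies with the re-estimated sample size
    if (df, alpha) not in _tcrit:
        _tcrit[(df, alpha)] = stats.t.ppf(1 - alpha, df)
    return _tcrit[(df, alpha)]

def tost_pass(pe, se, df, alpha):
    # two one-sided tests against ln(0.8) and ln(1.25)
    t = tcrit(df, alpha)
    return (pe - LN_LO) / se > t and (LN_HI - pe) / se > t

def approx_power(cv, n, gmr, z_alpha):
    # crude normal-approximation power of TOST in a 2x2 crossover
    se = math.sqrt(2.0 * math.log(1.0 + cv * cv) / n)
    return norm_cdf((LN_HI - abs(math.log(gmr))) / se - z_alpha)

def sample_size(cv, gmr, z_alpha, target=0.80, nmax=200):
    # smallest (even) n reaching the target power, capped at nmax
    n = 12
    while n < nmax and approx_power(cv, n, gmr, z_alpha) < target:
        n += 2
    return n

def simulate_tie(n1, cv, alpha=0.0294, nsims=10_000, seed=1234):
    """Empirical TIE of a Potvin-'B'-like TSD at true GMR = 1.25."""
    rng = np.random.default_rng(seed)
    z_alpha = stats.norm.ppf(1 - alpha)
    sigma = math.sqrt(math.log(1.0 + cv * cv))   # within-subject SD (log scale)
    mu = LN_HI                                   # true ratio sits on the BE limit
    passed = 0
    for _ in range(nsims):
        df1 = n1 - 2
        mse1 = sigma**2 * rng.chisquare(df1) / df1
        pe1 = rng.normal(mu, sigma * math.sqrt(2.0 / n1))
        if tost_pass(pe1, math.sqrt(2.0 * mse1 / n1), df1, alpha):
            passed += 1
            continue
        cv1 = math.sqrt(math.exp(mse1) - 1.0)    # observed CV after stage 1
        if approx_power(cv1, n1, 0.95, z_alpha) >= 0.80:
            continue                             # powered but not BE: stop, fail
        n_tot = max(sample_size(cv1, 0.95, z_alpha), n1 + 4)
        n2 = n_tot - n1
        mse2 = sigma**2 * rng.chisquare(n2 - 2) / (n2 - 2)
        pe2 = rng.normal(mu, sigma * math.sqrt(2.0 / n2))
        pe = (n1 * pe1 + n2 * pe2) / n_tot                    # naive pooling
        mse = (df1 * mse1 + (n2 - 2) * mse2) / (df1 + n2 - 2)
        if tost_pass(pe, math.sqrt(2.0 * mse / n_tot), n_tot - 3, alpha):
            passed += 1
    return passed / nsims

tie = simulate_tie(n1=50, cv=0.40)
print(tie)
```

A single n1/CV combination runs in seconds, which supports the point that checking TIE control for a grid of scenarios is a matter of minutes, not a reason to avoid simulation-based designs.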
— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 2015-12-03 09:47 (edited by d_labes on 2015-12-03 16:19) @ Helmut Posting: # 15692 Views: 5,877 

Dear Helmut, dear Ben! Two-sided or not two-sided, that is the question! library(ldbounds) gives us:
[1] 0.03100573 0.02774015
# one-sided gives us:
[1] 0.03100573 0.02972542
Simsalabim, Ben’s preferred values. I personally opt for two-sided. BTW: the Lan/DeMets spending function is Pocock-like. Nearer to the original Pocock are the means of the critical values. Try
2*(1 - pnorm(rep(mean(bds2.poc$upper.bounds), 2)))
Simsalabim, Pocock’s natural constant! Nearly.
(1 - pnorm(rep(mean(bds1.poc$upper.bounds), 2)))
Hokus pokus fidibus, Ben’s magical number! — Regards, Detlew 
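[Editor’s illustration] The arithmetic behind the two “magic” numbers needs nothing more than a normal tail probability. In this Python sketch, 1.875 is the one-sided Pocock constant for K = 2 quoted later in the thread (Jennison/Turnbull; Gould’s Table I gives 1.8753), while 2.178 is the commonly tabulated two-sided counterpart for α = 0.05 (an editorial addition, not a number from the thread):

```python
from scipy.stats import norm

c_two_sided = 2.178   # tabulated two-sided Pocock constant, K = 2, alpha = 0.05
c_one_sided = 1.875   # one-sided Pocock constant, K = 2 (Jennison/Turnbull)

alpha_potvin = 2 * norm.sf(c_two_sided)  # ~0.0294, "Pocock's natural constant"
alpha_ben = norm.sf(c_one_sided)         # ~0.0304, "Ben's magical number"

print(round(alpha_potvin, 4))  # 0.0294
print(round(alpha_ben, 4))     # 0.0304
```

So the whole one- vs. two-sided debate reduces to whether the critical value is converted to a nominal α with one tail or two.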
Helmut Hero Vienna, Austria, 2015-12-03 14:56 @ d_labes Posting: # 15695 Views: 5,868 

Dear Detlew & Ben, » Two-sided or not two-sided, that is the question! Yessir! » 2*(1 - pnorm(rep(mean(bds2.poc$upper.bounds), 2))) » Simsalabim, Pocock’s natural constant!
‘Exact’
In chapter 12, Jones/Kenward (in the context of blinded sample-size re-estimation) report an inflation of the TIE. The degree of inflation depends on the timing of the interim (the earlier, the worse). They state: “In the presence of Type I error rate inflation, the value of α used in the TOST must be reduced, so that the achieved Type I error rate is no larger than 0.05.” (my emphasis) They recommend an iterative algorithm [sic] by Golkowski et al.³ and conclude: “[…] before using any of the methods […], their operating characteristics should be evaluated for a range of values of n_{1}, CV and true ratio of means that are of interest, in order to decide if the Type I error rate is controlled, the power is adequate and the potential maximum total sample size is not too great.” Given all that, I’m not sure whether the discussion of proofs, exact values, etc. makes sense at all. This wonderful stuff is based solely on normal theory, and I’m getting bored by reading the phrase “when N is sufficiently large” below a series of fancy formulas. Unless someone comes up with a proof for small samples (many have tried, all have failed so far) I’d rather stick to simulations.
— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 2015-12-03 16:15 @ Helmut Posting: # 15696 Views: 5,750 

Dear Helmut, » … I think that Kieser/Rauch are correct in their lament about one- vs. two-sided Pocock limits. They argue for 0.0304 (which Jones/Kenward² used in chapter 13 as well). Jennison/Turnbull give C_P (K = 2, α = 0.10) = 1.875: » rep(1 - pnorm(1.875), 2) » [1] 0.03039636 0.03039636 I have another one: Gould AL. Group Sequential Extensions of a Standard Bioequivalence Testing Procedure. J Pharmacokinet Biopharm. 1995;23(1). Table I: critical value for n1 = n2: 1.8753. Seems I have to change my personal preference stated in my post above. That means, on the other hand: Potvin and consorts were much luckier than they should have been. That’s great. — Regards, Detlew 
Helmut Hero Vienna, Austria, 2015-12-03 16:26 @ d_labes Posting: # 15697 Views: 5,789 

Dear Detlew, » I have another one: » Gould AL. Group Sequential Extensions of a Standard Bioequivalence Testing Procedure. J Pharmacokinet Biopharm. 1995;23(1). » Table I: critical value for n1 = n2: 1.8753 How could I forget Mr Gould? He was the first to explore this stuff for BE studies.
» That means, on the other hand: Potvin and consorts were much luckier than they should have been. Yep. Lucky punch. — All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
Ben Regular 2016-01-10 12:43 @ Helmut Posting: # 15808 Views: 3,580 

Dear Helmut / All, » I agree that the frameworks of Potvin etc. are purely empirical. To show whether a given α maintains the TIE for a desired range of n_{1}/CV and target power takes 30 minutes in Power2Stage. Well, yes, but this is again only empirical. » I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof. I actually meant the discussion on the decision scheme and the properties from Potvin et al. (not mathematical theorems and proofs; there are in fact none). » What I don’t understand in GSDs (lacking experience): How do you arrive at N? Is Detlew right when he said that this is the expected sample size? You can use the sample size from a fixed design and adapt it based on an inflation factor. Addplan, for example, provides such values (one should, however, keep in mind that everything in Addplan is based on the normal approximation). Of course, no one keeps you from further playing around and checking some design properties (for example the resulting average sample size). A good idea may be to focus on a realistic best guess for the interim CV and so determine n1, and to cover a bad-CV scenario via the second-stage n2. » Your example would translate to a fixed sample design with GMR 0.95, CV ~44%, and target power 0.8. So the only purpose of the interim is hoping for a lucky punch (i.e., ASN 64)? If the CV is just a little bit higher (50%), power is unacceptable. In your case the CV is already pretty high, and maybe the design properties do not behave so well in those regions? I have not investigated this thoroughly... » If (if!) you have some clue about the variability. Yes, but when is this not the case? You would not conduct a confirmatory BE study without having performed other PK studies with that substance, would you? You will always have a first-in-man trial and some bioavailability trials (or historical trials from a comparator). 
Regarding the boundaries that Detlew mentioned: they should be based on one-sided bounds. Using two-sided bounds directly can mess things up. Best regards, Ben 
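[Editor’s illustration] Ben’s “fixed design plus inflation factor” recipe can be sketched in a few lines. This is a hedged illustration: the sample-size formula is the usual normal approximation he cautions about (exact t-based methods give 66 rather than 64 for CV = 40%, per the thread), and the inflation factor 1.11 is an assumed, commonly tabulated value for a Pocock design with K = 2, not a number reported by Addplan here:

```python
import math
from scipy.stats import norm

def fixed_n(cv, gmr, alpha=0.05, power=0.80):
    """Normal-approximation total sample size of a 2x2 crossover BE study."""
    sigma2 = math.log(1.0 + cv * cv)                 # log-scale variance from CV
    delta = math.log(1.25) - abs(math.log(gmr))      # distance to the closer BE limit
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return 2 * math.ceil(sigma2 * z * z / delta**2)  # round up to a balanced (even) n

n_fixed = fixed_n(0.40, 0.95)    # ~64 by this approximation
inflation = 1.11                 # assumed Pocock (K = 2) inflation factor
n_gsd = 2 * math.ceil(n_fixed * inflation / 2)
print(n_fixed, n_gsd)            # 64 72
```

The inflated total is then split across the two looks (here 36 + 36, close to Ben’s n1 = 48, n2 = 48 scenario in spirit); the maximum N of a GSD is therefore larger than the fixed design’s, while the ASN can be smaller, which is the trade-off debated throughout this thread.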