Helmut ★★★ Vienna, Austria, 2015-11-27 20:05
Posting: # 15680
Dear all,

I received a question and suggested that the sender register at the forum, which he didn’t do. However, I think the question is interesting and I want to get your opinions. The study is planned for a USFDA submission.

“BE study will be initiated with dosing for 50% subjects of protocol and samples will be analysed; if results with 50% subjects show bioequivalence, data will be submitted to regulatory. If results are not bioequivalent, study will be continued with dosing for remaining 50% subjects and samples will be analysed; the results with all subjects (100%) will be evaluated for BE and if results show bioequivalence, data will be submitted to regulatory.”

OK, smells of a “classical” Group-Sequential Design (GSD) with one interim at N/2. The best-guess CV is around 40% and the expected GMR 0.95:

1. GSD: No inflation of the TIE if we use Pocock’s approach with Lan/DeMets α-spending. Power is pretty high and drops below 80% only for CV > 46%.
2. ‘Type 1’ TSD: No inflation of the TIE. Power in the first stage is similar to the GSD’s (since the alphas are similar). Overall power is more consistent and doesn’t drop below the target 80%.

Now my questions (especially @Ben). If the CV is lower than the ‘best guess’, in the GSD we have to go full throttle with another 50 subjects. Compare the column “2nd%”, which gives the chance to proceed to the 2nd part: not only is that chance higher in the GSD, we are also punished with another 50 subjects. Have a look at the TSD’s column “E[N]”, giving the expected average total sample size. Much lower. Sure, sometimes we need just a few more subjects and not another 50. Only for high CVs do the TSD’s values approach the GSD’s. Nice side effect: if we start the TSD with 75% of the fixed-sample design’s n, on average the total sample size will even be (slightly) lower (64 < 66).

Given all that: Why should one use a GSD instead of a TSD?
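Not the original code behind these numbers, but a minimal sketch of how both designs can be explored with Power2Stage; the Pocock-type alphas (0.0294) and the stage sizes (n1 = n2 = 50 for the GSD, n1 = 50 for the TSD, i.e., ~75% of the fixed design’s 66) are illustrative assumptions:

library(Power2Stage)
CV <- 0.40                                      # 'best guess' CV
# GSD with one interim at N/2:
GS <- power.2stage.GS(alpha = c(0.0294, 0.0294), n = c(50, 50),
                      CV = CV, theta0 = 0.95)
# 'Type 1' TSD (same adjusted alpha in both stages, i.e., Potvin 'Method B'):
TS <- power.2stage(method = "B", alpha = c(0.0294, 0.0294), n1 = 50,
                   GMR = 0.95, CV = CV, targetpower = 0.80)
c(GSD.power = GS$pBE, GSD.2nd.pct = GS$pct_s2,  # power, % proceeding to part 2
  TSD.power = TS$pBE, TSD.E.N = TS$nmean)       # power, expected total N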
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2015-11-27 20:54 @ Helmut
Posting: # 15681
Hi Hötzi,

❝ Given all that: Why should one use a GSD instead of a TSD?

It is a great question, and I will not offer a definitive answer, but I will volunteer an opinion. You can look at it this way: a GSD is a kind of TSD where you make assumptions about both the GMR and the CV (you use the anticipated ones, not the observed ones) when you transit from stage 1 to stage 2. That anticipated pair of CV + GMR is exactly the (or a) combo that naïvely doubles the sample size. Simple, but extremely rigid. Thereby GSDs should be considered a relic from bygone ages, when computers were not fast enough to allow simulations to achieve what Potvin et al. have done.

—
Pass or fail!
ElMaestro
d_labes ★★★ Berlin, Germany, 2015-11-30 12:15 @ Helmut
Posting: # 15684
Dear Helmut,

❝ Now my questions (especially @Ben). […] Have a look at the TSD’s column “E[N]”, giving the expected average total sample size. Much lower.

Much lower than what? Your presentation of the GSD results is a little bit unfair. It makes it seem that the expected N is 100. But that’s not true: you only dose the second group if the first part fails, thus

E[N] = n1 + P(2nd part) × n2

E[N] = 71.3 for CV = 40% and n1 = n2 = 50. IMHO, not that much higher compared to 64 for the adaptive TSD.
The fact itself remains: E[N] of the GSD > E[N] of the TSD, at least for this example.

—
Regards, Detlew
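A quick check of the formula with the numbers above (P(2nd part) back-calculated from E[N] = 71.3):

n1 <- 50; n2 <- 50
p2 <- 0.426                  # chance to proceed to the 2nd part at CV = 40%
n1 + p2 * n2                 # 71.3, the expected total sample size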
Helmut ★★★ Vienna, Austria, 2015-12-01 17:35 @ d_labes
Posting: # 15685
Dear Detlew,

❝ ❝ […] Much lower.
❝ Much lower than what?

The TSD’s E[N] than the GSD’s N.

❝ Your presentation of the GSD results is a little bit unfair. It seems that the expected N is 100.

I see.

❝ But that’s not true: […]

You are absolutely right. As (almost) always. The line

res[j, 7] <- sprintf("%.0f", max(cum))

should be replaced by

res[j, 7] <- sprintf("%.1f", (1-tmp1$pct_s2/100)*n[1] + tmp1$pct_s2/100*(n[1]+n[2]))

expected N   GSD    TSD
───────────────────────
       100   84.6   99.2

Interesting.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2015-12-03 10:16 @ Helmut
Posting: # 15691
Dear Helmut,

❝ The line … should be replaced by …
❝
❝ expected N   GSD    TSD
❝ ───────────────────────
❝        100   84.6   99.2
❝
❝ Interesting.

What’s your expected N?

BTW: ?cumsum.

—
Regards, Detlew
Helmut ★★★ Vienna, Austria, 2015-12-03 14:10 @ d_labes
Posting: # 15693
Dear Detlew,

❝ ❝ expected N   GSD    TSD
❝ ❝ ───────────────────────
❝ ❝        100   84.6   99.2
❝ ❝ Interesting.
❝
❝ What’s your expected N?

My interpretation of the question in the first post was 50. If I understood your post correctly, this was a misinterpretation and it should be 100.

❝ BTW: ?cumsum.

I didn’t know that! Hence, my loop, which a single call to cumsum() can replace.
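For instance (illustrative numbers):

x   <- c(12, 30, 22)                           # e.g., subjects per interval
cum <- numeric(length(x))
for (i in seq_along(x)) cum[i] <- sum(x[1:i])  # the loop
identical(cum, cumsum(x))                      # TRUE: one call does it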
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2015-12-03 14:56 @ Helmut
Posting: # 15694
Dear Helmut,

Seems I have spoken Swahili. What I meant was: you perform stage 1 with n1. Only if necessary do you perform stage 2 with n2. Thus N(total) isn’t always n1 + n2 (= 100 in your example). The ‘expected’ total sample size, aka the ‘mean’ sample size, aka the ASN, can then be calculated via my formula given above. Whether it is reasonable to calculate a mean for a variable with only 2 values is left to you. That’s the reason why power.2stage.GS() doesn’t return components concerning the sample-size ‘distribution’, unlike the other power.2stage.whatever() functions.

Hope my English is now better Swahili.

—
Regards, Detlew
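In code, that two-valued N and its mean (a sketch; p2 is the illustrative 42.6% from above):

n1 <- 50; n2 <- 50
p2 <- 0.426                        # P(stage 2) for CV = 40%
(1 - p2) * n1 + p2 * (n1 + n2)     # ASN = n1 + p2 * n2 = 71.3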
Ben ★ 2015-12-02 20:27 @ Helmut
Posting: # 15687
Dear Helmut / All,

You raised an interesting question, and yes, the TSD from Potvin et al. appears to have astonishing design features. The classical GSD and the adaptive two-stage design according to the inverse-normal method rely on a formal statistical framework: mathematical theorems including proofs are available on why they work, what properties they have, and how they should be applied. This is nice. For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic, with some more elaboration, can be found for example in the article by Kieser and Rauch (2015).

❝ In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design.

Reference? Some software packages give an inflation factor that helps in determining the study size… Anyhow, I think such a rule of thumb is too strict and inflexible. Consider for example two alternative scenarios:

● Pre-planned n1 = 52 and final N = 78 (i.e., n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high.
● Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.

Therefore, I think the GSD has some charm and can be useful in situations with uncertainty. Moreover, the advantage is that we do not have to rely on only simulation results from certain parameter settings.

Best regards,
Ben

Ref: Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015. Epub ahead of print, 24 March 2015. doi 10.1002/sim.6487
Helmut ★★★ Vienna, Austria, 2015-12-03 04:11 @ Ben
Posting: # 15689
Dear Ben et alii,

❝ […] the TSD from Potvin et al. appears to have astonishing design features. […] For the Potvin approach we only have simulations for certain scenarios at hand. Even though it appears to be good, it is not clear whether this is always the case. More information on this topic […] the article by Kieser and Rauch (2015).

I agree that the frameworks of Potvin etc. are purely empirical. But to show whether a given α maintains the TIE for a desired range of n1/CV and target power takes 30 minutes in Power2Stage. I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof. IMHO, it smells more of a claim. At least Gernot Wassmer told me that it is not that easy.

❝ ❝ In a TSD one would opt for a stage 1 sample size of ~75% of the fixed sample design.
❝ Reference? Some software packages give an inflation factor that helps in determining the study size… Anyhow, I think such a rule of thumb is too strict and inflexible.

See the discussion in my review, Table 3 in the Supplementary Material, and the R code at the end.
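Such a check, sketched (Potvin ‘Method B’ with an illustrative n1; the TIE is the simulated chance of passing at the borderline ratio 1.25):

library(Power2Stage)
# empirical type I error: simulate studies with a true GMR at the BE limit
power.2stage(method = "B", alpha = c(0.0294, 0.0294), n1 = 38,
             CV = 0.40, theta0 = 1.25, nsims = 1e6)$pBE
# repeat over a grid of n1 and CV; the TIE is maintained if all values <= 0.05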
❝ Consider for example two alternative scenarios:
❝ ● Pre-planned n1 = 52 and final N = 78 (i.e., n2 = 26). The average sample number (ASN) is smaller than for the Potvin TSD. Power is higher up until a certain point where the CV gets too high.

Hhm. See the code at the end; I tried to implement your suggestions. Type I error?

❝ ● Pre-planned n1 = 48, n2 = 48. ASN comparable, power similar to the above.

Well…

❝ Therefore, I think the GSD has some charm and can be useful in situations with uncertainty.

If (if!) you have some clue about the variability.

❝ Moreover, the advantage is that we do not have to rely on only simulation results from certain parameter settings.

30 minutes. I will again chew on the e-mail conversation we had last April.

R codes:

1. Find n1 for TSDs based on a ‘best guess’ CV.
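In essence (a sketch with PowerTOST, not the full code; n1 is taken as ~75% of the fixed design’s n, rounded up to an even number):

library(PowerTOST)
CV    <- 0.40                       # 'best guess'
n.fix <- sampleN.TOST(CV = CV, theta0 = 0.95, targetpower = 0.80,
                      design = "2x2", print = FALSE)[["Sample size"]]
n1    <- 2 * ceiling(0.75 * n.fix / 2)
c(n.fix = n.fix, n1 = n1)           # 66 and 50 for CV = 40%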
2. Comparison of GSD and TSD
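Sketched as well (assumptions as above: alphas 0.0294 for both designs, n1 = n2 = 50 for the GSD, n1 = 50 for the TSD):

library(Power2Stage)
CVs <- seq(0.30, 0.50, 0.05)
res <- data.frame(CV = CVs, GSD.pwr = NA, GSD.EN = NA,
                  TSD.pwr = NA, TSD.EN = NA)
for (j in seq_along(CVs)) {
  GS <- power.2stage.GS(alpha = c(0.0294, 0.0294), n = c(50, 50),
                        CV = CVs[j], theta0 = 0.95)
  TS <- power.2stage(method = "B", alpha = c(0.0294, 0.0294), n1 = 50,
                     GMR = 0.95, CV = CVs[j], targetpower = 0.80)
  res[j, 2:5] <- c(GS$pBE, 50 + GS$pct_s2/100 * 50,  # E[N] = n1 + p2 * n2
                   TS$pBE, TS$nmean)
}
print(res, digits = 3)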
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2015-12-03 10:47 (edited 2015-12-03 16:19) @ Helmut
Posting: # 15692
Dear Helmut, dear Ben!

Two-sided or not two-sided, that is the question! library(ldbounds) gives us:

# two-sided:
[1] 0.03100573 0.02774015
# one-sided:
[1] 0.03100573 0.02972542
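Presumably via calls like these (a sketch; only the outputs were shown, and the objects bds2.poc / bds1.poc are referenced below):

library(ldbounds)
t <- c(0.5, 1)                                 # interim at N/2, final
bds2.poc <- bounds(t, iuse = c(2, 2),          # two-sided, Pocock-type
                   alpha = c(0.05, 0.05))      #   Lan/DeMets spending
1 - pnorm(bds2.poc$upper.bounds)               # 0.03100573 0.02774015
bds1.poc <- bounds(t, iuse = 2, alpha = 0.05)  # one-sided
1 - pnorm(bds1.poc$upper.bounds)               # 0.03100573 0.02972542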
Simsalabim: the one-sided values are Ben’s preferred ones! I personally opt for two-sided.

BTW: the Lan/DeMets spending function is only Pocock-like. Nearer to the original Pocock bounds is the mean of the critical values. Try

2*(1-pnorm(rep(mean(bds2.poc$upper.bounds), 2)))

simsalabim, Pocock’s natural constant! Nearly.

(1-pnorm(rep(mean(bds1.poc$upper.bounds), 2)))

hokus pokus fidibus, Ben’s magical number!

—
Regards, Detlew
Helmut ★★★ Vienna, Austria, 2015-12-03 15:56 @ d_labes
Posting: # 15695
Dear Detlew & Ben,

❝ Two-sided or not two-sided, that is the question!

Yessir!

❝ simsalabim, Pocock’s natural constant!

I think that Kieser/Rauch are correct in their lament about one- vs. two-sided Pocock’s limits. They argue for 0.0304 (which Jones/Kenward2 used in chapter 13 as well). Jennison/Turnbull give CP (K = 2, α = 0.10) = 1.875:

‘Exact’ 1 - pnorm(1.875) = 0.0304
In chapter 12, Jones/Kenward2 (in the context of blinded sample-size re-estimation) report an inflation of the TIE. The degree of inflation depends on the timing of the interim (the earlier, the worse). They state: “In the presence of Type I error rate inflation, the value of α used in the TOST must be reduced, so that the achieved Type I error rate is no larger than 0.05.” (my emphasis)

They recommend an iterative algorithm [sic] by Golkowski et al.3 and conclude: “[…] before using any of the methods […], their operating characteristics should be evaluated for a range of values of n1, CV and true ratio of means that are of interest, in order to decide if the Type I error rate is controlled, the power is adequate and the potential maximum total sample size is not too great.”

Given all that, I’m not sure whether the discussion of proofs, exact values, etc. makes sense at all. This wonderful stuff is based solely on normal theory, and I’m getting bored by reading the phrase “when N is sufficiently large” below a series of fancy formulas. Unless someone comes up with a proof for small samples (many tried, all failed so far), I’d rather stick to simulations.
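Such an iterative adjustment can be sketched crudely as a bisection on the simulated TIE (this is not Golkowski’s actual algorithm; all settings are illustrative, and Power2Stage’s default fixed seed keeps the search stable):

library(Power2Stage)
TIE <- function(adj)                  # empirical TIE at the upper BE limit
  power.2stage(method = "B", alpha = rep(adj, 2), n1 = 38, CV = 0.40,
               theta0 = 1.25, nsims = 1e5)$pBE
lo <- 0.02; hi <- 0.05                # bracket for the adjusted alpha
for (i in 1:15) {                     # simple bisection
  mid <- (lo + hi) / 2
  if (TIE(mid) > 0.05) hi <- mid else lo <- mid
}
lo                                    # largest alpha keeping TIE <= 0.05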
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
d_labes ★★★ Berlin, Germany, 2015-12-03 17:15 @ Helmut
Posting: # 15696
Dear Helmut,

❝ … I think that Kieser/Rauch are correct in their lament about one- vs. two-sided Pocock’s limits. They argue for 0.0304 (which Jones/Kenward2 used in chapter 13 as well). Jennison/Turnbull give CP (K = 2, α = 0.10) = 1.875.

I have another one:
Gould AL. Group Sequential Extensions of a Standard Bioequivalence Testing Procedure. J Pharmacokinet Biopharm. 1995;23(1). Table I: critical value for n1 = n2: 1.8753.

Seems I have to change my personal preference stated in my post above. That means, on the other hand: Potvin and company were much luckier than they should have been. That’s great!

—
Regards, Detlew
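A quick check:

1 - pnorm(1.8753)   # ≈ 0.0304, the one-sided nominal level behind Gould's bound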
Helmut ★★★ Vienna, Austria, 2015-12-03 17:26 @ d_labes
Posting: # 15697
Dear Detlew,

❝ I have another one:
❝ Gould AL. Group Sequential Extensions of a Standard Bioequivalence Testing Procedure. J Pharmacokinet Biopharm. 1995;23(1).
❝ Table I: critical value for n1 = n2: 1.8753.

How could I forget Mr Gould? He was the first to explore this stuff for BE studies.

❝ That means, on the other hand: Potvin and company were much luckier than they should have been.

Yep. Lucky punch.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
Ben ★ 2016-01-10 13:43 @ Helmut
Posting: # 15808
Dear Helmut / All,

❝ I agree that the frameworks of Potvin etc. are purely empirical. But to show whether a given α maintains the TIE for a desired range of n1/CV and target power takes 30 minutes in Power2Stage.

Well, yes, but this is again only empirical.

❝ I’m not sure whether the two lines in Kieser/Rauch fulfill the requirements of a formal proof.

I actually meant the discussion on the decision scheme and the properties from Potvin et al. (not mathematical theorems and proofs; there are in fact none).

❝ What I don’t understand in GSDs (lacking experience): How do you arrive at N? Is Detlew right when he said that this is the expected sample size?

You can use the sample size from a fixed design and adapt it based on an inflation factor. Addplan, for example, provides such values (one should, however, keep in mind that everything in Addplan is based on the normal approximation). Of course, no one keeps you from further playing around and checking some design properties (for example, the resulting average sample size). A good idea may be to focus on a realistic best guess for the interim CV to determine n1, and to cover a bad-CV scenario via the second stage’s n2.

❝ Your example would translate to a fixed sample design with GMR 0.95, CV ~44%, and target power 0.8. So the only purpose of the interim is hoping for a lucky punch (i.e., ASN 64)? If the CV is just a little bit higher (50%), power is unacceptable.

In your case the CV is already pretty high, and maybe the design properties do not behave so well in those regions? I have not investigated this thoroughly…

❝ If (if!) you have some clue about the variability.

Yes, but when is this not the case? You would not conduct a confirmatory BE study without having performed other PK studies with that substance, would you? You will always have a first-in-man trial and some bioavailability trials (or historical trials from a comparator).

Regarding the boundaries that Detlew mentioned: they should be based on one-sided bounds. Using two-sided bounds directly can mess things up.

Best regards,
Ben
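For illustration, the inflation-factor route in R (the factor of ~1.11 for a Pocock-type design with one interim is an assumed, illustrative value; exact factors come from tables, e.g., in Jennison/Turnbull, or from software such as Addplan):

library(PowerTOST)
n.fix <- sampleN.TOST(CV = 0.40, theta0 = 0.95, targetpower = 0.80,
                      design = "2x2", print = FALSE)[["Sample size"]]
IF <- 1.11                           # assumed inflation factor, K = 2
N  <- 2 * ceiling(IF * n.fix / 2)    # inflated total N, rounded up to even
c(n.fix = n.fix, N = N, per.stage = N / 2)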