3x3x3 vs 3x6x3 reloaded [Design Issues]

Dear all,

Sorry for bringing up this topic again. I know there are lots of posts, but I still do not fully understand why a 3x6x3 crossover design is preferred over a 3x3x3 design (or in which cases the latter suffices), and I hope you are willing to discuss it again.

The main argument is that the 3x6x3 design is variance balanced and balanced for (first-order) carry-over effects. Let us consider the latter property: What exactly does it mean? Does it mean that on average the effect of carry-over cancels out (as with the period effect)? Is that always the case, or only if carry-over is incorporated in the statistical model? If it always holds true, then I wonder why people are concerned about carry-over and incorporate a washout (maybe otherwise a higher-order carry-over will occur...), at least for Williams designs. If it only holds when the effect is incorporated in the model (as I understood from this post, this option is correct), then this advantage is actually useless, because it is not recommended to incorporate this effect and in fact "nobody" does.
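To make "balanced for first-order carry-over" concrete, here is a minimal base-R sketch that counts how often each treatment is immediately preceded by each other treatment. The function names `follow_counts` and `is_carryover_balanced` are made up for this illustration, and the sequences are assumed to be the usual ones (ABC, BCA, CAB for the Latin square; all six orderings for the Williams design). It only describes the layout of the design and says nothing about what happens in the statistical model.

```r
# Count how often treatment 'current' directly follows treatment 'previous'
# within the sequences of a design (hypothetical helper, not from PowerTOST).
follow_counts <- function(sequences) {
  trts <- sort(unique(unlist(strsplit(sequences, ""))))
  counts <- matrix(0L, nrow = length(trts), ncol = length(trts),
                   dimnames = list(previous = trts, current = trts))
  for (s in sequences) {
    x <- strsplit(s, "")[[1]]
    for (p in seq_len(length(x) - 1)) {
      counts[x[p], x[p + 1]] <- counts[x[p], x[p + 1]] + 1L
    }
  }
  counts
}

# A design is balanced for first-order carry-over if every ordered pair
# of distinct treatments occurs equally often.
is_carryover_balanced <- function(sequences) {
  m <- follow_counts(sequences)
  off_diagonal <- m[row(m) != col(m)]
  length(unique(off_diagonal)) == 1L
}

latin    <- c("ABC", "BCA", "CAB")                       # assumed 3x3x3 sequences
williams <- c("ABC", "ACB", "BAC", "BCA", "CAB", "CBA")  # assumed 3x6x3 sequences

print(follow_counts(latin))
print(follow_counts(williams))
print(is_carryover_balanced(latin))
print(is_carryover_balanced(williams))
```

With these sequences, every ordered pair occurs twice in the 3x6x3 design, whereas in the 3x3x3 design A is followed only by B, B only by C, and C only by A.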

Coming to the variance balanced property: this implies that all pairwise trt comparisons have the same variance and hence the same precision (-> CI), but does it also imply that the CIs are shorter than the ones obtained by 3x3x3? Probably not; it might be the case that one CI obtained by 3x3x3 is even shorter, but another is not. There are cases where we do not care about certain comparisons (e.g. T1, T2, R; here, T1 vs T2 may not be of interest). The problem, though, is that we cannot tell which comparison is/will be better or worse?! So instead of not knowing a priori, we want to have the same variance for sure... Helmut mentioned in one of his posts that "The 3×3×3 does not give us unbiased estimates". But 3x6x3 does? For me, having variance balance does guarantee unbiased estimates (for the trt comparison). Can someone shed light on this?
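One way to explore the precision question numerically is to compute the variance of each pairwise treatment contrast directly from the design matrix, in units of the residual variance. The sketch below is only an illustration under stated assumptions: a fixed-effects model with subject, period and treatment (no carry-over term), equal allocation to sequences, no drop-outs, and the usual sequences for both designs. The function name `pairwise_var` is hypothetical.

```r
# Variance of each pairwise treatment contrast in units of sigma^2,
# for a model with subject, period and treatment effects (no carry-over).
# Hypothetical helper for illustration; uses base R only.
pairwise_var <- function(sequences, n_per_seq) {
  seqs <- rep(sequences, each = n_per_seq)
  d <- do.call(rbind, lapply(seq_along(seqs), function(i) {
    x <- strsplit(seqs[i], "")[[1]]
    data.frame(subject = i, period = seq_along(x), trt = x)
  }))
  d$subject <- factor(d$subject)
  d$period  <- factor(d$period)
  d$trt     <- factor(d$trt)                 # levels A, B, C; A is the reference
  X <- model.matrix(~ subject + period + trt, data = d)
  V <- solve(crossprod(X))                   # (X'X)^-1
  c("B-A" = V["trtB", "trtB"],
    "C-A" = V["trtC", "trtC"],
    "C-B" = V["trtB", "trtB"] + V["trtC", "trtC"] - 2 * V["trtB", "trtC"])
}

# 18 subjects in total in both cases
print(pairwise_var(c("ABC", "BCA", "CAB"), n_per_seq = 6))
print(pairwise_var(c("ABC", "ACB", "BAC", "BCA", "CAB", "CBA"), n_per_seq = 3))
```

Under these assumptions all three contrasts come out at 2/N (N = total number of subjects) in both designs, so any difference between the designs would have to come from a model that includes carry-over or from unequal sequence sizes; the sketch does not cover either case.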

Another argument is that "Any Williams' design has the advantage that pairwise comparisons may be extracted (also recommended by Byron Jones in a personal communication at the BioInternational 2003) - which are also balanced (needed for nonparametric comparisons, which seems to be of historical interest in the EU...)" as posted by Helmut here. Is this extraction only useful for special additional analyses, e.g. the mentioned nonparametric comparisons, or are the extracted 2x2 tables always used when analysing/evaluating a crossover design (point estimate and CI)? If they are always used, then clearly this property is an advantage; otherwise I wonder what these special analyses are good for and whether they are actually used.

Regarding missing values/drop-outs: Will the 3x3x3 design be more "vulnerable" when losing values? That is, the 3x6x3 will also get wider CIs, but will the 3x3x3 get wider CIs more quickly (i.e. fewer drop-outs needed in order to get wider CIs)? (The degree to which the CIs will widen is not known.)

Side note and question regarding PowerTOST:
When thinking of precision I was also thinking about the power. The common precision does not imply a greater power, though (leaving all other parameters unchanged); the property of having variance balance does not play a role when computing the power. Moreover, the only formal changes when changing a design are the degrees of freedom and the design constant bk (assuming equal allocation to sequences) and since both parameters are the same for 3x3x3 and 3x6x3 the powers should be the same. I used PowerTOST (v0.9-10) to verify but got
 power.TOST(CV=0.2, design="3x3", n=18)
 [1] 0.7887534
 power.TOST(CV=0.2, design="3x6x3", n=18)
 [1] 0.7785031
Is that a bug in PowerTOST? When computing sample sizes the corresponding achieved powers coincide (although different from both values above):
 sampleN.TOST(CV=0.2, design="3x3", print=FALSE)
   Design alpha  CV theta0 theta1 theta2 Sample size Achieved power Target power
 1    3x3  0.05 0.2   0.95    0.8   1.25          18      0.8089486          0.8
 sampleN.TOST(CV=0.2, design="3x6x3", print=FALSE)
   Design alpha  CV theta0 theta1 theta2 Sample size Achieved power Target power
 1  3x6x3  0.05 0.2   0.95    0.8   1.25          18      0.8089486          0.8

Btw, wrt power the 3x3x3 design seems to be less vulnerable than 3x6x3: In case of one drop-out the power of 3x3x3 decreases by about 3.8%, whereas the power of 3x6x3 decreases by about 5.1% (compare the above results to power2.TOST(CV=0.2, design="3x3", n=c(5,6,6)) and power2.TOST(CV=0.2, design="3x6x3", n=c(2,3,3,3,3,3)), respectively). This might be an advantage of 3x3x3...

Thanks,
Ben