d_labes Hero Berlin, Germany, 20100910 12:05 (edited by d_labes on 20100910 13:06) Posting: # 5904 Views: 9,557 

Dear all! Ok, this thread could also go into the category "R for BE/BA". But this here is Helmut's favorite . Despite the fact that nonparametrics will no longer be accepted by the EMA the world is bigger than Europe and many guidelines still require a nonparametric analysis of tmax (some of them obviously copied from the old EMEA guidance). It is well known^{1)}, that the analysis in case of the classical 2x2 crossover is done by analysing the period differences (P1P2) via a Wilcoxon rank sum test with the sequence as grouping factor. Since in case of tmax there are usually some/many equal values (ties in the nonparametric methods speak) the used software should be able to handle such. In exploring this within R using the tmax values from the single dose data of Bear I came up with: # Bear single dose data This tolds me: Package 'exactRankTests' is no longer under development. BTW: Because of lexical ordering RT vs. TR is tested and this estimates RT. Thus we have to change the sign and exchange upper and lower bound to get them for TR in which we are interested. Since I'm an obedient man (especially if my wife speaks to me ) I followed the polite suggestion got after loading package 'exactRankTests' and tried: library(coin) and this told me: Exact Wilcoxon MannWhitney Rank Sum Test Upps! Different point estimates . In other examples (no room to show here) with tied data I have also observed different CIs! Any body out there who knows whats going on here? I have searched the R support lists and found that observed by others. But the answers where at least not sufficient if not meaningless. I have found a stat. course website which states "There is an R function wilcox.exact ... It does not do confidence intervals or point estimates correctly in the presence of ties." but without an explanation. The new option HL (=Hodges Lehmann estimator) in my "Power to knoff" SAS® 9.2 Proc NPAR1way (code upon request) gives me between other stuff: but this estimates TR because TR is tested versus RT here. ^{1)}Hauschke et.al. "A distribution free procedure for the statistical analysis of bioequivalence studies" Int. J. Clin. Pharmacol. Vol. 28 (2) 1990, 7278/Vol. 30, Suppl. 1, S3743 Sorry for that lengthy post. But I intended to give the Rusers among us the possibility to verify my results with cut and paste. Sorry2: I have totally forgotten that the estimate and the confidence interval calculated as given are for 2*(TR). Thus the results must be divided by 2! But what I wanted to show remains. — Regards, Detlew 
Helmut Hero Vienna, Austria, 20100910 18:28 @ d_labes Posting: # 5905 Views: 8,422 

Dear D. Labes! » Ok, this thread could also go into the category "R for BE/BA". But this here is Helmut's favorite . Right. But if ties are considered, you are stirring up a hornets' nest! » Since in case of tmax there are usually some/many equal values (ties in the nonparametric methods speak) the used software should be able to handle such. Well, cough… Summary of your results (TR)  changed signs if necessary and divided by 2: program PE 90.26% CI My quick and dirty solution: PKt$pdiff < (PKt$P1  PKt$P2)/2 » Any body out there who knows whats going on here? Hhm; don't use library(exactRankTests) ?Hauschke et al. (1990) don't speak about ties; they refer to Hollander and Wolfe¹. In my edition the word 'tie' is not even mentioned in the subject index… With my mid1990 implementation in Rocky Mountain BASIC (not corrected for ties) I get: PE 90.26% CI Now for StatXact 6.1. using the coding according to Senn (2002) I get: PE 90.26% CI What the hell? Phoenix 6.1 (asymptotic), according to Koch² (no ties!): PE 90.26% CI EquivTest/PK (asymptotic): PE 90.26% CI Not a single word about ties in the manuals of Phoenix and EquivTest/PK. StatXact states: [...] zero differences are ignored. We assume initially that there are no ties in the observed values of the D_{i}'s. We will show subsequently how to adjust for the possibility of ties. […] By suitably adapting the reasoning in Lehmann (1975, pages 181 to 182) so as to accommodate ties one can show that the probability that the interval [λ_{*}, λ^{*}] excludes λ is at most α. This ensures that the HodgesLehmann confidence intervals preserve the desired coverage probability. In a more recent textbook³ the common procedure of using midranks for ties is given. According to this reference StatXact adjusts for ties. But I'm not so sure anymore. Open questions:
» Sorry for that lengthy post. So am I. You gave me a nice excuse not to spend the afternoon struggling with my income tax return.
— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 20100913 13:21 @ Helmut Posting: # 5906 Views: 8,265 

Dear Helmut! Thanks for your very elaborate answer! » Right. But if ties are considered, you are stirring up a hornets' nest! I see . » My quick and dirty solution: » PKt$pdiff < (PKt$P1  PKt$P2)/2 » PKt$sequence < relevel(PKt$sequence, ref="TR") » ... Interesting: "relevel". Was not aware of that handy tool. But astonishing the result: A change in order gives a different lower bound! » Hhm; don't use library(exactRankTests) ?Seems a good recommendation. But it is claimed frequently that Rusers should use it in case of ties. Moreover coin has a much bigger footprint. » ... they refer to Hollander and Wolfe. In my edition the word 'tie' is not even mentioned in the subject index... Very strange. Everyone writing about rank methods is citing this. » With my mid1990 implementation in Rocky Mountain BASIC (not » corrected for ties) I get: » PE 90.26% CI » 0.25 0.25 +0.75 Not 'correcting' for ties to obtain the CI is exactly what Charles Geyer recommends. With respect to the Wilcoxon sign rank test he states: "First, neither ties nor zeros should make any difference in calculating point estimators or confidence intervals.
And this may be transferable to the ranksum test . If one follows his code in the interval section of the Wilcoxon rank sum test one gets your Rocky results. » Now for StatXact 6.1. using the coding according to Senn (2002) ... What exactly did you here? Senn's Basic estimator? Then a test of sequence group differences would give an estimate of the period effect. Or am I wrong? See the expected period differences below. » ... Open questions: » • Signs... (Sigh!) I've checked the coding in all programs numerous times, » but don't know what's going on here. Fishbrain.I came across with: Sometimes the smaller group is used as the first, sometimes the coding order, sometimes the character order, sometimes the first occurred level. My _{} uses the rule: smaller group and if both have the same n use the first occurred level as group 1. If this makes any sense is left to you. » I guess the reason might be that you are comparing period differences rather than treatment differences. The reason for using the period differences rather then the treatment diffs comes from the paper of Hauschke. It is based on the expected values sequence P1P2 Thus a test of the differences by sequence group gets rid of the period effects. This is not the case with the treatment diffs. » ... I tried ties.method="midranks" and ties.method="averagescores" and got:» Warning: In independence_test.IndependenceProblem(object, teststat = "scalar", : » additional arguments ties.method will be ignored I also observed this. Although mentioned on the help page it seems that the ties.method is not available in the wilcoxon_test function and other location tests of package coin. ties.method is only accepted by the function normal_test() which performs the "Exact Normal Quantile (van der Waerden) Test". » If I exclude subjects with a zero difference ... Called by Charles Geyer "The Zero fudge" . » ... You gave me a nice excuse not to spend the afternoon struggling with my income tax return. If it is that horrible as here in Germanien I did you the favor very willingly . But the pity is that the problem with the Finanzamt does not vanish by sitting it out. — Regards, Detlew 
martin Senior Austria, 20100914 08:33 @ d_labes Posting: # 5907 Views: 8,169 

dear HS and d_labes! you may find the following simulation results interesting rm(list=ls(all=TRUE)) best regards martin 
d_labes Hero Berlin, Germany, 20100915 10:14 @ martin Posting: # 5909 Views: 8,114 

Dear Martin! » you may find the following simulation results interesting » ... » print(cover1/nsim) » [1] 0.937 # coverage of wilcox_test (coin) » print(cover2/nsim) » [1] 0.9505 # coverage of wilcox.exact (exactRankTests) (comments by me)Wow! Really interesting. And unbelievable . I was told that the Wilcoxon rank sum test is due to the discrete nature of the permutation distribution function always to some degree conservative (achievable confidence with n_{1}, n_{2}=10 is 0.956743). And here it comes out that the implementation in the package coin is distinct anticonservative! And the implementation in wilcox.exact does not show signs of conservativeness. I have one small doubt. I'm not quite sure but should we not include the CI borders in counting the coverage due to the definition of the CI? If we do this one gets (only 5000 sims, see BTW below): [1] 0.9608 # coverage of wilcox_test (coin) Moreover there seems to be some interaction between both packages. If I simulate without coin I get coverage of wilcox.exact = 0.9688 (including the CI borders in counting coverage). BTW: Warning to all! The code is really a stresstest for the CPU of your computer (at least for mine). Submit the code and go for one and another beers. Next morning the result is astonishing enough . Therefore I could answer not until today. — Regards, Detlew 
martin Senior Austria, 20100915 14:53 @ d_labes Posting: # 5915 Views: 8,083 

dear d_labes! » I have one small doubt. I'm not quite sure but should we not include the CI borders in counting the coverage due to the definition of the CI? thank you very much for pointing this out; IMHO you are right  this is crucial in the case of ties. here is a little bit faster version of the simulation code (ca. 30 min on my CPU) date() and here I am, for all my lore, the wretched fool I was before (Goethe) martin 
Helmut Hero Vienna, Austria, 20100915 16:17 @ martin Posting: # 5916 Views: 8,093 

Dear Martin! » here is a little bit faster version of the simulation code (ca. 30 min on my CPU) Faster machine than mine (double Xeon 2.8GHz, 4GB RAM, 32bit XP Pro, hyperthreading on)... I would use conf.level=0.90 . At the start of the codeptm < proc.time() and after the output
runtime < proc.time()  ptm I got print(cover1/nsim) » and here I am, for all my lore, the wretched fool I was before (Goethe) Yep. For n_{1}=n_{2}=10 I would expect 0.9108 ...— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 20100916 11:19 @ Helmut Posting: # 5917 Views: 8,012 

Dear Helmut! » Yep. For n_{1}=n_{2}=10 I would expect 0.9108 ...Yeah, but ... without ties. 10 000 sims each with round(,5) in the code and only wilcox.exact(): run 1 0.9096 Each run (10 000 sims) only approx. 100 s! The mean seems very close to your expectation . [nitpicking] What I'm very curious of is that scatter between the runs. Seems that one has to simulate at least 100 000 or more datasets to get a reliable coverage. To much beer in case of coin:wilcox_exac() . BTW: with round to five decimals I have observed anyhow up to 10 sets with ties among the 10 000 sims. — Regards, Detlew 
Helmut Hero Vienna, Austria, 20100916 14:01 @ d_labes Posting: # 5918 Views: 8,035 

Dear D. Labes! » Yeah, but ... without ties. Right... » Each run (10 000 sims) only approx. 100 s! Everybody seems to have a faster machine than I do. Took mine ~115". Should ask my boss for a new one. » What I'm very curious of is that scatter between the runs. Seems that one has to simulate at least 100 000 or more datasets to get a reliable coverage. Interesting. — All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
martin Senior Austria, 20100916 15:35 (edited by martin on 20100916 16:03) @ Helmut Posting: # 5919 Views: 8,022 

dear d_labes and HS ! I included two additional bootstrap based approaches implemented in R package pairwiseCI where investigation of empirical coverage of bootstrap based CIs is a real stress test (7 hours)set.seed(150910) the empirical coverage of 0.9053 for the difference of HarrellDavis estimates for location looks nice. best regards martin PS.: you can interpret the empirical coverage as a proportion based on a sample size of nsim enabling to calculate a CI enabling to assess simulation accuracybinom.test(x=0.9053*nsim, n=nsim, p=0.9) 
martin Senior Austria, 20100916 18:04 @ d_labes Posting: # 5920 Views: 8,126 

dear d_labes and HS! I spend some time on thinking how to compare tmax and have some doubts that using the difference between average values (e.g. medians, HodgesLehmann or HarrellDavis estimate) is appropriate as the effect size in tmax between test and reference depends on the sampling times used. For this reason I would suggest using relative effects (e.g. implemented in R package nparcomp ).Definition (taken from this presentation http://www.biopharmnet.com/doc/2010_05_20_webinar.pdf ): "The relative effect is a a measure of how often a random subject receiving treatment X will outperform a random subject receiving treatment Y. It can be interpreted as a probability that a randomly selected patient in the control reveals a smaller response value than a randomly selected patient in the treatment group" naive application of d_labes dataset:
npar.t.test(pdiff~sequence, conflevel = 0.1, data=PKt) what do you think about using relative effects for comparing tmax values between test and reference? best regards martin 
Helmut Hero Vienna, Austria, 20100916 19:52 @ martin Posting: # 5921 Views: 8,039 

Dear Martin! » I spend some time on thinking how to compare tmax and have some doubts that using the difference between average values (e.g. medians, HodgesLehmann or HarrellDavis estimate) is appropriate as the effect size in tmax between test and reference depends on the sampling times used. Well, you know better than I do that we are comparing not medians, but the median of ranked differences  that's the HLestimator. Of course the comparison depends on the sampling schedule. But: keep in mind that t_{max} (and <nitpicking>C_{max}</nitpicking> as well) are surrogates of the rate of absorption k_{a}  which we cannot access by NCA. For an example see this post. » [...] I would suggest using relative effects... » Definition (taken from this presentation http://www.biopharmnet.com/doc/2010_05_20_webinar.pdf )...: Heavy stuff to swallow for an amateur. » "The relative effect is a a measure of how often a random subject receiving treatment X will outperform a random subject receiving treatment Y. It can be interpreted as a probability that a randomly selected patient in the control reveals a smaller response value than a randomly selected patient in the treatment group" I'm not happy with this definition. » $Analysis.of.relative.effects » Comparison rel.effect confidence.interval t.value p.value p.perm » 1 p(RT,TR) 0.633 [ 0.354 ; 0.844 ] 0.7804535 0.435 0.41 » » $Wilcoxon.Test » Comparison rel.effect p.value » 1 p(RT,TR) 0.633 0.4345742 » » what do you think about using relative effects for comparing tmax values between test and reference? I'm not familiar with the coding. The problem IMHO comes from the results given as ratios. Commonly for t_{max} a linear model is postulated (besides PK theory think about the way we speak: 'The maximum concentration of treatment A was observed one hour earlier than the one of treatment B'  not 'The time point of the maximum concentration of treatment A was 63.3% of treatment B's.'). Now if we get a percentage, to which location parameter should we apply it? To the median of the reference  or what? Interesting to read the presentation, page 90: The choice of the differencebased relevance threshold delta is nasty, use ratiotocontrol instead. Nasty or not, I don't think that regulators are happy with a multiplicative model. And so am I.— All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
martin Senior Austria, 20100916 21:00 (edited by martin on 20100916 22:14) @ Helmut Posting: # 5922 Views: 8,001 

Dear HS! thank you for your point of view, I just have a minor comment regarding the interpretation of relative effects. It's a probability that a randomly selected subject with R has a larger t_{max} value than a randomly selected subject with T (no effect would be indicated by a relative effect of 0.5). for the given dataset, the probability of a randomly selected subject with R having a larger t_{max} value than a randomly selected subject with T is 63.3% (90% CI: 35.4% to 84.4%). the other way round, the probability of a randomly selected subject with T having a smaller t_{max} value than a randomly selected subject with R is 63.3% (90% CI: 35.4% to 84.4%). best regards martin PS.: this is a naive application to the dataset  some adjustments may necessary to account for the cross over design 
Helmut Hero Vienna, Austria, 20100917 13:10 @ martin Posting: # 5923 Views: 7,950 

Dear Martin! » [...] It's a probability that a randomly selected subject with R has a larger t_{max} value than a randomly selected subject with T (no effect would be indicated by a relative effect of 0.5). Oh, I see! I mixed up probability with the size of the effect. But regulators are only interested in the latter! — All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
Helmut Hero Vienna, Austria, 20100920 14:20 @ Jack Posting: # 5925 Views: 7,917 

Hi Jack! » The coin packages uses the midpoint of the confidence interval as the point estimate for the location difference (ie (0.51.5)/2=0.5 while the exactRankTests packages uses (sort of) the median of the observed differences as the point estimate. Note that the exact procedure of exactRankTests is quite situational which is why the difference in point estimates only occurs sometimes. Interesting! » The real issue from my point of view is that it is hard (impossible?) to decide which is the "correct" estimate. Intuition tells me that the exactRankTest version is the one that is closer to the spirit of the Wilcoxon test while practicality certainly speaks for the midpoint of the CI to be used. Hhm; in those ol' days I plotted the sampling distribution  and it can be quite asymetrical. I'm not so sure whether the midpoint of the CI makes sense if the location shift is concerned. » As for the differences in confidence intervals I presume that it has to do with either the approximation which is uses if n>50 and/or the continuity correction. I'm wondering whether there's a cutoff implemented in both packages switching from the exact method (based on permutation) to an approximation due to performance reasons? I know that it's possible to dump the code from the library, but my knowledge of R is too limited (even to use the right keywords for a search). — All the best, Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. ☼ Science Quotes 
d_labes Hero Berlin, Germany, 20100927 11:27 @ Helmut Posting: # 5938 Views: 7,784 

Dear Helmut! » I'm wondering whether there's a cutoff implemented in both packages switching from the exact method (based on permutation) to an approximation due to performance reasons? I know that it's possible to dump the code from the library, but my knowledge of R is too limited (even to use the right keywords for a search). It is described in the help pages for wilcox.exact. Use getS3method("wilcox.exact","default") to see the code.For coin it is more difficult because of the S4 implementation and deserves more puzzling detective work. Especially the theoretic framework must be known in detail to follow the implementation pieces. Out of scope for a runofthemill applied statistician like me. — Regards, Detlew 
d_labes Hero Berlin, Germany, 20100927 09:53 @ Jack Posting: # 5937 Views: 7,825 

Dear Jack, was on holiday thus I can answer but now. Thanks for sharing your view. » The coin packages uses the midpoint of the confidence interval as the point estimate for the location difference (ie (0.51.5)/2=0.5 while the exactRankTests packages uses (sort of) the median of the observed differences as the point estimate. Interesting as Helmut already said. Is it documented elsewhere? Interesting enough the SAS Proc npar1way gives me both point estimates, HodgesLehmann median of the pairwise differences and the CI midpoint. The latter attributed to Lehmann:Lehmann, E. L. (1963). "Nonparametric Confidence Intervals for a Shift Parameter" Annals of Mathematical Statistics, 34, 15071512 As noticed in the example above ... (sort of) ... is different from the HL estimate in case of ties. » As for the differences in confidence intervals I presume that it has to do with either the approximation which is uses if n>50 and/or the continuity correction. No. The example I have shown has n<50 and in that case the calculations will be done "exact" whatever that "exact" means. Here another peculiarity of the coin package: library(coin) This gives: shift vs. splitup: equal CIs 0.73 Uuuups! — Regards, Detlew 