❝ Should I use the Mann-Whitney-Wilcoxon test and then calculate the confidence interval based on the Hodges-Lehmann estimator or choose one of the two?

Cave: The Mann-Whitney-Wilcoxon test^{1} is for independent samples (parallel groups). If you deal with crossover designs or paired samples you want the Wilcoxon signed-rank test.^{2}

If you specify the hypotheses as usual,

$$\small{\eqalign{
H_0:\Delta&=0\\
H_{\textrm{a}}:\Delta&\neq0}}$$

you get one *p*-value (two-sided test, of course). IMHO, that’s not useful because you will be trapped like in the approach for BE used in the early 1980s (before Donald Schuirmann’s TOST entered the stage).

- Small difference & low variability → *p* < 0.05 (significant)
- Large difference & high variability → *p* ≥ 0.05 (not significant)
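For the paired case, the Hodges-Lehmann estimate and its ≈90% confidence interval from the question can be sketched as follows. This is only a minimal illustration, not a validated implementation: the function names `signed_rank_cdf` and `hodges_lehmann_paired` are made up, the differences in the usage part are invented numbers, and the code assumes untied, continuous, symmetrically distributed differences. It also ignores period effects, which a proper crossover analysis would have to handle. The interval comes from the ordered Walsh averages and the exact null distribution of the signed-rank statistic. Since that distribution is discrete, the achieved coverage is at least 90% but rarely exactly 90%, hence the "≈".

```python
from itertools import combinations_with_replacement


def signed_rank_cdf(n):
    """Exact null CDF of the Wilcoxon signed-rank statistic T+ for n untied pairs."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t  # counts[t] = number of rank subsets summing to t
    for rank in range(1, n + 1):
        for t in range(max_t, rank - 1, -1):
            counts[t] += counts[t - rank]
    total = 2 ** n
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf


def hodges_lehmann_paired(diffs, alpha=0.10):
    """Return (estimate, lower, upper, achieved coverage) for paired differences."""
    n = len(diffs)
    # Walsh averages: all pairwise means (d_i + d_j) / 2 with i <= j
    walsh = sorted((a + b) / 2 for a, b in combinations_with_replacement(diffs, 2))
    m = len(walsh)
    mid = m // 2
    estimate = walsh[mid] if m % 2 else (walsh[mid - 1] + walsh[mid]) / 2
    cdf = signed_rank_cdf(n)
    # Find the largest c with P(T+ <= c - 1) <= alpha / 2
    c = 0
    while c < m and cdf[c] <= alpha / 2:
        c += 1
    if c == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * cdf[c - 1]
    # Interval limits are the c-th smallest and c-th largest Walsh averages
    return estimate, walsh[c - 1], walsh[m - c], coverage


if __name__ == "__main__":
    # Hypothetical tmax differences (test minus reference) in minutes
    diffs = [15, -10, 30, 5, 20, -5, 45, 10, -15, 25]
    est, lower, upper, cov = hodges_lehmann_paired(diffs)
    print(f"HL estimate {est} min, CI {lower} to {upper} min, coverage {cov:.4f}")
```

Unlike the single *p*-value, the interval shows both the size of the difference and its uncertainty.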

❝ Another question If I may: how to define the bioequivalence limits for the tmax in this case?

You may.

Only (‼) if *t*_{max} is *clinically relevant* (rapid onset of effect, safety issues) you should pre-specify limits. Otherwise, simply *report* the Hodges-Lehmann estimate and the ≈90% confidence interval.

How to set the limits is the million dollar question. For a painkiller they might be pretty narrow (say, ±30 minutes).^{3} For a drug used in management of chronic pain – which means multiple doses – the limits might be pretty wide.

Now for the 10^{7}$ question: How to deal with a drug with multiple indications? When ibuprofen is used as a painkiller, narrow limits. What about fever? Difference less relevant. Two limits, must pass both? Gimme a break!

Furthermore, the distribution of *t*_{max} is not symmetric (skewed to the right). Perhaps it would make sense to set asymmetric limits, say, –25 min to +40 min… More details in this article.

1. Also called Wilcoxon-Mann-Whitney test, Mann-Whitney *U* test, Wilcoxon rank-sum test.
2. Also Wilcoxon *T*-test.
3. According to Chinese Whispers recently a European authority didn’t accept ±30 minutes for ibuprofen.
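The distinction made above between merely reporting the interval and testing it against pre-specified limits can be written down in a few lines. This is a sketch under assumptions: the function name `assess_tmax` and its return strings are hypothetical, all values are in minutes, and the inclusion rule (the whole ≈90% CI must lie within the limits) is the usual convention rather than something spelled out in this post. The limits need not be symmetric.

```python
def assess_tmax(ci_lower, ci_upper, limits=None):
    """Classify a tmax comparison from its ~90% CI (all values in minutes)."""
    if limits is None:
        # tmax not clinically relevant: no limits pre-specified, just report the CI
        return "report only"
    lower_limit, upper_limit = limits
    # Assumed inclusion rule: the entire CI must lie within the limits
    inside = lower_limit <= ci_lower and ci_upper <= upper_limit
    return "within limits" if inside else "not within limits"


if __name__ == "__main__":
    # Hypothetical CI of -10 to +20 min against asymmetric limits of -25 to +40 min
    print(assess_tmax(-10, 20, limits=(-25, 40)))
```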

