Significant ≠ relevant [Design Issues]

posted by Helmut – Vienna, Austria, 2015-02-05 01:49 – Posting: # 14374

Hi Felipe,

❝ […] the biostat said that this effect could appear when you cannot give the treatments to whole sample size at the same time/day. (statement 1)


If by “appear” he/she meant “be statistically significant”: false. Dive into your database of studies performed in a single group, arbitrarily code the first half of subjects with group = 1 and the second half with group = 2, and run a model including a group term. If you set the significance limit to 0.05, I bet that you will see a significant “group effect” in ~1/20 of studies – although we know that the data originate from one group. That’s called a “false positive” or, in this particular case, a statistical artifact.
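This is easy to demonstrate by simulation; a minimal sketch (the two-sample t-test, the sample size of 24, and the log-scale SD of 0.25 are illustrative assumptions, not taken from the post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def pseudo_group_test(n_subjects=24, n_studies=2000, alpha=0.05):
    """Simulate studies performed in ONE group, then arbitrarily label
    the first half of subjects 'group 1' and the second half 'group 2'
    and test for a group difference. The fraction of significant results
    should be close to alpha: pure false positives."""
    hits = 0
    for _ in range(n_studies):
        # log-transformed PK metric of a single homogeneous group
        y = rng.normal(loc=0.0, scale=0.25, size=n_subjects)
        g1, g2 = y[: n_subjects // 2], y[n_subjects // 2 :]
        _, p = stats.ttest_ind(g1, g2)
        if p <= alpha:
            hits += 1
    return hits / n_studies

print(pseudo_group_test())  # close to 0.05, i.e. ~1 in 20 studies
```

With the test level at 0.10 instead of 0.05 (`alpha=0.10`), the same sketch gives ~1 false positive in 10 studies.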
I don’t like testing for effects which are either irrelevant or have no consequences. If a study was performed in two groups and we assess the p-value of the group term, there are three possible outcomes:
  1. p ≤ 0.05: The groups differ and we should not pool them. Hopefully the groups were not split 1:1 but rather one of them took the maximum capacity of the clinical site. Example: Based on a CV of 27% the sample size was estimated as 32, but the capacity of the site is 24. If you split the groups 16:16, power in each group drops from 80.4% to 41.4%. What if you are lucky and show BE in one group but not in the other? Do you think that regulators would believe the results of the “nice” group and ignore the other (failed) one? On the other hand, no regulator would ask questions if you have unequal group sizes and base the decision on the larger group. With a 24:8 split, power in the larger group is still 66.7%. Not sooo bad. Like rolling a die and betting on even/odd. ;-)
  2. p ≤ 0.05, although the groups actually don’t differ: a false positive. Bad luck. All the nasty stuff from above applies.
  3. p > 0.05: The groups don’t differ. Happy pooling.
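The power figures in item 1 can be reproduced with the common noncentral-t approximation of TOST power for a 2×2 crossover (what PowerTOST’s power.TOST() computes exactly); a minimal sketch, assuming a GMR of 0.95 and the usual 80.00–125.00% acceptance limits (neither is stated explicitly in the post):

```python
import math
from scipy.stats import nct, t

def power_tost(cv, n, gmr=0.95, alpha=0.05, theta1=0.80, theta2=1.25):
    """Approximate power of the TOST procedure for a 2x2 crossover,
    using the noncentral-t approximation (GMR = 0.95 is an assumption)."""
    sw = math.sqrt(math.log(cv**2 + 1.0))  # within-subject SD (log scale)
    se = sw * math.sqrt(2.0 / n)           # SE of the treatment difference
    df = n - 2
    tcrit = t.ppf(1.0 - alpha, df)
    d1 = (math.log(gmr) - math.log(theta1)) / se  # ncp for the lower test
    d2 = (math.log(gmr) - math.log(theta2)) / se  # ncp for the upper test
    return nct.cdf(-tcrit, df, d2) - nct.cdf(tcrit, df, d1)

for n in (32, 24, 16):
    print(n, round(power_tost(cv=0.27, n=n), 3))
```

Under these assumptions the function approximately reproduces the figures above: ~80.4% for n = 32, ~66.7% for n = 24, and ~41.4% for n = 16.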
Note that this test is based on the between-subject variability and therefore has poor power at the sample sizes commonly used in crossover studies. Hence, some opt for a level of 0.10 instead of 0.05. Then expect a false positive in ~1/10 of studies. Splendid idea!
Hence, we try to keep the groups as similar as possible by design. I have stated above the conditions under which the FDA accepts a model without a group term. I don’t like it when “experts” browse the internet for SAS code and apply it without thinking about the consequences. I got the impression from Smitha’s post that they use a group term routinely – which will be significant at the level of the test by pure chance. Therefore, I asked why

❝ However group effect is not a big deal such as sequence, treatment or period effect. (statement 2)


Edit: Just saw ElMaestro’s post above. What would really worry me is p > 0.05 for the subject effect. That could hint that the subjects were a bunch of monozygotic quadruplets. The model requires that subjects are independent… Don’t overdo standardization.

Helmut Schütz
