All is good, just forget ANOVA - you don't need it! [RSABE / ABEL]

posted by ElMaestro  – Denmark, 2012-10-12 18:25 (4597 d 05:43 ago) – Posting: # 9406
Views: 11,049

Dear d_labes,

❝ Simple Answer: Someone (my boss, my sponsor, my spouse, a NLYW ...) has placed an order to me to do it. And I have to respond under threat of punishment :-D.


Yeah, I know. Spouses, bosses, NLYW broadcast on frequencies that I don't easily receive. At least the signals appear encrypted when I try.

❝ ❝ If you use drop1(C, test="F") to get type III SS then you get basically the anova you pasted above. But we are only interested in the effect estimates not the source of variation, and the residual stays the same. So forget the type III SS approach - it is not necessary.


❝ This may be true. But my SAS code tells me further that the difference R1-R2 (and the 90% CI) is not estimable!


I think you got it wrong. Bear in mind that a model (a fit, a GLM in SAS or an LM in R) is just about finding an effect vector that corresponds to the maximum likelihood which (for this model type) is the least squares fit. We just need the model matrix along with the observations to find the effects vector. When you do a type III anova on a model with four effects Subj, Treat, Seq and Per then for the case of Treat, it will first fit the linear model with Subj, Seq, Per and record the residual. Then it will include Treat on top of all the others and record the residual. The difference in residuals is the type III variability for Treat. At the time when it adds Treat it will correspond to a maximum of two columns in the model matrix. But it cannot add a single column, because any of the two potential treatment columns will not be liearly independent of the set of columns that are already in the model matrix (the columns for Subj, Per, Seq). This is why you get df=0 when you try. To see exactly which, how, what, why you can eliminate columns one by one until the rank starts dropping.

You get a similar issue for Seq; the Seq is known (not free to vary) when we know period and treatment). This will be the type III case.

The solution is to do the fit with Treatment first (lexically first, lexically = the first effect the interpreter reads and handles when it constructs the model matrix; for R this is, as far as I know, the first effect after the "~" character), then the rest. The anova is irrelevant for the construction of a 90% CI, since you get the effects, residual and residual df's from the fit itself. These essentials don't come from an anova (although SAS likes to pretend they do).

❝ And it seems not only a type III issue and not a SAS phenomenon:

(PKdata contains the data for Reference only, factors created prior to lm() call.)

# Anders order of effects

❝ > model1 <- lm(log(Cmax) ~ 0 + tmt2 + subject + period + sequence, PKdata)

❝ > anova(model1) # type I sum of squares

❝ Analysis of Variance Table


❝ Response: log(Cmax)

❝           Df Sum Sq Mean Sq    F value    Pr(>F)   

❝ tmt2       2 8839.6  4419.8 22175.1817 < 2.2e-16 ***

❝ subject   76  119.3     1.6     7.8755 2.798e-16 ***

❝ period     1    0.4     0.4     2.0046    0.1612   

❝ Residuals 71   14.2     0.2                         


Sequence gone!


❝ > model2 <- lm(log(Cmax) ~ 0 + subject + period + sequence + tmt2, PKdata)

❝ > anova(model2)

❝ Analysis of Variance Table


❝ Response: log(Cmax)

❝           Df Sum Sq Mean Sq  F value Pr(>F)   

❝ subject   77 8958.6 116.345 583.7277 <2e-16 ***

❝ period     2    0.8   0.386   1.9383 0.1515   

❝ Residuals 71   14.2   0.199                   


Sequence gone! tmt2 vanished (coef() for tmt2R2=NA)!

❝ Note the varying df's for period!

❝ The same happens in models with intercept. Weird I think. The same strong dependence on the order of the effects I had noted already in this post.


No, it isn't weird. It all is because of the columns of the model matrix.
Lemme say this with different words: Imagine we just look at a model with two treatments and some subjects. We fit it with an intercept. Simplest case is a dataset like this:
Subj   Treat
1      A
1      B
2      A
2      B
3      B
3      A
4      B
4      A

(etc).

The unreduced model matrix looks like this with intercept (I for intercept, S for subject, T for treatment):
I S1 S2 S3 S4 TA TB
1 1           1 
1 1              1
1    1        1
1    1           1
1       1        1
1       1     1
1          1     1 
1          1  1

(etc, blanks are zeros; I have used the most intuitive contrast coding)
You can see that if you add the columns for S1+S2+S3+S4 then they add up to exactly column I. Thus they are not linearly independent. We therefore need to remove a subject. We can remove any of them, the rest will never add up to the intercept. Same goes for Treatment, the two T-columns add up to the intercept. We therefore remove one of them, egal welche. This means, we lost one df for subject and one for treatment because we wanted to fit an intercept.

If we do not wan to fit an intercept, then the I-column in the above example isn't there. Thus all four subject columns are linearly independent, so we have 4 df's for subject. But we need to remove a treat column (minus one df for treat) because the four subject columns added would otherwise equal the two treatment columns added. If we fit treatment first, then we retain two df's for treatment but then we have to get rid of one of the subject columns.

I hope it now makes sense.
The way out is thus to fit a model without intercept and treatment first; then you can fill in all the rest as you please; the residual (90% CI) will not be affected. R will take care of the model matrix for you.

Have a good weekend.

Pass or fail!
ElMaestro

Complete thread:

UA Flag
Activity
 Admin contact
23,424 posts in 4,927 threads, 1,671 registered users;
192 visitors (0 registered, 192 guests [including 3 identified bots]).
Forum time: 00:09 CEST (Europe/Vienna)

Only dead fish go with the current.    Scuba divers' proverb

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5