Mean as intercept; model matrices [🇷 for BE/BA]

posted by ElMaestro  – Denmark, 2009-05-20 22:35 (5425 d 12:29 ago) – Posting: # 3719
Views: 11,303

(edited by ElMaestro on 2009-05-21 10:29)

Dear all,

I have two questions about R and BE and the model specification of Chow & Liu. In principle, I could ask these questions in the more dedicated R fora, however, since Bebac users are friendlier and sexier and probably are able to provide answers that relate directly to BE, I prefer to ask them here.

For illustration purposes I have invented some arbitrary data, in R syntax:
lnY  = c(10,11,12,13,14,15,10,11,12,13,14,15,16,17)
Per  = as.factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2))
Seq  = as.factor(c(1,1,1,1,2,2,2,1,1,1,1,2,2,2))
Trt  = as.factor(c(1,1,1,1,2,2,2,2,2,2,2,1,1,1))
Subj = as.factor(c(1,2,3,4,5,6,7,1,2,3,4,5,6,7))
Snot = lm(lnY~Trt+Subj+Seq+Per)
## seems to be standard syntax in this forum
...which corresponds to a 2,2,2-BE study with 7 subjects

1. Chow & Liu's model specification is
lnY = Mean + S(i,k) + P(j) + F(j,k) + C(j-1,k) + E(i,j,k)
The lm fit above is not exactly corresponding to this model, as lm will fit an intercept unless we request it specifically not to. The fitted intercept is not the mean. This makes no difference in terms of the anova/error term BUT it will have some consequence for the regressors. Luckily, when we calculate the 90% CI, we do not need the individual model constants for the two treatments, we actually only need the difference between the model coefficients for the two treatments [rule: ln(A)-ln(B)=ln(A/B)], and that difference is preserved regardless of the magnitude of the intercept. We can get the difference from summary(Snot), where we read out how different Trt2 is from Trt1.
But let's assume we want to do the fit as Chow & Liu want us to. Which lm syntax works? I cannot do it with lm or glm. The only way I can do achieve the same is via the absolutely horrendous syntax:
Snot2 = lm(lnY-mean(lnY)~Trt+Subj+Seq+Per-1) or:
lnY2  = lnY-mean(lnY) and then Snot2 = lm(lnY2~Trt+Subj+Seq+Per-1)

The advantage of Snot2 is that we ask R to do a fit without the intercept (we have subtracted the intercept/mean on the left-hand side anyway). Now we can read out both treatment regressors directly from summary(Snot2), not just the difference between them.
Is there a smarter, direct way of doing the fit the Chow&Liu way with mean as intercept?

2. In principle the overall model matrix for Snot above will have 14 rows (14 observations) and 14 columns (1 intercept+7 subjects+2 periods+2 treatments+2 sequences). Then there are redundancies so the 14×14 matrix will not be full rank, making it difficult to do matrix boogie (like inversion etc, which can be really handy when fitting) on it. Some statistical packages therefore prefer to reduce the model matrices to full rank. Now let's look at R's model matrix for Snot:
model.matrix(Snot)

On my computer, I get a matrix with 10 columns. Obviously, the column called "Seq2" can be constructed from the three columns "Subj5", "Subj6" and "Subj7". Thus the muddle matrix is not full rank. It has rank=9 [qr(model.matrix(Snot))].
So, why on earth will R reduce a model matrix into something that does not have full rank? What is the computational advantage of doing the muddle fit on this matrix?


Many thanks for any insight you can provide
EM.

--
Edit1: Formatted using BBCode. [Helmut]
Edit2: Originally got 15 when I added 1,7,2,2 and 2. Corrected above. Morale: Never trust any calculation done by ElMaestro.
Edit3: Welcome to the club! [Helmut]

Complete thread:

UA Flag
Activity
 Admin contact
22,957 posts in 4,819 threads, 1,640 registered users;
83 visitors (0 registered, 83 guests [including 2 identified bots]).
Forum time: 10:05 CET (Europe/Vienna)

Nothing shows a lack of mathematical education more
than an overly precise calculation.    Carl Friedrich Gauß

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5