Bioequivalence and Bioavailability Forum • lm and the model matrix

ElMaestro
★★★

Denmark,
2024-02-19 10:51
(468 d 08:17 ago)

Posting: # 23868
Views: 4,288

lm and the model matrix [🇷 for BE/BA]

Hi all,

I continue to be a little baffled by R's lm and related functions.

Let's say we have a typical BE-related model like
M=lm(log(Cmax) ~ factor(Seq)+factor(Subj)+factor(Trt)+factor(Per))

I want a full rank model matrix, for example because further downstream I need access to the projection matrix (or hat matrix, H). The matrix, not just the diagonal entries. Should be easy, right?


H=X(X'X)^-1X'

, where X is the full rank model matrix.
So I just make a call like

X=model.matrix(M)

and we plug that into the equation for H.

But no.... Unfortunately, R doesn't let me get away with it, because the X I extract from the call above is not full rank. Neither is does it contain a column for every level of every factor. It is somehwere in between the two extremes and I do no understand how that X offers any advantage at all.

Next, ok, there's probably an easy way to make X full rank, right? A simple call in R's stats library must definitely exist, right?
Well, perhaps you guys know it, but I don't. I find myself having to write a stupid function like

MakeFullRank=function(X)

{

  Q=X[,1]


  for (i in 2:ncol(X))

  {

    rnk=qr(Q)$rank

    Q2=cbind(Q, X[,i])

    rnk2=qr(Q2)$rank

    if (rnk2>rnk) Q=Q2

  }

  return(Q)

}

This will convert the X into a full rank X.

It all seems so odd to me... R is such a functional language and all sorts of functions are optimised, and offer all sorts of handy functionality, so:

Can anyone tell if R's model.matrix (not full rank and not containing a column for every factor level for every factor) offers any advantage within an lm?
Can anyone tell if there are easier or direct ways to extract a full rank model matrix from an lm object?
Can anyone tell if there are easier ways to extract the hat matrix (not just its diagonals) than the clumsy outline above?

Finally, let me emphasize I do not wish for a discussion of Moore-Primrose pseudoinverses or the opportunities those beasts offer. In essence this is plainly and simply about lm and inverses, not pseudoinverses. Should anyone care to answer then give pt 3 lowest priority, please. The hat matrix deal is only one example showing why I need a full rank X. There are plenty other reasons as well. But if I understand why creators of lm prefer to work on deficienct-rank model matrices then perhaps I can apply this understanding in my own setting as well :-)

Many thanks for any input.

—
Pass or fail!
ElMaestro

mittyri
★★

Russia,
2024-02-22 21:21
(464 d 21:47 ago)

@ ElMaestro
Posting: # 23873
Views: 3,367

the model matrix and the contrasts

Post reply

Dear ElMaestro!

I (and many others) understand your degree of discontent.
Yes, this is something unexpected.
I suppose you already read this answer

I have nothing to add to this: that's a price for identifiability. The only solution I see besides you already gave is to turn off the contrasts as shown there:

f <- sample(gl(3, 4, labels = letters[1:3]))

# [1] c a a b b a c b c b a c

#Levels: a b c



g <- sample(gl(3, 4, labels = LETTERS[1:3]))

# [1] A B A B C B C A C C A B

#Levels: A B C

X0 <- model.matrix(~ f + g, contrasts.arg = list(

                   f = contr.treatment(n = 3, contrasts = FALSE),

                   g = contr.treatment(n = 3, contrasts = FALSE)))



#   (Intercept) f1 f2 f3 g1 g2 g3

#1            1  0  0  1  1  0  0

#2            1  1  0  0  0  1  0

#3            1  1  0  0  1  0  0

#4            1  0  1  0  0  1  0

#5            1  0  1  0  0  0  1

#6            1  1  0  0  0  1  0

#7            1  0  0  1  0  0  1

#8            1  0  1  0  1  0  0

#9            1  0  0  1  0  0  1

#10           1  0  1  0  0  0  1

#11           1  1  0  0  1  0  0

#12           1  0  0  1  0  1  0

—
Kind regards,
Mittyri

ElMaestro
★★★

Denmark,
2024-02-23 18:41
(464 d 00:28 ago)

@ mittyri
Posting: # 23878
Views: 3,353

Yes, but....

Post reply

Hi Mittyri,

❝ I have nothing to add to this: that's a price for identifiability. The only solution I see besides you already gave is to turn off the contrasts as shown there:

❝ f <- sample(gl(3, 4, labels = letters[1:3]))

❝ # [1] c a a b b a c b c b a c

❝ #Levels: a b c

❝

❝ g <- sample(gl(3, 4, labels = LETTERS[1:3]))

❝ # [1] A B A B C B C A C C A B

❝ #Levels: A B C

❝ X0 <- model.matrix(~ f + g, contrasts.arg = list(


❝                    f = contr.treatment(n = 3, contrasts = FALSE),

❝                    g = contr.treatment(n = 3, contrasts = FALSE)))

Nice, but are you aware that this model matrix is also not full rank, but rank-deficient? A cardinal question was how to get the (or a) full rank matrix.
The rank of your X0 is 5 but there are 7 columns. Seems you went to the opposite extreme of full rank, i.e. one column for every level of factors, plus intercept. Deficiency here = 2 redundant columns.
Anyways, I guess I will just live with it and do my stuff in a dumb but necessary fashion. Keep me posted if you find a nifty way of generating a full rank matrix model, please. :-)

—
Pass or fail!
ElMaestro

mittyri
★★

Russia,
2024-02-23 20:39
(463 d 22:29 ago)

@ ElMaestro
Posting: # 23879
Views: 3,309

Yes, but....

Post reply

Hi ElMaestro,

may be I misunderstood your

❝ is not full rank. Neither is does it contain a column for every level of every factor.

The solution you proposed with qr() does not give a column for every level of every factor

> (MakeFullRank(model.matrix(~ f + g)))

   Q        

1  1 0 0 1 0

2  1 0 1 1 0

3  1 1 0 0 1

4  1 0 1 1 0

5  1 0 0 0 0

6  1 0 0 0 1

7  1 0 1 0 0

8  1 1 0 0 0

9  1 0 1 1 0

10 1 1 0 0 0

11 1 0 0 0 1

12 1 1 0 0 1

—
Kind regards,
Mittyri

ElMaestro
★★★

Denmark,
2024-02-23 23:15
(463 d 19:54 ago)

(edited on 2024-02-24 00:00)
@ mittyri
Posting: # 23880
Views: 3,339

Yes, but....

Post reply

Hi Mittyri,

❝ may be I misunderstood your

❝

❝ ❝ is not full rank. Neither is does it contain a column for every level of every factor.

❝ The solution you proposed with qr() does not give a column for every level of every factor

❝ > (MakeFullRank(model.matrix(~ f + g)))


❝    Q        
❝ 1  1 0 0 1 0
❝ 2  1 0 1 1 0
❝ 3  1 1 0 0 1
❝ 4  1 0 1 1 0
❝ 5  1 0 0 0 0
❝ 6  1 0 0 0 1
❝ 7  1 0 1 0 0
❝ 8  1 1 0 0 0
❝ 9  1 0 1 1 0
❝ 10 1 1 0 0 0
❝ 11 1 0 0 0 1

❝ 12 1 1 0 0 1

A full rank model matrix means there are not more columns than the rank. Then X^tX becomes invertible.
A model matrix with a column for every level of every factor (plus intercept) is not the same. Rather that's redundancy. X^tX now becomes trouble :-)

My function creates a full rank model matrix.

—
Pass or fail!
ElMaestro