d_labes
★★★

Berlin, Germany,
2012-04-25 17:52

Posting: # 8461

 Ref. vs. Ref. in partial replicate design [RSABE / ABEL]

Dear ALL!

From time to time the question of comparing the Reference against itself in replicate designs comes up. We have already discussed this to some extent in this thread.

Helmut has pointed out a method via the EMA-suggested ANOVA (throwing away the data for the Test formulation): recode the treatments (or periods) to the occasion (replicate) in which the Reference was administered, and compare occasion 1 to occasion 2. His model must omit the period effect if it is applied to EMA data set I from the EMA Q&A, a 2-sequence, 4-period design.

I was for some reasons interested in applying this idea to the partial replicate design (sequences TRR/RTR/RRT). My test data was the EMA data set II. I recoded the treatments to T1, R1 and R2; the numbers after the letter are the replicate numbers.

With the usual ANOVA (recommended in the Q&A also for replicate studies) with effects tmt2, period, sequence and subject within sequence I got for the pair R1-R2:
        ----- log data ---------------------------   --- back transf. ------- 
data      se     diff R1-R2        90% CI            point est.  90% CI
R only  0.098703  0.005718  -0.164125 ... 0.175561   1.0057    0.8486 - 1.1919
All     0.046180  0.019498  -0.058095 ... 0.097091   1.0197    0.9436 - 1.1020

(se is the standard error of the difference)

To me that is a big difference, and very astonishing, because the intra-subject CVs of the two analyses hardly differ (CV = 11.4% and 12.0%, respectively). On the contrary, the 90% CI is much wider for the analysis with the lower CV.

To shed some light on the question "Which analysis is the correct one?" I have made an additional evaluation via the intra-subject contrasts R1-R2, analysed via an ANOVA with sequence as the sole effect in the model (this is the so-called robust analysis of Patterson/Jones, aka Senn's basic estimator approach).
The naive application of the intra-subject contrasts R1-R2 suffers however from a flaw concerning the period effects (named p1, p2, p3) for the partial replicate design:
sequ. Contrast Expectation
TRR    P2-P3   (R-R')+p2-p3
RTR    P1-P3   (R-R')+p1-p3
RRT    P1-P2   (R-R')+p1-p2

(P1, P2 and P3 term the values of the log-transf. PK metric in the corresponding period)
That is, the sum over the three sequence groups is contaminated by 2*p1-2*p3, so the mean is an unbiased estimator of (R-R') only if there are no period effects. But this flaw can easily be corrected within this design if one takes P3-P1 in case of the sequence RTR as the measure of R-R'.
I obtained:
        ----- log data ---------------------------   --- back transf. ------- 
Method    se     diff R1-R2        90% CI            point est.  90% CI
naive   0.032901 -0.021780  -0.078394 ... 0.034835   0.9785    0.9246 - 1.0355
corr.   0.032901  0.001906  -0.054708 ... 0.058520   1.0019    0.9468 - 1.0603


The two methods give different results even though no significant period effect is observed within this data set.
Different results from those above :confused:! But more in line with the ANOVA results using all data.
What makes me very curious is that the results of the ANOVA using the R data only coincide with the evaluation via intra-subject contrasts (corrected for period effects) if the former are divided by 3 (sic). Try it.
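A minimal sketch in R of the contrast evaluation described above, on a small made-up data set (the data frame `d` and all its values are hypothetical, not the EMA data set II):

```r
# Hypothetical example: 6 subjects, log-PK values P1..P3 from periods 1-3
d <- data.frame(
  sequence = rep(c("TRR", "RTR", "RRT"), each = 2),
  P1 = c(1.10, 1.20, 1.05, 1.15, 1.00, 1.30),
  P2 = c(1.00, 1.25, 1.10, 1.05, 1.02, 1.28),
  P3 = c(1.05, 1.18, 1.08, 1.12, 0.98, 1.31)
)
# Period-corrected within-subject contrasts for R vs. R':
# take P3 - P1 in sequence RTR, so that the period effects
# cancel over the three sequences
d$contr <- with(d, ifelse(sequence == "TRR", P2 - P3,
                   ifelse(sequence == "RTR", P3 - P1, P1 - P2)))
fit <- lm(contr ~ 0 + sequence, data = d)  # sequence as sole effect
est <- mean(coef(fit))                     # unweighted mean of sequence means
n   <- table(d$sequence)                   # subjects per sequence
se  <- summary(fit)$sigma * sqrt(sum(1 / n)) / 3
ci  <- est + c(-1, 1) * qt(0.95, df = fit$df.residual) * se
round(exp(c(PE = est, lower = ci[1], upper = ci[2])), 4)
```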

Cough :smoke:.
Anybody out there with an idea what's going on here?
Bonus question: Which result do you trust?

BTW: The partial replicate design seems to be the only one among the commonly used replicate designs which can be corrected for period effects regarding an estimate of R vs. R'. For the others a model without period effects must be employed, if I'm not wrong.

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2012-10-10 12:56

@ d_labes
Posting: # 9371

 Ref. vs. Ref. in partial replicate design again

Dear ALL!

May I bring this post again to your attention? Any comments from the community? >900 views but no opinion?

BTW: My favourite method would be the evaluation via intra-subject contrasts.
Because it had the shortest CI :-D.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-10 14:39

@ d_labes
Posting: # 9372

 Ref. vs. Ref. in partial replicate design

Dear d_labes,

I am sure you have a point - but I don't understand it. You are probably not wrong in any way, I am sure, but my walnut-sized brain cut out when I tried to understand what you are trying to achieve.

Why not just extract model effects from the raw model fit without intercept?
I mean Foobar = lm(logY ~ 0 + Treatment + blahblah )
Then use the residual to get the CI?

In case the answer is somehow hidden in your first post I hope you can somehow reformulate (I hate that word, sorry) it so that I can understand.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-10 18:26

@ ElMaestro
Posting: # 9378

 Ref. vs. Ref. in partial replicate design

Dear ElMaestro,

❝ I am sure you have a point - but I don't understand it. You are probably not wrong in any way, I am sure, but my walnutsized brain cut out when I tried to understand what you are trying to achieve.


I tried to obtain a measure, preferably the same way as in conventional BE, namely a 90% CI for R vs. R, to show that the two per-subject PK values for the Reference obtainable from a replicate crossover study can be judged as equivalent.

❝ Why not just extract model effects from the raw model fit without intercept?

❝ I mean Foobar = lm(logY ~ 0 + Treatment + blahblah )

❝ Then use the residual to get the CI?


:confused: Could you explain your idea in a little more detail?
What is the intention behind not fitting an intercept?

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-10 19:16

@ d_labes
Posting: # 9379

 Ref. vs. Ref. in partial replicate design

Dear d_labes,

❝ :confused: Could you explain your idea in a little more detail?

❝ What is the intention to not fit an intercept?


In matrix form we have y=Xb+e; the LSMeans can be extracted from the b-vector if we fit without an intercept. If we fit the model with an intercept, which is standard in BE when we work on these models, then the b-vector corresponds to the LSMeans offset by the intercept. The difference between the two types of fits will, apart from the b-vector issue, be reflected in the ANOVA p-values, but the residual, and hence the amount of explained variation, remains the same.
So I just suggested the model without intercept in order to have an easy way to get the LSMeans in R; if you use SAS (I don't!) I guess you'll get them anyway without much further ado.
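A tiny illustration of this point (made-up numbers; `logY` and `Trt` are hypothetical names): the no-intercept fit returns the treatment means directly, while the residual is identical to the fit with intercept.

```r
logY <- c(1.2, 1.0, 1.4, 1.1, 0.9, 1.3)
Trt  <- factor(c("T", "R", "T", "R", "T", "R"))
m0 <- lm(logY ~ 0 + Trt)  # no intercept: coefficients are the group means
m1 <- lm(logY ~ Trt)      # with intercept: TrtT is the offset from the R mean
coef(m0)                  # TrtR = mean of R, TrtT = mean of T
all.equal(deviance(m0), deviance(m1))  # residual SS is the same either way
```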

Sorry if I caused confusion. I am just a lowlife R-user.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-11 11:43

@ ElMaestro
Posting: # 9383

 R vs. R in partial replicate design

Mon Capitano!

Ok, now I have the LSMeans (in SASophylistic way as standard or in R using your model).

In case of using all data without recoding I have them for T and R.
With recoding to occasions for T and R1,R2.

Or using only the data for R (EMA crippled model) without recoding I have it for R (=intercept). With recoding I have them for R1 and R2.

And now? How to get a 90% CI for R vs. R?
Simply use R1-R2 of the recoded data?
Or anything else?

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-11 12:26

@ d_labes
Posting: # 9384

 R1 vs R2

Dear d_labes,

❝ Ok, now I have the LSMeans (in SASophylistic way as standard or in R using (...blah blah blah...)

❝ And now? How to get a 90% CI for R vs. R?

❝ Simple use R1-R2 of the recoded data?

❝ Or anything else?


The standard formula applies, doesn't it? The df's are easy from the model or anova, the critical t-value is obtainable at your chosen alpha and the derived df's, the PE (R1 - R2) is extracted from the model effects, the sigma through the residual, and you have the harmonic mean of the sequence sample sizes under the square root sign.
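That formula can be sketched in R. The df of 21 plugged in below is back-calculated from the "R only" row of the first post in this thread (an assumption, since the df was not stated there):

```r
# 90% CI on the original scale from the log-scale difference,
# its standard error, and the residual degrees of freedom
ci90 <- function(diff, se, df) {
  hw <- qt(0.95, df) * se  # half-width at alpha = 0.10
  round(exp(c(PE = diff, lower = diff - hw, upper = diff + hw)), 4)
}
ci90(diff = 0.005718, se = 0.098703, df = 21)
# PE 1.0057, lower 0.8486, upper 1.1919 -- the 'R only' row above
```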

I wouldn't personally think of it as a 90% CI for R vs. R, but rather refer to it as a 90% CI for the "first R vs. second R". Quite clear that we also violate some assumptions for the normal linear model here, but I guess we must live with that.
We could perhaps alternatively think along the lines of a mixed model with a covariance matrix having both a within- and between-sigma2 (the betweens would be off diagonal, the within on the diagonal) and optimise it by REML? I have no idea how to actually do this in R or any other package. Have I been sniffing too much glue? My understanding of covariance matrices is still a bit backward.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-11 14:01

@ ElMaestro
Posting: # 9386

 R1 vs R2

Dear ElMaestro,

❝ The standard formula applies, doesn't it? The df's are easy from the model or anova, the critical t-value is obtainable at your chosen alpha and the derived df's, the PE (R1 - R2) is extracted from the model effects, the sigma through the residual, and you have the harmonic mean of the sequence sample sizes under the square root sign.


Yessss Sir. Now we come to the heart of the matter ("des Pudels Kern"). I have done it along these lines above: once with only the R data, once with all data in the usual ANOVA (all effects fixed, as the EMA requests us to do).
The third evaluation was done via appropriate intra-subject contrasts (aka Senn's basic estimator approach). BTW: This is the way the FDA code in the progesterone guidance takes in evaluating the reference variability within the scaled ABE framework.

What bothers me is that these evaluations gave such distinct answers. And the most obvious evaluation in the spirit of the EMA crippled model, the first mentioned above, gave the worst answer if our goal is showing equivalence of reference versus itself within a partial replicate study.
To repeat my question: Which result do you trust?

What bothers me further is that the distinction of R into R1 and R2 is purely arbitrary. A ratio of R vs. R different from 1 is horrible to me.

Moreover, the distinction of R into R1 and R2 and the evaluation of the R data alone via an ANOVA with effects tmt2, period, sequence and subject within sequence is not universally valid. Try it with the EMA data set I, a 4-period full replicate study. I got:
  Source               DF    Type III SS    Mean Square   F Value   Pr > F

  tmt2                  0      0.0000000       .              .      .
  sequence              0      0.0000000       .              .      .
  subject(sequence)    74    118.3549864      1.5993917      8.02   <.0001
  period                1      0.3995422      0.3995422      2.00   0.1612

Trapped in the type III hell! No estimate of R1 vs R2 obtainable.
It only functions if you omit period from the model.

❝ We could perhaps alternatively think along the lines of a mixed model with a covariance matrix having both a within- and between-sigma2 (the betweens would be off diagonal, the within on the diagonal) and optimise it by REML? I have no idea how to actually do this in R or any other package. Have I been sniffing too much glue? My understanding of covariance matrices is still a bit backward.


No idea from my side how to get an estimate of R vs. R within this framework :-(.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-11 14:09

@ d_labes
Posting: # 9387

 R1 vs R2

Dear d_labes,

❝ To repeat my question: Which result do you trust?


❝ What bothers me further is that the distinction of R into R1 and R2 is purely arbitrary. A ratio of R vs. R different from 1 is horrible to me.


Yes, this is truly weird.
We are testing one and the same product against itself, so by definition we know that the true PE is exactly 1 with 100% confidence. In practice, though, we get some variability when we try to verify it. Biology sucks.

❝ Moreover the distinction of R into R1 and R2 and evaluation of the R data alone via ANOVA with effects tmt2, period, sequence and subject within sequence is not universally valid. Try it with the EMA dataset I, a 4 period full replicate study. I got:

  Source               DF    Type III SS    Mean Square   F Value   Pr > F


❝   tmt2                  0      0.0000000       .              .      .
❝   sequence              0      0.0000000       .              .      .
❝   subject(sequence)    74    118.3549864      1.5993917      8.02   <.0001
❝   period                1      0.3995422      0.3995422      2.00   0.1612

❝ Trapped in the type III hell! No estimate of R1 vs R2 obtainable.

❝ It only functions if you omit period from the model.


Can you send me the dataset (coded R1 and R2) in an excel file or something like that? I need to look at the columns and rank of the design matrix to figure it out.

Muchas gracias.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-11 14:46

@ ElMaestro
Posting: # 9388

 R1 vs R2

Dear ElMaestro,

you have mail ("Mehl in der Mehlkiste").

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-11 15:38

@ d_labes
Posting: # 9391

 treatment df

Thank you, d_labes,

❝ you have mail ("Mehl in der Mehlkiste").

M = lm(A$logDATA ~ 0 + as.factor(A$tmt2) + as.factor(A$subject) + as.factor(A$period) + as.factor(A$sequence))
anova(M)
Analysis of Variance Table

Response: A$logDATA
                      Df  Sum Sq Mean Sq    F value Pr(>F)   
as.factor(A$tmt2)      4 17913.4  4478.3 28057.2233 <2e-16 ***
as.factor(A$subject)  76   214.4     2.8    17.6736 <2e-16 ***
as.factor(A$period)    3     0.3     0.1     0.6654 0.5742   
Residuals            215    34.3     0.2                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


coef(M) gives the effects for all recoded treatments. Note df=4 because the fit is without intercept.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-11 18:34

@ ElMaestro
Posting: # 9392

Misunderstanding

Dear Ol'Pirate,

I talked about evaluation using only the data for Reference (EMA crippled method)!

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-11 20:12

(edited on 2012-10-11 21:51)
@ d_labes
Posting: # 9395

Misunderstanding, crippled

Hi d_labes,

❝ I talked about evaluation using only the data for Reference (EMA crippled method)!


Did you just mean to prune away everything related to T1 and T2 and then run an analysis*? It becomes more and more unclear to me what you actually wish to accomplish. I still haven't got the faintest clue why you wish to evaluate Ref against itself.
  • C = subset(A, tmt2!="T2" & tmt2!="T1")
    M2 = lm(C$logDATA~0 + as.factor(C$tmt2) + as.factor(C$subject) + as.factor(C$period) + as.factor(C$sequence))
    coef(M2)
or did you mean sumfin entirely else?


If you use drop1(M2, test="F") to get the type III SS, you get basically the anova you pasted above. But we are only interested in the effect estimates, not the sources of variation, and the residual stays the same. So forget the type III SS approach; it is not necessary.


Edit: Massa! Why don’t you simply edit your post? [Helmut]

Hi Helmut: Yes, I screwed up a few times today when I somehow created a new entry rather than an edit :sleeping:. It was borderline stupid, I admit. Please accept my apologies for causing overtime for you. Beer is on me next time we meet.


Yes, pleeeze! :party: [Helmut]

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-12 16:54

@ ElMaestro
Posting: # 9405

 EMA crippled method - R peculiarities

Hi ElMaestro,

❝ Did you just mean to prune away everything related to T1 and T2 and then run an analysis*?


Exactly. This is what the EMA wants us to do. At least in the context of calculation of the intra-subject variability for the reference.
But also in the evaluation of 3-treatment-3-period or 4-treatment-4-period they request in the same spirit " ... In studies with more than two treatment arms (e.g. a three period study including two references, one from EU and another from USA, or a four period study including test and reference in fed and fasted states), the analysis for each comparison should be conducted excluding the data from the treatments that are not relevant for the comparison in question."

❝ It becomes more and more unclear to me what you actually wish to accomplish. I still haven't got the faintest clue why you wish to evaluate Ref against itself.


Simple Answer: Someone (my boss, my sponsor, my spouse, a NLYW ...) has placed an order to me to do it. And I have to respond under threat of punishment :-D.

❝ If you use drop1(M2, test="F") to get the type III SS, you get basically the anova you pasted above. But we are only interested in the effect estimates, not the sources of variation, and the residual stays the same. So forget the type III SS approach; it is not necessary.


This may be true. But my SAS code further tells me that the difference R1-R2 (and the 90% CI) is not estimable!
And it seems to be neither only a type III issue nor a SAS phenomenon:
(PKdata contains the data for Reference only, factors created prior to lm() call.)
# Anders order of effects
> model1 <- lm(log(Cmax) ~ 0 + tmt2 + subject + period + sequence, PKdata)
> anova(model1) # type I sum of squares
Analysis of Variance Table

Response: log(Cmax)
          Df Sum Sq Mean Sq    F value    Pr(>F)   
tmt2       2 8839.6  4419.8 22175.1817 < 2.2e-16 ***
subject   76  119.3     1.6     7.8755 2.798e-16 ***
period     1    0.4     0.4     2.0046    0.1612   
Residuals 71   14.2     0.2                         

Sequence gone!

> model2 <- lm(log(Cmax) ~ 0 + subject + period + sequence + tmt2, PKdata)
> anova(model2)
Analysis of Variance Table

Response: log(Cmax)
          Df Sum Sq Mean Sq  F value Pr(>F)   
subject   77 8958.6 116.345 583.7277 <2e-16 ***
period     2    0.8   0.386   1.9383 0.1515   
Residuals 71   14.2   0.199                   

Sequence gone! tmt2 vanished (coef() for tmt2R2=NA)!


Note the varying df's for period!
The same happens in models with an intercept. Weird, I think. I had already noted the same strong dependence on the order of the effects in this post.

:ponder: Ways out? Abandon EMA's crippled method?

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-12 18:25

@ d_labes
Posting: # 9406

 All is good, just forget ANOVA - you don't need it!

Dear d_labes,

❝ Simple Answer: Someone (my boss, my sponsor, my spouse, a NLYW ...) has placed an order to me to do it. And I have to respond under threat of punishment :-D.


Yeah, I know. Spouses, bosses, NLYW broadcast on frequencies that I don't easily receive. At least the signals appear encrypted when I try.

❝ ❝ If you use drop1(M2, test="F") to get the type III SS, you get basically the anova you pasted above. But we are only interested in the effect estimates, not the sources of variation, and the residual stays the same. So forget the type III SS approach; it is not necessary.


❝ This may be true. But my SAS code tells me further that the difference R1-R2 (and the 90% CI) is not estimable!


I think you got it wrong. Bear in mind that a model (a fit, a GLM in SAS or an LM in R) is just about finding an effect vector that corresponds to the maximum likelihood, which (for this model type) is the least-squares fit. We just need the model matrix along with the observations to find the effect vector. When you do a type III anova on a model with four effects Subj, Treat, Seq and Per, then for the case of Treat it will first fit the linear model with Subj, Seq, Per and record the residual. Then it will include Treat on top of all the others and record the residual. The difference in residuals is the type III variability for Treat. At the time when it adds Treat, this would correspond to a maximum of two columns in the model matrix. But it cannot add even a single column, because neither of the two potential treatment columns is linearly independent of the set of columns already in the model matrix (the columns for Subj, Per, Seq). This is why you get df=0 when you try. To see exactly which, how, what, why, you can eliminate columns one by one until the rank starts dropping.

You get a similar issue for Seq: the sequence is known (not free to vary) once we know period and treatment. This is the type III case.

The solution is to do the fit with Treatment first (lexically first, lexically = the first effect the interpreter reads and handles when it constructs the model matrix; for R this is, as far as I know, the first effect after the "~" character), then the rest. The anova is irrelevant for the construction of a 90% CI, since you get the effects, residual and residual df's from the fit itself. These essentials don't come from an anova (although SAS likes to pretend they do).
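The type III computation described above can be sketched as an explicit comparison of two fits. This uses a made-up 2-sequence, 2-period data set; all names (`dat`, `logY`, etc.) are hypothetical:

```r
dat <- data.frame(
  logY = c(18, 19, 23, 22, 27, 18, 16, 15, 20, 20, 26, 17),
  Subj = factor(rep(1:6, each = 2)),
  Per  = factor(rep(1:2, 6)),
  Seq  = factor(rep(c("TR", "RT"), each = 6)),
  Trt  = factor(c("T","R","T","R","T","R","R","T","R","T","R","T"))
)
m_red  <- lm(logY ~ Subj + Seq + Per, data = dat)        # without Trt
m_full <- lm(logY ~ Subj + Seq + Per + Trt, data = dat)  # Trt added last
deviance(m_red) - deviance(m_full)  # difference in residual SS:
                                    # the type III SS for Trt
anova(m_red, m_full)                # the corresponding F test
```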

❝ And it seems not only a type III issue and not a SAS phenomenon:

(PKdata contains the data for Reference only, factors created prior to lm() call.)

# Anders order of effects

❝ > model1 <- lm(log(Cmax) ~ 0 + tmt2 + subject + period + sequence, PKdata)

❝ > anova(model1) # type I sum of squares

❝ Analysis of Variance Table


❝ Response: log(Cmax)

❝           Df Sum Sq Mean Sq    F value    Pr(>F)   

❝ tmt2       2 8839.6  4419.8 22175.1817 < 2.2e-16 ***

❝ subject   76  119.3     1.6     7.8755 2.798e-16 ***

❝ period     1    0.4     0.4     2.0046    0.1612   

❝ Residuals 71   14.2     0.2                         


Sequence gone!


❝ > model2 <- lm(log(Cmax) ~ 0 + subject + period + sequence + tmt2, PKdata)

❝ > anova(model2)

❝ Analysis of Variance Table


❝ Response: log(Cmax)

❝           Df Sum Sq Mean Sq  F value Pr(>F)   

❝ subject   77 8958.6 116.345 583.7277 <2e-16 ***

❝ period     2    0.8   0.386   1.9383 0.1515   

❝ Residuals 71   14.2   0.199                   


Sequence gone! tmt2 vanished (coef() for tmt2R2=NA)!

❝ Note the varying df's for period!

❝ The same happens in models with intercept. Weird I think. The same strong dependence on the order of the effects I had noted already in this post.


No, it isn't weird. It all is because of the columns of the model matrix.
Lemme say this with different words: Imagine we just look at a model with two treatments and some subjects. We fit it with an intercept. Simplest case is a dataset like this:
Subj   Treat
1      A
1      B
2      A
2      B
3      B
3      A
4      B
4      A

(etc).

The unreduced model matrix looks like this with intercept (I for intercept, S for subject, T for treatment):
I S1 S2 S3 S4 TA TB
1 1           1 
1 1              1
1    1        1
1    1           1
1       1        1
1       1     1
1          1     1 
1          1  1

(etc, blanks are zeros; I have used the most intuitive contrast coding)
You can see that if you add the columns S1+S2+S3+S4 then they add up to exactly column I. Thus they are not linearly independent. We therefore need to remove a subject column. We can remove any of them; the rest will never add up to the intercept. Same goes for treatment: the two T-columns add up to the intercept, so we remove one of them, no matter which ("egal welche"). This means we lose one df for subject and one for treatment because we wanted to fit an intercept.

If we do not want to fit an intercept, then the I-column in the above example isn't there. Thus all four subject columns are linearly independent, so we have 4 df's for subject. But we need to remove a treatment column (minus one df for treatment) because the four subject columns added together would otherwise equal the two treatment columns added together. If we fit treatment first, then we retain two df's for treatment but have to get rid of one of the subject columns.

I hope it now makes sense.
The way out is thus to fit a model without intercept and treatment first; then you can fill in all the rest as you please; the residual (90% CI) will not be affected. R will take care of the model matrix for you.

Have a good weekend.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2012-10-15 10:35

@ ElMaestro
Posting: # 9415

 Just forget ANOVA. What next?

Dear Öberster größter Meister,
  1. Thanks for all your in-depth explanations about muddle matrices and similarly complicated issues. I will take them to heart and keep them there.
  2. First you told me: Forget type III. Now you tell me: Forget ANOVA. What next to forget?
  3. I personally have not trusted and will not trust any model parameters (aka effect estimates) from a model which doesn't fit. And not getting one or more coefficients of the model (getting '.' in SAS or 'NA' in R-speak) is for me an indication of no fit. A strong indication.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2012-10-15 12:04

@ d_labes
Posting: # 9416

 Just forget ANOVA. What next?

Dear d_labes,

❝ 2. First you told me: Forget type III. Now you tell me: Forget ANOVA. What next to forget?

Erm... this is a matter of personal preference. The parking ticket you got last week? The doctor's order to trash the cigarettes? Wife's birthday?

❝ 3. I personally have not trusted and will not trust any model parameters (aka effect estimates) from a model which doesn't fit. And not getting one or more coefficients of the model (getting '.' in SAS or 'NA' in R-speak) is for me an indication of no fit. A strong indication.


A failed fit isn't the issue here. All the reasonable models you can think of in this regard will fit.
I understand your scepticism (scepticalism? sceptolophysm? Dammit my English is getting worse with time, sorry). Anyways, let me show you an example in R to illustrate what I mean. This is just a made up dataset for a 2-seq, 2-treat, 2-per BE study with three subjects in each sequence:
lnPKdata=c(18,19,23,22,27,18,16,15,20,20, 26,17)
Per=as.factor(c(1,2,1,2,1,2,1,2,1,2,1,2))
Seq=as.factor(c("TR", "TR", "TR", "TR", "TR", "TR", "RT", "RT", "RT","RT", "RT", "RT"))
Trt=as.factor(c("T", "R","T", "R","T", "R","R", "T","R", "T","R", "T"))
Subj=as.factor(c(1,1,2,2,3,3,4,4,5,5,6,6))


Let's say you do a classical thing like:
M=lm(lnPKdata~Trt+Seq+Subj+Per)

Now, in this model there is no effect for treatment T, but this doesn't mean it doesn't exist or didn't fit: it just happens that we wanted an intercept, so we cannot have both a column for treatment T and treatment R in the model matrix. We just get one of them (in this case R makes the choice; it will be the effect for TrtT). So what about TrtR? Well, an observation that isn't associated with TrtT must be TrtR. And since the intercept is a column of 1's, we can subtract the column for TrtT from the column of the intercept and voilà, there we have a column that would indicate TrtR. So, hokus pokus and rabbits out of a hat: try subtracting the effect for TrtT from the intercept (effect).
You can fit the same model without intercept and then you can extract the effects for TrtT and TrtR more directly.

A more complicated example to illustrate the logical beauty of model matrices, estimability and matrix algebra:

From the fit above we get:
Coefficients:
(Intercept)         TrtT        SeqTR        Subj2        Subj3        Subj4 
    23.1667      -0.1667      -3.0000       4.0000       4.0000      -6.0000 
      Subj5        Subj6         Per2 
    -1.5000           NA      -3.1667


Eh.... So the effect for Subject 6 doesn't exist? Is not estimable??
Well, it does exist, but from the way the model is specified and the way R constructs a model matrix, it is not free to vary. Try looking at the internal model matrix:

model.matrix(M)
   (Intercept) TrtT SeqTR Subj2 Subj3 Subj4 Subj5 Subj6 Per2
1            1    1     1     0     0     0     0     0    0
2            1    0     1     0     0     0     0     0    1
3            1    1     1     1     0     0     0     0    0
4            1    0     1     1     0     0     0     0    1
5            1    1     1     0     1     0     0     0    0
6            1    0     1     0     1     0     0     0    1
7            1    0     0     0     0     1     0     0    0
8            1    1     0     0     0     1     0     0    1
9            1    0     0     0     0     0     1     0    0
10           1    1     0     0     0     0     1     0    1
11           1    0     0     0     0     0     0     1    0
12           1    1     0     0     0     0     0     1    1


Look at the column for subject 6. Then, mentally, subtract SeqTR from Intercept, and subtract Subj5 and Subj4 from that. What do you get? Subject 6! Therefore there is collinearity in the above model matrix and the rank (the df's) cannot be 9 (= the number of columns; you can qr() the model matrix to confirm that the rank is 8 rather than 9). The model matrix displayed here is thus not the full-rank matrix the linear model requires, so we need to drop one df, which in this case is subject 6. Thus to get the effect for Subj 6, even though the model could not "fit" it, we just add and subtract a little of the other effects.

Neat, eh? Just really stupid that R displays a model matrix that is not full rank. It is not very pedagogical.

I guess all in all we can say:
1. The model specification determines the model matrix.
2. The model matrix in a linear model must be full column rank.
3. Fitting with an intercept steals exactly one df from some other factor. Usually the first factor mentioned but this depends on the language implementation.
4. The order of factors may determine the number of df's associated with each of them in a standard fit.
5. We can get effects for treatments that appear gone and lost by playing around with the other effects until we get a match of model matrix columns for the effect we are interested in.
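As suggested earlier in this post, qr() can confirm the rank deficiency; here is a self-contained re-run of the same made-up example:

```r
# Same made-up 2x2 crossover data as above
lnPKdata <- c(18, 19, 23, 22, 27, 18, 16, 15, 20, 20, 26, 17)
Per  <- factor(rep(1:2, 6))
Seq  <- factor(rep(c("TR", "RT"), each = 6))
Trt  <- factor(c("T","R","T","R","T","R","R","T","R","T","R","T"))
Subj <- factor(rep(1:6, each = 2))
M <- lm(lnPKdata ~ Trt + Seq + Subj + Per)
X <- model.matrix(M)
c(columns = ncol(X), rank = qr(X)$rank)  # 9 columns but rank 8:
                                         # Subj6 is aliased, hence its NA
```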

Now back to the seven seas.
:pirate:

Pass or fail!
ElMaestro

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz