Bioequivalence and Bioavailability Forum • Let's skip the fancy nesting syntax!

ElMaestro
★★★

Denmark,
2009-08-26 22:54
(5774 d 11:40 ago)

Posting: # 4118
Views: 16,497

Let's skip the fancy nesting syntax! [🇷 for BE/BA]

Ahoy all,

I have read and I have learned. Therefore, I hereby declare war on syntactical misnomers like "Subject % in % Sequence" or "Subject : Sequence".
If case you want to know why, then read on, otherwise please continue drinking your coffee or reading your newspaper. The following is just another elmaestrolophystic opinion, which means it is written by someone with a brain the size of a walnut, most of it is probably wrong, based on misunderstandings and miscalculations, or just reflecting Belgian seamanship.

Most people doing BE know that in crossover studies a given subject is uniquely assigned to a specific sequence. This is a type of nesting, and there is a certain schwung about speaking of 'subjects nested in sequence'. It sounds good. It sounds fancy. It might even impress the cute blonde two offices down the corridor if I speak bull like that loud enough. Certainly there are some reason...
Let's look at it from R's perspective, and let me discuss two ways of specifying subjects:

We can assign each subject in the trial a unique identifier (like 1, 2...n where n is the total number of subjects).
For each sequence we can give each subject a unique identifier (like 1, 2...n where n is the total number of subjects in that sentence). This implies that that there could be a subject in sequence 1 that is called "1" and there could at the same time be one in sequence 2 called "1". And so forth.

Both are fully valid ways of working with subjects, and R provides functionality for both ways.

When we use option 2 above we need to make sure R understands that subject 1 in sequence 1 and subject 1 in sequence 2 are not identical. Not identical in this regards means we need to have separate model coefficients for both. And this is exactly where the nesting syntax is useful- We would specify a model using muddle lingo like "lnAuc ~ Subj % in % Seq + Seq + Per + Trt". If we have 30 subjects in 2 sequences then it means R with that kind of syntax will in principle (it does reduce some redundancies away, though) try to fit 2x30=60 coefficients for the interaction of Subject and Sequence. (subject 1 in sequence 1, subject 1 in sequence 2, subject 2 in sequence 1 ... subject 30 in sequence 2).
And hey, yes, I deliberately used the horrible word "interaction" in the previous sentence. When R fits a model it creates a model matrix for us, although we do not see it unless we ask for it (model.matrix(MyFunkyFit)). Nesting in a model matrix is the same as interactions. This is why we get same results if we use "lnAuc ~ Subj : Seq + Seq + Per + Trt". However, interactions do not care about order of factors. That's why when we finally do the ANOVA and we have used "Subj : Seq" then R reserves its right to give us the result expressed as a p-value for "Seq : subj" rather than "Subj : Seq" (example here). Subjects nested in Sequences is just the same as Sequences nested in Subjects to a numbercruncher working on bits and bytes. However, to me in practice, speaking of Subjects in Sequence makes some sense whereas speaking of Sequences in Subjects is meaningless.

Everything is so much easier when we specify subjects uniquely as under option 1 above. Now we can just ask R to fit the model specified as "lnAuc ~ Subj + Seq + Per + Trt" or whatever order of factor we prefer (if we have a preference at all). That way we do not need to specify the nesting at all. And what happens if we do? Well, although in the end the ANOVA table is right, poor little R is forced to do serious overtime, because as mentioned above it will try and fit with a model matrix having a potentially high number of columns that correspond to coefficients that cannot be calculated because the subjects do not exist (you will see around n annoying "NA"s in the model summary when n unique subjects are evaluated that way).

So, next time you try to impress the blonde down the corridor with mumbojumbo like "subjects nested in sequence", make sure you have a pretty good excuse ready in case she asks you why the heck you are using an approach that is intended for non-unique subject identifiers.

OK, now back to the seven seas.
EM.

Wow! Giving me something to chew upon. Some translations for the German word "Schwung" in the leading section (just before the cute blondes): swing, kick, impetus: If you are not happy with that, have a look here for other translation. [Helmut]

yjlee168
★★★

Kaohsiung, Taiwan,
2009-08-27 00:29
(5774 d 10:05 ago)

@ ElMaestro
Posting: # 4119
Views: 13,461

Let's skip the fancy nesting syntax!

Post reply

Dear ElMaestro,

Thanks for your interesting and great posting. Could you tell me where you got this? Many thanks.

❝ than "Subj : Seq" (example here). Subjects nested in sequences is just the same as Sequences nested in Subjects to a numbercruncher working on bits and bytes. However, to me in practice, speaking of Subjects in Sequence makes some sense whereas speaking of Sequences in Subjects is meaningless.

It clarifies the puzzle that we have when developing bear. :clap:

—
All the best,
-- Yung-jin Lee
bear v2.9.2:- created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here

ElMaestro
★★★

Denmark,
2009-08-27 00:46
(5774 d 09:48 ago)

@ yjlee168
Posting: # 4120
Views: 13,431

Let's skip the fancy nesting syntax!

Post reply

Hi,

❝ Thanks for your interesting and great posting. Could you tell me where you got this? Many thanks.

❝

❝ ❝ than "Subj : Seq" (example here). Subjects nested in Sequences is just the same as Sequences nested in Subjects to a numbercruncher (...)

This is something I have arrived at after trying to understand how model matrices conceptually work and how they are used for least squares modeling.
Our model is Y=Xb + e in matrix notation, where X is the model matrix and b is the vector of model coefficients.

With simplest contrast coding you can consider X a matrix of zeros and ones (yes yes, there are other contrasts available, but for simplicity I will only work with 0 and 1 here). For each datapoint (AUC-value etc) there is a row in X, and for each coefficient we want estimated, there is in principle a column (but as written before there is reduction of some redundancies). Let's say a column corresponds to the coefficient for the Subj X in sequence Y. We will then only put two ones in that column, namely in the rows that correspond to data obatined by the Xth subject in the Yth sequence, and we will put a zero in all other rows in that column. And so forth (in order to grasp this is it a good idea also to see how a matrix (like X) is multiplied with a vector (like b)). Generally, we put a 1 in the row when the desired levels of both factors are present, and that is exactly the same way interactions are specified.

Best regards
EM.

yjlee168
★★★

Kaohsiung, Taiwan,
2009-08-27 22:54
(5773 d 11:39 ago)

@ ElMaestro
Posting: # 4127
Views: 13,450

Let's skip the fancy nesting syntax!

Post reply

Dear EM,

Thanks for your message. I will try it later myself.

❝ [...] With simplest contrast coding you can consider X a matrix of zeros and ones (yes yes, there are other contrasts available, but for simplicity I will only work with 0 and 1 here). For each datapoint (AUC-value etc) there is a row in X, and for each coefficient we want estimated, there is [...]

Can the above descriptions be presented with .csv file or any format that can be ready-to-use? An example and R running outputs can be better. What do you think? :clap:

—
All the best,
-- Yung-jin Lee
bear v2.9.2:- created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here

ElMaestro
★★★

Denmark,
2009-08-28 00:14
(5773 d 10:19 ago)

@ yjlee168
Posting: # 4128
Views: 13,418

Let's skip the fancy nesting syntax!

Post reply

Hi Yjlee,

❝ Can the above descriptions be presented with .csv file or any format that can be ready-to-use? An example and R running outputs can be better. What do you think? :clap:

I am not sure I understand your question.
If you talk about presenting a model matrix with Bear, I think you will have very few users actually wanting that. It is easy enough to get it printed, though, with the syntax above. It would be an OK option to have, I guess.
If this was not what you meant, then please rephrase.

Many thanks,
EM.

yjlee168
★★★

Kaohsiung, Taiwan,
2009-08-28 00:26
(5773 d 10:07 ago)

@ ElMaestro
Posting: # 4129
Views: 13,483

Let's skip the fancy nesting syntax!

Post reply

Dear EM,

❝ [...] Nesting in a model matrix is the same as interactions. This is why we get same results if we use "lnAuc ~ Subj : Seq + Seq + Per + Trt". However,[...]

What if we keep same data matrix as before (e.g., as in bear), but change the model with "lnAUC ~ (1|Subj:Seq) + Seq + Per + Trt"? I haven't tried it yet. I read this from the text: Faraway JJ, Extending the Linear Model with R - Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC, 2006. It seems that we will get normal "Subj: Seq" in anova table in this case.

—
All the best,
-- Yung-jin Lee
bear v2.9.2:- created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here

ElMaestro
★★★

Denmark,
2009-08-28 00:46
(5773 d 09:47 ago)

@ yjlee168
Posting: # 4130
Views: 13,446

Let's skip the fancy nesting syntax!

Post reply

Hi Yjlee,

❝ What if we keep same data matrix as before (e.g., as in bear), but change the model with "lnAUC ~ (1|Subj:Seq) + Seq + Per + Trt"? I haven't tried it yet. I read this from (...) It seems that we will get normal "Subj: Seq" in anova table in this case.

I do not know how R will handle that, sorry. But I don't personally have a preference for it. I clearly would prefer the simple lnAUC~Per+Trt+Seq+Subj in combo with uniquely coded subjects.
If you really insist on the ":" in the model syntax (please tell me why!) and you need to make sure the output for the nesting is never specified as "Seq:Subj" then I guess you could also fiddle with some aspects of R's data garbling. What in practice this implies, I could only tell if I read the source code. That is not likely to happen any time soon, though.

EM.

ElMaestro
★★★

Denmark,
2009-08-28 01:53
(5773 d 08:41 ago)

@ yjlee168
Posting: # 4131
Views: 13,426

Simple solution

Post reply

Hi Yjlee,

❝ It seems that we will get normal "Subj: Seq" in anova table in this case.

You have an additional option. You can do quick and dirty manual tampering like this:

BearFit=lm(lnAuc~Seq+Subj+Per+Trt)    ## the standard and simple model

BearAnova=anova(BearFit)              ## BearAnova now holds the anova table 

row.names(BearAnova)[2] = "Subj(Seq)" ## because it inherits from a data.frame

...Or something along those lines...

Now you can just print BearAnova the standard way and voila it will print nicely with whatever name you decide to give the row specifying the subjects.

EM.

yjlee168
★★★

Kaohsiung, Taiwan,
2009-09-07 03:53
(5763 d 06:40 ago)

(edited on 2009-09-07 13:06)
@ ElMaestro
Posting: # 4154
Views: 13,822

Simple solution

Post reply

Dear ElMaestro,

Sorry about this delayed response (never got the chance to test your codes.) Yes, you're absolutely right about this. I've tested the previous codes that I got from one of R textbooks, and it did not work. As you suggested to simply use lm(lnAUC ~ Seq + Subj + Per + Trt) which was exactly equal to (have the same result) that obtained from lm(lnAUC ~ Seq + Subj:Seq + Per + Trt). Why we do use ":" in R? for nesting purpose, I suppose. Your following codes are quite interesting. It works, too, as the attached output. (I randomly tested with Cmax, not lnCmax.)

❝ BearFit=lm(lnAuc~Seq+Subj+Per+Trt) ## the standard and simple model

❝ BearAnova=anova(BearFit) ## BearAnova now holds the anova table

❝ row.names(BearAnova)[2] = "Subj(Seq)" ## because it inherits from a data.frame

❝ BearFit=lm(lnAuc~Seq+Subj+Per+Trt) ## the standard and simple model

❝ BearAnova=anova(BearFit) ## BearAnova now holds the anova table

❝ row.names(BearAnova)[2] = "Subj(Seq)" ## because it inherits from a

❝ data.frame

Type I SS

Analysis of Variance Table



Response: Cmax

          Df Sum Sq Mean Sq F value Pr(>F)

seq        1   1057    1057  0.0247 0.8778

prd        1  37889   37889  0.8844 0.3656

drug       1  88706   88706  2.0705 0.1757

subj(seq) 12 719310   59943  1.3991 0.2849

Residuals 12 514123   42844               



Type III SS

Single term deletions



Model:

Cmax ~ seq + prd + drug + subj

          Df Sum of Sq     RSS     AIC F value  Pr(F)

<none>                  514123     307               

seq        0 2.328e-10  514123     307 (do we need this?) 

prd        1     37889  552013     307  0.8844 0.3656

drug       1     88706  602830     309  2.0705 0.1757

subj(seq) 12    719310 1233434     307  1.3991 0.2849



Tests of Hypothesis for SUBJECT(SEQUENCE) as an error term



Error: subj

          Df Sum Sq Mean Sq F value Pr(>F)

prd:drug   1   1057    1057  0.0176 0.8966

Residuals 12 719310   59943               



    Error: Within Df Sum Sq Mean Sq F value Pr(>F) prd        1 

    37889   37889  0.8844 0.3656 drug       1  88706   88706 

    2.0705 0.1757 Residuals 12 514123   42844

The original outputs from bear are as follows.

Type I SS

Analysis of Variance Table



Response: Cmax

          Df Sum Sq Mean Sq F value Pr(>F)

seq        1   1057    1057  0.0247 0.8778

prd        1  37889   37889  0.8844 0.3656

drug       1  88706   88706  2.0705 0.1757

subj(seq) 12 719310   59943  1.3991 0.2849

Residuals 12 514123   42844               



Type III SS

Single term deletions



Model:

Cmax ~ seq + prd + drug + subj:seq

          Df Sum of Sq     RSS     AIC F value  Pr(F)

<none>                  514123     307               

prd        1     37889  552013     307  0.8844 0.3656

drug       1     88706  602830     309  2.0705 0.1757

subj(seq) 12    719310 1233434     307  1.3991 0.2849



Tests of Hypothesis for SUBJECT(SEQUENCE) as an error term



Error: subj

          Df Sum Sq Mean Sq F value Pr(>F)

prd:drug   1   1057    1057  0.0176 0.8966

Residuals 12 719310   59943               



Error: Within

          Df Sum Sq Mean Sq F value Pr(>F)

prd        1  37889   37889  0.8844 0.3656

drug       1  88706   88706  2.0705 0.1757

Residuals 12 514123   42844

Many users ask us to provide the validation results obtained from bear with WinNonlin and SAS. In order to conveniently compare with, we may follow your suggested codes as above in the future.

—
All the best,
-- Yung-jin Lee
bear v2.9.2:- created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here

ElMaestro
★★★

Denmark,
2009-09-07 17:47
(5762 d 16:47 ago)

@ yjlee168
Posting: # 4158
Views: 13,412

Type III SS again

Post reply

Dear yjlee,

❝ Why we do use ":" in R? for nesting purpose, I suppose.

Strictly, as explained before, it is fo interactions. They just happen to have the same meaning as nesting to the model matrix.

❝ Type III SS

❝ Single term deletions

❝

❝ Model:

❝ Cmax ~ seq + prd + drug + subj

❝ Df Sum of Sq RSS AIC F value Pr(F)

❝ <none> 514123 307

❝ seq 0 2.328e-10 514123 307 (do we need this?)

❝

❝ prd 1 37889 552013 307 0.8844 0.3656

❝ drug 1 88706 602830 309 2.0705 0.1757

❝ subj(seq) 12 719310 1233434 307 1.3991 0.2849

As noted elsewhere, the type III SS for a given factor are calculated by fitting the full model and then fitting the full model minus the factor in question and noting the difference between the two residual SS.
When we want type III SS for the Sequence factor, R fits the full model and then a model without Sequence as a factor and then compares the difference in residuals. In your case it is a number very close to zero. It always will be when the single-term-deletion strategy is followed.
This is because the model without Sequence as a factor includes Subjects. The subjects are "nested in sequence" (no pun intended!), so your sequence effect becomes effectively zero* (or to say it differently: you cannot include the subjects without also including the sequence, so in the model without the explicit inclusion of Sequence you have Subjects and therefore Sequence is included implicitly). SAS is clever enough to figure this one out by itself and fiddle with the inclusions.

So in summary: Type III SS are generally calculated by single term deletions. However, uncritical use of single term deletions may result in odd stufff or nested data. R does it un critically, SAS does some clever work for you, and therefore the SAS type III output differs from R's drop1 output.

EM.

The low value you see is simply a matter of machine precision; the fitting algorithm stops once the residual SS cannot be improved more than a certain small quantity. It does arguably (if you ask SAS) not reflect the actual Sequence effect. Although I am not a fan of SAS, I totally agree on this specific issue.

yjlee168
★★★

Kaohsiung, Taiwan,
2009-09-07 20:08
(5762 d 14:25 ago)

@ ElMaestro
Posting: # 4159
Views: 13,195

Type III SS again

Post reply

Dear Elmaestro,

❝ [...] It always will be when the single-term-deletion strategy is followed. This is because the model without Sequence as a factor includes Subjects. The subjects are "nested in sequence" (no pun intended!), so your sequence effect becomes effectively zero* (or to say it differently: you cannot [...]

Because each subject just has one and the only one seq in a crossover study, so subj:seq is just the same as subj. However, it is interesting to find out that R treats the model of (lnAUC ~ seq + prd + trt + subj) and the other model of (lnAUC ~ seq + prd + trt + subj:seq) sightly differently through drop1() function to calculate type III SS that we've already had many discussion when we announce bear at this Forum.

❝ [...] odd stuff for nested data. R does it un critically, SAS does some clever work for you, and therefore the SAS type III output differs from R's drop1 output.

As far as I know, for a balanced 2x2x2 crossover study, the SAS-calculated type III SS should be the same as the type I SS. Both should be the same as those calculated by R's drop1(). You said they're different. Could you explain more? Thanks.

❝ [...] the actual Sequence effect. Although I am not a fan of SAS, I totally agree on this specific issue.

Indeed.

—
All the best,
-- Yung-jin Lee
bear v2.9.2:- created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here

ElMaestro
★★★

Denmark,
2009-09-07 21:53
(5762 d 12:40 ago)

@ yjlee168
Posting: # 4160
Views: 13,632

Type III SS again

Post reply

Dear Yjlee,

❝ Because each subject just has one and the only one seq in a crossover study, so subj:seq is just the same as subj.

Ah good. That means you are using uniquely coded subjects. Full point from Belgium!

❝ However, it is interesting to find out that R treats the model of (lnAUC ~ seq + prd + trt + subj) and the other model of (lnAUC ~ seq + prd + trt + subj:seq) sightly differently through drop1() function to calculate type III SS that we've already had many discussion when we announce bear at this Forum.

Well, since you prefer uniquely encoded subject there is no reason to use the ":", so we need not to worry about this :-D

(but the secret lies in the model matrix, I think!)

❝ As far as I know, for a balanced 2x2x2 crossover study, the SAS-calculated type III SS should be the same as the type I SS. Both should be the same as those calculated by R's drop1(). You said they're different. Could you explain more? Thanks.

Actually this was exactly what I tried to explain in the previous post. You are in principle right when you say they should be. However, you can also argue that the output R provides is the correct one. Or that the SAS ouput is correct. Confused? Many of us are. Type III SS can be generally calculated by dropping the terms one by one. But dropping Seq makes no difference - there is no "added value" of Seq when Subject is already in the model. Therefore a round 0.0 (sort of) for the Seq SS from R's perspective, and you can say this is correct.
However, we must confess that Seq itself actually does add variation to our data. And therefore the Seq SS is not zero, and therefore SAS does more than just dropping terms one by one. It figures out that in order to make meaningful SS for Seq in this case it needs to play a little trick with the factor inclusions. And so it does (I can later come back to how this can be done in R, but trust me, my proposed solution will be syntactically ugly!), and therefore we have this odd situation where the SAS typeIII output can differ from R's drop1 output.

Finally as for type I SS vs type III SS: yes, they will be identical when the dataset is balanced for sequences.

EM.