d_labes
★★★

Berlin, Germany,
2011-08-23 15:01
(5424 d 02:28 ago)

Posting: # 7290
Views: 10,079
 

 EMA replicate surprise [Regulatives / Guidelines]

Dear All!

In this posting I had observed a type III sum-of-square peculiarity in the EMA recommended code (Helmut calls it crippled :cool:) for obtaining the intra-subject variability for the Reference formulation in replicate crossover studies.

Now I had coded (not remembering the code literally)
Proc GLM data=EMAset1;
  class period sequence subject;
  model logData =  period3 sequence1 subject(sequence)2;
  where code='R';
run; quit;
(numbers are the order in the EMA Q&A code)

The result was
                         EMA set 1

                       The GLM Procedure

Dependent Variable: logval   log(Cmax)

                                           Sum of
Source                DF         Squares     Mean Square    F Value    Pr > F

Model                 78     120.2314511       1.5414289       7.73    <.0001
Error                 71      14.1512621       0.1993136
Corrected Total      149     134.3827132


                R-Square     Coeff Var      Root MSE    logval Mean

                0.894694      5.815810      0.446445       7.676411


Source                DF       Type I SS     Mean Square    F Value    Pr > F

period                 3       1.5697687       0.5232562       2.63    0.0570
sequence               0       0.0000000        .               .       .
subject(sequence)     75     118.6616824       1.5821558       7.94    <.0001


Source                DF     Type III SS     Mean Square    F Value    Pr > F

period                 2       0.7726727       0.3863363       1.94    0.1515
sequence               0       0.0000000        .               .       .
subject(sequence)     75     118.6616824       1.5821558       7.94    <.0001


:confused:
Of course I knew that the type I sum-of-squares are order dependent.
But df=0 was a great surprise for me.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2011-08-23 15:15
(5424 d 02:14 ago)

@ d_labes
Posting: # 7291
Views: 8,823
 

 EMA replicate surprise

Oh you mighty numbercruncher,

❝ model logData = period3 sequence1 subject(sequence)2;


How does it look for sequence and typeIII SS if you fit the muddle without subject(sequence)?

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2011-08-23 15:58
(5424 d 01:31 ago)

@ ElMaestro
Posting: # 7293
Views: 8,801
 

 Big bäng CV

Old Pirate,

❝ How does it look for sequence and typeIII SS if you fit the muddle without subject(sequence)?


Very high residual error (CVRef=121.8%)

                  The GLM Procedure

Dependent Variable: logval   log(Cmax)

                                 Sum of
Source                DF         Squares     Mean Square    F Value    Pr > F

Model                  3       1.5697687       0.5232562       0.58    0.6322
Error                146     132.8129445       0.9096777
Corrected Total      149     134.3827132


           R-Square     Coeff Var      Root MSE    logval Mean

           0.011681      12.42469      0.953770       7.676411


Source                DF       Type I SS     Mean Square    F Value    Pr > F

period                 3      1.56976869      0.52325623       0.58    0.6322
sequence               0      0.00000000       .                .       .


Source                DF     Type III SS     Mean Square    F Value    Pr > F

period                 2      1.48175697      0.74087848       0.81    0.4449
sequence               0      0.00000000       .                .       .

Seems this does'nt make any sense.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2011-08-23 17:02
(5424 d 00:27 ago)

@ d_labes
Posting: # 7295
Views: 8,752
 

 Big bäng CV

Dear d_labes,

❝ Seems this does'nt make any sense.


How extremely odd that is. There certainly is a sequence component which isn't zero. Have you found a bug in SAS?

Further, I wonder:

The dataset contains, if I am not mistaking:
4 periods
2 sequences
77 subjects (there is no subject called 61)

I just wonder about another thing; in your original post that you link to above and this in this thread as well SAS says the model has 78 df's.
Apprently (from your linked post)
df periods = 2
df seqs =1
df subjs(seq) =75
why df periods =2 and not 3?

(note: I just tried to construct the original unreduced model matrix: 4 columns for periods, 2 for seqs, and 77 for subjs. Then qr'ed the matrix in R and got a rank of 81 (had expected 78 if SAS is correct); prolly I made a mistake somewhere :-D)

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2011-08-23 17:28
(5424 d 00:01 ago)

@ ElMaestro
Posting: # 7296
Views: 8,778
 

 Period DFs

Dear ElMaestro,

❝ I just wonder about another thing; in your original post that you link to above and this in this thread as well SAS says the model has 78 df's.

❝ Apprently (from your linked post)

❝ df periods = 2

❝ df seqs =1

❝ df subjs(seq) =75

❝ why df periods =2 and not 3?


Good question. Next question :-D.

Seems there is some inter-dependence between periods and sequences.
Subjects with sequence TRTR have only periods 2 and 4, those with sequence RTRT have periods 1 and 3 (Note: we are dealing only with the data of the Reference).
Maybe this is the source of this odd degree of freedom?

BTW: If you have the data in R could you lm() them to see what's the result?

Regards,

Detlew
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2011-08-23 17:36
(5423 d 23:52 ago)

@ d_labes
Posting: # 7297
Views: 8,814
 

 Period DFs

Dear both of you!

Funny stuff! I’m a little bit too busy to be of help by now.

❝ BTW: If you have the data in R could you lm() them to see what's the result?


@ElMaestro: If you don’t want to extract the data from EMA’s PDF, here is an Excel-sheet for download.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2011-08-23 18:26
(5423 d 23:02 ago)

(edited on 2011-08-23 20:54)
@ d_labes
Posting: # 7298
Views: 8,797
 

 Period DFs

Dear d_labes,

❝ Seems there is some inter-dependence between periods and sequences.

❝ Subjects with sequence TRTR have only periods 2 and 4, those with sequence RTRT have periods 1 and 3 (Note: we are dealing only with the data of the Reference).

❝ Maybe this is the source of this odd degree of freedom?


Important edit:
I apologise - overlooked that we are only dealing with the ref.

I will let my initial response remain below.


I am not sure this is the explanation, here's my logic:

If we for any observation for any subject within the i'th sequence have info abut which three periods the obserbation does not come from thenwe can infer which period it comes from. Thus df(per)=4-1=3.
Same for sequence, df(seq)=2-1.
For the subject; if we know which of the subjects it isn't then we can infer which subject (in seqs) it comes from (that's minus 1) and we can even infer which if it is the subject from seq 1 or seq 2 (minus 1 again). df(subj)=77-2=75.
75+3+1=79 (according to Excel, this is not on an early-age pentium machine :-D).

❝ BTW: If you have the data in R could you lm() them to see what's the result?


Will do, sir!
Please allow me a little time for that.

Pass or fail!
ElMaestro
ElMaestro
★★★

Denmark,
2011-08-23 18:58
(5423 d 22:31 ago)

@ d_labes
Posting: # 7299
Views: 8,763
 

 Period DFs

This post is extensively edited as I did initially not catch that we are only dealing with observations of the Reference formulation.


Dear d_labes and HS,

Took the file provided by HS and and rn it through R as follows:

a=read.table("ema1.csv", header=T)
b=a[grep("R", a$Formulation, ignore.case=T),]
Muddle=lm(log(b$Data)~factor(b$Period)+factor(b$Sequence)+factor(b$Subject))
anova(Muddle)
Analysis of Variance Table

Response: log(b$Data)
                  Df  Sum Sq Mean Sq F value    Pr(>F)   
factor(b$Period)   3   1.570 0.52326  2.6253   0.05703 . 
factor(b$Subject) 75 118.662 1.58216  7.9380 2.454e-16 ***
Residuals         71  14.151 0.19931                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
drop1(Muddle, test="F")
Single term deletions

Model:
log(b$Data) ~ factor(b$Period) + factor(b$Sequence) + factor(b$Subject)
                   Df Sum of Sq     RSS      AIC F value     Pr(F)   
<none>                           14.151 -196.125                     
factor(b$Period)    2     0.773  14.924 -192.150  1.9383    0.1515   
factor(b$Sequence)  0     0.000  14.151 -196.125                     
factor(b$Subject)  75   118.662 132.813  -10.254  7.9380 2.454e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



What it really, really strange is this:

M=model.matrix(Muddle)
qr(M)$rank
[1] 79


Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2011-08-24 02:44
(5423 d 14:45 ago)

@ ElMaestro
Posting: # 7300
Views: 8,765
 

 Period DFs

Dear comrades-in-arms!

❝ Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.


Yep. Let me throw in Phoenix6.2 in the spirit of EMA’s Q&A p25:

‘The analysis presented above show that this approach (Method A) is feasible even for unbalanced replicate design studies. The advantage of this approach is that it is straightforward and that it appears to be software and software option independent.’

(my emphases)

Sequential Tests
Hypothesis        Numer_DF  Denom_DF       F_stat     P_value
Period                   3        71       2.6252933  0.05702846
Sequence                 0  Not estimable
Sequence*Subject         75       71       7.9380219  0


Partial Tests
Hypothesis        Numer_DF  Denom_DF       F_stat     P_value
Period                   3        71       1.3322478  0.27074552
Sequence                 0  Not estimable
Sequence*Subject         75       71       7.9380219  0


‘Sequential Tests’ in Phoenix/WinNonlin = SAS I
Quote from the manual about ‘Partial Tests’:

The partial tests in LinMix are not equivalent to the Type III method in SAS though they coincide in most situations.
The Partial Tests worksheet is created by testing each model term given every other model term. Unlike sequential tests, partial tests are invariant under the order in which model terms are listed in the Fixed Effects tab. Partial tests factor out of each model term the contribution attributable to the remaining model terms.
This is computed by modifying the basis created by the QR factorization to yield a basis that more closely resembles that found in balanced data.
For fixed effects models, certain properties can be stated for the two types of ANOVA. For the sequential ANOVA, the sums of squares are statistically independent. Also, the sum of the individual sums of squares is equal to the model sum of squares; which means the ANOVA represents a partitioning of the model sum of squares. However, some terms in ANOVA may be contaminated with undesired effects. The partial ANOVA is designed to eliminate the contamination problem, but the sums of squares are correlated and do not add up to the model sums of squares. The mixed effects tests have similar properties.


Aha!

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2011-08-24 13:24
(5423 d 04:05 ago)

@ Helmut
Posting: # 7302
Views: 8,805
 

 Lessons learned?

Dear brothers-in-arms!

What does this all tell us? What lessons learned? :confused:

First I hope that the statisticians at EMA responsible for that paragraph of the Q&A read this thread. It must be a splashy slap in their face considering Helmut's quotation of "the spirit of EMA’s Q&A".

Second I come more and more to the conclusion that a decomposition of the subject effect into the terms sequence and subject nested within sequence is not really possible with those crippled datasets (reasons not really understood, but I'm only a bloody amateur and have no such deep understanding of concepts like rank of the model.matrix). The robust model without any peculiarities should therefore include period and subject effects only.
Do I err here?

Third the intra-individual variability (represented by the residual error) is fortunately not affected by all that muddle. And this is the only interesting number in this analysis. The rest is really "scheissegal" (best vulgar translation found: don't give a flying fuck :-D).
Also I personally have a deep horror to see df=0 or 'not estimable' in a model.

Regards,

Detlew
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2011-08-24 15:51
(5423 d 01:38 ago)

@ d_labes
Posting: # 7304
Views: 8,752
 

 Lessons learned?

Dear control freaks!

❝ Second I come more and more to the conclusion that a decomposition of the subject effect into the terms sequence and subject nested within sequence is not really possible with those crippled datasets […]. The robust model without any peculiarities should therefore include period and subject effects only.

❝ Do I err here?


If you want to get a second amateur’s opinion: exactly. That’s why I called the datasets (after removal of T!) ‘crippled’. All information about sequences (RTRT ^ TRTR ¬ RR!) is lost. Martin (not an amateur) once told me that you can do only statistics if you understand the data generating process. In all replicate datasets I have seen until now, EMA’s method gives a smaller CVWR than the full model (= FDA’s or EMA’s ‘Method C’). Politics = narrower AR? If we throw away sequences, it doesn’t make any sense to include them in the model (not to speak about nested subjects either).

BTW, more than 20 years after Freeman’s paper1, Senn’s book2, and the empiric study by D’Angelo et al.3 EMA has realised (see IR GL) that sequence effects are of no importance in a properly designed cross-over study. But: Would anybody evaluate a 2×2 cross-over by Y=treatment+period+subject only - though the CI stays the same?

Agree with your third point.


[image]Quotes on a sad occasion:
  • Ein Klavier, ein Klavier! Mutter, wir danken dir!
  • In der Politik gibt niemals der Klügere nach,
    sondern immer der Schwächere…

  1. Freeman PR. The Performance of the Two-Stage Analysis of Two-Treatment, Two-Period Crossover Trials. Stat Med. 1989;8:1421–32.
  2. S Senn S. Cross-over Trials in Clinical Research. Chichester: John Wiley & Sons; 2nd ed. 2002.
  3. D’angelo G, Potvin D, Turgeon J. Carry-Over Effects in Bioequivalence Studies. J Biopharm Stat. 2001;11(1&2):35–43.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2011-08-28 23:34
(5418 d 17:54 ago)

@ d_labes
Posting: # 7307
Views: 8,504
 

 Lessons learned?

Hi all,

quick note:
There is a sequence effect to be fit, no doubt about that.

The fact that we boil TRTR and RTRT down to RR is not the direct culprit here as treatment is not considered. But as was said by Helmut there is an interaction or nesting of period in sequence. So just as is the case with subejcts, whenever period is in the Muddle, any type III anova goes bonkers.
Compare the two sequential ones:

Muddle=lm(log(b$Data)~factor(b$Period) + factor(b$Sequence))
anova(Muddle)
Analysis of Variance Table

Response: log(b$Data)
                  Df Sum Sq Mean Sq F value Pr(>F)
factor(b$Period)   3   1.57 0.52326  0.5752 0.6322
Residuals        146 132.81 0.90968

Muddle=lm(log(b$Data)~factor(b$Sequence) + factor(b$Period))
anova(Muddle)
Analysis of Variance Table

Response: log(b$Data)
                    Df  Sum Sq Mean Sq F value Pr(>F)
factor(b$Sequence)   1   0.088 0.08801  0.0968 0.7562
factor(b$Period)     2   1.482 0.74088  0.8144 0.4449
Residuals          146 132.813 0.90968 



Remember earlier discussions about three-treatment designs where EMA occasionally does not like that the residual is polluted by an irrelevant treatment A when comparison of treatment B and C is done. In those cases the dataset is pruned down to a "pseudo" two-treatment, two-sequence, two-period dataset and analysed the usual way. We could do the same to periods here and get rid of the ugly 'issue', I think. I don't think this will change the residual SS (which is what is important), however, but I haven't done it or thought it fully through.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2011-08-24 12:47
(5423 d 04:42 ago)

@ ElMaestro
Posting: # 7301
Views: 8,770
 

 Period DFs - EMA code literally in R

Dear ElMaestro!

❝ Muddle=lm(log(b$Data)~factor(b$Period)+factor(b$Sequence)+factor(b$Subject))

❝ anova(Muddle)

❝ Analysis of Variance Table


Response: log(b$Data)

❝                   Df  Sum Sq Mean Sq F value    Pr(>F)   

❝ factor(b$Period)   3   1.570 0.52326  2.6253   0.05703 .

❝ factor(b$Subject) 75 118.662 1.58216  7.9380 2.454e-16 ***

❝ Residuals         71  14.151 0.19931

factor(b$sequence) vanished in nirvana :confused:
Clever or bug?

In terms of the EMA code literally taken but treated in R you can observe a similar "cleverle" but now with type III
(I have renamed Data to Cmax and done a factor transformation before calling lm()):

muddle2 <- lm(log(Cmax)~ sequence + subject%in%sequence + period, data=refdata)
anova(muddle2)
Analysis of Variance Table

Response: log(Cmax)
                 Df  Sum Sq Mean Sq F value    Pr(>F)   
sequence          1   0.088 0.08801  0.4416   0.50852   
period            2   1.482 0.74088  3.7172   0.02915 * 
sequence:subject 75 118.662 1.58216  7.9380 2.454e-16 ***
Residuals        71  14.151 0.19931                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(muddle2)
Single term deletions

Model:
log(Cmax) ~ sequence + subject %in% sequence + period
                 Df Sum of Sq     RSS      AIC
<none>                         14.151 -196.125
period            2     0.773  14.924 -192.150
sequence:subject 75   118.662 132.813  -10.254

Note the period DF's. Also note the reversed order of effects.

❝ Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.

Here we are 2!

Regards,

Detlew
ElMaestro
★★★

Denmark,
2011-08-24 15:04
(5423 d 02:25 ago)

@ d_labes
Posting: # 7303
Views: 8,658
 

 Period DFs - EMA code literally in R

Dear d_labes

factor(b$sequence) vanished in nirvana :confused:

❝ Clever or bug?

I have no idea. I am one big question mark.

muddle2 <- lm(log(Cmax)~ sequence + subject%in%sequence + period, data=refdata)

My eyes, my eyes, it hurts it hurts! Subject%in%sequence when all subjects have unique identifiers... Initial muddle matrix will contain 77 columns of pure zeros. Makes no practical difference for the results, though.

drop1(muddle2)


drop1 does not work like in SAS when it comes to sequence. When R drops Sequence to get the deltaSS for this factor it does not automatically recognise that Sequence is already in the model because it is contained in Subject. SAS does that (or at least: should do it, but might obviously be failing here).


Your example with first the sequential muddle with period last and then a drop1 is truly brilliant. Here I would expect exactly the same SS for period between the two anovas. It just isn't the case.

Where's hrotter and jdetlor? There must be an explanation.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2011-08-24 16:01
(5423 d 01:28 ago)

@ ElMaestro
Posting: # 7305
Views: 8,733
 

 call for expert’s opinions

Comrade!

❝ Where's hrotter and jdetlor?


Wouldn’t be too optimistic about Hermann Rotter (last login 2010-04-08), maybe J. Detlor (last login 2011-07-29) is willing to step into the muddle.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
jdetlor
☆    

2011-11-01 04:05
(5354 d 12:24 ago)

@ Helmut
Posting: # 7568
Views: 7,743
 

 call for expert’s opinions

❝ Comrade!


❝ ❝ Where's hrotter and jdetlor?


Dear All!

It's been a while since I've logged on because I have left the pharmaceutical/bioequivalence consulting industry. I have gone back to finance. I'm now using various statistical methods to build decision trees to evaluate customer risk.

Anyways, if I read the problem correctly, I believe the issue lies within the modification to your design matrix (X-matrix). As a full replicate design, the periods and sequences are independent with respect to treatment. When you delete a specific treatment from the data, the columns for sequence and period are not linearly independent anymore, and thus when PROC GLM tries to assign degrees of freedom based on independent columns, it fails to assign some to the sequence effect.

Hope this helps with the issue, and I will be updating my profile with a working email address.

Thanks

J. Detlor

Edit:
Forgot to mention — in SAS you can see this problem if you use PROC GENMOD with the modified data set (same model statement though) and request the design matrix through the ODS output.
ElMaestro
★★★

Denmark,
2011-11-02 10:48
(5353 d 05:41 ago)

@ jdetlor
Posting: # 7580
Views: 17,297
 

 The finance sector

Hi jdetlor,

❝ It's been a while since I've logged on because I have left the pharmaceutical/bioequivalence consulting industry. I have gone back to finance.


I am sure they need your competent help in the finance sector.

Best regards,
EM.
UA Flag
Activity
 Admin contact
23,656 posts in 4,994 threads, 1,570 registered users;
320 visitors (0 registered, 320 guests [including 26 identified bots]).
Forum time: 17:29 CEST (Europe/Vienna)

Most scientists today are devoid of ideas, full of fear, intent on
producing some paltry result so that they can add to the flood
of inane papers that now constitutes “scientific progress”
in many areas.    Paul Feyerabend

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5