Bioequivalence and Bioavailability Forum

d_labes
★★★

Berlin, Germany,
2011-08-23 15:01
(5424 d 02:28 ago)

Posting: # 7290
Views: 10,079

EMA replicate surprise [Regulatives / Guidelines]

Dear All!

In this posting I had observed a type III sum-of-square peculiarity in the EMA recommended code (Helmut calls it crippled :cool:

) for obtaining the intra-subject variability for the Reference formulation in replicate crossover studies.

Now I had coded (not remembering the code literally)

Proc GLM data=EMAset1;

  class period sequence subject;

  model logData =  period³ sequence¹ subject(sequence)²;

  where code='R';

run; quit;

(numbers are the order in the EMA Q&A code)

The result was

                         EMA set 1 



                       The GLM Procedure



Dependent Variable: logval   log(Cmax)



                                           Sum of

Source                DF         Squares     Mean Square    F Value    Pr > F



Model                 78     120.2314511       1.5414289       7.73    <.0001

Error                 71      14.1512621       0.1993136

Corrected Total      149     134.3827132





                R-Square     Coeff Var      Root MSE    logval Mean



                0.894694      5.815810      0.446445       7.676411





Source                DF       Type I SS     Mean Square    F Value    Pr > F



period                 3       1.5697687       0.5232562       2.63    0.0570

sequence               0       0.0000000        .               .       .

subject(sequence)     75     118.6616824       1.5821558       7.94    <.0001





Source                DF     Type III SS     Mean Square    F Value    Pr > F



period                 2       0.7726727       0.3863363       1.94    0.1515

sequence               0       0.0000000        .               .       .

subject(sequence)     75     118.6616824       1.5821558       7.94    <.0001

Of course I knew that the type I sum-of-squares are order dependent.
But df=0 was a great surprise for me.

—
Regards,

Detlew

ElMaestro ★★★ Denmark, 2011-08-23 15:15 (5424 d 02:14 ago) @ d_labes Posting: # 7291 Views: 8,823	EMA replicate surprise Post reply
	Oh you mighty numbercruncher, ❝ model logData = period³ sequence¹ subject(sequence)²; How does it look for sequence and typeIII SS if you fit the muddle without subject(sequence)? — Pass or fail! ElMaestro

d_labes
★★★

Berlin, Germany,
2011-08-23 15:58
(5424 d 01:31 ago)

@ ElMaestro
Posting: # 7293
Views: 8,801

Big bäng CV

Post reply

Old Pirate,

❝ How does it look for sequence and typeIII SS if you fit the muddle without subject(sequence)?

Very high residual error (CV_Ref=121.8%)

                  The GLM Procedure



Dependent Variable: logval   log(Cmax)



                                 Sum of

Source                DF         Squares     Mean Square    F Value    Pr > F



Model                  3       1.5697687       0.5232562       0.58    0.6322

Error                146     132.8129445       0.9096777

Corrected Total      149     134.3827132





           R-Square     Coeff Var      Root MSE    logval Mean



           0.011681      12.42469      0.953770       7.676411





Source                DF       Type I SS     Mean Square    F Value    Pr > F



period                 3      1.56976869      0.52325623       0.58    0.6322

sequence               0      0.00000000       .                .       .





Source                DF     Type III SS     Mean Square    F Value    Pr > F



period                 2      1.48175697      0.74087848       0.81    0.4449

sequence               0      0.00000000       .                .       .

Seems this does'nt make any sense.

—
Regards,

Detlew

ElMaestro
★★★

Denmark,
2011-08-23 17:02
(5424 d 00:27 ago)

@ d_labes
Posting: # 7295
Views: 8,752

Big bäng CV

Post reply

Dear d_labes,

❝ Seems this does'nt make any sense.

How extremely odd that is. There certainly is a sequence component which isn't zero. Have you found a bug in SAS?

Further, I wonder:

The dataset contains, if I am not mistaking:
4 periods
2 sequences
77 subjects (there is no subject called 61)

I just wonder about another thing; in your original post that you link to above and this in this thread as well SAS says the model has 78 df's.
Apprently (from your linked post)
df periods = 2
df seqs =1
df subjs(seq) =75
why df periods =2 and not 3?

(note: I just tried to construct the original unreduced model matrix: 4 columns for periods, 2 for seqs, and 77 for subjs. Then qr'ed the matrix in R and got a rank of 81 (had expected 78 if SAS is correct); prolly I made a mistake somewhere :-D

)

—
Pass or fail!
ElMaestro

d_labes
★★★

Berlin, Germany,
2011-08-23 17:28
(5424 d 00:01 ago)

@ ElMaestro
Posting: # 7296
Views: 8,778

Period DFs

Post reply

Dear ElMaestro,

❝ I just wonder about another thing; in your original post that you link to above and this in this thread as well SAS says the model has 78 df's.

❝ Apprently (from your linked post)

❝ df periods = 2

❝ df seqs =1

❝ df subjs(seq) =75

❝ why df periods =2 and not 3?

Good question. Next question :-D

.

Seems there is some inter-dependence between periods and sequences.
Subjects with sequence TRTR have only periods 2 and 4, those with sequence RTRT have periods 1 and 3 (Note: we are dealing only with the data of the Reference).
Maybe this is the source of this odd degree of freedom?

BTW: If you have the data in R could you lm() them to see what's the result?

—
Regards,

Detlew

Helmut
★★★

Vienna, Austria,
2011-08-23 17:36
(5423 d 23:52 ago)

@ d_labes
Posting: # 7297
Views: 8,814

Period DFs

Post reply

Dear both of you!

Funny stuff! I’m a little bit too busy to be of help by now.

❝ BTW: If you have the data in R could you lm() them to see what's the result?

@ElMaestro: If you don’t want to extract the data from EMA’s PDF, here is an Excel-sheet for download.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro
★★★

Denmark,
2011-08-23 18:26
(5423 d 23:02 ago)

(edited on 2011-08-23 20:54)
@ d_labes
Posting: # 7298
Views: 8,797

Period DFs

Post reply

Dear d_labes,

❝ Seems there is some inter-dependence between periods and sequences.

❝ Subjects with sequence TRTR have only periods 2 and 4, those with sequence RTRT have periods 1 and 3 (Note: we are dealing only with the data of the Reference).

❝ Maybe this is the source of this odd degree of freedom?

Important edit:
I apologise - overlooked that we are only dealing with the ref.
I will let my initial response remain below.

I am not sure this is the explanation, here's my logic:

If we for any observation for any subject within the i'th sequence have info abut which three periods the obserbation does not come from thenwe can infer which period it comes from. Thus df(per)=4-1=3.
Same for sequence, df(seq)=2-1.
For the subject; if we know which of the subjects it isn't then we can infer which subject (in seqs) it comes from (that's minus 1) and we can even infer which if it is the subject from seq 1 or seq 2 (minus 1 again). df(subj)=77-2=75.
75+3+1=79 (according to Excel, this is not on an early-age pentium machine :-D

❝ BTW: If you have the data in R could you lm() them to see what's the result?

Will do, sir!
Please allow me a little time for that.

—
Pass or fail!
ElMaestro

ElMaestro
★★★

Denmark,
2011-08-23 18:58
(5423 d 22:31 ago)

@ d_labes
Posting: # 7299
Views: 8,763

Period DFs

Post reply

This post is extensively edited as I did initially not catch that we are only dealing with observations of the Reference formulation.

Dear d_labes and HS,

Took the file provided by HS and and rn it through R as follows:




a=read.table("ema1.csv", header=T)

b=a[grep("R", a$Formulation, ignore.case=T),]


Muddle=lm(log(b$Data)~factor(b$Period)+factor(b$Sequence)+factor(b$Subject))

anova(Muddle)

Analysis of Variance Table



Response: log(b$Data)

                  Df  Sum Sq Mean Sq F value    Pr(>F)    

factor(b$Period)   3   1.570 0.52326  2.6253   0.05703 .  

factor(b$Subject) 75 118.662 1.58216  7.9380 2.454e-16 ***

Residuals         71  14.151 0.19931                      

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

drop1(Muddle, test="F")

Single term deletions



Model:

log(b$Data) ~ factor(b$Period) + factor(b$Sequence) + factor(b$Subject)

                   Df Sum of Sq     RSS      AIC F value     Pr(F)    

<none>                           14.151 -196.125                      

factor(b$Period)    2     0.773  14.924 -192.150  1.9383    0.1515    

factor(b$Sequence)  0     0.000  14.151 -196.125                      

factor(b$Subject)  75   118.662 132.813  -10.254  7.9380 2.454e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

What it really, really strange is this:

M=model.matrix(Muddle)

qr(M)$rank

[1] 79

Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.

—
Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2011-08-24 02:44
(5423 d 14:45 ago)

@ ElMaestro
Posting: # 7300
Views: 8,765

Period DFs

Post reply

Dear comrades-in-arms!

❝ Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.

Yep. Let me throw in Phoenix6.2 in the spirit of EMA’s Q&A p25:

‘The analysis presented above show that this approach (Method A) is feasible even for unbalanced replicate design studies. The advantage of this approach is that it is straightforward and that it appears to be software and software option independent.’

(my emphases)

Sequential Tests

Hypothesis        Numer_DF  Denom_DF       F_stat     P_value

Period                   3        71       2.6252933  0.05702846

Sequence                 0  Not estimable

Sequence*Subject         75       71       7.9380219  0

Partial Tests

Hypothesis        Numer_DF  Denom_DF       F_stat     P_value

Period                   3        71       1.3322478  0.27074552

Sequence                 0  Not estimable

Sequence*Subject         75       71       7.9380219  0

‘Sequential Tests’ in Phoenix/WinNonlin = SAS I
Quote from the manual about ‘Partial Tests’:

The partial tests in LinMix are not equivalent to the Type III method in SAS though they coincide in most situations.
The Partial Tests worksheet is created by testing each model term given every other model term. Unlike sequential tests, partial tests are invariant under the order in which model terms are listed in the Fixed Effects tab. Partial tests factor out of each model term the contribution attributable to the remaining model terms.
This is computed by modifying the basis created by the QR factorization to yield a basis that more closely resembles that found in balanced data.
For fixed effects models, certain properties can be stated for the two types of ANOVA. For the sequential ANOVA, the sums of squares are statistically independent. Also, the sum of the individual sums of squares is equal to the model sum of squares; which means the ANOVA represents a partitioning of the model sum of squares. However, some terms in ANOVA may be contaminated with undesired effects. The partial ANOVA is designed to eliminate the contamination problem, but the sums of squares are correlated and do not add up to the model sums of squares. The mixed effects tests have similar properties.

Aha!

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

d_labes
★★★

Berlin, Germany,
2011-08-24 13:24
(5423 d 04:05 ago)

@ Helmut
Posting: # 7302
Views: 8,805

Lessons learned?

Post reply

Dear brothers-in-arms!

What does this all tell us? What lessons learned? :confused:

First I hope that the statisticians at EMA responsible for that paragraph of the Q&A read this thread. It must be a splashy slap in their face considering Helmut's quotation of "the spirit of EMA’s Q&A".

Second I come more and more to the conclusion that a decomposition of the subject effect into the terms sequence and subject nested within sequence is not really possible with those crippled datasets (reasons not really understood, but I'm only a bloody amateur and have no such deep understanding of concepts like rank of the model.matrix). The robust model without any peculiarities should therefore include period and subject effects only.
Do I err here?

Third the intra-individual variability (represented by the residual error) is fortunately not affected by all that muddle. And this is the only interesting number in this analysis. The rest is really "scheissegal" (best vulgar translation found: don't give a flying fuck :-D

).
Also I personally have a deep horror to see df=0 or 'not estimable' in a model.

—
Regards,

Detlew

Helmut
★★★

Vienna, Austria,
2011-08-24 15:51
(5423 d 01:38 ago)

@ d_labes
Posting: # 7304
Views: 8,752

Lessons learned?

Post reply

Dear control freaks!

❝ Second I come more and more to the conclusion that a decomposition of the subject effect into the terms sequence and subject nested within sequence is not really possible with those crippled datasets […]. The robust model without any peculiarities should therefore include period and subject effects only.

❝ Do I err here?

If you want to get a second amateur’s opinion: exactly. That’s why I called the datasets (after removal of T!) ‘crippled’. All information about sequences (RTRT _^ TRTR ¬ RR!) is lost. Martin (not an amateur) once told me that you can do only statistics if you understand the data generating process. In all replicate datasets I have seen until now, EMA’s method gives a smaller CV_WR than the full model (= FDA’s or EMA’s ‘Method C’). Politics = narrower AR? If we throw away sequences, it doesn’t make any sense to include them in the model (not to speak about nested subjects either).

BTW, more than 20 years after Freeman’s paper¹, Senn’s book², and the empiric study by D’Angelo et al.³ EMA has realised (see IR GL) that sequence effects are of no importance in a properly designed cross-over study. But: Would anybody evaluate a 2×2 cross-over by Y=treatment+period+subject only - though the CI stays the same?

Agree with your third point.

Quotes on a sad occasion:

Ein Klavier, ein Klavier! Mutter, wir danken dir!
In der Politik gibt niemals der Klügere nach,
sondern immer der Schwächere…

Freeman PR. The Performance of the Two-Stage Analysis of Two-Treatment, Two-Period Crossover Trials. Stat Med. 1989;8:1421–32.
S Senn S. Cross-over Trials in Clinical Research. Chichester: John Wiley & Sons; 2^nd ed. 2002.
D’angelo G, Potvin D, Turgeon J. Carry-Over Effects in Bioequivalence Studies. J Biopharm Stat. 2001;11(1&2):35–43.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro
★★★

Denmark,
2011-08-28 23:34
(5418 d 17:54 ago)

@ d_labes
Posting: # 7307
Views: 8,504

Lessons learned?

Post reply

Hi all,

quick note:
There is a sequence effect to be fit, no doubt about that.

The fact that we boil TRTR and RTRT down to RR is not the direct culprit here as treatment is not considered. But as was said by Helmut there is an interaction or nesting of period in sequence. So just as is the case with subejcts, whenever period is in the Muddle, any type III anova goes bonkers.
Compare the two sequential ones:

Muddle=lm(log(b$Data)~factor(b$Period) + factor(b$Sequence))

anova(Muddle)

Analysis of Variance Table



Response: log(b$Data)

                  Df Sum Sq Mean Sq F value Pr(>F)

factor(b$Period)   3   1.57 0.52326  0.5752 0.6322

Residuals        146 132.81 0.90968



Muddle=lm(log(b$Data)~factor(b$Sequence) + factor(b$Period))

anova(Muddle)

Analysis of Variance Table



Response: log(b$Data)

                    Df  Sum Sq Mean Sq F value Pr(>F)

factor(b$Sequence)   1   0.088 0.08801  0.0968 0.7562

factor(b$Period)     2   1.482 0.74088  0.8144 0.4449

Residuals          146 132.813 0.90968

Remember earlier discussions about three-treatment designs where EMA occasionally does not like that the residual is polluted by an irrelevant treatment A when comparison of treatment B and C is done. In those cases the dataset is pruned down to a "pseudo" two-treatment, two-sequence, two-period dataset and analysed the usual way. We could do the same to periods here and get rid of the ugly 'issue', I think. I don't think this will change the residual SS (which is what is important), however, but I haven't done it or thought it fully through.

—
Pass or fail!
ElMaestro

d_labes
★★★

Berlin, Germany,
2011-08-24 12:47
(5423 d 04:42 ago)

@ ElMaestro
Posting: # 7301
Views: 8,770

Period DFs - EMA code literally in R

Post reply

Dear ElMaestro!

❝ Muddle=lm(log(b$Data)~factor(b$Period)+factor(b$Sequence)+factor(b$Subject))

❝ anova(Muddle)

❝ Analysis of Variance Table

❝

❝ Response: log(b$Data)


❝                   Df  Sum Sq Mean Sq F value    Pr(>F)    
❝ factor(b$Period)   3   1.570 0.52326  2.6253   0.05703 .
❝ factor(b$Subject) 75 118.662 1.58216  7.9380 2.454e-16 ***

❝ Residuals 71 14.151 0.19931

factor(b$sequence) vanished in nirvana :confused:

Clever or bug?

In terms of the EMA code literally taken but treated in R you can observe a similar "cleverle" but now with type III
(I have renamed Data to Cmax and done a factor transformation before calling lm()):

muddle2 <- lm(log(Cmax)~ sequence + subject%in%sequence + period, data=refdata)

anova(muddle2)

Analysis of Variance Table



Response: log(Cmax)

                 Df  Sum Sq Mean Sq F value    Pr(>F)    

sequence          1   0.088 0.08801  0.4416   0.50852    

period            2   1.482 0.74088  3.7172   0.02915 *  

sequence:subject 75 118.662 1.58216  7.9380 2.454e-16 ***

Residuals        71  14.151 0.19931                      

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

drop1(muddle2)

Single term deletions



Model:

log(Cmax) ~ sequence + subject %in% sequence + period

                 Df Sum of Sq     RSS      AIC

<none>                         14.151 -196.125

period            2     0.773  14.924 -192.150

sequence:subject 75   118.662 132.813  -10.254

Note the period DF's. Also note the reversed order of effects.

❝ Nützt nüchts. Ich werde's nie verstehen. Ist auch scheissegal.

Here we are 2!

—
Regards,

Detlew

ElMaestro
★★★

Denmark,
2011-08-24 15:04
(5423 d 02:25 ago)

@ d_labes
Posting: # 7303
Views: 8,658

Period DFs - EMA code literally in R

Post reply

Dear d_labes

❝ factor(b$sequence) vanished in nirvana :confused:

❝ Clever or bug?

I have no idea. I am one big question mark.

❝ muddle2 <- lm(log(Cmax)~ sequence + subject%in%sequence + period, data=refdata)

My eyes, my eyes, it hurts it hurts! Subject%in%sequence when all subjects have unique identifiers... Initial muddle matrix will contain 77 columns of pure zeros. Makes no practical difference for the results, though.

❝ drop1(muddle2)

drop1 does not work like in SAS when it comes to sequence. When R drops Sequence to get the deltaSS for this factor it does not automatically recognise that Sequence is already in the model because it is contained in Subject. SAS does that (or at least: should do it, but might obviously be failing here).

Your example with first the sequential muddle with period last and then a drop1 is truly brilliant. Here I would expect exactly the same SS for period between the two anovas. It just isn't the case.

Where's hrotter and jdetlor? There must be an explanation.

—
Pass or fail!
ElMaestro

Helmut ★★★ Vienna, Austria, 2011-08-24 16:01 (5423 d 01:28 ago) @ ElMaestro Posting: # 7305 Views: 8,733	call for expert’s opinions Post reply
	Comrade! ❝ Where's hrotter and jdetlor? Wouldn’t be too optimistic about Hermann Rotter (last login 2010-04-08), maybe J. Detlor (last login 2011-07-29) is willing to step into the muddle. — Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes

jdetlor
☆

2011-11-01 04:05
(5354 d 12:24 ago)

@ Helmut
Posting: # 7568
Views: 7,743

call for expert’s opinions

Post reply

❝ Comrade!

❝

❝ ❝ Where's hrotter and jdetlor?

Dear All!

It's been a while since I've logged on because I have left the pharmaceutical/bioequivalence consulting industry. I have gone back to finance. I'm now using various statistical methods to build decision trees to evaluate customer risk.

Anyways, if I read the problem correctly, I believe the issue lies within the modification to your design matrix (X-matrix). As a full replicate design, the periods and sequences are independent with respect to treatment. When you delete a specific treatment from the data, the columns for sequence and period are not linearly independent anymore, and thus when PROC GLM tries to assign degrees of freedom based on independent columns, it fails to assign some to the sequence effect.

Hope this helps with the issue, and I will be updating my profile with a working email address.

Thanks

J. Detlor

Edit:
Forgot to mention — in SAS you can see this problem if you use PROC GENMOD with the modified data set (same model statement though) and request the design matrix through the ODS output.

ElMaestro ★★★ Denmark, 2011-11-02 10:48 (5353 d 05:41 ago) @ jdetlor Posting: # 7580 Views: 17,297	The finance sector Post reply
	Hi jdetlor, ❝ It's been a while since I've logged on because I have left the pharmaceutical/bioequivalence consulting industry. I have gone back to finance. I am sure they need your competent help in the finance sector. Best regards, EM.

EMA replicate surprise [Regulatives / Guidelines]

EMA replicate surprise

Big bäng CV

Big bäng CV

Period DFs

Period DFs

Period DFs

Period DFs

Period DFs

Lessons learned?

Lessons learned?

Lessons learned?

Period DFs - EMA code literally in R

Period DFs - EMA code literally in R

call for expert’s opinions

call for expert’s opinions

The finance sector