No ANOVA by lme() – lengthy reply [🇷 for BE/BA]

posted by Helmut Homepage – Vienna, Austria, 2021-12-31 00:45 (837 d 18:20 ago) – Posting: # 22723
Views: 2,705

Hi Weidson and others who want to give it a try…

Download and save to a suitable folder. Depending on your operating system, browser, and localization possibly you will be asked to open the file in Excel or another spreadsheet. Don’t. Just save it. Then:

# Check whether the package is installed in the minimum required version.
# If yes, attach it. If not, stop and ask the user for download/installation.

pack <- "replicateBE"                         # Required package
inst <- installed.packages()[pack, "Version"] # Is it installed?
if (length(inst) == 0) {                      # No
  stop ("Please download/install the package \'replicateBE\' from CRAN.")
} else {
  suppressMessages(require(replicateBE))      # Attach it.
  if (inst < "1.1.0")                         # Check version.
    stop ("At least version 1.1.0 of package \'replicateBE\' required.",
          "\n       Please download/install the current version from CRAN.")
}
path.out <- path.in <- "E:/Public/Documents/BEBAC/R/Weidson" # adapt the folder accordingly
method.A(path.in = path.in, path.out = path.out, file = "IPC_Drug.Cmax",
         ext = "csv", sep = "\t", dec = ".", ola = TRUE, verbose = TRUE)
method.B(path.in = path.in, path.out = path.out, file = "IPC_Drug.Cmax",
         ext = "csv", sep = "\t", dec = ".", ola = TRUE, verbose = TRUE)


With method.A(...) you should get in the console:

Data set IPC_Drug.Cmax: Method A by lm()
────────────────────────────────────────
Type III Analysis of Variance Table

Response: log(PK)
                  Df   Sum Sq Mean Sq F value   Pr(>F)
sequence           1   1.3253 1.32531 0.27224 0.604233
period             3   3.5824 1.19413 2.19909 0.090673
treatment          1   2.7923 2.79228 5.14222 0.024818
sequence:subject  48 233.6721 4.86817 8.96514  < 2e-16
Residuals        146  79.2796 0.54301                 

treatment T – R:
  Estimate Std. Error    t value   Pr(>|t|)
-0.2380370  0.1049710 -2.2676500  0.0248184
146 Degrees of Freedom


And with method.B(...):

Data set IPC_Drug.Cmax: Method B (option = 2) by lme()
──────────────────────────────────────────────────────
Response: log(PK)
            numDF denDF       F-value p-value
(Intercept)     1   146 763.258972853  <.0001
sequence        1    48   0.272239993  0.6042
period          3   146   2.199092956  0.0907
treatment       1   146   5.142222444  0.0248

treatment T – R:
    Value Std.Error   t-value   p-value
-0.238040  0.104970 -2.267600  0.024818
146 Degrees of Freedom (equivalent to SAS’ DDFM=CONTAIN)


The point estimates $$\small{100\,\exp(-0.238037)\approx78.81735\%\;vs\;100\,\exp(-0.23804)\approx78.81712\%}$$ and standard errors (\(\small{0.104971\;vs\;0.104970}\)) are very similar because the data set is complete (no missing periods). Results are identical after rounding acc. to the guidelines (see mine for Method A and Method B).

You don’t need library(stats) because it is attached by default.
Try print(sessionInfo(), locale = FALSE). I got after startup:

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.2


Reproducing your results:

file    <- "E:/Public/Documents/BEBAC/R/Weidson/IPC_Drug.Cmax.csv"
DS_Cmax <- read.csv(file = file, sep = "\t", dec = "." , header = TRUE)
Fit_A   <- lm(log(PK) ~ Sequence + Subject %in% Sequence +
                        Period + Treatment, data = DS_Cmax)
str(DS_Cmax)
print(anova(Fit_A), digits = 10, signif.stars = FALSE)

'data.frame':   200 obs. of  5 variables:
 $ Subject  : int  2 2 2 2 3 3 3 3 5 5 ...
 $ Period   : int  1 2 3 4 1 2 3 4 1 2 ...
 $ Treatment: chr  "T" "R" "T" "R" ...
 $ Sequence : chr  "TRTR" "TRTR" "TRTR" "TRTR" ...
 $ PK       : num  557 525 119 201 544 ...

Analysis of Variance Table

Response: log(PK)
                  Df       Sum Sq     Mean Sq F value  Pr(>F)
Sequence           1   1.32531015 1.325310155 0.83233 0.36273
Period             1   0.31093557 0.310935573 0.19528 0.65905
Treatment          1   3.41095080 3.410950796 2.14218 0.14492
Sequence:Subject   2   6.70138392 3.350691961 2.10433 0.12470
Residuals        194 308.90305612 1.592283794

Here lm() is left out in the rain because we have only a simple data frame. The degrees of freedom and Mean Squares are wrong. Therefore, we need to factorize effects:

# columns which need to be factorized
cols          <- c("Subject", "Period", "Sequence", "Treatment")
DS_Cmax[cols] <- lapply(DS_Cmax[cols], factor)
Fit_A.corr    <- lm(log(PK) ~ Sequence + Subject %in% Sequence +
                              Period + Treatment, data = DS_Cmax)
TypeI         <- anova(Fit_A.corr)
str(DS_Cmax)
print(TypeI, digits = 6, signif.stars = FALSE)

'data.frame':   200 obs. of  5 variables:
 $ Subject  : Factor w/ 50 levels "2","3","5","8",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Period   : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
 $ Treatment: Factor w/ 2 levels "R","T": 2 1 2 1 2 1 2 1 2 1 ...
 $ Sequence : Factor w/ 2 levels "RTRT","TRTR": 2 2 2 2 2 2 2 2 2 2 ...
 $ PK       : num  557 525 119 201 544 ...

Analysis of Variance Table

Response: log(PK)
                  Df   Sum Sq Mean Sq F value   Pr(>F)
Sequence           1   1.3253 1.32531 2.44067 0.120391
Period             3   3.5824 1.19413 2.19909 0.090673
Treatment          1   2.7923 2.79228 5.14222 0.024818
Sequence:Subject  48 233.6721 4.86817 8.96514  < 2e-16
Residuals        146  79.2796 0.54301

That’s the default, where Sequence is tested against Residuals. Now change to the correct test:

TypeIII       <- TypeI # use what we have
attr(TypeIII, "heading")[1] <- "Type III Analysis of Variance Table\n"
MSdenom       <- TypeIII["Sequence:Subject", "Mean Sq"]
df2           <- TypeIII["Sequence:Subject", "Df"]
fvalue        <- TypeIII["Sequence", "Mean Sq"] / MSdenom
df1           <- TypeIII["Sequence", "Df"]
TypeIII["Sequence", 4] <- fvalue
TypeIII["Sequence", 5] <- pf(fvalue, df1, df2, lower.tail = FALSE)
print(TypeIII, digits = 6, signif.stars = FALSE)

Analysis of Variance Table

Response: log(PK)
                  Df   Sum Sq Mean Sq F value   Pr(>F)
Sequence           1   1.3253 1.32531 2.44067 0.120391
Period             3   3.5824 1.19413 2.19909 0.090673
Treatment          1   2.7923 2.79228 5.14222 0.024818
Sequence:Subject  48 233.6721 4.86817 8.96514  < 2e-16
Residuals        146  79.2796 0.54301

Finally we get what replicateBE method.A() provides.

Two more comments:
  1. As shown above, Fit_A <- lm() followed by anova(Fit_A) will give by default (in SAS parlance) Type I estimates. That’s not what we want (see there and how to get a Type III ANOVA in [image] and Phoenix Win­Non­lin). We changed that in v1.1.0 of replicateBE last June (see here and there).
  2. In a mixed-effects model there are no least squares estimates cause the optimization is done by (restricted) maximum likelihood. IMHO, anova((lme(foo ~ bar)) is bit unfortunate cause it is not an ANOVA (see there).
    • When only one fitted model object is present, a data frame with the numerator degrees of freedom, denominator degrees of freedom, F-values, and P-values for Wald tests for the terms in the model (when Terms and L are NULL) …
    Furthermore, you get only the fixed effects in the data frame, i.e., not subjects, which are a random effect. Note also that in lme() you get automatically the correct test for sequence.
[image]BTW, the nested model subject(sequence) doesn’t make sense for uniquely coded subjects (see there). If Subject 1 is randomized to sequence TR, there is not ‘another’ Subject 1 randomized to sequence RT. Randomization is not like Schrö­dinger’s cat. Hence, the nested term in the guidelines is an insult to the mind. That’s more evident if we use aov() instead of anova(), showing in an additional line if effects are not estimable.

nested <- lm(log(PK) ~ Sequence + Subject %in% Sequence +
                       Period + Treatment, data = DS_Cmax)
print(aov(nested))

                 Sequence    Period Treatment Sequence:Subject Residuals
Sum of Squares    1.32531   3.58239   2.79228        233.67209  79.27956
Deg. of Freedom         1         3         1               48       146

Residual standard error: 0.7368926

50 out of 104 effects not estimable
Estimated effects may be unbalanced

# not nested
simple <- lm(log(PK) ~ Sequence + Subject + Period + Treatment, data = DS_Cmax)
print(aov(simple))

                 Sequence   Subject    Period Treatment Residuals
Sum of Squares    1.32531 233.67209   3.58239   2.79228  79.27956
Deg. of Freedom         1        48         3         1       146

Residual standard error: 0.7368926

1 out of 55 effects not estimable
Estimated effects may be unbalanced

# without sequence
very.simple <- lm(log(PK) ~ Subject + Period + Treatment, data = DS_Cmax)
print(aov(very.simple))

                  Subject    Period Treatment Residuals
Sum of Squares  234.99740   3.58239   2.79228  79.27956
Deg. of Freedom        49         3         1       146

Residual standard error: 0.7368926
Estimated effects may be unbalanced

In all models (even the one without Sequence) the MSE of Period, and Treatment and the Residual Error are identical. Hence, you will get the same PE and its CI…

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

UA Flag
Activity
 Admin contact
22,984 posts in 4,822 threads, 1,651 registered users;
43 visitors (0 registered, 43 guests [including 4 identified bots]).
Forum time: 20:06 CEST (Europe/Vienna)

You can’t fix by analysis
what you bungled by design.    Richard J. Light, Judith D. Singer, John B. Willett

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5