Helmut
★★★
avatar
Homepage
Vienna, Austria,
2022-11-15 11:55
(499 d 06:51 ago)

Posting: # 23367
Views: 2,718
 

 Observations in a linear model [General Sta­tis­tics]

Dear all,

off-list I was asked whether it makes a difference if we have single observations or replicates. When you look at the standard equations, clearly the answer is no.
If you don’t believe that, an [image]-script for the simulation example at the end.

x = 1, 2, 3, 4, 5
 estimate    a.hat  b.hat
     mean +0.00001 1.9999
x = 1, 1, 3, 5, 5
 estimate    a.hat  b.hat
     mean -0.00003 2.0000
x = 1, 1, 1, 5, 5
 estimate    a.hat  b.hat
     mean +0.00011 1.9999
x = 1, 1, 1, 1, 5
 estimate    a.hat  b.hat
     mean +0.00012 1.9999

Very small differences in the estimated intercept, practically identical slope.


lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x)
  sum.y <- sum(y)
  sum.x <- sum(x)
  b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n)
  a.hat <- sum.y / n - sum.x / n * b.hat
  return(c(a.hat, b.hat))
}
a     <- 0   # intercept
b     <- 2   # slope
nsims <- 1e5 # number of simulations
x.n   <- 5   # number of x-levels
x.min <- 1   # minimum level
x.max <- 5   # maximum level
x1    <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n)
x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2))
x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2))
x4[1:4] <- rep(x.min, 4)
s1    <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets
s2    <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates
s3    <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate
s4    <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet
t     <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x4, 3), collapse = ", "), "\n"))
set.seed(123456)
s1$y  <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5)
set.seed(123456)
s2$y  <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5)
set.seed(123456)
s3$y  <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5)
set.seed(123456)
s4$y  <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5)
c1    <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_)
pb    <- txtProgressBar(style = 3)
for (i in 1:nsims) {
  c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n)
  c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n)
  c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n)
  c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n)
  setTxtProgressBar(pb, i / nsims)
}
close(pb)
comp  <- data.frame(estimate = rep("mean", 4),
                    a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat),
                                               mean(c3$a.hat), mean(c4$a.hat))),
                    b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat),
                                              mean(c3$b.hat), mean(c4$b.hat))))
cat(t[1]); print(comp[1, ], row.names = FALSE)
cat(t[2]); print(comp[2, ], row.names = FALSE)
cat(t[3]); print(comp[3, ], row.names = FALSE)
cat(t[4]); print(comp[4, ], row.names = FALSE)


Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-16 09:44
(498 d 09:01 ago)

@ Helmut
Posting: # 23368
Views: 2,217
 

 Observations in a linear model

Hi Helmut :-D

I have no idea what is it is you were trying to test.
If it makes a difference to have more replicates, or something along such lines, please put more word to it. It is way too vague for me.

Please at least define "it" and "difference" in your own context with ample words.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2022-11-16 10:49
(498 d 07:57 ago)

@ ElMaestro
Posting: # 23369
Views: 2,259
 

 Observations in a linear model

Hi ElMaestro,

❝ If it makes a difference to have more replicates, or something along such lines, please put more word to it. It is way too vague for me.


❝ Please at least define "it" and "difference" in your own context with ample words.


I try. Two examples:
  • Two calibration curves spanning the same range with either n single measurements or n/2 duplicates.
  • A 2×2×2 crossover with n subject vs a 4-period full replicate design with n/2 subjects.
The question was whether the estimates of the models (obtained by single observations or replicates) are different.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-16 21:50
(497 d 20:55 ago)

@ Helmut
Posting: # 23370
Views: 2,193
 

 Observations in a linear model

Hi,

I still do not quite know what it is you are trying to convince yourself about. :-D

❝ I try. Two examples:

❝ – Two calibration curves spanning the same range with either n single measurements or n/2 duplicates.

❝ – A 2×2×2 crossover with n subject vs a 4-period full replicate design with n/2 subjects.

❝ The question was whether the estimates of the models (obtained by single observations or re­plicates) are different.


The estimates are unbiased estimators of the effects you are extracting. That's how it works, so there is no surprise here. Now try and consider the quality of the estimates, for example if you want to know how well is a mean (or slope or intercept or ...) estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g.
summary(M)$coeff[,2]

More replicates, better estimates. Or, expressed in clear BE terms, higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2022-11-17 11:19
(497 d 07:27 ago)

@ ElMaestro
Posting: # 23371
Views: 2,163
 

 And the Oscar goes to: Replicates!

Hi ElMaestro,

❝ The estimates are unbiased estimators of the effects you are extracting. That's how it works, so there is no surprise here. Now try and consider the quality of the estimates, for example if you want to know how well is a mean (or slope or intercept or ...) estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g.

summary(M)$coeff[,2]


Your wish is my command.

x = 1, 2, 3, 4, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00001 0.4836 1.9999 0.1458
x = 1, 1, 3, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean -0.00003 0.4024 2.0000 0.1152
x = 1, 1, 1, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00011 0.3423 1.9999 0.1051
x = 1, 1, 1, 1, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00012 0.3102 1.9999 0.1288


❝ More replicates, better estimates.


You are right! I always preferred calibration curves with duplicates. ;-)

❝ Or, expressed in clear BE terms, higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.


Not only that. Power depends roughly on the total number of treatments. Even more, in replicate designs we have more degrees of freedom than in a 2×2×2 crossover.

library(PowerTOST)
des1  <- c("2x2x2", "2x2x4", "2x2x3", "2x3x3")
des2  <- c("2×2×2 crossover",
           "2-sequence 4-period full replicate",
           "2-sequence 3-period full replicate",
           "3-sequence 3-period partial replicate")
CV    <- 0.35
cross <- sampleN.TOST(CV = CV, design = des1[1], print = FALSE)[7:8]
full1 <- sampleN.TOST(CV = CV, design = des1[2], print = FALSE)[7:8]
full2 <- sampleN.TOST(CV = CV, design = des1[3], print = FALSE)[7:8]
part  <- sampleN.TOST(CV = CV, design = des1[4], print = FALSE)[7:8]
x     <- data.frame(design = des2,
                    n = c(cross[["Sample size"]], full1[["Sample size"]],
                          full2[["Sample size"]], part[["Sample size"]]),
                    df = c(cross[["Sample size"]] -  2,
                           3 * full1[["Sample size"]] - 4,
                           2 * full2[["Sample size"]] - 3,
                           2 * part[["Sample size"]] - 3),
                    power = c(cross[["Achieved power"]], full1[["Achieved power"]],
                              full2[["Achieved power"]], part[["Achieved power"]]))
print(x, row.names = FALSE, right = FALSE)

design                                n  df power   
2×2×2 crossover                       52 50 0.8074702
2-sequence 4-period full replicate    26 74 0.8108995
2-sequence 3-period full replicate    38 73 0.8008153
3-sequence 3-period partial replicate 39 75 0.8109938


Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-18 18:28
(496 d 00:17 ago)

@ Helmut
Posting: # 23372
Views: 2,160
 

 It's all in the df's

Hi again,

stating the entire thread in fewer terms and no code: more df's = better (more precise) estimates.
You can increase the df's in two ways:
  1. Betweens: More subjects
  2. Withins: More treatments (or, if you will, treatment periods) per subject
Both will work, but which one works best in any given circumstance depends on the details of the problem at hand.

Amen.

Pass or fail!
ElMaestro
UA Flag
Activity
 Admin contact
22,957 posts in 4,819 threads, 1,636 registered users;
68 visitors (0 registered, 68 guests [including 4 identified bots]).
Forum time: 18:46 CET (Europe/Vienna)

Nothing shows a lack of mathematical education more
than an overly precise calculation.    Carl Friedrich Gauß

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5