Helmut
★★★

Vienna, Austria,
2022-11-15 11:55
(24 d 11:34 ago)

Posting: # 23367
Views: 614

## Observations in a linear model [General Sta­tis­tics]

Dear all,

Off-list I was asked whether it makes a difference if we have single observations or replicates. When you look at the standard equations, clearly the answer is no.
If you don’t believe it, see the R-script for the simulation example at the end.

```
x = 1, 2, 3, 4, 5
 estimate    a.hat  b.hat
     mean +0.00001 1.9999
x = 1, 1, 3, 5, 5
 estimate    a.hat  b.hat
     mean -0.00003 2.0000
x = 1, 1, 1, 5, 5
 estimate    a.hat  b.hat
     mean +0.00011 1.9999
x = 1, 1, 1, 1, 5
 estimate    a.hat  b.hat
     mean +0.00012 1.9999
```

Very small differences in the estimated intercept, practically identical slope.
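The same behaviour can be reproduced more compactly with `lm()` itself — a minimal sketch, not the script below (seed, residual SD, and 5,000 simulations instead of 10⁵ are arbitrary choices for speed):

```r
set.seed(123)
x1  <- c(1, 2, 3, 4, 5)   # singlets
x2  <- c(1, 1, 3, 5, 5)   # duplicates at the extremes
# one simulated fit: true intercept 0, true slope 2, sd = 0.5
sim <- function(x) coef(lm(rnorm(length(x), mean = 2 * x, sd = 0.5) ~ x))
b1  <- rowMeans(replicate(5e3, sim(x1)))  # mean intercept and slope
b2  <- rowMeans(replicate(5e3, sim(x2)))
round(rbind(singlets = b1, duplicates = b2), 3)  # both close to 0 and 2
```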

```r
lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x)
  sum.y <- sum(y)
  sum.x <- sum(x)
  b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n)
  a.hat <- sum.y / n - sum.x / n * b.hat
  return(c(a.hat, b.hat))
}
a     <- 0   # intercept
b     <- 2   # slope
nsims <- 1e5 # number of simulations
x.n   <- 5   # number of x-levels
x.min <- 1   # minimum level
x.max <- 5   # maximum level
x1    <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n)
x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2))
x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2))
x4[1:4] <- rep(x.min, 4)
s1    <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets
s2    <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates
s3    <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate
s4    <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet
t     <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"),
           paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"),
           paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"),
           paste("x =", paste(signif(x4, 3), collapse = ", "), "\n"))
set.seed(123456)
s1$y  <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5)
set.seed(123456)
s2$y  <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5)
set.seed(123456)
s3$y  <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5)
set.seed(123456)
s4$y  <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5)
c1    <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_)
pb    <- txtProgressBar(style = 3)
for (i in 1:nsims) {
  c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n)
  c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n)
  c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n)
  c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n)
  setTxtProgressBar(pb, i / nsims)
}
close(pb)
comp  <- data.frame(estimate = rep("mean", 4),
                    a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat),
                                               mean(c3$a.hat), mean(c4$a.hat))),
                    b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat),
                                              mean(c3$b.hat), mean(c4$b.hat))))
cat(t[1]); print(comp[1, ], row.names = FALSE)
cat(t[2]); print(comp[2, ], row.names = FALSE)
cat(t[3]); print(comp[3, ], row.names = FALSE)
cat(t[4]); print(comp[4, ], row.names = FALSE)
```

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-16 09:44
(23 d 13:44 ago)

@ Helmut
Posting: # 23368
Views: 519

## Observations in a linear model

Hi Helmut

I have no idea what it is you were trying to test.
If it’s whether it makes a difference to have more replicates, or something along those lines, please put more words to it. It is way too vague for me.

Please at least define "it" and "difference" in your own context with ample words.

Pass or fail!
ElMaestro
Helmut
★★★

Vienna, Austria,
2022-11-16 10:49
(23 d 12:40 ago)

@ ElMaestro
Posting: # 23369
Views: 512

## Observations in a linear model

Hi ElMaestro,

❝ If it’s whether it makes a difference to have more replicates, or something along those lines, please put more words to it. It is way too vague for me.

❝ Please at least define "it" and "difference" in your own context with ample words.

I’ll try. Two examples:
• Two calibration curves spanning the same range with either n single measurements or n/2 duplicates.
• A 2×2×2 crossover with n subjects vs. a 4-period full replicate design with n/2 subjects.
The question was whether the estimates of the models (obtained by single observations or replicates) are different.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-16 21:50
(23 d 01:38 ago)

@ Helmut
Posting: # 23370
Views: 476

## Observations in a linear model

Hi,

I still do not quite know what it is you are trying to convince yourself about.

❝ I try. Two examples:

❝ – Two calibration curves spanning the same range with either n single measurements or n/2 duplicates.

❝ – A 2×2×2 crossover with n subjects vs. a 4-period full replicate design with n/2 subjects.

❝ The question was whether the estimates of the models (obtained by single observations or re­plicates) are different.

The estimates are unbiased estimators of the effects you are extracting. That’s how it works, so there is no surprise here. Now try and consider the quality of the estimates, for example if you want to know how well a mean (or slope or intercept or ...) is estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g.
```r
summary(M)$coeff[, 2]
```

More replicates, better estimates. Or, expressed in clear BE terms, a higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.

Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2022-11-17 11:19
(22 d 12:09 ago)

@ ElMaestro
Posting: # 23371
Views: 448

## And the Oscar goes to: Replicates!

Hi ElMaestro,

❝ The estimates are unbiased estimators of the effects you are extracting. That’s how it works, so there is no surprise here. Now try and consider the quality of the estimates, for example if you want to know how well a mean (or slope or intercept or ...) is estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g. `summary(M)$coeff[, 2]`

```
x = 1, 2, 3, 4, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00001 0.4836 1.9999 0.1458
x = 1, 1, 3, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean -0.00003 0.4024 2.0000 0.1152
x = 1, 1, 1, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00011 0.3423 1.9999 0.1051
x = 1, 1, 1, 1, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00012 0.3102 1.9999 0.1288
```

❝ More replicates, better estimates.

You are right! I always preferred calibration curves with duplicates.
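Why duplicates help in calibration follows directly from the standard equations: the SE of the slope is σ/√Sxx, and for the same n, duplicates at the extreme levels increase Sxx = Σ(x − x̄)². A minimal base-R check (the two x-vectors match the simulation above):

```r
x1  <- c(1, 2, 3, 4, 5)            # n = 5 singlets
x2  <- c(1, 1, 3, 5, 5)            # n = 5 with duplicates at the extremes
sxx <- function(x) sum((x - mean(x))^2)
# SE(slope) = sigma / sqrt(Sxx): the larger Sxx, the smaller the SE
c(singlets = sxx(x1), duplicates = sxx(x2))  # 10 vs 16
```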

❝ Or, expressed in clear BE terms, higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.

Not only that. Power depends roughly on the total number of administered treatments. Moreover, in replicate designs we have more degrees of freedom than in a 2×2×2 crossover.

```r
library(PowerTOST)
des1  <- c("2x2x2", "2x2x4", "2x2x3", "2x3x3")
des2  <- c("2×2×2 crossover",
           "2-sequence 4-period full replicate",
           "2-sequence 3-period full replicate",
           "3-sequence 3-period partial replicate")
CV    <- 0.35
cross <- sampleN.TOST(CV = CV, design = des1[1], print = FALSE)[7:8]
full1 <- sampleN.TOST(CV = CV, design = des1[2], print = FALSE)[7:8]
full2 <- sampleN.TOST(CV = CV, design = des1[3], print = FALSE)[7:8]
part  <- sampleN.TOST(CV = CV, design = des1[4], print = FALSE)[7:8]
x     <- data.frame(design = des2,
                    n = c(cross[["Sample size"]], full1[["Sample size"]],
                          full2[["Sample size"]], part[["Sample size"]]),
                    df = c(cross[["Sample size"]] - 2,
                           3 * full1[["Sample size"]] - 4,
                           2 * full2[["Sample size"]] - 3,
                           2 * part[["Sample size"]] - 3),
                    power = c(cross[["Achieved power"]], full1[["Achieved power"]],
                              full2[["Achieved power"]], part[["Achieved power"]]))
print(x, row.names = FALSE, right = FALSE)
```

```
 design                                n  df power    
 2×2×2 crossover                       52 50 0.8074702
 2-sequence 4-period full replicate    26 74 0.8108995
 2-sequence 3-period full replicate    38 73 0.8008153
 3-sequence 3-period partial replicate 39 75 0.8109938
```

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2022-11-18 18:28
(21 d 05:00 ago)

@ Helmut
Posting: # 23372
Views: 414

## It's all in the df's

Hi again,

stating the entire thread in fewer words and with no code: more df’s = better (more precise) estimates.
You can increase the df's in two ways:
1. Betweens: More subjects
2. Withins: More treatments (or, if you will, treatment periods) per subject
Both will work, but which one works best in any given circumstance depends on the details of the problem at hand.
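One part of this gain can be shown without any design machinery: the half-width of a confidence interval is the t-quantile times the SE, and the t-quantile shrinks monotonically as the df increase (on top of the SE itself shrinking with more observations). A minimal base-R sketch with arbitrary, hypothetical df:

```r
# multiplier of the SE in a 90% confidence interval for a range of df;
# more df -> smaller quantile -> narrower interval (for the same SE)
df <- c(10, 22, 50, 74)     # hypothetical df, e.g. from designs of growing size
q  <- qt(0.95, df)
round(q, 3)                 # decreases monotonically towards qnorm(0.95)
```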

Amen.

Pass or fail!
ElMaestro