Observations in a linear model [General Statistics]
off-list I was asked whether it makes a difference if we have single observations or replicates. When you look at the standard equations, clearly the answer is no.
If you don’t believe that, an -script for the simulation example at the end.
x = 1, 2, 3, 4, 5
estimate a.hat b.hat
mean +0.00001 1.9999
x = 1, 1, 3, 5, 5
estimate a.hat b.hat
mean -0.00003 2.0000
x = 1, 1, 1, 5, 5
estimate a.hat b.hat
mean +0.00011 1.9999
x = 1, 1, 1, 1, 5
estimate a.hat b.hat
mean +0.00012 1.9999
lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x)
sum.y <- sum(y)
sum.x <- sum(x)
b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n)
a.hat <- sum.y / n - sum.x / n * b.hat
return(c(a.hat, b.hat))
}
a <- 0 # intercept
b <- 2 # slope
nsims <- 1e5 # number of simulations
x.n <- 5 # number of x-levels
x.min <- 1 # minimum level
x.max <- 5 # maximum level
x1 <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n)
x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2))
x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2))
x4[1:4] <- rep(x.min, 4)
s1 <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets
s2 <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates
s3 <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate
s4 <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet
t <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"),
paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"),
paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"),
paste("x =", paste(signif(x4, 3), collapse = ", "), "\n"))
set.seed(123456)
s1$y <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5)
set.seed(123456)
s2$y <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5)
set.seed(123456)
s3$y <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5)
set.seed(123456)
s4$y <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5)
c1 <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_)
pb <- txtProgressBar(style = 3)
for (i in 1:nsims) {
c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n)
c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n)
c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n)
c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n)
setTxtProgressBar(pb, i / nsims)
}
close(pb)
comp <- data.frame(estimate = rep("mean", 4),
a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat),
mean(c3$a.hat), mean(c4$a.hat))),
b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat),
mean(c3$b.hat), mean(c4$b.hat))))
cat(t[1]); print(comp[1, ], row.names = FALSE)
cat(t[2]); print(comp[2, ], row.names = FALSE)
cat(t[3]); print(comp[3, ], row.names = FALSE)
cat(t[4]); print(comp[4, ], row.names = FALSE)
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Complete thread:
- Observations in a linear modelHelmut 2022-11-15 10:55 [General Statistics]
- Observations in a linear model ElMaestro 2022-11-16 08:44
- Observations in a linear model Helmut 2022-11-16 09:49
- Observations in a linear model ElMaestro 2022-11-16 20:50
- And the Oscar goes to: Replicates! Helmut 2022-11-17 10:19
- It's all in the df's ElMaestro 2022-11-18 17:28
- And the Oscar goes to: Replicates! Helmut 2022-11-17 10:19
- Observations in a linear model ElMaestro 2022-11-16 20:50
- Observations in a linear model Helmut 2022-11-16 09:49
- Observations in a linear model ElMaestro 2022-11-16 08:44