Observations in a linear model [General Sta­tis­tics]

posted by Helmut Homepage – Vienna, Austria, 2022-11-15 11:55 (1314 d 12:34 ago) – Posting: # 23367
Views: 7,920

Dear all,

off-list I was asked whether it makes a difference if we have single observations or replicates. When you look at the standard equations, clearly the answer is no.
If you don’t believe that, an [image]-script for the simulation example at the end.

x = 1, 2, 3, 4, 5
 estimate    a.hat  b.hat
     mean +0.00001 1.9999
x = 1, 1, 3, 5, 5
 estimate    a.hat  b.hat
     mean -0.00003 2.0000
x = 1, 1, 1, 5, 5
 estimate    a.hat  b.hat
     mean +0.00011 1.9999
x = 1, 1, 1, 1, 5
 estimate    a.hat  b.hat
     mean +0.00012 1.9999

Very small differences in the estimated intercept, practically identical slope.


lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x)
  sum.y <- sum(y)
  sum.x <- sum(x)
  b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n)
  a.hat <- sum.y / n - sum.x / n * b.hat
  return(c(a.hat, b.hat))
}
a     <- 0   # intercept
b     <- 2   # slope
nsims <- 1e5 # number of simulations
x.n   <- 5   # number of x-levels
x.min <- 1   # minimum level
x.max <- 5   # maximum level
x1    <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n)
x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2))
x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2))
x4[1:4] <- rep(x.min, 4)
s1    <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets
s2    <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates
s3    <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate
s4    <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet
t     <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x4, 3), collapse = ", "), "\n"))
set.seed(123456)
s1$y  <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5)
set.seed(123456)
s2$y  <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5)
set.seed(123456)
s3$y  <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5)
set.seed(123456)
s4$y  <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5)
c1    <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_)
pb    <- txtProgressBar(style = 3)
for (i in 1:nsims) {
  c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n)
  c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n)
  c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n)
  c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n)
  setTxtProgressBar(pb, i / nsims)
}
close(pb)
comp  <- data.frame(estimate = rep("mean", 4),
                    a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat),
                                               mean(c3$a.hat), mean(c4$a.hat))),
                    b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat),
                                              mean(c3$b.hat), mean(c4$b.hat))))
cat(t[1]); print(comp[1, ], row.names = FALSE)
cat(t[2]); print(comp[2, ], row.names = FALSE)
cat(t[3]); print(comp[3, ], row.names = FALSE)
cat(t[4]); print(comp[4, ], row.names = FALSE)


Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

UA Flag
Activity
 Admin contact
23,655 posts in 4,993 threads, 1,571 registered users;
120 visitors (0 registered, 120 guests [including 13 identified bots]).
Forum time: 01:29 CEST (Europe/Vienna)

Science is simply common sense at its best that is,
rigidly accurate in observation, and
merciless to fallacy in logic.    Thomas Henry Huxley

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5