Bioequivalence and Bioavailability Forum

Observations in a linear model [General Statistics]

posted by Helmut – Vienna, Austria, 2022-11-15 11:55 (522 d 03:47 ago) – Posting: # 23367
Views: 2,832

Dear all,

off-list I was asked whether it makes a difference if we have single observations or replicates. When you look at the standard equations, clearly the answer is no.
If you don’t believe that, an [image]

-script for the simulation example at the end.

x = 1, 2, 3, 4, 5 estimate a.hat b.hat mean +0.00001 1.9999 x = 1, 1, 3, 5, 5 estimate a.hat b.hat mean -0.00003 2.0000 x = 1, 1, 1, 5, 5 estimate a.hat b.hat mean +0.00011 1.9999 x = 1, 1, 1, 1, 5 estimate a.hat b.hat mean +0.00012 1.9999

Very small differences in the estimated intercept, practically identical slope.

lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x) sum.y <- sum(y) sum.x <- sum(x) b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n) a.hat <- sum.y / n - sum.x / n * b.hat return(c(a.hat, b.hat)) } a <- 0 # intercept b <- 2 # slope nsims <- 1e5 # number of simulations x.n <- 5 # number of x-levels x.min <- 1 # minimum level x.max <- 5 # maximum level x1 <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n) x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2)) x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2)) x4[1:4] <- rep(x.min, 4) s1 <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets s2 <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates s3 <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate s4 <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet t <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"), paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"), paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"), paste("x =", paste(signif(x4, 3), collapse = ", "), "\n")) set.seed(123456) s1$y <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5) set.seed(123456) s2$y <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5) set.seed(123456) s3$y <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5) set.seed(123456) s4$y <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5) c1 <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_) pb <- txtProgressBar(style = 3) for (i in 1:nsims) { c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n) c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n) c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n) c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n) setTxtProgressBar(pb, i / nsims) } close(pb) comp <- data.frame(estimate = rep("mean", 4), a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat), mean(c3$a.hat), mean(c4$a.hat))), b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat), mean(c3$b.hat), mean(c4$b.hat)))) cat(t[1]); print(comp[1, ], row.names = FALSE) cat(t[2]); print(comp[2, ], row.names = FALSE) cat(t[3]); print(comp[3, ], row.names = FALSE) cat(t[4]); print(comp[4, ], row.names = FALSE)

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

Observations in a linear modelHelmut 2022-11-15 10:55 [General Statistics]
- Observations in a linear model ElMaestro 2022-11-16 08:44
  - Observations in a linear model Helmut 2022-11-16 09:49
    - Observations in a linear model ElMaestro 2022-11-16 20:50
      - And the Oscar goes to: Replicates! Helmut 2022-11-17 10:19
        
        It's all in the df's ElMaestro 2022-11-18 17:28

Observations in a linear model [General Sta­tis­tics]

Complete thread:

Observations in a linear model [General Statistics]