Helmut ★★★ · Vienna, Austria · 2022-11-15 11:55 · Posting: # 23367

Observations in a linear model [General Statistics]

Dear all,

off-list I was asked whether it makes a difference for the estimates if we have single observations or replicates. Looking at the standard least-squares equations, the answer is clearly no.
If you don’t believe that, see the R-script for the simulation example at the end.

x = 1, 2, 3, 4, 5
 estimate    a.hat  b.hat
     mean +0.00001 1.9999
x = 1, 1, 3, 5, 5
 estimate    a.hat  b.hat
     mean -0.00003 2.0000
x = 1, 1, 1, 5, 5
 estimate    a.hat  b.hat
     mean +0.00011 1.9999
x = 1, 1, 1, 1, 5
 estimate    a.hat  b.hat
     mean +0.00012 1.9999

Very small differences in the estimated intercept, practically identical slope.


lr <- function(x, y, n) { # standard equations are faster than lm(y ~ x)
  sum.y <- sum(y)
  sum.x <- sum(x)
  b.hat <- (sum(x * y) - sum.x * sum.y / n) / (sum(x^2) - sum.x^2 / n)
  a.hat <- sum.y / n - sum.x / n * b.hat
  return(c(a.hat, b.hat))
}
a     <- 0   # intercept
b     <- 2   # slope
nsims <- 1e5 # number of simulations
x.n   <- 5   # number of x-levels
x.min <- 1   # minimum level
x.max <- 5   # maximum level
x1    <- x2 <- x3 <- x4 <- seq(x.min, x.max, length.out = x.n)
x2[c(c(1:2), c((x.n-1):x.n))] <- c(rep(x.min, 2), rep(x.max, 2))
x3[c(c(1:3), c((x.n-1):x.n))] <- c(rep(x.min, 3), rep(x.max, 2))
x4[1:4] <- rep(x.min, 4)
s1    <- data.frame(sim = rep(1:nsims, each = x.n), x = x1) # singlets
s2    <- data.frame(sim = rep(1:nsims, each = x.n), x = x2) # 2 duplicates
s3    <- data.frame(sim = rep(1:nsims, each = x.n), x = x3) # tri- and duplicate
s4    <- data.frame(sim = rep(1:nsims, each = x.n), x = x4) # quadruplet
t     <- c(paste("x =", paste(signif(x1, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x2, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x3, 3), collapse = ", "), "\n"),
                 paste("x =", paste(signif(x4, 3), collapse = ", "), "\n"))
set.seed(123456)
s1$y  <- rnorm(n = nsims * x.n, mean = a + b * s1$x, sd = 0.5)
set.seed(123456)
s2$y  <- rnorm(n = nsims * x.n, mean = a + b * s2$x, sd = 0.5)
set.seed(123456)
s3$y  <- rnorm(n = nsims * x.n, mean = a + b * s3$x, sd = 0.5)
set.seed(123456)
s4$y  <- rnorm(n = nsims * x.n, mean = a + b * s4$x, sd = 0.5)
c1    <- c2 <- c3 <- c4 <- data.frame(sim = 1:nsims, a.hat = NA_real_, b.hat = NA_real_)
pb    <- txtProgressBar(style = 3)
for (i in 1:nsims) {
  c1[i, 2:3] <- lr(s1$x[s1$sim == i], s1$y[s1$sim == i], x.n)
  c2[i, 2:3] <- lr(s2$x[s2$sim == i], s2$y[s2$sim == i], x.n)
  c3[i, 2:3] <- lr(s3$x[s3$sim == i], s3$y[s3$sim == i], x.n)
  c4[i, 2:3] <- lr(s4$x[s4$sim == i], s4$y[s4$sim == i], x.n)
  setTxtProgressBar(pb, i / nsims)
}
close(pb)
comp  <- data.frame(estimate = rep("mean", 4),
                    a.hat = sprintf("%+.5f", c(mean(c1$a.hat), mean(c2$a.hat),
                                               mean(c3$a.hat), mean(c4$a.hat))),
                    b.hat = sprintf("%.4f", c(mean(c1$b.hat), mean(c2$b.hat),
                                              mean(c3$b.hat), mean(c4$b.hat))))
cat(t[1]); print(comp[1, ], row.names = FALSE)
cat(t[2]); print(comp[2, ], row.names = FALSE)
cat(t[3]); print(comp[3, ], row.names = FALSE)
cat(t[4]); print(comp[4, ], row.names = FALSE)


Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
ElMaestro ★★★ · Denmark · 2022-11-16 09:44 · @ Helmut · Posting: # 23368

 Observations in a linear model

Hi Helmut :-D

I have no idea what it is you were trying to test.
If it is whether having more replicates makes a difference, or something along those lines, please put more words to it. It is way too vague for me.

Please at least define "it" and "difference" in your own context with ample words.

Pass or fail!
ElMaestro
Helmut ★★★ · Vienna, Austria · 2022-11-16 10:49 · @ ElMaestro · Posting: # 23369

 Observations in a linear model

Hi ElMaestro,

❝ If it makes a difference to have more replicates, or something along such lines, please put more word to it. It is way too vague for me.


❝ Please at least define "it" and "difference" in your own context with ample words.


I’ll try. Two examples:
  • Two calibration curves spanning the same range, with either n single measurements or n/2 duplicates.
  • A 2×2×2 crossover with n subjects vs a 4-period full replicate design with n/2 subjects.
The question was whether the estimates of the models (obtained by single observations or replicates) are different.

ElMaestro ★★★ · Denmark · 2022-11-16 21:50 · @ Helmut · Posting: # 23370

 Observations in a linear model

Hi,

I still do not quite know what it is you are trying to convince yourself about. :-D

❝ I try. Two examples:

❝ – Two calibration curves spanning the same range with either n single measurements or n/2 duplicates.

❝ – A 2×2×2 crossover with n subjects vs a 4-period full replicate design with n/2 subjects.

❝ The question was whether the estimates of the models (obtained by single observations or replicates) are different.


The estimates are unbiased estimators of the effects you are extracting. That's how it works, so there is no surprise here. Now consider the quality of the estimates, for example if you want to know how well a mean (or slope or intercept or ...) is estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g.
summary(M)$coeff[,2]

More replicates, better estimates. Or, expressed in clear BE terms, higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.
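A minimal sketch of the extraction (the data are made up; true intercept 0, slope 2):

```r
# Fit a simple linear model and pull the SEs of the coefficients
x <- c(1, 1, 3, 5, 5)                    # duplicated extremes
set.seed(42)
y <- 2 * x + rnorm(length(x), sd = 0.5)  # simulated responses
M <- lm(y ~ x)
summary(M)$coeff[, 2]                    # SEs of (Intercept) and x
```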

Helmut ★★★ · Vienna, Austria · 2022-11-17 11:19 · @ ElMaestro · Posting: # 23371

 And the Oscar goes to: Replicates!

Hi ElMaestro,

❝ The estimates are unbiased estimators of the effects you are extracting. That's how it works, so there is no surprise here. Now consider the quality of the estimates, for example if you want to know how well a mean (or slope or intercept or ...) is estimated. Look for example at the SE of the estimates, easily extractable from an lm via e.g.

summary(M)$coeff[,2]


Your wish is my command.

x = 1, 2, 3, 4, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00001 0.4836 1.9999 0.1458
x = 1, 1, 3, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean -0.00003 0.4024 2.0000 0.1152
x = 1, 1, 1, 5, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00011 0.3423 1.9999 0.1051
x = 1, 1, 1, 1, 5
 estimate    a.hat     SE  b.hat     SE
     mean +0.00012 0.3102 1.9999 0.1288
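For reference, the theoretical SE of the slope is σ/√Sxx with Sxx = Σ(x − x̄)². The simulated means above are smaller by the factor E[s]/σ ≈ 0.921 at ν = n − 2 = 3 df, because the residual SD underestimates σ in small samples. A cross-check (σ = 0.5 as in the simulation):

```r
# Theoretical slope SE sigma/sqrt(Sxx) for the four x-designs
sigma <- 0.5
xs    <- list(c(1, 2, 3, 4, 5), c(1, 1, 3, 5, 5),
              c(1, 1, 1, 5, 5), c(1, 1, 1, 1, 5))
Sxx   <- sapply(xs, function(x) sum((x - mean(x))^2))
nu    <- 5 - 2                                               # residual df
bias  <- sqrt(2 / nu) * gamma((nu + 1) / 2) / gamma(nu / 2)  # E[s]/sigma
round(cbind(Sxx, SE.b = sigma / sqrt(Sxx),
            E.mean.SE = bias * sigma / sqrt(Sxx)), 4)
```

The last column reproduces the simulated b.hat SEs (0.1458, 0.1152, 0.1051, 0.1288).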


❝ More replicates, better estimates.


You are right! I always preferred calibration curves with duplicates. ;-)

❝ Or, expressed in clear BE terms, higher sample size gives narrower confidence intervals, but the expected value of the point estimate is unchanged.


Not only that. Power depends roughly on the total number of treatments. What’s more, replicate designs give more degrees of freedom than a 2×2×2 crossover.

library(PowerTOST)
des1  <- c("2x2x2", "2x2x4", "2x2x3", "2x3x3")
des2  <- c("2×2×2 crossover",
           "2-sequence 4-period full replicate",
           "2-sequence 3-period full replicate",
           "3-sequence 3-period partial replicate")
CV    <- 0.35
cross <- sampleN.TOST(CV = CV, design = des1[1], print = FALSE)[7:8]
full1 <- sampleN.TOST(CV = CV, design = des1[2], print = FALSE)[7:8]
full2 <- sampleN.TOST(CV = CV, design = des1[3], print = FALSE)[7:8]
part  <- sampleN.TOST(CV = CV, design = des1[4], print = FALSE)[7:8]
x     <- data.frame(design = des2,
                    n = c(cross[["Sample size"]], full1[["Sample size"]],
                          full2[["Sample size"]], part[["Sample size"]]),
                    df = c(cross[["Sample size"]] -  2,
                           3 * full1[["Sample size"]] - 4,
                           2 * full2[["Sample size"]] - 3,
                           2 * part[["Sample size"]] - 3),
                    power = c(cross[["Achieved power"]], full1[["Achieved power"]],
                              full2[["Achieved power"]], part[["Achieved power"]]))
print(x, row.names = FALSE, right = FALSE)

design                                n  df power   
2×2×2 crossover                       52 50 0.8074702
2-sequence 4-period full replicate    26 74 0.8108995
2-sequence 3-period full replicate    38 73 0.8008153
3-sequence 3-period partial replicate 39 75 0.8109938


ElMaestro ★★★ · Denmark · 2022-11-18 18:28 · @ Helmut · Posting: # 23372

 It's all in the df's

Hi again,

stating the entire thread in fewer words and with no code: more df's = better (more precise) estimates.
You can increase the df's in two ways:
1. Betweens: More subjects
2. Withins: More treatments (or, if you will, treatment periods) per subject
Both will work, but which one works best in any given circumstance depends on the details of the problem at hand.
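A one-liner illustrating why (base R): the t-quantile entering the 90 % CI shrinks as the df grow, so at equal SE the interval narrows. The df values 50 and 74 are taken from the sample-size table earlier in the thread; the others are just for illustration:

```r
# t-quantile for a 90% two-sided CI (alpha = 0.05 per tail) at increasing df
round(qt(1 - 0.05, df = c(10, 22, 50, 74)), 4)
```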

Amen.

The Bioequivalence and Bioavailability Forum is hosted by BEBAC · Ing. Helmut Schütz