Die don’t remember the last roll. Really. [General Statistics]
❝ what is statistical independence in layman's terms??
See the subject line.
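In case a quick simulation says it better than the subject line, a minimal sketch (separate from the script at the end): rolls of a fair die carry no memory – the correlation between consecutive rolls is ≈ 0, and rolls following a six average ≈ 3.5 like all the others.
set.seed(42)
rolls <- sample(1:6, 1e5, replace = TRUE)        # 100,000 rolls of a fair die
cor(rolls[-length(rolls)], rolls[-1])            # correlation between consecutive rolls ~ 0
mean(rolls[which(rolls == 6) + 1], na.rm = TRUE) # mean of the rolls following a six ~ 3.5
mean(rolls)                                      # overall mean ~ 3.5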
❝ For example: [etc. etc.]
Fire up R and execute the script at the end.
There are samples with identical means but different variances and others with (almost) identical variances but different means. However, pooled estimates will sooner or later approach the population’s parameters (that’s the law of large numbers at work; the central limit theorem adds that the distribution of the sample means approaches a normal one).
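If you prefer to see that convergence in numbers rather than plots, a minimal sketch (separate from the main script; it re-uses the same µ = 100, σ = 20, n = 20): pool an increasing number of samples and watch the pooled mean and variance close in on 100 and 400.
set.seed(1)
mu <- 100; s <- 20; n <- 20                    # same parameters as in the script below
for (k in c(2, 10, 50, 250, 1000)) {           # number of pooled samples
  x.k <- matrix(rnorm(k*n, mean = mu, sd = s), nrow = n)
  cat(sprintf("%4i samples: pooled mean %6.2f, pooled var %6.1f\n",
              k, mean(colMeans(x.k)), mean(apply(x.k, 2, var))))
}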
❝ […] any perturbation on the data that changes Z will (may) also change s […]
Not sure what you mean by “perturbation on the data”. Sample is a sample is a sample.
Or are you thinking about something happening after sampling (e.g., transcription errors, manipulations, …)? If you have a single sample and the population’s parameters are – as usual – unknown, you enter the slippery ground of outliers – which is a mixture of assessing (deviations from) assumptions, knowledge of the data-generating process (incl. checking for implausible values), …
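To make the “perturbation” point concrete, a minimal sketch (my own toy numbers, not yours): corrupt a single value and both the SD and the z-scores move – and since the aberrant value inflates s, its own z-score is damped, which is one reason outlier tests struggle.
set.seed(7)
y  <- rnorm(20, mean = 100, sd = 20)       # a clean sample
y2 <- y; y2[1] <- 250                      # 'perturb' one value (say, a transcription error)
c(sd(y), sd(y2))                           # the sample SD jumps
c(max(abs(scale(y))), max(abs(scale(y2)))) # largest |z|; it can never exceed (n-1)/sqrt(n) ~ 4.25,
                                           # however wild the value, because the outlier inflates s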
❝ Muchas gracias.
De nada.
PS: Comparing variances is a daring feat. In the example the ratio of the largest to the smallest sample mean is only 1.18, whereas the variances varied [pun intended] almost fourfold. This also explains why outlier tests are not powerful and why their verdicts should be taken with a grain of salt (a short numeric illustration follows after the script).
set.seed(123456)
# Population
N <- 1e5
mu <- 100
s <- 20
# Sampling
n <- 20 # Sample size
samples <- 30 # Number of samples
# Simulate population
x <- rnorm(N, mean = mu, sd = s)
summary(x) # Show what we have
mean0 <- mean(x)
sd0 <- sd(x)
# Draw samples
xs <- data.frame(sample = rep(1:samples, each = n),
xs = sample(x, samples*n))
# Sample estimates
mean.spl <- aggregate(xs[, 2], list(sample = xs$sample), mean)
var.spl <- aggregate(xs[, 2], list(sample = xs$sample), var)
sd.spl <- aggregate(xs[, 2], list(sample = xs$sample), sd)
spl <- data.frame(sample = 1:samples, mean = mean.spl$x, var = var.spl$x)
# Pool estimates
mean.pld <- mean(mean.spl$x)
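# Pooled variance: (n-1)-weighted mean of the sample variances; with equal n
# this reduces to the simple mean of the 30 sample variances.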
var.pld <- sum((n-1)*var.spl$x)/(n*samples-samples)
sd.pld <- sqrt(var.pld)
idx.m <- order(mean.spl$x) # index of sorted sample means
idx.v <- order(var.spl$x)  # index of sorted sample variances
# 3 plots with recording
col <- rainbow(samples, start = 0.7, end = 0.1)
h.pop <- hist(x, breaks = "FD", plot = FALSE)
xlim <- range(h.pop$mids)
ylim <- range(h.pop$density)*1.5
windows(record = TRUE) # Windows-only device with plot history; on other platforms use dev.new()
op <- par(no.readonly = TRUE)
par(mar = c(4, 4, 2, 0) + 0.1, ask = TRUE)
# 1: Population
plot(h.pop, freq = FALSE, xlim = xlim, ylim = ylim, col = "bisque",
main = "Population", cex.main = 1, font.main = 1,
xlab = paste("N = ", N), cex.lab = 0.9, las = 1, border = FALSE)
abline(v = mean0, lty = 2, col = "blue"); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, col = "blue", lwd = 2, add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
lty = 3, col = "blue")
par(family = "mono")
legend("topright", bty = "n", col = "blue", lwd = 2,
legend = sprintf("%5.1f | %5.1f (%3.1f%%)", mean0, sd0^2, 100*sd0/mean0),
cex = 0.85, title = "mean | var (CV)")
# 2: Samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
main = "Samples drawn from population (sorted by mean)", cex.main = 1,
font.main = 1, xlab = paste0(samples, " samples (each n = ", n, ")"),
ylab = "Density", cex.lab = 0.9, las = 1, frame.plot = TRUE)
par(family = "mono")
legend("topleft", box.lty = 0, lwd = 1, bg = "white", col = col[idx.m], cex = 0.75,
legend = sprintf("%2i", mean.spl$sample[idx.m]), title = "sample")
legend("topright", box.lty = 0, lwd = NA, bg = "white", col = col[idx.m], cex = 0.75,
legend = sprintf("%5.1f | %5.1f (%3.1f%%)",
mean.spl$x[idx.m], var.spl$x[idx.m],
100*sqrt(var.spl$x[idx.m])/mean.spl$x[idx.m]),
title = "mean | var (CV)"); box()
for (j in seq_along(idx.m)) {
curve(dnorm(x, mean = mean.spl[j, 2], sd = sd.spl[j, 2]), n = 501,
col = col[j], add = TRUE)
lines(mean.spl[j, 2]+c(-1, +1)*sd.spl[j, 2],
rep(dnorm(x = mean.spl[j, 2]-sd.spl[j, 2],
mean = mean.spl[j, 2], sd = sd.spl[j, 2]), 2),
lty = 3, col = col[j])
}
# 3: Population and pooled samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
main = "Population and pooled samples", cex.main = 1, font.main = 1,
xlab = paste0("Estimated from ", samples, " pooled samples (each n = ",
n, ")"), ylab = "Density", cex.lab = 0.9, las = 1)
abline(v = mean0, lty = 2, col = "blue")
abline(v = mean.pld, lty = 2, col = col[floor(samples/2)]); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, lwd = 2,
col = "blue", add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
lty = 3, col = "blue")
curve(dnorm(x, mean = mean.pld, sd = sqrt(var.pld)), n = 501, lwd = 2,
col = col[floor(samples/2)], add = TRUE)
lines(mean.pld+c(-1, +1)*sqrt(var.pld),
rep(dnorm(x = mean.pld-sqrt(var.pld), mean = mean.pld, sd = sqrt(var.pld)), 2),
lty = 3, col = col[floor(samples/2)])
par(family = "mono")
legend("topright", box.lty = 0, lwd = 2, col = c("blue", col[floor(samples/2)]),
legend = c(sprintf("Population : %5.1f | %5.1f (%3.1f%%)",
mean0, sd0^2, 100*sd0/mean0),
sprintf("Pooled samples: %5.1f | %5.1f (%3.1f%%)",
mean.pld, var.pld, 100*sqrt(var.pld)/mean.pld)),
cex = 0.85, title = "mean | var (CV)", bg = "white"); box()
# 4: Is there a relationship between samples' means and variances?
par(family = "sans")
plot(spl$mean, spl$var, type = "n", las = 1,
main = paste0("Sample estimates (correlation = ",
signif(cor(spl$mean, spl$var), 5), ")"),
cex.main = 1, font.main = 1, xlab = "mean", ylab = "variance")
abline(h = var.pld, lty = 3, col = "lightgrey")
abline(v = mean.pld, lty = 3, col = "lightgrey")
abline(lsfit(spl$mean, spl$var), lty = 2, col = "darkgrey"); box()
points(mean.spl$x[idx.m], var.spl$x[idx.m], pch = 16, col = col[idx.m], cex = 1.35)
for (j in 1:nrow(spl)) {
loc <- ifelse(spl$var[j] >= var.pld, 1, 3) # label below the point if variance above pooled, else above
text(spl$mean[j], spl$var[j], labels = spl$sample[j], cex = 0.6, pos = loc)
}
par(op)
# Sample estimates ordered by mean
print(spl[idx.m, ], row.names = FALSE)
# Sample estimates ordered by variance
print(spl[idx.v, ], row.names = FALSE)
# Ratio of extreme sample means
spl[idx.m, "mean"][samples]/spl[idx.m, "mean"][1]
# Ratio of extreme sample variances
spl[idx.v, "var"][samples]/spl[idx.v, "var"][1]
# %RE of sample means and variances
summary(100*(mean.spl$x-mu)/mu); summary(100*(sd.spl$x^2-s^2)/s^2)
# Correlation of sample means and variances
cor(spl$mean, spl$var)
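To back up the PS numerically, a minimal sketch (re-using µ, σ, and n from the script above): the sampling distribution of the variance is far wider, relative to its target, than that of the mean, so variance ratios of three to four between samples are unremarkable even when nothing is “wrong”.
set.seed(654321)
sim <- replicate(1e4, {z <- rnorm(n, mean = mu, sd = s); c(mean(z), var(z))})
quantile(sim[1, ], probs = c(0.025, 0.975))/mu  # sample means stay within a few percent of µ
quantile(sim[2, ], probs = c(0.025, 0.975))/s^2 # sample variances scatter from ~1/2 to ~1.75 of σ²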
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes