Die don’t remember the last roll. Really. [General Statistics]
» what is statistical independence in layman's terms??
See the subject line.
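To make the subject line concrete, here is a minimal sketch in R. It is not part of the script at the end; the seed, the number of rolls, and all variable names are assumptions made for illustration. It rolls a fair die many times and compares the overall frequency of a six with the frequency of a six directly after a six. If the rolls are independent, both should be close to 1/6.

```r
set.seed(42)                                   # arbitrary seed (assumption), for reproducibility only
rolls  <- sample(1:6, 1e6, replace = TRUE)     # one million rolls of a fair die
first  <- rolls[-length(rolls)]                # roll i
second <- rolls[-1]                            # roll i + 1
p.six       <- mean(second == 6)               # unconditional relative frequency of a six
p.six.after <- mean(second[first == 6] == 6)   # relative frequency of a six after a six
round(c(unconditional = p.six, after.six = p.six.after, expected = 1/6), 4)
```

Knowing the previous roll does not help in predicting the next one; that is all “independence” means here.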
» For example: [etc. etc.]
Fire up R and execute the script at the end.
Some samples have identical means but different variances; others have (almost) identical variances but different means. However, according to the
» […] any perturbation on the data that changes Z will (may) also change s […]
Not sure what you mean by “perturbation on the data”.
Or are you thinking about something happening after sampling (e.g., transcription errors, manipulations, …)? If you have a single sample and the population’s parameters are – as usual – unknown, you enter the slippery ground of outliers – which is a mixture of assessing (deviations from) assumptions, knowledge of the data generating process (incl. checking for implausible values),

» Muchas gracias.
De nada.
PS: Comparing variances is a daring feat. In the example, the ratio of the largest to the smallest sample mean is 1.18, whereas the sample variances varied [pun] almost fourfold. This explains why outlier tests are not powerful and should be taken with a grain of salt.
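A rough back-of-the-envelope check of the PS, assuming normally distributed data and the population values used in the script (mu = 100, s = 20, n = 20). Under these assumptions the relative standard error of the sample mean is s/(mu·√n), whereas that of the sample variance is √(2/(n−1)). The names rse.mean and rse.var are made up for this sketch.

```r
mu <- 100; s <- 20; n <- 20                # values taken from the script
rse.mean <- 100 * (s / sqrt(n)) / mu       # % relative standard error of the sample mean
rse.var  <- 100 * sqrt(2 / (n - 1))        # % relative standard error of the sample variance
                                           # (holds for normal data only)
round(c(mean = rse.mean, variance = rse.var), 1)
```

With these values the sample variance is, in relative terms, roughly seven times less precise than the sample mean.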
set.seed(123456)
# Population
N <- 1e5
mu <- 100
s <- 20
# Sampling
n <- 20 # Sample size
samples <- 30 # Number of samples
# Simulate population
x <- rnorm(N, mean = mu, sd = s)
summary(x) # Show what we have
mean0 <- mean(x)
sd0 <- sd(x)
# Draw samples
xs <- data.frame(sample = rep(1:samples, each = n),
xs = sample(x, samples*n))
# Sample estimates
mean.spl <- aggregate(xs[, 2], list(sample = xs$sample), mean)
var.spl <- aggregate(xs[, 2], list(sample = xs$sample), var)
sd.spl <- aggregate(xs[, 2], list(sample = xs$sample), sd)
spl <- data.frame(sample = 1:samples, mean = mean.spl$x, var = var.spl$x)
# Pool estimates
mean.pld <- mean(mean.spl$x)
var.pld <- sum((n-1)*var.spl$x)/(n*samples-samples)
sd.pld <- sqrt(var.pld)
idx.m <- order(mean.spl$x) # index of sorted sample means
idx.v <- order(var.spl$x) # index of sorted sample variances
# 4 plots with recording
col <- rainbow(samples, start = 0.7, end = 0.1)
h.pop <- hist(x, breaks = "FD", plot = FALSE)
xlim <- range(h.pop$mids)
ylim <- range(h.pop$density)*1.5
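if (.Platform$OS.type == "windows") {
  windows(record = TRUE) # plot history is available on Windows only
} else {
  dev.new() # other platforms: new device without recording
}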
op <- par(no.readonly = TRUE)
par(mar = c(4, 4, 2, 0) + 0.1, ask = TRUE)
# 1: Population
plot(h.pop, freq = FALSE, xlim = xlim, ylim = ylim, col = "bisque",
main = "Population", cex.main = 1, font.main = 1,
xlab = paste("N = ", N), cex.lab = 0.9, las = 1, border = FALSE)
abline(v = mean0, lty = 2, col = "blue"); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, col = "blue", lwd = 2, add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
lty = 3, col = "blue")
par(family = "mono")
legend("topright", bty = "n", col = "blue", lwd = 2,
legend = sprintf("%5.1f | %5.1f (%3.1f%%)", mean0, sd0^2, 100*sd0/mean0),
cex = 0.85, title = "mean | var (CV)")
# 2: Samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
main = "Samples drawn from population (sorted by mean)", cex.main = 1,
font.main = 1, xlab = paste0(samples, " samples (each n = ", n, ")"),
ylab = "Density", cex.lab = 0.9, las = 1, frame.plot = TRUE)
par(family = "mono")
legend("topleft", box.lty = 0, lwd = 1, bg = "white", col = col[idx.m], cex = 0.75,
legend = sprintf("%2i", mean.spl$sample[idx.m]), title = "sample")
legend("topright", box.lty = 0, lwd = NA, bg = "white", col = col[idx.m], cex = 0.75,
legend = sprintf("%5.1f | %5.1f (%3.1f%%)",
mean.spl$x[idx.m], var.spl$x[idx.m],
100*sqrt(var.spl$x[idx.m])/mean.spl$x[idx.m]),
title = "mean | var (CV)"); box()
for (j in seq_along(idx.m)) {
  # normal curve of sample j based on its estimated mean and SD
  curve(dnorm(x, mean = mean.spl[j, 2], sd = sd.spl[j, 2]), n = 501,
        col = col[j], add = TRUE)
  # dotted horizontal line spanning mean ± 1 SD
  lines(mean.spl[j, 2]+c(-1, +1)*sd.spl[j, 2],
        rep(dnorm(x = mean.spl[j, 2]-sd.spl[j, 2],
                  mean = mean.spl[j, 2], sd = sd.spl[j, 2]), 2),
        lty = 3, col = col[j])
}
# 3: Population and pooled samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
main = "Population and pooled samples", cex.main = 1, font.main = 1,
xlab = paste0("Estimated from ", samples, " pooled samples (each n = ",
n, ")"), ylab = "Density", cex.lab = 0.9, las = 1)
abline(v = mean0, lty = 2, col = "blue")
abline(v = mean.pld, lty = 2, col = col[floor(samples/2)]); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, lwd = 2,
col = "blue", add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
lty = 3, col = "blue")
curve(dnorm(x, mean = mean.pld, sd = sqrt(var.pld)), n = 501, lwd = 2,
col = col[floor(samples/2)], add = TRUE)
lines(mean.pld+c(-1, +1)*sqrt(var.pld),
rep(dnorm(x = mean.pld-sqrt(var.pld), mean = mean.pld, sd = sqrt(var.pld)), 2),
lty = 3, col = col[floor(samples/2)])
par(family = "mono")
legend("topright", box.lty = 0, lwd = 2, col = c("blue", col[floor(samples/2)]),
legend = c(sprintf("Population : %5.1f | %5.1f (%3.1f%%)",
mean0, sd0^2, 100*sd0/mean0),
sprintf("Pooled samples: %5.1f | %5.1f (%3.1f%%)",
mean.pld, var.pld, 100*sqrt(var.pld)/mean.pld)),
cex = 0.85, title = "mean | var (CV)", bg = "white"); box()
# 4: Is there a relationship between samples' means and variances?
par(family = "sans")
plot(spl$mean, spl$var, type = "n", las = 1,
main = paste0("Sample estimates (correlation = ",
signif(cor(spl$mean, spl$var), 5), ")"),
cex.main = 1, font.main = 1, xlab = "mean", ylab = "variance")
abline(h = var.pld, lty = 3, col = "lightgrey")
abline(v = mean.pld, lty = 3, col = "lightgrey")
abline(lsfit(spl$mean, spl$var), lty = 2, col = "darkgrey"); box()
points(mean.spl$x[idx.m], var.spl$x[idx.m], pch = 16, col = col[idx.m], cex = 1.35)
for (j in seq_len(nrow(spl))) {
  # label below the point (pos = 1) if variance >= pooled variance, else above (pos = 3)
  loc <- if (spl$var[j] >= var.pld) 1 else 3
  text(spl$mean[j], spl$var[j], labels = spl$sample[j], cex = 0.6, pos = loc)
}
par(op)
# Sample estimates ordered by mean
print(spl[idx.m, ], row.names = FALSE)
# Sample estimates ordered by variance
print(spl[idx.v, ], row.names = FALSE)
# Ratio of extreme sample means
spl[idx.m, "mean"][samples]/spl[idx.m, "mean"][1]
# Ratio of extreme sample variances
spl[idx.v, "var"][samples]/spl[idx.v, "var"][1]
# %RE of sample means and variances
summary(100*(mean.spl$x-mu)/mu); summary(100*(sd.spl$x^2-s^2)/s^2)
# Correlation of sample means and variances
cor(spl$mean, spl$var)
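With only 30 samples the correlation reported above is noisy. The following sketch repeats the idea with many more samples and, for contrast, with a skewed distribution. It is not part of the original script: the number of samples (nsim), the lognormal parameters, and the helper cor.mean.var() are assumptions chosen for illustration. For normal data the sample mean and sample variance are independent, so their correlation should be close to zero; for skewed data such as the lognormal they are not independent.

```r
set.seed(123456)                           # same seed as the script, chosen for convenience
n    <- 20                                 # sample size as in the script
nsim <- 1e4                                # number of samples (assumption; the script uses 30)
# Hypothetical helper: correlation of sample means and sample variances
cor.mean.var <- function(rfun) {
  x <- matrix(rfun(n * nsim), nrow = n)    # one sample per column
  cor(colMeans(x), apply(x, 2, var))
}
r.norm  <- cor.mean.var(function(k) rnorm(k, mean = 100, sd = 20))
r.lnorm <- cor.mean.var(function(k) rlnorm(k, meanlog = 0, sdlog = 1))
round(c(normal = r.norm, lognormal = r.lnorm), 3)
```

A correlation near zero does not prove independence in general, but a clearly positive one, as expected for the lognormal, is enough to rule it out.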
Dif-tor heh smusma 🖖
Helmut Schütz
Complete thread:
- Statistical independence, what is it? I mean really, what is it?? ElMaestro 2020-06-27 21:35 [General Statistics]
- Die don’t remember the last roll. Really. Helmut 2020-06-28 13:35
- Die don’t remember the last roll. Really. ElMaestro 2020-06-28 14:45
- Die don’t remember the last roll. Really. Helmut 2020-06-28 15:36
- Still none the wiser ElMaestro 2020-06-28 18:20
- You’ve lost me now. Helmut 2020-06-28 21:55
- Worded differently ElMaestro 2020-06-29 08:30
- Still not sure what you are aiming at… Helmut 2020-06-29 16:46
- Still not sure what you are aiming at… ElMaestro 2020-06-30 00:55
- Confuse-a-Cat Helmut 2020-06-30 11:33
- Confuse-a-Cat ElMaestro 2020-06-30 13:07
- Confuse-a-Cat Helmut 2020-06-30 14:27
- pseudorandom and linear independence mittyri 2020-07-01 00:04
- Statistical independence, what is it? I mean really, what is it?? martin 2020-07-01 08:40
- Statistical independence, what is it? I mean really, what is it?? ElMaestro 2020-07-01 09:42
- Statistical independence, what is it? I mean really, what is it?? martin 2020-07-01 10:07