Normal distribution assessment [General Statistics]
❝ Let's say I have a randomly generated set of 1 million values.
Before we apply a statistical method, we have to understand the data-generating process.
Therefore: how did you generate your data set? Obtained from RANDOM.ORG? With a hardware random number generator? With software? If the latter, which PRNG? Most software (even Excel since 2010) implements the Mersenne Twister, which, with its period of 2^19937 − 1 ≈ 4.3×10^6001, is fine for generating large data sets. However, VBA still implements a linear congruential generator (LCG), which is bad for large data sets due to its much shorter period.
❝ What criterion should be used to check if the set has a normal distribution?
Look at the histogram first.
set.seed(123456) # for reproducibility
x <- rnorm(1e6, mean = 0, sd = 1) # or your data instead
lim <- c(-1, 1) * max(abs(x)) # symmetric axis limits for the plots
hist(x, breaks = "FD", freq = FALSE, xlim = lim, col = "bisque", border = NA, las = 1)
rug(x, side = 1, ticksize = 0.02)
legend("topright", x.intersp = 0,
       legend = c(paste("mean(x) =", signif(mean(x), 6)),
                  paste("sd(x) =", signif(sd(x), 6))))
If in doubt, overlay it with a kernel density estimate.
lines(density(x, n = 2^10), lwd = 3, col = "#FF000080")
curve(dnorm(t, mean = mean(x), sd = sd(x)), lim[1], lim[2], n = 2^10, xname = "t",
      lwd = 2, col = "#0000FF80", add = TRUE) # normal PDF with the sample mean and SD
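By default, `density()` uses a Gaussian kernel whose bandwidth comes from `bw.nrd0()`, Silverman's rule of thumb h = 0.9 · min(SD, IQR/1.34) · n^(−1/5). A minimal sketch that reproduces this bandwidth by hand and measures how far the estimate strays from the N(0, 1) density. It assumes simulated standard normal data and uses the hypothetical name `z` with a smaller n = 1e4, so it neither overwrites `x` nor takes long:

```r
set.seed(123456)
z <- rnorm(1e4)                          # simulated N(0, 1) data
n <- length(z)
# Silverman's rule of thumb, as documented for bw.nrd0()
h <- 0.9 * min(sd(z), IQR(z) / 1.34) * n^(-1/5)
all.equal(h, bw.nrd0(z))                 # TRUE
d <- density(z, n = 2^10)                # defaults: bw = "nrd0", kernel = "gaussian"
all.equal(d$bw, h)                       # TRUE
# Largest gap between the KDE and the N(0, 1) density on the KDE's grid
max(abs(d$y - dnorm(d$x)))
```

A larger bandwidth smooths more; `density(x, adjust = 2)` doubles the default if the curve looks too wiggly.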
Then inspect a normal Q-Q plot.
plot(lim, lim, type = "n", xlab = "Theoretical Quantiles",
     ylab = "Sample Quantiles", main = "Normal Q-Q Plot", las = 1)
grid()
qq <- qqnorm(x, plot.it = FALSE)
points(qq$x, qq$y, pch = 21, cex = 1.25, col = "#87CEFA80", bg = "#87CEFA80") # patience...
qqline(x)
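How `qqnorm()` obtains its theoretical quantiles: it evaluates `qnorm()` at the plotting positions `ppoints(n)`, i.e. (i − a)/(n + 1 − 2a) with a = 3/8 for n ≤ 10 and a = 1/2 otherwise (so (i − 0.5)/n here), and pairs them with the sample values. A minimal sketch checking this on a smaller simulated sample (hypothetical name `z`, n = 1000); the correlation of the Q-Q points at the end is an illustrative single-number summary of straightness, not part of the recipe above:

```r
set.seed(123456)
z  <- rnorm(1000)                        # simulated N(0, 1) data
qq <- qqnorm(z, plot.it = FALSE)
# Theoretical quantiles: standard normal quantiles at the plotting positions
theo <- qnorm(ppoints(length(z)))
all.equal(sort(qq$x), theo)              # TRUE
all.equal(qq$y, z)                       # TRUE: sample values in original order
# Correlation of the sorted sample with the theoretical quantiles;
# close to 1 when the points lie on a straight line
cor(sort(z), theo)
```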
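A formal test (Kolmogorov–Smirnov, here against the standard normal distribution):
ks.test(x, "pnorm", alternative = "two.sided") # H0: x ~ N(0, 1)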
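The statistic D reported by `ks.test()` is the largest vertical distance between the empirical CDF of the sample and the hypothesized CDF, here `pnorm` with mean 0 and SD 1. Because the empirical CDF jumps at each order statistic, both sides of every step have to be checked. A minimal sketch computing D by hand on a smaller simulated sample (hypothetical names `z` and `Fz`):

```r
set.seed(123456)
z  <- sort(rnorm(1000))                  # simulated N(0, 1) data, sorted
n  <- length(z)
Fz <- pnorm(z)                           # hypothesized CDF at the order statistics
# Gap just after each jump (i/n - F) and just before it (F - (i - 1)/n)
D <- max(pmax((1:n) / n - Fz, Fz - (0:(n - 1)) / n))
all.equal(D, unname(ks.test(z, "pnorm")$statistic))  # TRUE
```

Keep in mind that this tests against a fully specified N(0, 1). If the mean and SD are estimated from the same data, the p-value of `ks.test()` is no longer valid (it is too conservative); the Lilliefors variant, e.g. `lillie.test()` in package nortest, accounts for that.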
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮
Complete thread:
- Normal distribution assessment BE-proff 2022-02-20 19:07 [General Statistics]
- Normal distribution assessment Helmut 2022-02-20 22:59
- Normal distribution assessment BE-proff 2022-02-21 14:42
- Testing ‘randomness’ Helmut 2022-02-22 15:11
- Normal distribution assessment Obinoscopy 2022-02-27 18:49