BE-proff
2022-02-20 20:07
Posting: # 22798

 Normal distribution assessment [General Statistics]

Hi All,

One very general question that may be a little bit off topic :-D
Let's say I have a randomly generated set of 1 million values. :surprised:

What criterion should be used to check whether the set has a normal distribution? :confused:
Helmut
Vienna, Austria,
2022-02-20 23:59
@ BE-proff
Posting: # 22799

 Normal distribution assessment

Hi BE-proff,

❝ Let's say I have a randomly generated set of 1 million values. :surprised:


Before we apply a statistical method, we have to understand the data generating process.
Therefore: How did you generate your data set? Obtained from RANDOM.ORG? With a hardware random number generator? With software? If software, which PRNG? Most software (even Excel since 2010) implements the Mersenne Twister. With its period of ≈4.3×10^6001 it is fine for generating large data sets. However, VBA still implements an LCG, which is bad for large data sets due to its shorter period.
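If the data come from R, the generator in use can be queried directly. The following is only a small sketch: it prints the active generator with RNGkind() and reproduces the quoted period of the Mersenne Twister (2^19937 - 1) on the log10 scale, because the number itself is far too large for a double. The variable names are chosen for illustration only.

```r
# Which generator is active? The first element is the uniform PRNG,
# the second the method used to produce normal variates.
print(RNGkind())

# Period of the Mersenne Twister: 2^19937 - 1. Far too large for a double,
# so work on the log10 scale (the "- 1" does not matter at this magnitude).
log10.period <- 19937 * log10(2)
expo         <- as.integer(floor(log10.period))   # power of ten
mant         <- 10^(log10.period - expo)          # leading digits
cat(sprintf("period = %.1f x 10^%d\n", mant, expo))
```

In a fresh R session the first element is normally "Mersenne-Twister", unless the default was changed by the user or by a startup script.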

❝ What criterion should be used to check whether the set has a normal distribution? :confused:


Look at the histogram first. ;-)

set.seed(123456)                                  # for reproducibility
x   <- rnorm(1e6, mean = 0, sd = 1)               # or your data instead
lim <- c(-max(abs(range(x))), max(abs(range(x)))) # for the plots
hist(x, breaks = "FD", freq = FALSE, xlim = lim, col = "bisque", border = NA, las = 1)
rug(x, side = 1, ticksize = 0.02)
legend("topright", x.intersp = 0,
       legend = c(paste("mean(x) =", signif(mean(x), 6)),
                  paste("sd(x) =", signif(sd(x), 6))))

Does it look normal? Happy with the mean (should be ≈0) and the standard deviation (should be ≈1)?
If in doubt, overlay it with a kernel density estimate.

lines(density(x, n = 2^10), lwd = 3, col = "#FF000080")

Does it match? Not sure? Overlay the normal distribution.

curve(dnorm, lim[1], lim[2], n = 2^10, lwd = 2, col = "#0000FF80", add = TRUE)

Still in doubt?

plot(lim, lim, type = "n", xlab = "Theoretical Quantiles",
     ylab = "Sample Quantiles", main = "Normal Q-Q Plot", las = 1)
grid()
qq <- qqnorm(x, plot.it = FALSE)
points(qq$x, qq$y, pch = 21, cex = 1.25, col = "#87CEFA80", bg = "#87CEFA80") # patience...
qqline(x)
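Drawing one million points takes a while. If patience runs out, one possible shortcut (a sketch, not part of the original code) is to compare only a fixed number of quantiles. The choice of 1,000 probabilities below is arbitrary, and the thinning hides what happens in the extreme tails, so it complements the full plot rather than replacing it.

```r
set.seed(123456)                                  # for reproducibility
x    <- rnorm(1e6, mean = 0, sd = 1)              # or your data instead
p    <- ppoints(1000)                             # 1,000 evenly spaced probabilities
theo <- qnorm(p)                                  # theoretical quantiles
samp <- quantile(x, probs = p, names = FALSE)     # matching sample quantiles
plot(theo, samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Normal Q-Q Plot (1,000 quantiles)", las = 1,
     pch = 21, col = "#87CEFA80", bg = "#87CEFA80")
grid()
abline(a = 0, b = 1)                              # identity line for the standard normal
```

For a standard normal sample the points should lie close to the identity line.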

If you insist on a test comparing the data’s empirical cumulative distribution function to the cumulative distribution function of the standard normal:

ks.test(x, "pnorm", alternative = "two.sided")
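To get a feeling for the output, it may help to run the same test on a sample that is clearly not normal. The sketch below is an illustration under assumed settings (smaller samples of 10,000 values, a uniform sample on [-3, 3] as the counterexample). Note that "pnorm" without further arguments refers to the standard normal, so a normal sample with another mean or standard deviation would be rejected as well.

```r
set.seed(123456)                              # for reproducibility
x.norm  <- rnorm(1e4)                         # standard normal
x.unif  <- runif(1e4, min = -3, max = 3)      # similar range but not normal
ks.norm <- ks.test(x.norm, "pnorm", alternative = "two.sided")
ks.unif <- ks.test(x.unif, "pnorm", alternative = "two.sided")
# D is the largest vertical distance between the empirical and theoretical CDF
out     <- data.frame(sample = c("normal", "uniform"),
                      D = unname(c(ks.norm$statistic, ks.unif$statistic)),
                      p = c(ks.norm$p.value, ks.unif$p.value))
print(out, row.names = FALSE)
```

The uniform sample gives a much larger D and a p-value close to zero.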


Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
BE-proff
2022-02-21 15:42
@ Helmut
Posting: # 22800

 Normal distribution assessment

Hi Helmut,

Many thanks for the clarification :clap:
Helmut
Vienna, Austria,
2022-02-22 16:11
@ BE-proff
Posting: # 22801

 Testing ‘randomness’

Hi BE-proff,

❝ Many thanks for the clarification :clap:


Welcome.

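# Repeat the KS-test on 25 fresh standard normal samples. Under the null
# hypothesis p-values are uniformly distributed, so about 5% of the runs
# are expected to be 'significant' by chance alone.
runs <- 25L
n    <- 1e6
res  <- data.frame(run = 1:runs, p = NA_real_, sign = "",
                   stringsAsFactors = FALSE)      # needed in R < 4.0.0
for (j in 1:nrow(res)) {
  if (j == 1) set.seed(123456) else set.seed(j)   # first run: seed of the earlier post
  res$p[j] <- ks.test(rnorm(n = n), "pnorm", alternative = "two.sided")$p.value
  if (res$p[j] < 0.05) res$sign[j] <- "*"         # flag p < 0.05
}
print(res, row.names = FALSE)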
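With only 25 runs the number of flagged results varies a lot. A larger number of runs shows the long-run behaviour more clearly. The sketch below assumes 200 runs of 10,000 values each (both picked only to keep the run time short) and compares the observed count with the range expected from a binomial distribution with probability 0.05.

```r
runs <- 200L
n    <- 1e4                                       # smaller than above, only for speed
set.seed(123456)                                  # for reproducibility
p    <- replicate(runs, ks.test(rnorm(n), "pnorm")$p.value)
sig  <- sum(p < 0.05)                             # runs flagged at the 5% level
cat("significant:", sig, "of", runs, sprintf("(%.1f%%)\n", 100 * sig / runs))
# Range of counts expected by chance alone (central 95% of the binomial)
print(qbinom(c(0.025, 0.975), size = runs, prob = 0.05))
```

A count inside the printed range is what a well-behaved generator should deliver. A count far above it would be a reason to look at the generator or at the data more closely.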


See also there for further information.

Obinoscopy
USA,
2022-02-27 19:49
@ BE-proff
Posting: # 22811

 Normal distribution assessment

❝ Let's say I have a randomly generated set of 1 million values. :surprised:


❝ What criterion should be used to check whether the set has a normal distribution? :confused:


I don't think a randomly generated set of numbers would have a normal distribution. Every number should have an equal chance of being selected, so I would say it should look more like a uniform distribution. But that is just my thinking...

Scopy

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz