ElMaestro
★★★

Denmark,
2020-06-27 23:35
(1370 d 00:49 ago)

Posting: # 21582
Views: 5,367
 

 Statistical independence, what is it? I mean really, what is it?? [General Statistics]

Hi all,

what is statistical independence in layman's terms?? I thought I had a basic understanding of it, but now I am not so sure.

For example: Wikipedia mentions on the page for the t-distribution that t = Z/s where Z and s are independent, Z = (sample mean − population mean), and s is the sample SD divided by the square root of n.

Given that any perturbation on the data that changes Z will (may) also change s, I am somewhat conflicted about my own idea of independence.

I am not so much looking for a textbook explaining what it is, or a link to some web page or other. I have probably read the first 50 links on Google and got nowhere. Rather, I am humbly looking for an explanation that appeals to my brain cells, the number of which is very, very limited.
Muchas gracias.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-28 15:35
(1369 d 08:48 ago)

@ ElMaestro
Posting: # 21583
Views: 4,628
 

 Die don’t remember the last roll. Really.

Hi ElMaestro,

❝ what is statistical independence in layman's terms??


See the subject line.

❝ For example: [etc. etc.]



Fire up R and execute the script at the end.

There are samples with identical means but different variances and others with (almost) identical variances but different means. However, acc. to the central limit theorem (CLT) the pooled estimates will sooner or later approach the population’s parameters.

❝ […] any perturbation on the data that changes Z will (may) also change s […]



Not sure what you mean by “perturbation on the data”. Sample is a sample is a sample.
Or are you thinking about something happening after sampling (e.g., transcription errors, manipulations, …)? If you have a single sample and the population’s parameters are – as usual – unknown, you enter the slippery ground of outliers – which is a mixture of assessing (deviations from) assumptions, knowledge of the data generating process (incl. checking for implausible values), :blahblah:.

❝ Muchas gracias.


De nada.


PS: Comparing variances is a daring feat. In the example the ratio of largest/smallest sample mean is 1.18, whereas variances varied [pun] almost fourfold. This explains why outlier tests are not powerful and should be taken with a grain of salt.

set.seed(123456)
# Population
N        <- 1e5
mu       <- 100
s        <- 20
# Sampling
n        <- 20 # Sample size
samples  <- 30 # Number of samples
# Simulate population
x        <- rnorm(N, mean = mu, sd = s)
summary(x) # Show what we have
mean0    <- mean(x)
sd0      <- sd(x)
# Draw samples
xs       <- data.frame(sample = rep(1:samples, each = n),
                       xs = sample(x, samples*n))
# Sample estimates
mean.spl <- aggregate(xs[, 2], list(sample = xs$sample), mean)
var.spl  <- aggregate(xs[, 2], list(sample = xs$sample), var)
sd.spl   <- aggregate(xs[, 2], list(sample = xs$sample), sd)
spl      <- data.frame(sample = 1:samples, mean = mean.spl$x, var = var.spl$x)
# Pool estimates
mean.pld <- mean(mean.spl$x)
var.pld  <- sum((n-1)*var.spl$x)/(n*samples-samples)
sd.pld   <- sqrt(var.pld)
idx.m    <- order(mean.spl$x) # index of sorted sample means
idx.v    <- order(var.spl$x)  # index of sorted sample variances
# 3 plots with recording
col      <- rainbow(samples, start = 0.7, end = 0.1)
h.pop    <- hist(x, breaks = "FD", plot = FALSE)
xlim     <- range(h.pop$mids)
ylim     <- range(h.pop$density)*1.5
windows(record = TRUE) # Windows-only graphics device with plot history recording
op       <- par(no.readonly = TRUE)
par(mar = c(4, 4, 2, 0) + 0.1, ask = TRUE)
# 1: Population
plot(h.pop, freq = FALSE, xlim = xlim, ylim = ylim, col = "bisque",
     main = "Population", cex.main = 1, font.main = 1,
     xlab = paste("N  = ", N), cex.lab = 0.9, las = 1, border = FALSE)
abline(v = mean0, lty = 2, col = "blue"); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, col = "blue", lwd = 2, add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
      lty = 3, col = "blue")
par(family = "mono")
legend("topright", bty = "n", col = "blue", lwd = 2,
       legend = sprintf("%5.1f | %5.1f (%3.1f%%)", mean0, sd0^2, 100*sd0/mean0),
       cex = 0.85, title = "mean | var (CV)")
# 2: Samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
     main = "Samples drawn from population (sorted by mean)", cex.main = 1,
     font.main = 1, xlab = paste0(samples, " samples (each n = ", n, ")"),
     ylab = "Density", cex.lab = 0.9, las = 1, frame.plot = TRUE)
par(family = "mono")
legend("topleft", box.lty = 0, lwd = 1, bg = "white", col = col[idx.m], cex = 0.75,
       legend = sprintf("%2i", mean.spl$sample[idx.m]), title = "sample")
legend("topright", box.lty = 0, lwd = NA, bg = "white", col = col[idx.m], cex = 0.75,
       legend = sprintf("%5.1f | %5.1f (%3.1f%%)",
                        mean.spl$x[idx.m], var.spl$x[idx.m],
                        100*sqrt(var.spl$x[idx.m])/mean.spl$x[idx.m]),
       title = "mean | var (CV)"); box()
for (j in seq_along(idx.m)) {
  curve(dnorm(x, mean = mean.spl[j, 2], sd = sd.spl[j, 2]), n = 501,
        col = col[j], add = TRUE)
  lines(mean.spl[j, 2]+c(-1, +1)*sd.spl[j, 2],
        rep(dnorm(x = mean.spl[j, 2]-sd.spl[j, 2],
                  mean = mean.spl[j, 2], sd = sd.spl[j, 2]), 2),
        lty = 3, col = col[j])
}
# 3: Population and pooled samples
par(family = "sans")
plot(xlim, ylim, type = "n", xlim = xlim, ylim = ylim,
     main = "Population and pooled samples", cex.main = 1, font.main = 1,
     xlab = paste0("Estimated from ", samples, " pooled samples (each n  = ",
                   n, ")"), ylab = "Density", cex.lab = 0.9, las = 1)
abline(v = mean0, lty = 2, col = "blue")
abline(v = mean.pld, lty = 2, col = col[floor(samples/2)]); box()
curve(dnorm(x, mean = mean0, sd = sd0), n = 501, lwd = 2,
      col = "blue", add = TRUE)
lines(mean0+c(-1, +1)*sd0, rep(dnorm(x = mean0-sd0, mean = mean0, sd = sd0), 2),
      lty = 3, col = "blue")
curve(dnorm(x, mean = mean.pld, sd = sqrt(var.pld)), n = 501, lwd = 2,
      col = col[floor(samples/2)], add = TRUE)
lines(mean.pld+c(-1, +1)*sqrt(var.pld),
      rep(dnorm(x = mean.pld-sqrt(var.pld), mean = mean.pld, sd = sqrt(var.pld)), 2),
      lty = 3, col = col[floor(samples/2)])
par(family = "mono")
legend("topright", box.lty = 0, lwd = 2, col = c("blue", col[floor(samples/2)]),
       legend = c(sprintf("Population    : %5.1f | %5.1f (%3.1f%%)",
                          mean0, sd0^2, 100*sd0/mean0),
                  sprintf("Pooled samples: %5.1f | %5.1f (%3.1f%%)",
                          mean.pld, var.pld, 100*sqrt(var.pld)/mean.pld)),
       cex = 0.85, title = "mean | var (CV)", bg = "white"); box()
# 4: Is there a relationship between samples' means and variances?
par(family = "sans")
plot(spl$mean, spl$var, type = "n", las = 1,
     main = paste0("Sample estimates (correlation = ",
                   signif(cor(spl$mean, spl$var), 5), ")"),
     cex.main = 1, font.main = 1, xlab = "mean", ylab = "variance")
abline(h = var.pld, lty = 3, col = "lightgrey")
abline(v = mean.pld, lty = 3, col = "lightgrey")
abline(lsfit(spl$mean, spl$var), lty = 2, col = "darkgrey"); box()
points(mean.spl$x[idx.m], var.spl$x[idx.m], pch = 16, col = col[idx.m], cex = 1.35)
for (j in 1:nrow(spl)) {
  loc <- ifelse(spl$var[j] >= var.pld, 1, 3)
  text(spl$mean[j], spl$var[j], labels = spl$sample[j], cex = 0.6, pos = loc)
}
par(op)
# Sample estimates ordered by mean
print(spl[idx.m, ], row.names = FALSE)
# Sample estimates ordered by variance
print(spl[idx.v, ], row.names = FALSE)
# Ratio of extreme sample means
spl[idx.m, "mean"][samples]/spl[idx.m, "mean"][1]
# Ratio of extreme sample variances
spl[idx.v, "var"][samples]/spl[idx.v, "var"][1]
# %RE of sample means and variances
summary(100*(mean.spl$x-mu)/mu); summary(100*(sd.spl$x^2-s^2)/s^2)
# Correlation of sample means and variances
cor(spl$mean, spl$var)


Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-06-28 16:45
(1369 d 07:38 ago)

@ Helmut
Posting: # 21584
Views: 4,535
 

 Die don’t remember the last roll. Really.

Hi Helmut,
and thanks for trying to help.


If it did remember the last roll and the outcome was somehow a function of it, wouldn't that be correlation?

And how does the die-and-last-roll mental image fit in with Z and s being independent? I think it fits better into the mental picture of errors in a model being IID.

Or more generally:
From a sample of size N I am deriving two quantities A and B. Under what circumstances will A and B be independent? A die with Alzheimer's does not really shed light on it, does it?

Kindly help me a little more. I do not see the idea of the R code but it executes nicely except errors for idx.m and idx.v.:-)
Code on this forum rarely provides clarity for me. This has nothing to do with the code, but has everything to do with me :sleeping:

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-28 17:36
(1369 d 06:47 ago)

@ ElMaestro
Posting: # 21585
Views: 4,528
 

 Die don’t remember the last roll. Really.

Hi ElMaestro,

❝ If it did remember the last roll and the outcome was somehow a function of it, wouldn't that be correlation?


Well, the die is Thick As A Brick. If a roll depends on the result of the last one, that could be (a) cheating by an expert gambler or (b) an imperfect die. In both cases you would have a correlation indeed and the concept collapses.
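
To see the “no memory” in action – just a quick sketch, the numbers are nothing but an illustration – compare the distribution of rolls following a six with the unconditional one:

set.seed(123456)
rolls  <- sample(1:6, 1e5, replace = TRUE)        # a long series of fair rolls
after6 <- which(rolls[-length(rolls)] == 6) + 1   # positions right after a six
round(table(rolls[after6]) / length(after6), 3)   # ~1/6 each: conditioning changes nothing
round(table(rolls) / length(rolls), 3)            # unconditional distribution for comparison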

❝ And how does the die-and-last-roll mental image fit in with Z and s being independent? I think it fits better into the mental picture of errors in a model being IID.


My example deals with \(\small{\mathcal{N}(\mu,\sigma^2)}\). If you want to dive into combinatorics/permutations (coin flipping, rolling dice) see the first example in this post.

❝ Or more generally:

❝ From a sample of size N I am deriving two quantities A and B. Under what circumstances will A and B be independent? A die with Alzheimer's does not really shed light on it, does it?


As usual: Know the data generating process. If you cannot be sure that the outcome of B is independent of A, you are in deep shit.

❝ Kindly help me a little more. I do not see the idea of the R code but it executes nicely …

  1. Simulate a \(\small{\mathcal{N}(\mu,\sigma^2)}\) population \(\small{x}\) with \(\small{\mu=100}\) and \(\small{\sigma=20}\).
    For \(\small{N=10^5}\) the trusted Mersenne-Twister gives us \(\small{\hat{\mu}=100.0655,}\) \(\small{\hat{\sigma}=20.0154}\).
    Draw a histogram of \(\small{x}\). Overlay the histogram with a normal curve.
  2. Draw 30 samples \(\small{x_{s1},x_{s2},\ldots,x_{s30}}\) of size 20 from \(\small{x}\).
    Estimate \(\small{\bar{x}, {s_{x}}^{2}}\) of each sample and plot the normal curves with \(\small{\bar{x}\mp s_x}\).
    Get pooled estimates, which acc. to the CLT should approximate the population’s parameters.
  3. Compare the outcome with the population.
  4. Correlation of samples’ means and variances. As expected, there is none.

❝ … except errors for idx.m and idx.v.:-)


Fuck. Try the edited one.

❝ Code on this forum rarely provides clarity for me. This has nothing to do with the code, but has everything to do with me :sleeping:


Sorry. I added more comments than usual. ;-)

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-06-28 20:20
(1369 d 04:03 ago)

@ Helmut
Posting: # 21586
Views: 4,477
 

 Still none the wiser

Hi Helmut,

thanks for trying. I have to admit, I am sorry but I am none the wiser.:-D
Let us go over it again:
Let x be our sample.
Let f(x|other shit) be a function of x, given some other shit.
Let g(x|other shit) be another function of x, given the same other shit.

In general (and ideally in layman's) terms, what properties of f and g would lead me to think f and g are independent?

I assume it is implied that we are talking about independence from each other, at least when we try and think of f and g as numerators and denominators of the quantity defining t above. When this is the wrong perception, kindly correct me, please.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-28 23:55
(1369 d 00:28 ago)

@ ElMaestro
Posting: # 21587
Views: 4,469
 

 You’ve lost me now.

Hi ElMaestro,

❝ Let us go over it again:

❝ Let x be our sample.

❝ Let f(x|other shit) be a function of x, given some other shit.

❝ Let g(x|other shit) be another function of x, given the same other shit.


Like this (or any other funny transformation)?

other.shit <- function(x, funny) {
  eval(funny) # lazy evaluation: 'funny' is evaluated in the caller's environment
}
x <- rnorm(100, mean = 100, sd = 20)
f <- other.shit(x, funny = x^2)
g <- other.shit(x, funny = sqrt(x))
cor(f, g)
plot(f, g, las = 1, xlab = "f(x)", ylab = "g(x)")


❝ In general (and ideally in layman's) terms, what properties of f and g would lead me to think f and g are independent?


❝ I assume it is implied that we are talking about independence from each other, at least when we try and think of f and g as numerators and denominators of the quantity defining t above. When this is the wrong perception, kindly correct me, please.


Completely confused. Can you try again, please?

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-06-29 10:30
(1368 d 13:54 ago)

@ Helmut
Posting: # 21591
Views: 4,431
 

 Worded differently

Hi Hötzi,

❝ Completely confused. Can you try again, please?


OK.
  1. Let us look at the wikipedia page for the t test:
    "Most test statistics have the form t = Z/s, where Z and s are functions of the data."
  2. For the t-distribution, here Z=sample mean - mean and s=sd/sqrt(n)
  3. Why are Z and s independent in this case? Or more generally, and for me much more importantly, if we have two functions (f and g, or Z and s), then which properties of such functions or their input would render them independent??
Wikipedia links to a page about independence, key here is: "Two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other (equivalently, does not affect the odds). Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other."

I am fully aware that when we simulate a normal dist. with some mean and some variance, then that defines their expected estimates in a sample. I.e. if a sample has a mean that is higher than the simulated mean, then that does not necessarily mean the sampled sd is higher (or lower, for that matter, that was where I was going with "perturbation"). It sounds right to think of the two as independent, in that case. Now, how about the general case, for example if we know nothing about the nature of the sample, but just look at any two functions of the sample? What property would we look for in those two functions to think they are independent?
A general understanding of the idea of independence of any two quantities derived from a sample, that is what I am looking for; point #3 above defines my question.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-29 18:46
(1368 d 05:38 ago)

@ ElMaestro
Posting: # 21603
Views: 4,457
 

 Still not sure what you are aiming at…

Hi ElMaestro,

❝ 1. Let us look at the wikipedia page for the t test:

❝ "Most test statistics have the form t = Z/s, where Z and s are functions of the data."


OK, so far.

❝ 2. For the t-distribution, here Z=sample mean - mean and s=sd/sqrt(n)


Wait a minute. You are referring to the one-sample t-test, right? At the Assumptions we find$$t=\frac{Z}{s}=\frac{\bar{X}-\mu}{\hat{\sigma}/\sqrt{n}}$$That’s a little bit strange because WP continues with

\(\hat{\sigma}\) is the estimate of the standard deviation of the population

I beg your pardon? Most of my textbooks give the same formula but with \(s\) in the denominator as the sample standard deviation. Of course, \(s/\sqrt{n}\) is the standard error and sometimes we find \(t=\frac{\bar{X}-\mu}{\textrm{SE}}\) instead. Nevertheless, further down we find$$t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$THX a lot, soothing!

❝ 3. Why are Z and s independent in this case?


Here we know the population mean. Hence, the numerator depends on the sample mean and the denominator on the sample’s standard error. They are independent indeed.
I added another plot to the code of this post.

A modified plot of 5,000 samples to the right.
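
If you want to reproduce the idea without the fancy stuff – a minimal sketch (any seed will do): draw many samples, estimate the mean and the standard error of each, and look at their correlation.

set.seed(123456)
mu   <- 100; sigma <- 20
n    <- 20    # sample size
nspl <- 5000  # number of samples
est  <- replicate(nspl, {
          x <- rnorm(n, mean = mu, sd = sigma)
          c(mean = mean(x), SE = sd(x) / sqrt(n))
        })
cor(est["mean", ], est["SE", ]) # practically zero
plot(est["mean", ], est["SE", ], pch = 16, cex = 0.4, las = 1,
     xlab = "sample mean", ylab = "standard error")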

❝ Or more generally, and for me much more importantly, if we have two functions (f and g, or Z and s), then which properties of such functions or their input would render them independent??

❝ Wikipedia links to a page about independence, key here is: […]



Yep.

❝ I am fully aware that when we simulate a normal dist. with some mean and some variance, then that defines their expected estimates in a sample. I.e. if a sample has a mean that is higher than the simulated mean, then that does not necessarily mean the sampled sd is higher (or lower, for that matter, that was where I was going with "perturbation"). It sounds right to think of the two as independent, in that case.


Correct. Anything is possible.

❝ Now, how about the general case, for example if we know nothing about the nature of the sample, but just look at any two functions of the sample? What property would we look for in those two functions to think they are independent?

❝ A general understanding of the idea of independence of any two quantities derived from a sample, that is what I am looking for; point #3 above defines my question.


Still not sure whether I understand you correctly at all. Think about the general formulation of a test statistic from above $$t=\frac{Z}{s},$$where \(Z\) and \(s\) are functions of the data.
I think that this formulation is unfortunate because it has nothing to do with either the standard normal distribution \(Z\) or the sample standard deviation \(s\). For continuous variables I would prefer sumfink like$$test\;statistic=\frac{measure\;of\;location}{measure\;of\;dispersion}$$for clarity. If a test were constructed in such a way that the independence is not correctly represented, it would be a piece of shit.
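
As a minimal sketch of that construction (one-sample case, made-up numbers): the numerator is the measure of location, the denominator the measure of dispersion, and t.test() gives the same value.

set.seed(123456)
x   <- rnorm(20, mean = 105, sd = 20)       # a sample
mu0 <- 100                                  # hypothesized population mean
(mean(x) - mu0) / (sd(x) / sqrt(length(x))) # location / dispersion
t.test(x, mu = mu0)$statistic               # same value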

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-06-30 02:55
(1367 d 21:29 ago)

@ Helmut
Posting: # 21611
Views: 4,376
 

 Still not sure what you are aiming at…

OK, I try again.

I give you two functions of a sample x, call the functions f and g (or Z and s, or alpha and beta, or apple and banana). Symbols not important. How do we determine that f and g are independent?

I completely see the point about mean and dispersion, no issue, the plot is a nice example of apparent "correlationlessness".
Simulation of that specific case aside, how about generally, when f and g are not necessarily mean and dispersion indicators of the x-sample from a normal distribution?

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-30 13:33
(1367 d 10:50 ago)

@ ElMaestro
Posting: # 21614
Views: 4,337
 

 Confuse-a-Cat

Hi ElMaestro,

❝ OK, I try again.


THX for your patience.

❝ I give you two functions of a sample x, call the functions f and g (or Z and s, or alpha and beta, or apple and banana). Symbols not important. How do we determine that f and g are independent?


Feel like a cat.

In the OP (and now?) you were talking about a test and whether the numerator and denominator constructing it are independent functions of \(x\).$$t=\frac{Z}{s},\; Z=f(x)\:\wedge\:s=g(x)$$Somehow I have the feeling that the discussion moves towards transformations. Another cup of tea.

x    <- seq(1, 2, length.out = 100)
fun  <- data.frame(f.1 = sin(x), f.2 = sin(x+1), f.3 = cos(x),
                   f.4 = x^2,    f.5 = sqrt(x),  f.6 = tan(x))
corr <- data.frame(f.1 = rep(NA, 6), f.2 = NA, f.3 = NA,
                   f.4 = NA, f.5 = NA, f.6 = NA)
colnames(corr) <- rownames(corr) <- c("sin(x)", "sin(x+1)", "cos(x)",
                                      "x^2", "sqrt(x)", "tan(x)")
for (j in 1:nrow(corr)) {
  for (k in 1:ncol(corr)) {
    if (k < j) {
      corr[k, j] <- sprintf("%+7.5f", cor(fun[, j], fun[, k]))
    }
  }
}
corr[is.na(corr)] <- ""
corr[-nrow(corr), ]


❝ […] how about generally, when f and g are not necessarily mean and dispersion indicators of the x-sample from a normal distribution?


Are you not happy with existing tests (questioning the independence) and are trying to develop a new one? How does the “perturbation on the data” come into play?
Slowly I get the feeling that I can’t follow your arguments and I’m not qualified to answer your question. Sorry.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-06-30 15:07
(1367 d 09:17 ago)

@ Helmut
Posting: # 21617
Views: 4,302
 

 Confuse-a-Cat

Hi Helmut,

I am sorry that I am once again not able to tell what I am looking for :-D. I am not talking about simulations and not about transformations either.

Two functions, generally, and the key word is really generally, when are they (or their results) to be considered independent?

We can think of f=mean(x) and g=median(x). I guess we can easily form a mental picture of plotting means versus medians, often seeing the strong relationship. Visually appealing. Independence?

OK then let us say f=range(x) and g=sine function of the range (x).
Or an F statistic with a variance in both f=numerator and g=denominator in an unbalanced ANOVA.
Or Cmax and AUCt (which I guess are correlated and dependent(?), but the example is not great in my perspective since the two functions are not applied to a random sample but to a time series).
There is no end to the possible examples.

And so forth. Without debating too much about the specific cases, how do we generally approach it to define two (outcomes of) functions as being independent? Which mathematical/algebraic/statistical/whatever properties of functions render them mutually independent? When I understand it, I think or hope I will understand the nature of independence.
For inspiration: Are estimates of any two statistical moments independent? If yes, why? Is it only the first and second? Why? Is it generally so? Why? Etc. I am looking for the general clarity.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2020-06-30 16:27
(1367 d 07:56 ago)

@ ElMaestro
Posting: # 21620
Views: 4,312
 

 Confuse-a-Cat

Hi ElMaestro,

❝ I am sorry that I am once again not able to tell what I am looking for :-D.


No, I’m sorry that I’m not able to comprehend your message. As you know, walnut-sized brain.

❝ I am not talking about simulations and not about transformations either.


OK.

❝ Two functions, generally, and the key word is really generally, when are they (or their results) to be considered independent?


I think:
  • The results of f(x) and g(x) should have a very (very!) low correlation (see the sketch after this list).
  • Both should convey different information about the properties of x.
  • What else?
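
A minimal sketch of the first point – keeping in mind that a near-zero correlation is necessary but not sufficient for independence (here x is standard normal, so x and x² are practically uncorrelated yet obviously dependent):

set.seed(123456)
x <- rnorm(1e4) # symmetric around zero
f <- x
g <- x^2        # g is completely determined by x ...
cor(f, g)       # ... and still the correlation is close to zero
plot(f, g, pch = 16, cex = 0.3, las = 1, xlab = "f(x) = x", ylab = "g(x) = x^2")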

❝ We can think of f=mean(x) and g=median(x). I guess we can easily form a mental picture of plotting means versus medians, often seeing the strong relationship. Visually appealing.


Go back to my script and

med.spl <- aggregate(xs[, 2], list(sample = xs$sample), median)
plot(mean.spl$x, med.spl$x)
cor(mean.spl$x, med.spl$x)


❝ Independence?


No. Both are measures of the location of the data. IMHO, that would not be suitable to construct a test as stated at the end of this post.

❝ OK then let us say f=range(x) and g=sine function of the range (x).


:rotfl:

❝ Or an F statistic with a variance in both f=numerator and g=denominator in an unbalanced ANOVA.


Oh dear!

❝ Or Cmax and AUCt (which I guess are correlated and dependent(?), …


Correlated, yes. Highly sometimes. In some cases not so much (recall that one). Why? Dunno. Though correlation ≠ causation.
Actually there are hidden (confounding) variables – the entire PK stuff – which drive the apparent correlation. So are they dependent even with a high correlation? I would say no. Both depend on the underlying PK.

❝ … but the example is not great in my perspective since the two functions are not applied to a random sample but to a time series).


Yep.

❝ There is no end to the possible examples.


Already the ones you mentioned gave me headaches.

❝ Without debating too much about the specific cases, how do we generally approach it to define two (outcomes of) functions as being independent?


See above. Maybe I’m completely wrong.

❝ Which mathematical/algebraic/statistical/whatever properties of functions render them mutually independent? When I understand it, I think or hope I will understand the nature of independence.

❝ For inspiration: Are estimates of any two statistical moments independent? If yes, why? Is it only the first and second? Why? Is it generally so? Why? Etc. I am looking for the general clarity.


Sorry, again.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
mittyri
★★  

Russia,
2020-07-01 02:04
(1366 d 22:20 ago)

@ ElMaestro
Posting: # 21624
Views: 4,242
 

 pseudorandom and linear independence

Hi ElMaestro,

I am not sure I can help you with that theoretical request, but I would point out that mathematics talks about linearly independent functions.
This is closely related to linear independence of vectors and the dimension of a vector space; since the term is tied to linear combinations and derivatives, 'linear' is essential here.
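
A small sketch of what linear independence of functions means in practice (functions picked arbitrarily): evaluated on a grid, sin(x), cos(x) and x² give a matrix of full column rank, whereas adding 2·sin(x) adds nothing new.

x <- seq(0, 2 * pi, length.out = 100)
A <- cbind(sin(x), cos(x), x^2) # three linearly independent functions
qr(A)$rank                      # 3: full column rank
B <- cbind(A, 2 * sin(x))       # a linear combination of an existing column
qr(B)$rank                      # still 3: the added column is linearly dependent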

But you may also like to refer to the definition of pseudorandomness, where you need to predefine the set of distinguishers before starting to solve the problem.

Kind regards,
Mittyri
martin
★★  

Austria,
2020-07-01 10:40
(1366 d 13:44 ago)

@ ElMaestro
Posting: # 21626
Views: 4,261
 

 Statistical independence, what is it? I mean really, what is it??

Dear ElMaestro,

I know this might be confusing; you may find the corresponding mathematical proof of interest.


Here is another follow-up on the definition of statistical independence – it's a concept from probability theory. A very nice summary can be found here:

Two events A and B are statistically independent if and only if their joint probability can be factorized into their marginal probabilities, i.e., P(A ∩ B) = P(A)P(B). If two events A and B are statistically independent, then the conditional probability equals the marginal probability: P(A|B) = P(A) and P(B|A) = P(B).
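
A worked example (fair die, just for illustration): with A = "the result is even" and B = "the result is at least 5" we get P(A) = 1/2, P(B) = 1/3 and P(A ∩ B) = P({6}) = 1/6 = P(A)·P(B), so A and B are independent. A quick check in R:

omega <- 1:6         # sample space of one roll of a fair die
A <- omega %% 2 == 0 # event "even"
B <- omega >= 5      # event "at least 5"
mean(A)              # P(A)   = 1/2
mean(B)              # P(B)   = 1/3
mean(A & B)          # P(A∩B) = 1/6 = P(A) * P(B)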

Now applying this concept to random variables:

Two random variables X and Y are independent if and only if the events {X ≤ x} and {Y ≤ y} are independent for all x and y, that is, F(x, y) = F_X(x)·F_Y(y), where F(x, y) is the joint cumulative distribution function and F_X and F_Y are the marginal cumulative distribution functions of X and Y, respectively.
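
The same can be checked empirically for random variables – a minimal sketch with X and Y generated independently, so the joint CDF factorizes into the marginals (up to simulation noise):

set.seed(123456)
X <- rnorm(1e5)             # independent by construction
Y <- runif(1e5)
x <- 0.5; y <- 0.3          # arbitrary points
mean(X <= x & Y <= y)       # empirical F(x, y)
mean(X <= x) * mean(Y <= y) # F_X(x) * F_Y(y): practically identical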


best regards & hope this helps

Martin


Edit: Merged with a later (now deleted) post. You can edit your OP for 24 h. [Helmut]
ElMaestro
★★★

Denmark,
2020-07-01 11:42
(1366 d 12:42 ago)

@ martin
Posting: # 21628
Views: 4,210
 

 Statistical independence, what is it? I mean really, what is it??

Thanks Martin,

❝ Two random variables X and Y are independent if and only if the events {X ≤ x} and {Y ≤ y} are independent for all x and y, that is, F(x, y) = F_X(x)·F_Y(y), where F(x, y) is the joint cumulative distribution function and F_X and F_Y are the marginal cumulative distribution functions of X and Y, respectively.


thanks for the posts.
I think now we are in the right direction, not confounding independence with correlation.
Given a sample x1,x2....xn, from which we estimate mean and variance, would we under the quote above consider the estimated mean and the estimated variance "random variables" in their own right, or is this immaterial to the issue at hand?

Pass or fail!
ElMaestro
martin
★★  

Austria,
2020-07-01 12:07
(1366 d 12:16 ago)

@ ElMaestro
Posting: # 21629
Views: 4,184
 

 Statistical independence, what is it? I mean really, what is it??

Dear ElMaestro,

If the sample on which the mean and standard deviation are calculated is smaller than infinity :-) then the mean and standard deviation are also random variables and follow a specific distribution (i.e. the sampling distribution).

To be precise: The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n.
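
A minimal sketch of such a sampling distribution (numbers chosen arbitrarily): the mean of a sample of n = 20 from N(100, 20²) is itself a random variable; its distribution is centred at μ with standard deviation close to σ/√n ≈ 4.47.

set.seed(123456)
mu <- 100; sigma <- 20; n <- 20
means <- replicate(1e4, mean(rnorm(n, mean = mu, sd = sigma)))
mean(means) # close to mu
sd(means)   # close to sigma / sqrt(n)
hist(means, breaks = 50, main = "Sampling distribution of the mean",
     xlab = "sample mean", las = 1)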

best regards & hope this helps

Martin