ElMaestro ★★★ Denmark, 2020-06-27 19:35 Posting: # 21582

Hi all,

what is statistical independence in layman's terms? I thought I had a basic understanding of it, but now I am not so sure. For example: Wikipedia mentions on the page for the t-distribution that t = Z/s, where Z and s are independent, Z = (sample mean − population mean), and s is the sample sd divided by the square root of n. Given that any perturbation of the data that changes Z will (may) also change s, I am slightly conflicted regarding my own idea of independence.

I am not so much looking for a textbook explaining what it is, or a link to some web page or other. I have probably read the first 50 links on Google and got nowhere. Rather, I am humbly looking for an explanation that appeals to my braincells, the number of which is very, very limited.

Muchas gracias. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-28 11:35 @ ElMaestro Posting: # 21583

Hi ElMaestro,

» what is statistical independence in layman's terms?

See the subject line.

» For example: [etc. etc.]

Fire up R and execute the script at the end. There are samples with identical means but different variances, and others with (almost) identical variances but different means. However, acc. to the central limit theorem (CLT) the pooled parameters will sooner or later approach the population's ones.

» […] any perturbation on the data that changes Z will (may) also change s […]

Not sure what you mean by "perturbation on the data". A sample is a sample is a sample. Or are you thinking about something happening after sampling (e.g., transcription errors, manipulations, …)? If you have a single sample and the population's parameters are – as usual – unknown, you enter the slippery ground of outliers, which is a mixture of assessing (deviations from) assumptions, knowledge of the data-generating process (incl. checking for implausible values), and so on.

» Muchas gracias.

De nada.

PS: Comparing variances is a daring feat. In the example the ratio of the largest/smallest sample mean is 1.18, whereas the variances varied [pun] almost fourfold. This explains why outlier tests are not powerful and should be taken with a grain of salt.
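Helmut's R script is not reproduced in this archive. As a stand-in, here is a minimal simulation sketch of the same point (in Python/NumPy rather than R; the population values, sample size, and seed are arbitrary choices): individual small samples scatter in both mean and sd, but the pooled estimates home in on the population values.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n = 100, 20, 12            # arbitrary population and sample size

# many small samples from one normal population
samples = rng.normal(mu, sigma, size=(1000, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)

# individual samples scatter: means vary a little, sds vary a lot
print(means.min().round(1), means.max().round(1))
print(sds.min().round(1), sds.max().round(1))

# pooled over all samples, the estimates approach the population values
print(samples.mean().round(1))        # close to mu = 100
print(samples.std(ddof=1).round(1))   # close to sigma = 20
```

Note how much wider the relative spread of the sample sds is than that of the sample means, which is Helmut's PS in a nutshell.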
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Denmark, 2020-06-28 12:45 @ Helmut Posting: # 21584

Hi Helmut,

and thanks for trying to help. If the die did remember the last roll and the outcome were somehow a function of it, wouldn't that be correlation? And how does the mental image of the die and its last roll fit with Z and s being independent? I think it fits better the mental picture of the errors in a model being IID. Or, more generally: from a sample of size N I derive two quantities A and B. Under what circumstances will A and B be independent? A die with Alzheimer's does not really shed light on that, does it? Kindly help me a little more.

I do not see the idea of the R code, but it executes nicely except for errors for idx.m and idx.v. Code on this forum rarely provides clarity for me. This has nothing to do with the code, but everything to do with me. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-28 13:36 @ ElMaestro Posting: # 21585

Hi ElMaestro,

» If it did remember the last roll and the outcome was somehow a function of it, wouldn't that be correlation?

Well, the die is Thick As A Brick. If a roll depended on the result of the last one, that could be (a) cheating by an expert gambler or (b) an imperfect die. In both cases you would indeed have a correlation, and the concept collapses.

» And how is the die and last roll mental image fitting in with Z and s being independent? I think it fits better into the mental picture of errors in a model being IID.

My example deals with \(\small{\mathcal{N}(\mu,\sigma^2)}\). If you want to dive into combinatorics/permutations (coin flipping, rolling dice) see the first example in this post.

» Or more generally:
» From a sample of size N I am deriving two quantities A and B. Under what circumstances will A and B be independent? A die with Alzheimer's does not really shed light on it, does it?

As usual: know the data-generating process. If you cannot be sure that the outcome of B is independent of A, you are in deep shit.

» Kindly help me a little more. I do not see the idea of the R code but it executes nicely …
» … except errors for idx.m and idx.v.

Fuck. Try the edited one.

» Code on this forum rarely provides clarity for me. This has nothing to do with the code, but has everything to do with me.

Sorry. I added more comments than usual. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
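The memoryless-die picture from this exchange can also be probed numerically. A sketch (Python/NumPy; seed and number of rolls are arbitrary): a fair die carries no information from one roll to the next, so the lag-1 correlation across many rolls is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(5)
rolls = rng.integers(1, 7, 100_000)   # fair six-sided die: values 1..6

# no memory: consecutive rolls are independent,
# so the correlation between each roll and the next is ~0
r = np.corrcoef(rolls[:-1], rolls[1:])[0, 1]
print(round(r, 3))                    # near 0
```

A loaded or "remembering" die would show up here as a clearly nonzero lag-1 correlation.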
ElMaestro ★★★ Denmark, 2020-06-28 16:20 @ Helmut Posting: # 21586

Hi Helmut,

thanks for trying. I have to admit, I am sorry, but I am none the wiser. Let us go over it again:

Let x be our sample.
Let f(x | other shit) be a function of x, given some other shit.
Let g(x | other shit) be another function of x, given the same other shit.

In general (and ideally layman's) terms, what properties of f and g would lead me to think that f and g are independent? I assume it is implied that we are talking about independence from each other, at least when we try to think of f and g as the numerator and denominator of the quantity defining t above. If this perception is wrong, kindly correct me, please. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-28 19:55 @ ElMaestro Posting: # 21587

Hi ElMaestro,

» Let us go over it again:
» Let x be our sample.
» Let f(x | other shit) be a function of x, given some other shit.
» Let g(x | other shit) be another function of x, given the same other shit.

Like this (or any other funny transformation)?
» In general (and ideally layman's) terms, what properties of f and g would lead me to think f and g are independent.
» I assume it is implied that we are talking about independence from each other, at least when we try and think of f and g as numerators and denominators of the quantity defining t above. When this is the wrong perception, kindly correct me, please.

Completely confused. Can you try again, please? — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-06-29 06:30 @ Helmut Posting: # 21591

Hi Hötzi,

» Completely confused. Can you try again, please?

OK.

1. Let us look at the Wikipedia page for the t-test: "Most test statistics have the form t = Z/s, where Z and s are functions of the data."
2. For the t-distribution, here Z = sample mean − mean and s = sd/sqrt(n).
3. Why are Z and s independent in this case?

Or more generally, and for me much more importantly, if we have two functions (f and g, or Z and s), then which properties of such functions or their input would render them independent?

Wikipedia links to a page about independence; key here is: […]

I am fully aware that when we simulate a normal dist. with some mean and some variance, then that defines their expected estimates in a sample. I.e., if a sample has a mean that is higher than the simulated mean, that does not necessarily mean the sampled sd is higher (or lower, for that matter; that was where I was going with "perturbation"). It sounds right to think of the two as independent in that case.

Now, how about the general case, for example if we know nothing about the nature of the sample, but just look at any two functions of the sample? What property would we look for in those two functions to think they are independent? A general understanding of the idea of independence of any two quantities derived from a sample is what I am looking for; point #3 above defines my question. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-29 14:46 @ ElMaestro Posting: # 21603

Hi ElMaestro,

» 1. Let us look at the wikipedia page for the t test:
» "Most test statistics have the form t = Z/s, where Z and s are functions of the data."

OK, so far.

» 2. For the t-distribution, here Z = sample mean − mean and s = sd/sqrt(n)

Wait a minute. You are referring to the one-sample t-test, right? At the Assumptions we find$$t=\frac{Z}{s}=\frac{\bar{X}-\mu}{\hat{\sigma}/\sqrt{n}}$$That's a little bit strange, because WP continues with "\(\hat{\sigma}\) is the estimate of the standard deviation of the population". I beg your pardon? Most of my textbooks give the same formula but with \(s\) in the denominator as the sample standard deviation. Of course, \(s/\sqrt{n}\) is the standard error, and sometimes we find \(t=\frac{\bar{X}-\mu}{\textrm{SE}}\) instead. Nevertheless, further down we find$$t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$THX a lot, soothing!

» 3. Why are Z and s independent in this case?

Here we know the population mean. Hence, the numerator depends on the sample mean and the denominator on the sample's standard error. They are independent indeed. I added another plot to the code of this post. A modified plot of 5,000 samples to the right.

» Or more generally, and for me much more importantly, if we have two functions (f and g, or Z and s), then which properties of such functions or their input would render them independent?
» Wikipedia links to a page about independence, key here is: […]

Yep.

» I am fully aware that when we simulate a normal dist. with some mean and some variance, then that defines their expected estimates in a sample. I.e. if a sample has a mean that is higher than the simulated mean, that does not necessarily mean the sampled sd is higher (or lower, for that matter; that was where I was going with "perturbation"). It sounds right to think of the two as independent in that case.

Correct. Anything is possible.
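For normal data the independence of the sample mean and the sample standard deviation is in fact a theorem (a standard consequence of Basu's theorem), and it is easy to probe numerically. A sketch (Python/NumPy; sample size, number of simulations, and seed are arbitrary): for normal samples the simulated correlation between mean and sd is near zero, while for skewed (exponential) samples it clearly is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nsim = 10, 5000                  # arbitrary sample size / no. of simulations

# normal data: sample mean and sample sd are independent,
# so their correlation across simulated samples is ~0
x = rng.normal(0, 1, size=(nsim, n))
r_norm = np.corrcoef(x.mean(axis=1), x.std(axis=1, ddof=1))[0, 1]

# skewed data (exponential): a large sample mean tends to come with
# a large sample sd, so the two statistics are clearly dependent
y = rng.exponential(1, size=(nsim, n))
r_exp = np.corrcoef(y.mean(axis=1), y.std(axis=1, ddof=1))[0, 1]

print(round(r_norm, 3))             # near 0
print(round(r_exp, 3))              # clearly positive
```

This is why the construction t = Z/s works so cleanly for the normal distribution in particular: the independence of numerator and denominator is a special property of normal samples, not of samples in general.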
» Now, how about the general case, for example if we know nothing about the nature of the sample, but just look at any two functions of the sample? What property would we look for in those two functions to think they are independent?
» A general understanding of the idea of independence of any two quantities derived from a sample, that is what I am looking for; point #3 above defines my question.

Still not sure whether I understand you. I think that this formulation is unfortunate because it has nothing to do with either the standard normal distribution \(Z\) or the sample standard deviation \(s\). For continuous variables I would prefer sumfink like$$test\;statistic=\frac{measure\;of\;location}{measure\;of\;dispersion}$$for clarity. If a test were constructed in such a way that the independence is not correctly represented, it would be a piece of shit. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-06-29 22:55 @ Helmut Posting: # 21611

OK, I try again.

I give you two functions of a sample x; call the functions f and g (or Z and s, or alpha and beta, or apple and banana). The symbols are not important. How do we determine that f and g are independent?

I completely see the point about mean and dispersion, no issue; the plot is a nice example of apparent "correlationlessness". Simulation of that specific case aside, how about generally, when f and g are not necessarily mean and dispersion indicators of the x-sample from a normal distribution? — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-30 09:33 @ ElMaestro Posting: # 21614

Hi ElMaestro,

» OK, I try again.

THX for your patience.

» I give you two functions of a sample x, call the functions f and g (or Z and s, or alpha and beta, or apple and banana). Symbols not important. How do we determine that f and g are independent?

Feel like a cat. In the OP (and now?) you were talking about a test and whether the numerator and denominator constructing it are independent functions of \(x\):$$t=\frac{Z}{s},\; Z=f(x)\:\wedge\:s=g(x)$$Somehow I have the feeling that the discussion is moving towards transformations. Another cup of tea.
» […] how about generally, when f and g are not necessarily mean and dispersion indicators of the x-sample from a normal distribution?

Are you not happy with the existing tests (questioning the independence) and trying to develop a new one? How does the "perturbation on the data" come into play? Slowly I get the feeling that I can't follow your arguments and I'm not qualified to answer your question. Sorry. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-06-30 11:07 @ Helmut Posting: # 21617

Hi Helmut,

I am sorry that I am once again not able to tell what I am looking for. I am not talking about simulations, and not about transformations either. Two functions, generally (and the key word is really generally): when are they (or their results) to be considered independent?

We can think of f=mean(x) and g=median(x). I guess we can easily form a mental picture of plotting means versus medians, often seeing the strong relationship. Visually appealing. Independence? OK, then let us say f=range(x) and g=the sine of range(x). Or an F statistic with a variance in both f=numerator and g=denominator in an unbalanced ANOVA. Or Cmax and AUCt (which I guess are correlated and dependent(?), but the example is not great in my perspective since the two functions are not applied to a random sample but to a time series). There is no end to the possible examples. And so forth.

Without debating too much about the specific cases, how do we generally approach it to define two (outcomes of) functions as being independent? Which mathematical/algebraic/statistical/whatever properties of functions render them mutually independent? When I understand that, I think (or hope) I will understand the nature of independence.

For inspiration: Are estimates of any two statistical moments independent? If yes, why? Is it only the first and second? Why? Is it generally so? Why? Etc. I am looking for general clarity. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-06-30 12:27 @ ElMaestro Posting: # 21620

Hi ElMaestro,

» I am sorry that I am once again not able to tell what I am looking for.

No, I'm sorry that I'm not able to comprehend your message. As you know: walnut-sized brain.

» I am not talking about simulations and not about transformations either.

OK.

» Two functions, generally, and the key word is really generally, when are they (or their results) to be considered independent?

I think:
» We can think of f=mean(x) and g=median(x). I guess we can easily do a mental picture of plotting means versus medians, often seeing the strong relationship. Visually appealing. Go back to my script and
» Independence?

No. Both are measures of the location of the data. IMHO, that would not be suitable to construct a test as stated at the end of this post.

» OK then let us say f=range(x) and g=the sine of range(x).
» Or an F statistic with a variance in both f=numerator and g=denominator in an unbalanced ANOVA.

Oh dear!

» Or Cmax and AUCt (which I guess are correlated and dependent(?), …

Correlated, yes. Highly, sometimes. In some cases not so much (recall that one). Why? Dunno. Though correlation ≠ causation. Actually there are hidden (confounding) variables – the entire PK stuff – which drive the apparent correlation. So are they dependent even with a high correlation? I would say no. Both depend on the underlying PK.

» … but the example is not great in my perspective since the two functions are not applied to a random sample but to a time series).

Yep.

» There is no end to the possible examples.

Already the ones you mentioned gave me headaches.

» Without debating too much about the specific cases, how do we generally approach it to define two (outcomes of) functions as being independent?

See above. Maybe I'm completely wrong.

» Which mathematical/algebraic/statistical/whatever properties of functions render them mutually independent? When I understand it, I think or hope I will understand the nature of independence.
» For inspiration: Are estimates of any two statistical moments independent? If yes, why? Is it only the first and second? Why? Is it generally so? Why? Etc. I am looking for general clarity.

Sorry, again. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
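The mean-vs-median case can be checked with a quick simulation. A sketch (Python/NumPy; sample sizes and seed are arbitrary): both statistics chase the same location, so across many samples they are strongly correlated, and hence certainly not independent.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, size=(5000, 25))   # 5000 samples of size 25

means = x.mean(axis=1)
medians = np.median(x, axis=1)

# both estimate the same location of the data, so they move together
r = np.corrcoef(means, medians)[0, 1]
print(round(r, 3))                      # strongly positive
```

This is exactly why a ratio of two location measures would make a poor test statistic: its numerator and denominator carry largely the same information.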
mittyri ★★ Russia, 2020-06-30 22:04 @ ElMaestro Posting: # 21624

Hi ElMaestro,

I am not sure I can help you with that theoretical request, but I would point out that mathematics talks about linearly independent functions. That is closely related to linear independence of vectors and the dimension problem. Since the term is connected to linear combinations and derivatives, 'linear' is essential here. But you may also like to refer to the definition of pseudorandomness, where you need to predefine the set of distinguishers before starting to solve the problem. — Kind regards, Mittyri
martin ★★ Austria, 2020-07-01 06:40 @ ElMaestro Posting: # 21626

Dear ElMaestro,

I know this might be confusing; you may find the corresponding mathematical proof of interest. Here is another follow-up on the definition of statistical independence: it is a concept in probability theory. A very nice summary can be found here:

Two events A and B are statistically independent if and only if their joint probability can be factorized into their marginal probabilities, i.e., \(P(A \cap B) = P(A)\,P(B)\). If two events A and B are statistically independent, then the conditional probability equals the marginal probability: \(P(A \mid B) = P(A)\) and \(P(B \mid A) = P(B)\).

Now applying this concept to random variables: Two random variables X and Y are independent if and only if the events \(\{X \leq x\}\) and \(\{Y \leq y\}\) are independent for all x and y, that is, \(F(x, y) = F_X(x)\,F_Y(y)\), where \(F(x, y)\) is the joint cumulative distribution function and \(F_X\) and \(F_Y\) are the marginal cumulative distribution functions of X and Y, respectively.

best regards & hope this helps Martin

Edit: Merged with a later (now deleted) post. You can edit your OP for 24 h. [Helmut]
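The factorization criterion can be made concrete with two fair dice, where every probability can be enumerated exactly. A small sketch (Python; the particular events are arbitrary examples): events on different dice factorize, while two events that constrain the same die do not.

```python
from fractions import Fraction
from itertools import product

# sample space of two fair dice: 36 equally likely outcomes
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0      # first die even         (P = 1/2)
B = lambda w: w[1] <= 2          # second die shows 1 or 2 (P = 1/3)
D = lambda w: w[0] <= 3          # first die shows 1..3    (P = 1/2)

# A and B concern different dice: P(A and B) = P(A) * P(B) -> independent
print(prob(lambda w: A(w) and B(w)) == prob(A) * prob(B))

# A and D constrain the same die: the factorization fails -> dependent
print(prob(lambda w: A(w) and D(w)) == prob(A) * prob(D))
```

Here P(A ∩ D) = 1/6 (only a first-die 2 satisfies both) while P(A)·P(D) = 1/4, so the factorization test correctly flags the dependence.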
ElMaestro ★★★ Denmark, 2020-07-01 07:42 @ martin Posting: # 21628

Thanks, Martin,

» Two random variables X and Y are independent if and only if the events {X ≤ x} and {Y ≤ y} are independent for all x and y, that is, \(F(x, y) = F_X(x)\,F_Y(y)\), where \(F(x, y)\) is the joint cumulative distribution function and \(F_X\) and \(F_Y\) are the marginal cumulative distribution functions of X and Y, respectively.

thanks for the posts. I think now we are heading in the right direction: not confounding independence with correlation. Given a sample x1, x2, …, xn, from which we estimate the mean and the variance, would we, under the quote above, consider the estimated mean and the estimated variance "random variables" in their own right, or is this immaterial to the issue at hand? — Pass or fail! ElMaestro
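The distinction ElMaestro draws here (independence vs. correlation) has a classic counter-example worth sketching (Python/NumPy; seed and sample size arbitrary): V = U² is a deterministic function of U, hence maximally dependent, yet for a symmetric U the linear correlation is essentially zero. Zero correlation therefore never proves independence, although a clearly nonzero correlation does disprove it.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(-1, 1, 100_000)
v = u ** 2                        # fully determined by u: maximally dependent

# yet by the symmetry of u around 0, the *linear* correlation is ~0
r = np.corrcoef(u, v)[0, 1]
print(round(r, 3))                # near 0
```

Independence is the stronger property: it rules out any relationship, linear or not, which is exactly what the CDF factorization \(F(x,y)=F_X(x)F_Y(y)\) encodes.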
martin ★★ Austria, 2020-07-01 08:07 @ ElMaestro Posting: # 21629

Dear ElMaestro,

If the sample on which the mean and standard deviation are calculated is finite, then the mean and standard deviation are themselves random variables and follow a specific distribution (i.e., the sampling distribution). To be precise: the sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n.

best regards & hope this helps Martin
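The point that the sample mean is itself a random variable with its own sampling distribution can be illustrated numerically. A sketch (Python/NumPy; population values, sample size, and seed are arbitrary): for \(\mathcal{N}(\mu,\sigma^2)\) data, the means of repeated samples of size n scatter around \(\mu\) with standard deviation \(\sigma/\sqrt{n}\).

```python
import numpy as np

rng = np.random.default_rng(11)
mu, sigma, n = 50, 10, 16         # arbitrary population and sample size

# 20,000 sample means, each from its own sample of size n
means = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)

# the sampling distribution of the mean: centered at mu,
# with standard deviation sigma / sqrt(n)
print(round(float(means.mean()), 2))        # close to mu = 50
print(round(float(means.std(ddof=1)), 2))   # close to sigma/sqrt(n) = 2.5
```

The same exercise applied to the sample sd would show its sampling distribution as well, which is precisely the sense in which both estimates are random variables in their own right.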