victor ☆ Malaysia, 2019-11-16 21:57 (314 d 18:48 ago) (edited by victor on 2019-11-16 22:09) Posting: # 20813 Views: 4,482

Hi everyone! I'm new to pharmacokinetics, and I'm wondering what the largest α (alpha) and β (beta) allowed by the FDA are for each of the three hypothesis tests illustrated below (with each α and β highlighted). Where:
Thanks in advance.
P.S. If you spot any mistake in my illustration below, could you kindly inform me as well? ଘ(੭*ˊᵕˋ)੭* ̀ˋ
P.S. The following post didn't submit correctly, even though the preview for it was working, so I decided to screenshot my question instead. Hope it is acceptable :)
Edit: Category changed; see also this post #1. Link to 643KiB 2,000px photo deleted and changed to a downscaled variant. [Helmut]
Helmut ★★★ Vienna, Austria, 2019-11-17 01:26 (314 d 15:19 ago) @ victor Posting: # 20814 Views: 3,741

Hi Victor,
I tried to reconstruct your original post as well as I could. Since it was broken before the first “\(\mathcal{A}\)”, I guess you used a UTF-16 character whereas the forum is coded in UTF-8.
Please don’t link to large images that break the layout of the posting area and force us to scroll our viewport. THX.
I think that your approach has some flaws.
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
victor ☆ Malaysia, 2019-11-17 10:53 (314 d 05:52 ago) (edited by victor on 2019-11-17 14:16) @ Helmut Posting: # 20815 Views: 3,725

» Hi Victor, I tried to reconstruct your original post as well as I could. Since it was broken before the first “\(\mathcal{A}\)”, I guess you used a UTF-16 character whereas the forum is coded in UTF-8.
Hi Helmut, thanks for helping me out :)
Edit: after a quick experiment (click here to see screenshot), it seems that the “\(\mathcal{A}\)” I used was a UTF-8 character after all? ⊙.☉
» Please don’t link to large images that break the layout of the posting area and force us to scroll our viewport. THX.
Noted, and thanks for downscaling my original image :)
» I think that your approach has some flaws.
I see; I thought it would make sense for T_{max} to also be transformed because of googling stuff like this:
coupled with the fact that the population distribution that is being analyzed looks a lot like a lognormal distribution; so I thought normalizing T_{max} just made sense, since almost all distributions studied at undergraduate level (e.g. the F-distribution used by ANOVA) are ultimately transformations of one or more standard normals. With that said, is the above stuff that I googled wrong?
» […]
Thanks for enlightening me that I can now restate the current standard's hypothesis in a "more familiar (undergraduate-level)" form:
$$H_0: \ln(\mu_T) - \ln(\mu_R) \notin \left[\ln(\theta_1), \ln(\theta_2)\right]\:\text{vs}\:H_1: \ln(\theta_1) < \ln(\mu_T) - \ln(\mu_R) < \ln(\theta_2)$$
I now realize that I was actually using the old standard's hypothesis (whose null tested for bioequivalence, instead of the current standard's null for bioinequivalence), which had problems with its α and β (highlighted in red below, cropped from this paper). That renders my initial question pointless, because I was analyzing an old problem, i.e. one from before Hauck and Anderson's 1984 paper.
» […]
With that said, regarding the old standard's hypothesis (whose null tested for bioequivalence), I was originally curious (it may be a meaningless problem now, but I'm still curious) about how they bounded the familywise error rate (FWER) if α = 5% for each hypothesis test, since the probability of committing one or more type I errors when performing three hypothesis tests = 1 − (1 − α)^3 = 1 − (1 − 0.05)^3 = 14.26% (if those three hypothesis tests were actually independent).
The same question applied, more importantly, to β, since in the old standard's hypothesis (whose null tested for bioequivalence), "the consumer’s risk is defined as the probability (β) of accepting a formulation which is bioinequivalent, i.e. accepting H_{0} when H_{0} is false (Type II error)." (as quoted from page 212 of the same paper).
Do you know how the FDA bounded the "global" α and β before 1984? I am curious about "what kind of secret math technique" was happening behind-the-scenes that allowed 12 random samples to be considered "good enough by the FDA"; i.e.
Thanks in advance :) ଘ(੭*ˊᵕˋ)੭* ̀ˋ 
Helmut ★★★ Vienna, Austria, 2019-11-17 14:35 (314 d 02:10 ago) @ victor Posting: # 20816 Views: 3,787

Hi Victor,
» I did use UTF-8 though, because the following HTML works, and I could save (and reopen) it using my Windows 10's notepad.exe under UTF-8 encoding; but […]
Dunno. The mysteries of HTML/CSS/php/MySQL.
» » Please don’t link to large images breaking the layout of the posting area and forcing us to scroll our viewport. THX.
» Noted, and thanks for downscaling my original image :)
Sorry if the downscaled image shows poor legibility. The one in full resolution here.
» I thought it would make sense for T_{max} to also be transformed because of googling stuff like this:
Aha! A presentation by Mr Concordet of 2004. Heteroscedasticity refers to more than one distribution. A single distribution might be skewed; I guess that is what was meant. When we apply a parametric method (ANOVA, t-tests), one of the assumptions (as you correctly stated in your graph) is that residual errors follow a normal distribution. It makes sense to assume that PK metrics like AUC and C_{max} follow a lognormal distribution since concentrations are bound to \(\mathbb{R}^+\) (negative ones don’t exist and zero is excluded). However, even if this assumption were wrong, the only important thing is that the model’s residuals are approximately normal, i.e., \(\epsilon \approx \mathcal{N}(0,1)\). It should be noted that the t-test is fairly robust against heteroscedasticity but very sensitive to unequal sequences (crossover) and group sizes (parallel design). That’s why the FDA recommends in any case Satterthwaite’s approximation of the degrees of freedom.
t_{max}^{1} is yet another story. The distribution strongly depends on the study’s sampling schedule but is definitely discrete. Granted, the underlying one likely is continuous^{2} but we simply don’t have an infinite number of samples in NCA. A log-transformation for discrete distributions is simply not allowed. Hence, what is stated in this slide is wrong.
Many people opt for one of the variants of the Wilcoxon test to assess the difference. Not necessarily correct. The comparison of the shift in locations is only valid if distributions are equal. If not, one has to opt for the Brunner-Munzel test^{3} (available in the R package nparcomp).
» … coupled with the fact that the population distribution that is being analyzed looks a lot like a Lognormal distribution;
Wait a minute. It is possible to see concentration-time profiles as distributions resulting from a stochastic process. The usual statistical moments are:
$$S_0=\int f(x)dx$$ $$S_1=\int x\cdot f(x)dx$$ $$S_2=\int x^2\cdot f(x)dx$$
where in PK \(x=t\) and \(f(x)=C\) leads to \(S_0=AUC\) and \(S_1=AUMC\). AFAIK, there is no particular term for \(S_2\) in PK though it is (rarely) used to calculate the “Variance of Residence Times” as \(VRT=S_2/S_0-(S_1/S_0)^2\). The intersection of \(MRT\) (the vertical line) and \(VRT\) (the horizontal line) is the distribution’s “Center of Gravity”. Print it and stick a needle through it. Makes a nice whizz wheel.^{4}
I know only one approach trying to directly compare profiles based on moment theory (the Kullback-Leibler information criterion).^{5} Never tried it with real data but I guess one might face problems with variability. Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%. No idea about VRT but in my experience its variability is extreme.
» … so I thought normalizing T_{max} just made sense, […]. With that said, is the above stuff that I googled, wrong?
Yes it is because such a transformation for a discrete distribution (like t_{max}) is not allowed.
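As a rough numerical illustration of the statistical moments above (not part of the original posts), the sketch below computes \(S_0\), \(S_1\), and \(S_2\) with the linear trapezoidal rule for a simulated profile and derives MRT and VRT from them. The one-compartment model with first-order absorption, its parameter values (`ka`, `ke`, `dose`, `vol`), and the dense sampling grid are assumptions chosen for illustration only; real NCA works with a handful of samples and has to extrapolate.

```python
import math

def trapezoid(x, y):
    """Linear trapezoidal rule for values y observed at times x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

# Hypothetical one-compartment model with first-order absorption (assumed values)
ka, ke = 1.0, 0.1          # absorption and elimination rate constants (1/h)
dose, vol = 100.0, 10.0    # dose and volume of distribution (arbitrary units)

t = [i * 0.01 for i in range(20001)]  # dense grid from 0 to 200 h
c = [dose * ka / (vol * (ka - ke)) * (math.exp(-ke * ti) - math.exp(-ka * ti))
     for ti in t]

s0 = trapezoid(t, c)                                        # S0 = AUC
s1 = trapezoid(t, [ti * ci for ti, ci in zip(t, c)])        # S1 = AUMC
s2 = trapezoid(t, [ti ** 2 * ci for ti, ci in zip(t, c)])   # S2

mrt = s1 / s0               # mean residence time
vrt = s2 / s0 - mrt ** 2    # variance of residence times

print(f"AUC = {s0:.3f}, MRT = {mrt:.3f} h, VRT = {vrt:.3f} h^2")
```

For this assumed model the analytical values are AUC = dose/(vol·ke) = 100, MRT = 1/ka + 1/ke = 11 h, and VRT = 1/ka² + 1/ke² = 101 h², which the dense grid should reproduce closely. With a realistic sparse schedule the higher moments depend far more on the extrapolated tail, which fits the remark that AUMC needs a smaller extrapolated fraction than AUC.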
» […] I can now restate the current standard's hypothesis in a "more familiar (undergraduate-level)" form:
» $$H_0: \ln(\mu_T) - \ln(\mu_R) \notin \left[\ln(\theta_1), \ln(\theta_2)\right]\:\text{vs}\:H_1: \ln(\theta_1) < \ln(\mu_T) - \ln(\mu_R) < \ln(\theta_2)$$
Correct.
» I now realize that I was actually using the old standard's hypothesis (whose null tested for bioequivalence, instead of the current standard's null for bioinequivalence) […]
Correct, again.
» […] regarding the old standard's hypothesis (whose null tested for bioequivalence), I was originally curious (although it may be a meaningless problem now, but I'm still curious) on how they bounded the familywise error rate (FWER) if α=5% for each hypothesis test, since the probability of committing one or more type I errors when performing three hypothesis tests = 1 − (1 − α)^3 = 1 − (1 − 0.05)^3 = 14.26% (if those three hypothesis tests were actually independent).
Here you err. We are not performing independent tests (which would call for Bonferroni’s adjustment or similar) but have to pass all tests. Hence, the intersection-union test (IUT) keeps the FWER at 5% or lower. Actually it is more conservative than the single tests themselves and will get more conservative the more the PK metrics differ. See the R code provided by Martin in this post for an example.
» The same question more importantly applied to β, since in the old standard's hypothesis (whose null tested for bioequivalence), "the consumer’s risk is defined as the probability (β) of accepting a formulation which is bioinequivalent, i.e. accepting H_{0} when H_{0} is false (Type II error)." (as quoted from page 212 of the same paper).
Correct but gone with the wind. See this post for how the BE requirements of the FDA evolved.
» Do you know how the FDA bounded the "global" α & β before 1984?
α was always 0.05. In the (very) early days of the CI-inclusion approach some people wrongly used a 95% CI, which actually implies α 0.025.
As the name tells, the FDA’s 80/20 rule required ≥80% power or β ≤20%. Nowadays post hoc (a posteriori, retrospective) power is irrelevant. Power (1 − β) is important only in designing a study, where in most jurisdictions 80–90% (β 10–20%) is recommended. This allows me to answer your original question:
» » » What is the largest α & β allowed by FDA?
α 0.05 (since it is fixed in the method) and β is not assessed. It can be very high (i.e., low “power”) if the assumptions leading to the sample size were not realized in the study (e.g., larger deviation of T from R and/or higher variability than assumed, higher number of dropouts than anticipated). However, quoting ElMaestro: Being lucky is not a crime.
On the other hand, a very high producer’s risk in designing a study is like gambling and against ICH E9: The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed.
Hopefully such a protocol is rejected by the IEC.
» […] I am curious on "what kind of secret math technique" was happening behind-the-scenes that allowed 12 random samples to be considered "good enough by FDA".
No idea. Rooted in Babylonian numbers? See also this post. For reference-scaling the FDA requires at least 24 subjects. A minimum sample size of 12 is recommended in all jurisdictions. IMHO, not based on statistics (large power for low variability and T/R close to 1) but out of the blue.
victor ☆ Malaysia, 2019-11-18 08:26 (313 d 08:19 ago) (edited by victor on 2019-11-18 08:59) @ Helmut Posting: # 20817 Views: 3,564

Wow! I'm blown away by your amazing answers (and reply speed)!
Since it will take me time to properly digest and reply to all the details you just shared (especially on IUT, because IUT is new to me, and I think IUT might be the missing link I was searching for to deal with dependencies in multiple hypothesis testing), I decided to just respond to specific concepts, one at a time :)
Also, thanks for confirming which concepts I grasped correctly, because as someone new to pharmacokinetics, it definitely helps me eliminate logic errors (｡♥‿♥｡)
For now, I'd like to dig a little deeper to understand all the logic errors(?) that I made regarding t_{max}.
» […]
Good point :) I initially chose to use T_{max} because I'm used to writing random variables in uppercase, to help myself catch syntax errors if I ever wrote something silly like E[t] when t is not a random variable, etc. With that said... (I rearranged your reply below to fit the flow better)
» » … coupled with the fact that the population distribution that is being analyzed looks a lot like a Lognormal distribution; so I thought normalizing T_{max} just made sense, since almost all distributions studied in undergraduate (e.g. F-distribution used by ANOVA) are ultimately transformations of one or more standard normals.
» Yes it is because such a transformation for a discrete distribution (like t_{max}) is not allowed.
» The distribution strongly depends on the study’s sampling schedule but is definitely discrete. Granted, the underlying one likely is continuous^{2} but we simply don’t have an infinite number of samples in NCA. A log-transformation for discrete distributions is simply not allowed. Hence, what is stated in this slide is wrong. Many people opt for one of the variants of the Wilcoxon test to assess the difference. Not necessarily correct. The comparison of the shift in locations is only valid if distributions are equal. If not, one has to opt for the Brunner-Munzel test^{3} (available in the R package nparcomp).
Does it mean that the Nyquist criterion is not satisfied in pharmacokinetics (in terms of sampling interval)? Couldn't we just design our study’s sampling schedule to ensure that the Nyquist criterion is satisfied so that we can perfectly reconstruct the original continuous-time function from the samples?
» Wait a minute. It is possible to see concentration-time profiles as distributions resulting from a stochastic process.
» […] I know only one approach trying to directly compare profiles based on moment theory (the Kullback-Leibler information criterion).^{5}
Yes, I originally viewed the population distribution that is being analyzed as a stochastic process, but I didn't dive in too deep to check if the moment-generating function of a lognormal distribution "matches" the (concentration-time profile?) model used in pharmacokinetics (because I'm still learning about the models used in pharmacokinetics), so it was more of a "visually looks a lot like a lognormal distribution" on my part :p
With that said, this specific fact you shared intrigues me...
» Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%. No idea about VRT but in my experience its variability is extreme.
I'm familiar with Taylor series approximations, Fourier series approximations, etc. but never really thought about how the error terms of a moment-generating function are controlled if I use, for example, a lognormal distribution to describe the concentration-time profile. It might be obvious (but I don't recall learning it in undergraduate though, did I?), but I kinda prefer spending my time learning about IUT instead of thinking about this :p, so I was wondering if you have any good reference materials for learning about how to correctly use the moment-generating function to model a distribution in real life? (e.g. concentration-time profile)
Fun fact: as I was briefly reading about Kullback-Leibler divergence, I thought it looked familiar, then I realized I first encountered it when reading about IIT
Thanks once again for all your amazing replies! ଘ(੭*ˊᵕˋ)੭* ̀ˋ
Helmut ★★★ Vienna, Austria, 2019-11-18 15:09 (313 d 01:36 ago) @ victor Posting: # 20821 Views: 3,518

Hi Victor,
» Edit: after a quick experiment (click here to see screenshot), it seems that the “\(\mathcal{A}\)” I used was a UTF-8 character after all? ⊙.☉
Correct, and the mystery is resolved.
bin 11110000 10011101 10010011 10010000
hex F09D9390
When I paste the character it is shown in the preview but not in the post. The field of the database table is of type utf8_general_ci, which supports only characters up to 3 bytes long, whereas yours has 4. That’s it. In the development forum (in German, sorry) we realized that changing the type to utf8mb4_general_ci (whether of the field, the table, or the entire DB) alone cannot resolve the issue. It requires rewriting all parts of the scripts handling the php/MySQL connection. Not easy and not my top priority.
» I'm blown away by your amazing answers (and reply speed)!
My pleasure.
» Since it will take me time to properly digest and reply to all the details you just shared (especially on IUT, because IUT is new to me, and I think IUT might be the missing link I was searching for to deal with dependencies in multiple hypothesis testing), …
It’s not that complicated. Let’s explore the plot:
We have three tests. The areas give their type I errors. Since we perform all at the same level, the areas are identical. The Euclidean distance between centers gives the correlation of PK metrics (here they are identical as well). The FWER is given by the area of the intersection, which in any case will be ≤ the nominal α.
In reality the correlation of AUC_{0–∞} (green) with AUC_{0–t} (blue) is higher than the correlation of both with C_{max} (red). If we tested only the AUCs, the FWER would again be given by the intersection, which is clearly lower than the individual type I errors. If we add C_{max}, the FWER decreases further.
Unfortunately the correlations are unknown. See this post for an example with two metrics and a reference on how to deal with comparisons of multiple PK metrics.
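A small simulation may make the "all tests must pass" logic more tangible. This is a simplified sketch and not the TOST procedure used in BE: it assumes three one-sided z-tests whose test statistics are standard normal and equicorrelated (correlation `rho`), with every true effect sitting exactly on the border of its null hypothesis. The function name `iut_rejection_rate` and all settings are hypothetical.

```python
import random
from statistics import NormalDist

def iut_rejection_rate(rho, k=3, alpha=0.05, n_sim=100_000, seed=1):
    """Estimate P(all k one-sided z-tests reject) with each effect on its null border."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    a, b = rho ** 0.5, (1 - rho) ** 0.5        # shared and individual noise weights
    hits = 0
    for _ in range(n_sim):
        common = rng.gauss(0, 1)               # component shared by all statistics
        # The IUT rejects only if every single test rejects
        if all(a * common + b * rng.gauss(0, 1) > z_crit for _ in range(k)):
            hits += 1
    return hits / n_sim

alpha = 0.05
# Different decision rule: at least one of three independent tests rejects
naive = 1 - (1 - alpha) ** 3
print(f"any of three (independent): {naive:.4f}")
for rho in (0.0, 0.5, 1.0):
    print(f"all of three, rho = {rho}: {iut_rejection_rate(rho):.4f}")
```

With `rho = 1` the three statistics coincide and the joint rejection rate should be close to the nominal 0.05; with lower correlation it drops (towards α³ for independent statistics). The 14.26% figure belongs to the "reject if at least one test rejects" rule, which is not what the IUT does.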
In PowerTOST two simultaneous tests are implemented. Say we have a CV of 0.25 for both, a sample size of 28, and the highest possible correlation of 1.
» » T_{max} is poor terminology. “T” denotes the absolute temperature in Kelvin whereas “t” stands for time. Hence, t_{max} should be used.
» I initially chose to use T_{max} because I'm used to writing random variables in uppercase, …
Yep, that’s fine. Unfortunately “T_{max}” is used in some regulatory documents… When I review papers, I have a standard rant pointing that difference out followed by “regulations ≠ science”.
» » The distribution strongly depends on the study’s sampling schedule but is definitely discrete. […] A log-transformation for discrete distributions is simply not allowed.
» Does it mean that the Nyquist criterion is not satisfied in pharmacokinetics (in terms of sampling interval)?
Dunno. Sorry.
» Couldn't we just design our study’s sampling schedule to ensure that the Nyquist criterion is satisfied so that we can perfectly reconstruct the original continuous-time function from the samples?
That’s wishful thinking. People tried a lot with D-optimal designs in PK. What I do is try to set up a model and explore different sampling schemes based on the variance inflation factor. Regrettably, it rarely works. Modeling absorption is an art rather than a science, especially with lag-times (delayed onset of absorption due to gastric emptying, gastric-resistant coatings). What if the model suggests drawing fifty blood samples (to “catch” C_{max}/t_{max} in the majority of subjects) and you are limited to twenty? It’s always a compromise.
» […] I didn't dive in too deep to check if the moment-generating function of a lognormal distribution "matches" the (concentration-time profile?) model used in pharmacokinetics (because I'm still learning about the models used in pharmacokinetics), so it was more of a "visually looks a lot like a lognormal distribution" on my part :p
Don’t fall into the trap of visual similarity. The fact that a concentration profile after an oral dose looks similar to a lognormal distribution is a mere coincidence.
In a two-compartment model (right example: distribution is three times faster than elimination) the center of gravity is outside the profile; my “whizz wheel proof” would not work any more. Or, even more extreme, an intravenous dose…
Of note, the mean of residence times \(MRT=\frac{AUMC}{AUC}\) is nice because we can compare different models (say, a one- with a two-compartment model). Independent of the model, after MRT ~⅔ of the drug is eliminated. For years I have tried to educate clinicians to abandon half-lives (which are less informative) but old beliefs die hard (see there, slides 24–28).
If you want to dive into the Kullback-Leibler divergence, note that any distributions can be compared. The fact that we log-transform AUC and C_{max} in BE has three reasons:
Not sure what the current state of affairs is but in the past the Malaysian authority preferred the log_{10}-transformation over log_{e}. Not a particularly good idea since PK is based on exponential functions. Furthermore, it makes our life miserable. The coefficient of variation based on the residual error of log-transformed data is given as \(CV=\sqrt{\textrm{e}^{MSE}-1}\). That’s the only one you find in textbooks and guidelines. If one used a log_{10}-transformation the appropriate formula is \(CV=\sqrt{10^{\textrm{log}_{e}(10)\cdot MSE}-1}\). I have seen wrong sample size estimations where the former was used instead of the latter.
» » Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%.
» I'm familiar with Taylor series approximations, Fourier series approximations, etc. but never really thought about how the error terms of a moment-generating function are controlled if I use, for example, a lognormal distribution to describe the concentration-time profile…
Forget about that.
» […] I kinda prefer spending my time learning about IUT instead of thinking about this
Good luck. Not that difficult.
» … so I was wondering if you have any good reference materials for learning about how to correctly use the moment-generating function to model a distribution in real life? (e.g. concentration-time profile)
(1) No idea and (2) the approach of seeing a PK profile as a distribution is pretty exotic.
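The two CV formulas in the post above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the assumed true CV of 0.25 are hypothetical, and it relies on the fact that a variance on the log_{10} scale equals the variance on the log_{e} scale divided by \(\textrm{log}_{e}(10)^2\).

```python
import math

def cv_from_mse_ln(mse):
    """CV from the residual variance (MSE) of ln-transformed data."""
    return math.sqrt(math.exp(mse) - 1)

def cv_from_mse_log10(mse10):
    """CV from the residual variance (MSE) of log10-transformed data."""
    return math.sqrt(10 ** (math.log(10) * mse10) - 1)

cv_true = 0.25                               # assumed CV for illustration
mse_ln = math.log(cv_true ** 2 + 1)          # corresponding MSE on the ln scale
mse_log10 = mse_ln / math.log(10) ** 2       # same data analysed on the log10 scale

cv_right = cv_from_mse_log10(mse_log10)      # appropriate formula for log10 data
cv_wrong = cv_from_mse_ln(mse_log10)         # ln formula misapplied to a log10 MSE

print(f"correct: {cv_right:.4f}, misapplied ln formula: {cv_wrong:.4f}")
```

Under these assumptions the appropriate formula recovers 0.25, whereas the misapplied one gives roughly 0.11. A CV that is too low leads to a sample size that is too small, which would explain the wrong sample size estimations mentioned in the post.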
victor ☆ Malaysia, 2019-11-18 20:16 (312 d 20:29 ago) (edited by victor on 2019-11-18 20:49) @ Helmut Posting: # 20824 Views: 3,424

Wow! Yet another awesome answer (and a big round of applause for solving the UTF-8 mystery)!
Honestly, I feel like a caveman who has been gifted a powerful handphone (by you), equipped with GPS, google map, etc. to help me navigate the forest of pharmacokinetics, while I'm still looking at bushes (not even trees yet!) and using that handphone as a mirror.
Thanks for all the great pointers, keywords, and properties to look for :) Especially the idea behind IUT, because it's my first time hearing that correlation has such a geometric interpretation! (｡♥‿♥｡)
» We have three tests. The areas give their type I errors. Since we perform all at the same level, the areas are identical. The Euclidean distance between centers gives the correlation of PK metrics (here they are identical as well). The FWER is given by the area of the intersection, which in any case will be ≤ the nominal α.
» In reality the correlation of AUC_{0–∞} (green) with AUC_{0–t} (blue) is higher than the correlation of both with C_{max} (red). If we tested only the AUCs, the FWER would again be given by the intersection, which is clearly lower than the individual type I errors. If we add C_{max}, the FWER decreases further.
Now I'm even MORE looking forward to studying IUT in detail, because it reminds me of when I first learned that I could use SVD to geometrically visualize the error ellipsoid of a covariance matrix. I found it beautiful :)
Thanks Helmut, for all your help (*ˊᗜˋ*)/ᵗᑋᵃᐢᵏ ᵞᵒᵘ*
P.S. I'm spending time reading on IUT now, hence the short reply; but I thought I'd end with a nice little picture for memory :p
Helmut ★★★ Vienna, Austria, 2019-11-19 12:01 (312 d 04:44 ago) @ victor Posting: # 20827 Views: 3,360

Hi Victor,
» Honestly, I feel like a caveman […]
C’mon! Your skills in maths are impressive. If you want to dive deeper into the matter:
The proof of the result is almost trivial, at least if one is willing to adopt some piece of the basic formalism customary in expositions of the abstract theory of statistical hypothesis testing methods. […] The condition we have to verify, reads […] as follows:
» I thought I'd end with a nice little picture for memory :p
Nice picture! Do you know Anscombe’s quartet?
victor ☆ Malaysia, 2019-11-22 01:28 (309 d 15:17 ago) (edited by victor on 2019-11-22 02:18) @ Helmut Posting: # 20857 Views: 3,290

» » Honestly, I feel like a caveman […]
» C’mon! Your skills in maths are impressive.
Thanks Helmut, I appreciate it a lot. (*^‿^*) I'm also grateful that you shared that excerpt (more on this later, because Chapter 7 wasn't available on books.google.com).
Thankfully, I found enough free time to analyze the following IUT theorem using my notes from studying Theorem 8.3.4 on page 122 of these lecture notes. (I don't have access to any of your recommended textbooks yet though.)
P.S. I intentionally used < instead of ≤ because I feel that in practice, α_{Y} will always be less than one. Besides that, my Gaussian level sets didn't scale correctly against my hand-drawn background, so their shapes are compromised; they should actually have the same covariance matrix.
I'm now thinking about all the cases where H_{0}'s covariance matrix is different from the true covariance matrix (such as what β would look like) to see how IUT really deals with dependencies. But a preliminary glance seems to suggest that the above theorem is violated by the following cases, so I definitely need to think deeper about what the theorem actually states (i.e. what it means for IUT to be a level α test of H_{0} versus H_{1}) to truly understand how the following cases affect the "global" α and β:
With that said, now that I am starting to get a feel for IUT, I feel that I am getting closer to truly understanding the following facts you shared:
» […]
As of now though, I haven't managed to see the "new" geometric interpretation of correlation (yet), so the following facts are still not within my grasp:
» The FWER gets more conservative the more the PK metrics differ.
» The Euclidean distance between centers gives the correlation of PK metrics (here they are identical as well). The FWER is given by the area of the intersection, which in any case will be ≤ the nominal α.
» In reality the correlation of AUC_{0–∞} (green) with AUC_{0–t} (blue) is higher than the correlation of both with C_{max} (red). If we tested only the AUCs, the FWER would again be given by the intersection, which is clearly lower than the individual type I errors. If we add C_{max}, the FWER decreases further.
» The proof of the result is almost trivial, at least if one is willing to adopt some piece of the basic formalism customary in expositions of the abstract theory of statistical hypothesis testing methods. […] The condition we have to verify, reads […] as follows:
I'm thinking that the excerpt you shared contains the crucial info for me to see the "new" geometric interpretation of correlation, because of the following statement:
» […] with \(\rho(\cdot,\cdot)\) denoting a suitable measure of distance between parameters.
but I'm very confused by the excerpt's notation, because I couldn't find the corresponding notation in the aforementioned lecture notes (nor by googling); in particular: 【・ヘ・?】
Thanks in advance for the clarification.
» Do you know Anscombe’s quartet?
Nope. Thanks for sharing :) This is the first time I've heard of Anscombe’s quartet, and I found it pretty interesting as a possible example to introduce other statistics (e.g. skew and kurtosis). For some reason though, my mind thought of the Raven Paradox when reading about Anscombe’s quartet. Maybe because they both raise the question of what actually constitutes evidence for a hypothesis?
victor ☆ Malaysia, 2019-11-23 09:05 (308 d 07:40 ago) (edited by victor on 2019-11-23 09:18) @ victor Posting: # 20863 Views: 3,197

» P.S. I intentionally used < instead of ≤ because I feel that in practice, α_{Y} will always be less than one.
Note to self: Refer to the first image below to see why we need to use ≤.
» […] But a preliminary glance seems to suggest that the above theorem is violated by the following cases, […]
Update: More careful observation using a (correctly scaled) uniform sampling distribution for θ, instead of a bivariate normal sampling distribution for θ, reveals the aforementioned violation: