Bioequivalence and Bioavailability Forum

Some answers [General Statistics]

posted by Helmut – Vienna, Austria, 2019-11-17 15:35 (1619 d 11:52 ago) – Posting: # 20816

Hi Victor,

❝ I did use UTF-8 though, because the following HTML works, and I could save (and reopen) it using my Windows 10's notepad.exe under UTF-8 encoding; but […]

Dunno. The mysteries of HTML/CSS/PHP/MySQL. ;-)

❝ ❝ Please don’t link to large images breaking the layout of the posting area and forcing us to scroll our viewport. THX.

❝ Noted, and thanks for downscaling my original image :)

Sorry if the downscaled image is hard to read. The one in full resolution is here.

❝ I thought it would make sense for T_max to also be transformed because of googling stuff like this:

❝ [image]

Aha! A presentation by Mr Concordet of 2004. Heteroscedasticity refers to unequal variances of more than one distribution. A single distribution might be skewed; I guess that is what was meant. When we apply a parametric method (ANOVA, t-tests), one of the assumptions – as you correctly stated in your graph – is that residual errors follow a normal distribution. It makes sense to assume that PK metrics like AUC and C_max follow a log-normal distribution since concentrations are bound to $\mathbb{R}^+$ (negative ones don’t exist and zero is excluded). However, even if this assumption were wrong, the only important thing is that the model’s residuals are approximately normal, i.e., $\epsilon \sim \mathcal{N}(0,\sigma^2)$. It should be noted that the t-test is fairly robust against heteroscedasticity as long as sequences (crossover) or groups (parallel design) are of equal size but very sensitive to it when they are unequal. That’s why the FDA recommends Satterthwaite’s approximation of the degrees of freedom in any case.

t_max¹ is yet another story. The distribution strongly depends on the study’s sampling schedule but is definitely discrete (i.e., on an ordinal scale). Granted, the underlying one is likely continuous² but we simply don’t have an infinite number of samples in NCA. A log-transformation of discrete distributions is simply not allowed. Hence, what is stated in this slide is wrong. Many people opt for one of the variants of the Wilcoxon test to assess the difference. That is not necessarily correct: the comparison of the shift in locations is only valid if the distributions are equal. If they are not, one has to opt for the Brunner-Munzel test³ (available in the R package nparcomp).
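To make the rank-based idea a bit more tangible, here is a minimal sketch of the Brunner-Munzel statistic in plain Python. It is not the nparcomp implementation; the function names and the t_max values are made up for illustration. It returns only the statistic, the Satterthwaite-type degrees of freedom, and the estimated relative effect. The p-value would need the t-distribution, which the standard library does not provide, so it is left out.

```python
from math import sqrt

def midranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def brunner_munzel(x, y):
    """Return (statistic, df, p_hat), p_hat estimating P(X<Y) + 0.5*P(X=Y).

    Sketch only: completely separated samples (zero variance) are not handled.
    """
    n1, n2 = len(x), len(y)
    pooled = midranks(list(x) + list(y))
    rx, ry = pooled[:n1], pooled[n1:]      # ranks within the pooled sample
    ix, iy = midranks(x), midranks(y)      # ranks within each sample
    mean_rx, mean_ry = sum(rx) / n1, sum(ry) / n2
    s1 = sum((rx[k] - ix[k] - mean_rx + (n1 + 1) / 2) ** 2
             for k in range(n1)) / (n1 - 1)
    s2 = sum((ry[k] - iy[k] - mean_ry + (n2 + 1) / 2) ** 2
             for k in range(n2)) / (n2 - 1)
    pooled_var = n1 * s1 + n2 * s2
    stat = n1 * n2 * (mean_ry - mean_rx) / ((n1 + n2) * sqrt(pooled_var))
    df = pooled_var ** 2 / ((n1 * s1) ** 2 / (n1 - 1) + (n2 * s2) ** 2 / (n2 - 1))
    p_hat = (mean_ry - (n2 + 1) / 2) / n1
    return stat, df, p_hat

# Hypothetical t_max values (h) of two parallel groups, restricted to a sampling schedule
tmax_test = [1.0, 1.5, 1.5, 2.0, 1.0, 2.0, 1.5, 3.0]
tmax_ref = [1.5, 2.0, 2.0, 3.0, 1.5, 2.0, 3.0, 4.0]
print(brunner_munzel(tmax_test, tmax_ref))
```

Note that this is the two-sample (parallel group) form; a crossover would need a paired variant.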

❝ … coupled with the fact that the population distribution that is being analyzed looks a lot like a Log-normal distribution;

Wait a minute. It is possible to see concentration-time profiles as distributions resulting from a stochastic process. The usual statistical moments are:$$S_0=\int f(x)dx$$ $$S_1=\int x\cdot f(x)dx$$ $$S_2=\int x^2\cdot f(x)dx$$ where in PK $x=t$ and $f(x)=C$ leads to $S_0=AUC$ and $S_1=AUMC$.
AFAIK, there is no particular term for $S_2$ in PK though it is – rarely – used to calculate the “Variance of Residence Times” as $VRT=S_2/S_0-(S_1/S_0)^2$.
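As a rough illustration of these moments, the following sketch approximates $S_0$, $S_1$, and $S_2$ by the linear trapezoidal rule and derives MRT and VRT from them. The sampling times and concentrations are invented, the helper names are mine, and nothing is extrapolated beyond the last sample, so the numbers describe the truncated profile only.

```python
def trapz(x, y):
    """Linear trapezoidal rule for y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def moments(t, c):
    """Statistical moments of a concentration-time profile (no extrapolation)."""
    s0 = trapz(t, c)                                          # AUC
    s1 = trapz(t, [ti * ci for ti, ci in zip(t, c)])          # AUMC
    s2 = trapz(t, [ti ** 2 * ci for ti, ci in zip(t, c)])
    mrt = s1 / s0
    vrt = s2 / s0 - mrt ** 2
    return {"AUC": s0, "AUMC": s1, "MRT": mrt, "VRT": vrt}

# Hypothetical profile: time in h, concentration in arbitrary units
times = [0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 24]
concs = [0, 22.0, 35.5, 41.0, 42.5, 38.0, 31.5, 20.0, 12.5, 5.0, 0.3]
print(moments(times, concs))
```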

[image]

The intersection of $MRT$ (the vertical line) and $VRT$ (the horizontal line) is the distribution’s “Center of Gravity”. Print it and stick a needle through it. Makes a nice whizz wheel.⁴ :-D

I know only one approach trying to directly compare profiles based on moment-theory (the Kullback-Leibler information criterion).⁵ Never tried it with real data but I guess one might face problems with variability. Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%. No idea about VRT but in my experience its variability is extreme.
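Why AUMC is more demanding than AUC can be seen in a small sketch. It uses the usual tail formulas $C_{last}/\lambda_z$ for AUC and $t_{last}\cdot C_{last}/\lambda_z + C_{last}/\lambda_z^2$ for AUMC. The mono-exponential profile, the sampling times, and the assumption that $\lambda_z$ is known exactly (instead of being estimated from the last samples) are mine and chosen only for illustration.

```python
from math import exp

def trapz(x, y):
    """Linear trapezoidal rule for y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def extrapolated_fractions(t, c, lambda_z):
    """Extrapolated fractions of AUC and AUMC beyond the last sample."""
    auc_t = trapz(t, c)
    aumc_t = trapz(t, [ti * ci for ti, ci in zip(t, c)])
    t_last, c_last = t[-1], c[-1]
    auc_tail = c_last / lambda_z
    aumc_tail = t_last * c_last / lambda_z + c_last / lambda_z ** 2
    return auc_tail / (auc_t + auc_tail), aumc_tail / (aumc_t + aumc_tail)

# Hypothetical i.v. bolus, C(t) = 100 * exp(-0.1 * t), sampled up to 24 h
k_el = 0.1
times = [0, 1, 2, 4, 8, 12, 24]
concs = [100 * exp(-k_el * t) for t in times]
frac_auc, frac_aumc = extrapolated_fractions(times, concs, lambda_z=k_el)
print(f"extrapolated AUC:  {100 * frac_auc:.1f}%")
print(f"extrapolated AUMC: {100 * frac_aumc:.1f}%")
```

With the same sampling schedule the extrapolated fraction of AUMC is several times the one of AUC, which fits the stricter range quoted above.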

❝ … so I thought normalizing T_max just made sense, […]. With that said, is the above stuff that I googled, wrong?

Yes, it is, because such a transformation of a discrete distribution (like t_max) is not allowed.
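To see why the observed t_max is discrete, consider the following sketch. It assumes a one-compartment model with first-order absorption, a fixed elimination rate constant, and an invented sampling schedule; all parameter values are hypothetical. The true t_max varies continuously with the absorption rate constant, whereas the observed one can only take values from the schedule.

```python
from math import exp, log

def conc(t, ka, ke):
    """One-compartment model, first-order absorption, dose/V scaled to 1."""
    return ka / (ka - ke) * (exp(-ke * t) - exp(-ka * t))

def true_tmax(ka, ke):
    """Analytical t_max of the model above."""
    return log(ka / ke) / (ka - ke)

def observed_tmax(schedule, ka, ke):
    """Sampling time with the highest concentration (what NCA reports)."""
    return max(schedule, key=lambda t: conc(t, ka, ke))

schedule = [0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8, 12]  # hypothetical, in h
ke = 0.2                                                  # hypothetical, 1/h
ka_values = [0.6 + 0.2 * i for i in range(15)]            # hypothetical, 1/h

true_values = [true_tmax(ka, ke) for ka in ka_values]
observed_values = [observed_tmax(schedule, ka, ke) for ka in ka_values]
print("distinct true t_max:    ", len(set(true_values)))
print("distinct observed t_max:", sorted(set(observed_values)))
```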

❝ […] I can now restate the current standard's hypothesis in a "more familiar (undergraduate-level)" form:

❝ $$H_0: \ln(\mu_T) - \ln(\mu_R) \notin \left [ \ln(\theta_1), \ln(\theta_2) \right ]\;\text{vs}\;H_1: \ln(\theta_1) < \ln(\mu_T) - \ln(\mu_R) < \ln(\theta_2)$$

Correct.
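In practice this interval hypothesis is assessed by two one-sided tests, which is operationally the same as checking whether the 100(1–2α)% confidence interval lies entirely within the limits. A minimal sketch, with a loud caveat: it uses a normal quantile because the standard library has no t-quantile, whereas a real evaluation uses the t-distribution with the degrees of freedom of the model. The function name, the point estimate, and the standard error are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

def tost_and_ci(log_diff, se, alpha=0.05, theta1=0.80, theta2=1.25):
    """Two one-sided tests and CI inclusion on the log scale (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    z_lower = (log_diff - log(theta1)) / se   # tests H01: difference <= ln(theta1)
    z_upper = (log(theta2) - log_diff) / se   # tests H02: difference >= ln(theta2)
    tost_pass = z_lower > z and z_upper > z
    # 100(1 - 2*alpha)% confidence interval, back-transformed to the ratio scale
    ci = (exp(log_diff - z * se), exp(log_diff + z * se))
    ci_pass = theta1 < ci[0] and ci[1] < theta2
    return tost_pass, ci_pass, ci

# Hypothetical estimates: T/R ratio of 0.95, standard error of 0.05 on the log scale
print(tost_and_ci(log(0.95), 0.05))               # alpha 0.05, i.e., a 90% CI
print(tost_and_ci(log(0.95), 0.05, alpha=0.025))  # alpha 0.025, i.e., a 95% CI
```

Both decisions always agree, and halving α widens the interval.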

❝ I now realize that I was actually using the old standard's hypothesis (whose null tested for bioequivalence, instead of the current standard's null for bioinequivalence) […]

Correct, again.

❝ […] regarding the old standard's hypothesis (whose null tested for bioequivalence), I was originally curious (although it may be a meaningless problem now, but I'm still curious) on how they bounded the family-wise error rate (FWER) if α=5% for each hypothesis test, since the probability of committing one or more type I errors when performing three hypothesis tests = 1 - (1-α)^3 = 1 - (1-0.05)^3 = 14.26% (if those three hypothesis tests were actually independent).

Here you err. We are not performing independent tests (which would call for Bonferroni’s adjustment or similar) but have to pass all tests. Hence, the IUT keeps the FWER at 5% or lower. Actually it is more conservative than the single tests themselves and will get more conservative the more the PK metrics differ. See the R code provided by Martin in this post for an example.
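The arithmetic behind this can be sketched in a few lines. Independence is assumed here only to keep the numbers simple (as in the 14.26% calculation quoted above); the power values of 0.90 and 0.95 are hypothetical. Without independence the probability of passing all tests is still bounded by the smallest single rejection probability.

```python
from math import prod

def prob_any_rejection(p_reject):
    """P(at least one test rejects), independent tests: the union of rejections."""
    return 1 - prod(1 - p for p in p_reject)

def prob_all_rejections(p_reject):
    """P(all tests reject), independent tests: the intersection of rejections."""
    return prod(p_reject)

alpha = 0.05

# Three true nulls, 'any rejection counts': the inflated 14.26%
fwer_union = prob_any_rejection([alpha] * 3)

# Three true nulls, but all three tests have to be passed
all_null = prob_all_rejections([alpha] * 3)

# Worst case for the patient: one metric is inequivalent (rejection probability
# alpha), the other two are equivalent with hypothetical power 0.90 and 0.95
one_null = prob_all_rejections([alpha, 0.90, 0.95])

print(fwer_union, all_null, one_null)
```

In every scenario with at least one inequivalent metric the probability of falsely concluding BE stays at or below α, and it drops further the more metrics are inequivalent.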

❝ The same question more importantly applied to β, since in the old standard's hypothesis (whose null tested for bioequivalence), "the consumer’s risk is defined as the probability (β) of accepting a formulation which is bioinequivalent, i.e. accepting H₀ when H₀ is false (Type II error)." (as quoted from page 212 of the same paper).

Correct but gone with the wind. See this post for how the BE requirements of the FDA evolved.

❝ Do you know how the FDA bounded the "global" α & β before 1984?

α was always 0.05. In the – very – early days of the CI-inclusion approach some people wrongly used a 95% CI, which actually implies α 0.025. As the name suggests, the FDA’s 80/20 rule required ≥80% power, i.e., β ≤20%. Nowadays post hoc (a posteriori, retrospective) power is irrelevant. Power (1–β) is important only in designing a study, where in most jurisdictions 80–90% (β 10–20%) is recommended. This allows me to answer your original question:

❝ ❝ ❝ What is the largest α & β allowed by FDA?

α 0.05 (since it is fixed in the method) and β is not assessed. It can be very high (i.e., low “power”) if the assumptions leading to the sample size were not realized in the study (e.g., larger deviation of T from R and/or higher variability than assumed, higher number of dropouts than anticipated). However, quoting ElMaestro:

Being lucky is not a crime.

On the other hand, a very high producer’s risk in designing a study is like gambling and against ICH E9:

The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed.

Hopefully such a protocol is rejected by the IEC.

❝ […] I am curious on "what kind of secret math technique" was happening behind-the-scenes that allowed 12 random-samples to be considered "good enough by FDA".

No idea. Rooted in Babylonian numbers? See also this post. For reference-scaling the FDA requires at least 24 subjects. A minimum sample size of 12 is recommended in all jurisdictions. IMHO, not based on statistics (large power for low variability and T/R close to 1) but out of the blue.
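How power depends on variability, the T/R-ratio, and the sample size can be sketched with a large-sample normal approximation for a balanced 2×2 crossover. This is only a rough guide: exact power needs the noncentral t-distribution, and the approximation is too optimistic for small samples like 12. The function name and all input values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def approx_power(cv, theta0, n, alpha=0.05, theta1=0.80, theta2=1.25):
    """Approximate TOST power in a balanced 2x2 crossover (normal approximation).

    cv: within-subject CV as a fraction, theta0: assumed T/R-ratio,
    n: total number of subjects.
    """
    sigma_w = sqrt(log(cv ** 2 + 1))   # within-subject SD on the log scale
    se = sigma_w * sqrt(2 / n)         # SE of the difference of log-means
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha)
    power = (nd.cdf((log(theta2) - log(theta0)) / se - z)
             - nd.cdf((log(theta1) - log(theta0)) / se + z))
    return max(0.0, power)

for cv in (0.10, 0.20, 0.30):
    for n in (12, 24, 40):
        print(f"CV {cv:.0%}, T/R 0.95, n {n}: power ~ {approx_power(cv, 0.95, n):.3f}")
```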

1. T_max is poor terminology. “T” denotes the absolute temperature in Kelvin whereas “t” stands for time. Hence, t_max should be used.
2. If we use PK modeling instead of NCA we indeed get a continuous distribution. So far I have never explored what it looks like. In many cases C_max/t_max is poorly fit.
3. Brunner E, Munzel U. The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small Sample Approximation. Biometrical J. 2000; 42(1): 17–25. doi:10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U.
4. Many, many years ago I wrote software for PK modeling (only NONLIN for mainframes was available in the early 1980s). The integrals for $\int_{t=0}^{t=\infty}$ are trivial. However, less so when we stop sampling at a certain time and/or we have more than one compartment. I didn’t trust my calculus and in the end I tried the whizz wheel. Funny way of proving things.
5. Pereira LM. Bioequivalence testing by statistical shape analysis. J Pharmacokin Pharmacodyn. 2007; 34(4): 451–84. doi:10.1007/s10928-007-9055-3.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
