victor
☆

Malaysia,
2019-11-16 21:57
(318 d 06:02 ago)

(edited by victor on 2019-11-16 22:09)
Posting: # 20813
Views: 4,490

## What is the largest α (Alpha) & β (Beta) allowed by FDA? [General Sta­tis­tics]

Hi everyone! I'm new to pharmacokinetics, and I'm wondering what is the largest α (Alpha) & β (Beta) allowed by FDA, for each of the three hypothesis tests illustrated below (with each α & β highlighted with red borders)?
Where:
• α = probability of committing a type I error (i.e. rejecting the null hypothesis when it is actually true).
• β = probability of committing a type II error (i.e. failing to reject the null hypothesis when it is actually false).
To elaborate: suppose I want to test whether the population distribution of an innovator drug (denoted $$\mathcal{A}$$), estimated by random-sampling 12 healthy people, is identical to the population distribution of a generic drug (denoted $$\mathcal{B}$$), by comparing 12 sample statistics (AUC, Cmax, Tmax) of $$\mathcal{A}$$ against those of $$\mathcal{B}$$, via three hypothesis tests that determine whether there are any differences in the population means of those three statistics; i.e.:
• Are both drugs' expected AUC equal? i.e., E[AUC of $$\mathcal{A}$$] = E[AUC of $$\mathcal{B}$$]
• Are both drugs' expected Cmax equal? i.e., E[Cmax of $$\mathcal{A}$$] = E[Cmax of $$\mathcal{B}$$]
• Are both drugs' expected Tmax equal? i.e., E[Tmax of $$\mathcal{A}$$] = E[Tmax of $$\mathcal{B}$$]
then what is the largest α & β allowed by FDA, for each of the three hypothesis tests?

P.S. if you spot any mistake in my illustration below, could you kindly inform me as well? ଘ(੭*ˊᵕˋ)੭* ̀ˋ

P.S. The following post didn't submit correctly, even though the preview for it was working. So, I decided to screenshot my question instead. Hope it is acceptable :)

Edit: Category changed; see also this post #1. Link to 643KiB 2,000px photo deleted and changed to a downscaled variant. [Helmut]
Helmut
★★★

Vienna, Austria,
2019-11-17 01:26
(318 d 02:34 ago)

@ victor
Posting: # 20814
Views: 3,749

## What do you want to achieve?

Hi Victor,

I tried to reconstruct your original post as well as I could. Since it was broken before the first “$$\mathcal{A}$$”, I guess you used a UTF-16 character whereas the forum is coded in UTF-8. Please don’t link to large images breaking the layout of the posting area and forcing us to scroll our viewport. THX.

I think that your approach has some flaws.
• You shouldn’t transform the profiles but the PK metrics AUC and Cmax.
• The Null hypothesis is bioinequivalence, i.e.,
$$H_0:\mu_T/\mu_R\notin \left [ \theta_1,\theta_2 \right ]\:vs\:H_1:\theta_1<\mu_T/\mu_R<\theta_2$$ where $$[\theta_1,\theta_2]$$ are the limits of the acceptance range. Testing for a statistically significant difference is futile (i.e., asking whether treatments are equal). We are interested in a clinically relevant difference $$\Delta$$. With the common 20% we get back-transformed $$\theta_1=1-\Delta,\:\theta_2=1/(1-\Delta)$$ or 80–125%.
• Nominal $$\alpha$$ is fixed by the regulatory agency (generally at 0.05). With low sample sizes and/or high variability the actual $$\alpha$$ can be substantially lower.
• Since you have to pass both AUC and Cmax (each tested at $$\alpha$$ 0.05) the intersection-union tests keep the familywise error rate at ≤0.05.
• For a given design, sample size, variability, and point estimate, the calculation of $$\alpha$$ and $$\beta$$ is straightforward. R code using the package PowerTOST at the end.
• tmax follows a discrete distribution and hence, should be assessed by a nonparametric test.
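As a quick sanity check of the back-transformation in the second bullet, a two-line base-R snippet (Δ = 20% is the common clinically relevant difference):

```r
# Back-transform the clinically relevant difference Delta into the
# multiplicative acceptance range [theta1, theta2] = [1 - Delta, 1/(1 - Delta)]
Delta  <- 0.20
theta1 <- 1 - Delta        # 0.80
theta2 <- 1 / (1 - Delta)  # 1.25
c(theta1, theta2)
```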

library(PowerTOST)
design <- "2x2x2" # for others, see known.designs()
n      <- 24
CV     <- 0.25
PE     <- 0.95
alpha  <- power.TOST(CV = CV, n = n, theta0 = 1.25, design = design)
beta   <- 1 - power.TOST(CV = CV, n = n, theta0 = PE, design = design)
cat("alpha =", alpha,
    "\nbeta  =", beta, "\n")

Gives
alpha = 0.04999527
beta  = 0.2608845

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
victor
☆

Malaysia,
2019-11-17 10:53
(317 d 17:07 ago)

(edited by victor on 2019-11-17 14:16)
@ Helmut
Posting: # 20815
Views: 3,734

## I'm seeking to understand the math behind our current regulation

» Hi Victor, I tried to reconstruct your original post as good as I could. Since it was broken before the first “$$\mathcal{A}$$”, I guess you used an UTF-16 character whereas the forum is coded in UTF-8.

Hi Helmut, Thanks for helping me out :)

Edit: after a quick experiment (click here to see screenshot), it seems that the “$$\mathcal{A}$$” I used was a UTF-8 character after all? ⊙.☉

» Please don’t link to large images breaking the layout of the posting area and forcing us to scroll our viewport. THX.

Noted, and thanks for downscaling my original image :)

» I think that your approach has same flaws.
• You shouldn’t transform the profiles but the PK metrics AUC and Cmax.

I see; I thought it would make sense for Tmax to also be transformed because of googling stuff like this:

coupled with the fact that the population distribution being analyzed looks a lot like a log-normal distribution; so I thought normalizing Tmax just made sense, since almost all distributions studied at the undergraduate level (e.g., the F-distribution used by ANOVA) are ultimately transformations of one or more standard normals. With that said, is the above stuff that I googled wrong?

»
• The Null hypothesis is bioinequivalence, i.e.,
» $$H_0:\mu_T/\mu_R\notin \left [ \theta_1,\theta_2 \right ]\:vs\:H_1:\theta_1<\mu_T/\mu_R<\theta_2$$ where $$[\theta_1,\theta_2]$$ are the limits of the acceptance range. Testing for a statistically significant difference is futile (i.e., asking whether treatments are equal). We are interested in a clinically relevant difference $$\Delta$$. With the common 20% we get back-transformed $$\theta_1=1-\Delta,\:\theta_2=1/(1-\Delta)$$ or 80–125%.

Thanks for enlightening me that I can now restate the current standard's hypothesis in a "more familiar (undergraduate-level)" form:
$$H_0: ln(\mu_T) - ln(\mu_R) \notin \left [ ln(\theta_1), ln(\theta_2) \right ]\:vs\:H_1: ln(\theta_1) < ln(\mu_T) - ln(\mu_R) < ln(\theta_2)$$

I now realize that I was actually using the old standard's hypothesis (whose null tested for bioequivalence, instead of the current standard's null for bioinequivalence), which had problems with their α & β (highlighted in red below, cropped from this paper), thus rendering my initial question pointless, because I was analyzing an old problem; i.e. before Hauck and Anderson's 1984 paper.
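For what it’s worth, the restated hypothesis is operationalized as two one-sided tests, equivalent to checking whether the 90% CI of the difference of the log-means lies within $$[ln(\theta_1), ln(\theta_2)]$$. A minimal base-R sketch with simulated data (a parallel design for simplicity; a real BE study would use the crossover ANOVA):

```r
set.seed(42)
# Hypothetical log-scale data, 12 subjects per arm (illustration only)
logT <- rnorm(12, mean = log(0.95), sd = 0.25)  # test formulation
logR <- rnorm(12, mean = log(1.00), sd = 0.25)  # reference formulation
# 90% CI of the difference of log-means = two one-sided tests at alpha 0.05
ci <- t.test(logT, logR, conf.level = 0.90)$conf.int
# Back-transform and compare with the 80-125% acceptance range
round(exp(ci), 4)
BE <- exp(ci[1]) >= 0.80 && exp(ci[2]) <= 1.25
BE
```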

»
• Nominal $$\alpha$$ is fixed by the regulatory agency (generally at 0.05). With low sample sizes and/or high variability the actual $$\alpha$$ can be substantially lower.
• Since you have to pass both AUC and Cmax (each tested at $$\alpha$$ 0.05) the intersection-union tests keep the familywise error rate at ≤0.05.

With that said, regarding the old standard's hypothesis (whose null tested for bioequivalence), I was originally curious (although it may be a meaningless problem now, but I'm still curious) on how they bounded the family-wise error rate (FWER) if α=5% for each hypothesis test, since the probability of committing one or more type I errors when performing three hypothesis tests = 1 - (1-α)^3 = 1 - (1-0.05)^3 = 14.26% (if those three hypothesis tests were actually independent).
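The arithmetic above is trivial to reproduce in R:

```r
# Probability of at least one type I error in k INDEPENDENT tests at level alpha
alpha <- 0.05
k     <- 3
fwer  <- 1 - (1 - alpha)^k
fwer  # 0.142625, i.e., 14.26%
```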

The same question more importantly applied to β, since in the old standard's hypothesis (whose null tested for bioequivalence), "the consumer’s risk is defined as the probability (β) of accepting a formulation which is bioinequivalent, i.e. accepting H0 when H0 is false (Type II error)." (as quoted from page 212 of the same paper).

Do you know how FDA bounded the "global" α & β before 1984? Because I am curious on "what kind of secret math technique" was happening behind-the-scenes that allowed 12 random-samples to be considered "good enough by the FDA"; i.e.
• How to calculate the probability of committing one or more type I errors when performing three hypothesis tests, when the null was tested for bioequivalence (before 1984)?
• How to calculate the probability of committing one or more type II errors when performing three hypothesis tests, when the null was tested for bioequivalence (before 1984)?

ଘ(੭*ˊᵕˋ)੭* ̀ˋ
Helmut
★★★

Vienna, Austria,
2019-11-17 14:35
(317 d 13:25 ago)

@ victor
Posting: # 20816
Views: 3,796

Hi Victor,

» I did use UTF-8 though, because the following HTML works, and I could save (and reopen) it using my Windows 10's notepad.exe under UTF-8 encoding; but […]

Dunno. The mysteries of HTML/CSS/php/MySQL.

» » Please don’t link to large images breaking the layout of the posting area and forcing us to scroll our viewport. THX.
»
» Noted, and thanks for downscaling my original image :)

Sorry if the downscaled image shows poor legibility. The one in full resolution here.

» I thought it would make sense for Tmax to also be transformed because of googling stuff like this:
»

Aha! A presentation by Mr Concordet of 2004. Heteroscedasticity refers to more than one distribution. A single distribution might be skewed; I guess that is what was meant. When we apply a parametric method (ANOVA, t-tests), one of the assumptions – as you correctly stated in your graph – is that the residual errors follow a normal distribution. It makes sense to assume that PK metrics like AUC and Cmax follow a log-normal distribution since concentrations are bound to $$\mathbb{R}^+$$ (negative ones don’t exist and zero is excluded). However, even if this assumption were wrong, the only important thing is that the model’s residuals are approximately normal, i.e., $$\epsilon \sim \mathcal{N}(0,\sigma^2)$$. It should be noted that the t-test is fairly robust against heteroscedasticity but very sensitive to unequal sequences (crossover) and group sizes (parallel design). That’s why the FDA recommends in any case Satterthwaite’s approximation of the degrees of freedom.

tmax1 is yet another story. The distribution strongly depends on the study’s sampling schedule but is definitely discrete. Granted, the underlying one is likely continuous2 but we simply don’t have an infinite number of samples in NCA. A log-transformation of a discrete distribution is simply not allowed. Hence, what is stated in this slide is wrong. Many people opt for one of the variants of the Wilcoxon test to assess the difference. Not necessarily correct. The comparison of the shift in locations is only valid if the distributions are equal. If not, one has to opt for the Brunner-Munzel test3 (available in the R package nparcomp).

» … coupled with the fact that the population distribution that is being analyzed looks a lot like a Log-normal distribution;

Wait a minute. It is possible to see concentration-time profiles as distributions resulting from a stochastic process. The usual statistical moments are:$$S_0=\int f(x)dx$$ $$S_1=\int x\cdot f(x)dx$$ $$S_2=\int x^2\cdot f(x)dx$$ where in PK $$x=t$$ and $$f(x)=C$$ leads to $$S_0=AUC$$ and $$S_1=AUMC$$.
AFAIK, there is no particular term for $$S_2$$ in PK though it is – rarely – used to calculate the “Variance of Residence Times” as $$VRT=S_2/S_0-(S_1/S_0)^2$$.
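These moments have closed forms for a hypothetical one-compartment iv-bolus profile $$C(t)=C_0\,\textrm{e}^{-kt}$$ ($$S_0=C_0/k$$, $$S_1=C_0/k^2$$, $$S_2=2C_0/k^3$$, hence $$MRT=1/k$$ and $$VRT=1/k^2$$), which makes for an easy numerical check in base R:

```r
C0 <- 100; k <- 0.1   # hypothetical C0 and elimination rate constant
C  <- function(t) C0 * exp(-k * t)
S0 <- integrate(function(t)       C(t), 0, Inf)$value  # AUC  = C0/k   = 1000
S1 <- integrate(function(t) t   * C(t), 0, Inf)$value  # AUMC = C0/k^2 = 10000
S2 <- integrate(function(t) t^2 * C(t), 0, Inf)$value  #      2*C0/k^3 = 200000
MRT <- S1 / S0                # 1/k   = 10
VRT <- S2 / S0 - (S1 / S0)^2  # 1/k^2 = 100
c(MRT = MRT, VRT = VRT)
```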

The intersection of $$MRT$$ (the vertical line) and $$VRT$$ (the horizontal line) is the distribution’s “Center of Gravity”. Print it and stick a needle through it. Makes a nice whizz wheel.4

I know only one approach trying to directly compare profiles based on moment-theory (the Kullback-Leibler information criterion).5 Never tried it with real data but I guess one might face problems with variability. Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%. No idea about VRT but in my experience its variability is extreme.

» … so I thought normalizing Tmax just made sense, […]. With that said, is the above stuff that I googled, wrong?

Yes it is because such a transformation for a discrete distribution (like tmax) is not allowed.

» […] I can now restate the current standard's hypothesis in a "more familiar (undergraduate-level)" form:
» $$H_0: ln(\mu_T) - ln(\mu_R) \notin \left [ ln(\theta_1), ln(\theta_2) \right ]\:vs\:H_1: ln(\theta_1) < ln(\mu_T) - ln(\mu_R) < ln(\theta_2)$$
Correct.

» I now realize that I was actually using the old standard's hypothesis (whose null tested for bioequivalence, instead of the current standard's null for bioinequivalence) […]

Correct, again.

» […] regarding the old standard's hypothesis (whose null tested for bioequivalence), I was originally curious (although it may be a meaningless problem now, but I'm still curious) on how they bounded the family-wise error rate (FWER) if α=5% for each hypothesis test, since the probability of committing one or more type I errors when performing three hypothesis tests = 1 - (1-α)^3 = 1 - (1-0.05)^3 = 14.26% (if those three hypothesis tests were actually independent).

Here you err. We are not performing independent tests (which would call for Bonferroni’s adjustment or similar) but have to pass all tests. Hence, the IUT keeps the FWER at 5% or lower. Actually it is more conservative than the single tests themselves and will get more conservative the more the PK metrics differ. See the R code provided by Martin in this post for an example.
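This can be illustrated with a small Monte-Carlo sketch (hypothetical correlated one-sided test statistics, each single test of size 0.05; the IUT rejects only if all tests reject):

```r
set.seed(1)
n    <- 1e6
crit <- qnorm(0.95)  # one-sided critical value for alpha = 0.05
for (rho in c(0, 0.5, 1)) {
  z1   <- rnorm(n)                               # statistic for metric 1
  z2   <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # correlated statistic, metric 2
  fwer <- mean(z1 > crit & z2 > crit)            # IUT: both must reject
  cat("rho =", rho, " FWER =", fwer, "\n")
}
# The FWER runs from alpha^2 = 0.0025 (rho = 0) up to alpha = 0.05 (rho = 1),
# never above the nominal level
```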

» The same question more importantly applied to β, since in the old standard's hypothesis (whose null tested for bioequivalence), "the consumer’s risk is defined as the probability (β) of accepting a formulation which is bioinequivalent, i.e. accepting H0 when H0 is false (Type II error)." (as quoted from page 212 of the same paper).

Correct but gone with the wind. See this post how the BE requirements of the FDA evolved.

» Do you know how the FDA bounded the "global" α & β before 1984?

α was always 0.05. In the – very – early days of the CI-inclusion approach some people wrongly used a 95% CI, which actually implies α 0.025. As the name tells, the FDA’s 80/20 rule required ≥80% power, i.e., β ≤20%. Nowadays post hoc (a posteriori, retrospective) power is irrelevant. Power (1–β) is important only in designing a study, where in most jurisdictions 80–90% (β 10–20%) is recommended. This allows us to answer your original question:

» » » What is the largest α & β allowed by FDA?

α 0.05 (since it is fixed in the method) and β is not assessed. It can be very high (i.e., low “power”) if the assumptions leading to the sample size were not realized in the study (e.g., larger deviation of T from R and/or higher variability than assumed, higher number of dropouts than anticipated). However, quoting ElMaestro:

Being lucky is not a crime.

On the other hand, a very high producer’s risk in designing a study is like gambling and against ICH E9:

The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed.

Hopefully such a protocol is rejected by the IEC.

» […] I am curious on "what kind of secret math technique" was happening behind-the-scenes that allowed 12 random-samples to be considered "good enough by FDA".

No idea. Rooted in Babylonian numbers? See also this post. For reference-scaling the FDA requires at least 24 subjects. A minimum sample size of 12 is recommended in all jurisdictions. IMHO, not based on statistics (large power for low variability and T/R close to 1) but out of the blue.

1. Tmax is poor terminology. “T” denotes the absolute temperature in Kelvin whereas “t” stands for time. Hence, tmax should be used.
2. If we use PK modeling instead of NCA we get indeed a continuous distribution. I never explored how it looks so far. In many cases Cmax/tmax is poorly fit.
3. Brunner E, Munzel U. The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small Sample Approximation. Biometrical J. 2000; 42(1): 17–25.
doi:10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U.
4. Many, many years ago I wrote software for PK modeling (only NONLIN for mainframes was available in the early 1980s). The integrals for $$\int_{t=0}^{t=\infty}$$ are trivial. However, less so when we stop sampling at a certain time and/or we have more than one compartment. I didn’t trust in my calculus and at the end I tried the whizz wheel. Funny way of proofing things.
5. Pereira LM. Bioequivalence testing by statistical shape analysis. J Pharmacokin Pharmacodyn. 2007; 34(4): 451–84. doi:10.1007/s10928-007-9055-3.

victor
☆

Malaysia,
2019-11-18 08:26
(316 d 19:34 ago)

(edited by victor on 2019-11-18 08:59)
@ Helmut
Posting: # 20817
Views: 3,572

Since it will take me time to properly digest and reply to all the details you just shared (especially on IUT, which is new to me and which I think might be the missing link I was searching for to deal with dependencies in multiple hypothesis testing), I decided to just respond to specific concepts, one at a time :)

Also, thanks for confirming which concepts I grasped correctly, because as someone new to pharmacokinetics, it definitely helps me eliminate logic errors
(｡♥‿♥｡)

For now, I'd like to dig a little deeper to understand all the logic errors(?) that I made regarding tmax.

»
1. Tmax is poor terminology. “T” denotes the absolute temperature in Kelvin whereas “t” stands for time. Hence, tmax should be used.

Good point :) I initially chose to use Tmax because I'm used to writing random variables in uppercase, to help myself catch for syntax errors if I ever wrote something silly like E[t], when t is not a random variable, etc.

With that said...(I rearranged your reply below to fit the flow better)

» » … coupled with the fact that the population distribution that is being analyzed looks a lot like a Log-normal distribution; so I thought normalizing Tmax just made sense, since almost all distributions studied in undergraduate (e.g. F-distribution used by ANOVA) are ultimately transformations of one or more standard normals.
»
» Yes it is because such a transformation for a discrete distribution (like tmax) is not allowed.
»
» The distribution strongly depends on the study’s sampling schedule but is definitely discrete. Granted, the underlying one is likely continuous2 but we simply don’t have an infinite number of samples in NCA. A log-transformation of a discrete distribution is simply not allowed. Hence, what is stated in this slide is wrong. Many people opt for one of the variants of the Wilcoxon test to assess the difference. Not necessarily correct. The comparison of the shift in locations is only valid if the distributions are equal. If not, one has to opt for the Brunner-Munzel test3 (available in the R package nparcomp).

Does it mean that the Nyquist criterion is not satisfied in pharmacokinetics (in terms of sampling interval)? Couldn't we just design our study’s sampling schedule to ensure that Nyquist criterion is satisfied so that we can perfectly reconstruct the original continuous-time function from the samples?

» Wait a minute. It is possible to see concentration-time profiles as distributions resulting from a stochastic process. […] I know only one approach trying to directly compare profiles based on moment-theory (the Kullback-Leibler information criterion).5

Yes, I originally viewed the population distribution that is being analyzed as a stochastic process, but I didn't dive in too deep to check if the moment-generating function of a log-normal distribution "matches" the (concentration-time profile?) model used in pharmacokinetics (because I'm still learning about the models used in pharmacokinetics), so it was more of a "visually looks a lot like a log-normal distribution" on my part :p

With that said, this specific fact you shared intrigues me...

» Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%. No idea about VRT but in my experience its variability is extreme.

I'm familiar with Taylor-series approximations, Fourier-series approximations, etc., but never really thought about how the error terms of a moment-generating function are controlled if I use, for example, a log-normal distribution to describe the concentration-time profile. It might be obvious (though I don't recall learning it in undergraduate), but I'd rather spend my time learning about IUT instead :p, so I was wondering if you have any good reference materials on how to correctly use the moment-generating function to model a real-life distribution (e.g., a concentration-time profile)?

Fun fact: as I was briefly reading about Kullback–Leibler divergence, I thought it looked familiar, then I realized I first encountered it when reading about IIT

Thanks once again for all your amazing replies!
ଘ(੭*ˊᵕˋ)੭* ̀ˋ
Helmut
★★★

Vienna, Austria,
2019-11-18 15:09
(316 d 12:51 ago)

@ victor
Posting: # 20821
Views: 3,530

Hi Victor,

» » Edit: after a quick experiment (click here to see screenshot), it seems that the “$$\mathcal{A}$$” I used was a UTF-8 character after all? ⊙.☉

Correct, and the mystery is resolved.
bin 11110000 10011101 10010011 10010000
hex F09D9390
When I paste the character it is shown in the preview but not in the post. The field of the database-table is of type utf8_general_ci, supporting only characters with a length of up to 3 bytes, whereas yours has 4. That’s it. In the development forum (in German, sorry) we realized that changing the type to utf8mb4_general_ci (whether of the field, the table, or the entire DB) alone cannot resolve the issue. It requires rewriting all parts of the scripts handling the php/MySQL connection. Not easy and not my top priority.

My pleasure.

» Since it will take me time to properly digest and reply all the details you just shared (especially on IUT, because IUT is new to me, and I think IUT might be the missing link I was searching for to deal with dependencies in multiple hypothesis testing), …

It’s not that complicated. Let’s explore the plot:
We have three tests. The areas give their type I errors. Since we perform all at the same level, the areas are identical. The Euclidean distance between the centers gives the correlation of the PK metrics (here they are identical as well). The FWER is given by the area of the intersection, which in any case will be ≤ the nominal α.

In reality the correlation of AUC0–∞ (green) with AUC0–t (blue) is higher than the correlation of both with Cmax (red). If we tested only the AUCs, the FWER would be given again by the intersection, which is clearly lower than the individual type I errors. If we add Cmax, the FWER decreases further.

Unfortunately the correlations are unknown. See this post for an example with two metrics and a reference how to deal with comparisons of multiple PK metrics.
In PowerTOST two simultaneous tests are implemented. Say we have a CV of 0.25 for both, a sample size of 28, and the highest possible correlation of 1.

library(PowerTOST)
# TIE of one test
power.TOST(CV = 0.25, n = 28, theta0 = 1.25)
# [1] 0.04999963
# TIE of two tests
power.2TOST(CV = c(0.25, 0.25), n = 28, theta0 = c(1.25, 1.25),
            rho = 1, nsims = 1e6)
# [1] 0.049928

Close match (based on simulations) and no inflation of the TIE. If ρ is lower, the test will get substantially more conservative.

» » Tmax is poor terminology. “T” denotes the absolute temperature in Kelvin whereas “t” stands for time. Hence, tmax should be used.
» I initially chose to use Tmax because I'm used to writing random variables in uppercase, …

Yep, that’s fine. Unfortunately “Tmax” is used in some regulatory documents… When I review papers, I have a standard rant pointing that difference out, followed by “regulations ≠ science”.

» » The distribution strongly depends on the study’s sampling schedule but is definitely discrete. […] A log-transformation for discrete distributions is simply not allowed.
»
» Does it mean that the Nyquist criterion is not satisfied in pharmacokinetics (in terms of sampling interval)?

Dunno. Sorry.

» Couldn't we just design our study’s sampling schedule to ensure that Nyquist criterion is satisfied so that we can perfectly reconstruct the original continuous-time function from the samples?

That’s wishful thinking. People have tried a lot with D-optimal designs in PK. What I do is try to set up a model and explore different sampling schemes based on the variance inflation factor. Regrettably, it rarely works. Modeling absorption is an art rather than a science – especially with lag-times (delayed onset of absorption due to gastric emptying, gastric-resistant coatings, …). What if the model suggests drawing fifty blood samples (to “catch” Cmax/tmax in the majority of subjects) and you are limited to twenty? It’s always a compromise.

» […] I didn't dive in too deep to check if the moment-generating function of a log-normal distribution "matches" the (concentration-time profile?) model used in pharmacokinetics (because I'm still learning about the models used in pharmacokinetics), so it was more of a "visually looks a lot like a log-normal distribution" on my part :p

Don’t fall into the trap of visual similarity. The fact that a concentration profile after an oral dose looks similar to a log-normal distribution is a mere coincidence. In a two-compartment model (right example: distribution is three times faster than elimination) the center of gravity is outside the profile; my “whizz wheel proof” would not work any more. Or, even more extreme, an intravenous dose…

Of note, the mean residence time $$MRT=\frac{AUMC}{AUC}$$ is nice because we can compare different models (say, a one- with a two-compartment model). Independent of the model, after MRT ~⅔ of the drug has been eliminated. For years I have tried to educate clinicians to abandon half-lives (which are less informative) but old beliefs die hard (see there, slides 24–28).
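The “~⅔” can be verified for the simplest case, a hypothetical one-compartment iv bolus, where $$MRT=1/k_{el}$$ and the fraction eliminated by $$t=MRT$$ is $$1-\textrm{e}^{-1}\approx 0.632$$ regardless of the rate constant:

```r
k   <- 0.2    # hypothetical elimination rate constant (1/h)
MRT <- 1 / k  # one-compartment iv bolus: MRT = 1/kel
frac_eliminated <- 1 - exp(-k * MRT)
frac_eliminated  # 0.6321206 -- roughly 2/3, independent of k
```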

If you want to dive into the Kullback-Leibler divergence, note that any distributions can be compared.

The fact that we log-transform AUC and Cmax in BE has three reasons:
1. The starting point: We need additive effects in the model.
2. Empiric evidence that both PK metrics are skewed to the right. However, as said before only the model’s residuals have to be approximately normal. Hence, in principle any transformation pulling the right tail towards the median would do the job.
3. The theoretical justification: The basic equation of PK is $$AUC=\frac{f\cdot D}{CL}$$. We are interested in comparing the relative bioavailability of Test ($$f_T$$) with the one of Reference ($$f_R$$) as $$\frac{f_T}{f_R}$$. This gives a set of two equations with two knowns ($$AUC_T,AUC_R$$) and four unknowns $$(D_T,D_R,CL_T,CL_R)$$. Since in most jurisdictions a correction for actual content is not allowed, we have to assume $$D_T=D_R$$. Clearance is a property of the drug not the formulation and therefore, we assume further $$CL_T=CL_R$$. Then we cancel these four variables out from the set of equations and obtain finally $$\frac{f_T}{f_R}=\frac{AUC_T}{AUC_R}$$. Mission accomplished but it comes with a price. If we face between-occasion variability of clearance it goes straight into the residual variance. That’s the reason why we need high sample sizes for highly variable drugs – even if the absorption (the property of the formulation) is similar and shows low variability.
You see that the original model is multiplicative (based on ratios) and with the log-transformation we get the additive one we need. Hence, the selection of the transformation is not arbitrary.

Not sure what the current state of affairs is, but in the past the Malaysian authority preferred the log10-transformation over loge. Not a particularly good idea, since PK is based on exponential functions. Furthermore, it makes our life miserable. The coefficient of variation based on the residual error of log-transformed data is given as $$CV=\sqrt{\textrm{e}^{MSE}-1}$$. That’s the only one you find in textbooks and guidelines. If one uses a log10-transformation, the appropriate formula is $$CV=\sqrt{10^{\textrm{log}_{e}(10)\cdot MSE}-1}$$. I have seen wrong sample size estimations where the former was used instead of the latter.
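The two formulas are easy to reconcile numerically: on the log10 scale the MSE shrinks by a factor of $$(\textrm{log}_{e}10)^2$$, and the second formula then returns exactly the same CV as the textbook formula on the loge scale:

```r
MSE.ln  <- 0.0625               # hypothetical MSE from ln-transformed data
MSE.l10 <- MSE.ln / log(10)^2   # the same analysis on the log10 scale
CV.ln   <- sqrt(exp(MSE.ln) - 1)             # textbook formula (loge)
CV.l10  <- sqrt(10^(log(10) * MSE.l10) - 1)  # correct formula for log10
c(CV.ln, CV.l10)  # identical, ~0.254
```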

» » Les Benet once stated (too lazy to search where and when) that for a reliable estimate of AUC one has to sample in such a way that the extrapolated fraction is 5–20%. For AUMC one would need 1–5%.
»
» I'm familiar with Taylor Series Approximations, Fourier series approximation, etc. but never really thought about how the error terms of a moment-generating function is controlled if I use, for example, a log-normal distribution to describe the concentration-time profile

Good luck. Not that difficult.

» … so I was wondering if you have any good reference materials for learning about how to correctly use the moment-generating function to model a distribution in real-life? (e.g. concentration-time profile)

(1) No idea and (2) the approach of seeing a PK profile as a distribution is pretty exotic.

victor
☆

Malaysia,
2019-11-18 20:16
(316 d 07:43 ago)

(edited by victor on 2019-11-18 20:49)
@ Helmut
Posting: # 20824
Views: 3,433

Wow! Yet another awesome answer (and a big round of applause for solving the UTF-8 mystery)!

Honestly, I feel like a caveman who has been gifted a powerful handphone (by you), equipped with GPS, google map, etc. to help me navigate the forest of pharmacokinetics, while I'm still looking at bushes (not even trees yet!) and using that handphone as a mirror

Thanks for all the great pointers, keywords, and properties to look for :)
Especially the idea behind IUT, because it's my first time hearing that correlation has such a geometric interpretation!
(｡♥‿♥｡)

» We have three tests. The areas give their type I errors. Since we perform all at the same level, the areas are identical. The Euclidean distance between centers give the correlation of PK metrics (here they are identical as well). The FWER is given by the area of the intersection which in any case will be ≤ the nominal α.
»
» In reality the correlation of AUC0–∞ (green) with AUC0–t (blue) is higher than the correlation of both with Cmax (red). If we would test only the AUCs, the FWER would by given again by the intersection which is clearly lower than the individual type I errors. If we add Cmax, the FWER decreases further.

Now I'm even MORE looking forward to studying IUT in detail, because it reminds me of when I first learned that I could use SVD to geometrically visualize the error ellipsoid of a covariance matrix. I found it beautiful :)

Thanks Helmut, for all your help
(*ˊᗜˋ*)/ᵗᑋᵃᐢᵏ ᵞᵒᵘ*

P.S. I'm spending time reading on IUT now, hence the short reply; but I thought I'd end with a nice little picture for memory :p
Helmut
★★★

Vienna, Austria,
2019-11-19 12:01
(315 d 15:59 ago)

@ victor
Posting: # 20827
Views: 3,370

## Books & intersection-union

Hi Victor,

» Honestly, I feel like a caveman […]

C’mon! Your skills in maths are impressive. If you want to dive deeper into the matter:
1. Chow SC, Liu JP. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: CRC Press; 3rd ed. 2009.
2. Patterson SD, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd ed. 2016.
3. Hauschke D, Steinijans VW, Pigeot I. Bioequivalence Studies in Drug Development. Chichester: John Wiley; 2007.
4. Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press; 3rd ed. 2015.
5. Julious SA. Sample Sizes for Clinical Trials. Boca Raton: CRC Press; 2010.
6. Senn S. Cross-over Trials in Clinical Research. Chichester: John Wiley; 2nd ed. 2002.
7. Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton: CRC Press; 2003.
The last one is demanding but contains a proof of the intersection-union principle in Chapter 7 (Multisample tests for equivalence). Excerpt (he uses $$H$$ for the Null and $$K$$ for the alternative hypothesis, respectively):

The proof of the result is almost trivial, at least if one is willing to adopt some piece of the basic formalism customary in expositions of the abstract theory of statistical hypothesis testing methods. […] The condition we have to verify, reads […] as follows:
$$E_{(\eta_1,\ldots,\eta_q)}(\phi)\leq\alpha\;\textrm{for all}\;(\eta_1,\ldots,\eta_q)\in H\tag{7.3}$$ where $$E_{(\eta_1,\ldots,\eta_q)}(\cdot)$$ denotes the expected value computed under the parameter constellation $$(\eta_1,\ldots,\eta_q)$$. […]
In order to apply the result to multisample equivalence testing problems, let $$\theta_i$$ be the parameter of interest (e.g., the expected value) for the ith distribution under comparison, and require of a pair $$(i,j)$$ of distributions equivalent to each other that the statement $$K_{(i,j)}:\,\rho(\theta_i,\theta_j)<\epsilon,\tag{7.4}$$ holds true with $$\rho(\cdot,\cdot)$$ denoting a suitable measure of distance between parameters. Suppose furthermore that for each $$(i,j)$$ a test $$\phi_{(i,j)}$$ of $$H_{(i,j)}:\,\rho(\theta_i,\theta_j)\geq \epsilon$$ versus $$K_{(i,j)}:\,\rho(\theta_i,\theta_j)< \epsilon$$ is available whose rejection probability is $$\leq \alpha$$ at any point $$(\theta_1,\ldots,\theta_k)$$ in the full parameter space such that $$\rho(\theta_i,\theta_j)\geq \epsilon$$. Then, by the intersection-union principle, deciding in favour of “global equivalence” if and only if equivalence can be established for all $$\binom{k}{2}$$ possible pairs, yields a valid level-$$\alpha$$ test for $$H:\,\underset{i<j}{\max}\{\rho(\theta_i,\theta_j)\}\geq \epsilon\;\textrm{vs.}\;K:\,\underset{i<j}{\max}\{\rho(\theta_i,\theta_j)\}<\epsilon\tag{7.5}$$
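For what it's worth, here is how (7.4)/(7.5) can be instantiated for the familiar two-treatment average-bioequivalence setting (my own reading, not from the book): take k = 2, let θT and θR be the log-scale treatment means, use the absolute difference as the distance, and set ε = log 1.25:

```latex
% (7.4)/(7.5) specialised to average bioequivalence with k = 2
\rho(\theta_T,\theta_R)=\left|\theta_T-\theta_R\right|,
\qquad \epsilon=\log 1.25
% with only \binom{2}{2}=1 pair, (7.5) reduces to the familiar TOST hypotheses:
H:\;\left|\theta_T-\theta_R\right|\ge\log 1.25
\quad\textrm{vs.}\quad
K:\;\left|\theta_T-\theta_R\right|<\log 1.25
```

So with a single pair the intersection-union machinery collapses to one TOST comparison; in BE practice the intersection instead runs over the PK metrics (AUC, Cmax, …), each tested this way.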

» I thought I'd end with a nice little picture for memory :p

Nice picture! Do you know Anscombe’s quartet?
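In case a concrete check helps later readers: the quartet's four data sets (numbers transcribed from Anscombe, 1973) really do share the same means and correlation to two or three decimals. A minimal sketch:

```python
import numpy as np

# Anscombe's quartet: sets 1-3 share the same x values, set 4 differs
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
xs = [x123, x123, x123, x4]

stats_ = []
for x, y in zip(xs, ys):
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation
    stats_.append((x.mean(), round(y.mean(), 2), round(r, 3)))
    print(stats_[-1])
```

All four print mean(x) = 9.0, mean(y) = 7.5, and r ≈ 0.816, despite looking completely different when plotted, which is exactly the point.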

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
victor
☆

Malaysia,
2019-11-22 01:28
(313 d 02:32 ago)

(edited by victor on 2019-11-22 02:18)
@ Helmut
Posting: # 20857
Views: 3,301

## My progress on IUT so far

» » Honestly, I feel like a caveman […]
» C’mon! Your skills in maths are impressive.

Thanks Helmut, I appreciate it a lot.
(*^‿^*)
I'm also grateful that you shared that excerpt (more on this later, because Chapter 7 wasn't available on books.google.com).

Thankfully, I found enough free time to analyze the following IUT theorem, using my notes from studying Theorem 8.3.4 on page 122 of these lecture notes. (I don't have access to any of your recommended textbooks yet, though.)

P.S. I intentionally used < instead of ≤ because I feel that in practice, αY will always be less than one. Besides that, my Gaussian level sets didn't scale correctly against my hand-drawn background, so their shapes are compromised; they should actually have the same covariance matrix.

I'm now thinking about all the cases where H0's covariance matrix differs from the true covariance matrix (e.g., how β would look) to see how the IUT really deals with dependencies. A preliminary glance seems to suggest that the above theorem is violated by the following cases, so I definitely need to think more deeply about what the theorem actually states (i.e., what it means for the IUT to be a level-α test of H0 versus H1) to truly understand how the following cases affect the "global" α and β:

With that said, now that I'm starting to get a feel for the IUT, I feel I'm getting closer to truly understanding the following facts you shared:
»
• Since you have to pass both AUC and Cmax (each tested at $$\alpha$$ 0.05) the intersection-union tests keep the familywise error rate at ≤0.05.
• We have three tests. The areas give their type I errors. Since we perform all at the same level, the areas are identical. […] The FWER is given by the area of the intersection which in any case will be ≤ the nominal α.
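A toy check of the first bullet (entirely my own sketch, not the thread's method: the two log-scale point estimates for Cmax and AUC are treated as independent normals with an assumed SE of 0.05, the t-quantile is hard-coded, Cmax sits exactly on the upper BE margin — the worst-case null — and AUC is truly equivalent):

```python
import numpy as np

rng = np.random.default_rng(7)
n_sim = 50_000
se = 0.05       # assumed SE of each log-scale point estimate
t_crit = 1.717  # ~ t_{0.95, df=22}, i.e. a 90% CI in a 24-subject 2x2 study
lo, hi = np.log(0.8), np.log(1.25)

# worst-case null: Cmax exactly on the upper margin, AUC truly equivalent
mu = np.array([np.log(1.25), 0.0])
est = mu + se * rng.standard_normal((n_sim, 2))  # crude: independent estimates

# TOST per metric: pass iff the 90% CI lies entirely inside (lo, hi)
passed = (est - t_crit * se > lo) & (est + t_crit * se < hi)
fwer = passed.all(axis=1).mean()
print(round(fwer, 4))
```

Only about 4 % of the simulated studies pass both TOSTs. Positive correlation between the metrics (ignored here) would push this toward the Cmax-alone pass rate, but never above α, since passing both can never be more likely than passing the margin-sitting metric alone.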

As of now though, I haven't managed to see the "new" geometric interpretation of correlation (yet), so the following facts are still not within my grasp:

» The FWER gets more conservative the more the PK metrics differ.
» The Euclidean distance between centers gives the correlation of the PK metrics (here they are identical as well). The FWER is given by the area of the intersection which in any case will be ≤ the nominal α.
»
» In reality the correlation of AUC0–∞ (green) with AUC0–t (blue) is higher than the correlation of both with Cmax (red). If we would test only the AUCs, the FWER would be given again by the intersection which is clearly lower than the individual type I errors. If we add Cmax, the FWER decreases further.

»
»

» The proof of the result is almost trivial, at least if one is willing to adopt some piece of the basic formalism customary in expositions of the abstract theory of statistical hypothesis testing methods. […] The condition we have to verify, reads […] as follows:
» $$E_{(\eta_1,\ldots,\eta_q)}(\phi)\leq\alpha\;\textrm{for all}\;(\eta_1,\ldots,\eta_q)\in H\tag{7.3}$$ where $$E_{(\eta_1,\ldots,\eta_q)}(\cdot)$$ denotes the expected value computed under the parameter constellation $$(\eta_1,\ldots,\eta_q)$$. […]»
»     In order to apply the result to multisample equivalence testing problems, let $$\theta_i$$ be the parameter of interest (e.g., the expected value) for the ith distribution under comparison, and require of a pair $$(i,j)$$ of distributions equivalent to each other that the statement $$K_{(i,j)}:\,\rho(\theta_i,\theta_j)<\epsilon,\tag{7.4}$$ holds true with $$\rho(\cdot,\cdot)$$ denoting a suitable measure of distance between parameters. Suppose furthermore that for each $$(i,j)$$ a test $$\phi_{(i,j)}$$ of $$H_{(i,j)}:\,\rho(\theta_i,\theta_j)\geq \epsilon$$ versus $$K_{(i,j)}:\,\rho(\theta_i,\theta_j)< \epsilon$$ is available whose rejection probability is $$\leq \alpha$$ at any point $$(\theta_1,\ldots,\theta_k)$$ in the full parameter space such that $$\rho(\theta_i,\theta_j)\geq \epsilon$$. Then, by the intersection-union principle, deciding in favour of “global equivalence” if and only if equivalence can be established for all $$\binom{k}{2}$$ possible pairs, yields a valid level-$$\alpha$$ test for $$H:\,\underset{i<j}{\max}\{\rho(\theta_i,\theta_j)\}\geq \epsilon\;\textrm{vs.}\;K:\,\underset{i<j}{\max}\{\rho(\theta_i,\theta_j)\}<\epsilon\tag{7.5}$$

»

I'm thinking that the excerpt you shared contains the crucial info for me to see the "new" geometric interpretation of correlation, because of the following statement:
» […] with $$\rho(\cdot,\cdot)$$ denoting a suitable measure of distance between parameters.

but I'm very confused by the excerpt's notations, because I couldn't find their corresponding notations in the aforementioned lecture notes (nor by googling); in particular:
【・ヘ・?】

Thanks in advance for the clarification

» Do you know Anscombe’s quartet?

Nope! Thanks for sharing :) This is the first time I've heard of Anscombe’s quartet, and I found it pretty interesting as a possible example for introducing other statistics (e.g. skewness and kurtosis).

For some reason though, my mind jumped to the Raven Paradox when reading about Anscombe’s quartet. Maybe because they both raise the question of what actually constitutes evidence for a hypothesis?
victor
☆

Malaysia,
2019-11-23 09:05
(311 d 18:55 ago)

(edited by victor on 2019-11-23 09:18)
@ victor
Posting: # 20863
Views: 3,207

## Update: Counterexamples

» P.S. I intentionally used < instead of ≤ because I feel that in practice, αY will always be less than one.

Note to self: refer to the first image below to see why we need to use ≤.

» […] But a preliminary glance seems to suggest that the above theorem is violated by the following cases, […]
Update: A more careful look, using a (correctly scaled) uniform sampling distribution for θ instead of a bivariate normal one, reveals the aforementioned violation: