Bioequivalence and Bioavailability Forum

Smitha
Junior

India,
2015-02-02 04:21

Posting: # 14344
Views: 12,797
 

 Study conduct in groups [Design Issues]

Dear all,
If a Bioequivalence study has to be conducted in 2 groups (say 20 volunteers each), does the FDA or EMA require that all 40 volunteers be recruited into the study before Group 1 Period 1 dosing? :confused:

If yes, please also provide the relevant reference(s).

Would appreciate responses to this query.

Thanks and regards,
Smitha
ElMaestro
Hero

Denmark,
2015-02-02 08:21

@ Smitha
Posting: # 14347
Views: 11,667
 

 Study conduct in groups

Hi Smitha,

» If a Bioequivalence study has to be conducted in 2 groups (say 20 volunteers each), does the FDA or EMA require that all 40 volunteers be recruited into the study before Group 1 Period 1 dosing? :confused:

That's an interesting question. At least, I never gave it any thought, but of course that doesn't mean much.
If I get you right, you are asking whether you need 40 signed ICFs before dosing your first subject. No guideline discusses it, I think.

I can't offer an answer, but I think I would play it safe. In a trial with 20 subjects and 2 backups you would ordinarily have all 22 ICFs signed before the first subject is dosed. If the backups are needed, the trial in a sense has two groups, right? The second group is rather small, but it serves the same purpose: getting the desired number of subjects.
Also, if you start a study when only 20 ICFs are available and you cannot execute group 2 because you are for some reason running out of eligibles, then all manner of GCP hell could ensue. Thus, I am leaning towards treating 20+20 like 20+2.

This is just one opinion and not a very qualified one. I hope others will chime in as I believe there might also be arguments for doing it the other way. So I will go make some popcorn and will watch this thread with interest.

if (3) 4

Best regards,
ElMaestro

"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
Helmut
Hero
Vienna, Austria,
2015-02-02 13:08

@ Smitha
Posting: # 14351
Views: 11,877
 

 Study conduct in groups

Hi Smitha,

please search the forum before posting. We discussed this issue numerous times before.

The FDA stated in letters to applicants (nothing in any guidance!):

If all of the following criteria are met, it may not be necessary to test for group effects in the model:
  • the clinical study takes place at one site;
  • all study subjects have been recruited from the same enrollment pool;
  • all of the subjects have similar demographics; and
  • all enrolled subjects are randomly assigned to treatment groups at study outset
In this latter case, the appropriate statistical model would include only the factors Sequence, Period, Treatment and Subject(nested within sequence).


If you want to avoid trouble with a statistical model incorporating a group term (which might turn out to be significant by chance), I would go with 40 in your case.

No idea about the EMA.

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Smitha
Junior

India,
2015-02-04 04:30

@ Helmut
Posting: # 14370
Views: 11,658
 

 Study conduct in groups

Dear Helmut and ElMaestro,
Thank you for the responses.

Helmut- If we conduct the study in 2 or more groups, then we include group effect as part of the analysis.
The concern was more with recruiting the volunteers of all groups before dosing the Group 1 Period 1 volunteers. I would appreciate your opinion on this aspect.

Thanks,
Smitha
Helmut
Hero
Vienna, Austria,
2015-02-04 12:57

@ Smitha
Posting: # 14371
Views: 11,557
 

 Study conduct in groups

Hi Smitha,

» If we conduct the study in 2 or more groups, then we include group effect as part of the analysis.

Why?

Cheers,
Helmut Schütz
felipeberlinski
Junior

Brazil,
2015-02-04 22:36

@ Helmut
Posting: # 14372
Views: 11,492
 

 Study conduct in groups

Why?

Once I have questioned the same inclusion of group effect analysis and the biostat said that this effect could appear when you cannot give the treatments to whole sample size at the same time/day. (statement 1)

However group effect is not a big deal such as sequence, treatment or period effect. (statement 2)

Piggybacking on the initial question, could someone explain these "statements" to me?

Thanks!
ElMaestro
Hero

Denmark,
2015-02-04 23:58

@ felipeberlinski
Posting: # 14373
Views: 11,469
 

 Study conduct in groups

Hi Felipe,

» Once I have questioned the same inclusion of group effect analysis and the biostat said that this effect could appear when you cannot give the treatments to whole sample size at the same time/day. (statement 1)

Probably a misunderstanding. The biostatistician was probably thinking aloud: the need for dosing in groups arises when you cannot dose all subjects at the same time/day.

» However group effect is not a big deal such as sequence, treatment or period effect. (statement 2)

Those are not considered big deals either, but it also comes down to who's judging your dossier.
In (average) BE you test the null hypothesis:
Test differs from Ref.

In contrast, when you look at an ANOVA with all its impressive P-values they are test of other null hypotheses:
Test equals Ref.
Sequence TR equal Sequence RT.
Period 1 equals Period 2.
Subjects 1 equals Subject 2 equals .... subject N.
Group 1 equals Group 2.

All these tests are interesting and look like advanced science, but they don't deal so much with the important issue at hand.
The real BE null hypothesis just requires the two treatment effects as well as the residual variance. All three come from the model fit, not from the ANOVA, although you'll often see it explained wrongly in certain books or guidelines. Simply stated, you do not (well, should not) really need an ANOVA to judge bioequivalence.

Note an important pitfall related to the above: When you look at an ANOVA with a significant treatment effect, then you are rejecting the null hypothesis Test equals Ref, but this doesn't say anything definitive about your acceptance or rejection of the null hypothesis: Test differs from Ref.
The reason is the equivalence margin.
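A small numeric sketch of this pitfall (hypothetical numbers; a normal approximation is used instead of the t-distribution to keep it dependency-free): a 90% CI that excludes a ratio of 1 – i.e. a "significant" treatment effect – can still sit comfortably inside the 0.80–1.25 acceptance range, so BE is concluded anyway.

```python
from math import exp, log
from statistics import NormalDist

# Hypothetical example (normal approximation; a real analysis uses the
# t-distribution): point estimate T/R = 0.92, SE of the log-ratio = 0.02.
pe_log = log(0.92)
se_log = 0.02
z90 = NormalDist().inv_cdf(0.95)          # two one-sided 5% tests -> 90% CI

lo = exp(pe_log - z90 * se_log)
hi = exp(pe_log + z90 * se_log)
print(f"90% CI: {lo:.4f} - {hi:.4f}")     # ~0.8902 - 0.9508

# ANOVA-style null 'Test equals Ref': rejected, since the CI excludes 1 ...
print("treatment effect significant:", not (lo < 1.0 < hi))   # True
# ... yet BE is concluded, since the CI lies entirely within 0.80-1.25.
print("bioequivalent:", lo >= 0.80 and hi <= 1.25)            # True
```

The ANOVA's P-value answers the wrong question; for BE only the position of the CI relative to the margins matters.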

Finally, group is a between-factor. It does not affect the residual variance if the model is specified correctly.

I hope this helps.

if (3) 4

Best regards,
ElMaestro
Helmut
Hero
Vienna, Austria,
2015-02-05 00:49

@ felipeberlinski
Posting: # 14374
Views: 11,721
 

 Significant ≠ relevant

Hi Felipe,

» […] the biostat said that this effect could appear when you cannot give the treatments to whole sample size at the same time/day. (statement 1)

If he/she meant by “appear” to be “statistically significant”: False. Dive into your database of studies (performed in one group), arbitrarily code the first half of subjects with group=1 and the second with group=2. Run a model including a group term. If you set the significance limit to 0.05, I bet that you will see a significant “group effect” in ~1/20 of studies – although we know that the data originate from one group. That’s called “false positive” or in this particular case a statistical artifact.
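The bet can be checked with a toy simulation (hypothetical setup, plain two-sample z-tests rather than the crossover ANOVA): generate studies whose subjects truly form one homogeneous group, split each in half arbitrarily, and test the "group difference" at the 5% level. Roughly 1 in 20 splits comes out "significant".

```python
import random
from statistics import NormalDist, mean, stdev

# Toy simulation (hypothetical, not PowerTOST): studies truly performed in ONE
# group, arbitrarily split in half, 'group difference' tested at alpha = 0.05
# with a two-sample z-test (normal approximation to keep this stdlib-only).
random.seed(42)
norm = NormalDist()
n_studies, n_per_group = 2000, 20
false_pos = 0

for _ in range(n_studies):
    subjects = [random.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
    g1, g2 = subjects[:n_per_group], subjects[n_per_group:]
    se = (stdev(g1) ** 2 / n_per_group + stdev(g2) ** 2 / n_per_group) ** 0.5
    z = (mean(g1) - mean(g2)) / se
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    false_pos += p <= 0.05

# roughly 1 in 20 arbitrary splits shows a 'significant' group effect
print(f"'significant' group effects: {false_pos / n_studies:.1%}")
```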
I don’t like to test for effects which are either irrelevant or have no consequences. If a study was performed in two groups and we assess the p-values, there are three possible results:
  1. p ≤0.05: Groups differ. We should not pool them. Hopefully groups were not split 1:1, but one of them was the maximum capacity of the clinical site. Example: Based on a CV of 27% the sample size was estimated as 32. The capacity of the site is 24. If you split groups 16:16, power drops from 80.4% to 41.4%. What if you are lucky and show BE in one group, but not in the other? Do you think that regulators would believe the results of the “nice group” and ignore the other (failed) one? On the other hand, no regulator would ask you questions if you have unequal group sizes and base the decision on the larger group. If the split is 24:8, power in the larger group will still be 66.7%. Not sooo bad. Like rolling a die and betting on even/odd. ;-)
  2. p ≤0.05: Groups don’t differ, but the result is a false positive. Bad luck. All the nasty stuff from above is applicable.
  3. p >0.05: Groups don’t differ. Happy pooling.
Note that this test is based on the between-subject variability, which has poor power with the sample sizes generally used in crossovers. Therefore, some opt for a level of 0.10 instead of 0.05. Expect a false positive in ~1/10 of studies. Splendid idea!
Hence, we try to keep groups as similar as possible by design. I have stated above the conditions under which the FDA accepts no group term in the model. I don’t like that “experts” browse the internet for SAS code and apply it without thinking about the consequences. I got the impression from Smitha’s post that they use a group term routinely – which will be significant at the level of the test by pure chance. Therefore, I asked why.

» However group effect is not a big deal such as sequence, treatment or period effect. (statement 2)
  • Sequence (actually unequal carryover): False. Since it cannot be properly handled in a 2×2 crossover (Freeman showed that 25 years ago) it should be avoided by design. Therefore, the EMA’s GL specifically states that this effect should not be tested and that carryover should be avoided by a sufficiently long washout.
  • Treatment: False. Even for a very small difference you will see a significant result with a high sample size because power increases (see this post). Is it relevant (a “big deal”)? Not at all. In most regulations the minimum sample size is 12. Therefore, you see a significant effect regularly for CVs ≤10%. In Brazil (min. 24) you will see it every other day. I guess you have a standard sentence to discuss that in the report. ;-)
  • Period: False. In a crossover the model will care for it. Both T and R will be ~equally affected (unless the study is extremely imbalanced). Try it: Take the data of any study and multiply all values of the second period by 10. Does the CI of the PE change?
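The period thought experiment can be sketched numerically (hypothetical balanced data; the point estimate is computed as the mean within-subject log-difference, which matches the crossover model’s estimate in the balanced case): multiplying every period-2 value by 10 shifts T and R equally, and the T/R point estimate does not move.

```python
from math import exp, log
from statistics import mean

# Hypothetical balanced 2x2 crossover data: (sequence, period-1, period-2).
data = [("TR", 105.0,  98.0), ("TR",  87.0,  80.0), ("TR", 120.0, 113.0),
        ("RT",  95.0, 101.0), ("RT", 110.0, 118.0), ("RT",  70.0,  76.0)]

def point_estimate(rows):
    # per-subject log(T/R): T is period 1 in sequence TR, period 2 in RT;
    # with equal sequence sizes the simple mean cancels any period effect
    diffs = [log(p1 / p2) if seq == "TR" else log(p2 / p1)
             for seq, p1, p2 in rows]
    return exp(mean(diffs))

pe_original = point_estimate(data)
# inject a massive artificial period effect: multiply ALL period-2 values by 10
shifted = [(seq, p1, p2 * 10.0) for seq, p1, p2 in data]
pe_shifted = point_estimate(shifted)
print(round(pe_original, 6), round(pe_shifted, 6))  # identical
```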

Edit: Just saw ElMaestro’s post above. What really would worry me is a p >0.05 of the subject effect. Could give a hint that subjects were a bunch of monozygotic quadruplets. The model requires that subjects are independent… Don’t overdo standardization.

Cheers,
Helmut Schütz
Astea
Regular

Russia,
2016-03-24 20:10

@ Helmut
Posting: # 16137
Views: 9,849
 

 Significant ≠ relevant

Dear All!

I noticed it has become fashionable for Russian regulators to ask to include a group effect in the analyses even if the above-mentioned rules were stated in the protocol. There are some more detailed situations in which we were asked to take the effects into the ANOVA model. The first is when the study is divided into groups not only for logistical reasons but for some other reasons (namely, different blood-collection schemes). The second is when there were drop-outs in the study and subjects were replaced by doubles (the date of the visit was to be included in the statistical model). I think that including a term in the ANOVA to prove its significance is a bit naive because the true reasons may be various ("We shall see what we shall see"). What is your opinion on that topic?

And some practical questions:
  1. Which ANOVA model should be preferred in those cases: Group, Group × Sequence or... (would be very grateful for a link)?
  2. As I understand it, the residual variance depends on the number of terms (the more terms, the smaller the residual). Can we perform 2 different ANOVA models: the first to exclude a group effect, and the second to make a standard treatment+period+seq+sub(seq) calculation?
  3. What if we do get a significant group term? Can we somehow make sure that the number of subjects from only one group is sufficient for the study? What else can we do in such a case?

Grateful for your answers!
ElMaestro
Hero

Denmark,
2016-03-24 23:12

@ Astea
Posting: # 16138
Views: 9,763
 

 Significant ≠ relevant

Hi Astea,

» I think that including a term in the ANOVA to prove its significance is a bit naive because the true reasons may be various ("We shall see what we shall see"). What is your opinion on that topic?

Correct. It is not very likely to be significant anyway, but it can certainly be done. You probably do not run a substantial risk by doing so; just do it if they ask for it.

» And some practical questions:
» 1. Which ANOVA model should be preferred in those cases: Group, Group × Sequence or... (would be very grateful for a link)?

Group × Sequence?? Why not just Group, tested against the between-subject MS?

» 2. As I understand it, the residual variance depends on the number of terms (the more terms, the smaller the residual). Can we perform 2 different ANOVA models: the first to exclude a group effect, and the second to make a standard treatment+period+seq+sub(seq) calculation?

Not totally sure, but group would be a between-factor, so I think the residual (and its df) will be the same with and without Group.

» 3. What if we do get a significant group term? Can we somehow make sure that the number of subjects from only one group is sufficient for the study? What else can we do in such a case?

Well, in my opinion that would not change much. You still include both groups in the calculation.

if (3) 4

Best regards,
ElMaestro
zizou
Junior

Plzeň, Czech Republic,
2016-03-25 21:41
(edited by zizou on 2016-03-26 00:47)

@ Astea
Posting: # 16141
Views: 9,666
 

 Significant ≠ relevant

Hi everybody and nobody.

» And the second is when there were drop-outs in the study and subjects were replaced by doubles

Personally I don't like the option of replacing subjects, but some sponsors require that.
For example: 26 subjects are needed for 80% power; expected drop-out rate xx%; sample size 32. When the number of subjects for BE evaluation is lower than 26, additional subjects will be treated.
OK. In my poor practice I was lucky and never got such "grouped" data for evaluation x). Nevertheless I am afraid of this: I will have results from 25 subjects and from another 1 subject (formerly an alternate). Then the "group" effect can be significant purely by chance. Of course the group effect (after possible replacement of subjects) is not mentioned in the protocol, but then it becomes required to test it. ...pfff... The sponsor just loses money by treating 1 subject separately when only the larger group of 25 subjects will be evaluated in the end (if the group effect is tested with a significant result – not expected, of course).
Not to mention the option that 26 subjects complete the clinical part and the bioassay is processed; then the pharmacokineticist gets the concentrations and figures out that one subject has a pre-dose concentration in period 2 higher than 5% of the corresponding Cmax. Then, when almost everything has been done, according to the protocol 1 subject (one of the alternates) has to come back to the clinical part. GRRR! (not expected as well)

» Can we perform 2 different ANOVA models: the first to exclude a group effect, and the second to make a standard treatment+period+seq+sub(seq) calculation?

It seems OK to me. However, it should be stated in the protocol... or, when it comes after a deficiency letter, it should be stated in that letter :-D .

» What if we do get a significant group term? Can we somehow make sure that the number of subjects from only one group is sufficient for the study? What else can we do in such a case?

You can calculate post-hoc power (sometimes required by regulators when the sample size was estimated under the assumptions "GMR in 0.95–1.05 and intra-subject CV x%" and the true GMR was outside the expected interval or the intra-subject CV was higher than expected). When the results are better than expected, it could be possible to show that the number of subjects from only one group is sufficient (especially when only one replaced subject was in the second group).
Helmut
Hero
Vienna, Austria,
2016-03-26 14:46

@ zizou
Posting: # 16142
Views: 9,661
 

 Loss of power etc.

Hi zizou,

» Personally I don't like the option of replacing subjects, but some sponsors require that. […]

You gave nice examples! I think that part of the job of CROs is to educate their customers. The potential loss of power caused by dropouts is overrated by many. A handy educational tool (plots for dummies!) comes with the functions pa.ABE() and pa.scABE() in PowerTOST.

» For example: 26 subjects are needed for 80% power; expected drop-out rate xx%; sample size 32. When the number of subjects for BE evaluation is lower than 26, additional subjects will be treated.

Is it really worth the trouble it may cause in the analysis?

library(PowerTOST)
pi          <- 0.80
theta0      <- 0.95
CV          <- 0.24
do.rate     <- 0.15
worst       <- 4 # lower than desired
n0 <- sampleN.TOST(CV=CV, theta0=theta0,
                   targetpower=pi,
                   design="2x2x2",
                   print=FALSE)[["Sample size"]]
n1 <- ceiling(n0/(1-do.rate)/2)*2 # adjust and round up to even
n  <- n1:(n0-worst)
pw <- vector()
for(j in seq_along(n)) {
  pw[j] <- suppressMessages(power.TOST(CV=CV,
                                       theta0=theta0,
                                       n=n[j]))
  cat(sprintf("%i %.2f%%%s", n[j], 100*pw[j], "\n"))
}
plot(n, pw, ylim=c(min(pw), 1), las=1,
     xlab="sample size", ylab="expected power")
abline(h=pi)
abline(v=c(n0, n1), lty=3)
text(n, pw, round(100*pw, 1), cex=0.8,  pos=3)

32 88.17%
31 87.14%
30 86.09%
29 84.88%
28 83.65%
27 82.22%
26 80.77%
25 79.07%
24 77.35%
23 75.32%
22 73.27%

If your expected drop-out rate was 15% and you end up with just 25 subjects, the expected (!) power will be 79.1% – if (if‼) your assumptions about the CV and the T/R-ratio hold true…

A common practice (especially in designs with more than two periods) is to dose “stand-ins” as soon as possible. Say you drop below your desired sample size after the second period, you start dosing them in period 3. Generally the data are naïvely pooled …

           period  
subject  1  2  3  4
1   – x  •  •  •  •
x+1 – y  •  •  •  •

… ignoring the true structure which is:

              period     
subject  1  2  3  4  5  6
1   – x  •  •  •  •      
x+1 – y        •  •  •  •

Treating such a mess correctly could be demanding. Even if “stand-ins” are dosed after completing all periods of the “regulars”, at the end we have twice as many periods as planned in the original design.

» » What if we do get significant group term? […]
» You can calculate post-hoc power (sometimes required by regulators when the sample size was estimated under the assumptions "GMR in 0.95–1.05 and intra-subject CV x%" and the true GMR was outside the expected interval or the intra-subject CV was higher than expected).

Which country’s? I don’t hope a European one. Regulators (and sponsors as well) should learn that the sample size estimation (based on a priori power) is based on assumptions. Nothing more, nothing less (hence, the term “sample size calculation” should be avoided). BTW, we don’t know the true GMR. It lies with 90% probability somewhere within the 90% CI around the PE. It is a bizarre idea to relate the PE to the assumed T/R-ratio. Even if the CV is higher than expected and the PE more deviating from 1 than expected and the study passes (though with less than desired power) the only thing we can conclude is that our assumptions were wrong. Who cares? The job of regulators is to be concerned about the consumer risk (α), which is maintained in passing studies by definition. Only the producer’s risk (β) was higher than desired. As ElMaestro once wrote “Being lucky is not a crime”.
Imagine: You visit a casino once in your life to play roulette and place a single bet of € 1,000 on the magic number 24. The ball spins and at the end drops into the 24-pocket of the wheel. Instead of paying out € 35,000 the croupier tells you with a smirk on his face: “Congratulations, but since this achievement was highly improbable we don’t pay you anything. Thank you very much, see you next time.”

Cheers,
Helmut Schütz
Astea
Regular

Russia,
2016-03-27 21:18

@ Helmut
Posting: # 16145
Views: 9,388
 

 Loss of power etc.

Dear ElMaestro! Thank you for your answer!

Dear zizou! Thank you for your comment and interesting examples! That was actually what I meant: what kind of statistics can be discussed on the basis of only one subject in a group (drop-out)?!

Dear Helmut!

» Which country’s? I don’t hope a European one.

I was disappointed to read, in the annex to the Rules for conducting bioequivalence studies within the Eurasian Economic Union, a note that the final statistical report should also contain a "power analysis (with the presentation of the results according to Cmax and AUC(0–t)) in the form of a table" (as far as I know, it is not adopted in Russia yet, but soon may be...)
zizou
Junior

Plzeň, Czech Republic,
2016-03-27 23:44
(edited by zizou on 2016-03-28 11:31)

@ Helmut
Posting: # 16146
Views: 9,483
 

 Loss of power etc.

Dear Helmut,

Thanks for additions to my comment.

» Regulators (and sponsors as well) should learn that the sample size estimation (based on a priori power) is based on assumptions. Nothing more, nothing less (hence, the term “sample size calculation” should be avoided).

My bad habit to use the terms acc. to guidelines (EMA 1401).
4.1.3 Subjects
Number of subjects
The number of subjects to be included in the study should be based on an appropriate sample size calculation. The number of evaluable subjects in a bioequivalence study should not be less than 12.

Nevertheless, "sample size estimation" is the correct term.

» Which country’s? I don’t hope a European one.

Personally I don't know, so that information may be misrepresented somehow. Nevertheless, recently I was asked to respond about an insufficient sample size in a performed study which demonstrated BE. It was a study in a 2×2 crossover design, no drop-outs, sample size estimation performed and discussed in the protocol. Only the CV and GMR turned out much worse than expected.
(I don't know if the response that the sample size was estimated as described in the protocol etc., with some literature references for the intra-subject CV, was enough.)
If we assumed the study results for another study, double sample size would be required for 80% power.

About post-hoc power being unimportant if the study demonstrated bioequivalence, I am still not quite convinced (a little bit, yes).
Suppose the null hypothesis is false (the formulations are BE) and the probability that we fail to reject bioinequivalence (beta = type II error = producer's risk) is, for example, 50%.
One of two studies will fail by chance?
With such an underpowered study there will be low reproducibility of the results (by other, supporting studies)? ... OK. When we double the sample size it should be more reproducible, but with a GMR in 80–125 it could be only about the sample size.
If someone got lucky and won with power 50% or 33%... it should still be no problem for the regulator. Or is there a problem that if someone wanted to repeat the whole study (don't know why), it would be hard, given the luck needed?

(I am just thinking and writing.)

It is nonsense, but I got the idea to forget the power/producer's risk altogether and design the study only for getting the 90% CI within 0.8000–1.2500.
(Once I was asked for a sample size estimation with the comment that the study will be performed on 36 subjects. :D Fortunately it was OK (maybe the sponsor had made some estimations of their own), but it looked impossible at first reading of such a command x).)

Without the required power (not good statistical practice; I am just playing with the numbers, using algebra for the LL (lower limit of the 90% CI) below):

Assumptions: 24 subjects, GMR 0.95–1.05, CV below 35.7%. If all the mentioned assumptions hold, BE will be demonstrated.

# the worst border case:
GMR=.95
CV=.357
n1=12
n2=12
alpha=.05
LL=exp(log(GMR)-sqrt(log(CV^2+1)/2*(1/n1+1/n2))*qt(1-alpha,n1+n2-2))
LL
# [1] 0.800133 # luck with power 35%

# With a not-so-good GMR (0.9), a better CV of 30% (not HVDs), and n=36:
GMR=.9
CV=.3
n1=18
n2=18
alpha=.05
LL=exp(log(GMR)-sqrt(log(CV^2+1)/2*(1/n1+1/n2))*qt(1-alpha,n1+n2-2))
LL
# [1] 0.8006268 # luck with power 51%
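As a cross-check outside R, the same algebra can be evaluated in a short Python sketch; the one-sided 95% t-quantiles (qt(0.95, 22) ≈ 1.7171 and qt(0.95, 34) ≈ 1.6909) are hard-coded, since the Python standard library has no t-distribution.

```python
from math import exp, log, sqrt

def lower_limit(gmr, cv, n1, n2, t_crit):
    # LL = exp(log(GMR) - sqrt(ln(CV^2 + 1)/2 * (1/n1 + 1/n2)) * t_crit)
    se = sqrt(log(cv ** 2 + 1.0) / 2.0 * (1.0 / n1 + 1.0 / n2))
    return exp(log(gmr) - se * t_crit)

# worst border case: GMR 0.95, CV 35.7%, 12 + 12 subjects, qt(0.95, 22)
print(round(lower_limit(0.95, 0.357, 12, 12, 1.7171), 4))  # 0.8001
# second case: GMR 0.90, CV 30%, 18 + 18 subjects, qt(0.95, 34)
print(round(lower_limit(0.90, 0.30, 18, 18, 1.6909), 4))   # 0.8006
```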


Remember, with no power comes no responsibility.
Helmut
Hero
Vienna, Austria,
2016-03-28 14:29

@ zizou
Posting: # 16148
Views: 9,415
 

 Combined power?

Hi zizou,

» My bad habit to use the terms acc. to guidelines (EMA 1401).
» The number of subjects to be included in the study should be based on an appropriate sample size calculation.
» Nevertheless, "sample size estimation" is the correct term.

Apart from the sloppy terminology this section is unfortunate. According to members of the EWP-PK drafting group (now PKWP) the GL should offer a kind of a “cook-book”. So why the heck the perfect recipe –

The number of subjects required is determined by

  1. the error variance associated with the primary characteristic to be studied as estimated from a pilot experiment, from previous studies or from published data,
  2. the significance level desired,
  3. the expected deviation from the reference product compatible with bioequivalence (delta) and
  4. the required power.
which was part of all [sic] previous versions – was replaced by the laconic elastic clause* “appropriate”?

» […] recently I was asked to respond about an insufficient sample size in a performed study which demonstrated BE. It was a study in a 2×2 crossover design, no drop-outs, sample size estimation performed and discussed in the protocol. Only the CV and GMR turned out much worse than expected.

Still I hold that if BE was demonstrated the sample size was sufficient indeed. The former implies the latter. Only assumptions were “disproved”.
The assessor could write a letter (not a deficiency letter!) essentially saying “You were lucky this time. Granted. In the future please pay more attention to the sample size, i.e., that assumptions will not be overly optimistic. A copy of this letter will be sent to the responsible IEC. Thank you.”

» If we assumed the study results for another study, double sample size would be required for 80% power.

Yes, that’s the purpose of power. Plan another study properly.

» Suppose the null hypothesis is false (the formulations are BE) and the probability that we fail to reject bioinequivalence (beta = type II error = producer's risk) is, for example, 50%.
» One of two studies will fail by chance?

In the long run, yes. But also by chance you can have another study passing as well. We discussed the behavior of power in multiple studies in another thread.
Intuitively p = ∏pi (e.g., 0.64 for two studies with 0.8 each). Benjamin Lang coded the function power.2TOST() in PowerTOST. Type help(power.2TOST) for references. His original idea was to provide power for testing two correlated PK metrics (hence the argument rho in the function) in the same study. Maybe we can misuse it for two studies?

library(PowerTOST)
alpha  <- 0.05
theta0 <- 0.95
CV     <- 0.23
x      <- sampleN.TOST(alpha=alpha, CV=CV, theta0=theta0,
          targetpower=0.8, print=FALSE)
n      <- x[["Sample size"]]
pwr1   <- x[["Achieved power"]]
n
# [1] 24
rho    <- seq(0, 1, 0.005) # correlation
pwr2   <- vector()
for (j in seq_along(rho)) {
  pwr2[j] <- power.2TOST(CV=rep(CV, 2), theta0=rep(theta0, 2),
                         n=n, rho=rho[j])
}
round(pwr1, 4)   # one study
# [1] 0.8067
round(pwr1^2, 4) # intuitive for two studies
# [1] 0.6507
round(range(pwr2), 4)
# [1] 0.6562 0.8066
plot(rho, pwr2, type="l", ylab="power of 2 TOSTs", las=1)


If studies are independent (ρ = 0) we arrive approximately at the intuitive result. If ρ = 1 combined power would be ~ the single studies’ power. The latter is practically impossible (first of all we would have to repeat the study in the same subjects). The “truth” may lie somewhere in between but don’t ask me where.
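For the independent case, the intuitive product rule can be illustrated with a toy Bernoulli simulation (hypothetical model, sidestepping the correlation question): if each of two independent studies passes with probability 0.8067, the pair passes together in about 0.8067² ≈ 65% of runs.

```python
import random

# Toy check of the intuitive product rule for two INDEPENDENT studies
# (hypothetical Bernoulli model, ignoring the correlation question):
random.seed(1)
p_single = 0.8067            # power of one study, from the PowerTOST run above
runs = 100_000
both = sum(random.random() < p_single and random.random() < p_single
           for _ in range(runs))

print(f"both studies pass: {both / runs:.3f}")
print(f"product p^2:       {p_single ** 2:.4f}")   # 0.6508
```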

» With such an underpowered study there will be low reproducibility of the results (by other, supporting studies)?

Tricky. I cannot imagine that any BE study was ever repeated. The closest would be the combination of a pilot and a pivotal study in a larger sample size. If you consider a pilot study supportive, its (generally low) power is not relevant anyhow. IMHO, it is only the pivotal study which counts.

» If someone got lucky and won with power 50% or 33%... it should still be no problem for the regulator. Or is there a problem that if someone wanted to repeat the whole study (don't know why), it would be hard, given the luck needed?
»
» (I am just thinking and writing.)

Indeed. But I also can’t imagine why someone already aware of a lucky strike (low power in the first study) would gamble yet another time.
The guy in the Armani suit? ©2010 by ElMaestro

» It is nonsense, but I got the idea to forget the power/producer's risk altogether and design the study only for getting the 90% CI within 0.8000–1.2500.

True. Remember that in the old NfG/GL power was mentioned. For the initiates “appropriate” is backed by ICH-E9 Section 3.5:

The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed.
Using the usual method for determining the appropriate sample size, the following items should be specified: […] the probability of erroneously failing to reject the null hypothesis (the type II error) […]


» Without the required power (not good statistical practice; I am just playing with the numbers, using algebra for the LL:

Confirmed your results. ;-)

library(PowerTOST)
# functions to find the highest CV which will pass BE for
# given PE and n:
opt1 <- function(x) CI.BE(CV=x, pe=theta0, n=n)[["lower"]]-crit
opt2 <- function(x) CI.BE(CV=x, pe=theta0, n=n)[["upper"]]-crit
alpha  <- 0.05
theta0 <- 0.95
n      <- rep(12, 2)
if (theta0 <= 1) {
  crit <- 0.80
  CV   <- uniroot(opt1, interval=c(0.01, 5), tol=1e-8)$root
} else {
  crit <- 1.25
  CV   <- uniroot(opt2, interval=c(0.01, 5), tol=1e-8)$root
}
CV
# [1] 0.3573667
round(100*CI.BE(CV=CV, pe=theta0, n=n), 2)
# lower  upper
# 80.00 112.81
power.TOST(alpha=alpha, theta0=theta0, CV=CV, n=n)
# [1] 0.3534668

and

...
CV
# [1] 0.3020968
round(100*CI.BE(CV=CV, pe=theta0, n=n), 2)
# lower  upper
# 80.00 101.25
power.TOST(alpha=alpha, theta0=theta0, CV=CV, n=n)
# [1] 0.503441



  • Best described by the German “Gummiparagraph” (literally: rubber-paragraph)…

Cheers,
Helmut Schütz
Astea
Regular

Russia,
2016-03-28 23:57

@ zizou
Posting: # 16150
Views: 9,132
 

 Loss of power etc.

Dear all!

I understand clearly that post-hoc power sets your teeth on edge, but I'll take a chance and ask a question once more... There was a study with one drop-out. The results of that subject were not included in the calculation. Recently the customers received a reply from the regulators to calculate a posteriori power and, in the case of insufficient power, to present a plan for further clinical development of the drug.

I estimated power with the help of power.TOST, based on obtained GMR and CV. The result is: power is less than 80% for AUC and more than 80% for Cmax. I remade the calculation including the drop-out: power for AUC is still less than 80%!

Funny thing, but the drugs were bioequivalent! Moreover they are bioequivalent nevetherless including drop-out's results or not! And that fact was already stated in the report. What to do in such a situation? :confused:
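For what it's worth, the kind of calculation the regulators asked for can be sketched in base R without PowerTOST, using the common noncentral-t approximation of TOST power for a balanced 2×2 crossover; the GMR, CV and n below are hypothetical stand-ins, not this study's data.

```r
# A minimal sketch (base R only, no PowerTOST) of post-hoc TOST power
# via the noncentral-t approximation for a balanced 2x2 crossover.
# The GMR (theta0), CV and n below are hypothetical illustration values.
posthoc.power <- function(theta0, CV, n, alpha = 0.05) {
  sw <- sqrt(log(CV^2 + 1))   # within-subject SD on the log scale
  se <- sw * sqrt(2/n)        # SE of log(PE); n = total subjects
  df <- n - 2
  tc <- qt(1 - alpha, df)
  # P(lower CL > 0.80 and upper CL < 1.25) under the assumed theta0
  max(0, pt(-tc, df, ncp = (log(theta0) - log(1.25))/se) -
         pt( tc, df, ncp = (log(theta0) - log(0.80))/se))
}
posthoc.power(theta0 = 0.93, CV = 0.24, n = 23)
```

With the observed point estimate plugged in, nothing stops a study that passed BE from showing "power" well below 80% — which is exactly the situation described above.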
ElMaestro
Hero

Denmark,
2016-03-29 00:16

@ Astea
Posting: # 16151
Views: 9,141
 

 Loss of power etc.

Hi Astea,

» I understand clearly that post-hoc power sets your teeth on edge, but I'll take a chance and ask a question once more... There was a study with one drop-out. The results of that subject were not included in the calculation. Recently the customers received a reply from the regulators: calculate the a posteriori power and, in case of insufficient power, present a plan for further clinical development of the drug.

It is an extremely unfortunate situation; the regulators have clearly misunderstood power, what it is and what it isn't. It is tempting to think that a study which passes the BE test must have had high power, but that argument is simply flawed. I am really sorry to hear you are caught in this mess.

» I estimated power with the help of power.TOST, based on the obtained GMR and CV. The result: power is less than 80% for AUC and more than 80% for Cmax. I redid the calculation including the drop-out: power for AUC is still less than 80%!

Very often, companies that really do use post-hoc power in their reporting do not use the observed GMR but rather plug in something like 0.95 or 1.00. This of course often gives rise to higher power estimates, ones that arguably have little to do with anything. Perhaps that is your chance? Let me add that I do not in any way endorse such power calculations, but then again I do not know what the regulator wants with a post-hoc power calculation in the first place, so who knows?!
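As a quick illustration of how much the plugged-in ratio matters, here is a hedged sketch in base R (noncentral-t approximation of TOST power for a balanced 2×2 crossover; the CV and n are invented for illustration):

```r
# Sketch: the assumed ratio drives post-hoc "power" (noncentral-t
# approximation, balanced 2x2 crossover; CV and n are hypothetical).
pwr <- function(theta0, CV, n, alpha = 0.05) {
  se <- sqrt(log(CV^2 + 1)) * sqrt(2/n)  # SE of log(PE)
  df <- n - 2
  tc <- qt(1 - alpha, df)
  max(0, pt(-tc, df, ncp = (log(theta0) - log(1.25))/se) -
         pt( tc, df, ncp = (log(theta0) - log(0.80))/se))
}
pwr(0.90, CV = 0.25, n = 24)  # with an observed GMR of 0.90
pwr(1.00, CV = 0.25, n = 24)  # with the conventional plug-in of 1.00
```

The second call returns a considerably higher value than the first, although nothing about the study itself has changed.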

Good luck, let us know how this ends, please.

if (3) 4

Best regards,
ElMaestro

"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
Helmut
Hero
avatar
Homepage
Vienna, Austria,
2016-03-29 17:28

@ Astea
Posting: # 16152
Views: 9,131
 

 Mystery

Hi Astea,

» Recently the customers received a reply from the regulators: calculate the a posteriori power and, in case of insufficient power, present a plan for further clinical development of the drug.

I’ll offer the agency a free training if travel and accommodation are covered. ;-)
What is sufficient for them? Anything ≥80%? Smells of the end of Appendix 3 of Russia’s GLs of 2004 and 2008:

        We can also solve the reverse task: knowing the study population size n, the coefficient of variation CV, the value of the difference Ω and the significance level α, we can estimate the statistical power of the bioequivalence evaluation. To do this, one should use the table of the Student distribution and estimate the probability of the type II error β from the Student t-value, calculated by the following equation (under the assumption of equal mean values):
[image]
        If the assumption of equal mean values does not hold, the equation can be modified as follows:
[image]
        The statistical power of the test must be not less than 80%.

(my emphasis)

BTW, these GLs were always a mystery to me. What is meant by this part of Section 4.2, Number of participants?

        If during the statistical comparison the power turns out to be less than 80%, then, in those cases when the study drugs are not bioequivalent, the study population must be enlarged in order to draw a reasonable conclusion about nonbioequivalence.


Sure, if one can expect to demonstrate BE in a reasonably larger sample size, one will repeat the study. But if not (say, in the study the CV was as expected but the PE was terrible)?

library(PowerTOST)
# sample size for 80% power (CV 25%, assumed T/R-ratio 0.95)
n1 <- sampleN.TOST(CV=0.25, theta0=0.95, targetpower=0.8,
                   print=FALSE)[["Sample size"]]
# 90% CI and post-hoc power with the observed PE of 0.85
round(100*CI.BE(CV=0.25, pe=0.85, n=n1), 2)
round(100*power.TOST(CV=0.25, theta0=0.85, n=n1), 2)
# sample size which would be needed for 80% power with a PE of 0.85
n2 <- sampleN.TOST(CV=0.25, theta0=0.85, targetpower=0.8,
                   print=FALSE)[["Sample size"]]
round(100*CI.BE(CV=0.25, pe=0.85, n=n2), 2)
round(100*power.TOST(CV=0.25, theta0=0.85, n=n2), 2)

We plan the study for 80% power (CV 25%, T/R 0.95) in 28 subjects. The PE turns out to be awful (85%). Study fails (CI 75.98–95.10%). Post hoc power 22.74%. Now what? That’s not a “reasonable conclusion about nonbioequivalence”? A failed study can never ever show high post-hoc power! Enlarge the sample size and force bioequivalence? Repeat the study in 330 subjects? Study passes (CI 81.66–88.48%), post-hoc power 80.11%.

» I estimated power with the help of power.TOST, based on the obtained GMR and CV. The result: power is less than 80% for AUC and more than 80% for Cmax.
» The funny thing is that the drugs were bioequivalent!

Yes, why not? See the rather extreme examples given by zizou above.

» What to do in such a situation? :confused:

Don’t know. My diplomatic skills are practically nonexistent. Maybe ElMaestro’s suggestions are a way out.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Astea
Regular

Russia,
2016-03-29 21:57

@ Helmut
Posting: # 16153
Views: 9,022
 

 Back to the Future

Dear Helmut!

» What is sufficient for them? Anything ≥80%? Smells of the end of Appendix 3 of Russia’s GLs of 2004 and 2008

You are reading my thoughts; I just remembered those formulas. They come from the Russian recommendations of 2004.

» [image]
»       In case of disruption the assumption for mean values equality the equation could be modified like this:
» [image]

By the way, there is a mistake in the first formula: comparing it with the derivation provided in this article (Chow S.-C., Wang H. On Sample Size Calculation in Bioequivalence Trials. Journal of Pharmacokinetics and Pharmacodynamics. 2001; 28(2)), there should be β/2.

The funny thing is that there was no formal basis to ask us to calculate it; the rest is not funny at all. :-(

So if they want a posteriori power, they'll get it. I'll just take the principles kindly presented by yicaoting and calculate something like WinNonlin's power. Of course it will be greater (it means something completely different), and hopefully it will be sufficient...
ElMaestro
Hero

Denmark,
2016-03-29 23:11

@ Astea
Posting: # 16154
Views: 8,934
 

 Back to the Future

– "What's the time?"
– "Banana."

if (3) 4

Best regards,
ElMaestro

"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
mittyri
Senior

Russia,
2016-03-30 00:17

@ ElMaestro
Posting: # 16155
Views: 8,940
 

 Using lectures != Reading them

Hi All,

By the way they are using Helmut's lectures...

Kind regards,
Mittyri
Helmut
Hero
avatar
Homepage
Vienna, Austria,
2016-03-30 00:42

@ mittyri
Posting: # 16156
Views: 9,003
 

 Recycling

Hi Mittyri,

» By the way they are using Helmut's lectures...

Let’s call it recycling.
BTW, it wasn’t in Mumbai but in Prague 2012 where I showed these plots for the last time. They should have read slide 56 of another presentation (Moscow 2014) instead.

I found Astea’s nightmare on page 154 of Распоряжение Коллегии № 178 (30.12.2015) about the study report:

analysis of the power of the study (with the results for Cmax and AUC(0–t) presented in the form of a table)

Bravo, “experts”! :angry:

@Astea: That’s not WinNonlin’s power, but an outdated method used by Pharsight/Certara for ages. See here and there. Given your results I guess that by (falsely) applying this method the “power” will be extremely high… Sigh. Make them happy. Get drunk later.
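If I recall the legacy approach correctly, it reported the power of a two-sided t-test to detect a 20% difference (the old “80/20 rule”) rather than the TOST power. A base-R sketch of my understanding of that method, with hypothetical CV and n, shows why it flatters the result:

```r
# Sketch of the outdated "80/20 rule" power: probability that a
# two-sided t-test detects a 20% difference. This reflects my reading
# of the legacy method, not PowerTOST; CV and n are hypothetical.
power.8020 <- function(CV, n, alpha = 0.05) {
  se <- sqrt(log(CV^2 + 1)) * sqrt(2/n)  # balanced 2x2 crossover
  df <- n - 2
  tc <- qt(1 - alpha/2, df)
  ncp <- log(1.20)/se                    # alternative: a 20% difference
  1 - pt(tc, df, ncp = ncp) + pt(-tc, df, ncp = ncp)
}
power.8020(CV = 0.24, n = 23)  # typically comes out (too) high
```

Because it answers a completely different question, this "power" is usually far above the TOST power for the same data.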

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Astea
Regular

Russia,
2016-03-30 12:23

@ Helmut
Posting: # 16157
Views: 8,863
 

 Recycling

Dear Helmut!

» I found Astea’s nightmare on page 154 of Распоряжение Коллегии № 178 (30.12.2015) about the study report:

analysis of the power of the study (with the results for Cmax and AUC(0–t) presented in the form of a table)

» Bravo, “experts”! :angry:

That's what I was actually talking about some posts higher.

Dear mittyri!
The person from the regulators explained to me that they have to learn almost everything by themselves. So for educational purposes they read this forum (they are among us, another reason for persecution mania :-D) and Helmut's lectures. If that is so, the educational role of the Forum is extremely significant! I hope they will read this and reconsider their attitude towards post-hoc power.
Beholder
Regular

Russia,
2016-03-30 13:00

@ Astea
Posting: # 16158
Views: 8,818
 

 Recycling

» The person from the regulators explained to me that they have to learn almost everything by themselves. So for educational purposes they read this forum (they are among us, another reason for persecution mania :-D) and Helmut's lectures.

Really funny! :-D

» If that is so, the educational role of the Forum is extremely significant! I hope they will read this and reconsider their attitude towards post-hoc power.

Absolutely agree!

Best regards
Beholder