Mauricio Sampaio
★

Brazil,
2020-02-11 20:52
(edited by Mauricio Sampaio on 2020-02-12 17:16)

Posting: # 21160
Views: 789

## ANVISA guidelines for two-stage design [Two-Stage / GS Designs]

Dear all, ANVISA has made available a new draft on bioequivalence studies, including a chapter on two-stage designs.

Below are the points.

Could you please contribute, so that we can be in line with the other guidelines?

Art.75. For two-stage studies, the following should be noted:
1. It is acceptable to use a two-stage approach to demonstrate bioequivalence based on unknowledgement of the intra-individual variability of the drug;

2. An initial group of subjects can be treated and their data analyzed;

3. If power is not sufficient and bioequivalence has not been demonstrated, an additional group can be recruited and the results of both groups will be combined in a final analysis;

4. This second group must have, at least, 50% of the previous group;

5. Type I error must be preserved and adjusted, and in order to demonstrate bioequivalence the level of confidence is 94.12%;

6. In the protocol, the stopping criteria must be clearly defined before the study and the analysis of the first step must be treated as an interim analysis; and

7. When analyzing the combined data from the two stages, the stage variable should be included in the ANOVA model and its influence verified

In my opinion it could be better. So, I would like to hear your opinion.

Helmut
★★★

Vienna, Austria,
2020-02-12 02:26

@ Mauricio Sampaio
Posting: # 21161
Views: 753

## Brainless copy-and-paste-devil

Hi Mauricio,

» IV. This second group must have, at least, 50% of the previous group;
»
» V. Type I error must be preserved and adjusted, and in order to demonstrate bioequivalence the level of confidence is 94.12%;

The red parts are crap. See my remarks there.

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Mauricio Sampaio
★

Brazil,
2020-02-12 17:33

@ Helmut
Posting: # 21168
Views: 656

## Brainless copy-and-paste-devil

Hi Helmut,

Thank you for the presentation. Ahhh I have a question...

Are you in Campinas?? Brazil???

» NESE, Campinas, 11 – 13 February, 2020

ElMaestro
★★★

Belgium?,
2020-02-12 08:27

@ Mauricio Sampaio
Posting: # 21162
Views: 725

## ANVISA guidelines for two-stage design

Hi MS,

» Art.75. For two-stage studies, the following should be noted: It is acceptable to use a two-stage approach to demonstrate bioequivalence based on ignorance of the intra-individual variability of the drug;

Ermmm...... WHAT????

Le tits now.

Best regards,
ElMaestro
nobody
nothing

2020-02-12 08:49

@ ElMaestro
Posting: # 21163
Views: 719

## ANVISA guidelines for two-stage design

» » Art.75. For two-stage studies, the following should be noted: It is acceptable to use a two-stage approach to demonstrate bioequivalence based on ignorance of the intra-individual variability of the drug;
»
»
» Ermmm...... WHAT????

Same reaction here last night, but I thought I was hallucinating due to over-day fasting.

Kindest regards, nobody
Helmut
★★★

Vienna, Austria,
2020-02-12 13:12

@ ElMaestro
Posting: # 21164
Views: 687

## ANVISA guidelines for two-stage design

Hi ElMaestro,

» » ignorance of the intra-individual variability of the drug;
»
»
» Ermmm...... WHAT????

Lost in translation?

É aceitável usar uma abordagem de dois estágios para demonstrar a bioequivalência baseada no desconhecimento da variabilidade do fármaco;

I have the same translation as Mauricio. Google-translate suggests ‘unknowing’ for ‘no desconhecimento’, but if you feed it the entire sentence, ‘ignorance’ shows up.

Cheers,
Helmut Schütz

Mauricio Sampaio
★

Brazil,
2020-02-12 17:04
(edited by Mauricio Sampaio on 2020-02-12 17:14)

@ ElMaestro
Posting: # 21166
Views: 665

## ANVISA guidelines for two-stage design

Hi nobody

» Ermmm...... WHAT????

Sorry! Change to: unknowledgement of intra-individual variability
Mauricio Sampaio
★

Brazil,
2020-02-12 17:11

@ ElMaestro
Posting: # 21167
Views: 661

## ANVISA guidelines for two-stage design

Hi El Maestro
» Ermmm...... WHAT????

Sorry! Change to: unknowledgement of intra-individual variability.

ElMaestro
★★★

Belgium?,
2020-02-12 21:54

@ Mauricio Sampaio
Posting: # 21169
Views: 620

## ANVISA guidelines for two-stage design

Hi MS,

» Art.75. For two-stage studies, the following should be noted:
» 1. It is acceptable to use a two-stage approach to demonstrate bioequivalence based on unknowledgement of the intra-individual variability of the drug;
»
» 2. An initial group of subjects can be treated and their data analyzed;
»
» 3. If power is not sufficient and bioequivalence has not been demonstrated, an additional group can be recruited and the results of both groups will be combined in a final analysis;
»
» 4. This second group must have, at least, 50% of the previous group;
»
» 5. Type I error must be preserved and adjusted, and in order to demonstrate bioequivalence the level of confidence is 94.12%;
»
» 6. In the protocol, the stopping criteria must be clearly defined before the study and the analysis of the first step must be treated as an interim analysis; and
»
» 7. When analyzing the combined data from the two stages, the stage variable should be included in the ANOVA model and its influence verified

» In my opinion it could be better. So, I would like to hear your opinion.

It sounds like a derivative of Potvin's method B with both alphas 0.0294 (1-2*0.0294=0.9412=94.12%), but the performance isn't one that is published. It is not at all demanding to do this type of study, assuming you can generally handle two-stage approaches, but I am a little uncertain when you mention that the influence of stage should be verified; I don't quite know what this means. Do they talk about ANOVA and assessment of the stage effect through a p-level, comparison of results with and without a stage term, or what?
Mauricio, do you think there could be alternative translations of the sentence in question?

Hötzi, do you want to publish the performance of this approach in AAPSJ or JPPS with me if I do the simulations and draft the ms?

Le tits now.

Best regards,
ElMaestro
Helmut
★★★

Vienna, Austria,
2020-02-15 17:53

@ ElMaestro
Posting: # 21171
Views: 434

## crap

Hi ElMaestro,

» It sounds like a derivative of Potvin's method B with both alphas 0.0294 (1-2*0.0294=0.9412=94.12%), but the performance isn't one that is published. It is not at all demanding to do this type of study, assuming you can generally handle two-stage approaches, but I am a little uncertain when you mention that the influence of stage should be verified; I don't quite know what this means. Do they talk about ANOVA and assessment of the stage effect through a p-level, comparison of results with and without a stage term, or what?
» Mauricio, do you think there could be alternative translations of the sentence in question?

In the meantime I know what happened. Naturally the original is in Portuguese. In my experience people at the ANVISA sometimes misunderstand English papers/regulations. Company X provided the original to a professional translator who produced what Mauricio posted. It channeled to company Y (I have it in all its doubtful beauty). Hardly better than what Google-translate produces.

» Hötzi, do you want to publish the performance of this approach in AAPSJ or JPPS with me if I do the simulations and draft the ms?

I don’t see the purpose. We discussed already more than three years ago that a minimum stage 2 might inflate the Type I Error. Not rocket-science. Here an example at the location of the maximum inflation of Potvin’s Method B (n1 12, CV 24%):

    library(Power2Stage)
    n1     <- 12
    CV     <- 0.24
    method <- c("Potvin", "EMA Q&A", "ANVISA", "Potvin-opt", "Potvin-opt-mod",
                "Kieser-Rauch")
    min.n2 <- c(0, 2, 0.5*n1, 0, 0.5*n1, 0)
    alpha  <- c(0.0294, 0.0294, 0.0294, 0.0302, 0.0302, 0.0304)
    res    <- data.frame(method = method, alpha = alpha,
                         min.n2 = min.n2, TIE = NA)
    for (j in 1:nrow(res)) {
      # same adjusted alpha in both stages; theta0 = 1.25 simulates at the
      # upper BE limit, so pBE is the empiric Type I Error
      res$TIE[j] <- power.tsd(alpha = rep(res$alpha[j], 2), n1 = n1, CV = CV,
                              min.n2 = res$min.n2[j], theta0 = 1.25)$pBE
    }
    print(res, row.names = FALSE)

            method  alpha min.n2      TIE
            Potvin 0.0294      0 0.048762 ← TIE controlled by chance
           EMA Q&A 0.0294      2 0.048762 ← stupid and meaningless
            ANVISA 0.0294      6 0.048791 ← higher TIE but OK
        Potvin-opt 0.0302      0 0.049987 ← TIE controlled
    Potvin-opt-mod 0.0302      6 0.050196 ← inflated TIE due to n2 ≥ 50% n1
      Kieser-Rauch 0.0304      0 0.050270 ← inflated TIE

We all know that Potvin’s adjusted α for Method B was a lucky punch. It has nothing to do with Pocock’s 0.0294 (which is for GSD, superiority, parallel groups, known variance, and one interim at exactly N/2). Kieser & Rauch lamented about that and stated that the correct Pocock’s α for equivalence is 0.0304. Sorry guys, only for GSDs.
If we force a minimum n2, the TIE will always increase. Contrary to the EMA (stating ‘For example, using 94.12% confidence intervals…’) seemingly ANVISA mandates 0.0294, which is crap. By chance, the TIE is still maintained but this is not necessarily the case for other methods.
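
The link between an adjusted α and the confidence level quoted in the draft is plain arithmetic, 100(1 − 2α). A minimal base-R sketch for the three adjusted α values used in the example above (the object names are mine; nothing here is taken from ANVISA’s text):

```r
# adjusted alphas discussed in this thread
alpha <- c(Potvin = 0.0294, `Potvin-opt` = 0.0302, `Kieser-Rauch` = 0.0304)
# two one-sided tests at level alpha correspond to a 100(1 - 2 alpha)% CI
CI.level <- 100 * (1 - 2 * alpha)
print(round(CI.level, 2))
```

So 94.12% is just one of several possible levels; which one is appropriate depends on the method and its validated adjusted α.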

If I’m in the right mood I’ll write a letter to ANVISA.

Cheers,
Helmut Schütz

Mauricio Sampaio
★

Brazil,
2020-02-16 05:01

@ Helmut
Posting: # 21172
Views: 412

## crap

» If I’m in the right mood I’ll write a letter to ANVISA.

Or make your official contribution on the website:
http://formsus.datasus.gov.br/site/formulario.php?id_aplicacao=52824
Helmut
★★★

Vienna, Austria,
2020-02-16 15:43

@ Mauricio Sampaio
Posting: # 21173
Views: 383

## crap

Hi Mauricio,

» » If I’m in the right mood I’ll write a letter to ANVISA.
»
»
» Or make your official contribution on the website:

No promises…

I played around with published (and unpublished) methods. I used the noncentral t-distribution, whereas in the papers the shifted central t-distribution was used for speed reasons. One degree of freedom less in the sample size estimation because the stage-term is used in the pooled analysis. 100,000 simulations for the average total sample size E[N] and 1 million for the empiric Type I Error. Narrow grid for CV (10–80%, step 2%) and n1 (12–72, step 2). The power/TIE surfaces are highly nonlinear; generally the maximum inflation is observed at a combination of low CV and small n1. The TIE is given for these locations (in the papers a wider grid with step sizes of 10% and 12 was used).
In the original methods no minimum stage 2 size; for the ANVISA I forced it to ≥ 50% n1. SLF refers to a manuscript by the usual Simul-Ants (Schütz, Labes, Fuglsang) we didn’t finish (rests in peace in my “dead dogs”-folder)…
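
For orientation, the narrow grid described above can be written down in a few lines of base R. This is only a sketch of the scenario grid (object names are made up); the coarse grid assumes the same ranges as the narrow one, which is not necessarily what the papers used.

```r
# narrow grid: CV 10-80% in steps of 2%, n1 12-72 in steps of 2
narrow <- expand.grid(CV = seq(0.10, 0.80, by = 0.02),
                      n1 = seq(12, 72, by = 2))
# coarse grid: steps of 10% and 12 (same ranges assumed for comparison only)
coarse <- expand.grid(CV = seq(0.10, 0.80, by = 0.10),
                      n1 = seq(12, 72, by = 12))
cat("scenarios in the narrow grid:", nrow(narrow), "\n")
cat("scenarios in the coarse grid:", nrow(coarse), "\n")
```

Each row of such a grid is one CV/n1 combination that would have to be simulated, once for E[N] and once for the TIE.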
            Name Method Type  GMR power  alpha   CV n1 E[N]     TIE min.n2 E[N]  ANVISA   comp
             SLF      B    1 0.90   0.8 0.0272 0.20 12 40.8 0.04997      6 40.8 0.04999 higher
             SLF      B    1 0.90   0.9 0.0268 0.22 16 60.3 0.04985      8 60.3 0.04977  lower
          Potvin      B    1 0.95   0.8 0.0294 0.24 12 29.8 0.04876      6 29.9 0.04879 higher
      Potvin-SLF      B    1 0.95   0.8 0.0302 0.24 12 29.5 0.04999      6 29.6 0.05020 higher
        Fuglsang      B    1 0.95   0.9 0.0284 0.22 12 31.7 0.04960      6 31.7 0.04958  lower
    Fuglsang-SLF      B    1 0.95   0.9 0.0286 0.22 12 31.6 0.04999      6 31.6 0.05032 higher
        Montague      D    2 0.90   0.8 0.0280 0.20 12 40.3 0.05180      6 40.3 0.05181 higher
    Montague-SLF      D    2 0.90   0.8 0.0268 0.18 12 32.7 0.04998      6 32.7 0.04980  lower
        Fuglsang    C/D    2 0.90   0.9 0.0269 0.18 12 41.8 0.05021      6 41.8 0.05011  lower
    Fuglsang-SLF    C/D    2 0.90   0.9 0.0266 0.18 12 42.0 0.04995      6 42.0 0.04967  lower
          Potvin      C    2 0.95   0.8 0.0294 0.22 12 24.9 0.05143      6 24.9 0.05136  lower
      Potvin-SLF      C    2 0.95   0.8 0.0282 0.10 16 16.0 0.05010      8 16.0 0.05010  equal
        Fuglsang    C/D    2 0.95   0.9 0.0274 0.10 16 16.0 0.05010      8 16.0 0.05010  equal
    Fuglsang-SLF    C/D    2 0.95   0.9 0.0275 0.20 12 25.8 0.04962      6 25.8 0.04985 higher

TIE which is significantly >0.05 in red (limit of the binomial test 0.05036). I don’t understand why in some scenarios the TIE is lower with a minimum n2.
Counterintuitive.
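
The limit of 0.05036 quoted above is consistent with a one-sided exact binomial test at the 5% level for 1 million simulations. A sketch of how such a limit could be obtained in base R (I assume this is the calculation behind the number; the object names are mine):

```r
nsims  <- 1e6   # simulations for the empiric Type I Error
alpha0 <- 0.05  # nominal level
# 95% quantile of the binomial distribution under a true TIE of 0.05,
# expressed as a proportion; empiric TIEs above it are significantly > 0.05
sig.limit <- qbinom(1 - 0.05, size = nsims, prob = alpha0) / nsims
print(signif(sig.limit, 4))
```

With fewer simulations the limit moves further away from 0.05, which is why 1 million runs are used for the TIE.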

R-code:
    library(Power2Stage)
    # minimum n2: pct% of n1, rounded up so that the total n1 + n2 is even
    even.n2 <- function(n1, pct) {
      ceiling(n1 * (1 + pct/100) / 2) * 2 - n1
    }
    alpha0 <- 0.05 # for type 2 designs
    # locations of TIE (narrow grid)
    CV     <- c(0.24, 0.24, 0.22, 0.10, 0.22, 0.22, 0.10, 0.20, 0.20,
                0.22, 0.20, 0.18, 0.18, 0.18)
    n1     <- c(12, 12, 12, 16, 12, 12, 16, 12, 12, 16, 12, 12, 12, 12)
    min.n2 <- even.n2(n1, 50)
    cond   <- data.frame(Name = c(rep(c("Potvin", "Potvin-SLF"), 2),
                                  rep(c("Fuglsang", "Fuglsang-SLF"), 2),
                                  rep("SLF", 2), "Montague", "Montague-SLF",
                                  "Fuglsang", "Fuglsang-SLF"),
                         Method = c(rep("B", 2), rep("C", 2), rep("B", 2),
                                    rep("C/D", 2), rep("B", 2), rep("D", 2),
                                    rep("C/D", 2)),
                         Type = c(rep(1, 2), rep(2, 2), rep(1, 2), rep(2, 2),
                                  rep(1, 2), rep(2, 2), rep(2, 2)),
                         GMR = c(rep(0.95, 8), 0.90, 0.90, rep(0.90, 2),
                                 rep(0.90, 2)),
                         power = c(rep(0.80, 4), rep(0.90, 4), 0.80, 0.90,
                                   rep(0.80, 2), rep(0.90, 2)),
                         alpha = c(0.0294, 0.0302, 0.0294, 0.0282, 0.0284, 0.0286,
                                   0.0274, 0.0275, 0.0272, 0.0268, 0.0280, 0.0268,
                                   0.0269, 0.0266),
                         CV = CV, n1 = n1, stringsAsFactors = FALSE)
    res    <- cbind(cond, ASN = NA, TIE = NA, min.n2 = min.n2, ASN.1 = NA,
                    ANVISA = NA, comp = "equal", stringsAsFactors = FALSE)
    for (j in 1:nrow(cond)) {
      method <- if (cond$Type[j] == 1) "B" else "C"
      # original method (no minimum n2): x1 for E[N], x2 for the TIE
      x1 <- power.tsd(method = method, alpha0 = alpha0,
                      alpha = rep(cond$alpha[j], 2), n1 = cond$n1[j],
                      GMR = cond$GMR[j], CV = cond$CV[j],
                      targetpower = cond$power[j])
      x2 <- power.tsd(method = method, alpha0 = alpha0,
                      alpha = rep(cond$alpha[j], 2), n1 = cond$n1[j],
                      GMR = cond$GMR[j], CV = cond$CV[j],
                      targetpower = cond$power[j], theta0 = 1.25)
      res$ASN[j] <- round(x1$nmean, 1)
      res$TIE[j] <- signif(x2$pBE, 4)
      # same conditions with n2 forced to at least 50% of n1
      y1 <- power.tsd(method = method, alpha0 = alpha0,
                      alpha = rep(cond$alpha[j], 2), n1 = cond$n1[j],
                      GMR = cond$GMR[j], CV = cond$CV[j],
                      targetpower = cond$power[j], min.n2 = res$min.n2[j])
      y2 <- power.tsd(method = method, alpha0 = alpha0,
                      alpha = rep(cond$alpha[j], 2), n1 = cond$n1[j],
                      GMR = cond$GMR[j], CV = cond$CV[j],
                      targetpower = cond$power[j], min.n2 = res$min.n2[j],
                      theta0 = 1.25)
      res$ASN.1[j]  <- round(y1$nmean, 1)
      res$ANVISA[j] <- signif(y2$pBE, 4)
    }
    names(res)[c(9, 12)] <- rep("E[N]", 2)
    res$comp[which(res$ANVISA > res$TIE)] <- "higher"
    res$comp[which(res$ANVISA < res$TIE)] <- "lower"
    print(res[order(res$Type, res$GMR, res$power, res$Name, res$Method,
                    decreasing = c(FALSE, FALSE, TRUE, FALSE, TRUE)), ],
          row.names = FALSE)

Cheers,
Helmut Schütz

d_labes
★★★

Berlin, Germany,
2020-02-16 19:34

@ Helmut
Posting: # 21175
Views: 366

## crap

Dear Helmut,

» ...
»         Name Method Type  GMR power  alpha     TIE  ANVISA   comp delta
»          SLF      B    1 0.90   0.8 0.0272 0.04997 0.04999 higher  2E-5
»          SLF      B    1 0.90   0.9 0.0268 0.04985 0.04977  lower  8E-5
»       Potvin      B    1 0.95   0.8 0.0294 0.04876 0.04879 higher  3E-5
»   Potvin-SLF      B    1 0.95   0.8 0.0302 0.04999 0.05020 higher  2E-4
»     Fuglsang      B    1 0.95   0.9 0.0284 0.04960 0.04958  lower  2E-5
» Fuglsang-SLF      B    1 0.95   0.9 0.0286 0.04999 0.05032 higher  3E-4
»     Montague      D    2 0.90   0.8 0.0280 0.05180 0.05181 higher  1E-5
» Montague-SLF      D    2 0.90   0.8 0.0268 0.04998 0.04980  lower  2E-4
»     Fuglsang    C/D    2 0.90   0.9 0.0269 0.05021 0.05011  lower  1E-4
» Fuglsang-SLF    C/D    2 0.90   0.9 0.0266 0.04995 0.04967  lower  3E-4
»       Potvin      C    2 0.95   0.8 0.0294 0.05143 0.05136  lower  7E-5
»   Potvin-SLF      C    2 0.95   0.8 0.0282 0.05010 0.05010  equal
»     Fuglsang    C/D    2 0.95   0.9 0.0274 0.05010 0.05010  equal
» Fuglsang-SLF    C/D    2 0.95   0.9 0.0275 0.04962 0.04985 higher  2E-4
»
» TIE which is significantly >0.05 in red (limit of the binomial test 0.05036). I don’t understand why in some scenarios the TIE is lower with a minimum n2.
» Counterintuitive.

I'm quite sure: This is because of the simulation error. The differences of the TIE without and with min.n2 are so small. See the last column above.
Any try with a different seed of the random number generator may and will change the comparison.

Edit: Removed irrelevant columns for clarity. [Helmut]

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2020-02-17 12:49

@ d_labes
Posting: # 21176
Views: 308

## Explained crap remains crap

Dear Detlew,

» I'm quite sure: This is because of the simulation error. The differences of the TIE without and with min.n2 are so small.
» Any try with a different seed of the random number generator may and will change the comparison.

as usual you are right.
The standard error of a single estimate from 1 mio simulations is $$\small{\sqrt{0.5\alpha/10^6}\approx 0.00016}$$. Variable results with random seeds.

[Plot: 25 replicates; blue dots fixed seeds, light blue dots random seeds. Linear fit, 95% prediction interval.]
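
To put the differences in Detlew’s last column into perspective, one can compare them with the simulation error. The sketch below is base R only; the TIE values are copied from the table, `se.post` is the expression given above, and `se.binom` is the textbook binomial standard error at a true TIE of 0.05. Treating the two runs as independent is my assumption (with fixed seeds they are not), so take the z-values as a rough guide only.

```r
nsims <- 1e6
alpha <- 0.05
se.post  <- sqrt(0.5 * alpha / nsims)          # expression as given above
se.binom <- sqrt(alpha * (1 - alpha) / nsims)  # textbook binomial SE
# empiric TIEs copied from the table: without and with the minimum n2
TIE    <- c(0.04997, 0.04985, 0.04876, 0.04999, 0.04960, 0.04999, 0.05180,
            0.04998, 0.05021, 0.04995, 0.05143, 0.05010, 0.05010, 0.04962)
ANVISA <- c(0.04999, 0.04977, 0.04879, 0.05020, 0.04958, 0.05032, 0.05181,
            0.04980, 0.05011, 0.04967, 0.05136, 0.05010, 0.05010, 0.04985)
delta   <- ANVISA - TIE
# SE of a difference, assuming (!) two independent simulation runs
se.diff <- sqrt(2) * se.binom
z       <- delta / se.diff
print(signif(c(se.post = se.post, se.binom = se.binom, se.diff = se.diff), 3))
print(round(range(z), 2))
```

Under these assumptions all fourteen differences stay within two standard errors of zero, which fits the explanation by simulation error.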

Walking in the footsteps of zizou and trying an argument:

In all methods the sample size of the second stage is estimated based on the adjusted α, the – mainly fixed – GMR, the CV observed in the interim, and target power. Only for these conditions the adjusted α is validated. None of the published methods use a minimum n2 (the minimum n2 of two subjects given in the EMA’s Q&A document is nonsense, of course).
If one arbitrarily increases n2, the chance to demonstrate BE (i.e., falsely rejecting the true Null) increases as well and hence, the Type I Error.
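
The direction of this argument can be illustrated outside the two-stage framework. The sketch below uses the noncentral t approximation of TOST power for a fixed-sample 2×2 crossover at the upper BE limit (θ0 = 1.25), with α 0.0294 and CV 24% as in the earlier example. It is not a simulation of any TSD, and the helper `p.pass()` is made up for this post; it only shows that the chance of passing at the boundary grows with the sample size and approaches α from below.

```r
# probability to pass TOST in a fixed-sample 2x2 crossover
# (noncentral t approximation; hypothetical helper, not from Power2Stage)
p.pass <- function(n, CV, alpha, theta0, theta1 = 0.80, theta2 = 1.25) {
  mse   <- log(CV^2 + 1)        # within-subject variance on the log scale
  se    <- sqrt(2 * mse / n)    # SE of the treatment difference (balanced)
  df    <- n - 2
  tcrit <- qt(1 - alpha, df)
  ncp1  <- (log(theta0) - log(theta1)) / se
  ncp2  <- (log(theta0) - log(theta2)) / se
  pmax(0, pt(-tcrit, df, ncp = ncp2) - pt(tcrit, df, ncp = ncp1))
}
n <- seq(12, 36, by = 2)
p <- sapply(n, p.pass, CV = 0.24, alpha = 0.0294, theta0 = 1.25)
print(data.frame(n = n, p.pass = signif(p, 4)), row.names = FALSE)
```

In a TSD the situation is more complex because n2 depends on the interim, which is exactly why simulations are needed there.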

Since the α 0.0294 of Potvin’s Method B [1] is overly conservative, ANVISA’s requirement fortunately controls the Type I Error (see the plot above), but this might not be the case with other methods where the adjusted α gives a TIE closer to the nominal 0.05.

Consequences for the Consulta Pública N° 760:
• I don’t get the point why one should treat more subjects than necessary. IMHO, that’s not ethical.
• If it is implemented in its current form, one is bound to Potvin’s Method B.
  Stupid, because it is only applicable for GMR 0.95 and 80% power. Not fully adaptive (i.e., it does not use the PE of the interim), no futility rules (maximum sample size, early stopping due to extreme PE, etc.).
• Type 2 TSDs are seemingly not acceptable. Why? Fine for the FDA and Health Canada…
• Hopefully we can convince the ANVISA that other methods are valid as well. However, if the ANVISA insists on n2 ≥ 50% n1, simulations are mandatory to find a suitable (potentially lower) adjusted α.
• If dealing with crossover designs nowadays, I would leave the simulation-based methods aside and recommend Maurer’s approach [2] instead. It is the most flexible one and allows specifying a minimum n2 (≥ 4) whilst controlling the TIE in the strict sense.
  NB, the minimum n2 of the method is 4 (required for the ANOVA of the second stage).

1. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294.
2. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–607. doi:10.1002/sim.7614.

Cheers,
Helmut Schütz

Mauricio Sampaio
★

Brazil,
2020-02-17 13:50

@ Helmut
Posting: # 21177
Views: 267

## Proposed changes

» Consequences for the Consulta Pública N° 760.

Instead of: "Type I error must be preserved and adjusted, and to demonstrate bioequivalence the level of confidence is 94.12%;"

I will only propose that: It must be demonstrated that the type I error of the study is controlled.

Instead of: "This second group must have at least 50% of the previous group"

I will propose that: The number of participants in the second stage must be calculated based on the data extracted from the first stage. The calculation must be justified considering possible losses and / or dropouts observed in the first stage.

In this way, the dialogue is open and not restricted. = "on top of the wall"

Helmut
★★★

Vienna, Austria,
2020-02-17 14:16

@ Mauricio Sampaio
Posting: # 21178
Views: 258

## Proposed changes

Hi Mauricio,

» Instead of: "Type I error must be preserved and adjusted, and to demonstrate bioequivalence the level of confidence is 94.12%;"
»
» I will only propose that: It must be demonstrated that the type I error of the study is controlled.

OK in principle. It’s always a good idea not only to propose a change but give a justification. Maybe refer to the EMA’s and the WHO’s guidelines stating that the adjusted α has to be specified in the protocol and the choice is at the company’s discretion. α 0.0294 (i.e., the 94.12% CI) is definitely not the only possible one.

» Instead of: "This second group must have at least 50% of the previous group"
»
» I will propose that: The number of participants in the second stage must be calculated based on the data extracted from the first stage. The calculation must be justified considering possible losses and / or dropouts observed in the first stage.

OK. Do me a favor: Use estimated/estimation instead of calculated/calculation.
Of course, n2 is always based on the eligible subjects in the interim (n1), not on the subjects randomized.
Justification: A minimum stage 2 sample size is not covered by the published methods; any minimum n2 might inflate the Type I Error. If that sounds too statistical write “the patient’s risk” instead.
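
One possible reading of ‘considering possible losses and/or dropouts’ is to inflate the estimated n2 by the dropout rate seen in stage 1 and round up to the next even number to keep the sequences balanced. A minimal sketch under that assumption (the function `adjust.n2()` and all numbers are hypothetical; this step is my own illustration and is not taken from any guideline or published method):

```r
# inflate an estimated stage 2 sample size for expected dropouts
# (hypothetical helper for illustration only)
adjust.n2 <- function(n2.est, n1.dosed, n1.eligible) {
  do.rate <- 1 - n1.eligible / n1.dosed  # dropout rate observed in stage 1
  n2.adj  <- n2.est / (1 - do.rate)      # subjects to dose in stage 2
  2 * ceiling(n2.adj / 2)                # round up to the next even number
}
# made-up example: 24 dosed, 22 eligible in stage 1; estimated n2 = 20
n2 <- adjust.n2(n2.est = 20, n1.dosed = 24, n1.eligible = 22)
print(n2)
```

The estimated n2 itself still comes from the chosen method and the eligible subjects of the interim; only the number of subjects dosed is adjusted here.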

Cheers,
Helmut Schütz
