Helmut
★★★
Vienna, Austria,
2014-07-21 04:04
(3538 d 10:14 ago)

Posting: # 13279
Views: 12,880
 

 EMA: α-inflation – the suspicion begins to mount [RSABE / ABEL]

Dear all,

maybe some of you remember this interesting, scary discovery of Detlew’s. This week a paper was published by a German group* confirming his findings. Appetizers:

Surprisingly, when ‘scaled’ bioequivalence limits were set as bioequivalence limits, the highest rejection rate observed was at the lowest variability investigated. With CV_ANOVA of 30%, it was 7.05%, and it was still 5.39% with a variability of 40%. Therefore, taking into account a simulation error of roughly 0.5%, and the fact that our simulations are based on uncorrelated data and do not consider interindividual variabilities, one may doubt that an α-error of 5% is controlled even with the pre-set ‘scaled’ limits, at least for variabilities close to the cut-off point of CV_ANOVA of 30%.

  • Table II Rejection Rate After 10000 Simulations […] and Empirical α-Error Rate at the ‘Scaled’ BE Limits […] According to EMA with Increasing Intraindividual Variability
    CV_ANOVA [%] N   GMR   Empirical α-error rate [%]
    ────────────────────────────────────────────────
        30     22  1.250         7.05 *
        35     25  1.295         5.58 *
        40     27  1.340         5.39 *
        45     27  1.386         4.25  
        50     28  1.432         3.51  
    ────────────────────────────────────────────────

(* significantly >0.05; my addition)

Try this code:

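library(PowerTOST)
CV  <- seq(30, 50, 5)                  # CV in percent
# sample sizes as in Table II; GMR = upper 'scaled' limit (EMA's ABEL) for each CV
res <- data.frame(CV = CV, N = c(22, 25, 27, 27, 28),
                  GMR = scABEL(CV/100)[, "upper"], pBE = NA, sig = "",
                  stringsAsFactors = FALSE)
for (j in seq_along(CV)) {
  # empiric type I error in percent: chance of passing BE
  # if the true GMR sits exactly at the upper scaled limit
  res$pBE[j] <- round(100*power.scABEL(CV = CV[j]/100, theta0 = res$GMR[j],
                                       n = res$N[j], design = "2x3x3",
                                       nsims = 1e6), 2)
}
# significance limit: upper one-sided 95% confidence limit of 0.05 in 1e6 sim's
sig <- binom.test(0.05*1e6, 1e6, alternative = "less")$conf.int[[2]]
res$sig[res$pBE/100 > sig] <- "*"      # flag significant inflation
names(res)[5] <- ""                    # no header for the flag column
print(res, row.names = FALSE)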


Do these results look familiar?

 CV  N      GMR  pBE 
 30 22 1.250000 6.88 *
 35 25 1.294796 5.42 *
 40 27 1.340165 5.04 *
 45 27 1.385915 4.34 
 50 28 1.431910 3.32
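
The GMR column is the upper ‘scaled’ limit itself. A minimal base-R sketch of how such a limit can be obtained, assuming EMA's ABEL rules (regulatory constant 0.760, no scaling up to a CV of 30%, cap at a CV of 50%). abel_upper() is a hypothetical helper written for illustration, not a PowerTOST function:

```r
# Hypothetical helper: upper expanded limit according to EMA's ABEL.
# Assumptions: k = 0.760, no scaling at CV <= 30%, limits capped at CV = 50%.
abel_upper <- function(CV, k = 0.760, CVswitch = 0.30, CVcap = 0.50) {
  CVeff <- pmin(CV, CVcap)              # limits are frozen above the cap
  swR   <- sqrt(log(CVeff^2 + 1))       # CV -> within-subject SD (log scale)
  ifelse(CV <= CVswitch, 1.25, exp(k * swR))
}
CV    <- seq(30, 50, 5) / 100
upper <- abel_upper(CV)
print(data.frame(CV = 100 * CV, upper = round(upper, 6)), row.names = FALSE)
```

Under these assumptions the values should agree with the GMR column of the output above.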


If you are planning a study for evaluation by EMA’s ABEL method, think about it.


* Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 31 (preprint published 18 July 2014) doi:10.1007/s11095-014-1450-z.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2014-07-21 11:24
(3538 d 02:53 ago)

@ Helmut
Posting: # 13281
Views: 11,562
 

 EMA: α-inflation – the suspicion begins to mount

Hi Hötzi,

d_labes should publish a commentary in the journal. I think his figures were sort of more telling weren't they?

By the way, one of the authors used to be a member of the PK subgroup back in the day.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2014-07-21 18:14
(3537 d 20:04 ago)

@ ElMaestro
Posting: # 13284
Views: 11,671
 

 Why simulate a simple AB|BA at all?

Hi ElMaestro,

❝ d_labes should publish a commentary in the journal. I think his figures were sort of more telling weren't they?


Yes. Given the unpleasant experience we had last year submitting a letter to that very journal, I’m not sure whether he will risk the effort…

BTW, I have some mixed feelings about the paper. I don’t have the slightest idea why the authors simulated the rejection rate for a conventional 2×2 cross-over. Power can be directly calculated for any given combination of α, CV, GMR, and N. One should never ever get anything >0.05! So where does this hump at N>30 in Fig. 1 come from?
Try:

library(PowerTOST)
n   <- seq(8, 100, 2)             # total sample sizes (2×2 cross-over)
CV  <- c(10, 20, seq(30, 55, 5))  # intra-subject CVs in percent
clr <- colorRampPalette(c("blue", "red"))(length(CV))
# empty canvas first; the dotted line marks the nominal α of 0.05
plot(range(n), c(0, 0.05), type="n", las=1, xlab="n", ylab="pBE")
abline(h=0.05, lty=3)
for(j in seq_along(CV)) {
  # exact probability of passing at the upper BE limit (GMR 1.25) = type I error;
  # power.TOST() is not vectorized over total n, hence sapply()
  pBE <- sapply(n, function(x) power.TOST(CV=CV[j]/100, theta0=1.25, n=x))
  lines(n, pBE, lwd=2, col=clr[j])
  text(n[j], pBE[j], labels=paste0(CV[j],"%"))
}


Another point is the number of simulations. We know about the slow convergence in this kind of sim’s. 5,000 for power and 10,000 for the empiric α are too low by a factor of 100.
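
The convergence argument can be illustrated with the binomial standard error of a simulated rate. A rough sketch in base R, assuming a true rate of exactly 0.05 and independent simulations:

```r
p     <- 0.05                          # assumed true type I error
nsims <- c(1e4, 1e6)                   # number of simulations
se    <- sqrt(p * (1 - p) / nsims)     # binomial standard error of the rate
print(data.frame(nsims = nsims, se.percent = signif(100 * se, 3)))
```

A hundred times more simulations shrink the standard error by a factor of ten.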

❝ […] one of the authors used to be a member of the PK subgroup back in the day.


Interesting.

Helmut Schütz
ElMaestro
★★★

Denmark,
2014-07-21 22:50
(3537 d 15:28 ago)

@ Helmut
Posting: # 13285
Views: 11,526
 

 Why simulate a simple AB|BA at all?

Hi Hötzi,

❝ Yes. Given the unpleasant experience we had last year submitting a letter to that very journal, I’m not sure whether he will risk the effort…


Are we talking about Detlew der Hosenscheisser or Detlew the Conqueror?

❝ BTW, I have some mixed feelings about the paper. I don’t have the slightest idea why the authors simulated the rejection rate for a conventional 2×2 cross-over. Power can be directly calculated for any given combination of α, CV, GMR, and N. One should never ever get anything >0.05! So where does this hump at N>30 in Fig. 1 come from?


Hmmmm that's a relevant question. Perhaps they just wanted to demonstrate internal validity of the sim algos? I also use GraphPad Prism for graphing, though an earlier version. It does not stack or offset curves, so they really do seem to have gotten a little hump of sorts.

One conclusion to draw is that the whole scaling business might not be that smart when it comes to type I errors. At the more general level another proposal is that before a group of guideline authors agree on a new requirement they should investigate the requirement by specific simulations rather than just follow their instincts or adopt concepts that were made for a different null hypothesis ("0.0294" .... do I need to say more?).

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2014-07-22 01:13
(3537 d 13:05 ago)

@ ElMaestro
Posting: # 13286
Views: 11,637
 

 Fancy smoothing?

Hi ElMaestro,

❝ Hmmmm that's a relevant question. Perhaps they just wanted to demonstrate internal validity of the sim algos? I also use GraphPad Prism for graphing, though an earlier version. It does not stack or offset curves, so they really do seem to have gotten a little hump of sorts.


Below a comparison: Calculated, 10⁴, and 10⁶ sim’s. The red dotted line on top is the significance limit (0.05373 for 10⁴ and 0.05036 for 10⁶).

[image]
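
Significance limits of this kind can be obtained as the upper one-sided 95% confidence limit of a binomial proportion, as in the binom.test() call of the first post. A sketch in base R; sig_limit() is a hypothetical helper introduced only for illustration:

```r
# Hypothetical helper: upper one-sided confidence limit of alpha,
# given alpha * nsims 'passing' studies out of nsims simulated ones.
sig_limit <- function(nsims, alpha = 0.05) {
  binom.test(round(alpha * nsims), nsims, alternative = "less")$conf.int[2]
}
lim <- sapply(c(1e4, 1e6), sig_limit)
print(round(lim, 5))
```

Any simulated rate above the respective limit would be flagged as significantly larger than 0.05.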

I tried also to repeat Japan’s bizarre method (Fig. 2 of the paper):

[image]

A lot of ‘noise’, but essentially I could reproduce the reported inflation (converging at ~7.5%). Inflation with this method is a textbook example (unadjusted multiple testing). I’m happy that finally someone demonstrated it. The inflation is what to expect from pooling two groups, where the size of the second one is 50% of the first: 0.05+0.05/2=0.075… Voilà.
For my experiences in Japan see the end of this post. :lookaround:
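
The mechanism (test at 0.05, and if that fails, pool with an add-on of half the size and test again at 0.05) can be sketched in base R. This is a deliberately simplified illustration, not what the paper or PowerTOST does: within-subject log-differences are simulated directly instead of a full 2×2 cross-over, and the sample size, SD, and seed are arbitrary assumptions.

```r
set.seed(123456)                      # arbitrary seed
nsims  <- 20000
n1     <- 24                          # initial study (assumed)
n2     <- n1 / 2                      # add-on: half of the initial study
sd.d   <- 0.2                         # SD of log-differences T-R (assumed)
theta0 <- log(1.25)                   # true ratio at the upper BE limit
limits <- log(c(0.80, 1.25))

# TRUE if the 90% CI of the mean log-difference lies within the limits
passes <- function(d, alpha = 0.05) {
  n  <- length(d)
  hw <- qt(1 - alpha, df = n - 1) * sd(d) / sqrt(n)
  (mean(d) - hw >= limits[1]) && (mean(d) + hw <= limits[2])
}

pass <- logical(nsims)
for (i in seq_len(nsims)) {
  d1 <- rnorm(n1, mean = theta0, sd = sd.d)
  if (passes(d1)) {                   # BE shown in the initial study
    pass[i] <- TRUE
  } else {                            # pool with the add-on, test again at 0.05
    d2 <- rnorm(n2, mean = theta0, sd = sd.d)
    pass[i] <- passes(c(d1, d2))
  }
}
alpha.emp <- mean(pass)               # empiric type I error
print(alpha.emp)
```

With these settings the empiric rate should land clearly above 0.05; the exact value depends on the seed and on the simplifications made.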

❝ One conclusion to draw is that the whole scaling business might not be that smart when it comes to type I errors.


If EMA’s method is concerned, yes.

❝ At the more general level another proposal is that before a group of guideline authors agree on a new requirement they should investigate the requirement by specific simulations rather than just follow their instincts or adopt concepts that were made for a different null hypothesis […]



Exactly. It’s not only statistics which sucks. The PK group single-handedly invented a bunch of PK metrics whose relevance and sensitivity to formulation differences are not supported by a single publication. What the heck is AUCτ/2? We had three conferences on the MR GL, hundreds of pages of comments… Guess what? Felt like talking to a brick wall.

❝ […] ("0.0294" .... do I need to say more?).


No.

Helmut Schütz
d_labes
★★★

Berlin, Germany,
2014-07-22 10:43
(3537 d 03:35 ago)

@ Helmut
Posting: # 13289
Views: 11,421
 

 Arbitrary smoothing?

Dear Helmut,

❝ A lot of ‘noise’, but essentially I could reproduce the reported inflation (converging at ~7.5%). Inflation with this method is a textbook example (unadjusted multiple testing). I’m happy that finally someone demonstrated it. The inflation is what to expect from pooling two groups, where the size of the second one is 50% of the first: 0.05+0.05/2=0.075… Voilà.


Where does the tremendous noise in your pictures come from?

And is Fig. 2 (as well as others) of the paper drawn with tremendous (arbitrary?) smoothing? The crossing of some of the curves seems to point in that direction. The plot form (continuous curves, without the points they have simulated) is itself suspicious IMHO.

BTW: Do you see any need to create a power.addon() in PowerTOST? Pedagogical?

Regards,

Detlew
Helmut
★★★
Vienna, Austria,
2014-07-22 15:31
(3536 d 22:47 ago)

@ d_labes
Posting: # 13290
Views: 11,529
 

 Noise…

Dear Detlew,

❝ Where does the tremendous noise in your pictures come from?


Quick and dirty as ever; I set a different seed for every simulation. Looks better if keeping the same one:

[image]

[image]

❝ And is Fig. 2 (as well as others) of the paper drawn with tremendous (arbitrary?) smoothing? The crossing of some of the curves seems to point in that direction. The plot form (continuous curves, without the points they have simulated) is itself suspicious IMHO.


Agree.

❝ BTW: Do you see any need to create a power.addon() in PowerTOST? Pedagogical?


I guess enthusiasts could already misuse* the – experimental – function power.2stage.GS() as I did. I tried alpha=c(0.05, 0.05) and n=c(n, n/2+(n/2)%%2). The modulo in the second part of the n-vector rounds up to the next even number because I don’t like imbalanced studies. Example: n = 18, n/2 = 9, n/2+(n/2) mod 2 = 10.
power.addon() would be nice, of course. ;-)
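
The rounding trick can be checked in base R for a few even sample sizes of the initial study (the values of n1 are arbitrary examples):

```r
n1 <- c(12, 18, 24, 30)          # initial study, even total sample sizes
n2 <- n1/2 + (n1/2) %% 2         # half of n1, rounded up to the next even number
print(data.frame(n1 = n1, half = n1/2, n2 = n2))
```

If n1/2 is odd the modulo adds one subject; if it is even nothing changes.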

BTW, the Japanese guidance states:

The add-on subject study should include at least one half of the number of subjects in the initial study.

I overlooked that until today (believing that n₂ is fixed at n₁/2). This opens the door to an infinite number of designs (adjusting for the observed CV, any power, even fully adaptive). Glad that I never dealt with submissions to Japan so far.


* Due to the futility criterion in this function there are cases where the study stops after the first part. Anyhow, I saw inflation of ~7.55%.

Helmut Schütz
d_labes
★★★

Berlin, Germany,
2014-07-22 18:06
(3536 d 20:12 ago)

@ Helmut
Posting: # 13291
Views: 11,478
 

 TSD Japonica…

Dear Helmut,

❝ ❝ BTW: Do you see any need to create a power.addon() in PowerTOST? Pedagogical?


❝ I guess enthusiasts could already misuse the – experimental – function power.2stage.GS() as I did.


Clever, clever such enthusiasts! :cool:
Although not quite clear to me if this meets the "Addonionsis Japonica" in all respects.
At least it is not what Wonnemann et al. simulated. No futility criterion was used by them if I read the paper correctly (see below what power.2stage.GS() does).

❝ BTW, the Japanese guidance states:

The add-on subject study should include at least one half of the number of subjects in the initial study.

I overlooked that until today (believing that n₂ is fixed at n₁/2). This opens the door to an infinite number of designs (adjusting for the observed CV, any power, even fully adaptive).


Seems the story goes further (just before your quote):
"If bioequivalence cannot be demonstrated because of an insufficient number, an add-on subject study can be performed ..."

If one reads this, it may call for some criterion of what an insufficient number is. Knowing the TSDs of Potvin et al., this may be interpreted as a call for some power calculation step after 'stage 1'.

In power.2stage.GS() the criterion for continuing to stage 2 is:
- not BE in the first stage
- and result was not futile (via a PE or CI criterion)
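
The two conditions can be written down as a small decision rule. A sketch in base R covering only a point-estimate (PE) futility criterion; go_to_stage2() and the default bound of 0.8 are hypothetical choices for illustration, not the package's actual code:

```r
# Hypothetical helper, not part of PowerTOST.
# BE1: was BE shown in stage 1?  PE1: point estimate (ratio) of stage 1.
# Assumption: the result is 'futile' if PE1 lies outside [fClower, fCupper].
go_to_stage2 <- function(BE1, PE1, fClower = 0.8, fCupper = 1/fClower) {
  futile <- PE1 < fClower | PE1 > fCupper
  !BE1 & !futile
}
go_to_stage2(BE1 = FALSE, PE1 = 0.90)               # not BE, not futile
go_to_stage2(BE1 = TRUE,  PE1 = 0.90)               # already BE
go_to_stage2(BE1 = FALSE, PE1 = 0.70)               # futile
go_to_stage2(BE1 = FALSE, PE1 = 0.70, fClower = 0)  # bounds 0 and Inf
```

With a lower bound of 0 the upper bound becomes 1/0 = Inf, so no point estimate can ever be futile in this sketch.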

Regards,

Detlew
Helmut
★★★
Vienna, Austria,
2014-07-22 19:50
(3536 d 18:28 ago)

@ d_labes
Posting: # 13292
Views: 11,461
 

 TSD Japonica…

Dear Detlew,

❝ […] not quite clear to me if this meets the "Addonionsis Japonica" in all respects.


No, it doesn’t. That’s why I called it misuse. Thoughtlessly I edited my post not being aware of your answer already posted. I’ve set the futility criterion to extreme values, but I was aware that still some studies might simply stop.

❝ At least it is not what Wonnemann et al. simulated. No futility criterion was used by them if I read the paper correctly […]



Yes, you do.

❝ Seems the story goes further (just before your quote):

"If bioequivalence cannot be demonstrated because of an insufficient number, an add-on subject study can be performed ..."


Oh no!

❝ If one reads this, it may call for some criterion of what an insufficient number is. Knowing the TSDs of Potvin et al., this may be interpreted as a call for some power calculation step after 'stage 1'.


Seems so. The terminology is consistent, since in the preceding sentence the guidance asks for a “sufficient” number of subjects in planning the sample size.

❝ Knowing the TSDs of Potvin et al., this may be interpreted as a call for some power calculation step after 'stage 1'.


If one wants to find a suitable adjustment avoiding inflation the only difference to Potvin & Co. would be restricting n₂ to n₁/2.

❝ In power.2stage.GS() the criterion for continuing to stage 2 is:

❝ - not BE in the first stage

❝ - and result was not futile (via a PE or CI criterion)


For the latter case I used 0.01/100, but the former may still hit.

Helmut Schütz
d_labes
★★★

Berlin, Germany,
2014-07-24 15:47
(3534 d 22:30 ago)

@ Helmut
Posting: # 13307
Views: 11,337
 

 Noise debugged

Dear Helmut,

❝ ❝ Where does the tremendous noise in your pictures come from?


❝ Quick and dirty as ever...


Mea culpa, mea maxima culpa.
Quick and dirty was on my side. The noise comes from a nasty bug in power.2stage.GS() which lets the power values jump around instead of converging with an increasing number of sims :crying:.
I noticed some of this behaviour earlier and thus called this function "experimental". But then I forgot about it.

Corrected version is under way. Sorry for any inconvenience.

BTW: To calculate without futility criterion set fCrit="PE" and set fClower=0 (implies fCupper=Inf).

Regards,

Detlew
Helmut
★★★
Vienna, Austria,
2014-07-24 17:07
(3534 d 21:11 ago)

@ d_labes
Posting: # 13308
Views: 11,320
 

 Noise debugged

Dear Detlew,

❝ Quick and dirty was on my side. […] Corrected version is under way.


THX a lot!

❝ BTW: To calculate without futility criterion set fCrit="PE" and set fClower=0 (implies fCupper=Inf).


Ah – yes! Looks much better:

Fig 2:
[image]

Fig 4:
[image]

PS: In help/NEWS correct to Version 0.1-04 before uploading to CRAN.

Helmut Schütz
d_labes
★★★

Berlin, Germany,
2014-07-24 18:09
(3534 d 20:09 ago)

@ Helmut
Posting: # 13309
Views: 11,269
 

 Shit happens

Dear Helmut,

❝ PS: In help/NEWS correct to Version 0.1-04 before uploading to CRAN.


Oh no! Too late my Dear.
I hate this: I have to change the version number and date in more than one place, and you can bet that I forgot at least one :angry:. Only the entry in DESCRIPTION is checked automatically during build. Fortunately this is only a cosmetic flaw ("Schönheitsfehler").

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2014-07-22 10:18
(3537 d 04:00 ago)

@ ElMaestro
Posting: # 13288
Views: 11,458
 

 Commentary

My Dear,

❝ ❝ Yes. Given the unpleasant experience we had last year submitting a letter to that very journal, I’m not sure whether he will risk the effort…


❝ Are we talking about Detlew der Hosenscheisser or Detlew the Conqueror?


We are talking about Detlew the Mahatma :-D:

"God, grant me the serenity to accept the things I cannot change,
courage to change the things I can
and wisdom to know the difference."

And changing the policy of Pharm. Res. (no letters to the editor for many years) is beyond my reach, as experienced.

Regards,

Detlew
nobody
nothing

2014-09-15 10:24
(3482 d 03:54 ago)

@ d_labes
Posting: # 13511
Views: 10,788
 

 Commentary

"And changing the policy of Pharm. Res. (no letter to the editor since many years) ..."

Rrrrrrreally sure?

Look here:

click me...

Top of the list...

Kindest regards, nobody
Helmut
★★★
Vienna, Austria,
2014-09-15 12:12
(3482 d 02:06 ago)

@ nobody
Posting: # 13512
Views: 10,902
 

 Commentary

Hi nobody,

that’s amazing! Last year I searched the archives of Pharm Res and couldn’t find a single one. Now I got 34; I guess I was just too stupid. ;-)

Helmut Schütz
Helmut
★★★
Vienna, Austria,
2014-09-29 05:33
(3468 d 08:45 ago)

@ ElMaestro
Posting: # 13611
Views: 10,668
 

 IBE/PBE = Two-Stage

Hi ElMaestro,

❝ One conclusion to draw is that the whole scaling business might not be that smart when it comes to type I errors.


Found a goody in Chow/Liu (3rd ed., but I guess you will find it in the earlier ones as well). Chapter 19, Review of Regulatory Guidances on Bioequivalence, 19.2 Guidances on Statistical Procedures, 19.2.10 Two-Stage Test Procedure

   To apply the proposed criteria for assessment of PBE or IBE, the 2001 FDA guidance suggests that a constant scale be used if the observed estimator of σ_TR or σ_WR is smaller than σ_T0 or σ_W0. However, statistically, the observed estimator of σ_TR or σ_WR being smaller than σ_T0 or σ_W0 does not mean that σ_TR or σ_WR is smaller than σ_T0 or σ_W0. A test on the null hypothesis that σ_TR or σ_WR is smaller than σ_T0 or σ_W0 is necessarily performed. As a result, the proposed statistical procedure for assessment of PBE or IBE becomes a two-stage test procedure. It is then recommended that the overall type I error rate and the calculation of power be adjusted accordingly.


PBE and IBE were never seriously implemented, but with RSABE and ABEL we face the same shyte.
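
The point of the quote, that an observed estimate on one side of the switching value says little about the true one, can be made concrete for ABEL. A sketch in base R, assuming that the estimated within-subject variance of the reference follows a scaled χ² distribution with 22 degrees of freedom (an arbitrary choice for illustration) and a switching CV of 30%:

```r
cv2var   <- function(CV) log(CV^2 + 1)   # CV -> variance on the log scale
df       <- 22                           # assumed degrees of freedom of s2wR
CV.true  <- c(0.25, 0.28, 0.30, 0.32, 0.35)
# P(observed CVwR > 30%) = P(chi2 > df * var(30%) / var(true))
p.scaled <- pchisq(df * cv2var(0.30) / cv2var(CV.true), df = df,
                   lower.tail = FALSE)
print(data.frame(CV.true = 100 * CV.true, p.scaled = round(p.scaled, 3)))
```

Even if the true CV is below 30%, a share of studies would switch to scaled limits. That data-driven decision is the first 'stage' the quote is concerned with.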

Helmut Schütz
ElMaestro
★★★

Denmark,
2014-09-29 17:16
(3467 d 21:02 ago)

@ Helmut
Posting: # 13617
Views: 10,513
 

 IBE/PBE = Two-Stage

Hi Helmut,

❝ PBE and IBE were never seriously implemented, but with RSABE and ABEL we face the same shyte.


thanks for posting. It is an interesting quote but I do not understand much of its implications, to be honest. That might be just because I am rather inexperienced with rsABE etc.

Having said that, FDA have gone full tilt ahead with population BE for budesonide and other inhalation drugs at the in vitro level. I wonder how the pieces actually fit together.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2014-09-30 17:03
(3466 d 21:15 ago)

@ ElMaestro
Posting: # 13621
Views: 10,609
 

 Sequential Design = Inflation likely…

Hi ElMaestro

❝ It is an interesting quote but I do not understand much of its implications, to be honest. That might be just because I am rather inexperienced with rsABE etc.


See Fig.3 from Davit et al.*

[image]


Is this a sequential design or not? Forget the protocol review, just look at the left branch. Will the type I error be inflated? In some cases, yes. Remember that in conventional (unscaled) ABE the empiric α for some combinations of sample size and CV is substantially lower than the nominal α of 0.05. Similar here. If we don’t adjust α, the lower empiric than nominal level of TOST will protect us – maybe. But: We can expect to face inflation, especially close to CV_wR 30%.

❝ […] FDA have gone full tilt ahead with population BE for budesonide and other inhalation drugs at the in vitro level. I wonder how the pieces actually fit together.


Not my field of expertise. Go and have my next puff now.


* Davit BM, Chen ML, Conner DP, Haidar SH, Kim S, Lee CH, Lionberger RA, Makhlouf FT, Nwakama PE, Patel DT, Schuirmann DJ, Yu LX. Implementation of a Reference-Scaled Average Bioequivalence Approach for Highly Variable Generic Drug Products by the US Food and Drug Administration. AAPS J. 2012;14(4):915–24. doi:10.1208/s12248-012-9406-x.

Helmut Schütz
The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz