Helmut
★★★
Vienna, Austria,
2016-03-04 15:37
(2062 d 13:09 ago)

Posting: # 16051
Views: 10,414
 

 The EMA’s BSWP’s opinion [Two-Stage / GS Designs]

Dear all,

last month I had the displeasure to attend a “scientific” advisory meeting at a Scandinavian agency.

Background:
  • Nasty drug; only steady state in patients is possible. Parallel design (a cross-over would take more than one year, and the disease is often not stable: dose adjustments are needed, leading to exclusion of subjects, etc.). Expected dropout rate 20%. Given that, the agency’s PK expert was fine with waiving the (more sensitive) single-dose study and accepted the proposed parallel design.
  • Little known about the variability, but likely >50%. The company suggested a TSD and was referring to Anders’ paper.* Since the variability was expected to be high, the company followed the advice of the two Lászlós and assumed a T/R-ratio of 0.90. Not covered in Anders’ paper, simulations required.
  • I started with equal group sizes (62 to 125 per group = n1 124 to 250; step size 1/group) and CV 24% to 100% (step size 2%): ~2.7·10⁹ simulations.
    Repeated for unequal group sizes, covering slight imbalance to the extreme scenario of 50.4% dropouts in one group and none in the other (n1: 124/125 to 62/125).
    So far, so good.
  • But I had to assess heteroscedasticity in combination with unequal group sizes as well. I explored 32 different CV-ratios (CVG1/CVG2: 0.262 to 3.87). The set was centered around CV 58% (the location of the maximum TIE). With current technology it is not feasible to simulate all possible combinations, i.e., 1.1·10⁶ × 2,496² = 6.853·10¹² (~seven trillion!) BE studies. The runtime would be approximately three to four years. Hence, I assessed eight scenarios (all based on the Welch test, which adjusts for heteroscedasticity and unequal group sizes):
    • Equal group sizes
      • No dropouts (n1 250).
      • Expected dropout rate of ~30%, resulting in 88 eligible subjects in stage 1 (n1 176).
      • High dropout rate of ~38%, resulting in 77 subjects in stage 1 (n1 154). This is the location of the maximum observed TIE.
      • Extreme dropout rate of ~50%, resulting in 62 eligible subjects in stage 1 (n1 124).
    • Unequal group sizes
      • No dropouts in group 1 (nG1 125) and dropout rate ~30% in group 2 (nG2 88), resulting in 213 subjects in stage 1. Overall dropout rate ~15%.
      • Dropout rate ~30% in group 1 (nG1 88) and no dropouts in group 2 (nG2 125), resulting in 213 subjects in stage 1. Overall dropout rate ~15%.
      • No dropouts in group 1 (nG1 125) and extreme dropout rate ~50% in group 2 (nG2 62), resulting in 187 subjects in stage 1. Overall dropout rate ~25%.
      • Extreme dropout rate ~50% in group 1 (nG1 62) and no dropouts in group 2 (nG2 125), resulting in 187 subjects in stage 1. Overall dropout rate ~25%.
Maximum TIE was 0.04987. Power generally >80% unless the very unlikely combination of extremely different group sizes and CVs hits. Even then ~70%. I was satisfied.
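
The scenario counts quoted above can be retraced with a few lines of arithmetic. This is only a sketch: the 1.1·10⁶ simulated studies per scenario is read off the totals given in the list, and the variable names are made up for illustration.

```python
# Grid of stage-1 scenarios described in the list above
n_per_group = range(62, 126)       # 62 to 125 subjects per group, step size 1
cv_percent = range(24, 101, 2)     # CV 24% to 100%, step size 2%
sims_per_scenario = 1_100_000      # assumption: 1.1e6 studies per scenario

# Equal CVs in both groups: one grid of n1 x CV combinations
scenarios = len(n_per_group) * len(cv_percent)
homoscedastic_total = scenarios * sims_per_scenario

# Every scenario of group 1 crossed with every scenario of group 2
full_grid_total = scenarios**2 * sims_per_scenario

print(f"scenarios:       {scenarios:,}")
print(f"equal CVs:       {homoscedastic_total:.3e} studies")
print(f"all combinations: {full_grid_total:.3e} studies")
```

Crossing every scenario with every other one squares the grid, which is what pushes the count from billions to trillions and makes the full assessment impractical.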

The agency’s statistician said (my comments follow each quote):
  • “According to the BE guideline Two-Stage Designs are acceptable in principle. Primary concern is preserving the TIE.”
    In principle‽ Sounds like Radio Yerevan.
  • “I did not read the report in its entirety.”
    The company sent my report to the agency one month in advance. Leaving the title page, formulas, tables, and graphs aside the text covers 4 (four!) pages.
  • “According to a recommendation of the Biostatistics Working Party TSDs similar to Potvin et al. are not acceptable because they are only based on simulations and lack a statistical proof.”
    Although both my report and the briefing package referred to Anders’ paper, she had only Potvin’s paper with her. I told her that I’m aware of such rumours, but since this recommendation of the BSWP is not publicly available, companies have performed such studies in the past and plan similar ones as well. The quality assessor of the agency was quite surprised and asked me “This statement was not made public‽”. I replied yes and the statistician confirmed that. After that the other assessors groaned loudly…
    I pointed out that these methods were published in peer-reviewed journals with high impact factors and co-authored by veteran statisticians like Donald Schuirmann and Walter Hauck. She replied:

  • “The BSWP does not agree with their conclusions. There are alternative methods containing a proof available.”
    I asked her which methods but she was unable to name a single one. The very next day I sent her an e-mail asking for references and haven’t received an answer so far.
  • “The method might be [sic] acceptable if the outcome is unambiguous (i.e., the CI not too close to the acceptance range).”

The work plan 2016 of the BSWP contains this:

Type I error control in two-stage designs in bioequivalence studies
Action: Continue work related to type I error control in two-stage designs in bioequivalence studies.
Comments: This is done in collaboration with the Pharmacokinetics Working Party.

I fear the worst. But where is the secret recommendation? Today Rev. 13 of the Q&A document was published. Nada.



Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2016-03-04 23:16
(2062 d 05:29 ago)

@ Helmut
Posting: # 16052
Views: 8,654
 

 The EMA’s BSWP’s opinion

Hi Hötzi,

clearly there must be some kind of misunderstanding. Some of the Spandinavians are not easy to understand as recent linguistic research has proven.

I have OK experience with scientific advice at EU agencies; as far as I recall I never once ran into someone telling me that an approach wasn't acceptable, as long as it was well backed up by simulations showing control of the type I error. Regulators have for several years now accepted two-stage approaches (see e.g. the Q&A you linked to) and as far as I know Potvin's papers and its descendants are the only ones containing anything that just remotely resembles a proof of control over the type I error. But ok, perhaps regulators have a secret method which is acceptable (as long as there is a period in stage term :-D:-D:-D) and which they forgot to share with the rest of the world. I can't tell if this is the case, but I somehow doubt it.

Is the problem simulation rather than math proof? It isn't my impression that simulations are outright banned or discouraged and some members of the PKWP or BSWP have published enjoyable and useful science based on simulation. Some of them have even been active in research funded under EU's 7th frame programme - it was called "Biosim", and the whole idea of simulation was enthusiastically promoted as being scientifically sound and necessary from many angles. It even enjoyed support from regulators in AT, NL, ES, DE, FR, NO and possibly UK (not sure). The countries are just off the top of my head.

»

Yah I read that one too. I have a copy in the loo, might come in handy if I one day run out of roll.

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2016-03-05 15:17
(2061 d 13:28 ago)

@ ElMaestro
Posting: # 16054
Views: 8,531
 

 Control of the TIE (empiric via simulations vs. proof)

Hi ElMaestro,

» Some of the Spandinavians are not easy to understand as recent linguistic research has proven.

Interesting! Given these problems, did you ever consider opting for an easier language? Most Irish abandoned “Sweet Gaelic” for the far simpler English. What ’bout Gibberish?

» […] as far as I know Potvin's papers and its descendants are the only ones containing anything that just remotely resembles a proof of control over the type I error.

Not even remotely. You cannot plug the decision tree of the frameworks into a formula to estimate power. Hence, a mathematical proof is not possible. The control of the TIE shown over the assessed combos of n1/CV is purely empirical. That’s why hard-core statisticians without further considerations will tell you that these methods are crap.

» But ok, perhaps regulators have a secret method which is acceptable (as long as there is a period in stage term :-D:-D:-D) and which they forgot to share with the rest of the world. I can't tell if this is the case, but I somehow doubt it.

So do I. The only one which claims to contain a proof is Kieser & Rauch.* IMHO, the two lines in the article are actually no more than a claim… I asked the agency’s statistician whether she means this paper and she replied “No.”

» Is the problem simulation rather than math proof?

Yes. As said above this gives statisticians hiccups. The ideal situation would be a proof for the type I error. Most statisticians accept simulations only for the type II error. IMHO, a proof for frameworks is not possible.

» It isn't my impression that simulations are outright banned or discouraged and some members of the PKWP or BSWP have published enjoyable and useful science based on simulation.

Well, the entire reference-scaling stuff (be it the FDA’s RSABE for HVDs or NTIDs, or the EMA’s ABEL) is entirely based on simulations. These methods are frameworks as well. Proof of the control of the TIE is impossible. Do these expert statisticians not know that, or do they just ignore it?
Case 1: Bad. Case 2: Double moral standards. :-(

» […] research funded under EU's 7th frame programme - it was called "Biosim"…

Ended March 2010. Different cup of tea. Toys for boys. No confirmatory statistics like in BE.


  • Kieser M, Rauch G. Two-stage designs for cross-over bioequivalence trials. Stat Med. 2015;34(16):2403–16. doi:10.1002/sim.6487

Dif-tor heh smusma 🖖
Helmut Schütz
ElMaestro
★★★

Denmark,
2016-03-06 22:59
(2060 d 05:47 ago)

(edited by ElMaestro on 2016-03-06 23:49)
@ Helmut
Posting: # 16060
Views: 8,292
 

 To whom it may concern

To whom it may concern:

» (...)Yes. As said above this gives statisticians hiccups. The ideal situation would be a proof for the type I error. Most statisticians accept simulations only for the type II error.(...)

Power is the chance of showing BE for some given model at a given CV and GMR. The type I error is by definition the power when the GMR is chosen to be exactly on the acceptance border (either high or low, there is no difference). Thus power and type I error are for all practical purposes one and the same thing; all that differs is the applied GMR. Why on earth would you trust anyone's results after he/she types 0.95 on a keyboard and presses Enter, but not trust the same person typing 0.8 or 1.25 and hitting Enter?

There is not a single equation or iteration that is not operating according to the same rule set when we simulate for power vs. when we simulate for type I error. The difference is solely in the value of a single variable.
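
To make the point concrete, here is a minimal sketch in Python of one simulation routine that returns "power" or "type I error" depending only on the GMR passed in. It is not one of the two-stage frameworks discussed in this thread: it assumes a plain fixed-sample parallel design with 125 subjects per group, CV 58% in both groups, an unpooled (Welch-type) standard error, and a normal quantile in place of the t-quantile with Welch–Satterthwaite degrees of freedom. The function name `prob_pass` and all defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist

def prob_pass(gmr, cv=0.58, n1=125, n2=125, nsims=20_000, seed=42):
    """Share of simulated parallel-group studies whose 90% CI lies in 0.80-1.25."""
    rng = random.Random(seed)
    sigma2 = math.log(1.0 + cv**2)        # log-scale variance from the CV
    z = NormalDist().inv_cdf(0.95)        # assumption: normal approx. of the t-quantile
    lower, upper = math.log(0.80), math.log(1.25)
    passed = 0
    for _ in range(nsims):
        # Sample variances: sigma2 * chi2(n-1)/(n-1), with chi2(k) = Gamma(k/2, scale 2)
        s1 = sigma2 * rng.gammavariate((n1 - 1) / 2, 2.0) / (n1 - 1)
        s2 = sigma2 * rng.gammavariate((n2 - 1) / 2, 2.0) / (n2 - 1)
        # Observed difference of log-means (independent of s1, s2 for normal data)
        diff = rng.gauss(math.log(gmr), math.sqrt(sigma2 / n1 + sigma2 / n2))
        se = math.sqrt(s1 / n1 + s2 / n2)  # unpooled (Welch-type) standard error
        if diff - z * se >= lower and diff + z * se <= upper:
            passed += 1
    return passed / nsims

power = prob_pass(gmr=0.95)  # what we call "power"
tie = prob_pass(gmr=0.80)    # what we call "type I error": same code, GMR on the border
print(f"GMR 0.95: {power:.3f}   GMR 0.80: {tie:.3f}")
```

Both numbers come out of the identical loop; only the argument `gmr` changes.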

And I was always taught that cherry-picking is not an option. Consider two BE plasma samples that are being analysed. They are the input into the exact same process. Whatever comes out of that process has to be trusted and it isn't an option for me to prefer one output over the other as long as the process they were subjected to was the same and valid. If I try cherry-picking I will be subjected to inspection and questioning, and that is really fair enough.
If we subject "0.95" and "1.25" (or "0.80") to the exact same process then we do not consider the output arising from one of them to be more valid than the output arising from the other. So can you please tell me that either both results can be trusted or that neither of the two results can?

Many, many thanks. I really mean it.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2016-03-07 11:41
(2059 d 17:04 ago)

@ ElMaestro
Posting: # 16064
Views: 8,240
 

 To whom it may concern

Dear ElMaestro,

you are totally right IMHO.
The reason why (some) statisticians accept simulations only for the type II error is the different roles of power and type I error.

Ideal statistical tests meet the rule that type I error is <=0.05 or some other threshold agreed upon. Using simulations to determine this has the drawback that you only can show this for the scenarios you have simulated. Not for the general case. Therefore a proof assuring no alpha-inflation is preferred.

Power on the other hand is useful for planning purposes (only). No strict criterion has to be applied to this term in general.

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2016-03-06 13:31
(2060 d 15:14 ago)

@ Helmut
Posting: # 16057
Views: 8,270
 

 Opinion?

Dear Helmut,

for me that's not an opinion but rather total ignorance of the scientific work on this field :not really:.

Regards,

Detlew
Helmut
★★★
Vienna, Austria,
2016-03-06 14:45
(2060 d 14:01 ago)

@ d_labes
Posting: # 16058
Views: 8,351
 

 Obsession?

Dear Detlew,

I’m sick of some statisticians’ obsession with mathematical proofs.
The three-body problem in physics can’t be solved analytically. However, man landed on the moon and the New Horizons space probe reached Pluto. Same goes with the Navier-Stokes equations of fluid dynamics. Aircraft are designed by the finite element method (fortunately resulting in a much, much lower risk* than 0.05). If statisticians are so worried about numeric methods they should stop traveling – except on foot.

Biostatistician. One who has neither the intellect for mathematics
nor the commitment for medicine
but likes to dabble in both.
      Stephen Senn



  • In 2014: More than 3.3 billion passengers flew safely on 38.0 million flights. 73 accidents (12 fatal). 641 fatalities (risk 1.9·10⁻⁷). 2014 was a bad year (MH 370 and MH 17).

Dif-tor heh smusma 🖖
Helmut Schütz
DavidManteigas
★    

Portugal,
2016-06-02 12:57
(1972 d 16:49 ago)

@ Helmut
Posting: # 16378
Views: 7,162
 

 Obsession?

» Biostatistician. One who has neither the intellect for mathematics
» nor the commitment for medicine
» but likes to dabble in both.      Stephen Senn

Nice one Helmut, never heard!

I believe that is true for some old statisticians not used to computers and with a strong mathematics background. I'm a junior statistician and my background is in life sciences. Simulation is my favourite tool for everything, both because I'm not that good at mathematical theory and because simulation allows us to produce a lot of useful output to learn and prove our points (and it is much simpler to explain than mathematical equations :D). As far as I can see, new statisticians joining the pharmaceutical industry are much more computer-science lovers than mathematics nerds. So it will be a matter of time until Stephen's definition will be outdated :D
Helmut
★★★
Vienna, Austria,
2016-06-02 13:30
(1972 d 16:15 ago)

@ DavidManteigas
Posting: # 16379
Views: 7,179
 

 0.0501 not acceptable

Hi David,

welcome to the forum!

» Nice one Helmut, never heard!

From Stephen’s Statistical Issues in Drug Development, p.457.

I fully agree with your point of view.

» So it will be a matter of time until Stephen's definition will be outdated

I’m not sure if I’ll live long enough. What is the average age of members of the EMA’s Biostatistics Working Party? A member of the PKWP officially asked the BSWP whether a TIE of 0.0501 [sic] is acceptable and the answer was “No!”
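
For a sense of scale, a rough sketch of the Monte Carlo noise around a simulated TIE. It assumes 10⁶ simulated studies (a number not given for the 0.0501 in question) and a normal approximation of the binomial distribution; the function name `significance_limit` is hypothetical.

```python
import math
from statistics import NormalDist

def significance_limit(alpha=0.05, nsims=1_000_000, level=0.05):
    """Largest empirical TIE not significantly above alpha (one-sided test)."""
    se = math.sqrt(alpha * (1.0 - alpha) / nsims)   # binomial standard error
    return alpha + NormalDist().inv_cdf(1.0 - level) * se

limit = significance_limit()
print(f"significance limit: {limit:.5f}")
print("0.0501 distinguishable from 0.05:", 0.0501 > limit)
```

Under these assumptions an empirical 0.0501 is within simulation noise of 0.05.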

Dif-tor heh smusma 🖖
Helmut Schütz
d_labes
★★★

Berlin, Germany,
2016-06-02 14:44
(1972 d 15:02 ago)

@ Helmut
Posting: # 16383
Views: 7,242
 

 OT: Guernsey McPearson's Drug Development Dictionary

Hi David and Helmut,

» From Stephen’s Statistical Issues in Drug Development, p.457.

Another source of similar pieces of wisdom is "The Devil's Drug Development Dictionaries" :cool:.

Here is a taster (one of my favorites):

Statistician.
  1. One devoted to generating statements that are probably true and definitely useless.
  2. One who wears a condom for telephone sex.
  3. One who thinks that loving your data is more exciting than dating your lover.
  4. One who can't run experiments himself, prefers to tell others how they should, steals the data and expects to be thanked for it.
  5. One who thinks that the way to be efficient is to measure as little as possible in a trial that costs millions to run.

Regards,

Detlew
Helmut
★★★
Vienna, Austria,
2016-06-02 15:59
(1972 d 13:46 ago)

@ d_labes
Posting: # 16386
Views: 7,190
 

 OT: Comparison

Dear Detlew,

Martin told me another one:
Stephen was asked by a colleague “How is your wife?” and he replied “Compared to what?”

Professional deformation? :-D

Dif-tor heh smusma 🖖
Helmut Schütz
d_labes
★★★

Berlin, Germany,
2016-06-02 16:42
(1972 d 13:03 ago)

@ Helmut
Posting: # 16388
Views: 7,125
 

 OT: Comparison

Dear Helmut,

» Stephen was asked by a colleague “How is your wife?” and he replied “Compared to what?”
:rotfl:

» Professional deformation? :-D
Missing Null/Alternative hypo :-D

Regards,

Detlew
The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz