Bioequivalence and Bioavailability Forum

Helmut
Hero
Vienna, Austria,
2017-04-29 00:46

Posting: # 17278
Views: 10,629
 

 Russian «Экспертами» (“experts”) and their hobby [Regulatives / Guidelines]

Hi Artem,

concerning your question in the other thread:

» I need to calculate additional parameter in ANOVA - Cohort factor.

Oh, the hobby of the Russian «Экспертами» (“experts”)…

» Then in the case of a EMA Model Specification is:
» sequence+subject(sequence)+period+treatment+cohort
» Am I right?

I’m afraid, no. The EMA does not specify a model. In the BE-GL we find only:

4.1.1 Study design
The study should be designed in such a way that the formulation effect can be distinguished from other effects.
4.1.8 Evaluation – Statistical analysis
The precise model to be used for the analysis should be pre-specified in the protocol. The statistical analysis should take into account sources of variation that can be reasonably assumed to have an effect on the response variable.


» And how Model Specification can be constructed for agencies recommending a mixed-effects model (FDA, Health Canada)?

You find the FDA’s models under the FOI and some members of the forum have a letter with the same wording. The FDA suggests three models (group instead of cohort):
  1. Group, Sequence, Treatment, Subject (nested within Group × Sequence), Period (nested within Group), Group-by-Sequence Interaction, Group-by-Treatment Interaction.
    Subject (nested within Group × Sequence) is a random effect and all other effects are fixed effects. Note that intra-subject contrasts for the estimation of the treatment effect (and hence, an unbiased PE and its CI) cannot be obtained from this model. It serves only as a decision tool.
    • If the Group-by-Treatment interaction test is not statistically significant[a] (p ≥0.1), only the Group-by-Treatment term can be dropped from the model. That means, pool the data and evaluate the study by model #2.
    • If the Group-by-Treatment interaction is statistically significant[a] (p <0.1), equivalence has to be demonstrated in one of the groups, provided that the group meets minimum requirements for a complete bioequivalence study. That means, no pooling and evaluate the (largest) group only by model #3.
  2. Group, Sequence, Treatment, Subject (nested within Group × Sequence), Period (nested within Group), Group-by-Sequence Interaction.
    Again, Subject (nested within Group × Sequence) is a random effect and all other effects are fixed effects.
    The model takes the multigroup nature of the study into account and is more conservative than the naïve pooled model (with two groups, one residual degree of freedom less than model #3).
  3. Sequence, Treatment, Period, Subject (nested within Sequence).
    Surprise: Subject (nested within Sequence) is a random effect and all other effects are fixed effects.
However, the FDA also states that the simple model #3 (of pooled data) can be applied if all of the following criteria are met:
  • the clinical study takes place at one site,
  • all study subjects have been recruited from the same enrollment pool,
  • all of the subjects have similar demographics, and
  • all enrolled subjects are randomly assigned to treatment groups at study outset.
I have no idea why the group effect is such a big deal in Russia. In practice the criteria for not using group terms are almost always fulfilled. The nasty thing is that the Group-by-Treatment interaction test has low power (therefore, testing at the 0.1 level). You should expect a false positive rate at the level of the test and trash some of your studies due to lacking power.[b] Bizarre.

Since Russia follows in the EMA’s footsteps, treat subjects as fixed instead of random.[c] The decision scheme (i.e., whether data can be pooled or analysis of the largest group is recommended) is applicable as well. It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used.
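
The decision scheme can be written down as a small R helper. This is only a sketch of the logic described above; the function name choose_model and its arguments are made up for illustration and are not part of any package or guidance.

```r
# Hypothetical helper: which of the three models applies?
# criteria.met: TRUE if all four pooling criteria listed above are fulfilled
# p.GxT:        p-value of the Group-by-Treatment interaction from model 1
choose_model <- function(criteria.met, p.GxT = NA, p.level = 0.1) {
  if (criteria.met) return("model 3 (pooled data)")
  if (is.na(p.GxT)) stop("p-value of model 1 required")
  if (p.GxT >= p.level) {
    "model 2 (pooled data)"          # interaction not significant: pool
  } else {
    "model 3 (largest group only)"   # significant interaction: no pooling
  }
}
choose_model(TRUE)
choose_model(FALSE, p.GxT = 0.31374)
choose_model(FALSE, p.GxT = 0.04)
```

The threshold is inclusive on the pooling side (p ≥0.1), as in the wording of the models above.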


  a. In Phoenix/WinNonlin check the Partial Tests for model #1:
    Column Hypothesis, row Group*Treatment and its P_value.
  b. Example: CV of AUC 30% (no scaling allowed) but 4-period full replicate to allow scaling of Cmax, GMR 0.90, target power 90% → sample size 54. Capacity of the clinical site 24 beds. Three options:
    1. Equal group sizes (3×18).
    2. Two groups with the maximum size (24) and the remaining one with six subjects.
    3. One group 24, the remaining two as balanced as possible (16|14).
    Let us assume that we are not allowed to pool (significant Group-by-Treatment interaction in model #1) and have to assess BE in the groups. Which powers can we expect?
    1. 51% in all groups (n=18 each).
    2. 62% in the two large groups (n=24 each).
    3. 62% in the largest group (n=24).
    Hence, I don’t think that equal group sizes in #1 are a good idea.
    #2 looks better but what if one group passes and the other not? If you cherry-pick and present only the passing one I bet that assessors will ask for the other one. What do you think they will conclude?
    Therefore, I would suggest #3…
  c. Setup of the models in Phoenix/WinNonlin (map Group as Classification):
    1. Group+Sequence+Treatment+Group*Sequence+
      Group*Period+Group*Treatment+Subject(Group*Sequence)

    2. Group+Sequence+Treatment+Group*Sequence+
      Group*Period+Subject(Group*Sequence)

    3. Sequence+Treatment+Period+Subject(Sequence)

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
mittyri
Senior

Russia,
2017-04-29 22:57

@ Helmut
Posting: # 17283
Views: 9,820
 

 Low power of Group-by-Treatment interaction

Hi Helmut!

Your opinion is very important for Russian BEBA amateurs, so I'm expecting your approach will be 'carved in Russian stone'. :ok:
It would be great if we could reach some consensus regarding models with a group term (until our experts change their minds or, perhaps, the rest of the world is convinced by the Russian experts). :-D

» The nasty thing is that the Group-by-Treatment interaction test has low power (therefore, testing at the 0.1 level). You should expect a false positive rate at the level of the test and trash some of your studies due to lacking power.

Could you please clarify this point? I saw many times the problem of power for Sequence term for simple model and Group-by-Treatment interaction for FDA model I. Is it possible to prove that with sims? Or somebody did this work analytically?

PS: I suspect a lot of fun with replicate designs. Your model specification with group (from Österreich with love :flower:) works well even there, but it doesn't mean that this model is applicable for replicate designs (as we discussed elsewhere).

Kind regards,
Mittyri
Helmut
Hero
Vienna, Austria,
2017-04-30 13:54

@ mittyri
Posting: # 17284
Views: 9,850
 

 Let’s forget the Group-by-Treatment interaction, please!

Hi mittyri,

» Your opinion is very important for Russian BEBA amateurs, so I'm expecting your approach will be 'carved in Russian stone'

If they are following the forum (are they?) I want to make one point clear:

I do not advocate routinely using the group procedures of the FDA!
On the contrary, all criteria for not using them are usually fulfilled (i.e., the simple model of pooled data can be used).

I did so in dozens of studies without ever getting a single (‼) deficiency letter. And my CRO was just a tiny one… Many thousands of BE studies were accepted by a multitude of agencies without asking for an ‘analysis’ of the group effect. :thumb up:
I would say that the EMA accepts without reservation that the group effect “cannot be reasonably assumed to have an effect on the response variable.”

» » The nasty thing is that the Group-by-Treatment interaction test has low power (therefore, testing at the 0.1 level). You should expect a false positive rate at the level of the test …
»
» Could you please clarify this point? I saw many times the problem of power for Sequence term for simple model …

Senn1 (who always strongly argued against testing the sequence – or better unequal carryover – effect!) writes:

Because the power of the test is low, being based on between-patient difference, a high nominal level of significance (usually 10%) is used.

An interesting statement by the EMA2 concerning the treatment by covariate interaction:

The primary analysis should include only the covariates pre-specified in the protocol and no treatment by covariate interaction terms. […] Tests for interactions often lack statistical power and the absence of statistical evidence of an interaction is not evidence that there is no clinically relevant interaction. Conversely, an interaction cannot be considered as relevant on the sole basis of a significant test for interaction. Assessment of interaction terms based on statistical significance tests is therefore of little value [sic].

(my emphases)

» … and Group-by-Treatment interaction for FDA model I. Is it possible to prove that with sims? Or somebody did this work analytically?

Don’t know. I’m in contact with a Canadian CRO to collect empiric evidence (like D’Angelo et al.3 did for carryover). We will include only studies where groups were separated by just a couple of days and all of the FDA’s criteria for pooling were fulfilled. A great deal of work but seemingly ~⅒ of studies show a significant group-by-treatment interaction. :crying:

» I suspect a lot of fun with replicate designs. Your model specification with group […] works well even there, but it doesn't mean that this model is applicable for replicate designs (as we discussed elsewhere).

Yep.


  1. Senn S. Crossover Trials in Clinical Research. Chichester: Wiley; 2nd ed. 2002. p. 58.
  2. EMA. Guideline on adjustment for baseline covariates in clinical trials. London: 26 February 2015. EMA/CHMP/295050/2013.
  3. D’Angelo G, Potvin D, Turgeon J. Carryover effects in bioequivalence studies. J Biopharm Stat. 2001; 11(1–2): 35–43. doi:10.1081/BIP-100104196.

Cheers,
Helmut Schütz
ElMaestro
Hero

Denmark,
2017-05-01 16:19

@ Helmut
Posting: # 17286
Views: 9,754
 

 Let’s forget the Group-by-Treatment interaction, please!

Hi Hötzi and Mittyri,

this thread is interesting and confusing to me.
May I ask or comment for clarification:

M: "Is it possible to prove that with sims?" - what is it you want to prove? Can you formulate it plain and simple? Sims are totally possible, I just need to figure out the equations, as well as have a purpose.:-D

H: "It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used." - a realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that, please describe where/how you came across a fit which failed with the lm.

M+H: FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it :-)

H: "(...) seemingly ~⅒ of studies show a significant group-by-treatment interaction. " - this is expected by chance. You apply a 10% significance level. By chance 10% will then be significant.
(and by the way: Which denominator in F did you apply; within or between?)

if (3) 4

Best regards,
ElMaestro

"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
Helmut
Hero
Vienna, Austria,
2017-05-02 01:10

@ ElMaestro
Posting: # 17287
Views: 9,871
 

 Some answers

Hi ElMaestro,

» M: "Is it possible to prove that with sims?" - what is it you want to prove? Can you formulate it plain and simple? Sims are totally possible, I just need to figure out the equations, as well as have a purpose.:-D

Not M but answering anyway.
The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (i.e., 1/0.95 ≈ 1.053; CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R 1.

» H: "It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used." - a realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that, please describe where/how you came across a fit which failed with the lm.

I had one data set where the fixed effects model in Phoenix/WinNonlin showed me the finger. Same in JMP (“poor man’s SAS”). Have to check again.

» M+H: FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it :-)

True.

» H: "(...) seemingly ~10% of studies show a significant group-by-treatment interaction. " - this is expected by chance. You apply a 10% significance level. By chance 10% will then be significant.

Exactly. That’s the idea of assessing real studies. If there would be a true Group-by-Treatment interaction (i.e., not random alone) we could expect significant results in >10% of studies. This is what I have so far (I hope that the Canadians will come up with another ~100).

[image]

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0012  0.2654  0.5542  0.5196  0.7774  0.9925


[image]

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.00376 0.28570 0.46705 0.48666 0.71465 0.98837

85 studies (60 analytes), 84 data sets for AUC and 85 for Cmax, sample sizes 15 to 74, two to four groups, median interval between groups three days. Significant Group-by-Treatment interaction in 8.33% (AUC) and 12.94% (Cmax) of data sets. Hence, I guess it is a bloody myth.
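
Whether these percentages differ from the 10% expected by chance can be checked with an exact binomial test. A minimal sketch; the counts 7 and 11 are not given above but back-calculated from the reported percentages and numbers of data sets, so treat them as an assumption.

```r
# Counts back-calculated from the reported percentages (assumption):
# 8.33% of 84 AUC data sets = 7, 12.94% of 85 Cmax data sets = 11
sig <- c(AUC = 7, Cmax = 11)
n   <- c(AUC = 84, Cmax = 85)
round(100 * sig / n, 2)  # should reproduce the reported percentages
# exact binomial test against the 10% expected by chance alone
p.binom <- sapply(names(sig), function(m)
  binom.test(sig[[m]], n[[m]], p = 0.10)$p.value)
p.binom
```

Large p-values here would be in line with the interpretation that the significant interactions are false positives at the level of the test.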

» (and by the way: Which denominator in F did you apply; within or between?)

Numerator DF = Groups – 1
Denominator DF = Subjects – 2 × Groups

Cheers,
Helmut Schütz
ElMaestro
Hero

Denmark,
2017-05-02 09:04

@ Helmut
Posting: # 17288
Views: 9,721
 

 Some answers

Hi Hötzi,

» The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R 1.

Thanks for this.
This sounds reasonable (T/R=1, assuming equal group sizes).
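
The dependence on equal group sizes can be illustrated with a few lines of R. The sketch treats the expected pooled T/R as the size-weighted geometric mean of the groups' T/R, which assumes balanced sequences within each group; pooled_TR is a made-up helper name.

```r
# Expected pooled T/R as the size-weighted geometric mean of the groups' T/R
# (a simplification: balanced sequences within each group are assumed)
pooled_TR <- function(theta, n) exp(sum(n * log(theta)) / sum(n))
theta <- c(0.95, 1 / 0.95)
pooled_TR(theta, n = c(16, 16)) # equal group sizes
pooled_TR(theta, n = c(24, 8))  # unequal group sizes
```

With unequal groups the larger group dominates, so the expected pooled T/R moves away from 1 towards that group's T/R.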

Could you tell how you got your F-test denominator? I am sure you are right, but I don't know where it came from. For an interaction of a between- and within-factor I think the rule of thumb (which is also a wee bit hard to define :-D) is to test against the within, which in this case would be the model residual.

if (3) 4

Best regards,
ElMaestro

Helmut
Hero
Vienna, Austria,
2017-05-02 12:35

@ ElMaestro
Posting: # 17291
Views: 9,739
 

 Example

Hi ElMaestro,

» Could you tell how you got your F-test denominator? I am sure you are right, but I don't know where it came from. For an interaction of a between- and within-factor I think the rule of thumb (which is also a wee bit hard to define :-D) is to test against the within, which in this case would be the model residual.

Yep. Below an example of model 1 in Phoenix/WinNonlin. Two groups (n=24 each), all effects fixed.

Partial Sum of Squares
            Hypothesis DF        SS        MS   F_stat  P_value
---------------------------------------------------------------
                 Group  1 0.0131109 0.0131109 1.0385149 0.31374
              Sequence  1 0.0058638 0.0058638 0.4644731 0.49914
             Treatment  1 0.0011965 0.0011965 0.0947752 0.75964
        Group*Sequence  1 0.0108734 0.0108734 0.8612869 0.35844
          Group*Period  2 0.0160976 0.0080488 0.6375490 0.53340
       Group*Treatment  1 0.0131109 0.0131109 1.0385149 0.31374
Group*Sequence*Subject 44 0.555484  0.0126246 1         0.50000
                 Error 44 0.555484  0.0126246

Partial Tests of Model Effects
            Hypothesis Numer_DF Denom_DF  F_stat   P_value
----------------------------------------------------------
                 Group        1       44 1.0385149 0.31374
              Sequence        1       44 0.4644731 0.49914
             Treatment        1       44 0.0947752 0.75964
        Group*Sequence        1       44 0.8612869 0.35844
          Group*Period        2       44 0.6375490 0.53340
       Group*Treatment        1       44 1.0385149 0.31374
Group*Sequence*Subject       44       44 1         0.50000


N: ΣnG = 48
G: 2
Numerator DF: G – 1 = 1
Denominator DF: N - 2G = 44
F: 0.0131109/0.0126246 = 1.0385149
round(pf(1.0385149, 1, 44, lower.tail=FALSE), 5)
# [1] 0.31374
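
The degrees of freedom can also be checked in R with a small simulated data set (not the data of the example above). A sketch with all effects fixed; sim_group is a hypothetical helper, and the equal between- and within-subject SDs and the absent period effect are arbitrary assumptions.

```r
set.seed(1)
# Simulate one group of a 2x2x2 crossover (hypothetical helper).
# Assumptions: between-subject SD = within-subject SD, no period effect.
sim_group <- function(n.seq, theta, group, first.id, CV = 0.30) {
  s <- sqrt(log(CV^2 + 1))
  n <- 2 * n.seq
  d <- data.frame(subject  = rep(first.id + seq_len(n) - 1, each = 2),
                  group    = group,
                  sequence = rep(rep(c("TR", "RT"), each = n.seq), each = 2),
                  period   = rep(1:2, times = n))
  d$treatment <- ifelse((d$sequence == "TR") == (d$period == 1), "T", "R")
  d$logPK <- log(100) + rep(rnorm(n, 0, s), each = 2) +
             log(theta) * (d$treatment == "T") + rnorm(2 * n, 0, s)
  d
}
study <- rbind(sim_group(12, 0.95, group = 1, first.id = 1),
               sim_group(12, 0.95, group = 2, first.id = 25))
facs <- c("subject", "group", "sequence", "period", "treatment")
study[facs] <- lapply(study[facs], factor)
# model 1, all effects fixed; subject IDs are unique, hence implicitly nested
model1 <- lm(logPK ~ group + sequence + treatment + group:sequence +
                     group:period + group:treatment + subject, data = study)
N <- nlevels(study$subject)
G <- nlevels(study$group)
c(N = N, G = G, Denom_DF = df.residual(model1), expected = N - 2 * G)
p.GxT <- anova(model1)["group:treatment", "Pr(>F)"]
p.GxT
```

Note that anova() gives sequential tests; since the interaction is the last term and the data are balanced, that should not matter here.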

Cheers,
Helmut Schütz
mittyri
Senior

Russia,
2017-05-02 18:29

@ Helmut
Posting: # 17294
Views: 9,625
 

 Sensitivity of term?

Hi Helmut and ElMaestro,

Helmut answered the question directed at me more accurately than I could ;-)

» The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R 1.

seems to be reasonable, but I do not see why the power is low?
So if our hypothesis is that the power is low, we need to reject H0 that the power is high, in other words to prove that the sensitivity of this term to deviations is low.
By the way if the power of this term is low, some other should be high, right? which one? :confused:

» » M+H: FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it :-)
»
» True.

AFAIK PHX knows only one model where subject is fitted as fixed term when placed to the variance structure, that's conventional model. In all other cases LinMix will switch to the mixed modeling.

» » (and by the way: Which denominator in F did you apply; within or between?)
»
» Numerator DF = Groups – 1
» Denominator DF = Subjects – 2 × Groups

Yes, the results are the same for complete data (mixed vs glm)

Kind regards,
Mittyri
Helmut
Hero
Vienna, Austria,
2017-05-05 14:38

@ mittyri
Posting: # 17305
Views: 9,508
 

 Simulations

Hi mittyri,

» » The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R 1.
»
» seems to be reasonable, but I do not see why the power is low?

Good question. Next question?

I performed simulations (100,000 2×2×2 studies each for conditions a. and b. specified below). Two groups of 16 subjects each, CV 30%, no period and sequence effects. 32 subjects should give power 81.52% for T/R 1. If the Group-by-Treatment interaction is not significant (p ≥0.1) in model 1, the study is evaluated by model 2 (pooled data); otherwise, each group is evaluated separately by model 3. For comparison, all studies are also evaluated by model 3 (pooled data). The listed PE is the geometric mean of passing studies’ PEs.
  a. T/R in group 1 0.95, T/R in group 2 0.95⁻¹
    (i.e., ‘true’ Group-by-Treatment interaction):
    Model 1: p(G×T) <0.1 in 17.91% of studies.
    Evaluation of studies with p(G×T) <0.1 (Groups):
      passed model 3 (1)      :  1.42% (of tested); PE  98.69%
                                      range of PEs: 92.45% to 107.63%
      passed model 3 (2)      :  1.64% (of tested); PE 100.99%
                                      range of PEs: 93.99% to 108.23%
      passed model 3 (1 and 2):  0.00% (of tested)
    Evaluation of studies with p(G×T) ≥0.1 (pooled):
      passed model 2          : 66.47% (overall)
                                80.97% (of tested); PE  99.97%
                                      range of PEs: 86.36% to 114.27%
    Studies passing any of model 2 or 3: 67.02%
    Criteria for simple model fulfilled:
      passed model 3          : 80.95%;             PE  99.98%
                                      range of PEs: 86.36% to 114.68%


  b. T/R in both groups 1.00
    (i.e., no Group-by-Treatment interaction):
    Model 1: p(G×T) <0.1 in 9.79% of studies.
    Evaluation of studies with p(G×T) <0.1 (Groups):
      passed model 3 (1)      :  1.86% (of tested); PE 100.28%
                                      range of PEs: 93.09% to 108.40%
      passed model 3 (2)      :  1.87% (of tested); PE 100.01%
                                      range of PEs: 92.18% to 108.41%
      passed model 3 (1 and 2):  0.00% (of tested)
    Evaluation of studies with p(G×T) ≥0.1 (pooled):
      passed model 2          : 73.33% (overall)
                                81.28% (of tested); PE  99.98%
                                      range of PEs: 86.36% to 114.68%
    Studies passing any of model 2 or 3: 73.69%
    Criteria for simple model fulfilled:
      passed model 3          : 81.40%;             PE  99.98%
                                      range of PEs: 86.36% to 115.15%

IMHO, equal group sizes are problematic. What if one group passes and the other fails? Even if one is fishy and presents only the passing one, assessors likely would ask for the other one and make a conservative decision. Hoping that both groups will pass is simply futile.

Lessons learned:
If we test at the 10% level and there is no true Group-by-Treatment interaction we will find a significant effect at ~ the level of the test – as expected (b). Hurray, false positives!
On the other hand, if there is one, we will detect it more often than expected by chance (a).
The percentages of studies passing models 2 and 3 are similar. Theoretically in model 2 it should be slightly lower than in model 3 (one degree of freedom of the treatment effect less). However, overall power is seriously compromised.

Slowly I get the impression that the evaluation of groups (by model 3) is not a good idea. If there is a true Group-by-Treatment interaction why the heck should the PE (say in the largest group) be unbiased? I would rather say that if one believes that a Group-by-Treatment interaction really exists (I don’t) and the test makes sense (I don’t) evaluation (of the largest group) by model 3 should not be performed. Consequently ~⅒ of (otherwise passing) studies would go into the waste bin. Didn’t I say that before?


The distribution of p-values should be uniform.
Looks good for b.

[image]

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0000011 0.2517777 0.5002957 0.5008763 0.7508297 0.9999974
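
Why the p-values should be uniform under a true null, and why ~10% of them fall below 0.1, can be shown with any test. A minimal sketch using a two-sample t-test as a stand-in (an assumption for illustration; it is not the Group-by-Treatment test of model 1):

```r
set.seed(2017)
nsims <- 5000
# Stand-in for any test of a true null hypothesis (assumption: a two-sample
# t-test on 2 x 16 values drawn from the same normal distribution)
p <- replicate(nsims, t.test(rnorm(16), rnorm(16), var.equal = TRUE)$p.value)
summary(p)
mean(p < 0.1)                # empirical false positive rate at the 10% level
ks.test(p, "punif")$p.value  # agreement with the uniform distribution
```

Quartiles near 0.25, 0.50, and 0.75 are what one would expect, similar to the summary for b. above.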


Interesting shape for a.

[image]

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0000001 0.1562932 0.3991516 0.4306846 0.6868190 0.9999981


If you prefer more extreme stuff: T/R in group 1 0.90, T/R in group 2 0.90⁻¹

Model 1: p(G×T) <0.1 in 40.35% of studies.
Evaluation of studies with p(G×T) <0.1 (Groups):
  passed model 3 (1)      :  1.09% (of tested); PE  98.76%
                                  range of PEs: 91.69% to 105.97%
  passed model 3 (2)      :  1.06% (of tested); PE 101.40%
                                  range of PEs: 94.58% to 108.34%
  passed model 3 (1 and 2):  0.00% (of tested)
Evaluation of studies with p(G×T) ≥0.1 (pooled):
  passed model 2          : 47.74% (overall)
                            80.03% (of tested); PE  99.98%
                                  range of PEs: 87.24% to 114.13%
Studies passing any of model 2 or 3: 48.60%
Criteria for simple model fulfilled:
  passed model 3          : 79.45%;             PE  99.99%
                                  range of PEs: 87.24% to 114.13%


[image]

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.00000 0.03962 0.15648 0.26602 0.42742 0.99997
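
The percentages of significant interactions can also be approximated analytically, which partly answers the question raised earlier in the thread. A sketch under my own assumptions: balanced sequences, equal variances in both groups, and the interaction tested as the difference of the two groups' treatment effects against the residual with N − 2G degrees of freedom. The function power_GxT is hypothetical.

```r
# Power of the Group-by-Treatment test in a balanced 2x2x2 study with two
# groups (sketch; assumes equal variances and balanced sequences per group)
power_GxT <- function(theta1, theta2, n.g = 16, CV = 0.30, p.level = 0.1) {
  sW    <- sqrt(log(CV^2 + 1))     # within-subject SD (log scale)
  se    <- sqrt(2 * (2 * sW^2 / n.g)) # SE of the difference of the two
                                      # groups' treatment effects
  df    <- 2 * n.g - 2 * 2         # N - 2G
  ncp   <- (log(theta1) - log(theta2)) / se
  tcrit <- qt(1 - p.level / 2, df) # two-sided test at p.level
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}
power_GxT(1.00, 1.00)      # no interaction: level of the test
power_GxT(0.95, 1 / 0.95)  # condition a.
power_GxT(0.90, 1 / 0.90)  # the more extreme case
```

The results can be compared with the simulated 9.79%, 17.91%, and 40.35%; if the assumptions are reasonable they should land nearby.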



PS: The code seems to work – at least for the pooled model 3. Comparisons of powers
power.TOST(...)                0.815152
power.TOST.sim(..., nsims=1e5) 0.81437
power.TOST.sim(..., nsims=1e6) 0.815127
My code (nsims=1e5)            0.81402
My code (nsims=1e6)            0.81551

Cheers,
Helmut Schütz
mittyri
Senior

Russia,
2017-05-08 23:28

@ Helmut
Posting: # 17327
Views: 9,206
 

 Losing specificity due to low sensitivity

Hi Helmut,

you've made a great work! Won't it be published?
In your examples (simulations/practice) you showed the TxG test is not a good idea.
I was impressed by this:

» Model 1: p(G×T) <0.1 in 17.91% of studies.

» b. T/R in both groups 1.00
» (i.e., no Group-by-Treatment interaction):
» Model 1: p(G×T) <0.1 in 9.79% of studies.

» If you prefer more extreme stuff: T/R in group 1 0.90, T/R in group 2 0.90⁻¹
» Model 1: p(G×T) <0.1 in 40.35% of studies.

I see that the sensitivity is really low, but I think it is not a good idea to compensate it with low specificity (high false positive).

Once again, thank you very much! Wouldn't you mind to publish the code of data building for simulations?

Kind regards,
Mittyri
Helmut
Hero
Vienna, Austria,
2017-05-09 00:55

@ mittyri
Posting: # 17329
Views: 9,258
 

 Losing specificity due to low sensitivity

Hi mittyri,

» you've made a great work!

THX!

» Won't it be published?

I hope so. It is on my to-do-list since last summer…

» In your examples (simulations/practice) you showed the TxG test is not a good idea.
» I was impressed by this: […]»
» I see that the sensitivity is really low, but I think it is not a good idea to compensate it with low specificity (high false positive).

Right. I have no idea where this idea comes from. :confused:

» Wouldn't you mind to publish the code of data building for simulations?

I used code developed by Martin and me many years ago to simulate 2×2 designs followed by a simple rbind(part1, part2). I improved it to speed things up (which worked). Unfortunately I screwed up a couple of days ago (no version control, saving over). Shit.
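In the meantime you can (mis)use the code Detlew distributed last year to simulate replicate designs. I duplicated his functions. Start with the RTRT|TRTR and set CVwT = CVwR = CVbT = CVbR. The code below shows the relevant changes after his example and how to sim. For the plot you have to attach package lattice.
library(PowerTOST) # for CV2se()
library(lattice)   # for qqmath() and the panel functions of the plot
# mean_vcov() and prep_data() are Detlew's functions; source them first
set.seed(123456)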
G1 <- 0.95 # T/R in group 1
G2 <- 1/G1 # T/R in group 2
mvc1 <- mean_vcov(c("TRTR", "RTRT"), muR=log(100), ldiff=log(G1),
                 sWT=CV2se(0.3), sWR=CV2se(0.3),
                 sBT=CV2se(0.3), sBR=CV2se(0.3), rho=1)
mvc2 <- mean_vcov(c("TRTR", "RTRT"), muR=log(100), ldiff=log(G2),
                 sWT=CV2se(0.3), sWR=CV2se(0.3),
                 sBT=CV2se(0.3), sBR=CV2se(0.3), rho=1)
# get the data
ow    <- options()
nsims <- 1e4
sig   <- pass.2 <- pass.3 <- pass.3.1 <- pass.3.2 <- pass.3.a <- 0
PE2   <- PE3.1 <- PE3.2 <- PE3 <- p.GxT <- numeric(0)
alpha <- 0.05
p.level <- 0.1
L <- 80
U <- 125
sub.seq <- 8 # subjects / sequence (within each group)
for (j in 1:nsims) {
  part1 <- prep_data(seqs=c("TRTR", "RTRT"), rep(sub.seq, 2),
                     metric="PK", dec=3, mvc_list=mvc1)
  part1 <- part1[, !(names(part1) %in% c("seqno", "logval"))]
  part1 <- part1[!part1$period %in% c(3, 4), ]
  part1$sequence[part1$sequence == "TRTR"] <- "TR"
  part1$sequence[part1$sequence == "RTRT"] <- "RT"
  part1$group <- 1
  part2 <- prep_data(seqs=c("TRTR", "RTRT"), rep(sub.seq, 2),
                     metric="PK", dec=3, mvc_list=mvc2)
  part2 <- part2[, !(names(part2) %in% c("seqno", "logval"))]
  part2 <- part2[!part2$period %in% c(3, 4), ]
  part2$sequence[part2$sequence == "TRTR"] <- "TR"
  part2$sequence[part2$sequence == "RTRT"] <- "RT"
  part2$subject <- part2$subject+sub.seq*2
  part2$group <- 2
  study <- rbind(part1, part2)
  study$subject   <- factor(study$subject)
  study$period    <- factor(study$period)
  study$sequence  <- factor(study$sequence)
  study$treatment <- factor(study$treatment)
  study$group     <- factor(study$group)
  options(contrasts=c("contr.treatment", "contr.poly"), digits=12)
  # model 1 of pooled data
  model1 <- lm(log(PK)~group+sequence+treatment+group*sequence+
                       group*period+group*treatment+subject%in%group*sequence,
                       data=study)
  p.GxT[j] <- anova(model1)[["group:treatment", "Pr(>F)"]]

  if (p.GxT[j] >= p.level) { # if no sign. interaction: model 2 of pooled data
    model2 <- lm(log(PK)~group+sequence+treatment+group*sequence+
                         group*period+subject%in%group*sequence,
                         data=study)
    CI2 <- round(100*exp(confint(model2, "treatmentT", level=1-2*alpha)), 2)
    if (CI2[1] >= L & CI2[2] <= U) {
      pass.2 <- pass.2 + 1 # count passing studies
      PE2[pass.2] <- as.numeric(exp(coef(model2)["treatmentT"]))
    }
  } else { # sign. interaction (otherwise): model 3 of both groups
    sig <- sig + 1 # count studies with significant interaction
    # first group (use part1 data)
    model3.1 <- lm(log(PK)~sequence+treatment+period+subject%in%sequence,
                           data=part1)
    CI3.1 <- round(100*exp(confint(model3.1, "treatmentT", level=1-2*alpha)), 2)
    if (CI3.1[1] >= L & CI3.1[2] <= U) {
      pass.3.1 <- pass.3.1 + 1 # count passing studies
      PE3.1[pass.3.1] <- as.numeric(exp(coef(model3.1)["treatmentT"]))
    }
    # second group (use part2 data)
    model3.2 <- lm(log(PK)~sequence+treatment+period+subject%in%sequence,
                           data=part2)
    CI3.2 <- round(100*exp(confint(model3.2, "treatmentT", level=1-2*alpha)), 2)
    if (CI3.2[1] >= L & CI3.2[2] <= U) {
      pass.3.2 <- pass.3.2 + 1 # count passing studies
      PE3.2[pass.3.2] <- as.numeric(exp(coef(model3.2)["treatmentT"]))
    }
    # check whether /both/ groups pass (haha)
    if ((CI3.1[1] >= L & CI3.1[2] <= U) &
        (CI3.2[1] >= L & CI3.2[2] <= U)) pass.3.a <- pass.3.a + 1
  }
  # model 3 of pooled data (simple 2x2x2 crossover)
  model3 <- lm(log(PK)~sequence+treatment+period+subject%in%sequence,
                       data=study)
  CI3 <- round(100*exp(confint(model3, "treatmentT", level=1-2*alpha)), 2)
  if (CI3[1] >= L & CI3[2] <= U) {
    pass.3 <- pass.3 + 1 # count passing studies
    PE3[pass.3] <- as.numeric(exp(coef(model3)["treatmentT"]))
  }
} # end of sim loop
PE2est <- prod(PE2)^(1/length(PE2)) # geom. mean of PEs (passing with model 2)
if (length(PE3.1) > 0) { # geom. mean of PEs (passing with model 3; group 1)
  PE3.1est <- prod(PE3.1)^(1/length(PE3.1))
} else {
  PE3.1est <- NA
}
if (length(PE3.2) > 0) { # geom. mean of PEs (passing with model 3; group 2)
  PE3.2est <- prod(PE3.2)^(1/length(PE3.2))
} else {
  PE3.2est <- NA
}
PE3est <- prod(PE3)^(1/length(PE3)) # geom. mean of PEs (passing with model 3)
options(ow) # restore options
x <- c(0.25, 0.75)
y <- as.numeric(quantile(p.GxT, probs=x))
b <- diff(y)/diff(x)
a <- y[2]-b*x[2]
numsig <- length(which(p.GxT < p.level))
MajorInterval <- 5 # interval for major ticks
MinorInterval <- 4 # interval within major
Major <- seq(0, 1, 1/MajorInterval)
Minor <- seq(0, 1, 1/(MajorInterval*MinorInterval))
labl  <- sprintf("%.1f", Major)
ks    <- ks.test(x=p.GxT, y="punif", 0, 1)
if (G1 != G2) {
  main <- list(label=paste0("\nSimulation of \'true\' interaction\nT/R (G1) ",
                            G1, ", T/R (G2) ", round(G2, 4)), cex=0.9)
} else {
  main <- list(label=paste0("\nSimulation of no interaction\nT/R (G1, G2) ",
                            G1), cex=0.9)
}
if (ks$p.value == 0) {
  sub <- list(label=sprintf("Kolmogorov-Smirnov test: p <%1.5g",
              .Machine$double.eps), cex=0.8)
} else {
  sub <- list(label=sprintf("Kolmogorov-Smirnov test: p %1.5g",
              ks$p.value), cex=0.8)
}
trellis.par.set(layout.widths=list(right.padding=5))
qqmath(p.GxT, distribution=qunif,
  prepanel=NULL,
  panel=function(x) {
    panel.grid(h=-1, v=-1, lty=3)
    panel.abline(h=p.level, lty=2)
    panel.abline(c(0, 1), col="lightgray")
    panel.abline(a=a, b=b)
    panel.qqmath(x, distribution=qunif, col="blue", pch=46) },
    scales=list(x=list(at=Major), y=list(at=Major),
                tck=c(1, 0), labels=labl, cex=0.9),
    xlab="uniform [0, 1] quantiles",
    ylab="p (Group-by-Treatment Interaction)",
    main=main, sub=sub, min=0, max=1)
trellis.focus("panel", 1, 1, clip.off=TRUE)
panel.axis("bottom", check.overlap=TRUE, outside=TRUE, labels=FALSE,
           tck=0.5, at=Minor)
panel.axis("left", check.overlap=TRUE, outside=TRUE, labels=FALSE,
           tck=0.5, at=Minor)
panel.polygon(c(0, 0, numsig/nsims, numsig/nsims, 0),
              c(0, rep(p.level, 2), 0, 0), lwd=1, border="red")
trellis.unfocus()
cat("Model 1: p(G\u00D7T) <0.1 in", sprintf("%.2f%%", 100*sig/nsims),
"of studies.",
"\nEvaluation of studies with p(G\u00D7T) <0.1 (Groups):",
"\n  passed model 3 (1)      :",
  sprintf("%5.2f%%", 100*pass.3.1/sig), "(of tested); PE",
  sprintf("%6.2f%%", 100*PE3.1est),
"\n                                  range of PEs:",
  sprintf("%5.2f%% to %6.2f%%", 100*range(PE3.1)[1], 100*range(PE3.1)[2]),
"\n  passed model 3 (2)      :",
  sprintf("%5.2f%%", 100*pass.3.2/sig), "(of tested); PE",
  sprintf("%6.2f%%", 100*PE3.2est),
"\n                                  range of PEs:",
  sprintf("%5.2f%% to %6.2f%%", 100*range(PE3.2)[1], 100*range(PE3.2)[2]),
"\n  passed model 3 (1 and 2):",
  sprintf("%5.2f%%", 100*pass.3.a/sig), "(of tested)",
"\nEvaluation of studies with p(G\u00D7T) \u22650.1 (pooled):",
"\n  passed model 2          :",
  sprintf("%5.2f%%", 100*pass.2/nsims), "(overall)",
"\n                           ",
  sprintf("%5.2f%%", 100*pass.2/(nsims-sig)), "(of tested); PE",
  sprintf("%6.2f%%", 100*PE2est),
"\n                                  range of PEs:",
  sprintf("%5.2f%% to %6.2f%%", 100*range(PE2)[1], 100*range(PE2)[2]),
"\nStudies passing any of model 2 or 3:",
  sprintf("%5.2f%%", 100*(pass.3.1/nsims+pass.3.2/nsims+pass.2/nsims)),
"\nCriteria for simple model fulfilled:",
"\n  passed model 3          :",
  sprintf("%5.2f%%;             PE", 100*pass.3/nsims),
  sprintf("%6.2f%%", 100*PE3est),
"\n                                  range of PEs:",
  sprintf("%5.2f%% to %6.2f%%", 100*range(PE3)[1], 100*range(PE3)[2]), "\n")
round(summary(p.GxT), 7)

Cheers,
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Helmut
Hero
Vienna, Austria,
2017-05-06 17:31

@ mittyri
Posting: # 17310
Views: 9,343
 

 Loss in power

Hi mittyri and all,

Here is R code to estimate the loss in power (two groups of equal size). The example uses the settings of my simulations above and assumes that the Group-by-Treatment interaction will be significant in a fraction of studies equal to the level of the test.

library(PowerTOST)
CV          <- 0.3
theta0      <- 1
targetpower <- 0.8
p.level     <- 0.1
res         <- sampleN.TOST(CV=CV, theta0=theta0, targetpower=targetpower,
                            print=FALSE)
N           <- res[["Sample size"]]
if (N >= 12) { # at least minimum sample size acc. to GLs?
  pwr.mod3pooled <- res[["Achieved power"]]
} else {
  N <- 12
  pwr.mod3pooled <- power.TOST(CV=CV, theta0=theta0, n=N)
}
pwr.mod2       <- suppressMessages(power.TOST(CV=CV, theta0=theta0,
                                              n=N-1)*(1-p.level))
pwr.mod3groups <- power.TOST(CV=CV, theta0=theta0, n=N/2)*p.level
pwr.mod2and3   <- pwr.mod2+pwr.mod3groups
cat(sprintf("CV %5.3f%%, theta0 %.4f, targetpower %.2f%%    : sample size %i",
            100*CV, theta0, 100*targetpower, N),
    "\nLevel of the G\u00D7T test (model 1)                  :",
      sprintf("% 6.4f", p.level),
     "\nPower of studies evaluated by model 2 (pooled)   :",
      sprintf("%5.2f%%", 100*pwr.mod2),
    "\nPower of studies evaluated by model 3 (groups)   :",
      sprintf("%5.2f%%", 100*pwr.mod3groups),
    "\nModel 2 (pooled) and 3 (groups) combined         :",
      sprintf("%5.2f%%", 100*pwr.mod2and3),
    "\nPower of studies evaluated by model 3 (pooled)   :",
      sprintf("%5.2f%%", 100*pwr.mod3pooled),
    "\nLoss in power if simple model 3 cannot be applied:",
      sprintf("%5.2f%%", 100*(pwr.mod3pooled-pwr.mod2and3)), "\n")

Gives

CV 30.000%, theta0 1.0000, targetpower 80.00%    : sample size 32
Level of the G×T test (model 1)                  :  0.1000
Power of studies evaluated by model 2 (pooled)   : 71.80%
Power of studies evaluated by model 3 (groups)   :  3.25%
Model 2 (pooled) and 3 (groups) combined         : 75.05%
Power of studies evaluated by model 3 (pooled)   : 81.52%
Loss in power if simple model 3 cannot be applied:  6.47%
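
The combined figure above is a weighted sum of two branches: studies without a significant pre-test (weight 1 − p.level) and studies with one (weight p.level). As a rough sketch in base R, the printed (rounded) values can be taken apart again to see the power within each branch. The inputs are copied from the output above; `cond.pooled` and `cond.groups` are names introduced here for illustration only.

```r
# Inputs copied from the printed output above (rounded to two decimals).
p.level        <- 0.1     # level of the GxT pre-test
pwr.mod2       <- 0.7180  # (1 - p.level) * power with n = N - 1
pwr.mod3groups <- 0.0325  # p.level * power with n = N / 2
pwr.mod3pooled <- 0.8152  # power of the simple model with n = N

# Undo the weighting to get the power within each branch.
cond.pooled <- pwr.mod2 / (1 - p.level)
cond.groups <- pwr.mod3groups / p.level

# Overall power of the step-wise procedure and the loss against model 3.
pwr.mod2and3 <- pwr.mod2 + pwr.mod3groups
loss         <- pwr.mod3pooled - pwr.mod2and3

cat(sprintf("power within branch, model 2 (pooled): %5.2f%%\n", 100 * cond.pooled),
    sprintf("power within branch, model 3 (group) : %5.2f%%\n", 100 * cond.groups),
    sprintf("combined                             : %5.2f%%\n", 100 * pwr.mod2and3),
    sprintf("loss                                 : %5.2f%%\n", 100 * loss),
    sep = "")
```

Read this way, most of the loss seems to come from the 10% of studies in which only a group of half the sample size is evaluated.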


Cheers,
Helmut Schütz
Helmut
Hero
Vienna, Austria,
2017-05-08 19:02

@ mittyri
Posting: # 17321
Views: 9,208
 

 Interval between groups

Hi mittyri and all,

the p-value of the Group-by-Treatment interaction seemingly does not depend on the interval between groups. In most of the studies the interval was just a couple of days, but in some it was substantially longer (e.g., steady state studies where the clinic was occupied). The bubbles’ areas, the linear regression, and the loess curve are scaled/weighted by the sample size.

[image]

slope: 0.014330 (p 0.079)

[image]

slope: 0.009428 (p 0.212)


Therefore, we should not worry. The FDA defines “greatly separated in time” as “months apart, for example”.

Cheers,
Helmut Schütz
mittyri
Senior

Russia,
2017-05-08 23:40

@ Helmut
Posting: # 17328
Views: 9,202
 

 IMP handling

Hi Helmut,

I suppose the problem might lie not in a 'significant separation in time' but in mistakes in IMP handling.
For example, the RIMP has proven stability up to 30 °C and the TIMP only up to 25 °C. Due to a "why bother" attitude the designated employee missed this. As a result the second group will be treated with degraded TIMP. I assume here that the order in which the groups are treated is
GR1PER1
GR1PER2
GR2PER1
GR2PER2
CROs usually interleave the groups in time for more effective time management.

So I think that with appropriate IMP handling we wouldn't observe any real (i.e., not false-positive) interaction.
Please correct me if I'm wrong here.

Kind regards,
Mittyri
Helmut
Hero
Vienna, Austria,
2017-05-09 01:08

@ mittyri
Posting: # 17330
Views: 9,214
 

 IMP handling

Hi mittyri,

» I suppose the problem might lie not in a 'significant separation in time' but in mistakes in IMP handling.
» For example, the RIMP has proven stability up to 30 °C and the TIMP only up to 25 °C. Due to a "why bother" attitude the designated employee missed this. As a result the second group will be treated with degraded TIMP. I assume here that the order in which the groups are treated is
» GR1PER1
» GR1PER2
» GR2PER1
» GR2PER2

That’s what I would call a “stacked approach”.
IMHO, not a good idea for single dose studies, but it might be necessary in steady state studies if the capacity of the clinical site is limited.

» CROs usually interleave the groups in time for more effective time management.

Yep – the “staggered approach” keeps the interval as short as possible.


60% of my data sets had an interval of less than seven days. In most of my single dose studies the interval was one to three days.

» So I think that with appropriate IMP handling we wouldn't observe any real (i.e., not false-positive) interaction.

Agree.

Cheers,
Helmut Schütz
Helmut
Hero
Vienna, Austria,
2017-05-14 17:22

@ mittyri
Posting: # 17352
Views: 8,821
 

 Loss in power

Hi mittyri,

continuing the evaluation of the data sets of this post.
Background: Some studies are quite dated (the oldest was performed in October 1992!). In those days a pre-specified acceptance range of 75.00–133.33% (or even 70.00–142.86%) was acceptable for Cmax. However, I evaluated all data sets for the common 80.00–125.00%. This explains why more than the expected 20% failed (no, I didn’t screw up the design ;-)). Whenever possible I tried to avoid equal group sizes and kept one group as large as possible. Sometimes I didn’t succeed. Working in a CRO, the sponsor is always right…
All data sets were evaluated by model 3 for pooled data (as in the reports – I never cared about groups) and by model 1 to get the p-value of the Group-by-Treatment interaction.
  • If p (G×T) ≥0.1 the pooled data were evaluated by model 2.
  • If p (G×T) <0.1 the largest group(s) were evaluated by model 3.
    If there was more than one largest group (i.e., equal sizes), all of them had to pass since I expect assessors to ask for it.
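
The decision rule in the two bullets can be written down as a small helper. This is only a sketch of the rule as described here, not the script used for the evaluation; `choose.evaluation` and its arguments are hypothetical names.

```r
# Hypothetical helper: which model, and which group(s), to evaluate
# given the p-value of the GxT interaction from model 1.
choose.evaluation <- function(p.GxT, group.sizes, p.level = 0.1) {
  if (p.GxT >= p.level) {
    # no significant interaction: pool all groups, use model 2
    list(model = "model 2 (pooled)", groups = seq_along(group.sizes))
  } else {
    # significant interaction: model 3 in every group of maximum size
    list(model  = "model 3 (largest group(s))",
         groups = which(group.sizes == max(group.sizes)))
  }
}

# Made-up group sizes, for illustration only.
str(choose.evaluation(p.GxT = 0.53, group.sizes = c(20, 18, 18)))
str(choose.evaluation(p.GxT = 0.05, group.sizes = c(22, 22, 12)))
```

In the second call two groups share the maximum size, so both are returned and both would have to pass.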
Here are the results:

85 studies, 60 analytes, data sets: 84 (AUC), 85 (Cmax).
Evaluated by model 1 (all effects fixed); p (G×T) <0.1:
AUC :   8.33% ( 7/84)
Cmax:  12.94% (11/85)

Summary of passing results.

AUC : model 2 (pooled)                           :  84.42% (65/77)
      model 2 (pooled without pre-test)          :  84.52% (71/84)
              loss (compared to pooled model 3)  :   1.19% ( 1/84)
      model 3 (largest group)                    :  85.71% ( 6/ 7)
      model 2 (pooled) or model 3 (largest group):  84.52% (71/84)
              loss (compared to pooled model 3)  :   1.19% ( 1/84)
      model 3 (pooled)                           :  85.71% (72/84)
              CV (range)                         :  21.30% (4.59–61.73%)

Cmax: model 2 (pooled)                           :  62.16% (46/74)
      model 2 (pooled without pre-test)          :  63.53% (54/85)
              loss (compared to pooled model 3)  :   0.00% ( 0/85)
      model 3 (largest group)                    :  27.27% ( 3/11)
      model 2 (pooled) or model 3 (largest group):  57.65% (49/85)
              loss (compared to pooled model 3)  :   5.88% ( 5/85)
      model 3 (pooled)                           :  63.53% (54/85)
              CV (range)                         :  27.91% (6.82–76.99%)

The loss in power if we follow the FDA’s procedure (compared to the pooled model 3) is lower than I expected. Surprise. A possible explanation is that studies were usually powered for Cmax. Therefore, the largest groups alone were already sufficient to pass AUC.
On another note: If we apply model 2 without a pre-test (maybe the best way to go for regulators insisting on a group term) the loss in power compared to the pooled model 3 is negligible. That is reasonable, since we lose only a few residual degrees of freedom:
pooled model 3: DF = n1 + n2 – 2
pooled model 2: DF = n1 + n2 – (Ngroups – 1) – 2
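
To get a feeling for how little these degrees of freedom matter, here is a minimal base-R sketch of the two formulas for an arbitrary number of groups. `resid.df` is a hypothetical helper, and the group sizes (two groups of 16, i.e., the sample size of 32 from the power example above) are chosen for illustration only.

```r
# Residual degrees of freedom for a 2x2x2 crossover performed in groups
# (complete data, all effects fixed), following the two formulas above.
resid.df <- function(group.sizes, model = c("3", "2")) {
  model <- match.arg(model)
  N <- sum(group.sizes)     # total number of subjects
  G <- length(group.sizes)  # number of groups
  switch(model,
         "3" = N - 2,             # pooled model 3
         "2" = N - (G - 1) - 2)   # pooled model 2
}

sizes <- c(16, 16)
df3   <- resid.df(sizes, "3")
df2   <- resid.df(sizes, "2")
# Ratio of the t-quantiles used for the 90% CI (same residual error assumed).
ratio <- qt(0.95, df2) / qt(0.95, df3)
cat(sprintf("DF model 3: %i, DF model 2: %i, ratio of t-quantiles: %.4f\n",
            df3, df2, ratio))
```

The ratio only describes the effect of the degrees of freedom on the width of the CI. In real data the residual error of the two models will differ slightly as well.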

Cheers,
Helmut Schütz
Helmut
Hero
Vienna, Austria,
2017-05-25 15:26

@ ElMaestro
Posting: # 17418
Views: 8,420
 

 No convergence in JMP and Phoenix WinNonlin

Hi ElMaestro,

» » It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used.
»
» a realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that, please describe where/how you came across a fit which failed with the lm.

I was right about failing in JMP and Phoenix WinNonlin. ;-)
Sorry I can’t disclose the data set. Naïve pooling was performed. Deficiency letter by the MHRA in summer 2016:

The applicant should present estimates and 95% confidence interval for the difference between the Test and the Reference product on a ratio scale from ANOVA model, that reflects the design of the study, with terms for Group, Sequence, Sequence * Group, Subject (Sequence * Group), Period (Group), Treatment as fixed effects.

Note that this is the FDA’s model 2 with fixed effects. Why the 95% CI instead of the 90% CI was required is another story. The data set (subjects fixed) did not converge in JMP. Switched to random and all was good. Was accepted by the MHRA’s assessor.

Phoenix showed me the finger with the fixed effect Subject(Sequence*Group) in Model 1

[image]

and execution stopped (no results at all).
In model 2 I got the same warning as above but these results:

Partial Sum of Squares
            Hypothesis        DF          SS         MS     F_stat  P_value
---------------------------------------------------------------------------
                 Group         2   0.0758837  0.0379418   3.48232    0.0381
              Sequence         1   0.0708455  0.0708455   6.50224    0.0138
        Group*Sequence         2   0.145263   0.0726313   6.66614    0.0026
Sequence*Group*Subject        50   7.67886    0.153577   14.0954     0.0000
          Group*Period         3   0.0111135  0.0037045   0.340001   0.7965
             Treatment         1   0.144129   0.144129   13.2283     0.0006
                 Error        52   0.566569   0.0108956

Partial Tests of Model Effects
            Hypothesis  Numer_DF  Denom_DF     F_stat  P_value
--------------------------------------------------------------
                 Group         2        52   3.48232    0.0381
              Sequence         1        52   6.50224    0.0138
        Group*Sequence         2        52   6.66614    0.0026
Sequence*Group*Subject        50        52  14.0954     0.0000
          Group*Period         3        52   0.340001   0.7965
             Treatment         1        52  13.2283     0.0006

End of the story. No LSMs. Hence, no difference, no CI…

No problems in R.
Model 1:

Analysis of Variance Table

Response: log(Cmax)
                       Df   Sum Sq    Mean Sq  F value     Pr(>F)   
group                   2 0.078270 0.03913490  3.54171 0.03643331 * 
sequence                1 0.073106 0.07310604  6.61611 0.01312377 * 
treatment               1 0.141465 0.14146461 12.80257 0.00078035 ***
group:period            3 0.011114 0.00370450  0.33526 0.79988452   
group:sequence          2 0.145263 0.07263128  6.57314 0.00292116 **
group:treatment         2 0.014083 0.00704174  0.63728 0.53296911   
group:sequence:subject 50 7.678856 0.15357712 13.89875 < 2.22e-16 ***
Residuals              50 0.552485 0.01104970

Model 2:

Analysis of Variance Table

Response: log(Cmax)
                       Df   Sum Sq    Mean Sq  F value     Pr(>F)   
group                   2 0.078270 0.03913490  3.59182 0.03458136 * 
sequence                1 0.073106 0.07310604  6.70971 0.01241215 * 
treatment               1 0.141465 0.14146461 12.98370 0.00070289 ***
group:period            3 0.011114 0.00370450  0.34000 0.79647341   
group:sequence          2 0.145263 0.07263128  6.66614 0.00264706 **
group:sequence:subject 50 7.678856 0.15357712 14.09540 < 2.22e-16 ***
Residuals              52 0.566569 0.01089555


Diving deeper into it. Originally I set up the models in Phoenix WinNonlin’s Bioequivalence module, which sits on top of Linear Mixed Effects. When I sent the data directly to Linear Mixed Effects (all effects fixed): no error, no warning, nada. The CI is identical to the one from R to 12 significant digits.
Conclusion: Bug in Phoenix WinNonlin’s Bioequivalence module.

BTW: Running model 1 on my 85 data sets (5,004 subjects) in Bioequivalence takes more than ten hours and sucks up almost my entire 16 GB of RAM (memory leak?). Direct execution in Linear Mixed Effects takes five minutes (max. RAM consumption 175 MB).
That is still much slower than R, which takes five seconds for model 1, model 2, model 3 (for each group), and model 3 (pooled).

I was wrong. The failure has nothing to do with unbalanced sequences and/or unequal group sizes.

Cheers,
Helmut Schütz
ElMaestro
Hero

Denmark,
2017-05-25 16:24

@ Helmut
Posting: # 17419
Views: 8,283
 

 Ouch?!???

Hi Hötzi,

I am not a WNL/Phoenix user, but if your post is correct then I imagine around 100 CROs as of today will need to change their software validation status from PQ'ed to "unknown" etc, contact the vendor, await a response and in the meantime do everything they can to study the potential impact on data generated (which they cannot necessarily do unless they have other 'validated' software that achieves the same)?
Only the software developer, who has the source, will be able to tell if there is a bug, and if there is, if the bug affects other models, and how/when. This is not good. Man, I don't even quite know what validated means anymore.

if (3) 4

Best regards,
ElMaestro

"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
Artem Gusev
Junior

Russia, Moscow,
2017-05-02 16:13
(edited by Artem Gusev on 2017-05-02 16:33)

@ Helmut
Posting: # 17292
Views: 9,661
 

 Russian «Экспертами» and their hobby

Hi Helmut, Mittyri, and ElMaestro!

Thanks for a lot of useful information!

I ran some calculations with the models from your post: a BE study with 44 subjects (2 groups of 22 subjects) in Phoenix 6.4 (Formulation = Treatment, FF = Cmax/AUC).

Step 1.

Model Fixed Effects: Group+Sequence+Formulation+Group*Sequence+Group*Period+Group*Formulation.
Model Random Effects: Subject(Group*Sequence).

Partial Test:

Dependent     Hypothesis          Numer_DF   Denom_DF     F_stat        P_value
Ln(Cmax)      Group*Formulation       1      39.658597    0.027673298   0.86872445
Ln(AUClast)   Group*Formulation       1      39.208625    0.5880992     0.44774829
Ln(FF)        Group*Formulation       1      40.02772     0.064521326   0.80078777


The p-value is fine, but is it normal that the DF is not an integer?

Step 2.

Model Fixed Effects: Group+Sequence+Formulation+Group*Sequence+Group*Period.
Model Random Effects: Subject(Group*Sequence).

Partial Test:

Dependent    Hypothesis       Numer_DF  Denom_DF     F_stat       P_value

Ln(Cmax)     int              1         40.921725    11484.381    0
Ln(Cmax)     Group            1         40.93824     0.88167054   0.35325184
Ln(Cmax)     Sequence         1         40.921725    0.18361757   0.67052981
Ln(Cmax)     Formulation      1         40.672472    0.046689365  0.83000761
Ln(Cmax)     Group*Sequence   1         40.93824     0.25766245   0.61445462
Ln(Cmax)     Group*Period     2         40.655533    0.94380823   0.39750275
Ln(AUClast)  int              1         41.090066    8381.629     0
Ln(AUClast)  Group            1         41.094871    1.1907365    0.28153775
Ln(AUClast)  Sequence         1         41.090066    1.4454616    0.23614044
Ln(AUClast)  Formulation      1         40.206255    2.9963913    0.091118919
Ln(AUClast)  Group*Sequence   1         41.094871    0.65923753   0.42150764
Ln(AUClast)  Group*Period     2         40.202418    0.41157679   0.66536478
Ln(FF)       int              1         40.650657    139.60377    1E-14
Ln(FF)       Group            1         40.662987    0.54340035   0.46525932
Ln(FF)       Sequence         1         40.650657    3.1149598    0.085088655
Ln(FF)       Formulation      1         40.989423    1.0744673    0.30601523
Ln(FF)       Group*Sequence   1         40.662987    0.75424068   0.39023396
Ln(FF)       Group*Period     2         40.969735    1.9475433    0.15560361


The p-values are also acceptable, but is it normal that the DF is not an integer here either?

Best Regards,
Artem
mittyri
Senior

Russia,
2017-05-02 17:53

@ Artem Gusev
Posting: # 17293
Views: 9,692
 

 be careful with mixed models

Hi Artem,

Welcome to the world of mixed modeling!
Helmut suggested (see endnote c) switching to the model with all effects fixed. Your model is the one suggested by the FDA (with Subject as random). So Phoenix switched to the mixed model and used Satterthwaite degrees of freedom (which need not be integers).
Note that it is impossible to provide ANOVA tables (partial/sequential SS output, in Phoenix terms) for mixed models.
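
For readers who have not met Satterthwaite's approximation before: a simpler analogue (not the mixed model above) is Welch's t-test in base R, which uses the same kind of approximation and also reports fractional degrees of freedom. The data below are simulated; the seed, group sizes, and standard deviations are arbitrary assumptions.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- rnorm(12, mean = 0, sd = 1)   # small group, low variance
y <- rnorm(20, mean = 0, sd = 3)   # larger group, high variance

# Welch-Satterthwaite degrees of freedom calculated by hand ...
vx <- var(x) / length(x)
vy <- var(y) / length(y)
df.hand <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# ... and as reported by t.test() with and without pooling the variances.
df.welch  <- unname(t.test(x, y, var.equal = FALSE)$parameter)
df.pooled <- unname(t.test(x, y, var.equal = TRUE)$parameter)

cat(sprintf("Satterthwaite DF (by hand): %.4f\n", df.hand),
    sprintf("Satterthwaite DF (t.test) : %.4f\n", df.welch),
    sprintf("pooled-variance DF        : %.0f\n", df.pooled), sep = "")
```

The pooled-variance test has the integer DF n1 + n2 − 2, whereas the approximated DF is estimated from the data and is generally not an integer.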

So, since there is no convention in Russian brains, you are free to use your current model or the model provided by Helmut (with all effects fixed). The latter suits me better due to my hazelnut brain.

PS: Are there incomplete data (a missed period) for any subjects?

Kind regards,
Mittyri
Artem Gusev
Junior

Russia, Moscow,
2017-05-03 11:02

@ mittyri
Posting: # 17300
Views: 9,567
 

 be careful with mixed models

Hi, Mittyri!

I tried some fixed-effects models after posting my previous reply. The situation with the DF has improved.
I also checked the data; it's fine.
It was strange to me because the standard Phoenix model (Fixed: PRD, TRT, SEQ; Random: Subj(SEQ)) gives integer DF on the same data set. Now it is clearer, so never mind.

The deeper you are into the statistics, the more terrible it becomes.

Thanks for help.

Best Regards,
Artem
Helmut
Hero
Vienna, Austria,
2017-05-05 14:48

@ Artem Gusev
Posting: # 17306
Views: 9,456
 

 p-value(s) in model 2

Hi Artem,

as mittyri already pointed out you should use fixed effect models in Phoenix/WinNonlin.

» Step 2.
» P_value is also acceptable, …

There are no “acceptable” p-values in model 2. Any one is just fine. Only in model 1 check the p-value of the Group-by-Treatment interaction.

Cheers,
Helmut Schütz
Helmut
Hero
Vienna, Austria,
2017-05-24 20:17

@ Helmut
Posting: # 17408
Views: 8,415
 

 Russian «Экспертами» following the EEU GLs

Hi Artem and all,

» Oh, the hobby of the Russian «Экспертами»

I have to correct myself. They are blindly following guidelines of the Eurasian Economic Union (Nov 2016, Dec 2015). Practically a 1:1 translation of the FDA’s guidance:

Studies in multiple groups

        94. If a study was performed in two or more groups and these groups were studied at different clinical sites, or at the same site but separated by a long time interval (e.g., months), it is questionable whether the results obtained in these groups can be combined in a single analysis. Such situations should be discussed with the competent authority.
If a study is planned to be performed in multiple groups for logistical reasons, this should be explicitly stated in the study protocol; if the report does not contain the results of a statistical analysis that accounts for the multigroup nature of the study, a scientific justification for the absence of such results should be provided.
        93. If a crossover study is performed in 2 or more groups of subjects, i.e., the entire sample is split into several groups, each of which starts the study on a different day (e.g., if for logistical reasons only a limited number of subjects can be studied at the clinical site at one time), the statistical model should be modified to reflect the multigroup nature of the study. In particular, the model should take into account the fact that the periods for the first group differ from the periods for the second (and subsequent) groups.


Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not?

Cheers,
Helmut Schütz
Beholder
Regular

Russia,
2017-05-24 22:37

@ Helmut
Posting: # 17410
Views: 8,420
 

 Russian «Экспертами» following the EEU GLs

Hello Helmut!

» Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not?

I think you are citing a document that is too recent. It came into force on 6 May. So, strictly speaking, it was not obligatory to follow the draft when conducting clinical trials before 6 May. I think nobody used it and no experience has been gathered.

But I would try it))

If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum but I could not find it.

Best regards
Beholder
mittyri
Senior

Russia,
2017-05-25 08:52

@ Beholder
Posting: # 17412
Views: 8,375
 

 Penalty for carelessness

Dear Helmut, Dear Beholder,

@Helmut
» » Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not?

I'd call the group effect a 'penalty for carelessness'. After some heated discussions last week I understood that this is what the experts expect, since they do not want to back down.
Some time ago (about 3 years) the 'group' trend appeared in their minds. After report submission the experts asked: group? group? group?
At the stage of a request on the report it was almost impossible to justify the model without groups. Now that this topic is very popular, the team developing the protocol should include a justification for the absence of a group effect in the model. Otherwise a 'groupshot' is very likely.

@Beholder
» If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum but I could not find it.
Here you go


Edit: Changed to internal link; see also this post #7. [Helmut]

Kind regards,
Mittyri
Beholder
Regular

Russia,
2017-05-25 10:43

@ Beholder
Posting: # 17413
Views: 8,334
 

 Russian «Экспертами» following the EEU GLs

» If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum but I could not find it.

Yes, I found the post regarding the EEU GL that I mentioned.


Edit: Changed to internal link; see also this post #7. [Helmut]

Best regards
Beholder
Mikalai
Junior

Belarus,
2018-01-04 10:43

@ Helmut
Posting: # 18138
Views: 5,380
 

 Russian «Экспертами» following the EEU GLs

Dear all
My name is Mikalai, and I am responsible for the conduct of bioequivalence studies in a medium-sized private pharmaceutical company in Belarus. Due to logistic issues (a small clinical center and a highly variable drug) we have to conduct a bioequivalence study in multiple groups (two). Our competent authority requires a justification not to include the group effect in the proposed statistical model. The groups will be separated by a week at maximum. It seems that we meet criteria set out by FDA to use a statistical model without including the group effect. Our competent authority can accept the FDA position on this issue, but we should properly reference it.

Thus, where can I find this information under FOI (link), or could someone share a copy of the letter signed by Barbara Davit that outlines the requirements for ignoring the group effect in a statistical model?

Any help will be appreciated.
Sincerely, Mikalai


Edit: I moved your post from an answer to this one, deleted your email address, and activated personal messages in your profile instead. [Helmut]
Helmut
Hero
Vienna, Austria,
2018-01-04 13:08

@ Mikalai
Posting: # 18139
Views: 5,371
 

 Belarus = member of the EEU

Hi Mikalai,

» Due to logistic issues (a small clinical center and a highly variable drug) we have to conduct a bioequivalence study in multiple groups (two).

Are you aiming at reference-scaling for Cmax, i.e., perform the study in a replicate design? Even if not, opt for the “staggered approach” – not the “stacked” one (see above).

» Our competent authority requires a justification not to include the group effect in the proposed statistical model.

IMHO, stupid – but according to the GL. :-(

» The groups will be separated by a week at maximum.

Very good.

» It seems that we meet criteria set out by FDA to use a statistical model without including the group effect. Our competent authority can accept the FDA position on this issue, but we should properly reference it.

See this presentation summarizing my current thinking. Note that (since I don’t speak Russian) my remarks given on slide 16 are only partly correct. A justification in the protocol (as you rightly mentioned) should be acceptable. In the discussion following my presentation it became clear that:
  • Pooling (i.e., model III without a justification in the protocol) led to rejection of the study. A justification in the report only was never considered sufficient by – Russian – experts.
  • Nobody tried model II without a pre-test (this would be a much better option than the FDA’s step-wise models). Why? Dunno. The loss in power would be limited (see my small meta-analysis above) and it would be compliant with the FDA’s 2001 guidance Section VII.A.:
    • […] the statistical model should be modified to reflect the multigroup nature of the study. In particular, the model should reflect the fact that the periods for the first group are different from the periods for the second group.

» Thus, where can I find this information under FOI (link) …

FDA’s step-wise models (which I would never ever use) are given here and there (maybe there are some more; too lazy to google). However, in the second document you find Comment 9:

If ALL of the following criteria are met, it may not be necessary to include Group-by-Treatment in the statistical model:

  • the clinical study takes place at one site;
  • all study subjects have been recruited from the same enrollment pool;
  • all of the subjects have similar demographics;
  • all enrolled subjects are randomly assigned to treatment groups at study outset.
In this latter case, the appropriate statistical model would include only the factors Sequence, Period, Treatment and Subject (nested within Sequence).
Note that “the appropriate statistical model” in this latter case is the conventional model for a 2×2×2 crossover.
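
For illustration, a minimal sketch of that conventional model in base R, written like model 3 in the simulation script above. The data are simulated: the sample size (24), the T/R-ratio (0.95), and the variabilities are arbitrary assumptions and not taken from any study.

```r
set.seed(123)  # arbitrary seed
n   <- 24      # subjects; 12 per sequence (assumption)
dat <- data.frame(subject  = factor(rep(seq_len(n), each = 2)),
                  period   = factor(rep(1:2, times = n)),
                  sequence = factor(rep(c("RT", "TR"), each = n)))
# RT: R in period 1, T in period 2; TR: the other way round.
dat$treatment <- factor(ifelse((dat$sequence == "RT") == (dat$period == 1),
                               "R", "T"), levels = c("R", "T"))
# Simulated log-normal response: subject effect + treatment effect + noise.
subj.eff <- rnorm(n, sd = 0.4)
dat$PK   <- exp(log(100) + subj.eff[as.integer(dat$subject)] +
                log(0.95) * (dat$treatment == "T") +
                rnorm(2 * n, sd = 0.25))

# Sequence, Period, Treatment, and Subject (nested within Sequence).
model <- lm(log(PK) ~ sequence + treatment + period + subject %in% sequence,
            data = dat)
PE <- 100 * exp(coef(model)[["treatmentT"]])
CI <- 100 * exp(confint(model, "treatmentT", level = 0.90))
cat(sprintf("PE %.2f%%, 90%% CI %.2f%% to %.2f%%, residual DF %i\n",
            PE, CI[1], CI[2], df.residual(model)))
```

With complete data the residual degrees of freedom equal the number of subjects minus two. No group term appears anywhere in the model.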

» … could someone share a copy of the letter signed by Barbara Davit that outlines the requirements for ignoring the group effect in a statistical model?

I can’t share mine (chained to my table by a CDA). Maybe Detlew can share his. Note that the wording of Barbara’s letter is identical to the second reference above.

Cheers,
Helmut Schütz
Mikalai
Junior

Belarus,
2018-01-04 19:49

@ Helmut
Posting: # 18142
Views: 5,266
 

 Belarus = member of the EEU

Dear Helmut,
Thank you very much.

Yes, we plan to use the "staggered approach". We had planned a replicate design, but a high dropout rate was observed during our last bioequivalence study (a crossover design with a long blood sampling interval) for reasons completely unrelated to the study. Thus, the company decided to put the replicate design on hold, given that a clinical center can accommodate only a bit more than 35 volunteers.

We are opting for model III and will try to use the FDA requirements to justify the statistical model in the protocol. Bioequivalence studies done in accordance with international standards are relatively new for Belarus. As a result, experience, training, and lore are issues for the major players (manufacturers, regulators, clinical investigators). We also cannot run studies outside the Eurasian Economic Union. Thus, from time to time our regulators have to rely on the opinions of more experienced colleagues, mainly from the EMA and the FDA. If they see that something is acceptable in Europe or the USA, they usually give us a green light. That is why we need papers or proper references.
Regards,
Mikalai.
mittyri
Senior

Russia,
2018-01-04 22:04

@ Helmut
Posting: # 18143
Views: 5,256
 

 Trying your model for EEU

Dear Helmut,

You are a Key Opinion Leader in Russia (not limited to Russia, I believe)

» Nobody tried model II without a pre-test (this would be a much better option than the FDA’s step-wise models). Why? Dunno.

Why do you think that nobody has tried? :-D
I talked to some of the guys involved; they are trying and waiting for the experts’ feedback

Kind regards,
Mittyri
Helmut
Hero
Vienna, Austria,
2018-01-05 00:06

@ mittyri
Posting: # 18145
Views: 5,260
 

 Trying your model for EEU

Hi Mittyri,

» You are a Key Opinion Leader in Russia (not limited to Russia, I believe)
Hhm. :ponder:

» » Nobody tried model II without a pre-test (this would be a much better option than the FDA’s step-wise models). Why? Dunno.
»
» Why do you think that nobody has tried? :-D

In Yaroslavl I specifically asked the participants. Maybe the ones you know were there but didn’t want to come up in front of the experts?

» I talked to some of the guys involved; they are trying and waiting for the experts’ feedback

Great. Let’s keep our fingers crossed.

Also in Yaroslavl people encouraged me to publish my meta-study. Well, I’m still collecting data (:waving: Astea). They also suggested a Russian journal. I don’t like that, since Russian seemingly is more ambiguous than English. Originally I thought of the Journal of Biopharmaceutical Statistics (where D’Angelo’s article about carry-over was published). When I presented about Multi-Group Studies at the 2nd Annual Biosimilars Forum (October 2017) everybody (and this was a statistical audience from agencies, the industry, and CROs) was surprised that it is an issue at all. Nobody (!) would expect anything other than simple pooling. The consensus was that the FDA’s stage-wise procedure might even inflate the Type I Error and should be avoided. Given that, I guess such a manuscript will be rejected right away due to its doubtful content. :-D

Cheers,
Helmut Schütz
Astea
Regular

Russia,
2018-01-10 12:09

@ Helmut
Posting: # 18156
Views: 4,893
 

 help us to stop it, please...

Dear Helmut! (:waving: )

The problem will remain until the Eurasian Economic Union requirements are corrected or modified. And instead of forgetting this subject like a nightmare, we see the opposite tendency: the new economic association drags new countries into this problem. Now Belarus has been dragged into this story... What's next? It might be stopped only after a scientific paper by a respected author announces the uselessness of this stuff...
Beholder
Regular

Russia,
2018-01-10 12:49

@ Astea
Posting: # 18158
Views: 4,882
 

 help us to stop it, please...

Dear colleagues,

» ... What's next? ...

Next is Kazakhstan, then Kyrgyzstan, then Armenia... :-D

Best regards
Beholder
d_labes
Hero

Berlin, Germany,
2018-01-10 15:15

@ Astea
Posting: # 18161
Views: 4,838
 

 regulators convinced by science?

Dear Astea!

» ... It might be stopped only after a scientific paper by a respected author announces the uselessness of this stuff...

Do you really believe that regulators (aka «Экспертами») can be convinced by science?
May the Lord preserve your infant faith :no:.

Regards,

Detlew
Beholder
Regular

Russia,
2018-01-10 17:14

@ d_labes
Posting: # 18163
Views: 4,796
 

 regulators convinced by science?

Dear d_labes!

» Do you really believe that regulators (aka «Экспертами») can be convinced by science?
» May the Lord preserve your infant faith :no:.

"But I Tried, Didn't I? Goddamnit, at Least I Did That" - McMurphy ("One Flew Over the Cuckoo's Nest", 1975.)

Best regards
Beholder
d_labes
Hero

Berlin, Germany,
2018-01-10 18:53

@ Beholder
Posting: # 18164
Views: 4,785
 

 Чёрт побери!

Dear Beholder!

» "But I Tried, Didn't I? Goddamnit, at Least I Did That" - McMurphy ("One Flew Over the Cuckoo's Nest", 1975.)

Swearing is of the чёрт :cool:

Regards,

Detlew
Astea
Regular

Russia,
2018-01-10 19:10

@ d_labes
Posting: # 18165
Views: 4,785
 

 regulators convinced by science?

Dear d_labes!

"You may say that I am a dreamer, but I am not the only one..."

Thanks for the support, Beholder!
d_labes
Hero

Berlin, Germany,
2018-01-10 20:18

@ Astea
Posting: # 18166
Views: 4,736
 

 Excuse me

Dear Astea!

First: Call me Detlew.
You are also allowed to pronounce it Detluuu.
Up to now only my wife is allowed to do so :-D.

» "You may say that I am a dreamer, but I am not the only one..."
»
» Thanks for the support, Beholder!

Excuse me, the old grumpy buffer.
You are young. And dreaming is the privilege of the youth.

Regards,

Detlew
Astea
Regular

Russia,
2018-01-10 20:38

@ d_labes
Posting: # 18167
Views: 4,731
 

 Excuse me

Dear Detlew!

Thank you! I am not so young as to believe in fairy tales, but I am not so old as not to believe in reason. I see with my own eyes how things change for the better just because of the presence of scientific people who are not indifferent.

P.S. You can call me Nastia (similar to nasty - easy to remember)
The BIOEQUIVALENCE / BIOAVAILABILITY FORUM is hosted by
BEBAC Ing. Helmut Schütz