Relaxation

Germany,
2015-02-20 17:31

Posting: # 14474
 

 So they implemented Pitman-Morgan [General Statistics]

Hello everybody and a nice evening.

At the moment I have to comment on the Release Notes of Phoenix WinNonlin 6.4 from August last year. In the "What's new" section I saw that Certara implemented tests for equal variances for parallel designs and also for 2x2 cross-over studies. This is accompanied by some explanation in the handbook on how to proceed in case a significant value is observed (basically, you implement a random/repeated specification for period and group by formulation).

As I am not a statistician and, frankly, never came across a discussion of this test, I tried to find out something about its importance in bioequivalence/relative bioavailability.

Fortunately, after some unsuccessful searching on the net and in this forum, I found some information in Chow & Liu, 3rd edition (p. 196). Although I fail to understand most of the discussion :confused:, it seems to imply that this test is not only relevant in pop/ind equivalence (I found it only for these in Hauschke, Steinijans & Pigeot) but also for testing the intra-individual variance (not inter!) in a simple 2x2x2 cross-over study for average BE. Uhm, however, the consequences of rejecting H0 seem to be missing.

Whatever; as a natural consequence I tried to figure out whether the recommended adaptation of the evaluation given for Phoenix WinNonlin actually has any effect on the (BE) result at all, simply by using recent data sets (as said, I don't understand the formula sufficiently, so I have to try and compare :-D).
I was not able to perform the testing for unequal variances, but some variabilities (for Reference or Test) at least "look" different.
And there was no effect at all in PE or CIs :-). Thus, I wonder :ponder::
  • does such a test have any meaning for the "final" outcome in BE testing, the estimate/CI of the ratio between treatments (probably my tests just showed no difference due to chance)?
  • and if so, could the proposed workaround be communicated to EU authorities anyway (this is clearly not an "all effects fixed" situation)?
  • or is this for information only, comparable to the testing for period and sequence effects?
Any suggestions or thoughts that set me thinking would be greatly appreciated and I wish everyone a great weekend!

Best regards,

Steven.
Helmut
Vienna, Austria,
2015-02-20 19:56

@ Relaxation
Posting: # 14478
 

 I would not apply pretesting

Hi Relaxation,

❝ As I am not a statistician…


As am I. :cool: Partly I feel guilty. For many years I was unhappy with WinNonlin’s setup of parallel designs and fired questions at Pharsight. FDA’s guidance (2001) states:

For parallel designs […] equal variances should not be assumed.

(Classical) WinNonlin and PHX/WNL by default apply the conventional t-test, which is sensitive to unequal variances and unequal group sizes. The conventional t-test is always liberal if compared to the Welch-Satterthwaite approximation. The setup was (and still is) misleading because – even if a user was aware of the issue – selecting ⦿ Satterthwaite in the General Options > Degrees of Freedom still applies the t-test for equal variances. In July 2013 Linda Hughes posted a workaround on Pharsight’s Extranet (which is applicable to all versions of WinNonlin). At the end of 2013 Pharsight told me that they would implement tests for (un)equal variances – which is not necessarily the best idea.1 Any pretest will inflate the Type I Error. In PHX/WNL6.4 the workaround is described in the “User’s Guide”. For parallel designs2 I recommend always3,4 using the workaround instead of pretesting.
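For illustration, the Welch-Satterthwaite degrees of freedom are easy to compute by hand. A minimal Python sketch (the function name and example numbers are mine, not WinNonlin’s implementation):

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for the
    two-sample t-test without assuming equal variances.
    s1, s2: sample standard deviations; n1, n2: group sizes."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors per group
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal variances, balanced groups: collapses to the pooled df (n1 + n2 - 2).
print(welch_satterthwaite_df(1.0, 12, 1.0, 12))   # 22.0
# Unequal variances, unbalanced groups: far fewer df than the pooled 30.
print(welch_satterthwaite_df(4.0, 11, 1.0, 21))   # ~10.7
```

With the unbalanced, heteroscedastic setup the df drop from 30 to roughly 10.7, which is why the pooled test misbehaves in exactly these situations.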

❝ […] this test is not only relevant in pop/ind equivalence (found it only for these in Hauschke, Steinijans & Pigeot) but also for testing the intra-individual variance in a simple 2x2x2 cross-over study (not inter!) for average BE. Uhm, however, the consequences of rejection of H0 seem to be missing.


One of the assumptions of the conventional crossover BE model is equal variances of T and R. If they are unequal, the CI will be inflated. Bad luck. That’s why regulators don’t care (aka increase the sample size). If s²WT < s²WR a “good” test product will be punished by the “bad” reference. In the end this issue led to reference-scaling. But:
  • A high common CVintra (pooled from T and R) is only a hint of a highly variable reference. If you set up this model, it helps to get an idea – nothing more.
  • Already in the Q&A to the “old” NfG EMA stated that a replicate design is required to justify widening the limits for Cmax to 0.75–1.33…

❝ I was not able to perform the testing for unequal variances, but some variabilities (for Reference or Test) at least "look" different.


What do you mean by “look different”? First of all we need a mixed-effects model (not the EMA-stuff). In some datasets I got no convergence. Clayton & Leslie’s (B; see there) worked and Sauter’s (A) crashed. Clayton’s is interesting. In the conventional analysis we get a CVW of 60.2% (Pitman-Morgan’s test is not significant, p 0.2209). Should we design the next study for such a high CV? The modified model gives us CVWR 48.0% and CVWT 71.3%. Have a look at Chow/Liu Ch.8. The high variability of the test likely is caused by the outlying subject 7. If we exclude this subject, CVW decreases to 43.6%. Hey, that’s pretty close to what we found for the reference in the full dataset.
I have no idea how sensitive the Pitman-Morgan test is. I got significance only in our datasets E and H. However, the modified model looks interesting.
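For anyone who wants to play with it: the mechanics of the Pitman-Morgan test are just a correlation of within-pair sums and differences. A minimal pure-Python sketch (the function and toy data are mine; in a 2×2 crossover you would feed it the subjects’ log-responses under T and R, and WinNonlin’s exact implementation may differ):

```python
import random
from math import sqrt

def pitman_morgan_t(x, y):
    """Pitman-Morgan test for equal variances of paired (correlated)
    samples. Under H0 (var(x) = var(y)) the sums x+y and differences
    x-y are uncorrelated. Returns (t statistic, degrees of freedom);
    compare |t| against the t distribution with n-2 df."""
    n = len(x)
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    ms, md = sum(s) / n, sum(d) / n
    cov = sum((a - ms) * (b - md) for a, b in zip(s, d))
    ss_s = sum((a - ms)**2 for a in s)
    ss_d = sum((b - md)**2 for b in d)
    r = cov / sqrt(ss_s * ss_d)        # Pearson correlation of sums and diffs
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2

# Toy data with clearly unequal variances (illustration only):
random.seed(42)
base = [random.gauss(0, 1) for _ in range(40)]
x = [2 * b for b in base]                        # true SD 2
y = [b + random.gauss(0, 0.5) for b in base]     # true SD ~1.1, correlated with x
t, df = pitman_morgan_t(x, y)
print(t, df)
```

With variances this different and n = 40 the statistic comes out large, i.e., clearly significant against t with 38 df.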

Remember that all models are wrong; the practical question is
how wrong do they have to be to not be useful.
    George E.P. Box

❝ And there was no effect at all in PE or CIs :-).


Don’t worry; there will not be any. The residual variance is identical as in the conventional model.

❝ does such a test have any meaning for the "final" outcome in BE testing, the estimate/CI of the ratio between treatments […]?


No; see above.

❝ could the proposed workaround be communicated to EU authorities anyway (this is clearly not an "all effects fixed" situation)?


Please no. Don’t open up a can of worms. Personally I think that treating subjects as a fixed effect is crap.

❝ is this for information only, comparable to the testing for period and sequence effects?


Period effects are irrelevant (unless the study is extremely [sic] imbalanced). Luckily testing for sequence (aka unequal carry-over) effects went to the regulatory trash can 21 (‼) years after Freeman’s paper.5 :crying:

❝ Any suggestions or thoughts that set me thinking would be greatly appreciated


This setup is useful if you want to get some insights about the properties of formulations (beyond their means). Although it is not a substitute for a replicate design it might give you some useful information (especially in product development studies). Note that Boddy et al.6 suggested reference-scaling based on a 2×2 design…


  1. Moser BK, Stevens GR, Watts CL. The Two-Sample t Test Versus Satterthwaite’s Approximate F test. Commun Statist–Theory Meth. 1989; 18: 3963–75. doi:10.1080/03610928908830135.
  2. Fuglsang A, Schütz H, Labes D. Reference Datasets for Bioequivalence Trials in a Two-Group Parallel Design. AAPS J. 2015; 17(2): 400–4. doi:10.1208/s12248-014-9704-6. Open access.
  3. DiSantostefano RL, Muller KE. A Comparison of Power Approximations for Satterthwaite’s Test. Commun Stat–Simula. 1995; 24: 583–93. doi:10.1080/03610919508813260.
  4. Ruxton GD. The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whit­ney U test. Behav Ecol. 2006; 17: 688–90. doi:10.1093/beheco/ark016.
  5. Freeman PR. The Performance of the Two-Stage Analysis of Two-Treatment, Two-Period Crossover Trials. Stat Med. 1989; 8: 1421–32. doi:10.1002/sim.4780081202.
  6. Boddy AW, Snikeris FC, Kringle RO, Wei GC-G, Oppermann JA, Midha KK. An Approach for Widening the Bioequivalence Acceptance Limits in the Case of Highly Variable Drugs. Pharm Res. 1995; 12: 1865–8. doi:10.1023/A:1016219317744.

Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Relaxation

Germany,
2015-02-24 14:06

@ Helmut
Posting: # 14499
 

 I would not apply pretesting

Hello Helmut, hello forum members.
Now I feel guilty for having made you spend so much work on an answer, as I really thought that for the more experienced folk this would be a one-liner. That just makes me even more thankful for your efforts and this forum.
Yes, I think I got the importance of accounting (not necessarily testing) for equal variances in a parallel design (which is also implemented in WinNonlin 6.4). I just fail to intuitively understand how the intra-individual variance for each product can be calculated without a replicated administration. However, that is likely due to my not being able to understand the formulas properly, and I should invest my own time here :yes:. Still, the discussion at the moment somehow calms me down.

❝ What do you mean by “look different”?


Well, as I have no access to WNL 6.4 at the moment, I could only try the workaround in 6.3, but without getting the results of the PM test. And the variances given for Test and Reference as “1_2/2_1” looked “different”, e.g., 0.0113 and 0.0061 in one of my data sets.
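As an aside, variance components on the log scale translate into CVs via CV = √(exp(s²) − 1). A quick back-of-the-envelope check on the two numbers above (my own helper function, not WinNonlin output):

```python
from math import exp, sqrt

def cv_from_log_variance(s2):
    """Convert a variance component on the ln scale into a CV (%)."""
    return 100 * sqrt(exp(s2) - 1)

print(round(cv_from_log_variance(0.0113), 1))   # 10.7
print(round(cv_from_log_variance(0.0061), 1))   # 7.8
```

So the two variance components correspond to CVs of roughly 10.7% and 7.8%.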

❝ Please no. Don’t open up a can of worms. Personally I think that treating subjects as a fixed effect is crap. […]


❝ Period effects are irrelevant […] sequence (aka unequal carry-over) […] garbage


And still we have to submit the appropriate tests of all effects in the model. Hm, makes me wonder whether authorities actually look into the Core outputs regularly…
And when we use the simple Core output, the PM test will be included as well, as “The results […] are given in the Average Bioequivalence output worksheet and at the end of the Core output” {Users Guide 6.4}.
From what I learned from other people, I can understand the condemnation of using a fixed effect for subjects. From some reports I saw, I personally can appreciate that in a 2x2 setting subjects who cannot contribute to the T/R comparison (missing data in one or two periods) will be omitted “automatically” instead of having the missing data imputed. That’s a nice side effect that keeps the evaluation in line with the EU GL.
Now I will just read your post again and add the literature references that are missing from our library.

Thanks again and best regards,
Steven.
ElMaestro

Denmark,
2015-02-24 15:28

@ Relaxation
Posting: # 14500
 

 I would gladly apply pretesting

Hi all,

I would gladly apply pretesting.

It will only lead to inflation of type I errors if the alpha for both the pretest and the BE is brainlessly chosen to be 5%. Adjust them, and you're good to go in terms of the type I error.

You will find wording in the guideline to the effect of not assuming variance homogeneity. It can along these lines be argued that you are not assuming it when you are testing for it and letting the outcome decide your next step.

All is good.

Choice of actual alphas is of course a little tricky but that is a mere practicality.

LMSTFY. :-D

Pass or fail!
ElMaestro
Helmut
Vienna, Austria,
2015-02-24 17:05

@ ElMaestro
Posting: # 14502
 

 Simulations feasible?

Hi ElMaestro,

❝ It will only lead to inflation of type I errors if the alpha for both the pretest and the BE is brainlessly chosen to be 5%.


Agree.

❝ Adjust them, and you're good to go in terms of the type I error.


Agree again.

❝ Choice of actual alphas is of course a little tricky but that is a mere practicality.


The inflation can be nasty. See Ruxton’s Table 1.
For N1, N2 (11, 21) and s1, s2 (4, 1) the Type I Error of the conventional t-test is 0.155 (!)
OK, that’s extreme.
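That figure is easy to reproduce by simulation. A dependency-free Python sketch under H0 with Ruxton’s settings (the 5% two-sided critical value of the t distribution with 30 df, 2.042, is hard-coded):

```python
import random

def pooled_t(x, y):
    """Conventional (pooled-variance) two-sample t statistic."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1)**2 for v in x)
    ss2 = sum((v - m2)**2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)        # pooled variance estimate
    return (m1 - m2) / (sp2 * (1/n1 + 1/n2))**0.5

random.seed(7)
n1, n2, s1, s2 = 11, 21, 4.0, 1.0            # Ruxton's worst case
crit = 2.042                                  # two-sided 5% critical value, t(30)
nsim = 10_000
hits = 0
for _ in range(nsim):
    # Both groups have true mean 0, i.e., H0 is true.
    x = [random.gauss(0, s1) for _ in range(n1)]
    y = [random.gauss(0, s2) for _ in range(n2)]
    if abs(pooled_t(x, y)) > crit:
        hits += 1
print(hits / nsim)   # empirical Type I error, roughly 0.15 rather than 0.05
```

The empirical level lands around 0.15 instead of the nominal 0.05, in line with Ruxton’s table.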

❝ LMSTFY. :-D


IMHO, too many variables. Save your efforts – unless you will publish an entire book full of tables covering any possible combination (GMR, power, CV, sample size ratio, s-ratio).

Helmut
Vienna, Austria,
2015-02-24 15:51

@ Relaxation
Posting: # 14501
 

 Mixed vs. fixed effects (mainly)

Hi Steven,

❝ I just fail to intuitively understand, how the intraindividual variance for each product can be calculated without a replicated administration.


In a mixed-effects model (which we apply here) you could get this information. The method is similar to recovering information from incomplete data. Compare
  • A dataset with one period missing for one or more subjects. Run a mixed-effects model
    (subject random).
  • Exclude the subject(s) with missing periods and run the all fixed-effects model
    (subject fixed).
The CI (and CVintra) likely will be pretty similar.

❝ ❝ What do you mean by “look different”?


❝ […] the variances given for Test and Reference as “1_2/2_1” looked “different”, e.g. with 0.0113 and 0.0061 in one of my data sets.


Hmm, not sure which coding you used. It should be: Random: Subject(Sequence) and Repeated: Period, Subject, Treatment. Have a look at the Parameter Key-table to find out treatments’ coding. With my coding Var(Period*Treatment*Subject)_21 is s²wR and Var(Period*Treatment*Subject)_22 is s²wT.

❝ And still we have to submit the appropriate tests of all effects in the model. Hm, makes me wonder, if authorities are actually looking into the Core outputs regularly…


ElMaestro would say that chances are 0.0000001% or lower. At least in the EU, deficiency letters of the type “There is a significant sequence effect in the ANOVA. Please justify.” have almost stopped.

❝ And when we use the simple Core output, testing by PM will then be included also, as “The results […] are given in the Average Bioequivalence output worksheet and at the end of the Core output {Users Guide 6.4}.”


Yep. I always use the core output myself (M$-Word export is awful). In v6.3 and earlier I deleted irrelevant or obsolete stuff (Westlake’s CI, Anderson-Hauck, “Power”). Had an SOP for it. ;-) Now I have an R-script for that.

❝ From what I learned from other people, I can understand the condemnation of using a fixed effect for subjects.


If one looks only at the numbers, the results are the same. Since I don’t want to make a statement about the subjects in this particular study only, but to extrapolate to the population, I prefer a random effect.

❝ From some reports I saw, I personally can appreciate that in a 2x2 setting subjects who cannot contribute to the T/R comparison (missing data in one or two periods) will be omitted “automatically” instead of having the missing data imputed. That’s a nice side effect that keeps the evaluation in line with the EU GL.


Yes. In v6.4 you can set it in the Preferences.
LinMixBioequivalence > Default for 2×2 crossover set to all fixed effects

Relaxation

Germany,
2015-02-26 13:50

@ Helmut
Posting: # 14506
 

 Mixed vs. fixed effects (mainly)

Hello everybody and in particular Helmut and ElMaestro.

I really appreciate the recommendations on how to proceed in thinking (and learning) and will try the recommended comparison.

❝ Hmm, not sure which coding you used.


The same, with identical outcome. Sorry, that 1_2 was a typo and should have been 2_2.

Best regards,

Steven.
The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz