Interlude II (simulations) [RSABE / ABEL]

posted by Helmut – Vienna, Austria, 2020-08-19 23:31 – Posting: # 21889

Dear all,

I simulated 500 data sets in the partial replicate design with \(\small{s_\textrm{wT}^2=s_\textrm{wR}^2=0.086\: (CV_\textrm{w}\approx 29.97\%),}\) \(\small{s_\textrm{bT}^2=s_\textrm{bR}^2=0.172\: (CV_\textrm{b}\approx 43.32\%),}\) \(\small{\rho=1},\) \(\small{\theta_0=1},\) i.e., no subject-by-formulation interaction. With \(\small{n=24}\) subjects the power to demonstrate ABE is 82.3797%. I evaluated the data sets in Phoenix/WinNonlin 8.1 with the FDA’s covariance structure FA0(2), setting the singularity tolerance and convergence criterion to 1E-12 (instead of 1E-10) and the maximum number of iterations to 250 (instead of 50).
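The CVs quoted above follow from the log-scale variances via the usual log-normal relation \(\small{CV=\sqrt{e^{s^2}-1}}\). A minimal Python sketch of the conversion in both directions (the function names are mine, for illustration only):

```python
import math

def cv_from_var(s2: float) -> float:
    """CV (as a fraction) of a log-normal variable with log-scale variance s2."""
    return math.sqrt(math.expm1(s2))  # sqrt(exp(s2) - 1)

def var_from_cv(cv: float) -> float:
    """Inverse: log-scale variance from a CV given as a fraction."""
    return math.log1p(cv * cv)  # log(1 + CV^2)

s2_w = 0.086  # within-subject variance of T and R (simulation setup)
s2_b = 0.172  # between-subject variance of T and R (simulation setup)

print(f"CVw = {100 * cv_from_var(s2_w):.4f}%")  # CVw = 29.9677%
print(f"CVb = {100 * cv_from_var(s2_b):.4f}%")  # CVb = 43.3218%
```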
If you want to try it in SAS or any other software, the data sets are available in CSV format.

In 403 (80.6%) of the data sets PHX issued at least one warning.
In 56 (11.2%) of the data sets PHX threw this:

Negative final variance component. Consider omitting this VC structure.

Well roared, lion! The model reached for the stars (namely \(\small{s_\textrm{wT}^2}\)). We know that, since T is administered only once, we can get only its total variability (i.e., \(\small{s_\textrm{T}^2=s_\textrm{wT}^2+s_\textrm{bT}^2}\)), like in a parallel design, where neither the within- nor the between-subject variance is accessible. Amazingly, in the other cases PHX reported an estimate, though it is nonsense, of course.
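Why only the sum is identifiable can be sketched as follows, assuming the usual three-sequence partial replicate TRR|RTR|RRT and omitting the fixed sequence and period effects (they do not affect the variances). The symbols \(b\) for the subject effect and \(\varepsilon\) for the within-subject error are my notation:

```latex
\begin{aligned}
Y_{iT}  &= \mu_T + b_{iT} + \varepsilon_{iT},
  &\operatorname{Var}(Y_{iT}) &= s_\textrm{bT}^2 + s_\textrm{wT}^2 = s_\textrm{T}^2,\\
Y_{iRk} &= \mu_R + b_{iR} + \varepsilon_{iRk},\; k=1,2,
  &\operatorname{Var}(Y_{iR1}-Y_{iR2}) &= 2\,s_\textrm{wR}^2.
\end{aligned}
```

The within-subject difference of the two R observations cancels \(b_{iR}\), so \(s_\textrm{wR}^2\) is estimable. With a single T observation per subject no such contrast exists, and \(s_\textrm{wT}^2\) and \(s_\textrm{bT}^2\) cannot be separated.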

In 340 (68%) of the data sets I was told:

Model may be over-specified. A simpler model could be tried.

Oh yes! ‘May be’ is a euphemism. The model actually is over-specified, and not only in some of the data sets but in all of them.

In 333 (66.6%) of the data sets PHX threw this:

Newton's algorithm converged with modified Hessian. Output is suspect.

How I love being told that the results are suspect. Will assessors love that as well? In PHX the message is hidden in the ‘Core Output’ and in ‘Warnings and Errors’, in SAS in the log file… Will it be shown in your fancy output? Not necessarily.

79.6% of the data sets passed BE. That is only slightly lower than the expected 82.38% and likely due to the small number of simulations (a run with 10,000 is in progress).
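A rough check of ‘only slightly lower than expected’: under a normal approximation to the binomial (my choice, not part of the original analysis), 500 simulations with a true power of 82.3797% should give a pass rate within about ±3.3 percentage points of it in 95% of runs.

```python
import math
from statistics import NormalDist

def pass_rate_interval(power: float, n_sim: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation range of the simulated pass rate if the true power is `power`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # 1.96 for level = 0.95
    se = math.sqrt(power * (1.0 - power) / n_sim)  # binomial standard error
    return power - z * se, power + z * se

power = 0.823797   # power for n = 24 stated above
n_sim = 500        # number of simulated data sets
observed = 0.796   # fraction of data sets that passed BE

lo, hi = pass_rate_interval(power, n_sim)
print(f"expected range: {100 * lo:.2f}% to {100 * hi:.2f}%")
print("observed pass rate inside:", lo <= observed <= hi)
```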
How close are the estimates to the targets?

                s²wR     s²bR     s²T       S×F      PE
target         0.08600  0.17200  0.25800  0        100.00
mean estimate  0.08733  0.16923  0.22655  0.01090   98.48
%RE            +1.54%   –1.61%   –12.19%   –       –1.52%

Fine with me, except for \(\small{s_\textrm{T}^2}\), which is not estimated directly (see this lengthy thread for a promising alternative) but obtained as the sum of two doubtful estimates (\(\small{s_\textrm{wT}^2,s_\textrm{bT}^2}\)).
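The %RE row can be recomputed from the rounded means as \(\small{100\,(\hat{\theta}-\theta)/\theta}\), with the target of \(\small{s_\textrm{T}^2}\) being \(\small{0.086+0.172=0.258}\). A small Python sketch; since the means in the table are rounded to five decimals, the last digit may differ slightly from the table (e.g., +1.55% instead of +1.54% for \(\small{s_\textrm{wR}^2}\)):

```python
# Targets and mean estimates from the table above (PE in percent).
targets = {"s2wR": 0.086, "s2bR": 0.172, "s2T": 0.086 + 0.172, "PE": 100.0}
means = {"s2wR": 0.08733, "s2bR": 0.16923, "s2T": 0.22655, "PE": 98.48}

def rel_error(estimate: float, target: float) -> float:
    """Relative error in percent."""
    return 100.0 * (estimate - target) / target

for name, target in targets.items():
    print(f"{name:5s} target {target:8.5f}  mean {means[name]:8.5f}  "
          f"%RE {rel_error(means[name], target):+.2f}%")
```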

We also see that the optimizer does fine in estimating CVwR but is desperate with CVwT (only 437 values were obtained). The target was 29.9677% for both.

CV (%)   min    QI     med    QIII   max
CVwR    14.25  26.75  30.11  33.20  46.13
CVwT     1.33  18.29  23.23  28.05  45.33

I evaluated the data sets with other covariance structures as well. It seems that FA0(1) is the winner.

Convergence          FA0(2)      FA0(1)        CS
Achieved          160 (32.0%)  500 (100%)  500 (100%)
Modified Hessian  340 (68.0%)     –           –

Warnings                       FA0(2)      FA0(1)      CS
Modified Hessian            333 (66.6%)     –          –
Negative variance component  56 (11.2%)  25 (5.0%)  56 (11.2%)
Both                         14 ( 2.8%)     –          –
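For reference, the three structures for the 2×2 between-subject covariance matrix G of T and R, written as I understand them from the SAS PROC MIXED documentation (I assume that Phoenix’s structures of the same names are parameterized equivalently; λ and σ are generic parameter names):

```latex
\begin{aligned}
\text{FA0(2):}\quad G &= \Lambda\Lambda^\top,\;
  \Lambda=\begin{pmatrix}\lambda_{11}&0\\ \lambda_{21}&\lambda_{22}\end{pmatrix}
  \;\Rightarrow\;
  G=\begin{pmatrix}\lambda_{11}^2&\lambda_{11}\lambda_{21}\\
                   \lambda_{11}\lambda_{21}&\lambda_{21}^2+\lambda_{22}^2\end{pmatrix},\\
\text{FA0(1):}\quad G &= \lambda\lambda^\top,\;
  \lambda=\begin{pmatrix}\lambda_1\\ \lambda_2\end{pmatrix}
  \;\Rightarrow\;
  G=\begin{pmatrix}\lambda_1^2&\lambda_1\lambda_2\\ \lambda_1\lambda_2&\lambda_2^2\end{pmatrix},\\
\text{CS:}\quad G &= \begin{pmatrix}\sigma_1+\sigma^2&\sigma_1\\ \sigma_1&\sigma_1+\sigma^2\end{pmatrix}.
\end{aligned}
```

FA0(2) has three parameters and can represent any correlation, FA0(1) has two and forces \(\small{|\rho|=1}\), and CS has two and forces \(\small{s_\textrm{bT}^2=s_\textrm{bR}^2}\). In these simulations \(\small{\rho=1}\), so the true G has rank one and the extra FA0(2) parameter \(\small{\lambda_{22}}\) is zero, which is consistent with (though no proof of) the ‘over-specified’ messages.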

As long as we achieve convergence, the choice doesn’t matter (we have seen nasty data sets in the past where FA0(2) didn’t converge). Perhaps all is good as long as the data set is balanced and/or does not contain ‘outliers’. At the end of the day we are interested in the 90% CI. I compared the results obtained with FA0(1) and CS to those with the guidances’ FA0(2). With the CI expressed in percent and rounded to the 4th decimal (i.e., 6–7 significant digits), the CIs were identical in all cases. Only at the 5th decimal did 1 of the 500 data sets differ for both covariance structures (the CI was wider). Since all guidelines require rounding to the 2nd decimal, that is not relevant.

I’m not a friend of the EMA’s ‘all effects fixed’ model because it assumes identical variances of T and R (which has been shown to be wrong in many full replicate studies). But, of course, there are no issues with convergence in this simple linear model.

My original simulation code contained a stupid error (THX to Detlew for detecting it!) which led to an extreme S×F interaction. Here is one of those data sets, where the optimizer was in deep trouble. The default maximum number of iterations in PHX/WNL is 50. I got:

max.iter    s²wR     %RE  -2REML LL   AIC    BIC       df         90% CI
    50    0.084393  –1.87  39.368   61.368  85.455  22.10798  82.212–111.084
   250    0.085094  –1.05  39.345   61.345  85.431  22.18344  82.223–111.070
 1,250    0.085271  –0.85  39.339   61.339  85.425  22.20523  82.225–111.066
 6,250    0.085309  –0.80  39.338   61.338  85.424  22.20991  82.226–111.066
31,250    0.085317  –0.79  39.338   61.338  85.424  22.21032  82.226–111.066
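Two quick consistency checks on this table in Python: the %RE column against the target \(\small{s_\textrm{wR}^2=0.086}\), and the AIC, which for a penalty of the form AIC = -2LL + 2k implies the same k in every row. I do not know exactly which parameters PHX counts in k, so read the result only as ‘constant across rows’:

```python
# (max. iterations, s2wR, -2REML LL, AIC) as reported in the table above
rows = [
    (50,    0.084393, 39.368, 61.368),
    (250,   0.085094, 39.345, 61.345),
    (1250,  0.085271, 39.339, 61.339),
    (6250,  0.085309, 39.338, 61.338),
    (31250, 0.085317, 39.338, 61.338),
]
TARGET_S2WR = 0.086

for max_iter, s2wr, m2ll, aic in rows:
    pct_re = 100.0 * (s2wr - TARGET_S2WR) / TARGET_S2WR
    k = (aic - m2ll) / 2.0  # AIC = -2LL + 2k  =>  k = (AIC - (-2LL)) / 2
    print(f"{max_iter:6d}  %RE {pct_re:+.2f}  k = {k:.0f}")
```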

A lament in all cases:

Failed to converge in allocated number of iterations. Output is suspect.

Note that the degrees of freedom increase with the number of iterations and hence the CI narrows. Now I understand why Health Canada requires that the optimizer’s constraints are stated in the SAP.
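For a fixed SE, the CI half-width on the log scale is \(\small{t_{0.95,\nu}\cdot SE}\), so a larger \(\small{\nu}\) alone narrows the interval (in the table above the SE changes as well). Since the degrees of freedom are non-integer, the sketch below computes the t quantile with the Python standard library only: Simpson’s rule on the density plus bisection. This is my own helper, not what PHX does internally; in practice scipy.stats.t.ppf does the same job.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution; df may be non-integer."""
    log_c = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1.0) / 2.0 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def t_quantile(p: float, df: float) -> float:
    """Quantile for 0.5 < p < 1 by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Degrees of freedom reported with 50 and with 31,250 iterations:
for df in (22.10798, 22.21032):
    print(f"df = {df:.5f}  t(0.95, df) = {t_quantile(0.95, df):.5f}")
```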


Welcome to the hell of mixed effects modeling.

Edit 1: Results for a large data set (10,000 simulations, 20.5 MB in CSV format). 80.95% of the studies passed BE in all setups. The relevant estimates were identical across setups and pretty close to the targets:

                s²wR      PE
target         0.08600  100.00
mean estimate  0.08584   99.97
%RE            –0.19%   –0.03%

Convergence          FA0(2)  FA0(1)   CS
Achieved             30.14%  99.97%  100%
Modified Hessian     69.83%     –      –
Max. iter. exceeded   0.03%   0.03%    –

Warnings                    FA0(2)  FA0(1)   CS
Modified Hessian            68.75%    –      –
Negative variance component  9.01%  3.82%  11.15%
Both                         2.22%  0.06%    –

Given all that, I would opt for FA0(1).

Edit 2: I manipulated the small data sets: I removed subject 24 to make the study imbalanced, removed the last period (T) of subject 23 to make it incomplete, and multiplied T of subject 1 by a factor of 5–10 to mimic an ‘outlier’. Only 15.4% of the studies passed. Yep, even a single outlier might be the killer. FA0(2) issued 381 warnings, FA0(1) 2, and CS 7.
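The three manipulations can be sketched in Python as follows. The record layout (keys subject, period, treatment, pk) is hypothetical, and drawing one outlier factor per data set uniformly from 5 to 10 is my assumption about how ‘5–10’ was applied:

```python
import random

def manipulate(records: list[dict], rng: random.Random) -> list[dict]:
    """Apply the three Edit 2 manipulations to one simulated data set.

    records: one dict per observation with the hypothetical keys
    'subject', 'period', 'treatment' ('T' or 'R') and 'pk' (response on
    the original scale). The input list is left unchanged.
    """
    last_period_23 = max(r["period"] for r in records if r["subject"] == 23)
    factor = rng.uniform(5.0, 10.0)  # assumption: one factor per data set, from 5 to 10
    result = []
    for rec in records:
        if rec["subject"] == 24:
            continue  # drop subject 24: sequences become imbalanced
        if rec["subject"] == 23 and rec["period"] == last_period_23:
            continue  # drop the last period (T) of subject 23: incomplete data
        rec = dict(rec)  # copy, so the caller's data set is not modified
        if rec["subject"] == 1 and rec["treatment"] == "T":
            rec["pk"] *= factor  # inflate subject 1's T value to mimic an 'outlier'
        result.append(rec)
    return result
```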

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz