Bioequivalence and Bioavailability Forum • Simulations 101

Helmut
★★★

Vienna, Austria,
2020-05-27 01:47
(2209 d 23:47 ago)

Posting: # 21483
Views: 7,703

Simulations 101 [Power / Sample Size]

Dear all,

off-list I was asked about some discrepancies between sample sizes estimated by PowerTOST’s functions sampleN.scABEL(), sampleN.RSABE() and the tables of the Two Lászlós¹ ( :waving:

Charlie).

Short answer
Not relevant in practice.
Longer answer
Unless the same (pseudo-random) number generator with the same seed and the same number of simulations is used, the outcome will always differ. The Two Lászlós simulated 10,000 studies, whereas in the functions of PowerTOST 100,000 are simulated by default.

Excursion: An introduction to Monte Carlo Simulations.

With a reasonably large number of simulations the empiric value will converge to the true value. How fast a sufficiently accurate result is obtained, depends on the problem. Hence, it is a good idea to assess that before you set the number of simulations. See IV and V below why we opted for 100,000 as the default in PowerTOST.

Rolling two dice
Here it’s easy to get the average sum of pips. If you remember probability and combinatorics, you will get the answer 7. If you are a gambler, you simply know it. If you have two handy, roll them. Of course, you will get anything between the possible extremes (2 and 12) in a single roll. But you will need many rolls to come close to the average 7. Since this is a boring excercise (unless you find an idiot betting against 7), let do the job:
Estimating π
We generate a set of uniformly distributed coordinates {x|y} in the range {–1, +1}. According to Mr Pythagoras all x² + y² ≤ 1 are at or within the unit circle and coded in blue. Since $A = \frac{\pi \, d^2}{4}$ and $d = 1$, we get an estimate of π by 4 times the number of coded points (the “area”) divided by the number of simulations. Here for 50,000:

This was a lucky punch since the convergence is lousy. To get a stable estimate with just four significant digits, we need at least 100 million (‼) simulations. Not very efficient.
Type I Error of TOST
We know that α (probability of the Type I Error) never exceeds the nominal level of the test (generally 0.05). In e.g., a 2×2×2 crossover with CV 20% and 20 subjects it is 0.049999895… We can simulate studies with a GMR at one of the borders of the acceptance range and assess them for ABE as well. We expect ≤5% to pass. The number of passing studies divided by the number of simulations gives the empiric Type I Error. I used the function power.TOST.sim().

The filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds.
We see that convergence of this problem is bad. Therefore, e.g., in Two-Stage Designs 10⁶ simulations are common.
Sample size for ABEL (EMA)
Table A2¹ gives for a 4-period full replicate study (EMA method), GMR 0.85, CV 50%, and ≥80% power a sample size of 54, whilst sampleN.scABEL() gives only 52.

Five runs. As before filled circles are with a fixed seed (default) and open circles with random seeds.
With sampleN.scABEL() and only 10,000 simulations I got* five times 52 and once 54 (like The Lászlós). If you have more runs, you will get in some of them more extreme ones (in 1,000 runs of 10,000 I got x̃ 52 and a range of 48–56). However, this range shows that the result is not stable. Hence, we need more simulations to end up with 52 as a stable result. Although you are free to opt for more than 100,000 it’s a waste of time.
Sample size for RSABE (FDA)
Table A4¹ gives for a 4-period full replicate study (FDA method), GMR 0.90, CV 70%, and ≥90% power a sample size of only 46, whilst sampleN.RSABE() gives 48.

Filled circles are with a fixed seed (default) and open circles with random seeds.
With sampleN.RSABE() and 10,000 simulations I got* once 46 (like The Lászlós), once 48, and thrice 50. In 1,000 runs of 10,000 I got x̃ 48 and a range of 44–52. With more simulations we end up with 48 as a stable result.

Long answer for geeks
László Tóthfalusi coded in MatLab. Though – like in – the Mersenne Twister is the standard (pseudo-random) number generator, its implementation might differ. Even with the same seed and the same number of simulations different results may be obtained.
For a comparison with the tables, execute the scripts test_ABEL.R and test_RSABE.R in the folder /library/PowerTOST/tests. Patience, please; esp. the first one takes ages to complete. If you want to get a faster result, change in test_ABEL.R lines 71 and 80 from sdsims = TRUE to sdsims = FALSE.
Another discrepancy arises from the fact that all sample size functions of PowerTOST return balanced sequences, whereas the Two Lászlós reported the smallest sample size which gives at least the desired power. Therefore, in Tables A1/A3 you find some even numbers (for three sequences) and in Tables A2/A4 some odd numbers (for two sequences), whereas sampleN.RSABE() and sampleN.scABEL() will round up to the next multiple of three and two, respectively.
The Two Lászlós simulated subjects, whereas in sampleN.RSABE()/sampleN.scABEL() the ‘key’ statistics² are used (GMR lognormal-distributed and s² χ²-distributed). The latter approach is easily 100 times faster than the former.
- For ABEL subject simulations are provided by the function sampleN.scABEL.sdsims(). Only if you suspect heterogenicity (i.e., give CV as a vector with two elements, where the first one is CV_wT and the second CV_wR), you plan a partial replicate design, and CV_wT > CV_wR, this is the method of choice. If CV_wT ≤ CV_wR and for full replicate designs we recommend sampleN.scABEL() for speed reasons. See also the vignette.
- Sorry, subject simulations are not ~~implemented~~ possible for the FDA’s RSABE. Since the FDA’s mixed-effects model for ABE (applicable if s_wR < 0.294) cannot be set up in , intra-subject contrasts are used. However, it is a reasonably close approximation. For a comparison with SAS see page 16 of the file Implementation_scaledABE_sims.pdf in the folder /library/PowerTOST/doc.
  Furthermore, likely they used the rounded regulatory constant $\small{0.893}$ and not the exact $\small{\log_{e}(1.25)/0.25=0.8925742\ldots}$ like sampleN.RSABE(). You can try
  reg <- reg_const("FDA") reg$r_const <- 0.893 reg$name <- "The Laszlos\u2019 likely"
  and call sampleN.RSABE(..., regulator = reg). In re-calculating the entire tables with 100,000 sim’s I found five of the 352 scenarios where the sample size would be by one sequence smaller. However, I would not go that way because if you will evaluate the study with the SAS-code of the progesterone guidance or the template for Phoenix/WinNonlin, the exact regulatory constant will be used.
  Another story is at which CV (on the average) we will switch in the simulations. You can try the following code to compare sample sizes for ABE and RSABE:
  library(PowerTOST) fun <- function(x) { n.1 <- sampleN.TOST(CV = x, theta0 = theta0, targetpower = target, design = design, details = FALSE, print = FALSE)[["Sample size"]] n.2 <- sampleN.RSABE(CV = x, theta0 = theta0, targetpower = target, design = design, details = FALSE, print = FALSE)[["Sample size"]] return(n.1- n.2) } theta0 <- 0.9 target <- 0.8 design <- "2x2x4" cat("Equal sample sizes for ABE and RSABE at CV \u2264", signif(uniroot(fun, interval=c(0.2, 0.35), extendInt="yes")$root, 3), "\n")
  Given that, it hardly matters for true HVD(P)s. Note that the tables of the Two Lászlós don’t reach below 30% and they stated in the discussion:
  - In view of the consequences of the mixed approach, it could be judicious to consider larger numbers of subjects at variations fairly close to 30%.
If you want to reproduce simulations, keep the default argument sampleN.foo(..., setseed = TRUE). Recommended, unless you want to assess the performance and convergence. However, results with 100,000 simulations are extremely stable. For the ABEL example I got in 1,000 runs with random seeds 995 times 52 and just five times 54.

Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharmaceut Sci. 2011;15(1):73–84. Open access.
Zheng C, Wang J, Zhao L. Testing bioequivalence for multiple formulations with power and sample size calculations. Pharmaceut. Stat. 2012;11(4):334–341. doi:10.1002/pst.1522.

If you perform five runs with 10,000 simulations, you might get different numbers. That’s the effect of random seeds and a low number of simulations.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

d_labes ★★★ Berlin, Germany, 2020-05-27 21:30 (2209 d 04:04 ago) @ Helmut Posting: # 21485 Views: 6,018	Simulations 101 Post reply
	Dear Helmut! Can't enough! One small correction: The graphics under "Sample size for RSABE" is apparent not correct here. — Regards, Detlew

Helmut
★★★

Vienna, Austria,
2020-05-27 23:06
(2209 d 02:28 ago)

@ d_labes
Posting: # 21486
Views: 6,054

Simulations 101

Post reply

Dear Detlew!

❝ :clap: Can't :clap: enough!

THX!

❝ One small correction: The graphics under "Sample size for RSABE" is apparent not correct here.

Hhm, why?

library(PowerTOST) library(magicaxis) runs <- 5 nsims <- as.integer(10^(seq(3, 6, 0.1))) res <- data.frame(nsims = nsims) for (k in seq_along(nsims)) { for (j in 1:runs) { ifelse (j == 1, setseed <- TRUE, setseed <- FALSE) res[k, j+1] <- sampleN.RSABE(theta0 = 0.9, CV = 0.7, nsims = nsims[k], setseed = setseed, design = "2x2x4", targetpower = 0.9, print = FALSE, details = FALSE)[["Sample size"]] } } ylim <- range(res[, 2:(runs+1)], na.rm = TRUE) dev.new(width = 5, height = 5) op <- par(no.readonly = TRUE) par(pty = "s") plot(res$nsims, res[, 1], type = "n", log = "x", ylim = ylim, axes = FALSE, xlab = "Number of simulations", ylab = "Sample size") grid(); box() magaxis(1) magaxis(2, minorn = 2, las = 1) for (j in 1:runs) { ifelse (j == 1, pch <- 16, pch <- 1) points(res$nsims, res[, j+1], pch = pch, col = "blue", cex = 1.15) } points(1e5, 48, pch = 16, col = "#00AA00", cex = 1.15) # default points(1e4, 46, pch = 16, col = "red", cex = 1.15) # Lászlós par(op) print(res, row.names = FALSE) nsims V2 V3 V4 V5 V6 1000 50 48 54 52 50 1258 50 48 52 50 52 1584 50 50 52 50 56 1995 50 48 50 50 50 2511 48 48 48 48 52 3162 48 50 50 50 50 3981 46 48 50 50 48 5011 46 50 50 50 48 6309 46 48 48 48 50 7943 46 50 48 48 50 10000 46 50 48 50 50 12589 46 50 48 50 48 15848 48 48 48 48 48 19952 48 48 50 50 48 25118 48 48 50 48 48 31622 48 48 48 48 48 39810 48 48 50 48 50 50118 48 48 48 48 48 63095 48 48 48 48 48 79432 48 48 48 48 48 100000 48 48 48 48 48 125892 48 48 48 48 48 158489 48 48 48 48 48 199526 48 48 48 48 48 251188 48 48 48 48 48 316227 48 48 48 48 48 398107 48 48 48 48 48 501187 48 48 48 48 48 630957 48 48 48 48 48 794328 48 48 48 48 48 1000000 48 48 48 48 48

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

d_labes ★★★ Berlin, Germany, 2020-05-28 20:34 (2208 d 05:00 ago) @ Helmut Posting: # 21487 Views: 6,056	Simulations 101 Post reply
	Dear Helmut! ❝ ❝ One small correction: The graphics under "Sample size for RSABE" is apparent not correct here. ❝ ❝ Hhm, why? Today it looks reasonable. Maybe I had a Halunkination ). — Regards, Detlew

Helmut
★★★

Vienna, Austria,
2020-05-29 15:30
(2207 d 10:03 ago)

@ Helmut
Posting: # 21490
Views: 5,919

More about the tables

Post reply

Dear sample size geeks, nerds, and simul-ants (weirdos for short),

for each combination of design, power, GMR, and CV: Red circles = sample sizes of the Two Lászlós (if >201 none). Green bars = sampleN.scABEL(..., nsims = 1e5) or sampleN.RSABE(..., nsims = 1e5); always balanced sequences. Blue circles = twenty runs simulated with 10,000 sim’s each (random seeds). Starting from the respective result, I decreased the sample size whilst maintaining power. This might give unbalanced sample sizes (even sample sizes for the partial replicate and odd ones for the full replicate).

ABEL (EMA)

3-period 3-sequence partial replicate [TRR|RTR|RTT]
- Power 80%
- Power 90%
4-period 2-sequence full replicate [TRTR|RTRT]
- Power 80%
- Power 90%

Now a comparison of the reported sample sizes (y-axis) vs. PowerTOST’ (x-axis). Blue from the tables, and red up-rounded to the next complete sequence. Do the linear fits closer to the unity line? Sometimes yes, sometimes not.

3-period 3-sequence partial replicate [TRR|RTR|RTT]
- Power 80%
- Power 90%
4-period 2-sequence full replicate [TRTR|RTRT]
- Power 80%
- Power 90%

RSABE (FDA)

3-period 3-sequence partial replicate [TRR|RTR|RTT]
- Power 80%
- Power 90%
4-period 2-sequence full replicate [TRTR|RTRT]
- Power 80%
- Power 90%

Comparison of the reported sample sizes (y-axis) vs. PowerTOST’ (x-axis). Blue from the tables, and red up-rounded to the next complete sequence. Do the linear fits get closer to the unity line? Generally yes.

3-period 3-sequence partial replicate [TRR|RTR|RTT]
- Power 80%
- Power 90%
4-period 2-sequence full replicate [TRTR|RTRT]
- Power 80%
- Power 90%

The upper part of the RSABE sample sizes looks disturbing first. Explanation: Since the sample sizes are generally smaller than for ABEL, the many imbalanced sequences have a larger impact. However, rounding up to the next complete sequence (red circles) in the lower part gives a better match (closer to the unity line).

Maybe you can make sumfink out of it. I still hold that 10,000 simulations are too few.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes