## Simulations 101 [Power / Sample Size]

Dear all,

off-list I was asked about some discrepancies between sample sizes estimated by

Excursion: An introduction to Monte Carlo Simulations.

off-list I was asked about some discrepancies between sample sizes estimated by

`PowerTOST`

’s functions `sampleN.scABEL()`

, `sampleN.RSABE()`

and the tables of the Two Lászlós^{1}( Charlie).- Short answer

Not relevant in practice.

- Longer answer

Unless the*same*(pseudo-random) number generator with the*same*seed and the*same*number of simulations is used, the outcome will*always*differ. The Two Lászlós simulated 10,000 studies, whereas in the functions of`PowerTOST`

100,000 are simulated by default.

Excursion: An introduction to Monte Carlo Simulations.

With a reasonably large number of simulations the empiric value will converge to the true value. How fast a sufficiently accurate result is obtained, depends on the problem. Hence, it is a good idea to assess that before you set the number of simulations. See IV and V below why we opted for 100,000 as the default in `PowerTOST`

.

- Rolling two dice

Here it’s easy to get the average sum of pips. If you remember probability and combinatorics, you will get the answer 7. If you are a gambler, you simply*know*it. If you have two handy, roll them. Of course, you will get anything between the possible extremes (2 and 12) in a single roll. But you will need many rolls to come*close*to the average 7. Since this is a boring excercise (unless you find an idiot betting against 7), let**R**do the job:

- Estimating π

We generate a set of uniformly distributed coordinates {*x*|*y*} in the range {–1, +1}. According to Mr Pythagoras all*x*^{2}+*y*^{2}≤ 1 are at or within the unit circle and coded in blue. Since \(A = \frac{\pi \, d^2}{4}\) and \(d = 1\), we get an estimate of π by 4 times the number of coded points (the “area”) divided by the number of simulations. Here for 50,000:

This was a lucky punch since the convergence is lousy. To get a*stable*estimate with just*four*significant digits, we need at least 100 million (‼) simulations. Not very efficient.

- Type I Error of TOST

We know that α (probability of the Type I Error) never exceeds the nominal level of the test (generally 0.05). In*e.g.*, a 2×2×2 crossover with CV 20% and 20 subjects it is 0.049999895… We can simulate studies with a GMR at one of the borders of the acceptance range and assess them for ABE as well. We expect ≤5% to pass. The number of passing studies divided by the number of simulations gives the empiric Type I Error. I used the function`power.TOST.sim()`

.

The filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds.

We see that convergence of this problem is bad. Therefore,*e.g.*, in Two-Stage Designs 10^{6}simulations are common.

- Sample size for ABEL (EMA)

Table A2^{1}gives for a 4-period full replicate study (EMA method), GMR 0.85, CV 50%, and ≥80% power a sample size of**54**, whilst`sampleN.scABEL()`

gives only**52**.

Five runs. As before filled circles are with a fixed seed (default) and open circles with random seeds.

With`sampleN.scABEL()`

and only 10,000 simulations I got**five*times 52 and*once*54 (like The Lászlós). If you have more runs, you will get in some of them more extreme ones (in 1,000 runs of 10,000 I got x̃ 52 and a range of 48–56). However, this range shows that the result is not stable. Hence, we need more simulations to end up with 52 as a*stable*result. Although you are free to opt for more than 100,000 it’s a waste of time.

- Sample size for RSABE (FDA)

Table A4^{1}gives for a 4-period full replicate study (FDA method), GMR 0.90, CV 70%, and ≥90% power a sample size of only**46**, whilst`sampleN.RSABE()`

gives**48**.

Filled circles are with a fixed seed (default) and open circles with random seeds.

With`sampleN.RSABE()`

and 10,000 simulations I got**once*46 (like The Lászlós),*once*48, and*thrice*50. In 1,000 runs of 10,000 I got x̃ 48 and a range of 44–52. With more simulations we end up with 48 as a*stable*result.

- Long answer for geeks

László Tóthfalusi coded in**MatLab**. Though – like in**R**– the Mersenne Twister is the standard (pseudo-random) number generator, its implementation might differ. Even with the same seed and the same number of simulations different results*may*be obtained.

For a comparison with the tables, execute the scripts`test_ABEL.R`

and`test_RSABE.R`

in the folder /library/PowerTOST/tests. Patience, please; esp. the first one takes ages to complete. If you want to get a faster result, change in`test_ABEL.R`

lines 71 and 80 from`sdsims = TRUE`

to`sdsims = FALSE`

.

Another discrepancy arises from the fact that all sample size functions of`PowerTOST`

return*balanced*sequences, whereas the Two Lászlós reported the smallest sample size which gives at least the desired power. Therefore, in Tables A1/A3 you find some*even*numbers (for three sequences) and in Tables A2/A4 some*odd*numbers (for two sequences), whereas`sampleN.RSABE()`

and`sampleN.scABEL()`

will round up to the next multiple of three and two, respectively.

The Two Lászlós simulated subjects, whereas in`sampleN.RSABE()`

/`sampleN.scABEL()`

the ‘key’ statistics^{2}are used (GMR lognormal-distributed and s^{2}χ^{2}-distributed). The latter approach is easily 100 times faster than the former.- For ABEL subject simulations are provided by the function
`sampleN.scABEL.sdsims()`

. Only if you suspect heterogenicity (*i.e.*, give CV as a vector with two elements, where the first one is CV_{wT}and the second CV_{wR}), you plan a*partial*replicate design,*and*CV_{wT}> CV_{wR}, this is the method of choice. If CV_{wT}≤ CV_{wR}and for full replicate designs we recommend`sampleN.scABEL()`

for speed reasons. See also the vignette.

- Sorry, subject simulations are not
~~implemented~~possible for the FDA’s RSABE. Since the FDA’s mixed-effects model for ABE (applicable if s_{wR}< 0.294) cannot be setup in**R**, intra-subject contrasts are used. However, it is a reasonably close approximation. For a comparison with SAS see page 16 of the file Implementation_scaledABE_sims.pdf in the folder /library/PowerTOST/doc.

Furthermore, likely they used the*rounded*regulatory constant \(\small{0.893}\) and not the*exact*\(\small{\log_{e}(1.25)/0.25=0.8925742\ldots}\) like`sampleN.RSABE()`

. You can try

`reg <- reg_const("FDA")`

reg$r_const <- 0.893

reg$name <- "The Laszlos\u2019 likely"

and call`sampleN.RSABE(..., regulator = reg)`

. In re-calculating the entire tables with 100,000 sim’s I found five of the 352 scenarios where the sample size would be by one sequence smaller. However, I would not go that way because if you will evaluate the study with the SAS-code of the progesterone guidance or the template for Phoenix/WinNonlin, the exact regulatory constant will be used.

Another story is at which CV (on the average) we will switch in the simulations. You can try the following code to compare sample sizes for ABE and RSABE:

`library(PowerTOST)`

fun <- function(x) {

n.1 <- sampleN.TOST(CV = x, theta0 = theta0, targetpower = target,

design = design, details = FALSE,

print = FALSE)[["Sample size"]]

n.2 <- sampleN.RSABE(CV = x, theta0 = theta0, targetpower = target,

design = design, details = FALSE,

print = FALSE)[["Sample size"]]

return(n.1- n.2)

}

theta0 <- 0.9

target <- 0.8

design <- "2x2x4"

cat("Equal sample sizes for ABE and RSABE at CV \u2264",

signif(uniroot(fun, interval=c(0.2, 0.35),

extendInt="yes")$root, 3), "\n")

Given that, it hardly matters for*true*HVD(P)s. Note that the tables of the Two Lászlós don’t reach below 30% and they stated in the discussion:*In view of the consequences of the mixed approach, it could be judicious to consider larger numbers of subjects at variations fairly close to 30%.*

If you want to*reproduce*simulations, keep the default argument`sampleN.foo(..., setseed = TRUE)`

. Recommended, unless you want to assess the performance and convergence. However, results with 100,000 simulations are extremely stable. For the ABEL example I got in 1,000 runs with random seeds 995 times 52 and just five times 54. - For ABEL subject simulations are provided by the function

- Tóthfalusi L, Endrényi L.
*Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs.*J Pharm Pharmaceut Sci. 2011;15(1):73–84. Open access.

- Zheng C, Wang J, Zhao L.
*Testing bioequivalence for multiple formulations with power and sample size calculations.*Pharmaceut. Stat. 2012;11(4):334–341. doi:10.1002/pst.1522.

- If you perform five runs with 10,000 simulations, you might get different numbers. That’s the effect of random seeds and a low number of simulations.

—

Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮

Science Quotes

*Dif-tor heh smusma*🖖Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮

Science Quotes

### Complete thread:

- Simulations 101Helmut 2020-05-26 23:47 [Power / Sample Size]
- Simulations 101 d_labes 2020-05-27 19:30
- Simulations 101 Helmut 2020-05-27 21:06
- Simulations 101 d_labes 2020-05-28 18:34

- Simulations 101 Helmut 2020-05-27 21:06
- More about the tables Helmut 2020-05-29 13:30

- Simulations 101 d_labes 2020-05-27 19:30